In the present study, we propose a transfer learning approach for segmenting pulmonary infiltrates associated with COVID-19 using a 3D U-Net, implemented on a single GPU with 11 GB of VRAM. An 18-layer 3D ResNet pretrained on a video classification dataset served as the encoder of the 3D U-Net, and we obtained state-of-the-art results within comparably short training times using an in-house-developed library (github.com/kbressem/faimed3d).
There have been previous efforts to automatically segment pulmonary infiltrates using U-Nets, but few used fully three-dimensional models; most studies applied a slice-by-slice approach. In our opinion, the metrics obtained from these two approaches are not comparable, because the slice-wise approach may introduce selection bias into the data by excluding slices that show neither lung nor infiltrates. For 3D models, the input volume shows the entire lung, including healthy and diseased lung tissue as well as portions of the neck and abdomen that do not contain lung tissue. Müller et al. proposed a 3D U-Net with an architecture similar to our model.10 Because of limited training data, they used 5-fold cross-validation during training and reported a mean Dice score of 0.761 across the 5 validation folds. The model of Müller et al. was trained for 130 h (more than 10 times longer than the model presented in this work) on a GPU with twice as much VRAM (an Nvidia Quadro P6000). However, since the models were evaluated on a proprietary dataset, the obtained Dice scores cannot be compared without reservations, as differences in the segmentation ground truth may exist.
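Throughout this discussion, segmentation quality is compared via the Dice score, which for 3D volumes reduces to twice the voxel overlap divided by the sum of the two voxel counts. The following is a minimal illustrative sketch of this metric on binary masks, not the implementation used in this study or in the cited works:

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient between two binary 3D masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: convention, treat as perfect agreement
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

# Toy 3D masks: 8 predicted voxels, 12 target voxels, 8 overlapping.
a = np.zeros((4, 4, 4), dtype=np.uint8)
b = np.zeros((4, 4, 4), dtype=np.uint8)
a[1:3, 1:3, 1:3] = 1
b[1:3, 1:3, 1:4] = 1
print(dice_score(a, b))  # 2*8 / (8 + 12) = 0.8
```

Note that the score is sensitive to how the ground truth was drawn: systematically tighter or looser annotations shift the denominator, which is one reason scores from datasets with different annotation protocols cannot be compared directly.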
Lessmann et al. developed CORADS-AI, a deep learning algorithm for predicting the CO-RADS grade on non-contrast CT images.15 CO-RADS (COVID-19 Reporting and Data System) is a categorical score between 1 and 5 that indicates the likelihood of pulmonary involvement, with a CO-RADS score of 1 corresponding to a very low probability of pulmonary involvement and a score of 5 representing a very high probability.16 Interestingly, interrater agreement on CO-RADS is only moderate, with a Fleiss kappa value of 0.47. CO-RADS grading differs from manual segmentation of pulmonary infiltrates in patients with proven COVID-19, so these kappa values are not directly transferable. Nevertheless, the question remains whether there are also significant interrater differences in segmentation and how these would affect model performance and comparability between studies. For the RICORD dataset and the dataset provided by Ma et al., each CT volume was annotated by multiple experts, including at least one board-certified radiologist, to reduce bias arising from poor interrater agreement. However, for the MosMed dataset the number of annotators per CT volume is not available.
Ma et al. also developed a data-efficient 3D U-Net model that achieved a mean Dice score of 0.642 in 5-fold cross-validation and a Dice score of 0.443 during inference on the MosMed dataset.8
The highest Dice score achieved with a 3D U-Net architecture was published by Pu et al., with a value of 0.81 for infiltrates larger than 200 mm³ on a proprietary dataset [21]. It is important to note, however, that the metric reported by Pu et al. differs from other published results as well as from ours, because the Dice score is calculated at a per-lesion level and then averaged, rather than at a per-patient level.
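This distinction matters numerically: per-patient Dice pools all infiltrate voxels of a scan, so a missed small lesion barely affects the score, whereas per-lesion averaging weights every lesion equally. The sketch below contrasts the two aggregation schemes using connected-component labelling; it is an illustrative assumption about the aggregation, not the exact evaluation protocol of Pu et al.:

```python
import numpy as np
from scipy import ndimage

def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient between two boolean masks."""
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else float(2.0 * (pred & target).sum() / denom)

def per_patient_dice(pred: np.ndarray, target: np.ndarray) -> float:
    # One score per scan: all infiltrate voxels are pooled together.
    return dice(pred.astype(bool), target.astype(bool))

def per_lesion_dice(pred: np.ndarray, target: np.ndarray) -> float:
    # One score per ground-truth lesion (connected component), then averaged.
    labels, n_lesions = ndimage.label(target)
    scores = []
    for i in range(1, n_lesions + 1):
        lesion = labels == i
        scores.append(dice(pred.astype(bool) & lesion, lesion))
    return float(np.mean(scores)) if scores else 1.0
```

For a scan where a 100-voxel lesion is segmented perfectly but a 4-voxel lesion is missed entirely, the per-patient Dice is 200/204 ≈ 0.98, while the per-lesion average is (1.0 + 0.0)/2 = 0.5, which illustrates why the two conventions are not interchangeable.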
Yan et al. proposed a novel adaptation of the U-Net architecture to increase segmentation performance for COVID-19 [20]. Their COVID-SegNet achieved a Dice score of 0.726 on an independent hold-out dataset. To achieve this, they used a proprietary dataset of 861 patients (8 times larger than the RICORD dataset and 40 times larger than the data from Ma et al.) and trained their model on six Nvidia Titan RTX GPUs with 24 GB of VRAM each.
By comparison, the model developed in this study achieved a higher Dice score than that of Ma et al., with substantially shorter training times and lower hardware requirements than previously published studies. This comparison should nevertheless be interpreted with caution, because the datasets, training methods, and calculation of metrics differed. Nonetheless, this study demonstrates the added benefit of using a pre-trained encoder for 3D U-Nets, as one can quickly achieve state-of-the-art results with lower hardware requirements and shorter training times. Transfer learning may thus provide broader access to 3D segmentation models for the diagnostic community and for researchers without access to high-performance computing clusters.