The institutional review board for human investigations at Massachusetts General Hospital (MGH), Boston, USA, approved the study protocol with removal of all patient identifiers from the images and waived the requirement for informed consent, in accordance with the retrospective design of this study.
Data Description
1) Segmentation. Because anatomic segmentation of the lungs is independent of radiographic abnormalities, two public datasets were used to train the segmentation models: the RSNA pneumonia detection challenge dataset 19 and the JSRT dataset 20. The RSNA pneumonia detection challenge dataset consists of 568 CXRs from the tuberculosis chest dataset of the Department of Health and Human Services (HHS) of Montgomery County, and the JSRT dataset consists of 257 CXRs from the Japanese Society of Radiological Technology (JSRT), compiled in cooperation with the Japanese Radiological Society.
To evaluate segmentation model performance, 200 CXRs of 51 patients with COVID-19 pneumonia (Testing set in Table 2) were obtained from three hospitals in South Korea: Kyungpook National University Hospital, Daegu Catholic University Hospital, and Yeungnam University Hospital.
Table 1
Conditions for training different segmentation models.
| Model | Backbone | Pre-proc. | Augmentation |
| --- | --- | --- | --- |
| Model 1 | Efficient0 | N/A | DA |
| Model 2 | Efficient0 | HE | DA |
| Model 3 | Efficient0 | HE | DA + Gaussian noise (0.5) + gamma correction (0.5) + grid distortion (0.1) + elastic transform (0.1) + affine transform (0.1) |
| Model 4 | Efficient7 | HE | DA + Gaussian noise (0.5) + gamma correction (0.5) |
| Model 5 | Efficient7 | HE | DA + Gaussian noise (0.5) + gamma correction (0.5) + grid distortion (0.1) + elastic transform (0.1) + affine transform (0.1) |

Abbreviations: HE, histogram equalization; DA, default augmentation (horizontal flip: 0.5, rotation: a range of ±25°, random contrast: 0.1, random brightness: 0.1, gamma correction: 0.1, Gaussian noise: 0.1, contrast limited adaptive histogram equalization: 0.1).
Table 2
Demographics of the dataset for carina and left hilum detection.
| | Training set (n = 551) | Validation set (n = 153) | Testing set (n = 200) |
| --- | --- | --- | --- |
| Patient | 124 | 42 | 51 |
| Age | 68.3 ± 14.8 | 59.5 ± 16.2 | 54.3 ± 18.4 |
| Male | 53 (42.7%) | 16 (38.0%) | 23 (54.7%) |
| RALE | 9.9 ± 10.7 | 3.9 ± 6.7 | 4.2 ± 6.2 |
| Death | 43 (34.6%) | 2 (4.7%) | 4 (9.5%) |
2) Detection. The carina and left hilum detection algorithms were trained on another 704 CXRs from 166 patients with confirmed COVID-19 pneumonia (Training and Validation sets in Table 2), acquired between February and May 2020 at the same three hospitals in South Korea (Kyungpook National University Hospital, Daegu Catholic University Hospital, and Yeungnam University Hospital). The positions of the carina and left hilum, used as landmarks to separate the upper and lower lungs, were annotated under the supervision of a subspecialty chest radiologist with 13 years of clinical experience in thoracic imaging. An accurate and robust splitting position can be obtained from the consensus of the two positions. For each CXR, a bounding box was placed around the left hilum, and the inferior margin of the carina was annotated with a point marker. A bounding box centered at the carina point was used to train the carina detection algorithm.
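The conversion of the carina point annotation into a training box can be sketched as follows. This is a minimal illustration rather than the authors' code: the box size is not stated in the text, so `width` and `height` are hypothetical parameters, and the function names are invented for this example.

```python
def box_from_point(x, y, width, height):
    """Return (x_min, y_min, x_max, y_max) for a box centered at (x, y).

    width and height are hypothetical; the text does not give the box size.
    """
    half_w, half_h = width / 2.0, height / 2.0
    return (x - half_w, y - half_h, x + half_w, y + half_h)


def box_center(box):
    """Return the (x, y) center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
```

By construction, `box_center(box_from_point(x, y, w, h))` returns the annotated point again, so the center of a predicted carina box can be read back as a point estimate.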
3) Correlation. To further validate the proposed 4-region segmentation algorithm, each CXR was evaluated for its RALE score. The RALE score was obtained by assigning extent (0–4) and density (0–3) scores to the pulmonary opacities in each region of the lung 12. For each region, the correlation between the mean intensity and the corresponding extent and density scores of pulmonary opacities was analyzed.
Segmentation model
The U-net architecture 21 with skip connections, the most widely used network structure for segmentation in medical imaging, was selected for the segmentation models. We trained five segmentation models under different conditions (backbone, pre-processing, and augmentation), as shown in Table 1. EfficientNet v0 and v7 architectures 22 were used as the U-net backbone for the first to third models and for the fourth and fifth models, respectively. Gaussian noise and gamma correction were adjusted (Table 1) to improve the robustness of the models to pixel noise from portable devices. To make the segmentation models robust to anterior-posterior (AP) CXRs, which are not included in the public datasets, morphological transformations such as grid distortion, affine transform, and elastic transform with different parameters were used as augmentation methods 23. The five binary masks were combined into an ensemble mask by majority voting: a pixel was labeled as lung if at least half of the masks predicted it as lung, which for five masks means at least three.
In addition, post-processing steps were taken to refine the ensemble mask: holes were filled using a dilation operation, and isolated regions were removed.
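The majority-voting step can be sketched as below. This is a minimal illustration on nested lists of 0/1 values, assuming a strict majority (three of five masks) as the threshold; the function name and the `min_votes` parameter are hypothetical, and the post-processing steps (hole filling, removal of isolated regions) are not included.

```python
def majority_vote(masks, min_votes=None):
    """Combine binary masks (lists of rows of 0/1) into one ensemble mask.

    min_votes defaults to a strict majority, e.g. 3 for five masks
    (an assumption about how the voting threshold is applied).
    """
    n_models = len(masks)
    if min_votes is None:
        min_votes = n_models // 2 + 1
    height, width = len(masks[0]), len(masks[0][0])
    ensemble = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Count how many models labeled this pixel as lung.
            votes = sum(mask[r][c] for mask in masks)
            ensemble[r][c] = 1 if votes >= min_votes else 0
    return ensemble
```

With an odd number of models a tie cannot occur, which is one practical reason to ensemble five masks rather than four.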
The augmentation methods with different parameters were applied on the fly during training [5] (i.e., real-time augmentation) for all five models. All models were trained with the same hyper-parameters: Adam optimizer (learning rate: 0.0001), 200 epochs, a batch size of 8, and an input size of 256×256. For each model, the checkpoint with the lowest loss on the validation dataset was selected.
Detection model
We propose a novel and robust method to find a central point for dividing the lungs into four regions: right upper region (RUR), right lower region (RLR), left upper region (LUR), and left lower region (LLR). Although the conventional RALE score uses a horizontal line through the origin of the left upper lobe bronchus to define the four lung segments, this point is difficult to see in most patients with COVID-19 imaged with portable CXRs. As a surrogate, the left hilum is the closest landmark for dividing the upper and lower regions. However, the left hilum is sometimes difficult to detect in patients with advanced disease or patient rotation. The carina, on the other hand, is clearly visible under most circumstances, and its position relative to the left hilum is stable, at approximately 2 cm 24 above the left hilum. Therefore, we also used the carina to identify the central point for dividing the lungs horizontally into upper and lower regions.
RetinaNet 25 was used to train the detection model for the carina and the left hilum. The central point of the prediction box is used as the reference horizontal level that divides the upper and lower lungs. By default, the prediction box for the left hilum is used for this division. However, if the model confidence for the left hilum detection is lower than or equal to 0.9, the prediction box for the carina is used instead.
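The landmark selection rule can be sketched as follows. This is an illustrative reading of the rule, not the authors' implementation: boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in pixel coordinates, the function names are hypothetical, and no vertical offset is applied to the carina because the text here does not specify one.

```python
def box_center_y(box):
    """Vertical center of a box given as (x_min, y_min, x_max, y_max)."""
    _, y_min, _, y_max = box
    return (y_min + y_max) / 2.0


def select_split_level(hilum_box, hilum_conf, carina_box, conf_threshold=0.9):
    """Pick the landmark that sets the upper/lower dividing line.

    The left hilum is used only when its confidence is strictly above
    the threshold; otherwise the carina box is used as the fallback.
    Returns (landmark_name, y_level).
    """
    use_hilum = hilum_box is not None and hilum_conf > conf_threshold
    chosen = hilum_box if use_hilum else carina_box
    if chosen is None:
        raise ValueError("no landmark detection available")
    landmark = "left_hilum" if use_hilum else "carina"
    return landmark, box_center_y(chosen)
```

Pixels of the lung mask above the returned `y_level` would then belong to the upper regions and those below it to the lower regions.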
To train a robust detection model, augmentation methods 23 such as rotation, translation, shearing, scaling, pixel noise, and varying ranges of contrast, brightness, hue, and saturation were used. The model with the lowest total loss on the validation set (Validation set in Table 2) was selected as the best model, and its performance was evaluated on the testing set in Table 2.
Normalization
Intensity normalization is commonly used as a pre-processing step to reduce variation in the intensity distribution among input CXRs. Different devices or acquisition settings produce CXRs with different brightness, as shown in Fig. 2. The density scores of Fig. 2(a)-(d) were all 0, yet the mean lung intensities differed considerably: 39.8, 34.4, 16.6, and 13.2, respectively. To reduce this variation, intensity normalization was performed: the mean intensity outside the lung regions was subtracted from each pixel inside the lung. The normalized pixels were then averaged within each region to obtain a representative value for evaluating the correlation with the extent and density scores of pulmonary opacities.
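The normalization and regional averaging can be sketched as below. This is a minimal illustration on nested lists, assuming that "outside of the lung regions" means every image pixel not covered by the lung mask; the function name and argument layout are hypothetical.

```python
def normalized_region_mean(image, lung_mask, region_mask):
    """Mean intensity of one lung region after background subtraction.

    image:       rows of pixel intensities
    lung_mask:   rows of 0/1, 1 = inside either lung
    region_mask: rows of 0/1, 1 = inside the region of interest (e.g. RUR)
    """
    rows, cols = len(image), len(image[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Mean intensity of all pixels outside the lungs (assumed background).
    outside = [image[r][c] for r, c in coords if not lung_mask[r][c]]
    if not outside:
        raise ValueError("lung mask covers the whole image")
    background_mean = sum(outside) / len(outside)
    # Subtract the background mean from every pixel in the region.
    region = [image[r][c] - background_mean for r, c in coords if region_mask[r][c]]
    if not region:
        raise ValueError("region mask is empty")
    return sum(region) / len(region)
```

Because the same background mean is subtracted from every lung pixel, the regional value equals the raw regional mean minus the background mean, which removes a global brightness offset between devices.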
Correlation with RALE score
Extent (0–4) and density (0–3) scores of pulmonary opacities were manually assigned by an experienced radiologist for each region of the lung according to the guideline 12.
The extent and density scores of pulmonary opacities were correlated with the mean intensity of the corresponding region produced by the proposed algorithm. To evaluate whether there is a linear relationship between regional mean intensity and the RALE score, we used the subset of COVID-19 patients (Testing set in Table 2) with a RALE score greater than 0. Pearson correlation 26 was used to test the relationship.
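The correlation analysis can be sketched as follows. This is a plain-Python version of the Pearson coefficient with the RALE > 0 filter applied beforehand; the function name is hypothetical and the numbers are made-up toy values, not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values for illustration only (not study data).
rale = [0, 2, 4, 6, 8]
intensity = [5.0, 11.0, 19.0, 33.0, 41.0]
# Keep only cases with a RALE score greater than 0, as in the text.
kept = [(s, i) for s, i in zip(rale, intensity) if s > 0]
r = pearson_r([s for s, _ in kept], [i for _, i in kept])
```

A value of `r` close to 1 would indicate a strong positive linear relationship between the score and the regional mean intensity.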
Statistical Evaluation
Segmentation models were compared on the anonymized dataset (three hospitals in South Korea) in terms of Dice score to select the best segmentation model. We then conducted pair-wise comparisons of Dice scores between the ensemble model and each of the other models to test for significant differences (p < 0.05).
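For reference, the Dice score between a predicted and a ground-truth binary mask can be computed as in the sketch below. The function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the text.

```python
def dice_score(pred, truth):
    """Dice coefficient 2*|A∩B| / (|A| + |B|) for two binary masks.

    Masks are rows of 0/1 values with identical shapes.
    """
    intersection = 0
    total = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            total += (1 if p else 0) + (1 if t else 0)
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total
```

Computing this score per CXR for each model gives the paired samples needed for the pair-wise comparison against the ensemble.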
Experimental Environment
Experiments were run on Ubuntu 16.04 with a Tesla V100 GPU, CUDA 9.0/cuDNN 7.0 (NVIDIA Corporation), and the Keras 2.0 deep learning platform.