Patient characteristics
A total of 1,351 patients (789 moderate and 562 severe) from three hospitals were enrolled in this retrospective study (1,183 from HSH, 36 from MCH, and 132 from TTH). The 1,183 patients from HSH, comprising 663 with moderate COVID-19 and 520 with severe COVID-19, were divided into a training set (946 patients) and a validation set (237 patients). Among these 1,183 patients, 687 were diagnosed with moderate pneumonia at first admission, of whom 24 changed to severe during their stay; 496 were diagnosed with severe pneumonia at first admission, of whom only one changed to moderate.
The patient characteristics of the training and validation sets are listed in Table 1. No significant differences were observed between the training and validation sets in age (P = 0.273) or sex (P = 0.694). In addition, interleukin 6 (IL-6) levels, lymphocyte count (L), C-reactive protein (CRP), procalcitonin (PCT), D dimer (DD), glutamic oxaloacetic transaminase (AST), B-type natriuretic peptide (BNP), lactate dehydrogenase (LDH), creatine kinase (CK), and creatine kinase-MB (CK-MB) differed significantly between the moderate and severe pneumonia groups in both the training and validation sets (P < 0.05).
Table 1
Clinical characteristics of patients in the training and validation sets (n = 1,183).
| Variable | Training set (n=946): moderate pneumonia | Training set: severe pneumonia | Training set: P | Validation set (n=237): moderate pneumonia | Validation set: severe pneumonia | Validation set: P | P (training vs. validation) |
|---|---|---|---|---|---|---|---|
| Age (yr, mean±SD) | 57.393±13.753 | 63.885±12.189 | 0.000 | 57.765±11.862 | 62.971±13.094 | 0.263 | 0.273 |
| Sex, n (%) | | | 0.213 | | | 0.069 | 0.694 |
| Men | 266 (50.1%) | 226 (54.3%) | | 64 (48.5%) | 54 (51.9%) | | |
| Women | 265 (49.9%) | 190 (45.7%) | | 68 (51.5%) | 50 (48.1%) | | |
| IL-6 | 3.051±4.692 | 14.400±97.389 | 0.000 | 2.865±3.586 | 13.323±49.298 | 0.000 | 0.768 |
| WBC (×10^9/L) | 5.825±1.808 | 6.431±2.535 | 0.000 | 6.191±2.188 | 6.355±2.294 | 0.431 | 1.359 |
| L (×10^9/L) | 1.607±0.604 | 1.368±0.667 | 0.064 | 1.577±0.544 | 1.350±0.634 | 0.040 | 0.568 |
| N (×10^9/L) | 3.613±1.574 | 4.437±2.479 | 0.000 | 3.980±1.970 | 4.370±2.225 | 0.196 | 0.144 |
| HB (g/L) | 125.254±16.705 | 117.510±23.988 | 0.000 | 123.720±16.022 | 117.872±21.662 | 0.100 | 0.670 |
| RBC (×10^12/L) | 4.059±0.535 | 3.874±0.594 | 0.335 | 3.992±0.540 | 3.844±0.573 | 0.587 | 0.408 |
| PLT (×10^9/L) | 235.919±76.644 | 232.786±80.199 | 0.422 | 247.333±81.926 | 227.760±98.226 | 0.601 | 2.836 |
| CRP (µg/L) | 3.604±3.608 | 5.467±4.505 | 0.000 | 3.367±3.494 | 5.463±4.059 | 0.000 | 0.259 |
| PCT (µg/L) | 1.794±2.615 | 1.456±2.409 | 0.000 | 1.720±2.588 | 1.351±2.333 | 0.006 | 1.331 |
| PT (s) | 11.847±2.693 | 12.565±4.911 | 0.474 | 11.934±2.788 | 12.557±3.264 | 0.259 | 0.147 |
| TT (s) | 13.777±3.584 | 14.802±5.185 | 0.054 | 13.987±3.627 | 14.444±3.617 | 0.514 | 0.125 |
| FIB (g/L) | 3.412±1.068 | 3.487±0.961 | 0.084 | 3.362±1.056 | 3.448±1.035 | 0.008 | 0.048 |
| DD (mg/L) | 1.285±1.922 | 1.973±3.319 | 0.000 | 1.360±2.109 | 2.290±2.919 | 0.017 | 0.743 |
| BS (mmol/L) | 5.455±2.029 | 6.063±2.736 | 0.000 | 5.451±2.137 | 5.990±2.820 | 0.126 | 0.480 |
| ALT (u/L) | 32.753±34.282 | 32.750±29.298 | 0.975 | 30.552±31.704 | 41.450±52.035 | 0.048 | 0.069 |
| AST (u/L) | 23.311±13.339 | 27.209±28.423 | 0.000 | 21.962±19.107 | 29.462±20.117 | 0.022 | 0.819 |
| TP (g/L) | 65.204±8.089 | 63.332±9.042 | 0.052 | 64.859±7.912 | 64.245±6.695 | 0.529 | 0.024 |
| ALB (g/L) | 37.910±4.993 | 35.726±5.433 | 0.099 | 37.629±4.809 | 36.234±4.356 | 0.398 | 0.998 |
| TBil (µmol/L) | 10.337±6.448 | 10.697±5.944 | 0.752 | 10.170±4.408 | 11.957±8.098 | 0.024 | 1.233 |
| D-Bil (µmol/L) | 3.710±3.996 | 4.152±3.791 | 0.235 | 3.591±1.962 | 4.763±5.033 | 0.007 | 0.083 |
| BUN (mmol/L) | 4.609±2.024 | 5.119±2.443 | 0.000 | 4.713±1.542 | 4.972±1.972 | 0.153 | 0.506 |
| Cre (µmol/L) | 67.148±30.737 | 68.861±24.245 | 0.230 | 68.086±22.061 | 67.887±22.575 | 0.668 | 0.252 |
| BNP (pg/ml) | 13.501±38.781 | 39.294±103.015 | 0.000 | 20.801±108.036 | 48.126±132.048 | 0.011 | 6.205 |
| Mb (µg/L) | 6.900±10.081 | 15.497±112.880 | 0.003 | 7.362±8.274 | 15.685±15.685 | 0.004 | 0.000 |
| Tn (µg/L) | 3.386±2.797 | 2.954±2.848 | 0.000 | 3.072±2.846 | 2.475±2.830 | 0.361 | 2.995 |
| LDH (u/L) | 176.357±55.930 | 223.928±103.029 | 0.000 | 193.202±58.210 | 175.748±72.622 | 0.003 | 0.479 |
| CK (u/L) | 56.329±47.304 | 59.656±79.253 | 0.013 | 58.540±50.356 | 82.177±126.670 | 0.001 | 12.009 |
| CK-MB (u/L) | 8.987±6.190 | 10.750±8.914 | 0.000 | 9.653±6.454 | 11.906±11.724 | 0.004 | 3.058 |
Clinical characteristics and serum biomarkers of patients in the training and internal validation sets. IL-6 = interleukin 6; WBC = white blood cell count; L = lymphocyte count; N = neutrophil count; HB = hemoglobin; RBC = red blood cell count; PLT = blood platelet count; CRP = C-reactive protein; PCT = procalcitonin; PT = prothrombin time; TT = thrombin time; FIB = fibrinogen; DD = D dimer; BS = blood sugar; ALT = alanine transaminase; AST = glutamic oxaloacetic transaminase; TP = total protein; ALB = albumin; TBil = total bilirubin; D-Bil = direct bilirubin; BUN = blood urea nitrogen; Cre = creatinine; BNP = B-type natriuretic peptide; Mb = myoglobin; Tn = troponin; LDH = lactate dehydrogenase; CK = creatine kinase; CK-MB = creatine kinase-MB.
Note: A P-value < 0.05 indicates a significant difference. Differences in clinical factors and serum biomarkers between the moderate and severe patient sets were assessed using the Mann–Whitney U test or Student's t test for continuous variables, depending on the result of the normality test, and the χ2 test or Fisher’s exact test for categorical variables.
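As an illustration of the categorical comparison described in the note, the following is a minimal sketch of a Pearson χ2 test for a 2×2 table, applied to the sex-by-severity counts of the training set in Table 1. The function name `chi_square_2x2` is hypothetical, and the sketch assumes no continuity correction; because the options used by the authors' statistical software are not stated, the result is illustrative and is not expected to reproduce the reported P values exactly.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut formula for 2x2 tables: n * (ad - bc)^2 / product of the margins.
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Men: 266 moderate, 226 severe; women: 265 moderate, 190 severe (Table 1, training set).
stat, p_value = chi_square_2x2(266, 226, 265, 190)
print(f"chi2 = {stat:.3f}, P = {p_value:.3f}")
```

For small expected counts, Fisher's exact test would be used instead, as stated in the note.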
The auto-segmentation framework, radiomics feature extraction, feature selection, and prediction model construction were all implemented on the uAI Research Portal V1.1 (Shanghai United Imaging Intelligence, Co., Ltd.).
Adrenal gland and periadrenal fat auto-segmentation
For adrenal gland segmentation, we manually delineated the bilateral adrenal glands on the CT images of 315 patients; 265 of these were used for training and the remaining 50 to evaluate performance. The segmentation model yielded an average Dice of 79.48% for the left adrenal gland and 78.55% for the right, for an overall average Dice of 79.02%. Representative auto-segmentation results are shown in Supplementary Figure 2. We also visually verified the model by providing segmentation results to the radiologists, who gave satisfactory feedback. The segmentation algorithm was then used to segment all the remaining data automatically.
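The Dice score reported above measures the overlap between an automatic mask and a manual mask. A minimal sketch of the computation follows; the function name `dice_coefficient` and the toy masks are hypothetical, and masks are assumed to be flattened sequences of 0/1 voxel labels.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A and B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks: 4 predicted voxels, 4 true voxels, 3 of them shared.
pred = [1, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 1, 1, 0]
print(dice_coefficient(pred, truth))  # 2 * 3 / (4 + 4) = 0.75
```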
Radiomics feature and clinical indicator selection
In the training set, ANOVA and LASSO analyses were used to select the radiomics features most relevant to the prognosis of COVID-19. For AM, the number of radiomics features was reduced to 45, comprising 10 first-order features and 35 texture features (GLCM = 5, GLSZM = 19, GLRLM = 4, GLDM = 5, and NGTDM = 2). The 71 features for PM comprised 21 first-order features and 50 texture features (GLCM = 14, GLSZM = 22, GLRLM = 4, GLDM = 6, and NGTDM = 4), and the 68 features for FM comprised 12 first-order features and 56 texture features (GLCM = 12, GLSZM = 31, GLRLM = 4, GLDM = 6, and NGTDM = 3).
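The ANOVA screening step can be sketched as follows: each feature is scored by a one-way ANOVA F statistic comparing the moderate and severe groups, and features are ranked by that score. This is only an illustration of the screening idea; the function names, feature names, and values are hypothetical, and the subsequent LASSO step and the thresholds used on the uAI Research Portal are not shown.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of feature values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # No within-group variation: perfect separation or a constant feature.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(features):
    """Sort feature names by decreasing F; `features` maps name -> (moderate, severe)."""
    scores = {name: anova_f(list(groups)) for name, groups in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical values of two features in the moderate and severe groups.
features = {
    "informative": ([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]),
    "noise": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
}
print(anova_f(list(features["informative"])))  # (6 / 1) / (4 / 4) = 6.0
print(rank_features(features))
```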
A total of 30 clinical factors and serum biomarkers were analyzed in our study: age, sex, IL-6, white blood cell count (WBC), L, neutrophil count (N), hemoglobin (HB), red blood cell count (RBC), blood platelet count (PLT), CRP, PCT, prothrombin time (PT), thrombin time (TT), fibrinogen (FIB), DD, blood sugar (BS), alanine transaminase (ALT), AST, total protein (TP), albumin (ALB), total bilirubin (TBil), direct bilirubin (D-Bil), blood urea nitrogen (BUN), creatinine (Cre), BNP, myoglobin (Mb), troponin (Tn), LDH, creatine kinase (CK), and CK-MB (Table 1). Among them, 17 were selected using univariate logistic regression analysis, and 7 indicators (LDH, L, HB, DD, WBC, TT, and TP) were then selected using multivariate logistic regression analysis. The relationship between the RadScore from FM, which was used to construct the RN, and the 7 selected clinical factors and serum biomarkers was analyzed using Pearson correlation in the training, validation, and two independent-test sets. No significant relationship was found between the RadScore and the clinical factors or serum biomarkers. Supplementary Figure 3 indicates that the radiomics information extracted from onset CT images belonged to another dimension and was not affected by clinical factors and serum biomarkers.
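The Pearson correlation used to relate the RadScore to each clinical indicator can be sketched as below. The function name `pearson_r` and the example values are hypothetical; the significance test on the coefficient is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical RadScores and one clinical indicator for four patients.
rad_score = [1.0, 2.0, 3.0, 4.0]
indicator = [1.0, -1.0, -1.0, 1.0]
print(pearson_r(rad_score, indicator))  # 0.0: no linear relationship
```

A coefficient near zero, as in this toy example, is what would be expected if the RadScore carried information independent of a given indicator.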
Three radiomics models and clinical model building
We developed three radiomics models (AM, PM, and FM) based on radiomics features and a CM based on the seven selected independent predictive clinical indicators. We used the AUC with its 95% CI, sensitivity (SEN), and specificity (SPE) to assess AM, PM, FM, and CM for predicting the prognosis of patients with COVID-19 in all sets. AM achieved AUCs of 0.755, 0.655, 0.716, and 0.701 in the training set, the validation set, and the two independent-test sets, respectively; PM achieved AUCs of 0.796, 0.663, 0.692, and 0.652; FM achieved AUCs of 0.828, 0.704, 0.702, and 0.709; and CM achieved AUCs of 0.716, 0.720, 0.716, and 0.792 (Figure 3, Supplementary Table 2).
Box plots of the RadScores and the seven clinical indicators in the training, validation, and two independent-test sets directly illustrate the differences between the moderate and severe patient sets (Supplementary Figure 4).
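For reference, the evaluation metrics named above can be computed from predicted scores and true labels as in the following sketch. The function names, toy scores, and the threshold of 0.5 are assumptions for illustration; label 1 stands for severe and 0 for moderate, and the confidence interval of the AUC is not computed here.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """SEN = TP / (TP + FN) and SPE = TN / (TN + FP) when predicting 1 if score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy predictions for four patients.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))                           # 3 of 4 pairs ranked correctly: 0.75
print(sensitivity_specificity(scores, labels, 0.5))  # (0.5, 1.0)
```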
RN construction and validation
Multivariate analysis revealed that the RadScore and the seven clinical indicators were significant independent factors in predicting the disease prognosis of patients with COVID-19. In the collinearity diagnosis, the variance inflation factors (VIFs) for the RadScore and the seven clinical indicators ranged from 1.058 to 1.332, indicating no severe collinearity among these factors. We therefore used the RadScore from FM and the seven clinical indicators to construct the RN to assess disease prognosis in patients with COVID-19 (Figure 4). The RN showed satisfactory performance for predicting and assessing the prognosis of patients with COVID-19, yielding an AUC of 0.825 (95% CI, 0.799 to 0.849) in the training set and AUCs of 0.736 (95% CI, 0.675 to 0.791), 0.749 (95% CI, 0.577 to 0.878), and 0.806 (95% CI, 0.718 to 0.877) in the validation set, independent-test set 1, and independent-test set 2, respectively (Figure 3, Supplementary Table 2).
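The collinearity diagnosis can be illustrated with a small sketch that computes variance inflation factors as the diagonal of the inverse of the predictors' correlation matrix, which is equivalent to 1 / (1 - R²) from regressing each predictor on the others. The function names and toy data are hypothetical, and the sketch assumes that no predictor column is constant.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long predictor columns."""
    n = len(columns[0])
    centered = [[v - sum(col) / n for v in col] for col in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Two toy predictors with correlation 0.8, so each VIF is 1 / (1 - 0.64).
print(vif([[1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]]))
```

Values close to 1, such as the 1.058 to 1.332 range reported above, indicate that each predictor is only weakly explained by the others.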
DeLong’s test was used to compare the AUCs of the three radiomics models, CM, and RN in the training set. AM, PM, FM, and RN performed significantly better than CM (P < 0.05), and the difference between FM and RN was not statistically significant (P = 0.699). The calibration curve showed agreement between the predicted and actual values. The Hosmer–Lemeshow test was not significant in the validation set (mean absolute error [MAE] = 0.029), independent-test set 1 (MAE = 0.072), or independent-test set 2 (MAE = 0.048), suggesting no significant departure from the actual values (Supplementary Figure 5). Decision curve analysis (DCA) was used to evaluate the performance of the RN (Supplementary Figure 6). The RN provided greater net benefit than FM and CM at threshold probabilities above 0.3 in the validation set. Similarly, the RN provided greater net benefit than FM and CM at threshold probabilities above 0.5 in independent-test set 1 and at all threshold probabilities in independent-test set 2.
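The net benefit plotted in a decision curve can be sketched with the standard formula, net benefit = TP/n - (FP/n) × pt / (1 - pt), where pt is the threshold probability. The function names and toy predictions below are hypothetical and only illustrate how a model is compared with the treat-all strategy at a given threshold.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted probability is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy predicted probabilities of severe disease and true labels (1 = severe).
probs = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(net_benefit(probs, labels, 0.5))     # 2/6 - 1/6 * 1 = 1/6
print(net_benefit_treat_all(labels, 0.5))  # 0.5 - 0.5 * 1 = 0.0
```

A model is preferred at a given threshold when its net benefit exceeds that of the competing models and of the treat-all and treat-none (net benefit 0) strategies.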