1. Description of demographic characteristics
The baseline characteristics of the complete cases are shown in Table 1. The mean age of the patients with viral pneumonia in this study was 64.40 years, and most patients were elderly; 60.14% were male. Forty-one patients (29.7%) had severe or critical illness. Nearly half of the patients had a history of smoking (49.27%, including 10.14% who had quit before the onset of the disease). A total of 78 patients (56.52%) had comorbidities. Hypertension was the most common (36.96%); other comorbidities included diabetes, nephropathy, coronary heart disease, and chronic obstructive pulmonary disease (COPD). The virus types are shown in supplementary Table 1; most cases were novel coronavirus pneumonia. In multiple logistic regression analysis, virus type had no significant effect on the risk of progression to severe disease (P = 0.879, greater than 0.05). Among the patients with viral pneumonia, 50 (36.23%) were co-infected with bacteria. The bacterial species are shown in supplementary Table 2; Klebsiella pneumoniae, Acinetobacter baumannii, and Escherichia coli were the three most common pathogens. Of all patients, 95 (68.84%) received antiviral treatment, 41 received high-flow oxygen or non-invasive ventilation, and 12 received invasive ventilation. The mean length of hospital stay was 17.86 days. T-tests and chi-square tests showed statistically significant differences between the severe and non-severe groups in age, smoking, hypertension, ICU days, hospital days, CD4+, CD8+, CD4+/CD8+ ratio, lymphocyte count, multilobar infiltrates, and bacterial co-infection (Table 1).
Table 1: Population description and comparison
| Variable | Total (N=138) | Non-severe (N=97) | Severe (N=41) | p |
|---|---|---|---|---|
| Gender/Male | 83 (60.14%) | 56 (57.73%) | 27 (65.85%) | 0.4264 |
| Age | 64.40±16.43 | 59.98±15.63 | 74.63±13.38 | 0.0001 |
| Current smoker | 54 (39.13%) | 32 (32.99%) | 22 (53.66%) | 0.0378 |
| Former smoker | 14 (10.14%) | 9 (9.28%) | 5 (12.20%) | |
| Non-smoker | 70 (50.72%) | 56 (57.73%) | 14 (34.15%) | |
| Comorbidity | | | | |
| Hypertension | 51 (36.96%) | 28 (28.87%) | 23 (56.10%) | 0.0025 |
| COPD | 24 (17.39%) | 16 (16.49%) | 8 (19.51%) | 0.8062 |
| Diabetes | 15 (10.87%) | 9 (9.28%) | 6 (14.63%) | 0.3778 |
| Coronary heart disease | 8 (5.78%) | 5 (5.15%) | 3 (7.32%) | 0.6946 |
| Cerebral infarction | 7 (5.07%) | 3 (3.09%) | 4 (9.76%) | 0.1958 |
| Renal disease | 5 (3.26%) | 3 (3.09%) | 2 (4.87%) | 0.6335 |
| History of malignancy | 4 (2.90%) | 1 (1.03%) | 3 (7.32%) | 0.0785 |
| Hepatitis | 2 (1.45%) | 0 | 2 (4.87%) | 0.0867 |
| Treatment | | | | |
| High flow/non-invasive ventilation | 41 (29.71%) | 12 (12.37%) | 29 (70.73%) | |
| Invasive mechanical ventilation | 12 (8.79%) | 0 | 12 (29.27%) | |
| Antiviral treatment | 95 (68.84%) | 60 (68.04%) | 29 (70.73%) | |
| Steroid treatment | 60 (43.48%) | 35 (36.08%) | 25 (60.96%) | |
| Antibiotic therapy | 59 (42.75%) | 27 (27.84%) | 32 (78.05%) | |
| Vasoactive agent | 15 (10.87%) | 0 | 15 (36.59%) | |
| ICU days | 2.405±5.81 | 0 | 8.09±8.279 | <0.0001 |
| Hospital days | 17.86±34.8 | 11.71±5.383 | 32.39±61.15 | 0.0012 |
| CD4+ | 414.4±247.9 | 449.8±233.5 | 283.7±192.2 | <0.0001 |
| CD8+ | 261.6±131.3 | 308.6±165.8 | 245.7±111.4 | 0.0242 |
| CD4/CD8 | 1.672±0.8903 | 1.865±0.866 | 1.207±0.756 | <0.0001 |
| Lymphocyte count | 1.073±0.758 | 1.167±0.66 | 0.858±0.920 | <0.0001 |
| Multilobar infiltrates | 46 (33.33%) | 22 (22.68%) | 24 (58.54%) | 0.0024 |
| Co-infection with bacteria | 50 (36.23%) | 21 (24.13%) | 29 (70.73%) | 0.0002 |
P values compare the severe and non-severe groups; P < 0.05 indicates a significant difference between the two groups. COPD: chronic obstructive pulmonary disease.
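To make the group comparison concrete, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table, applied to the hypertension counts reported in Table 1. It is written in Python for illustration only (the study's analyses were run in R), and it assumes no continuity correction; the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypertension counts from Table 1: 23 of 41 severe, 28 of 97 non-severe patients.
chi2, p = chi_square_2x2(23, 41 - 23, 28, 97 - 28)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Under these assumptions the P value for hypertension comes out close to the 0.0025 reported in Table 1. Whether the original analysis applied a continuity correction or an exact test is not stated, so other rows need not match as closely.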
Figure 1: LASSO regression analysis identified 9 features as potential predictors.
2. Determination of Early Warning Model Indicators
Indicators that were significant in the preliminary statistical analysis were screened by LASSO regression and multiple logistic regression to select the model's independent variables. LASSO regression was performed in R. As shown in Figure 1, the dashed line on the right corresponds to 9 non-zero coefficients, indicating that 9 variables can be used as model parameters. The coefficient estimates for each independent variable obtained from the LASSO regression are shown in supplementary Table 3. The independent variables with non-zero coefficient estimates were selected: age, co-infection, CD4+, CD4+/CD8+ ratio, multilobar infiltrates, smoking, hypertension, ICU days, and hospital days. Variables with larger absolute coefficient estimates are more important indicators, whereas those with smaller coefficient estimates can be screened out. Multiple logistic regression was then performed in R, with severity as the dependent variable of the prediction model. The results are shown in supplementary Table 4: all variables except ICU days were significantly associated (P < 0.05) with the target variable (severity), namely age, co-infection, CD4+, CD4+/CD8+ ratio, multilobar infiltrates, smoking, hypertension, and hospital days. These eight variables were included in model construction.
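For reference, the variable selection step can be written as a penalized optimization problem. The formula below is the standard LASSO-penalized logistic regression objective, given here under the assumption that the LASSO was fitted to the binary severity outcome; the symbols ($y_i$ for severity, $x_i$ for the candidate predictors of patient $i$, $\lambda$ for the penalty strength) are notation introduced for this illustration and do not appear in the original analysis.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n} \sum_{i=1}^{n}
  \left[ y_i \left( \beta_0 + x_i^{\top}\beta \right)
         - \log\left( 1 + e^{\beta_0 + x_i^{\top}\beta} \right) \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

As $\lambda$ increases, more coefficients $\beta_j$ are shrunk exactly to zero; the variables that keep non-zero coefficients at the chosen $\lambda$ are the ones retained as candidate predictors.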
3. Building a Prediction Model and Evaluating its Performance
In this study, 70% of the original data were used as the training set and the remaining 30% as the test set. The model predictions on the training set and the test set were evaluated using the area under the ROC curve (AUROC), the K-S chart, and the Lift chart. As shown in Figure 2, the AUROC on the training set was 0.94118 (Figure 2A), the K-S curve bends clearly toward the upper left (Figure 2B), and the Lift curve is relatively flat (Figure 2C), indicating that the model predicts severe cases with high accuracy. On the test set, the AUROC was 0.94397 (Figure 2D), slightly higher than on the training set. The K-S curve again bends clearly toward the upper left (Figure 2E), and the Lift curve is relatively flat (Figure 2F), indicating that the model's predictive performance is stable and that it predicts severe cases with high accuracy. Compared with the original MulBSTA score (Figure 2G), this model predicts the risk of progression to severe disease more accurately (AUROC = 0.94937 vs. 0.8241).
Figure 2: Building a prediction model and evaluating its performance. A: ROC curve of the training set model; B: K-S chart of the training set model; C: Lift chart of the training set model; D: ROC curve of the test set model; E: K-S chart of the test set model; F: Lift chart of the test set model; G: ROC curve of the MulBSTA score.
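The two discrimination metrics reported above can be computed directly from predicted risks and observed outcomes. The sketch below is an illustration in Python, not the R code used in the study: `auroc` uses the rank (Mann-Whitney) definition, and `ks_statistic` takes the largest gap between the true-positive and false-positive rates over all thresholds. The labels and scores are made-up toy values, not study data.

```python
def auroc(labels, scores):
    """Probability that a random severe case scores higher than a random non-severe case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    # Count pairwise wins; ties contribute one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels, scores):
    """Maximum of (true-positive rate - false-positive rate) over all score thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best = 0.0
    for threshold in sorted(set(scores)):
        tpr = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold) / n_pos
        fpr = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold) / n_neg
        best = max(best, tpr - fpr)
    return best


# Toy example (hypothetical values): 1 = severe, 0 = non-severe.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
toy_auc = auroc(toy_labels, toy_scores)
toy_ks = ks_statistic(toy_labels, toy_scores)
print(f"AUROC = {toy_auc:.4f}, KS = {toy_ks:.2f}")
```

Computing both metrics separately on the training and test partitions, as done in the study, is what allows the stability of the model to be judged.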
4. Production and evaluation of a severe risk scorecard
Based on the above analysis, the predictive model performs well in predicting whether patients will progress to severe disease. The scorecard of the model is shown in Table 2. The score for severe risk was set to 600 points. The independent variables included in the model are visualized in Figure 3A. Finally, the population stability index (PSI) was used to evaluate the stability of the scorecard. The PSI of the scorecard indicators established in the model was calculated and visualized in R, as shown in Figure 3B. The PSI for severe risk was 0.1776, which is less than 0.25, indicating that the changes are small and the model has good stability.
Table 2: Results of scorecard model bins
| Variable | Range | Sample size | WoE | Score |
|---|---|---|---|---|
| Age | [-Inf,66) | 40 | 1.156 | -80 |
| | [66,70) | 15 | 1.598 | -110 |
| | [70,76) | 16 | -0.531 | 37 |
| | [76,86) | 12 | -2.140 | 148 |
| | [86,Inf) | 9 | -0.818 | 56 |
| Co-infection | No | 61 | 0.849 | -60 |
| | Yes | 31 | -1.106 | 78 |
| CD4 | [-Inf,140) | 10 | -1.889 | 42 |
| | [140,180) | 6 | -1.041 | 23 |
| | [180,340) | 21 | 0.405 | -9 |
| | [340,360) | 5 | -1.447 | 32 |
| | [360,Inf) | 50 | 0.774 | -17 |
| CD4_CD8_ratio | [-Inf,0.9) | 20 | -1.447 | 76 |
| | [0.9,1.6) | 21 | 0.405 | -21 |
| | [1.6,1.7) | 8 | -0.531 | 28 |
| | [1.7,Inf) | 43 | 0.987 | -52 |
| Multi-lobe | No | 63 | 0.512 | -14 |
| | Yes | 29 | -0.834 | 23 |
| Smoking | No | 43 | 0.596 | -32 |
| | Once | 9 | 0.211 | -11 |
| | Yes | 40 | -0.531 | 28 |
| Hypertension | [-Inf,2) | 57 | 0.506 | 14 |
| | [2,Inf) | 35 | -0.636 | -18 |
| Hospital days | [-Inf,8) | 15 | -0.348 | 23 |
| | [8,12) | 29 | 1.561 | -101 |
| | [12,16) | 26 | 0.394 | -26 |
| | [16,22) | 10 | -1.447 | 94 |
| | [22,32) | 7 | -0.125 | 8 |
| | [32,Inf) | 5 | -2.428 | 158 |
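The relationship between the WoE and Score columns of Table 2 can be illustrated with a short sketch. It assumes a common scorecard convention in which the WoE of a bin is the log ratio of the non-severe share to the severe share, and the points of a bin are -(pdo / ln 2) × coefficient × WoE. The functions `woe` and `bin_points`, the parameter `pdo` (points to double the odds), and the example counts are hypothetical; the study does not state the exact scaling it used.

```python
import math


def woe(nonsevere_in_bin, severe_in_bin, nonsevere_total, severe_total):
    """Weight of evidence of one bin: ln(share of non-severe / share of severe)."""
    if min(nonsevere_in_bin, severe_in_bin) <= 0:
        raise ValueError("each bin needs at least one patient of each outcome")
    return math.log((nonsevere_in_bin / nonsevere_total) / (severe_in_bin / severe_total))


def bin_points(woe_value, coef, pdo):
    """Points of one bin under the assumed convention, rounded to an integer."""
    return round(-(pdo / math.log(2)) * coef * woe_value)


# Hypothetical bin: 30 of 60 non-severe and 10 of 40 severe patients fall in it.
example_woe = woe(30, 10, 60, 40)  # ln(0.5 / 0.25) = ln 2
example_points = bin_points(example_woe, coef=1.0, pdo=20)

# Age bins as (WoE, Score) pairs copied from Table 2: under a linear mapping,
# Score / WoE should be roughly constant within one variable.
age_bins = [(1.156, -80), (1.598, -110), (-0.531, 37), (-2.140, 148), (-0.818, 56)]
age_ratios = [score / w for w, score in age_bins]
print([round(r, 1) for r in age_ratios])
```

For the age bins the ratio stays close to -69 throughout, which is consistent with a linear WoE-to-points mapping of this kind, although it does not identify the coefficient or scaling constants used in the study.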
Figure 3: Evaluation of a severe risk scorecard. A: Model nomogram; B: PSI evaluation of the scorecard.
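The stability check can be sketched as follows. PSI compares the share of patients falling into each score bin in a reference sample with the share in a comparison sample, summing (actual share - expected share) × ln(actual share / expected share) over the bins. The code is an illustrative Python version rather than the R code used in the study; the function names and the example counts are hypothetical, and only the 0.25 threshold is taken from the text.

```python
import math


def psi(expected_counts, actual_counts):
    """Population stability index between two binned distributions."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    value = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        if expected <= 0 or actual <= 0:
            raise ValueError("empty bins must be merged before computing PSI")
        expected_share = expected / expected_total
        actual_share = actual / actual_total
        value += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return value


def is_stable(psi_value, threshold=0.25):
    """Apply the threshold used in the text: PSI below 0.25 is treated as stable."""
    return psi_value < threshold


# Hypothetical two-bin example: a 50/50 split shifting to 60/40.
example_psi = psi([50, 50], [60, 40])
print(f"PSI = {example_psi:.4f}, stable = {is_stable(example_psi)}")
print(f"Reported PSI 0.1776 stable = {is_stable(0.1776)}")
```

In practice the expected counts would come from the training set and the actual counts from the test set, using the same score bins for both.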