2.1 Characteristics of COVID-19 patients
Tables 1 to 3 list the distribution of demographic, epidemiological and clinical characteristics of the COVID-19 ARDS and non-ARDS populations.
- Demographics and Epidemiology
In this study, we enrolled a total of 659 patients with confirmed COVID-19 from Wuhan and non-Wuhan areas, of whom 76 (11.5%) developed ARDS. A total of 447 patients (70.9%) had contact with infected persons, and 50.9% had a family infection. The median incubation period was 5 days (interquartile range, 3 to 9), and the average times from onset to ARDS and from admission to ARDS were 10 days and 3 days, respectively. The median age of the patients was 50 years (interquartile range, 37 to 62), and 50.4% of the patients were male. Patients with ARDS were significantly older than those without ARDS, by a median of 7.5 years (56.5 years vs. 49 years), and male patients (76.3%) were more likely to develop ARDS. More than 50% of ARDS patients had a BMI greater than 25. However, the exposure histories of the two groups were similar (Table 1).
- Clinical Characteristics and Underlying Diseases
On severity evaluation at admission, 75.4% of COVID-19 patients were assessed as common type, whereas 80.3% of the patients with ARDS were evaluated as severe or critical. The most common clinical symptoms of COVID-19 patients at onset were fever (66.6%), cough (68.7%), expectoration (39.6%), fatigue (34.2%) and dry cough (29.6%). Encephalopathy (0.5%), hemoptysis (1.6%), vomiting (3.0%) and stuffy nose (3.8%) were uncommon. Compared with non-ARDS patients, ARDS patients had a higher frequency of cough (80.3% vs. 67.2%) and dyspnea (59.2% vs. 11.6%). The median temperature was 37.4 ℃. Temperature in ARDS patients was 0.5 ℃ higher than in non-ARDS patients (37.9 ℃ vs. 37.4 ℃), a statistically significant difference (P<0.001).
Overall, comorbidities were more common among ARDS patients than among non-ARDS patients (56.6% vs. 39.8%). Patients with ARDS were much more likely to have hypertension (48.7% vs. 23%) and diabetes (17.8% vs. 9.5%). Two of the five patients infected with other viruses developed ARDS. ARDS also occurred in one patient who was treated with immunosuppressive agents (Table 2).
- Radiologic, Laboratory Findings and Complications
Table 3 shows the radiologic and laboratory findings on admission and the complications. Ground-glass shadows were present on chest CT images in 74.7% of the patients, and consolidation in 28.3%. Both imaging features were more frequent in patients with ARDS than in non-ARDS patients (80.8% vs. 73.9% and 53.9% vs. 24.7%, respectively). The median number of consolidations in ARDS patients was two.
Within 48 hours of admission, lymphocytopenia was present in 23.4% of the patients and leukopenia in 24.8%. However, 19.8% of ARDS patients had an increased white blood cell count, which suggests secondary infection in these patients. The neutrophil-to-lymphocyte ratio was greater than 3 in 45.3% of COVID-19 patients and in 82.7% of ARDS patients, with a median of 6.11. Elevated levels of C-reactive protein and lactate dehydrogenase were found in 47.7% and 32.2% of patients, respectively. In a small number of patients, levels of alanine aminotransferase (ALT), aspartate aminotransferase (AST), creatine kinase (CK) and D-dimer were elevated. Laboratory abnormalities were more severe in ARDS patients than in non-ARDS patients. In addition, the median myoglobin and fasting glucose levels in ARDS patients were 85.9 μg/L and 8.1 mmol/L, respectively; both exceeded the normal reference range and differed significantly from those in the non-ARDS group.
During hospitalization, 91.3% of patients were diagnosed with pneumonia, with no statistically significant difference between the ARDS and non-ARDS groups. However, patients with ARDS had a higher incidence of shock and secondary bacterial infection (5.5% and 30.3%) than non-ARDS patients (0 and 4.3%), and 45.2% of them were admitted to the ICU (Table 3).
2.2 Prediction of risk factors for COVID-19 ARDS
After removing variables with a missing rate >20%, we extracted a total of 98 variables, covering demographics, epidemiology, clinical symptoms, underlying diseases, complications, CT image features and laboratory results, from the structured and unstructured data of the electronic medical record (EMR), guided by literature reviews and expert clinician opinion. We then selected 19 significant risk factors for COVID-19 ARDS by univariate (single-factor) analysis in SPSS. Among all risk factors, severity evaluation at admission (odds ratio [OR], 13.206; 95% CI, 8.550-20.397; P<0.001), gender (OR, 3.312; 95% CI, 1.979-5.544; P<0.001), age (≥70 years) (OR, 19.811; 95% CI, 4.473-87.741; P<0.001), BMI (<23 vs. >25) (OR, 3.717; 95% CI, 1.966-7.062; P<0.001), temperature (>39 ℃) (OR, 5.279; 95% CI, 2.305-12.090; P<0.001), hemoptysis (OR, 7.307; 95% CI, 2.263-23.595; P<0.001), cough (OR, 2.574; 95% CI, 1.429-4.542; P<0.001), shortness of breath (OR, 11.281; 95% CI, 6.883-18.490; P<0.001), hypertension (OR, 4.105; 95% CI, 2.572-6.554; P<0.001), diabetes (OR, 2.176; 95% CI, 1.161-4.078; P<0.001), secondary bacterial infection (OR, 9.686; 95% CI, 5.146-18.323; P<0.001), lung consolidation (OR, 4.264; 95% CI, 2.668-6.815; P<0.001), lymphocyte count (OR, 0.145; 95% CI, 0.080-0.263; P<0.001), neutrophil/lymphocyte ratio (NLR) (<3 vs. ≥3) (OR, 7.211; 95% CI, 3.980-13.064; P<0.001), ALT (≤40 vs. >40 U/L) (OR, 2.710; 95% CI, 1.639-4.482; P<0.001), AST (≤40 vs. >40 U/L) (OR, 5.139; 95% CI, 3.100-8.520; P<0.001), CK (≤185 vs. >185 U/L) (OR, 4.114; 95% CI, 2.312-7.319; P<0.001), lactate dehydrogenase (LDH) (≤250 vs. >250 U/L) (OR, 8.104; 95% CI, 4.733-13.876; P<0.001) and C-reactive protein (CRP) (≤10 vs. >10 mg/L) (OR, 5.959; 95% CI, 3.510-10.119; P<0.001) were all significantly associated with ARDS (Table 4).
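For readers less familiar with how an odds ratio and its 95% CI arise for a binary factor, the following is a minimal sketch of the standard 2x2-table calculation with a Wald interval on the log scale. It is an illustration only: the exact SPSS procedure used here is not specified beyond "single factor analysis", and the function name `odds_ratio_wald_ci` and the counts in the example are hypothetical, not taken from Table 4.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: factor present, ARDS        b: factor present, no ARDS
    c: factor absent,  ARDS        d: factor absent,  no ARDS
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not study data).
or_, lo, hi = odds_ratio_wald_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_:.3f}, 95% CI {lo:.3f}-{hi:.3f}")
```

An interval that excludes 1 corresponds to a significant association at roughly the 5% level; an OR below 1, as reported for lymphocyte count, indicates lower odds of ARDS in the comparison group.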
2.3 Development and verification of predictive models
Based on the univariate analysis above, we used the 19 risk factors (severity evaluation at admission, gender, age, BMI, temperature, cough, shortness of breath, hemoptysis, hypertension, diabetes, secondary bacterial infection, lung consolidation, lymphocyte count, CK, NLR, ALT, AST, LDH and CRP) as model inputs to predict whether COVID-19 patients would develop ARDS. We tried five modeling algorithms: logistic regression (LR), random forest (RF), support vector machine (SVM), decision tree (DT) and deep neural network (DNN). Table 5 shows the mean ± standard deviation (std.) of AUC and accuracy under 10-fold cross-validation. DT, LR and RF all exceeded an AUC of 0.85, and the mean accuracy of each algorithm was over 0.8. To further verify the models, the performance of the five algorithms was evaluated on the external test set. Table 6 and Figure 1 show that DT, LR, RF and DNN all performed well in terms of AUC, accuracy and specificity. The sensitivity of DT and LR was much higher than that of the other three models. Considering the class imbalance of the actual dataset, we also evaluated the balanced accuracy of each model; DT and DNN reached 0.98 and 0.93, respectively. The SVM model performed worst of the five. An ARDS prediction tool needs high sensitivity and accuracy. DT achieved the best value on every metric, with an AUC of 0.99, an accuracy of 0.97 and a sensitivity of 1.0. Therefore, the decision tree model was the optimal tool for ARDS prediction.
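To make the role of balanced accuracy under class imbalance concrete, the sketch below computes sensitivity, specificity, accuracy and balanced accuracy from binary labels using their standard definitions. It is not the evaluation code used in this study; the function name `classification_summary` and the ten example labels are hypothetical.

```python
def classification_summary(y_true, y_pred):
    """Return sensitivity, specificity, accuracy and balanced accuracy.

    Labels are 1 for ARDS and 0 for non-ARDS. Both classes must be present
    in y_true, otherwise sensitivity or specificity is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    sensitivity = tp / (tp + fn)  # share of ARDS cases that were detected
    specificity = tn / (tn + fp)  # share of non-ARDS cases correctly cleared
    accuracy = (tp + tn) / len(pairs)
    # Balanced accuracy weights both classes equally, whatever their sizes.
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical labels: 2 ARDS cases among 10 patients (not study data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(classification_summary(y_true, y_pred))
```

In this toy example, accuracy is 0.8 although only one of the two ARDS cases is detected, while balanced accuracy is 0.6875. This gap is why plain accuracy can look favorable on a dataset in which ARDS cases are a small minority.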