Developing the nomogram for the prediction of in-hospital incidence of acute respiratory distress syndrome in patients with COVID-19

Background: Acute respiratory distress syndrome (ARDS) was the most common complication of coronavirus disease-2019 (COVID-19), leading to poor clinical outcomes. However, models that predict the in-hospital incidence of ARDS in patients with COVID-19 are limited. Therefore, we aimed to develop a predictive nomogram for the in-hospital incidence of ARDS in COVID-19 patients. Methods: Patients with COVID-19 admitted to Changsha Public Health Centre between Jan 30, 2020, and Feb 22, 2020, were enrolled. Clinical characteristics and laboratory variables were analyzed in patients with ARDS. Risk factors for ARDS were selected by LASSO binary logistic regression. A nomogram was established based on the risk factors and validated on the dataset. Results: A total of 113 patients were included in the study, 99 in the non-ARDS group and 14 in the ARDS group. Eight variables, namely hypertension, chronic obstructive pulmonary disease (COPD), cough, lactate dehydrogenase (LDH), creatine kinase (CK), white blood count (WBC), body temperature, and heart rate, were identified for inclusion in the model. The specificity, sensitivity, and accuracy of the full model were 100%, 85.7%, and 87.5%, respectively. The calibration curve also showed good agreement between the predicted and observed values in the model. Conclusions: The nomogram can predict the in-hospital incidence of ARDS in COVID-19 patients. It helps physicians to make an individualized treatment plan for each patient.


Introduction
Evidence indicated that Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), the pathogen of COVID-19, was a new type of coronavirus (1). It could infect multiple systems in humans, especially the respiratory system, and some cases would develop acute respiratory distress syndrome (ARDS), or even multiple organ failure and death, as the disease progressed (2).
ARDS was one of the most common complications of COVID-19 and was more likely to lead to poor clinical outcomes (3). An observational study showed that, of 36 patients with COVID-19 who received care in the intensive care unit (ICU), 75% were transferred to the ICU because they had developed ARDS (4).
Huang et al (5) reported that 85% of the patients received care in the ICU because of ARDS. Early identification and prediction of the incidence of ARDS are vital. However, accurate models for predicting the incidence of ARDS in patients with COVID-19 are limited. The nomogram, a statistical model constructed from different clinical and laboratory variables, has been widely applied to help physicians make appropriate clinical decisions in different disorders (6,7). Therefore, in this study, we analyzed the clinical characteristics of COVID-19 patients with ARDS and explored an easily applicable nomogram to provide further guidance on medical treatment.

Patients and study design
This was a retrospective single-center study. The data of patients with COVID-19 admitted to Changsha Public Health Centre between Jan 30, 2020, and Feb 22, 2020, were collected. Changsha Public Health Centre was the only designated tertiary hospital for COVID-19 patients in Changsha. All the patients were diagnosed with SARS-CoV-2 infection based on the World Health Organization (WHO) interim guidance.
This study was approved by the institutional ethics board of the Second Xiangya Hospital, Central South University. Because this was an urgent retrospective study, informed consent was waived.

Study variables
All candidate predictors were selected on the basis of the relevant literature and clinical evidence. We collected patient demographics (age, gender), medical history (hypertension, cardiovascular disease, diabetes, cerebrovascular disease, chronic obstructive pulmonary disease (COPD), chronic liver disease, malignant tumor), symptoms at illness onset, laboratory tests (erythrocyte sedimentation rate, C-reactive protein (CRP), procalcitonin, liver and renal function, blood chemistry, coagulation test, complete blood count, lactate dehydrogenase (LDH), and creatine kinase (CK)), history of Hubei exposure within the previous two weeks, body temperature, systolic blood pressure, diastolic blood pressure, heart rate, and days from illness onset to admission.

Outcomes
Acute respiratory distress syndrome (ARDS) was defined as a decrease in the PaO2/FiO2 index below 300 mmHg, according to the Berlin definition (8). Arterial blood gas analysis was performed in patients who had dyspnea during hospitalization. The in-hospital incidence of ARDS was evaluated.
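
As a minimal illustration of the oxygenation criterion described above, the PaO2/FiO2 index can be computed from an arterial blood gas result as sketched below. The function names and the example values are hypothetical, and the strict "below 300 mmHg" comparison simply follows the wording of this section; this is not the code used in the study.

```python
def pf_ratio(pao2_mmhg, fio2_fraction):
    """Return the PaO2/FiO2 index in mmHg.

    FiO2 is given as a fraction (0.21 for room air, 1.0 for pure oxygen).
    """
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return pao2_mmhg / fio2_fraction


def meets_oxygenation_criterion(pao2_mmhg, fio2_fraction, threshold_mmhg=300.0):
    """True when the PaO2/FiO2 index falls below the threshold used in this section."""
    return pf_ratio(pao2_mmhg, fio2_fraction) < threshold_mmhg


# Hypothetical blood gas values, for illustration only.
print(pf_ratio(60.0, 0.40))                      # patient on 40% oxygen
print(meets_oxygenation_criterion(60.0, 0.40))   # index is below 300 mmHg
print(meets_oxygenation_criterion(95.0, 0.21))   # room air, index above 300 mmHg
```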

Variables selection and model establishment
In this study, the number of candidate variables was large, while the number of patients was relatively small. To avoid overfitting the model, the least absolute shrinkage and selection operator (LASSO) algorithm was used to further screen the predictive variables among the previously selected variables. The tuning parameter (lambda) in the LASSO model was selected by 10-fold cross-validation. A prediction model was then established by logistic regression, and the final nomogram prognostic model was constructed. Moreover, calibration curves were plotted to assess the nomogram's predictions.
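
To make the shrinkage step concrete, the following sketch fits an L1-penalized (LASSO) logistic regression on a tiny toy dataset. It is only an illustration: the study used an R package for this step, whereas the sketch uses plain proximal gradient descent with soft-thresholding, and the data, the function names, the step size, and the penalty values are all assumptions chosen for clarity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, amount):
    """Shrink value toward zero; anything inside [-amount, amount] becomes exactly 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=2000):
    """Minimise mean log-loss + lam * sum(|beta_j|) by proximal gradient descent.

    The intercept is not penalised. Returns (intercept, beta).
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        probs = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) for row in X]
        resid = [prob - label for prob, label in zip(probs, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        # Gradient step followed by soft-thresholding (the L1 proximal operator).
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


# Toy data (hypothetical): x1 is associated with the outcome, x2 is not.
x1 = [-1, -1, -1, -1, 1, 1, 1, 1]
x2 = [-1, 1, -1, 1, 1, -1, 1, -1]
X = [list(pair) for pair in zip(x1, x2)]
y = [0, 0, 0, 1, 0, 1, 1, 1]

for lam in (0.05, 0.5):
    intercept, beta = fit_lasso_logistic(X, y, lam)
    print("lambda =", lam, "coefficients =", [round(b, 3) for b in beta])
```

With the small penalty, the informative variable keeps a nonzero coefficient while the uninformative one is set to exactly zero; with the large penalty, both coefficients are shrunk to zero. This is the mechanism by which a LASSO screen reduces a long list of candidates to a short list of predictors.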

Statistical analysis
Continuous variables were presented as means (standard deviations) or medians (interquartile ranges, IQR), and categorical variables were presented as frequencies and percentages. Normally distributed and skewed continuous variables were compared between groups using one-way ANOVA and the Kruskal-Wallis test, respectively, and categorical variables were compared using the chi-squared test. The LASSO binary logistic regression analysis was performed with the R package "glmnet". The nomogram and decision curve were established with the "rms" package in R. The receiver operating characteristic (ROC) curves were plotted and the area under the ROC curve (AUC) was assessed. We used 500 bootstrap resamples to compute the AUC with a 95% CI. We then reported the sensitivity, specificity, and accuracy of the stepwise model obtained by bootstrap. The statistical analyses were performed with the statistical packages R (http://www.R-project.org) and Empower-Stats. A p-value < 0.05 was considered statistically significant.
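
The bootstrap estimate of the AUC confidence interval described above can be sketched as follows. This is a minimal illustration, assuming a simple percentile interval and resampling of patients with replacement; the function names, the seed, and the example scores and labels are hypothetical, and the exact bootstrap procedure used in the study may differ.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive case scores higher than
    a random negative case (ties count as one half)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile confidence interval for the AUC from n_boot resamples."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        # A resample needs at least one case of each class for the AUC to exist.
        if 0 < sum(boot_labels) < n:
            stats.append(auc([scores[i] for i in idx], boot_labels))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical predicted probabilities and outcomes (1 = event), for illustration only.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]

print("AUC:", auc(scores, labels))
print("95% CI:", bootstrap_auc_ci(scores, labels))
```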

Baseline clinical characteristics
The baseline characteristics of the patients in this cohort are shown in Table 1. The patients who developed ARDS were older, had a higher body temperature and a faster heart rate, and were more likely to have a history of hypertension and symptoms of fever and dyspnea. Laboratory tests were performed when the patients were first admitted to the hospital. The levels of CRP, LDH, CK, and aspartate aminotransferase were higher in ARDS patients than in those without ARDS. Prothrombin time was longer, and the levels of albumin, white blood count (WBC), lymphocytes, and monocytes were lower in ARDS patients.

Feature extraction and selection
The LASSO algorithm was used to extract predictive variables. The best-matching variables were selected at the value of lambda that gave the minimum mean cross-validated error. In the end, only 8 of the 45 variables were retained (Figure 1A and 1B): hypertension, COPD, cough, LDH, CK, WBC, body temperature, and heart rate.
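
The selection rule described above, choosing the lambda with the minimum mean cross-validated error, can be sketched in a few lines. The candidate lambda values and the per-fold errors below are hypothetical placeholders, not results from the study, and only three folds are shown for brevity.

```python
from statistics import mean


def select_lambda_min(cv_errors):
    """Return the lambda whose mean error across folds is smallest.

    cv_errors maps each candidate lambda to a list of per-fold errors.
    """
    mean_error = {lam: mean(errors) for lam, errors in cv_errors.items()}
    return min(mean_error, key=mean_error.get)


# Hypothetical per-fold errors for three candidate lambdas.
cv_errors = {
    0.01: [0.52, 0.61, 0.58],
    0.05: [0.31, 0.40, 0.35],
    0.20: [0.45, 0.50, 0.48],
}
print(select_lambda_min(cv_errors))
```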

Nomogram construction and validation
To better predict the risk of developing ARDS, we created a nomogram that gives an individual prediction for each patient based on their characteristics (Figure 2A). The eight variables were regarded as clinically relevant to the occurrence of ARDS. In the nomogram, the total points are obtained by adding the points of each variable, and the total corresponds to the predicted incidence of ARDS.
The calibration curve of the nomogram demonstrated a good fit and is presented in Fig. 2B and Fig. 2C. The calibration curve also showed good consistency between the predicted and observed values in the model. The predictive capability of the full model is shown in Figure 2C. The specificity, sensitivity, and accuracy of the full model were 100%, 85.7%, and 87.5%, respectively. The decision curve showed that the nomogram had a superior standardized net benefit and impact in identifying patients who would develop ARDS (Figure 2D).
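
The way a nomogram turns a logistic regression model into points can be sketched as follows. This is an illustration under stated assumptions: the intercept, coefficients, and value ranges are hypothetical numbers chosen for the example (they are not the fitted model), only three predictors are shown, and the scaling convention (the predictor with the widest contribution spans 0 to 100 points) is one common choice.

```python
import math

# HYPOTHETICAL intercept, coefficients, and ranges, for illustration only.
INTERCEPT = -30.0
PREDICTORS = {
    # name: (coefficient, minimum value, maximum value)
    "hypertension": (1.2, 0.0, 1.0),
    "body_temperature_c": (0.6, 36.0, 40.0),
    "heart_rate_bpm": (0.04, 60.0, 140.0),
}


def contribution_range(coef, low, high):
    """Smallest and largest contribution (coefficient * value) over the range."""
    a, b = coef * low, coef * high
    return min(a, b), max(a, b)


# The predictor with the widest contribution range is assigned 0 to 100 points.
SCALE = max(hi - lo for lo, hi in (contribution_range(*v) for v in PREDICTORS.values()))


def points(name, value):
    coef, low, high = PREDICTORS[name]
    lowest, _ = contribution_range(coef, low, high)
    return 100.0 * (coef * value - lowest) / SCALE


def total_points(patient):
    return sum(points(name, patient[name]) for name in PREDICTORS)


def probability_from_points(total):
    """Map total points back to the linear predictor, then to a probability."""
    baseline = INTERCEPT + sum(contribution_range(*v)[0] for v in PREDICTORS.values())
    linear_predictor = baseline + total * SCALE / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical patient.
patient = {"hypertension": 1.0, "body_temperature_c": 38.5, "heart_rate_bpm": 110.0}
total = total_points(patient)
print("total points:", round(total, 1))
print("predicted probability:", round(probability_from_points(total), 3))
```

Because the points are a linear rescaling of each variable's contribution, reading the probability from the total-points axis gives the same value as evaluating the logistic model directly.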
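
For reference, the performance measures named above can be computed from predicted probabilities as sketched below. The example predictions and outcomes are hypothetical and are not the study data; the net benefit follows the usual decision-curve formula, and dividing by the prevalence gives the standardized net benefit.

```python
def confusion_counts(predicted_probs, labels, threshold):
    """Count true/false positives and negatives at a probability threshold."""
    tp = fp = tn = fn = 0
    for prob, label in zip(predicted_probs, labels):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at a given threshold probability."""
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def standardized_net_benefit(tp, fp, n, threshold, prevalence):
    return net_benefit(tp, fp, n, threshold) / prevalence


# Hypothetical predictions and outcomes (1 = event), for illustration only.
probs = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(probs, outcomes, threshold=0.5)
print(classification_metrics(tp, fp, tn, fn))
print(net_benefit(tp, fp, len(outcomes), threshold=0.5))
```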

Discussion
Timely evaluation and prediction of the in-hospital incidence of ARDS can help improve clinical prognosis. In this study, we established a model based on 8 variables from the data of 113 patients with COVID-19, including COPD, hypertension, cough, heart rate, body temperature, WBC, LDH, and CK, and the model demonstrated relatively good performance.
In our model, high body temperature, fast heart rate, and cough were risk factors. Previous studies showed that high body temperature, cough, and fast heart rate were common clinical characteristics of pneumonia (9)(10)(11)(12)(13). Moreover, higher body temperature and elevated heart rate in pneumonia were associated with a severe inflammatory response and hypoxia, which were also significantly higher in ARDS patients (13,14). Research showed that WBC was an indicator of the systemic inflammatory response and could be a potential marker for evaluating severity and prognosis in various disorders (15,16).
However, there was a significant difference in our model. Our results suggested that a decreased level of WBC paralleled an increased incidence of ARDS in COVID-19 patients. This may be explained by the different type of pathogen. Previous studies verified that SARS-CoV-2 could invade the respiratory system, impair cells and tissues, induce an immune reaction, and change peripheral white blood cell counts, leading to decreased leukocytes and lymphocytes (17)(18)(19). In addition, evidence indicated that patients with comorbidities were more likely to have poor clinical outcomes (20). COPD was one of the most heavily weighted features in our model. Experimental and clinical studies verified that COPD patients were more susceptible to pathogen infection, resulting in impaired lung function and ARDS (21). Our results were consistent with previous publications.
In our study, the most heavily weighted features were LDH and CK. The levels of LDH and CK were significantly increased in patients with ARDS. LDH and CK were not only markers of inflammation but also indicators of prognosis in critical illness (22,23). Liu et al (24) reported that patients with acute lung injury also had elevated levels of enzymes, including LDH and CK, due to inflammation and oxidative stress.
Esteves et al (25) found that LDH was a diagnostic biomarker of pneumocystis pneumonia, in agreement with our results. Recent studies also reported that, in COVID-19 patients, lung function worsened as the viral load detected in the respiratory tract increased (24). Finally, the eight variables were selected as predictive indices by LASSO regression, which may be related to differences in the number of patients.
Our study has several strengths. First, this is the first study to develop a nomogram for predicting the in-hospital incidence of ARDS in COVID-19 patients. Second, physicians can directly calculate the probability of ARDS with our nomogram and adjust individualized medical treatment in a timely manner. Nevertheless, our research still has limitations. a) Because of the relatively small sample, the nomogram needs to be validated in a larger population. Considering that the incidence of ARDS in the cohort was more than 10%, our research could still provide a significant clinical reference for predicting ARDS in COVID-19. b) All the patients were Chinese, so caution is needed when applying the proposed nomogram to patients of other ethnicities. c) Our study was retrospective and there might be patient selection bias, which is an inevitable limitation of this type of study. We retrospectively analyzed all the possible factors, including comorbidities, signs, symptoms, and laboratory findings, to mitigate bias.

Conclusion
In the study, the proposed nomogram can predict the in-hospital incidence of ARDS in COVID-19 patients. It helps physicians to make an individualized treatment plan.

List Of Abbreviations
ARDS = acute respiratory distress syndrome, COVID-19 = coronavirus disease-2019, SARS-CoV-2 = severe acute respiratory syndrome coronavirus-2, ICU = intensive care unit, WHO = World Health Organization, COPD = chronic obstructive pulmonary disease, CRP = C-reactive protein, LDH = lactate dehydrogenase, CK = creatine kinase, LASSO = least absolute shrinkage and selection operator, IQR = interquartile ranges, ROC = receiver operating characteristic, AUC = area under ROC curve, WBC = white blood count

Declarations

Acknowledgments
We thank all the patients involved in the study.

Figure 1 The LASSO algorithm and 10-fold cross validation were used to extract the optimal subset of all variables. (A) Optimal variable selection according to AUC value. When the value of ln(λ) increased to -3.1748, the AUC reached the peak corresponding to the optimal number of predictive variables. (B) LASSO coefficient profiles of the 45 variables. The vertical line was drawn at the value selected by 10-fold cross validation, where the optimal λ resulted in 8 nonzero coefficients.