Development and Validation of a Novel Predictive Nomogram for Disease Progression of COVID-19: A Multicenter Retrospective Cohort Study of 4,086 Cases in China

Aim: Coronavirus disease 2019 (COVID-19) has caused an unprecedented healthcare crisis. We aimed to develop and validate a nomogram for predicting disease progression based on a large cohort of hospitalized COVID-19 patients. Methods: This is a multicenter retrospective cohort study of 4,086 hospitalized COVID-19 patients enrolled from two hospitals in Wuhan, China between February 3rd and April 10th. The nomogram was developed in a cohort of 3,022 patients from one hospital and externally validated in a cohort of 1,064 patients from the other hospital. Calibration was assessed by a calibration plot and the Hosmer-Lemeshow (HL) test for goodness of fit, and discriminative ability was measured by the area under the ROC curve (AUROC). Results: Six independent predictors (age, dyspnea, platelet count, lactate dehydrogenase, D-dimer and cardiovascular disease) were finally identified for construction of the nomogram for predicting disease progression of COVID-19 patients during hospitalization. The AUROC was 0.877 in the development cohort and 0.817 in the validation cohort. The calibration plots and the HL test showed good agreement between nomogram prediction and actual observation. The decision curve analysis showed that the nomogram performed better than all univariable models and had greater net benefit. Next, a predictive nomogram for disease severity on admission was formulated; its six independent factors were similar to those of the nomogram for disease progression, which indicates that those factors play important roles in determining disease severity and the risk of disease progression. Conclusion: In the current study, a nomogram was developed based on variables that are generally readily available at hospital admission to help predict disease progression of COVID-19.


A quick index scoring system may help quickly identify hospitalized COVID-19 patients at high risk of worse outcomes. Previous studies have found that several factors are associated with progression to severe or critical disease, including advanced age, comorbidities, hyperactivation of the immune system, sex, and other factors, and risk score systems have been developed [3][4][5][6][7]. However, integration of those risk factors, especially in a nomogram, is still lacking. As pointed out in a recent systematic review, a total of 8 prognostic models have been proposed to predict progression to a severe or critical state, and another 23 models to predict mortality 8. However, these models were considered at high risk of bias due to non-representative selection of control patients, high risk of model overfitting, vague reporting and other reasons 8.
We therefore conducted a multicenter, retrospective cohort study involving 4,086 in-hospital COVID-19 patients. Two nomograms, one predicting disease severity at admission and one predicting disease progression, were developed in a development cohort of 3,022 patients and validated in an external cohort of 1,064 patients. The nomograms performed well as assessed by the area under the receiver operating characteristic curve (AUROC) for discrimination, and by calibration plots together with the Hosmer-Lemeshow (HL) test for calibration. Decision curve analysis (DCA) further supported the greater net benefit of the predictive models. The current study therefore established a simple and effective tool to help clinicians identify patients who are likely to deteriorate into critical cases, thereby optimizing treatment strategies, reducing mortality, and relieving pressure on medical resources.

Study Population and Design
This multicenter retrospective cohort study included a total of 3,022 COVID-19 cases recruited between February 3rd and April 10th from Wuhan Huoshenshan Hospital (used as the development cohort), and 1,064 COVID-19 cases recruited between February 19th and March 18th from Taikang-Tongji hospital with relevant epidemiological and clinical data (used as the validation cohort). The diagnosis of COVID-19 was based on the World Health Organization interim guidance. 9 Nasal and pharyngeal swab specimens were collected from patients suspected of having COVID-19 for extraction of SARS-CoV-2 RNA. A real-time reverse transcription polymerase chain reaction assay was then used for laboratory confirmation of SARS-CoV-2 infection. Epidemiological data were collected by telephone interview using a standardized questionnaire administered by two trained physicians. Clinical symptoms and signs, laboratory results, and progression and outcome information were extracted from the electronic medical records. We double-entered and validated the data using EpiData software (version 3.1, EpiData Association, Odense, Denmark), and disputes were arbitrated by an expert committee composed of specialists in respiratory and critical care medicine, radiology, and epidemiology. The study protocol was approved by the ethics committees of Wuhan Huoshenshan Hospital (HSSLL024) and Taikang-Tongji hospital (TKTJKY2020146), and informed consent was waived because the study addressed an urgent public health concern.

Study Outcomes and Definitions
Disease progression of COVID-19 was the primary endpoint of the current study, and disease severity was the secondary endpoint. Disease progression was a composite of severity progression, admission to an intensive care unit (ICU), the use of mechanical ventilation, or death. The severity of COVID-19 at admission was determined according to the "Guidance for COVID-19 Prevention, Control, Diagnosis and Management" issued by the National Health Commission of the People's Republic of China, which defines four categories: mild, ordinary, severe and critical. 10 Mild and ordinary cases were categorized as non-severe, while severe and critical cases were categorized as severe. Fever was defined as an axillary temperature of 37.3 °C or higher.
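The definitions above can be written down as a few small predicates. The following is a minimal Python sketch for illustration only; the function and argument names are assumptions introduced here and do not come from the study's data dictionary.

```python
# Severity categories at admission, grouped as described in the text.
NON_SEVERE = {"mild", "ordinary"}
SEVERE = {"severe", "critical"}


def is_severe_at_admission(category):
    """Map the four-level severity category to the binary secondary endpoint."""
    label = category.strip().lower()
    if label in SEVERE:
        return True
    if label in NON_SEVERE:
        return False
    raise ValueError(f"unknown severity category: {category!r}")


def has_disease_progression(severity_progressed, icu_admission,
                            mechanical_ventilation, died):
    """Composite primary endpoint: any one of the four events counts."""
    return any([severity_progressed, icu_admission, mechanical_ventilation, died])


def has_fever(axillary_temp_c):
    """Fever is an axillary temperature of 37.3 degrees Celsius or higher."""
    return axillary_temp_c >= 37.3


# Hypothetical patient record (argument names are illustrative only).
print(is_severe_at_admission("Ordinary"))                   # False
print(has_disease_progression(False, True, False, False))   # True
print(has_fever(37.3))                                      # True
```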

Statistical analysis
All statistical analyses were conducted using R software version 4.0.0 (Institute for Statistics and Mathematics, Vienna, Austria). All tests were 2-sided, and a P value of less than 0.05 was considered statistically significant. Categorical variables were described using frequencies and percentages, while continuous variables were described using the median and interquartile range (IQR). Missing values of all potential predictors (missing rate of less than 10.0%) were imputed by the expectation-maximization (EM) method. Logistic regression analysis was used to estimate the odds ratio (OR) and corresponding confidence interval (CI) of candidate variables. The following procedures were implemented to screen the candidate variables in the development cohort. First, univariate logistic regression analysis was used to screen for potential prognostic factors with a P value of less than 0.05. Then, the independent risk factors were derived from a backward stepdown selection process in a multivariate logistic regression model. Finally, a predictive nomogram was formulated based on the results of the multivariate analysis according to the Akaike information criterion and the random forest procedure. 11,12 Nomogram calibration was assessed by a calibration plot and the HL test for goodness of fit, and discriminative ability was measured by the AUROC. The nomograms were applied to an independent cohort (1,064 COVID-19 cases from Taikang-Tongji hospital) to validate and evaluate their predictive efficacy. During validation of the prognostic nomogram, the total points of each patient in the validation cohort were calculated according to the established nomogram; logistic regression was then performed in this cohort using the total points as a factor, and the AUROC and calibration plot were derived from this regression analysis.
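As an illustration of the discrimination measure named above, the AUROC can be computed as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The sketch below is written in Python rather than the R software used in the study, and the risks and outcomes are made-up values, not study data.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs that are ranked
    correctly by the score; tied pairs contribute 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = disease progression).
risks = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80]
outcomes = [0, 0, 1, 0, 1, 1]
print(round(auroc(risks, outcomes), 3))  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the cohort size, which is acceptable for a sketch; a rank-based formulation gives the same value more efficiently on cohorts of several thousand patients.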
Finally, DCA was used to evaluate the net benefit of the predictive models.
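For readers unfamiliar with DCA, net benefit at a threshold probability p_t is conventionally defined as TP/n - (FP/n) * p_t / (1 - p_t), where TP and FP are the numbers of true and false positives when every patient with predicted risk at or above p_t is treated as high risk. The Python sketch below assumes this standard definition and uses made-up data; it is not the code used in the study.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of classifying patients with risk >= threshold as high risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: classify every patient as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted risks and observed outcomes (1 = disease progression).
risks = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80]
outcomes = [0, 0, 1, 0, 1, 1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(risks, outcomes, pt), net_benefit_treat_all(outcomes, pt))
```

A decision curve plots these quantities over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.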

Development and validation of a predictive nomogram for disease progression of COVID-19
We first constructed the nomogram for predicting disease progression of COVID-19 patients during hospitalization. Supplementary Table 1 presents the results of univariate logistic regression analysis for disease progression of COVID-19 in the development cohort. Six independent predictors were finally identified for the construction of the nomogram (Fig. 1). The calibration plot showed good agreement between the nomogram prediction and actual observation (Supplementary Fig. 2), which was verified by the HL test (P = 0.348). For the external validation, the AUROC (0.817, 95% CI: 0.773-0.861) confirmed good discriminating ability (Supplementary Fig. 3). The calibration plot (Supplementary Fig. 4) and HL test (P = 0.685) showed good agreement between the nomogram prediction and actual observation. The DCA showed that the nomogram performed better than all univariable models and had greater net benefit (Fig. 2).
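To make the HL P values above easier to interpret, the following Python sketch shows one common form of the Hosmer-Lemeshow statistic: patients are sorted by predicted risk, split into equally sized groups, and observed and expected event counts are compared per group. The choice of groups - 2 degrees of freedom and the closed-form chi-square tail (valid for even degrees of freedom only) are assumptions of this sketch, and the data are made up. A large P value means no evidence of miscalibration.

```python
import math


def hosmer_lemeshow(risks, outcomes, groups=10):
    """Return (statistic, degrees of freedom) using equal-size risk groups."""
    pairs = sorted(zip(risks, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one patient per group")
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance == 0.0:
            raise ValueError("a group has mean predicted risk of exactly 0 or 1")
        statistic += (observed - expected) ** 2 / variance
    return statistic, groups - 2


def chi2_sf_even_df(x, df):
    """Upper tail of the chi-square distribution for even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical data: 4 risk groups of 5 patients each; the lowest-risk group
# has one more event than expected, the other groups match exactly.
risks = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
outcomes = [1, 1, 0, 0, 0] + [1, 1, 0, 0, 0] + [1, 1, 1, 0, 0] + [1, 1, 1, 1, 0]
statistic, df = hosmer_lemeshow(risks, outcomes, groups=4)
p_value = chi2_sf_even_df(statistic, df)
print(round(statistic, 3), df, round(p_value, 3))
```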

Development and validation of a predictive nomogram for severity of COVID-19 at admission
We next asked whether the factors used for predicting disease progression may also play a role in determining disease severity on admission. We therefore set up a nomogram for predicting disease severity. Supplementary Table 2 presents the results of univariate logistic regression analysis for severity of COVID-19 at admission in the development cohort. Six independent predictors were finally identified for the construction of the nomogram (Fig. 5). The calibration plot showed good agreement between the nomogram prediction and actual observation (Supplementary Fig. 6), which was verified by the HL test (P = 0.949). For the external validation, the AUROC (0.783, 95% CI: 0.753-0.812) confirmed good discriminating ability (Supplementary Fig. 7). The calibration plot (Supplementary Fig. 8) and HL test (P = 0.578) showed good agreement between the nomogram prediction and actual observation. The DCA showed that the nomogram performed better than all univariable models and had greater net benefit (Fig. 4).

Discussion
The current study established a nomogram for predicting COVID-19 disease progression in a relatively large cohort of 4,086 patients. The model performed well, with satisfactory accuracy in both the development and external validation cohorts. The variables required for calculation of the nomogram are generally readily available at hospital admission, so the nomogram can be easily used by clinicians to estimate an individual hospitalized patient's disease severity and risk of disease progression during hospitalization.
A number of risk factors have been reported to be associated with COVID-19 severity and worse prognosis, including mortality. The impact of smoking on COVID-19 severity has been previously reported, showing that patients with any smoking history are vulnerable to severe COVID-19 and worse in-hospital outcomes 13 4. Taken together, the above-mentioned models were based on different populations, often with a small sample size, and used different endpoints. Therefore, those models should be applied in clinical practice with great caution.
The nomograms established in the current study have several strengths. First, the sample size is relatively large, with more than four thousand patients enrolled. Second, all patients in both the development cohort and the external validation cohort had reached the composite endpoints, which makes the study population a sound representation of hospitalized COVID-19 patients. Third, the nomograms performed well in external validation in an independent cohort, suggesting that they are potentially applicable to different populations. The current study also has several limitations. First, the study is retrospective in nature, and the two nomograms need further validation in prospective studies. Second, due to emergency conditions, some data are missing for certain variables. Third, the data for model development and validation are all from Wuhan, China, which could limit the generalizability of the models to other areas of the world.

Conclusion
In this study, we developed a nomogram to predict the disease progression among patients with COVID-

Consent for publication
The informed consent was waived due to dealing with urgent public health concerns. All authors critically revised the manuscript for important intellectual content and gave final approval for the version to be published.

Availability of data and materials
Datasets for this research are included in this paper (and its supplementary information files). Further details will be made available to researchers in contact with the corresponding author.

Competing interests
The authors declare no potential conflicts of interest.

Funding
The present study was funded by the Outstanding Youth Science Foundation of Chongqing (cstc2020jcyj-jq0129) and the National Natural Science Foundation of China (81672287).

Authors' contributions
CG and MX conceived and designed the study. MX, LL and LS drafted the paper and did the statistical analysis. All authors collected the data. All authors approved the final draft of the manuscript for publication.

Figure 1 Nomogram for disease progression of COVID-19 in the development cohort. To use the nomogram, an individual patient's value is located on each variable axis, and a line is drawn upward to determine the number of points received for each variable value. The sum of these numbers is located on the Total Points axis, and a line is drawn downward to the "Risk axes" to determine the likelihood of disease progression of COVID-19.

Decision curve analysis for severity of COVID-19 at admission. Decision curve analysis was performed to compare the performance of the nomogram with that of all univariable models.
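The nomogram reading procedure described in the Figure 1 caption is equivalent to a rescaled logistic regression: each variable's contribution is expressed in points, with the most influential variable spanning 0 to 100, and the total points map back to a probability. The Python sketch below illustrates this arithmetic only. The intercept, coefficients and axis ranges are HYPOTHETICAL values invented for illustration; they are not the fitted values of the published nomogram and must not be used clinically.

```python
import math

# HYPOTHETICAL model, for illustration only (not the published nomogram).
INTERCEPT = -4.0
PREDICTORS = {
    # name: (coefficient, axis minimum, axis maximum)
    "age_years": (0.03, 20.0, 100.0),
    "dyspnea": (0.9, 0.0, 1.0),
    "d_dimer_mg_per_l": (0.15, 0.0, 10.0),
}


def points_per_unit():
    """Scale chosen so the variable with the largest effect spans 0-100 points."""
    largest = max(abs(b) * (hi - lo) for b, lo, hi in PREDICTORS.values())
    return 100.0 / largest


def total_points(patient):
    """Sum of per-variable points, measured from each axis's lowest-risk end."""
    scale = points_per_unit()
    total = 0.0
    for name, (beta, lo, hi) in PREDICTORS.items():
        zero_point = lo if beta >= 0 else hi
        total += beta * (patient[name] - zero_point) * scale
    return total


def risk_from_points(points):
    """Map total points back to a predicted probability (the risk axis)."""
    baseline = INTERCEPT + sum(
        beta * (lo if beta >= 0 else hi) for beta, lo, hi in PREDICTORS.values()
    )
    linear_predictor = baseline + points / points_per_unit()
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical patient.
patient = {"age_years": 60.0, "dyspnea": 1.0, "d_dimer_mg_per_l": 2.0}
points = total_points(patient)
print(round(points, 1), round(risk_from_points(points), 3))
```

Because the points are a linear rescaling of the regression terms, reading the risk axis at the total points gives the same probability as evaluating the logistic model directly on the patient's values.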
