Demographic data and baseline characteristics of COVID-19 patients
The demographic data and baseline characteristics of COVID-19 patients are shown in Table 1. A total of 696 COVID-19 patients (399 males and 297 females), with an average age of 57.8 years, were enrolled in the discovery study. Compared with mild-to-moderate cases, most severe-to-critical patients were older than 60 years (64.4%) and male (64.4%), with an average age of 62.7 years. SaO2 < 94% was recorded in 82% of severe-to-critical patients, indicating markedly lower oxygen saturation than in mild-to-moderate patients. In the severe-to-critical group, the median time from illness onset to hospital admission was 10 days (9.0–10.0) and the median time from illness onset to the end of hospitalization was 24 days (22.0–27.0); the corresponding medians in the mild-to-moderate group were 8.5 days and 21 days, respectively. Complications occurred more frequently in severe-to-critical patients (69.1%), with hypertension being the most common comorbidity. Polypnea was the most common symptom on hospital admission in severe-to-critical cases, in contrast to mild-to-moderate patients. The demographic data and baseline characteristics showed similar trends in the validation study.
Correlation analysis of the published variables
Spearman rank correlation analysis was performed to assess the associations among all selected variables. Heatmaps of the correlation matrix showed that these variables were correlated regardless of outcome in both the discovery (Figure 1A-D) and validation studies (Figure 1E-F). For instance, neutrophil levels were strongly and positively correlated with leukocytes in mild-to-moderate cases on hospital admission. At the end of hospitalization, albumin was strongly and negatively correlated with leukocytes, neutrophils, SAA, CRP and LDH (circle size and color indicate the strength of the correlation: large, dark red circles denote strong positive associations, and small, light red circles denote weak positive associations). The same pattern was observed in the validation stage. The correlation analysis thus showed that interactions and collinearity existed among the available variables, and it was therefore inappropriate to predict the severity of COVID-19 with a traditional statistical method.
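As an illustration of the Spearman rank correlation matrix described above, the following is a minimal, dependency-free sketch and not the software used in the study. The function names and the six-patient `labs` table are hypothetical and serve only to show the computation: each variable is converted to ranks (ties receive the mean rank), and the Pearson correlation of the ranks is taken.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(table):
    """Symmetric matrix (dict of dicts) of pairwise Spearman coefficients."""
    names = list(table)
    matrix = {a: {b: 1.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = spearman(table[a], table[b])
    return matrix


# Hypothetical values for six patients (illustrative only, not study data).
labs = {
    "leukocytes": [4.1, 5.6, 7.2, 9.8, 11.5, 13.0],
    "neutrophils": [2.3, 3.4, 5.1, 7.9, 9.6, 11.2],
    "albumin": [44.0, 41.5, 39.0, 35.2, 33.1, 30.4],
}
rho = spearman_matrix(labs)
```

In this toy table, leukocytes and neutrophils rise together while albumin falls, so the matrix contains both a perfectly positive and a perfectly negative coefficient, mirroring the kind of structure a correlation heatmap displays.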
Dynamic changes or expression levels of laboratory parameters
Figure 2a-r shows the dynamic changes of laboratory parameters in mild-to-moderate and severe-to-critical patients during the progression of COVID-19 throughout the clinical course (from hospital admission to the end of hospitalization) in the discovery stage (see also Table 2). Six indexes, namely leukocytes, neutrophils, NLR, LDH, CRP and SAA, showed a significant continuous downward trend in severe-to-critical cases; mild-to-moderate cases showed an almost identical pattern, except for SAA. Lymphocytes, eosinophils, basophils, high-sensitivity cardiac troponin I, prothrombin time, D-dimer, serum ferritin, IL-6 and albumin showed a continuous upward trend in severe-to-critical patients, and the majority of mild-to-moderate patients showed a similar result. On hospital admission, regardless of severity or outcome, the majority of patients had similar monocyte and platelet levels, but platelets increased significantly afterwards in mild-to-moderate patients. In contrast, PLR started at a high level in severe-to-critical cases and dropped significantly afterwards, whereas PCT showed almost the opposite pattern.
In the validation stage (data were not available at the end of hospitalization), admission levels of leukocytes, neutrophils, basophils, NLR, PLR, LDH, high-sensitivity cardiac troponin I, serum ferritin, IL-6, PCT and CRP were clearly elevated in severe-to-critical cases compared with mild-to-moderate patients, whereas lymphocytes, eosinophils and albumin were greatly reduced in the severe-to-critical group (Figure 2a-q and Table 2).
Correlation network analysis of the published variables with the severity of COVID-19
The correlations of all selected indexes were analyzed with Cytoscape software (https://cytoscape.org/). Positively correlated indexes are shown with a red background and negatively correlated indexes with a blue background; thick and thin lines represent strong and weak correlations, respectively. On hospital admission, NLR showed the strongest positive correlation with the severity of COVID-19, followed by LDH; meanwhile, a strong negative correlation between SaO2 and disease severity was observed in both the discovery and validation stages (Figure 3).
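The visual encoding described above (sign mapped to color, magnitude mapped to line width) can be prepared as an edge table before import into network software. The sketch below is only an assumption about how such a table might be built: the column names, the 0.5 threshold for a "strong" correlation, and the coefficient values are hypothetical placeholders, not results from this study.

```python
import csv
import io


def build_edges(correlations, target="severity", strong=0.5):
    """Turn {index: rho} into edge rows linking each index to the target.

    The `strong` threshold is a hypothetical cut-off chosen for illustration.
    """
    edges = []
    for name, rho in correlations.items():
        edges.append({
            "source": name,
            "target": target,
            "rho": rho,
            "sign": "positive" if rho >= 0 else "negative",
            "strength": "strong" if abs(rho) >= strong else "weak",
        })
    # Strongest associations first, regardless of direction.
    return sorted(edges, key=lambda e: abs(e["rho"]), reverse=True)


def to_csv(edges):
    """Serialize the edge rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(edges[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()


# Placeholder coefficients (illustrative only, not study results).
placeholder = {"NLR": 0.6, "LDH": 0.55, "SaO2": -0.65, "albumin": -0.3}
edges = build_edges(placeholder)
table = to_csv(edges)
```

The `sign` column could then drive node or edge color and the `strength` column the line width in the network view.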
Principal component analysis of variables in relation to the severity of COVID-19
A PCA biplot showing the configuration of indexes on hospital admission and at the end of hospitalization is presented in Figure 4. With this method, several correlated variables can be condensed into two composite parameters, which facilitates the prediction of dichotomous endpoints. In this study, we show for the first time that PCA, a well-established analytical method, can be used for severity stratification of COVID-19 patients. Plots of individual component scores for the first principal component (PC1) versus the second principal component (PC2) are provided. On hospital admission (Figure 4A), PC1 and PC2 separated severe-to-critical cases from mild-to-moderate patients more markedly. Together, PC1 and PC2 explained 43.4% of the total variance. Twelve predictors were significantly associated with PC1, either positively or negatively, and made the largest positive and negative contributions to it. Notably, the spectrum of indexes in the PCA was consistent with that shown in the correlation networks in the discovery stage (Figure 3). At the end of hospitalization, mild-to-moderate patients and severe-to-critical cases were only modestly separated (Figure 4B). In the validation stage, however, PC1 and PC2 separated the two groups only weakly (Figure 4C).
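To make the quantities in a PCA biplot concrete (component scores, loadings, and the proportion of variance explained by PC1 and PC2), the following minimal sketch uses NumPy on synthetic data. It assumes that variables are standardized before decomposition, which the text does not state, and the variable layout and function name are hypothetical.

```python
import numpy as np


def pca_two_components(X):
    """Return PC1/PC2 scores, loadings, and explained-variance ratios.

    Assumption: each column is standardized (z-scored) before the SVD.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; S holds the singular values.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    loadings = Vt[:2].T          # one row per variable, one column per PC
    scores = Z @ loadings        # one row per patient, one column per PC
    return scores, loadings, explained[:2]


# Synthetic data (illustrative only): three markers track a latent
# severity signal, one is pure noise.
rng = np.random.default_rng(0)
n = 200
severe = rng.integers(0, 2, size=n)
latent = severe + rng.normal(scale=0.7, size=n)
X = np.column_stack([
    latent + rng.normal(scale=0.5, size=n),
    latent + rng.normal(scale=0.5, size=n),
    -latent + rng.normal(scale=0.5, size=n),
    rng.normal(size=n),
])
scores, loadings, explained = pca_two_components(X)
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by group, gives the score plot, while `explained.sum()` corresponds to the combined proportion of variance explained by the two components.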
Independent predictors and ROC analysis on hospital admission
The independent predictors were identified from previous studies [5-19]. We reanalyzed whether these published predictors could be validated in the present study. As indicated in Figure 5, NLR, PLR, lymphocytes, CRP, IL-6, LDH, serum ferritin, CD4+ T cells, CD8+ T cells, high-sensitivity cardiac troponin I, albumin and D-dimer were found to be independent predictors of COVID-19 severity. ROC analysis of each single independent variable (uni-AuROC) was performed using the levels at hospital admission. The AuROC was 0.782 for NLR, followed by 0.765 for LDH and 0.753 for CD8+ T cells. The AuROCs of the other indexes ranged from 0.585 to 0.730 in the discovery stage (Figure 5a-l and Figure 8a). With a cut-off value of 7.0, NLR exhibited a sensitivity of 72.3%, a specificity of 74.4%, a correct classification ratio (CCR) of 73.6%, a positive predictive value (PPV) of 65.3%, and a negative predictive value (NPV) of 80.2% in the discovery stage (Table 3), and NLR again showed the largest AuROC (0.794) in the validation stage (Figure 5a-j and Figure 8a). We then tested combinations of NLR with one other risk parameter for the prediction of disease severity (bi-AuROC) (Figure 6). In the discovery study, the combination of NLR and SaO2 showed the highest AuROC (0.901), followed by NLR and LDH (0.807) and NLR and D-dimer (0.800) (Figure 6a-o and Figure 8b). In the validation stage, NLR + SaO2, NLR + complication and NLR + age had the three highest AuROCs, 0.876, 0.830 and 0.819, respectively (Figure 6a-l and Figure 8b). With a cut-off value of 0.532, NLR + SaO2 exhibited 84.2% sensitivity, 88.4% specificity, 86.8% CCR, 81.8% PPV, and 90.2% NPV in the discovery stage (Table 4).
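The performance measures reported above (AuROC, sensitivity, specificity, CCR, PPV and NPV at a cut-off) can be illustrated with a small dependency-free sketch. The NLR values and labels below are hypothetical, and the sketch assumes that a value at or above the cut-off is classified as severe; the study's own convention is not stated.

```python
def auroc(scores, labels):
    """AuROC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at_cutoff(scores, labels, cutoff):
    """Confusion-matrix metrics, treating score >= cutoff as 'severe'."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "CCR": (tp + tn) / (tp + fp + fn + tn),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Hypothetical admission NLR values (illustrative only, not study data);
# label 1 = severe-to-critical, 0 = mild-to-moderate.
nlr = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5, 8.9, 3.9, 7.0, 9.6, 12.3, 15.0]
severe = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

area = auroc(nlr, severe)
result = metrics_at_cutoff(nlr, severe, cutoff=7.0)
```

For a two-variable combination such as NLR + SaO2, the same functions would be applied to a single combined score per patient; how that score was derived in the study is not specified here.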
AuROC comparisons of different models in prediction of severe COVID-19
To compare the accuracy of different published models with ours in predicting the severity of COVID-19 (with NLR + SaO2 as the reference model), ROC curve analyses were performed and the differences in AuROCs were tested using our data. The AuROCs of NLR + SaO2, PCA, the Zhu et al. model, the Jiang et al. model, the Bi et al. model and the Henry et al. model were 0.901 (95% CI: 0.874 - 0.928), 0.865 (95% CI: 0.830 - 0.899) (p = 0.09), 0.739 (95% CI: 0.697 - 0.781) (p = 0.002), 0.755 (95% CI: 0.677 - 0.834) (p = 0.003), 0.749 (95% CI: 0.708 - 0.790) (p = 0.009) and 0.817 (95% CI: 0.780 - 0.855) (p = 0.02), respectively. The AuROC of NLR + SaO2 for predicting COVID-19 severity was thus significantly higher than those of the published models, demonstrating excellent predictive power for disease severity, whereas there was no significant difference between NLR + SaO2 and PCA in the discovery stage (Table 5 discovery dataset, Figure 7a-f and Figure 8c). Nearly consistent directions of effect were observed in the validation stage, except that the Henry et al. model showed no significant difference compared with NLR + SaO2 (Table 5 validation dataset, Figure 7g-k and Figure 8c).
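The text does not specify which statistical test was used to compare AuROCs. As one possible illustration, the sketch below uses a paired percentile bootstrap, a swapped-in technique and not necessarily the authors' method, to estimate a 95% interval for the difference in AuROC between two scores measured on the same patients. All data are synthetic and all names are hypothetical.

```python
import random


def auroc(scores, labels):
    """AuROC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=300, seed=0):
    """Observed AuROC difference (A - B) and a 95% percentile interval.

    Patients are resampled with replacement; both scores are taken from
    the same resampled patients so that the comparison stays paired.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auroc(scores_a, labels) - auroc(scores_b, labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # a resample needs both classes to define an AuROC
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auroc(a, y) - auroc(b, y))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, (lower, upper)


# Synthetic example (illustrative only): model A separates the classes
# more strongly than model B.
data_rng = random.Random(1)
labels = [i % 2 for i in range(100)]
model_a = [2.0 * y + data_rng.gauss(0, 1) for y in labels]
model_b = [0.5 * y + data_rng.gauss(0, 1) for y in labels]

observed, (lower, upper) = bootstrap_auc_difference(labels, model_a, model_b)
```

An interval that excludes zero would suggest that the two AuROCs differ; an interval that includes zero would be consistent with no detectable difference between the two models.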