NLR combined with SaO2 predicts severe illness among COVID-19 patients: a currently updated model

Objectives: The coronavirus disease 2019 (COVID-19) pandemic continues to pose a serious threat to public health, highlighting an urgent need for simple and efficient early detection and prediction. Methods: We comprehensively investigated and reanalyzed the published indexes and models for predicting severe illness among COVID-19 patients in our dataset, and validated them on an independent dataset. Results: A total of 696 COVID-19 cases in the discovery stage and 337 patients in the validation stage were included. The AuROC of the neutrophil-to-lymphocyte ratio (NLR) (0.782) was significantly higher than that of the other 11 independent risk indexes in severe outcome prediction. The combination of NLR and oxygen saturation (SaO2) (NLR + SaO2) showed the largest AuROC, with a value of 0.901; at a cut-off value of 0.532, it exhibited 84.2% sensitivity, 88.4% specificity and an 86.8% correct classification ratio. Moreover, we are the first to identify principal component analysis (PCA) as an effective tool for predicting the severity of COVID-19: PCA achieved 86.5% prediction accuracy with 86% sensitivity when applied to predict severe illness. In addition, to evaluate the performance of NLR + SaO2 and PCA, we compared them with currently published predictive models in the same dataset. Conclusions: NLR + SaO2 is an appropriate and promising method for predicting severe illness, followed by PCA. Validation on an independent dataset showed that both retained robust accuracy in outcome prediction. This study is significant for early treatment, intervention, triage and saving limited resources.


Results
Demographic data and baseline characteristics
The demographic data and baseline characteristics of COVID-19 patients are shown in Table 1. A total of 696 COVID-19 patients (399 males and 297 females), with an average age of 57.8 years, were enrolled in the discovery study. Compared with mild-to-moderate cases, most of the severe-to-critically ill patients were older than 60 years (64.4%) and male (64.4%), with an average age of 62.7 years. SaO2 < 94 was far more common in severe-to-critical patients (82%) than in mild-to-moderate patients. In the severe-to-critical group, the median time from illness onset to hospital admission was 10 days (9.0-10.0) and the median time from illness onset to the end of hospitalization was 24 days (22.0-27.0); the corresponding median durations in the mild-to-moderate group were 8.5 days and 21 days, respectively. Comorbidities were more frequent in severe-to-critically ill patients (69.1%), with hypertension being the most common. On hospital admission, polypnea was the most common symptom in severe-to-critical cases, in contrast to mild-to-moderate patients. These demographic data and baseline characteristics showed similar trends in the validation study.

Correlation analysis of the published variables
Spearman rank correlation coefficient analysis was performed to assess the associations between all selected variables. A heatmap of the correlation matrix showed that correlations existed between these variables regardless of outcome in both the discovery (Figure 1A-D) and validation studies (Figure 1E-F). For instance, the neutrophil level was highly and positively correlated with leukocytes in mild-to-moderate cases on hospital admission. At the end of hospitalization, by contrast, we observed a strong negative association between albumin and leukocytes, neutrophils, SAA, CRP and LDH (circle size and color indicate the strength of the correlation: large dark red circles denote strong positive associations, and small light red circles denote weak positive associations). The same characteristics were observed in the validation stage. The correlation analysis showed that interaction and co-collinearity existed between the available variables, and it was therefore inappropriate to predict the severity of COVID-19 with a traditional statistical method.
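As an illustration of the Spearman rank correlation used in this analysis, the following minimal Python sketch ranks two variables (tied values share their mean rank) and then computes the Pearson correlation of the ranks. The patient values and function names are invented for the example and are not taken from the study data.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented admission values for five hypothetical patients (not study data).
neutrophils = [3.1, 4.8, 6.2, 7.9, 10.4]
leukocytes = [5.0, 6.9, 8.1, 10.2, 13.5]
albumin = [42.0, 39.5, 36.0, 33.1, 30.2]

rho_pos = spearman(neutrophils, leukocytes)  # perfectly increasing together
rho_neg = spearman(neutrophils, albumin)     # perfectly opposite ordering
```

Applying `spearman` to every pair of variables would give the kind of correlation matrix that a heatmap displays.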
Dynamic changes in the expression levels of laboratory parameters throughout the clinical course (from hospital admission to the end of hospitalization) in the discovery stage are also shown in Table 2. Six indexes, namely leukocytes, neutrophils, NLR, LDH, CRP and SAA, showed a significant continuous downward trend in severe-to-critical cases, and mild-to-moderate cases presented almost identical results except for SAA. Lymphocytes, eosinophils, basophils, high-sensitivity cardiac troponin I, prothrombin time, D-dimer, serum ferritin, IL-6 and albumin showed a continuous upward trend in severe-to-critical patients, and the majority of mild-to-moderate patients presented a similar result. On hospital admission, regardless of severity or outcome, the majority of patients presented similar monocyte and platelet levels, but platelets increased significantly afterwards in mild-to-moderate cases. In contrast, PLR started at a high level in severe-to-critical cases and dropped significantly afterwards, and the pattern was almost the opposite for PCT.
In the validation stage (data were lacking at the end of hospitalization), admission levels of leukocytes, neutrophils, basophils, NLR, PLR, LDH, high-sensitivity cardiac troponin I, serum ferritin, IL-6, PCT and CRP were clearly elevated in severe-to-critically ill cases compared with mild-to-moderate patients, whereas lymphocytes, eosinophils and albumin were greatly reduced in the severe-to-critical group (Figure 2a-q and Table 2).

Correlation network analysis of the published variables with the severity of COVID-19
The correlations of all selected indexes were analyzed with Cytoscape software (https://cytoscape.org/). Positive indexes are shown with a red background and negative ones with a blue background; thick and fine lines represent strong and weak correlations, respectively. On hospital admission, a strong positive correlation between NLR and the severity of COVID-19 was identified, followed by LDH; meanwhile, we observed a strong negative correlation between SaO2 and disease severity in both the discovery and validation stages (Figure 3).

Principal component analysis of variables for explaining the severity of COVID-19
A biplot obtained by PCA shows the configuration of indexes on hospital admission and at the end of hospitalization in Figure 4. This method makes it possible to condense several correlated variables into two composite parameters, which facilitates the prediction of dichotomous endpoints. In this study, we are the first to show that PCA, a well-established analytical method, can be used for severity stratification of COVID-19 patients. Plots of individual component scores for the first principal component (PC1) versus the second principal component (PC2) are provided. On hospital admission (Figure 4A), PC1 and PC2 separated severe-to-critical cases from mild-to-moderate patients more markedly. Together, PC1 and PC2 explained 43.4% of the total variance. Twelve predictors were significantly associated with PC1, either positively or negatively. Notably, the spectrum of indexes in the PCA was consistent with what was shown in the correlation networks in the discovery stage (Figure 3). At the end of hospitalization, mild-to-moderate patients and severe-to-critical cases were only modestly separated (Figure 4B). In the validation stage, however, PC1 and PC2 showed only a weak separation between the two groups (Figure 4C).
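The way two principal components summarize several correlated variables can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the procedure used in the study: it standardizes the variables, builds their correlation matrix, and extracts PC1 and PC2 by power iteration with deflation (a simple stand-in for a library eigen-solver). The patient values, the choice of four variables and all function names are invented.

```python
def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) ** 0.5
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dominant_eigenpair(m, iterations=2000):
    """Power iteration: largest eigenvalue and unit eigenvector of a
    symmetric positive semi-definite matrix (assumes a distinct top eigenvalue)."""
    v = [1.0 / (j + 1) for j in range(len(m))]  # arbitrary non-zero start
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return eigenvalue, v


def deflate(m, eigenvalue, v):
    """Subtract a found component so the next iteration finds the following one."""
    p = len(m)
    return [[m[a][b] - eigenvalue * v[a] * v[b] for b in range(p)]
            for a in range(p)]


# Invented admission values (NLR, LDH, SaO2, platelets); not study data.
patients = [
    [2.1, 180.0, 98.0, 210.0],
    [3.0, 210.0, 97.0, 180.0],
    [4.2, 250.0, 96.0, 250.0],
    [6.5, 320.0, 93.0, 160.0],
    [9.0, 410.0, 90.0, 230.0],
    [12.5, 520.0, 86.0, 190.0],
]

z = standardize(patients)
corr = correlation_matrix(z)
lambda1, pc1 = dominant_eigenpair(corr)
lambda2, pc2 = dominant_eigenpair(deflate(corr, lambda1, pc1))

# Eigenvalues of a correlation matrix sum to the number of variables,
# so each eigenvalue divided by that number is a proportion of variance.
explained = [lambda1 / len(corr), lambda2 / len(corr)]

# Composite (PC1, PC2) scores for each patient, as plotted in a score plot.
scores = [(sum(a * b for a, b in zip(row, pc1)),
           sum(a * b for a, b in zip(row, pc2))) for row in z]
```

In this sketch `explained[0] + explained[1]` plays the role of the proportion of total variance attributed to PC1 and PC2 together, and `scores` holds the two composite parameters per patient.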

Independent predictors and ROC analysis on hospital admission
The independent predictors were identified from previous studies [5-19]. We reanalyzed whether these published predictors could be validated in the present study. As indicated in Figure 5, NLR, PLR, lymphocyte, CRP, IL-6, LDH, serum ferritin, CD4+ T cell, CD8+ T cell, high-sensitivity cardiac troponin I, albumin and D-dimer were found to be independent predictors of COVID-19 severity. ROC analysis of each single independent variable was calculated using the expression levels [...] (Table 3), and NLR still had the largest AuROC, 0.794, in the validation stage (Figure 5a-j and Figure 8a). Then, we tested different combinations of NLR and one other risk parameter for the prediction of disease (bi-AuROC) (Figure 6). In the discovery study, the combination of NLR and SaO2 showed the [...] It showed that the AuROC of NLR + SaO2 for predicting COVID-19 severity was significantly higher than those of the published models, which demonstrated its excellent predictive power for disease severity, and there was no significant difference between NLR + SaO2 and PCA in the discovery stage (Table 5).
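The single-index ROC analysis can be illustrated with a short Python sketch. It takes NLR as the neutrophil count divided by the lymphocyte count and uses the rank interpretation of the AuROC: the probability that a randomly chosen severe case has a higher index value than a randomly chosen non-severe case, with ties counted as one half. The cell counts, outcome labels and function names below are hypothetical and are not study data.

```python
def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio from two absolute cell counts."""
    return neutrophils / lymphocytes


def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for severe-to-critical, 0 for mild-to-moderate.
    Each (severe, non-severe) pair scores 1 if the severe case has the
    higher value, 0.5 for a tie, and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented (neutrophils, lymphocytes, severe) triples; not study data.
cases = [
    (3.0, 1.5, 0),
    (4.0, 1.6, 0),
    (5.0, 1.0, 0),
    (6.0, 2.0, 0),
    (6.0, 1.0, 1),
    (8.0, 0.8, 1),
    (4.5, 1.0, 1),
    (9.0, 0.6, 1),
]

nlr_values = [nlr(neut, lymph) for neut, lymph, _ in cases]
severe = [label for _, _, label in cases]
auc_nlr = auroc(nlr_values, severe)  # 15 of 16 pairs are ordered correctly
```

The same `auroc` function can be applied to any other index, or to a combined score from two indexes, to compare their discriminative ability on the same patients.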

Discussion
In the present study, NLR was verified to be the strongest independent outcome predictor in patients with COVID-19: it had the largest AuROC, with higher sensitivity, specificity and correct classification ratio (CCR) than the other significant published variables in both the discovery and validation datasets. A single index usually does not adequately predict disease severity, but incorporating too many variables increases the risk of overfitting when building prediction models, as a review in the BMJ described. We therefore further performed a correlation analysis on NLR and the other published indexes. The findings of this study suggest that the combination of NLR and SaO2 has better predictive value for the severity of COVID-19 than any other combination of two indexes. In addition, PCA was identified as an effective tool for predicting the severity of COVID-19 cases. To evaluate the effectiveness of NLR + SaO2 and PCA, we compared them with other predictive models in the same dataset and validated them on an independent dataset. We conclude that NLR + SaO2 is an appropriate and promising method for predicting COVID-19 severity, followed by PCA. Application of these models might help to delay or halt the progression of the disease.
[...] solving the overfitting, interaction and co-collinearity problems of the independent variables, as the review published in the BMJ mentioned. It is a data reduction technique best suited to multivariate dimensionality reduction; however, PCA is unsuitable for clinical application because too many variables are involved. The advantages of NLR + SaO2 are the simplicity of the variable combination, the absence of variable interaction and co-collinearity and, most importantly, clinical application that is easy, quick and of great value.
This retrospective study identified several risk factors for COVID-19 patients. In particular, elevated levels of leukocytes, neutrophils, NLR, CRP, IL-6, lactate dehydrogenase and D-dimer, together with lymphocytopenia, were more commonly seen in severe-to-critical COVID-19 illness. We found that these risk factors were associated with the outcomes of COVID-19, consistent with published studies [18,22]. The pathological mechanism of COVID-19 has not been fully uncovered. Our findings were consistent with those of numerous studies on the relationship between elevated NLR and virus infection [18,23]. An elevated neutrophil count has been observed in patients with severe illness compared with those with non-severe illness [24]. A raised neutrophil count might result from secondary virus-related inflammatory factors, excessive inflammatory stress or glucocorticoid use, and might contribute to exacerbating disease progression in patients with COVID-19. Moreover, a reduction in peripheral lymphocyte count is also commonly observed in COVID-19 cases and is considered a possible critical factor associated with disease severity and mortality [25,26]. Lymphocytes play pivotal roles in the human immune response to viral infection, whereas systemic inflammation significantly depresses cellular immunity. In our study, lymphocytopenia was identified in severe COVID-19 cases.
One possible reason is that lymphocytes are depleted as the virus is engulfed; another is that lymphocytes could be directly infected and destroyed by SARS-CoV-2, because the coronavirus receptor, angiotensin-converting enzyme 2 (ACE2), is widely available in lymphocytes [27]. A low oxygen saturation (SaO2) is one of the main criteria for the definition of a severe case. SaO2 has been reported as another candidate marker of progressive severity [28], which was also validated in the present study. We comprehensively investigated the relationship between risk factors and the severity of COVID-19 and are the first to conclude that integrating NLR with SaO2 may improve prediction. The applicable thresholds for NLR + SaO2 were determined using the AuROC. The optimal threshold of 0.532 for NLR + SaO2 showed superior ability to separate severe-to-critical cases from mild-to-moderate patients, with the highest sensitivity and specificity and the largest AuROC. For instance, when NLR = 7 and SaO2 = 92, 38% of the COVID-19 patients were predicted to be severe-to-critical patients. These patients must therefore be closely monitored by clinicians.
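How a cut-off on a combined NLR + SaO2 score translates into sensitivity, specificity and CCR can be sketched as follows. The sketch assumes, for illustration only, that the two indexes are combined through a logistic model; the text does not give the model form or its coefficients, so the intercept and coefficients below are invented and will not reproduce the 38% example above. Only the cut-off of 0.532 comes from the text. The patient values and function names are hypothetical.

```python
import math

# Hypothetical logistic coefficients, chosen only so that the example runs.
INTERCEPT = 30.0
BETA_NLR = 0.2      # higher NLR raises the predicted probability
BETA_SAO2 = -0.33   # higher SaO2 lowers the predicted probability
CUTOFF = 0.532      # cut-off value reported in the text


def severe_probability(nlr, sao2):
    """Predicted probability of severe illness under the assumed logistic model."""
    linear = INTERCEPT + BETA_NLR * nlr + BETA_SAO2 * sao2
    return 1.0 / (1.0 + math.exp(-linear))


def evaluate(patients, cutoff=CUTOFF):
    """Return (sensitivity, specificity, correct classification ratio)."""
    tp = fp = tn = fn = 0
    for nlr, sao2, severe in patients:
        predicted_severe = severe_probability(nlr, sao2) >= cutoff
        if predicted_severe and severe:
            tp += 1
        elif predicted_severe and not severe:
            fp += 1
        elif not predicted_severe and severe:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ccr = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, ccr


# Invented (NLR, SaO2, severe) triples; not study data.
patients = [
    (2.0, 98.0, 0),
    (3.0, 97.0, 0),
    (4.0, 96.0, 0),
    (8.0, 97.0, 0),
    (5.0, 90.0, 1),
    (9.0, 91.0, 1),
    (12.0, 88.0, 1),
    (3.5, 96.0, 1),
]

sensitivity, specificity, ccr = evaluate(patients)
```

With these invented values, three of the four severe cases are flagged and no mild-to-moderate case is misclassified, which gives a sensitivity of 0.75, a specificity of 1.0 and a CCR of 0.875.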
To gain a comprehensive view of the selected parameters, a correlation network analysis was conducted, which identified modest to high correlations between variables. This points to a limitation of the statistical methods traditionally used for prediction models. PCA is a multivariate technique for examining the internal structure of complicated datasets and exploring the interrelations among variables. It is used to emphasize variation and reveal strong patterns in a large dataset, and it allows the identification of uncorrelated eigenvectors to describe large multidimensional datasets. PCA may be valuable in studies of COVID-19, and we are the first to identify PCA as an effective tool with which to predict the severity of COVID-19. Application of this model might help to separate severe-to-critical cases from mild-to-moderate patients. The standardized procedure of PCA in the evaluation of the severity of COVID-19 included: (1) computing the composite scores for the combined effects that accurately capture pathological expression; and (2) identifying which variables have the most variation in COVID-19.
Subsequent AuROC comparisons were conducted among different predictive models, including NLR + SaO2, PCA and published models. They showed that NLR + SaO2 and PCA could better distinguish mild-to-moderate cases from severe-to-critical patients.
Our study has some limitations. First, because this study was retrospective, other markers such as IL-1Ra, IL-10, IFN-γ-induced protein 10 (IP-10) and monocyte chemotactic protein-3 (MCP-3) were not included owing to data unavailability. Therefore, their influence on outcomes might be underestimated. Second, severe-to-critical cases made up a large percentage of the discovery stage, whereas mild-to-moderate patients predominated in the validation stage, which might introduce selection bias. Finally, large-scale multicenter clinical studies are needed for further investigation.

Conclusions
The present study showed that NLR + SaO2 is an appropriate and promising method for predicting COVID-19 severity, followed by PCA. We compared them with other predictive models in the same dataset, and validated them on an independent dataset.

Consent for publication
Not applicable.

Availability of Data and Materials:
The datasets used during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors have no conflict of interest to declare.

Dynamic changes or expression levels of laboratory parameters in COVID-19 patients on hospital admission (HA) and at the end of hospitalization (EH). Median (P25-P75) values at the two time points are shown for mild-to-moderate cases and severe-to-critically ill patients in the discovery stage (a-r). In the validation stage (data were lacking at the end of hospitalization), admission levels of the variables were compared between the two groups (a-q). Differences between mild-to-moderate cases and severe-to-critically ill patients were compared using the Mann-Whitney U test, χ² test or Fisher's exact test, as appropriate. †p < 0.05, ‡p < 0.01.

Receiver operating characteristic (ROC) curves of published indexes from the literature for the severity prediction of COVID-19. ROC curves of NLR, PLR, lymphocyte, CRP, IL-6, LDH, serum ferritin, high-sensitivity cardiac troponin I, albumin, D-dimer, CD4+ T cell and CD8+ T cell for adverse outcome prediction.