Scores based on neutrophil percentage and lactate dehydrogenase with or without oxygen saturation predict the risk of hospital mortality in severe COVID-19 patients

Background. Risk scores are urgently needed to assist clinicians in predicting the risk of death in severe patients with SARS-CoV-2 infection in the context of millions of people infected, rapid disease progression, and shortage of medical resources. Method. A total of 139 severe patients with SARS-CoV-2 from China and Iran were included. Using data from China (training dataset, n = 96), prediction models were developed based on logistic regression, with nomograms and risk scoring systems derived for simplification. Leave-one-out cross-validation was used for internal validation and data from Iran (test dataset, n = 43) for external validation. Results. The NSL model (area under the curve (AUC) 0.932) and NL model (AUC 0.903) were developed based on neutrophil percentage (NE) and lactate dehydrogenase (LDH), with or without oxygen saturation (SaO2), using the training dataset. Compared with the training dataset, the predictability of the NSL model (AUC 0.910) and NL model (AUC 0.871) was similar in the test dataset. The risk scoring systems corresponding to these two models were established for clinical application. The AUCs of the NSL and NL scores were 0.928 and 0.901 in the training dataset, respectively. At the optimal cut-off value of the NSL score, the sensitivity was 94% and specificity was 82%. In addition, for the NL score, the sensitivity and specificity were 94% and 75%, respectively. Conclusion. The NSL and NL scores are straightforward means for clinicians to predict the risk of death in severe patients. The NL score could be used in selected regions where patients' SaO2 cannot be tested.
LDH, total bilirubin (Tbil), direct bilirubin (Dbil), ALT, AST, total protein, albumin (ALB), activated partial thromboplastin time (APTT), prothrombin time (PT), D-dimer, CRP, blood urea nitrogen (BUN), serum creatinine (Cr), creatinine clearance (CCr), blood glucose, creatine kinase isoenzymes (CKMB), high density lipoprotein (HDL), low density lipoprotein (LDL), total cholesterol (TC), triglyceride (TG), Lipoprotein, Apolipoprotein A (ApoA), Apolipoprotein B (ApoB), serum potassium (K), and serum sodium (Na)). HDL, LDL, TC, TG, Lipoprotein, ApoA, ApoB, HGB, and HCT were not collected in the Iranian population. Information about treatment during hospitalization (antiviral therapy, antibacterial therapy, corticosteroids, and immunoglobulin therapy) and outcome (in-hospital death) were also collected.


Introduction
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), is a highly contagious and fast-spreading infectious disease. It emerged only four months ago but has already spread to many countries, with millions of people infected [1].
The clinical spectrum of COVID-19 ranges from mild to critically ill cases according to the largest cohort study (44,672 persons with COVID-19) from China [2]. This disease can progress rapidly into acute respiratory distress syndrome (ARDS), multiorgan failure, and even death during the later stages in some severe cases [2][3][4][5][6]. Clinicians should be aware that some serious patients may deteriorate rapidly after admission.
Since the outbreak of COVID-19, researchers and clinicians have been acting quickly, but perhaps not fast enough compared with the rate at which the disease is spreading. Previous studies have identified that lymphopenia, neutrophilia, and elevated serum alanine aminotransferase (ALT), aspartate aminotransferase (AST), lactate dehydrogenase (LDH), D-dimer and C-reactive protein (CRP) levels may be associated with disease progression and death [3-5, 7, 8]. However, there is no easy-to-use risk-scoring system for the risk of death in severe patients. Clinicians urgently need a convenient clinical risk assessment tool to assist them in predicting the risk of hospital mortality, in order to select the time and method of medical intervention and to evaluate the effectiveness of treatment strategies.
Therefore, in the current study we utilized data from severe patients with confirmed COVID-19 who were admitted to hospitals in China and Iran to establish straightforward and user-friendly prediction models for clinicians to predict the risk of in-hospital death in severe patients with COVID-19.

Patient population
This multicentric retrospective observational study was based on two datasets of severe patients with confirmed SARS-CoV-2 infection selected by the same criteria [9] from 2 medical centers: the West Branch of Union Hospital affiliated to Tongji Medical College of Huazhong University of Science and Technology in China, and Tabriz University of Medical Sciences in Iran. The patients' data from China were used as the training dataset to establish models for predicting the risk of hospital mortality, whereas the patients' data from Iran were used for external validation of the prediction models. As shown in Figure 1, all severe patients with confirmed SARS-CoV-2 infection in the training and test datasets were included if they were adults. Pregnant patients and patients with human immunodeficiency virus infection were excluded. This study was approved by the Ethics Committees of all participating hospitals in China and Iran.

Statistical analysis
Continuous variables are reported as means±standard error (SE). The unpaired t-test or the Mann-Whitney test was used to compare two groups of data. Categorical variables are expressed as counts and percentages; Chi-square or Fisher's exact tests were used for comparisons of categorical factors. Feature selection was performed using the information gain method to select suitable variables for the prognostic model. Information gain was calculated by comparing the entropy of the data before and after transformation [10]. Variables with an information gain > 0.2 were selected for modeling. The death risk models were established using multivariable logistic regression on the training dataset. The predictive accuracy for hospital mortality of severe patients was calculated using receiver operating characteristic (ROC) curves. When the sensitivity, specificity and area under the curve (AUC) were broadly similar between different models, we selected models for further analysis on the premise of minimizing the number of factors included in the model. Validity assessment of the predictive models was conducted using internal and external validation. We used the leave-one-out cross-validation method for internal validation to limit model overfitting and to assess predictive potential [11]. In external validation, models developed in the training dataset were applied to the test dataset to assess their predictive performance. We used calibration plots to show the goodness-of-fit of the models and plotted nomograms to facilitate the clinical application of both models. In order to simplify the computation of the in-hospital death risk estimate, we developed risk scores based on the points system from the Framingham Heart Study methodology [12]. All statistical analyses were performed using STATA (Version 13.0, IBM, New York, USA) and Orange (Version 3.24.1, USA).
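The information gain criterion described above can be illustrated with a minimal sketch. It assumes a continuous variable is binarised at a single threshold before the entropy comparison; the threshold, the function names, and the toy values are illustrative assumptions, not the discretisation Orange applied or data from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy of the outcome before the split minus the weighted entropy after
    splitting a continuous variable at `threshold` (an assumed binarisation)."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    after = (len(low) / n) * entropy(low) + (len(high) / n) * entropy(high)
    return entropy(labels) - after


# Made-up illustrative values, NOT patient data: 1 = died, 0 = survived.
ldh = [210, 250, 300, 480, 520, 610]
died = [0, 0, 0, 1, 1, 1]

gain = information_gain(ldh, died, threshold=400)
selected = gain > 0.2  # selection criterion stated in the text
```

In this toy case the threshold separates the two outcomes perfectly, so the gain equals the full outcome entropy of 1 bit and the variable passes the 0.2 criterion.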

Results
Characteristics of the study population
There were 96 patients from China in the training dataset and 43 patients from Iran in the test dataset.
The mean ages of patients in the training and test datasets were 63.47 and 63.37 years, respectively. The patients in the two datasets differed in several characteristics at the time of admission (Table 1). There were 49 (51%) male patients in the training dataset and 30 (69.8%) male patients in the test dataset (P=0.039). There were more patients with fever (89.6% versus 46.5%), fatigue (89.6% versus 42.2%) and diarrhea (20.8% versus 2.3%) in the training dataset than in the test dataset. In addition, patients in the training dataset had faster respiratory rates (27.24 versus 22.76) than those in the test dataset. The proportion of deaths in the two datasets (32.3% versus 30.2%) was roughly the same.
Feature selection
Figure 2 shows the results of the information gain ranking. The top 8 of the 60 available variables (LDH, NE, SaO2, LY, NLR, CKMB, D-dimer, and CRP) were selected for modeling according to the criterion (information gain > 0.2). As shown in Supplementary Figure 1A, LDH, NE, SaO2, NLR, CKMB, D-dimer, and CRP were significantly higher and LY was lower in the severe patients who died during hospitalization compared to patients who did not die.
Derivation and validation of NSL model and NL model
When used individually to predict the risk of death, the AUCs of the top 8 ranked variables ranged from 0.763 to 0.880, sensitivities ranged from 73% to 100%, and specificities ranged from 51% to 88% (Table 2). Each of these indicators had a good prediction ability for the risk of death, but there were exceptions, such as patients with normal indicators who nevertheless died during hospitalization. Integrated prediction models were therefore needed to reduce the shortcomings of a single indicator in predicting death risk.
In the modeling, we tried to use as few variables as possible to facilitate clinical application. Because NE and LY had a reciprocal relationship and the integrated models were based on the logistic regression method, we established three model groups depending on whether NE, LY, or the neutrophil/lymphocyte ratio (NLR) was added to the model. The AUCs of all integrated models ranged from 0.903 to 0.948, sensitivities ranged from 77% to 97%, and specificities ranged from 77% to 97% (Table 2). Compared with the training dataset, the NSL model (AUC 0.910; sensitivity 92% and specificity 96%) and NL model (AUC 0.871; sensitivity 92% and specificity 82%) provided similarly accurate predictability of in-hospital death in the test dataset (Table 2 and Supplementary Figure 1C).
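The AUC, sensitivity, and specificity figures reported above can be computed from predicted risks and observed outcomes along the following lines. This is a hedged sketch using only the standard library: the function names and toy values are illustrative, and choosing the cut-off by Youden's J is an assumption, since the text does not state which criterion defined the optimal cut-off.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen death has a higher score
    than a randomly chosen survivor (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, cutoff):
    """Treat score >= cutoff as a predicted death and compare with outcomes."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1
    (an assumed criterion; the lowest such cut-off wins ties)."""
    def youden(cutoff):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


# Made-up predicted risks, NOT study data: 1 = died, 0 = survived.
risk = [0.05, 0.10, 0.20, 0.40, 0.35, 0.70, 0.80, 0.90]
died = [0, 0, 0, 0, 1, 1, 1, 1]

model_auc = auc(risk, died)
cutoff = best_cutoff(risk, died)
sens, spec = sensitivity_specificity(risk, died, cutoff)
```

For external validation, the same functions would be applied to the risks that the training-derived model assigns to the test dataset, without refitting.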

Nomogram prediction for in-hospital death of severe patients
In order for clinicians to easily calculate the risk of mortality using the NSL model or NL model, we created two nomograms to provide graphical depictions of all indicators in the NSL model and NL model, respectively (Figure 3A,B). In both the training and test datasets, the calibration plots of nomograms were consistent between the predicted risk and the observed probability of death (Figure 3C-F). The Hosmer-Lemeshow tests for NSL model and NL model were not significant (P=0.47 and P=0.45), suggesting the NSL model and NL model were correctly specified for the prediction of in-hospital death from COVID-19.

Development of risk scoring system for predicting in-hospital death
In addition to providing a nomogram to help clinicians predict the mortality risk of severe patients, we also developed two risk scoring systems based on the NSL model and NL model. As shown in Table 3, simple point systems were developed based on the logistic regression coefficients (Supplementary Table 1) and reference values for each significant risk factor (Table 3). The NSL risk score included NE (16 points), SaO2 (9 points), and LDH (9 points). The total points ranged from 0 to 34. As the total points increased, the risk of death increased. Points of 0-13 were associated with a less than 10% risk of death and points of 14-20 with a 10-50% risk of death. Finally, points above 20 were associated with an extremely high risk of death of over 50%. The cut-off of the NSL risk score for the prediction of death in the training dataset was 15 (sensitivity 94% and specificity 82%, Supplementary Table 2). The AUCs of the NSL risk score were 0.928 and 0.901 in the training and test datasets, respectively. In addition, the NL risk score included NE (16 points) and LDH (9 points). The score ranged from 0 to 25. The AUCs of the NL risk score were 0.895 and 0.857 in the training and test datasets, respectively. Points of 0-9 were associated with a less than 10% risk of death, points of 10-15 with a 10-50% risk of death, and points above 16 were associated with an extremely high risk of death of over 50%. The cut-off of the NL risk score for the prediction of death in the training dataset was 12 (sensitivity 94% and specificity 75%, Supplementary Table 2). In clinical practice, clinicians can calculate the risk scores of each patient at admission based on the points provided in Table 3 and Table 4.
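The mapping from total points to risk bands described above can be sketched as a small lookup. The per-variable points must still be read from Table 3, which is not reproduced here, so the functions take them as inputs. The function names are illustrative, and treating an NL total of exactly 16 as belonging to the highest band is an assumption, because the text specifies 10-15 and "above 16".

```python
def nsl_total(ne_points, sao2_points, ldh_points):
    """Sum the per-variable points (looked up in Table 3) into the NSL total."""
    if not (0 <= ne_points <= 16 and 0 <= sao2_points <= 9 and 0 <= ldh_points <= 9):
        raise ValueError("points outside the ranges stated in the text")
    return ne_points + sao2_points + ldh_points


def nsl_risk_band(total):
    """Risk band for an NSL total of 0-34, using the bands given in the text."""
    if not 0 <= total <= 34:
        raise ValueError("NSL total must be between 0 and 34")
    if total <= 13:
        return "<10%"
    if total <= 20:
        return "10-50%"
    return ">50%"


def nl_risk_band(total):
    """Risk band for an NL total of 0-25. A total of exactly 16 is placed in
    the top band here, which is an assumption (the text says 'above 16')."""
    if not 0 <= total <= 25:
        raise ValueError("NL total must be between 0 and 25")
    if total <= 9:
        return "<10%"
    if total <= 15:
        return "10-50%"
    return ">50%"


# Hypothetical point values for illustration only, not taken from Table 3.
example_total = nsl_total(ne_points=10, sao2_points=3, ldh_points=4)
example_band = nsl_risk_band(example_total)
```

A lookup of this kind reproduces only the coarse bands; the nomogram or the underlying logistic model would be needed for a patient-specific probability.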

Discussion
To our knowledge, this is the first study to develop in-hospital death risk scoring systems in severe patients with COVID-19 from China and Iran. The NSL score and NL score described in this study are easy to understand and to use. These two risk scores make it easy for clinicians to predict the risk of death in severe patients and avoid the influence of personal bias in the course of evaluation. In some regions where medical resources are scarce, the NL score enables medical staff to predict the risk of death of severe patients with only NE and LDH at the time of admission, which will greatly improve the efficiency of medical resource allocation. The NSL score and NL score were developed in a dataset of Chinese patients and validated in another dataset of Iranian patients. There were several differences in the clinical characteristics of the severe patients in the training and test datasets, but this supports the reliability of our risk scores, which provided similar predictability across different patient populations.
Lymphopenia, neutrophilia, LDH, D-dimer and CRP may be related to the progression of the disease according to previous studies [3-5, 7, 8]. Among these factors, elevated D-dimer and lymphopenia have been reported to be associated with death [3,4,7]. An SaO2 below 93% (normal range is 95% to 100%) has long been considered a sign of underlying hypoxia and impending organ failure [13,14]. For COVID-19, SaO2 is also a good indicator of disease progression [15], which was confirmed in our study. A previous study found that a higher SOFA score, older age, and D-dimer greater than 1 μg/mL at admission were associated with an increased risk of death, which could help medical staff to assess the prognosis of patients [3]. In addition, Ji et al. established a risk score (CALL) based on patients' age, lymphocyte count, serum LDH levels and comorbidities at admission, which could help medical staff to identify patients with a high risk of disease progression [5]. Apart from the CALL risk score, which quantitatively predicts the risk of disease progression, clinicians lack a relevant scoring system to quantitatively predict the risk of death in severe patients. This may lead to an underestimation of the risk of death in some severe patients, resulting in delays in treatment and unnecessary mortality.
Here, in the establishment of the predictive models, we utilized the feature selection method of machine learning and also considered the needs of clinicians. We established two risk scores (NSL score and NL score) based only on NE and LDH concentration, with and without SaO2, at admission. An NSL score ≤ 11 is associated with a risk of death of less than 5%, whereas an NSL score > 15, and particularly > 20, indicated an increased risk of death, requiring urgent supportive and symptomatic treatment and careful surveillance for these patients. In particular, the cut-off point of 20 in the NSL score offered 71% sensitivity and 94% specificity for death risk prediction in the training dataset and 92% sensitivity and 82% specificity in the test dataset. For some regions without appropriate access to SaO2 testing, the NL score can also be used to predict the risk of death with high prediction accuracy. An NL score ≤ 8 is associated with a risk of death of less than 5%, whereas an NL score > 9 and an NL score > 14 indicated that the risk of death exceeded 10% and 40%, respectively.
Our study has a few limitations. Firstly, the machines and methods used in China and Iran to detect serum LDH concentrations are different, so the normal range of LDH concentrations is slightly different. In China, LABOSPECT 008 α Hitachi Automatic Analyzer (Hitachi High-Technologies Corporation, Japan) was used to detect serum LDH concentrations, while in Iran, LDH Cytotoxicity Detection Kit (Roche, Germany) was applied to detect serum LDH. Secondly, the sample size is relatively small, especially the test data from Iran. Finally, due to the limitations of data, we did not analyze the effects of different medical interventions on prognosis.
In summary, the NSL score and NL score are straightforward means for clinicians to predict the risk of death in severe patients and avoid the influence of human factors in the course of evaluation.

Table 3. Algorithm to estimate risk for hospital mortality using total points for risk scores with logistic regression analysis in the severe patients with COVID-19 from training dataset.

Figure 3
Nomograms for integrated models to predict hospital mortality and the corresponding calibration plots. Nomograms of the NSL model (A) and NL model (B) to estimate the risk of death in severe patients with