Value of laboratory tests in COVID-19 hospitalized patients for clinical decision-makers: a predictive model, using data mining approach

Purpose: Because of the rapid increase in confirmed cases of COVID-19, in particular those with severe or critical status, the overwhelming of health systems is a worldwide concern. Identifying high-risk COVID-19 patients can therefore help service providers with priority setting and hospital resource allocation.
Methods: This retrospective cohort study included 4542 adult patients with confirmed COVID-19 admitted to 15 hospitals in Tehran, Iran, from Feb 20 to April 18, 2020, with final outcomes of survival or death. Demographic features (age and sex) and laboratory data measured at admission were extracted and compared between recovered and deceased patients. Data analysis was performed in SPSS Modeler software using a logistic regression method.
Results: Of 4542 hospitalized adult patients, 822 (18.09%) died during hospitalization, and 3720 (81.90%) recovered and were discharged. Based on the logistic regression model, older age, 40-49 (RR=1.80, CI: 1.13-2.87), 50-59 (RR=2.63, CI: 1.71-4.02), 60-69 (RR=4.40, CI: 2.92-6.63), 70-79 (RR=7.49, CI: 5.01-11.19), above 80 (RR=13.85, CI: 9.23-2.77), ALT ≥ 55 IU/L (RR=2.20, CI: 1.69-2.86), AST ≥ 100 IU/L (RR=5.93, CI: 4.75-7.39), ALP ≥ 200 IU/L (RR=2.46, CI: 1.80-3.37), sodium < 135 mEq/l (RR=1.69, CI: 1.35-2.11) or more than 145 mEq/l (RR=7.24, CI: 5.07-10.33), potassium > 5.50 mEq/l (RR=7.53, CI: 4.15-13.64), calcium < 8.50 mEq/l (RR=3.39, CI: 2.81-4.09), CPK between 307-600 IU/L (RR=2.73, CI: 2.12-3.53) and above 600 IU/L (RR=4.41, CI: 3.40-5.71) in men, and 192-400 IU/L (RR=2.73, CI: 2.12-3.53) and above 400 IU/L (RR=4.41, CI: 3.40-5.71) in women, CRP > 3 mg/l (RR=3.22, CI: 1.99-5.20), and creatinine > 1.5 mg/l (RR=6.37, CI: 5.30-7.66) were significantly associated with COVID-19 mortality.
Conclusion: Our findings suggest that fewer than one in five hospitalized patients with COVID-19 die, mostly due to electrolyte imbalance and liver and renal dysfunction.
Better supportive care is needed to improve outcomes for patients with COVID-19.


Introduction
In December 2019, an outbreak of Coronavirus disease 2019 (COVID-19) began in Wuhan, China, and has continued to spread globally [17]. COVID-19 can cause fever, cough, fatigue, shortness of breath, and high mortality due to severe respiratory symptoms [5].
The first official announcement of deaths caused by COVID-19 in Iran was made on Feb 19, 2020 [30]. As of July 15, 2020, 264,561 Iranians have been infected with COVID-19, of whom 13,410 are deceased. Due to the rapidly growing number of confirmed cases of COVID-19, there is a worldwide concern that overwhelmed health systems may face shortages of hospital beds, ICU beds, and ventilators [4], as well as burnout and fatigue [24] of healthcare professionals. Lack of resources in the health sector can lead to discrimination in the optimal distribution of medical services during pandemic diseases.
Prediction of disease severity at the time of a pandemic outbreak is one of the critical issues that influence physicians' decisions [34]. Previous experience with other outbreaks such as MERS and pandemic influenza has shown that identifying high-risk patients for hospitalization can help healthcare providers and emergency staff in finding patients who would benefit the most from early, available treatments. In addition, policymakers can use this information to forecast healthcare needs [4].
There is a dearth of literature on assessing lab tests as risk factors associated with COVID-19 prognosis.
Although elevated levels of alanine aminotransferase (ALT), aspartate aminotransferase (AST), C-reactive protein (CRP), lactate dehydrogenase, and d-dimer, and low serum concentrations of albumin, sodium, potassium and calcium have been reported to be associated with disease severity, unfavorable prognosis and mortality rate, most of these articles were limited to a specific population or a small sample size. Moreover, there are still other potential explanatory variables that require further assessment [28,33,37].
To achieve a robust estimation of related risk factors, having masses of data from different countries with different health-system settings is crucial.
In this study, we analyzed the available clinical and laboratory data of 4542 patients with confirmed COVID-19, obtained from healthcare records of multiple hospitals in Tehran, Iran. Machine learning approaches, which learn patterns from data through modeling, can improve predictive models, for example for the prediction of disease risk factors and mortality rate [21].

Study design
The present research is a retrospective cohort study of 4542 hospitalized patients with severe symptoms of COVID-19 in 15 hospitals in Tehran, Iran. These patients were later divided into two groups: recovered and discharged patients, and deceased patients. The predictors were the demographic characteristics of sex and age, along with different laboratory tests. Oxygen therapy and mechanical ventilation were set as intervention variables. All data analyses were performed using IBM SPSS Modeler 18.0 software.
Before the primary analysis, SPSS Modeler prepared the data by replacing null values in continuous fields with the field mean, followed by z-transformation re-scaling. This automated preparation, which relies on machine learning and artificial intelligence techniques, increases the performance and accuracy of the model [14].
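The preparation step described above can be illustrated with a minimal sketch. It is not the SPSS Modeler implementation: the function name `impute_and_standardize`, the example AST values, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def impute_and_standardize(values):
    """Replace missing values (None) with the mean of the observed values,
    then re-scale the field to z-scores (mean 0, standard deviation 1)."""
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    filled = [mu if v is None else v for v in values]
    sigma = pstdev(filled)  # assumption: population SD; a tool may use sample SD
    if sigma == 0:
        return [0.0 for _ in filled]  # constant field carries no information
    centre = mean(filled)
    return [(v - centre) / sigma for v in filled]

# Hypothetical AST values (IU/L) with one missing measurement.
ast_values = [30.0, None, 120.0, 45.0]
z_scores = impute_and_standardize(ast_values)
```

Because the missing entry is filled with the field mean, its z-score is zero: the imputed patient is treated as exactly average on that test.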
The data were then separated into training (80%) and testing (20%) sets, which provided an infrastructure for the next steps. Because the model has a large number of predictors, using separate training and testing datasets helps to track overfitting, which is assessed using the fitness criteria of the testing dataset. Additionally, before the principal analysis, feature selection (variable selection) was performed to choose the most relevant model predictors [21]. Next, multivariable and univariable logistic regressions were executed to identify the risk factors of COVID-19. In the final stage of the analysis, a decision model was developed to identify high-risk patients.
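A minimal sketch of an 80/20 partition is given below. The function `split_train_test`, the fixed seed, and the use of integer patient indices are assumptions for illustration; the partitioning procedure used inside SPSS Modeler may differ.

```python
import random

def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle the records reproducibly and hold out a fraction for testing."""
    rng = random.Random(seed)   # fixed seed so the split can be reproduced
    shuffled = list(records)    # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Patient indices stand in for the 4542 records of the cohort.
patient_ids = list(range(4542))
train_set, test_set = split_train_test(patient_ids)
```

A model is fitted on `train_set` only; a large gap between training and testing performance would indicate overfitting.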
This study was approved in the ethics committee of the Shahid Beheshti University of Medical Sciences (IR.SBMU.RETECH.REC.1399.007).

Data collection
Data were collected from the health records of 4542 hospitalized adult patients (above 18 years old) with severe symptoms of COVID-19 in 15 hospitals in Tehran, Iran. Patients were hospitalized in the period from 20 February 2020 to 18 April 2020. Their data were routinely registered in the Hospital Information System (HIS), and centralized access to them is possible through the Hospital Intelligent Management system (HIM). Patients' clinical features included fever (above 38 °C), shortness of breath, hypoxia, chest pain, severe cough, and loss of consciousness. Patients were divided into two groups: recovered and discharged, and deceased. The definitive diagnosis of COVID-19 was made according to the Iranian Ministry of Health protocol using throat-swab specimens and the RT-PCR test. Available data included age, sex, inpatient wards, supportive interventions of oxygen therapy and mechanical ventilation, duration of hospitalization, and on-admission laboratory test results. These laboratory tests were: 2) Liver function enzymes including ALT, AST, alkaline phosphatase (ALP).

Statistics
Three main steps were taken to perform data analysis. These steps were as follows:
STEP 1) A Multivariable Logistic Regression (MLR) for calculating the odds ratio (OR) of risk factors
A Multivariable Logistic Regression (MLR) was performed to predict any relationship between predictors and the survival outcome (recovered or deceased).
STEP 2) A univariable logistic regression for calculating the relative risk (RR) of the risk factors
Predictors with a significant association (p-value < .05) with the outcome in the MLR were selected and transformed into a set of appropriate categorical variables. This step was done to identify the relationship between different levels (normal or abnormal) of laboratory tests and the mortality rate due to COVID-19, and to calculate the RR by univariable logistic regression. Accordingly, some of the laboratory tests were divided into two or three categories (normal range, above, and below the normal range), and some were divided into more than three categories because of their extreme outlier values.
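To make the notion of relative risk for a categorized laboratory test concrete, the sketch below computes an RR and its 95% confidence interval directly from a 2x2 table using the standard log-scale (Katz) interval. This is a swapped-in technique for illustration, not the univariable logistic regression used in this study, and the counts are hypothetical.

```python
import math

def relative_risk(exposed_deaths, exposed_total,
                  unexposed_deaths, unexposed_total, z=1.96):
    """Relative risk of death for an abnormal versus a normal test level,
    with a confidence interval computed on the log scale."""
    risk_exposed = exposed_deaths / exposed_total
    risk_unexposed = unexposed_deaths / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / exposed_deaths - 1 / exposed_total
                          + 1 / unexposed_deaths - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 40 deaths among 100 patients with an abnormal level,
# 20 deaths among 100 patients with a normal level.
rr, ci_lower, ci_upper = relative_risk(40, 100, 20, 100)
```

An interval that lies entirely above 1 indicates that the abnormal level is associated with a higher risk of death at the chosen confidence level.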
STEP 3) Identifying high-risk patients using a decision tree model
The predictors were entered into the Chi-square automatic interaction detector (CHAID) model to develop a decision tree model and identify the high-risk and low-risk patients.
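The core idea of a CHAID split can be sketched as follows: for each candidate predictor, a chi-square test of independence against the outcome is computed, and the predictor with the smallest p-value is chosen. The sketch is restricted to binary predictors (one degree of freedom) and omits the category merging and Bonferroni adjustment of a full CHAID implementation; the predictor names and counts are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a
    2x2 table given as [[died, survived], [died, survived]], one row per level."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts; first row is the abnormal level, second the normal level.
candidates = {
    "creatinine_high": [[60, 40], [40, 160]],
    "sodium_abnormal": [[30, 70], [70, 130]],
}
best_split = min(candidates, key=lambda name: chi_square_2x2(candidates[name])[1])
```

The same test is then repeated within each resulting subgroup, which is how a tree of progressively more specific risk groups is grown.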

Descriptive Statistics
From 20 February to 18 April 2020, 4791 adult patients with a definitive diagnosis of COVID-19 were admitted to 15 university hospitals in Tehran, Iran. Of these, 249 patients were excluded because of missing data.
Of the 4542 hospitalized patients, who were between 18 and 97 years of age and had severe symptoms, 822 (18.09%) died during hospitalization, and 3720 (81.90%) patients recovered and were discharged. The mean±SD and median [interquartile range] age of patients were, respectively, 55.55±16.84 and 56.00 [27.00] for the survived group, and 68.72±14.93 and 71.00 [21.25] for the deceased. 2473 (54.44%) patients were male. The demographic characteristics of the patients are displayed in Table 1.
The recruited patients had severe symptoms such as fever above 38 °C, severe cough, shortness of breath, respiratory rate > 30 breaths/minute, hypoxia (PO2 < 93%), chest pain, and loss of consciousness. The criteria for discharge from the hospital were 72 hours without fever and without need for an antipyretic drug, PO2 > 93%, and improvement in clinical and respiratory symptoms.
The mean±SD of hospitalization days for the survived and deceased groups were 4.76±4.77 and 6.45±7.21 days, respectively (Table 1). The treatment program was identical in all hospitals and followed the COVID-19 treatment guidelines of Iran's Ministry of Health, unless there were concurrent diseases requiring specific treatments.
Table 1: Demographic and hospitalization characteristics of the COVID-19 patients
As mentioned in the method section, before entering all the independent variables in the MLR model, a feature selection algorithm was used to define the most important predictors associated with the target variable. According to the feature selection results, apart from RBC, platelets, hemoglobin, hematocrit, PDW, and PCO2, all laboratory tests, age, gender, and the need for special care such as oxygen therapy and ventilation for survived patients entered the model as potential variables correlated with disease prognosis (Table 2). In this step, SPSS Modeler ignored predictors with many missing values, so that the proportion of missing values in the datasets fell below 15%. The accuracy and AUC index of the testing dataset, used as fitness criteria, were 85.87% and 85.00, respectively. The model was therefore fitted well, without overfitting problems [14].
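The two fitness criteria mentioned above can be computed from predicted probabilities as in the following minimal sketch. The 0.5 classification threshold, the function names, and the toy labels and probabilities are assumptions for illustration and are not taken from the study data.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of patients whose thresholded prediction matches the outcome."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(pred == truth for pred, truth in zip(predictions, y_true))
    return correct / len(y_true)

def auc(y_true, y_prob):
    """Area under the ROC curve: the probability that a randomly chosen
    deceased patient receives a higher score than a randomly chosen survivor
    (ties count as one half)."""
    positives = [p for p, t in zip(y_prob, y_true) if t == 1]
    negatives = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = deceased, 0 = survived.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.1]
test_accuracy = accuracy(y_true, y_prob)
test_auc = auc(y_true, y_prob)
```

Unlike accuracy, the AUC does not depend on a classification threshold, which makes it useful when the outcome classes are unbalanced, as they are in this cohort.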
As shown in Table 3    , and very high risk (more than 50% MR). These risk factors were separately defined for each age group (Figure 1).
In patients under 40 years of age (4.64% MR), with normal creatinine, AST ≥ 100 IU/L together with CPK ≥ 307 IU/L in male patients and CPK ≥ 192 IU/L in female patients was associated with 41.67% MR, so this group is considered high-risk. Also in this age group, elevated creatinine (≥ 1.5 mg/l) together with an abnormal sodium level was associated with 75.00% MR.
In the age group 40-59 (11.84% MR), the following high-risk groups were seen: patients with creatinine ≥ 1.5 mg/l (41.52% MR), patients with hypocalcemia and AST ≥ 100 IU/L (56.08% MR), patients with creatinine ≥ 1.5 mg/l and hypocalcemia (46.67% MR), and patients with creatinine ≥ 1.5 mg/l and AST ≥ 40 IU/L (59.55% MR). These patients are hence regarded as high-risk COVID-19 patients.
In the age group 60-69 (18.69% MR), elevated AST ≥ 100 IU/L with CPK ≥ 307 IU/L in male patients and CPK ≥ 192 IU/L in female patients were associated with 67.57% MR.
In the age group 70-79 (31.47% MR), six groups of high-risk patients have been seen: Male patients with CPK ≥ 307 IU/L and female patients with CPK ≥ 192 IU/L have 5.00% MR.

Discussion
This study is the first attempt to investigate the risk factors of COVID-19 in 4542 adult patients in Iran.
Using data mining methods, we identified the relationship between a large number of laboratory tests and the risk factors without any overfitting problems. According to the results of the MLR and ULR analyses, AST, ALP, ALT, calcium, sodium, potassium, creatinine, CPK, and CRP were correlated with the risk of death in COVID-19 patients. Furthermore, high-risk patients were identified in the decision tree model.

Age
In this study, we found a strong relationship between age and COVID-19 mortality rate. The results of the present study confirm previous studies showing that older age is an important risk factor of death in patients with COVID-19 [27]. It can be assumed that age-dependent defects in humoral and cellular immune function could lead to a decrease in the number of T cells, phagocytosis and levels of interferon [25].

Liver enzymes
According to the results, AST, ALT, and ALP above the normal range were associated with an increased risk of death in COVID-19 patients in all age groups. Liver injury and the resulting abnormal levels of AST, ALT, and ALP are prevalent in patients with COVID-19 during disease progression and longer hospital stays [6,11]. High levels of AST are a serum marker of mitochondrial damage in hepatocytes. In severe cases of COVID-19, liver damage is associated with liver hypoxia/reperfusion injury, which could induce mitochondrial apoptosis, triggering cell injury, necrosis, and finally elevated AST levels [2]. Abnormal levels of liver enzymes can be caused by viral infections such as COVID-19 [35], by consumption of hepatotoxic drugs, or by an immune-mediated inflammatory response during treatment.
For this reason, more intensive care and specific therapeutic approaches are needed for severe patients with COVID-19 who have pre-existing liver diseases, and especially for older patients [36].

Serum electrolytes
Abnormal serum electrolytes (calcium, potassium, and sodium) were also among the risk factors associated with COVID-19 MR. Concerning serum calcium, the present results are consistent with previous reports on COVID-19, which have stated that patients with hypocalcemia have more severe symptoms and higher mortality rates than others [28]. This is attributable to the role of calcium in immune response development [12]. It has been demonstrated that altered calcium homeostasis within the cell could promote the activation of inflammatory pathways leading to increased levels of IL-1b, TNF and IL-6, which are linked to lung cell damage and edema accumulation [22]. On the other hand, hypocalcemia is associated with a higher incidence of organ injury, hyperproteinemia, and unbalanced vitamin D and parathyroid hormone.
Therefore, early detection and correction of Ca levels of patients with hypocalcemia may lead to better outcomes as well as a reduction in the severity of symptoms [10].
Moreover, the results showed that both hyponatremia and hypernatremia were correlated with a high COVID-19 mortality rate. Lippi et al. reported that serum electrolyte levels, including sodium, are decreased in patients with COVID-19 [19]. In viral infections, low sodium levels may cause tissue damage and increase the severity of the viral disease [3]. Previous studies show that sodium is an essential factor regulating the expression of angiotensin-converting enzyme-2 (ACE2) in the body. The virus enters the host cell by binding to this receptor, and hyponatremia may cause overexpression of the ACE2 receptor in the long term. Consequently, hyponatremia may lead to more severe COVID-19 [20]. In addition, one of the clinical symptoms of COVID-19 is diarrhea and water loss [13], which can cause hypernatremia [9]. Hypernatremia and hyponatremia have been seen to adversely affect the function of various organs, such as the central nervous system, and in turn to increase the mortality rate [16].
High serum potassium is another risk factor found to be associated with the COVID-19 mortality rate. Hyperkalemia may be a result of other underlying diseases such as kidney dysfunction, and can lead to cardiac arrest. Evidence also suggests that immunodeficiency against viral infections could lead to hyperkalemia [23]. However, hyperkalemia, for whatever reason, is dangerous in itself and must be carefully monitored.
These findings emphasize the importance of electrolyte monitoring in patients with COVID-19 and its role in the appropriate management of severely ill patients.

CPK
Elevated CPK is another risk factor that is shown to be associated with a high mortality rate in the patients. Increased CPK is caused by damage to muscles (mainly skeletal muscles) [8]. Previous studies show that patients with influenza type A have increased CPK levels due to skeletal muscle involvement [15]. It has also been reported that rhabdomyolysis and elevated CPK can occur in COVID-19 patients [29]. The pathogenesis of viral-induced rhabdomyolysis has several possible mechanisms: i) direct viral invasion; ii) cytokine storms and damage to muscle tissue following the immune response; and iii) destruction of the muscle cell membrane due to the direct action of viral toxins. The mechanism of COVID-19-induced rhabdomyolysis, however, has not yet been understood [1]. The only known fact is that elevated CPK ≥

CRP
Finally, the results show that elevated CRP of above 8 mg/dl is associated with an increase in COVID-19 mortality rate. Hence, and in parallel with previous studies, elevated CRP, as an inflammatory biomarker, could indicate lung damage and therefore represents the severity of the respiratory tract involvement in patients with COVID-19 [31].

Creatinine
Aligned with the results of this study, recent reports have shown that an elevated level of creatinine in patients with COVID-19 is seen and associated with mortality rate. These results demonstrated that creatinine levels can be a predictor of renal impairment, one of the most prominent causes of death among patients [32]. Furthermore, Cheng et al. demonstrated that a high level of creatinine increases the probability of admission to the intensive care unit, and these patients have a higher risk of deterioration [7]. ACE2 receptors expression in human kidneys and bladder could be a potential binding route of Coronavirus [18]. Since researchers have isolated Coronavirus from urine samples of some patients, the association of kidney impairment with COVID-19 appears to be possible [7].
In this study, the risk factors of COVID-19 were analyzed through logistic regression, and final predictors were identified displaying robust results. According to the results, abnormal levels of some laboratory tests appeared in patients with COVID-19, which can be due to non-respiratory multi-organ involvements seen during the course of COVID-19. These include the liver, heart, gastrointestinal tract and kidney [6]. These abnormalities are not specific to the novel Coronavirus infection [26]; nevertheless, prompt action for their management as potential risk factors for COVID-19 severity may reduce its morbidity and mortality rate.
Considering the fact that there is currently no approved treatment for COVID-19, our findings support the idea that in patients with COVID-19, identifying high-risk patients and providing supportive care to prevent organ damage and homeostasis abnormalities have a crucial role in the final outcome of the therapeutic interventions.

Limitations
As a retrospective cohort study, this research had some inherent limitations. Some patients had incomplete data in their medical records. Also, slightly different in-hospital procedures led to a lack of order or records of some laboratory tests. A large sample size of the study, however, may overcome these drawbacks.

Declarations
Funding: This study has no funder.