Development and Validation of a multivariable prediction model using Machine Learning to predict the outcome of admitted COVID-19 patients at the time of admission

Introduction
In Coronavirus disease 2019 (COVID-19), early identification of patients with a high risk of mortality can significantly improve triage, bed allocation, timely management, and possibly outcome.
The study objective is to develop and validate individualized mortality risk scores based on anonymized clinical and laboratory data at admission and to determine the probability of death at 7 and 28 days.
Methods: Data of 1393 admitted patients (Expired - 8.54%) were collected from six Apollo Hospital centers (from April to July 2020) using a standardized template and electronic medical records. Over 50 clinical and laboratory parameters were studied, based on the patient's initial clinical state at admission and laboratory parameters within the first 24 hours. The Machine Learning (ML) modelling was performed using a Gradient Boosting algorithm. A 'time to event' analysis using the Cox Proportional Hazard model was combined with the Gradient Boosting algorithm. The prospective validation cohort comprised 977 patients (Expired - 8.3%) from six centers from July to October 2020. The Clinical API for the algorithm (http://20.44.39.47/covid19v2/page1.php) is being used prospectively.

Conclusion:
The model for mortality risk prediction provides insight into the COVID clinical and laboratory parameters at admission. It is one of the early studies to consider 'time to event' at admission while accurately predicting patient outcomes.

Strengths
1) The study was conducted with Indian data, using a retrospectively trained model with prospective validation, following appropriate Institutional Ethics Committee approval.
2) The study results are congruent with international literature for the clinical and laboratory parameters studied at the time of admission.
3) The developed model (AICOVID) and its corresponding application are being utilized for triaging, allocating resources, putting patients into an appropriate clinical protocol, and patient and family education at admission.

Limitations
1) The study did not include radiology imaging at admission. Specific laboratory parameters like D-dimer and Interleukins could not be studied due to the scarcity of data at admission.
2) The model was developed predominantly with data from Indian centers of a large healthcare system. It would require further validation and calibration before being extrapolated to other geographies and ethnic populations.

INTRODUCTION
The current COVID-19 pandemic caused by SARS-CoV-2 is associated with high mortality and morbidity. [1] In India, over 10 million individuals have been affected by the virus (as of mid-January), with over 150 thousand people losing their lives, at a mortality rate of 1.44%. [2] However, the 30-day mortality rates at tertiary care hospitals in the US are far higher, at 9.06% to 15.65% [3]. Various studies have been conducted to determine the mortality risk factors in COVID-19 [4-6]. Understanding the clinical and laboratory predictors at admission can identify the determinants of mortality and improve triaging, bed and resource allocation, and patient management throughout health systems.
The datasets of COVID-19 patients can be integrated and analysed by Machine Learning (ML) algorithms to improve diagnostic speed and accuracy and potentially identify the most susceptible people based on personalized clinical and laboratory characteristics [7]. These methods enable early insight into a patient's outcome from the predictors available at the time of admission. Existing studies have used Machine Learning Algorithms (MLA) to determine COVID-19 mortality. [8-10] Due to the absence of similar studies in the Indian population, this research work was undertaken to develop and validate an MLA based on anonymized clinical and laboratory data to predict the outcome (Expired or Recovered) from a retrospective evaluation of patients admitted with COVID. Additionally, the algorithm determines the probability (risk) of Events (defined as Death or Expiry of Subjects), predicting mortality at 7 and 28 days. Secondarily, this would provide clinical insights on various clinical and laboratory parameters that are clinically and statistically relevant, and help develop a Clinical API (application programming interface) tool for clinicians taking care of admitted patients, even in low-cost settings.

METHODOLOGY
This study is designed as a multicenter, retrospective-prospective, observational, non-interventional study.
Source of data - The retrospective data (for the development cohort) were collected from anonymized clinical and laboratory records at admission, drawn from patient discharge summaries and a standardized template (Annexure 1), at six Apollo Hospitals in India for the period April to June 2020. The hospitals are in Bangalore, Chennai, Delhi, Hyderabad, Kolkata, and Navi Mumbai. In the prospective validation arm, the data elements at admission were provided by the site investigator in the electronic template (API - Figure 4). Additional data were collected from discharge summaries and the standardized template described above. For discharge summaries, we used appropriately coded ICD-10 diagnosis data. The validation cohort was collected from the same six hospitals between August and October 2020.

Participants - The participants were admitted to the hospital with symptoms and history suggestive [11-12] of Coronavirus Disease (COVID), with subsequent laboratory confirmation through Reverse Transcription Polymerase Chain Reaction (RT-PCR) tests. The data were collected during admission at the Emergency Room and following admission (within 24 hours). The eligibility criteria for inclusion were: a) patients presenting with COVID-19 related symptoms (with or without a history of contact or travel to geographical hot spots in the community), and b) subjects who comply with category B2 / C in the Apollo Hospitals COVID protocol (see Annexure 2). The exclusion criterion was: a) patients admitted for other disease conditions who were subsequently found to have COVID during the hospital stay. The study did not include any specific intervention or treatment provided to the patient during admission or within the first 24 hours.
Outcome - The study's primary outcome was to develop comparable models with improved accuracy parameters, which would yield a risk predictor at admission for mortality in the next 7 and 28 days. Each predictor (clinical and laboratory variable) is studied for its odds and hazard ratios.

Predictors
The clinical variables included the patient's basic information (age and gender), exposure and travel history, and the number of days of symptoms before admission. They also included symptoms such as fever, cough, and respiratory distress (shortness of breath).

Sample size
The study included a total population of 2370 patients: 1393 in the development cohort and 977 in the validation cohort. The sample size was determined using an estimated 10 million COVID cases in India (October 2020) at a 95% confidence level and a confidence interval of 2.
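The sample size statement can be illustrated with a short calculation. The sketch below assumes Cochran's formula with a finite population correction, reads "confidence interval 2" as a margin of error of ±2 percentage points, and takes the most conservative proportion of 0.5; none of these choices is stated in the study, and the function name `cochran_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def cochran_sample_size(population, confidence=0.95, margin=0.02, proportion=0.5):
    """Cochran's sample size with finite population correction.

    ASSUMPTIONS (not stated in the study): margin is +/- 2 percentage
    points and the expected proportion is the conservative 0.5.
    """
    # Two-sided z value for the requested confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an infinite population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Finite population correction
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


required = cochran_sample_size(population=10_000_000)
print(required)
```

Under these assumptions the required size comes out at roughly 2,400, which is of the same order as the 2370 patients enrolled across both cohorts.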

Missing data
The initial development cohort comprised 1435 patients. Data from forty-two patients were dropped owing to missing fields. No imputation was used in the development or validation cohort. As described earlier, certain laboratory predictors were not selected due to the smaller sample size and missing values.

Statistical analysis and Modelling approach
The clinical and laboratory parameters were selected based on the odds ratios of the initial cohort. The parameters were subsequently run through Propensity Matching for the binary classification of Events (Death - 1) and Non-Events (Survival - 0). The population was randomly divided into training (70%) and test (30%) sets in the development model. The 23 parameters were then fitted with three models (logistic regression, random forest, and gradient boosting) using Python (3.7), and their performance was compared by maximizing the K-fold cross-validation AUC.
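The comparison described above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the study's code: the data are synthetic stand-ins for the 23 admission parameters, the split is stratified, five folds are used, and the hyperparameters are library defaults or arbitrary small values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the 23 admission parameters (NOT the study data);
# roughly 10% of samples carry the positive (event) label.
X, y = make_classification(
    n_samples=600, n_features=23, n_informative=8,
    weights=[0.9, 0.1], random_state=42,
)

# 70% training / 30% test split; stratification is an assumption made here
# to keep the event rate similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

# K-fold cross-validated AUC on the training portion (K = 5 is an assumption)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
    cv_auc[name] = scores.mean()
    print(f"{name}: mean CV AUC = {cv_auc[name]:.3f}")
```

The held-out 30% (`X_test`, `y_test`) is left untouched here so that it can be used once, for the final evaluation of whichever model is selected.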
However, we considered the XGB model, as the function of this model approximates the data distribution up to an error term. Here $y_i$ is the value to be predicted and $x_i$ are the input values. $F_1(x_i)$ is a function that does not yet fully describe the relationship between X and y.
1) We initialize the model by solving the following equation for the 23 input parameters:

$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$

where n is the total number of observations, i.e., 1393. $F_1(x_i)$ is a weak learner, so the relationship between X and y is not yet fully described.

2) For iterations m = 1 to M, the gradient of the loss is computed with respect to the predicted value. Squared error is used as the loss function.

The program is written in Python, using the scikit-learn, NumPy, and pandas packages. The code snippet for the GBM is:

from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(n_estimators=1997, learning_rate=0.2, max_depth=5, random_state=42)

After training and successful testing, the model is pickled and saved. This pickled model is hosted and serves as the back end for requests from the front-end user. The user's values act as input for the model, and the predicted response is the output.
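For reference, the boosting steps above correspond to the standard gradient boosting formulation, sketched here on the assumption that the model follows it with the squared-error loss named in the text. The symbols $r_{im}$ (pseudo-residual), $h_m$ (weak learner fitted at iteration $m$), and $\nu$ (learning rate) are introduced for illustration and do not appear in the original.

```latex
\begin{align}
F_0(x) &= \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)
  && \text{initialization} \\
r_{im} &= -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}},
  \quad i = 1, \dots, n
  && \text{pseudo-residuals at iteration } m \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x)
  && \text{update with weak learner } h_m \text{ fitted to } r_{im} \\
L\bigl(y_i, F(x_i)\bigr) &= \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2
  \;\Rightarrow\;
  F_0(x) = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad r_{im} = y_i - F_{m-1}(x_i)
  && \text{squared-error case}
\end{align}
```

With squared error, each iteration therefore fits the next weak learner to the plain residuals of the current model. A gradient boosting classifier in scikit-learn typically optimizes a log-loss rather than squared error, so the last line should be read as an illustration of the loss the text describes.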

Risk Stratification
When an individual's data are entered into the algorithm, it returns the risk of mortality in the next 28 days as a percentage. Risk scores between 0 and 15% (low risk) are associated with low mortality rates (<1%), the moderate-risk category (15 to 30%) with 1-5% mortality, and the high-risk category (>30%) with >5% mortality. Further, the addition of the Cox Proportional Hazard model returns the probability of mortality at 7 and 28 days, which is displayed in the output.
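The thresholds above can be expressed as a small mapping from risk score to category. This is a minimal sketch; the function name `risk_category` is hypothetical, and the handling of the boundary values is an assumption (a score of exactly 15% is placed in the moderate category and exactly 30% stays moderate), since the text does not say which side they fall on.

```python
def risk_category(risk_percent: float) -> str:
    """Map a 28-day mortality risk score (in percent) to a risk category.

    ASSUMPTION: exactly 15% counts as moderate and exactly 30% as moderate;
    the source text leaves these boundaries open.
    """
    if not 0 <= risk_percent <= 100:
        raise ValueError("risk_percent must be between 0 and 100")
    if risk_percent < 15:
        return "low"        # observed mortality below 1% in the study
    if risk_percent <= 30:
        return "moderate"   # observed mortality of 1-5%
    return "high"           # observed mortality above 5%


for score in (5.0, 20.0, 45.0):
    print(score, risk_category(score))
```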

Predictors Analysis
Hazard ratios are calculated for each predictor, based on its accepted clinical or laboratory threshold, as applicable (see Table 1 for thresholds). Kaplan-Meier curves and violin plots were used to analyze the effect of individual variables on overall mortality.
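To make the Kaplan-Meier step concrete, the product-limit estimator can be written in a few lines of plain Python. This is a sketch rather than the study's implementation; the function name `kaplan_meier` and the follow-up values are hypothetical, and a dedicated survival analysis library would normally be used in practice.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (days)
    events:    1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)      # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]         # remove deaths and censored patients at t
    return curve


# Hypothetical follow-up data (NOT from the study)
durations = [3, 5, 5, 8, 12, 20, 28, 28]
events = [1, 1, 0, 1, 0, 1, 0, 0]

curve = kaplan_meier(durations, events)
for day, prob in curve:
    print(f"day {day}: S(t) = {prob:.3f}")
```

Running the estimator separately for patients above and below a predictor's threshold gives the pair of curves that a KM plot compares.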

Performance Evaluation
All models are evaluated on their ability to discriminate between outcomes in the development cohort, and the Gradient Boosting model is additionally evaluated in the validation cohort, with the corresponding confidence intervals (CI). The AUC, accuracy, sensitivity, specificity, precision, predictive value, and likelihood ratios are computed for the validation cohort with a standard threshold.
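The listed metrics all follow from the confusion matrix at a chosen threshold, plus a rank-based AUC. The sketch below uses made-up labels and probabilities, assumes a threshold of 0.5 as the "standard threshold", and assumes both classes and both predictions occur at least once (otherwise some ratios are undefined). The function names are hypothetical.

```python
def diagnostic_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix metrics at a probability threshold (0.5 is an assumption)."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and truth:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),                # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def roc_auc(y_true, y_prob):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count half."""
    pos = [p for t, p in zip(y_true, y_prob) if t]
    neg = [p for t, p in zip(y_true, y_prob) if not t]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up example data (NOT from the study): 1 = expired, 0 = recovered
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05, 0.3, 0.2]

metrics = diagnostic_metrics(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {auc:.3f}")
```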

Patient and Public Involvement
The study was conducted with retrospective data from April to July 2020; hence the development of the research question and outcome measures was not directly communicated to the patients in the development cohort. However, patients in the validation cohort were informed about the study design and outcome.

Clinical and Laboratory Variables at Admission (Participants)
The average observed mortality rates are 8.54% in the development cohort (N = 1393) and 8.3% in the validation cohort (N = 977) across six hospitals, for April to July 2020 and August to October 2020, respectively.

Comparison
The comparative results of the Gradient Boosting algorithm and the other models are given in Figure 4. There is an overall improvement in AUC ROC scores and sensitivity in the Gradient Boosting model compared with the other two models studied. Table 2 gives the hazard ratios.

Further to the hazard ratio analysis of individual clinical and laboratory predictors, we examined the predictors in a survival analysis using Kaplan-Meier (KM) plots. KM plots were prepared for both the development and validation cohorts and studied for comorbidities such as Diabetes, Hypertension, and existing Heart Disease (Coronary Artery Disease). Heart disease has a seemingly better outcome in the validation cohort, probably due to early intervention and education among patients. In the validation cohort, there is a considerable change for Respiratory Distress, Respiratory Rate > 24/min, and Oxygen Saturation below 90%, which reflects that, despite best efforts, patients with significant respiratory failure at admission continue to have a poorer prognosis at 28 days. Among laboratory parameters, LDH, INR, Ferritin, and Red cell Distribution Width show almost similar trends, while low Lymphocyte% contributes to higher mortality predominantly in the development cohort (Figure 5).

The model considers a patient's age above 60 years an important predictor of a high risk of mortality, shown here by the higher hazard ratio in the validation cohort (2.31; CI 1.52-3.53). [13] Interestingly, the CDC (Dec 2020) finds a similar range when comparing mortality between patients below and above 65 years in the US population. [14] Further, male gender has a hazard ratio of 1.27 (CI 0.7-2.23), suggesting a higher risk of mortality in the male population, although the confidence interval includes 1. This is congruent with other international studies [15], where the odds of death were comparable (1.39; 95% CI 1.31-1.47).
Respiratory symptoms such as distress, a higher rate of respiration (>24/min), or silent hypoxemia detected through a lower oxygen level (<90%) have shown a higher risk of mortality (see Table 2) and receive due weightage in the Gradient Boosting algorithm. These seemingly similar attributes are included because they represent different aspects of symptoms and vital physiological parameters at admission. This has also been studied extensively in US patients [16]. The study does not include the subsequent mechanical ventilation of the High and Moderate Risk patients or their overall outcome. However, the Kaplan-Meier plots (Figure 5) show outcomes for patients with severe respiratory symptoms in both the development and validation cohorts.
The pooled prevalence of Diabetes and Hypertension is 26% and 31.22%, respectively, in the development cohort. The odds of mortality in these comorbid conditions are provided in Table 1 and are comparable with the meta-analysis conducted by Kumar et al. [17]. [20-21]

Among the laboratory features measured at admission, lymphopenia (Lymphocyte <12%) had a hazard ratio of 1.99 (CI 1.23-3.2), which is consistent with the systematic review and meta-analysis [22]. Other significant factors included pro-inflammatory markers such as LDH (cut-off >250 U/L), Ferritin (cut-off >450 microgram/L), and CRP (cut-off >48 mg/L) (see Hazard Ratio - Table 2). This is consistent with various international studies that use slightly modified cut-off values [23-25]. Although the study did not include derived features such as the Neutrophil-Lymphocyte Ratio or the CRP/Albumin Ratio, it did look into the hazard ratios separately. [25]

As discussed above, the model takes all the above features, which are reasonably congruent with the published international studies, and provides the prediction using a Gradient Boosting method. Gradient Boosting is a popular technique that iteratively fits weak learners to the residuals of the current model, increasing accuracy over successive iterations. The model inherently captures complex data structure, including interaction and nonlinearity among multiple predictors. For further use, the model would be calibrated with the data provided through the API in Figure 6 and continuously improved over time. The Risk Model is available for clinicians at http://20.44.39.47/covid19v2/page1.php

Figure 6 - The API generated from the algorithm, which provides the risk score for an individual based on their clinical and lab parameters at admission

Implication
All measures have been taken to reduce bias, including socio-economic aspects, as Apollo Hospitals admitted patients from all sections of society with moderate to severe disease.
Potential clinical use - The Clinical API developed from the AICOVID algorithm is currently being used in Apollo Hospitals. It can be used in hospitals in the Indian subcontinent with the current accuracy and precision. In the prospective validation study, patient consent was obtained through an appropriately designed consent form (Annexure 4), paving the way for further research on the use of consent in clinical AI-enabled tools and the appropriate information to give patients.
Similar models have been created at MIT Sloan School [10] and Tencent AI Lab [27], using 7 and 10 clinical features, with similar accuracy and precision. However, such tools are more specific and accurate for the population whose data were used to build the models. The steps of the research were conducted in accordance with the TRIPOD Checklist. [28]

Limitation
The model is prepared with clinical and laboratory features that are available at the time of admission in the Emergency Room or other clinical settings. One of the study's limitations is that it does not include imaging tests done at the time of admission. Due to logistical issues, certain clinical parameters such as Body Mass Index (BMI) and follow-up information on patients' ventilation details were not obtained. Furthermore, due to the unavailability of adequate data, certain laboratory markers such as D-dimer and Interleukins were excluded. The research team is currently analysing the follow-up care of these patients (survivors).
Geographical and Ethnic acceptance - The Apollo Hospitals included in the study are in Bangalore, Chennai, Delhi, Hyderabad, Kolkata, and Navi Mumbai. This provides comprehensive coverage of the Indian population across the possible zones. However, further studies are required for validation and calibration in the Western and Eastern (including North Eastern) zones. Beyond India, further research is needed to calibrate the model for use in other populations such as the US, Europe, the Middle East, and South East Asia.