Competing risk between in-hospital mortality and recovery: An application of DeepHit on COVID-19 clinical data

Background On March 11, 2020, WHO characterized COVID-19 as a pandemic. A plethora of studies on this pandemic are being carried out using various statistical and mathematical models. Though most of them focus on building predictive models, concentrating on the length of hospital stay can improve decision making and treatment plans. When modeling the length of stay, the possible outcomes are either discharge or mortality. Objective The study aimed to analyse the survival data of COVID-19 patients with the competing risk methodology. The DeepHit, a deep traditional statistical


Introduction
The coronavirus disease 2019 (COVID-19) outbreak is still ongoing. The virus has affected most countries across the world. The severity of the disease is associated with a large number of co-morbidities. Previous studies have found that gender, age, and pre-existing chronic diseases such as cardiovascular disease, cancer, and respiratory disease are associated with an increased risk of adverse outcomes. 1,2 In addition to this information, it is important to focus on the length of hospital stay and the multiple outcomes of COVID-19 patients. Because of the lack of clinical data, few studies have addressed these objectives.
Survival analysis considers the problem of analyzing data where the target variable is time until the occurrence of an event. Events could be anything such as death, hospitalization, hospital discharge, etc.
The challenge underlying this scenario is to learn the model parameters from survival time data while handling censoring. Censoring arises because the survival time is unknown for some subjects, either through loss to follow-up or because the study period ended. 3,4 The common objective of lifetime data analysis is to understand the relationship between the covariates and time-to-event. 5 The traditional format of survival data consists of a tuple $(T_i, \delta_i, x_i)$ for each subject $i$: $T_i$ is the time until the occurrence of the first event or censoring, $\delta_i$ indicates the censoring status, and $x_i$ is the vector of covariates. With time-to-event data, it is common to observe more than one type of outcome among the subjects; these are referred to as competing events. The competing risk setting is ubiquitous in epidemiological studies and clinical trials, since subjects are likely to experience multiple possible adverse events. Because the problem is challenging to address, most existing models treat one of the events as the event of interest and the others as right censoring. 6 This approach is inadequate, as the occurrence of the event of interest is often obscured by other competing events. 7 The Fine-Gray model is one of the survival techniques that addresses this competing risk aspect, and it is therefore considered a benchmark in competing risk analysis. 8 With the increasing acceptance of deep learning models, deep learning-based competing risk models are emerging as an important predictive tool in medicine.
These models attempt to give empirical estimates of the true cumulative incidence functions. [9][10][11][12][13][14] DeepHit is one such deep-learning technique, developed to handle competing events with good empirical performance. 6,14 We explored the use of these models to analyse the survival characteristics of 1863 COVID-19 patients admitted to hospitals across the globe from December 10, 2019, to March 30, 2020. With discharge and death as the two events in this data, we framed a competing risks setup. [15][16][17] Implementing this kind of predictive model helps clinicians anticipate the possible outcome of a patient, which aids in improving treatment policies during outbreaks or epidemics.
In the rest of the paper, we present the data, explain the methods, and discuss the results.

Methods
The performance of a deep learning-based competing risk model (DeepHit) on COVID-19 clinical data is compared with that of a traditional statistical model (Fine-Gray). These are the two benchmarks for the competing risk problem in the current literature. Both models are based on the cumulative incidence function (CIF). The cause-specific CIF gives the probability that event $k$ occurs on or before time $t$, conditional on the covariates $x$, i.e., $F_k(t \mid x) = P(T \le t, K = k \mid x)$, where $T$ is the time to the first event and $K$ is its type.
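To make the CIF concrete, the following is a minimal sketch of a nonparametric (Aalen-Johansen type) estimate of the cumulative incidence for one cause, ignoring covariates. It is not part of the analysis reported here; the function name `aalen_johansen_cif` and the toy durations and event codes (0 = censored, 1 and 2 = competing events) are assumptions made purely for illustration.

```python
import numpy as np

def aalen_johansen_cif(times, events, cause):
    """Nonparametric CIF for one cause; events: 0 = censored, 1..K = causes."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events > 0])
    surv = 1.0   # probability of being free of any event just before t
    cif = 0.0
    out = []
    for t in event_times:
        at_risk = np.sum(times >= t)
        d_any = np.sum((times == t) & (events > 0))
        d_cause = np.sum((times == t) & (events == cause))
        cif += surv * d_cause / at_risk    # mass assigned to `cause` at t
        surv *= 1.0 - d_any / at_risk      # update all-cause survival
        out.append(cif)
    return event_times, np.array(out)

# Made-up toy data: six subjects, two competing events, two censored
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 2, 0, 1, 0, 2]
grid, cif_1 = aalen_johansen_cif(toy_times, toy_events, cause=1)
_, cif_2 = aalen_johansen_cif(toy_times, toy_events, cause=2)
```

In this toy example the last observed time is an event, so the two cause-specific curves add up to one at the end of follow-up; with censoring at the end they would sum to less than one.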

Fine-Gray model
The Fine-Gray 8 model is the most commonly used statistical method in competing risk problems. It modifies the traditional proportional hazards model through a direct transformation of the CIF, focusing on the subdistribution of a competing risk. For each failure type, the model provides a direct interpretation in terms of survival probabilities. The cmprsk package in R is used to fit the Fine-Gray model. We used the materials provided by Nemchenko et al. 7 to fit the model and find the performance metric.
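For reference, the standard formulation of the Fine-Gray model can be written as follows. The notation is an assumption of this sketch rather than taken from the fitted model: $T$ is the time to the first event, $K$ its type, $x$ the covariates, $\lambda_{k0}$ an unspecified baseline subdistribution hazard, and $\beta_k$ the regression coefficients for cause $k$.

```latex
% Subdistribution hazard for cause k: subjects who already failed from
% another cause remain in the risk set.
\lambda_k(t \mid x) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}
  P\big(t \le T < t + \Delta t,\ K = k \,\big|\,
        \{T \ge t\} \cup \{T < t,\ K \ne k\},\ x\big)

% Proportional subdistribution hazards assumption
\lambda_k(t \mid x) = \lambda_{k0}(t)\, \exp\!\big(\beta_k^{\top} x\big)

% Resulting cumulative incidence function
F_k(t \mid x) = 1 - \exp\!\Big(-\Lambda_{k0}(t)\, \exp\!\big(\beta_k^{\top} x\big)\Big),
\qquad \Lambda_{k0}(t) = \int_0^t \lambda_{k0}(s)\, ds
```

The last line shows why the coefficients act directly on the CIF: a positive component of $\beta_k$ raises the cumulative incidence of cause $k$ at every time point.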
DeepHit
DeepHit 6 is a deep neural network that learns the distribution of survival times directly, without making any assumption about the underlying stochastic process. The model trains a multi-task network to estimate the joint distribution of the first hitting time and the competing events. DeepHit can be used to predict the two competing risks: discharge from the hospital (event 1) and death prior to discharge (event 2).
Since we are considering 2 competing events, the network consists of 4 layers. The first layer is a fully-connected layer for the shared sub-network, followed by two fully-connected layers for each cause-specific sub-network. The output layer is a softmax layer. ReLU (Rectified Linear Unit) activations are used in the three layers preceding the output. The network is trained by back-propagation with the Adam optimizer, a batch size of 50, and a learning rate of . A dropout probability of 0.1 and Xavier initialization were applied to all layers. We used the pycox package to implement the model. 13 The data are randomly split into 60% for training, 20% for testing, and 20% for validation. For evaluation, 5-fold cross-validation is applied.
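The layer layout described above can be illustrated with a plain NumPy forward pass. This is a minimal sketch under stated assumptions, not the pycox implementation used in the study: the layer widths, the number of time bins, and all names (`dense`, `forward`, `N_BINS`, and so on) are hypothetical, dropout is omitted, and each cause-specific sub-network here receives only the shared representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 covariates, 32 hidden units, 2 events, 30 time bins
N_FEATURES, HIDDEN, N_EVENTS, N_BINS = 5, 32, 2, 30

def dense(n_in, n_out):
    """Weights with Xavier (Glorot) uniform initialisation and zero biases."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out)), np.zeros(n_out)

def relu(z):
    return np.maximum(z, 0.0)

shared = dense(N_FEATURES, HIDDEN)                        # layer 1 (shared)
cause_nets = [(dense(HIDDEN, HIDDEN), dense(HIDDEN, HIDDEN))
              for _ in range(N_EVENTS)]                   # layers 2-3 per cause
out_w, out_b = dense(N_EVENTS * HIDDEN, N_EVENTS * N_BINS)  # layer 4 (output)

def forward(x):
    """Return the joint pmf over (event, time bin), shape (n, N_EVENTS, N_BINS)."""
    h = relu(x @ shared[0] + shared[1])
    outs = []
    for (w1, b1), (w2, b2) in cause_nets:
        c = relu(h @ w1 + b1)
        outs.append(relu(c @ w2 + b2))
    logits = np.concatenate(outs, axis=1) @ out_w + out_b
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                     # single softmax
    return p.reshape(-1, N_EVENTS, N_BINS)

x = rng.normal(size=(4, N_FEATURES))   # four made-up patients
pmf = forward(x)
cif = pmf.cumsum(axis=2)               # cause-specific CIF per time bin
```

Because one softmax spans every (event, time bin) pair, the output is a joint probability mass function, and the cause-specific CIF follows from a cumulative sum along the time axis.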

Evaluation metric
The time-dependent concordance index is used to evaluate the discriminative ability of the models. 18 The principle of concordance is that a subject who experienced the event should have a lower predicted survival probability than subjects who survived longer. The value ranges between 0 and 1; the closer the metric is to one, the better the performance of the model.
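The pairwise comparison behind the metric can be sketched as below for one cause. This is an illustrative simplification, not the evaluation code used for the reported results: it applies no censoring weights, counts ties as half, and the function name `td_concordance` and the toy predictions are assumptions.

```python
import numpy as np

def td_concordance(cif_k, time_grid, durations, events, cause):
    """Time-dependent concordance for one cause.

    cif_k: array (n_subjects, n_times) of predicted CIF values for `cause`
           evaluated on `time_grid`; events: 0 = censored, 1..K = causes.
    """
    cif_k = np.asarray(cif_k, dtype=float)
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events)
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(events == cause):
        # grid column at (or just before) subject i's event time
        col = max(np.searchsorted(time_grid, durations[i], side="right") - 1, 0)
        later = durations > durations[i]   # subjects still event-free then
        comparable += int(later.sum())
        concordant += np.sum(cif_k[i, col] > cif_k[later, col])
        concordant += 0.5 * np.sum(cif_k[i, col] == cif_k[later, col])
    return concordant / comparable if comparable else float("nan")

# Made-up predictions for four subjects on a four-point time grid
time_grid = np.array([1.0, 2.0, 3.0, 4.0])
durations = [1, 2, 3, 4]
events = [1, 1, 0, 1]
cif_good = np.array([[0.50, 0.70, 0.80, 0.90],
                     [0.20, 0.50, 0.70, 0.80],
                     [0.10, 0.20, 0.40, 0.60],
                     [0.05, 0.10, 0.20, 0.50]])
c_good = td_concordance(cif_good, time_grid, durations, events, cause=1)
c_bad = td_concordance(cif_good[::-1], time_grid, durations, events, cause=1)
```

Predictions that rank earlier events as higher risk give a value near one, while reversing the ranking drives it toward zero.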
A group of researchers collected epidemiological data from different research labs. The data was extracted from online resources and national health portals released by state/local health officials and hospitals of different countries. The dataset consists of a subject ID, date of hospital admission, gender, age, the onset date of symptoms, outcome (death/discharge), death/discharge date, history of chronic disease, symptoms, location, and travel history.
Time to event is calculated directly by subtracting the hospital admission date from the date of the outcome. Observations without admission dates, covariate information, or an outcome were removed from the study. Discharge from the hospital and in-hospital mortality are event 1 and event 2, respectively.
Censoring time is obtained from the last available date. Since the outcomes were well defined, there was no complication in defining death, discharge, and censoring. We included only patients admitted up to March 30, 2020, to avoid massive censoring. The covariates considered for the analysis are age, gender, chronic disease history, latitude, and longitude. Table 1 presents outcome-wise descriptive statistics. All percentages are calculated relative to the remaining 1863 patients. A schematic plot of competing-risk time-to-event data for five hypothetical subjects is shown in Fig. 1.
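The construction of durations and event codes described above can be sketched as follows. The field names, the three example rows, and the censoring date `LAST_AVAILABLE` are hypothetical placeholders chosen for illustration; they are not taken from the dataset.

```python
from datetime import date

LAST_AVAILABLE = date(2020, 4, 15)   # placeholder for the last available date
EVENT_CODES = {"discharge": 1, "death": 2}   # 0 is reserved for censoring

# Made-up records for illustration only
records = [
    {"admitted": date(2020, 2, 1), "outcome": "discharge",
     "outcome_date": date(2020, 2, 15)},
    {"admitted": date(2020, 2, 10), "outcome": "death",
     "outcome_date": date(2020, 2, 18)},
    {"admitted": date(2020, 3, 20), "outcome": None, "outcome_date": None},
]

def to_time_and_event(rec):
    """Return (days from admission, event code) for one patient record."""
    if rec["outcome"] in EVENT_CODES:
        days = (rec["outcome_date"] - rec["admitted"]).days
        return days, EVENT_CODES[rec["outcome"]]
    # No outcome observed: censor at the last available date
    return (LAST_AVAILABLE - rec["admitted"]).days, 0

prepared = [to_time_and_event(r) for r in records]
```

Each patient is reduced to a duration in days and an integer event code, which is the input format expected by both competing risk models.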

Results
The time-to-event data of 1863 COVID-19 patients were analysed. Among these, 162 (8.7%) discharges and 59 (3.2%) deaths were reported. The socio-demographic descriptions of the events in the COVID-19 cases are depicted in Fig. 2. Mortality was marginally higher among males (Fig. 2A), higher among the elderly (Fig. 2B), and higher among those with pre-existing chronic diseases (Fig. 2C). Discharge, in contrast, was marginally higher among males, higher among age group and among those without a pre-existing chronic disease.
CIF curves obtained for the events are depicted in Fig. 3. DeepHit was trained to estimate the joint distribution of survival time and the two competing events, whereas the Fine-Gray model encodes the linear effects of patient covariates on both events. The average discriminative indexes of both models are given in Table 2. DeepHit outperformed the Fine-Gray model, indicating its better discriminative power. Because deaths account for only a small number of observations (3.2% of the entire dataset), the performance of the models for event 2 is more variable than for event 1. This happens when few event 2 cases are sampled into the validation and testing sets.

Discussion And Conclusions
As several studies have pointed out, we also observed that the elderly and patients with chronic diseases are more likely to die than other groups, and that the gender and age of patients have a direct effect on their recovery time. [19][20][21] A few studies have also reported that the probability of hospital discharge is lower for older or male patients. 22 The Cox proportional hazards model has been used to study the mortality and recovery of COVID-19 patients. The bias in estimating the hazard ratio of death due to COVID-19 in the presence of competing events has been investigated. 23 These studies point out the need to consider recovery and death as competing events to avoid a substantial risk of misleading results. 20,24 The advantage of DeepHit over other competing risk models has already been established in several studies. 6,14 To the best of our knowledge, competing risk modeling of COVID-19 data has been little explored because of the limited availability of clinical data. 25 This is the first study to use a deep learning-based competing risk model on COVID-19 clinical data with discharge and death as two competing events. There is considerable scope for further analysis when detailed data with other potential risk factors become available. An interesting extension of the study would be the development of more sophisticated deep learning algorithms to measure the impact of clinical and prognostic factors on survival.

Conflicts of interest/Competing interests
The authors declare that there is no conflict of interest.

Funding
Not applicable.