A Deep Learning Post-Discharge Mortality Prediction Method Considering Diagnosis Information for ICU Patients

Background and Objective: Mortality prediction is widely used to stratify patients into different risk categories and to provide prognosis evaluation. Scoring systems, which predict mortality with scores reflecting the severity of diseases and the physiological states of ICU patients, are now widely applied to in-hospital mortality prediction. Many research works that focus on designing better machine learning models and algorithms for mortality prediction have also achieved strong performance. However, better models and algorithms alone are not enough for post-discharge mortality prognosis, and richer patient-related information can make a big difference. In this study, we propose a deep learning method that considers patient diagnosis information for post-discharge mortality prognosis. This method can significantly improve prediction performance. Furthermore, we propose a method of calculating disease Shapley values to evaluate the mortality risk brought by disease factors. Methods: Deep learning models, including long short-term memory (LSTM) and temporal convolutional network (TCN), are trained with patient physiological time series data and diagnosis information of different prevalence to predict post-discharge mortality risk in different time windows. Disease Shapley values, which evaluate the mortality risk brought by disease factors, are the weighted average of the marginal contributions of diseases to patient mortality. Experiments on several post-discharge mortality prediction tasks with different time windows are conducted on the large, freely accessible MIMIC-III dataset. To provide a more thorough comparison, diagnosis information is also introduced into traditional machine learning models.
Results: In our experiments, LSTM achieves the highest AUROC, and with the help of diagnosis information its AUROC improves by 8.67%, 9.68%, 13.33%, 12.32% and 12.25% on the five post-discharge mortality prediction tasks with different time windows. Several patient examples are shown to present the mortality risk brought by disease factors, and the analysis results are in line with clinical experience. Conclusions: In general, our proposed method can improve the performance of ICU patient post-discharge mortality prediction and help to evaluate how much the different kinds of diseases a patient suffers from increase his mortality, thus providing support for clinical decisions.


Introduction
In the medical system, the intensive care unit (ICU) is dedicated to treating severely ill or unconscious patients. To provide life monitoring and support for these critically ill patients, ICUs are equipped with the most comprehensive treatment and monitoring equipment, such as ventilators, electrocardiogram monitors and blood gas analyzers. Even with advanced equipment and professional medical staff, the mortality rate of ICU patients remains very high. Mortality prediction, an important patient outcome prediction task, can help to quantify the severity of a patient's physiological condition [1]. By reflecting the severity of diseases or the prognosis of patients, mortality prediction can help clinicians make better clinical decisions and help patients better understand their own physiological status [2]. Hence, ICU patient mortality prediction has become an active research topic in medical informatics.
In early studies, experts developed a series of scoring systems to assess the in-hospital mortality of ICU patients, such as the Logistic Organ Dysfunction System (LODS) [3], Sequential Organ Failure Assessment (SOFA) [4] and Acute Physiology and Chronic Health Evaluation (APACHE) [5]. However, because these scoring systems are non-specific and constrained to strictly linear models, their predictions can be inaccurate for individual patients [6]. Today, the massive volumes of data recorded in electronic health records (EHRs) also support researchers in designing models and algorithms for in-hospital mortality prediction, with the aim of improving prediction performance and facilitating clinical decision making. The strong performance of mortality prediction methods based on machine learning models has been shown by many works [6][7][8][9][10][11][12][13]. Some of them show that machine learning models perform better than traditional scoring systems [7,8], and some develop various models and machine learning algorithms for mortality prediction [6][9][10][11][12][13]. Furthermore, deep learning models achieve particularly satisfactory performance in ICU mortality prediction tasks owing to their strong ability to capture non-linear patterns hidden in data [14]. Many researchers have developed deep learning models that can handle various data modalities and provide valuable clinical decision support [15][16][17][18][19][20].
The studies mentioned above have made progress in in-hospital mortality prediction, while post-discharge mortality is also of concern to both clinicians and patients [21]. Because of patients' complex illnesses and disease histories, the prognosis of long-term post-discharge mortality is a big challenge [22]. However, because they are not limited to making predictions during the hospital stay, post-discharge mortality prediction models can take more patient-related information into consideration. For example, Grnarova et al. propose an automatic mortality prediction method based on the unstructured textual content of clinical notes, which improves performance on the difficult problem of post-discharge mortality prediction [23]. Although the extent to which disease events before ICU admission affect prognosis has been debated [24], information about the diseases patients suffer from is potentially valuable for post-discharge mortality prediction. Christiansen et al. reveal that comorbidities affect mortality among ICU patients by analysing the mortality of patient groups with different Charlson Comorbidity Index (CCI) levels [22]. Nielsen et al. investigate how disease histories of different lengths affect mortality prediction and reveal that taking both long-term and short-term disease history into account gives more precise prognostic estimates than scoring systems, which include only a small number of comorbidities in the computation [25]. Dai et al. conduct a statistical analysis of the MIMIC-III database, revealing close correlations between the diseases patients suffer from and their mortality [26]. For these reasons, we investigate the effect of diagnosis information recorded during ICU admissions on post-discharge mortality prediction with deep learning models. The main difference between our work and [25] is that they aggregate a long history of diseases, while we consider the diagnosis information recorded during admissions for mortality prediction.
They also investigate the effect of disease histories of different lengths, while we investigate diagnosis information of different prevalence for post-discharge mortality prediction. Additionally, because clinicians care about the causes of death of discharged patients [27], we provide a method to evaluate the mortality risk brought by disease factors based on each individual's physiological condition and diagnosis information.
In this article, we propose a deep learning post-discharge mortality prediction method that considers diagnosis information for ICU patients. In our proposed method, easily available patient vital sign data and diagnosis information are fused to predict post-discharge mortality for ICU patients, which greatly improves the prediction performance of the deep learning models. Besides, we introduce the computation of disease Shapley values to evaluate the mortality risk brought by different disease factors.
The main findings and contributions are summarized as follows: (1) The proposed post-discharge mortality prediction method considering patient diagnosis information during the admissions can significantly improve the prediction performance of deep learning models and traditional machine learning models.
(2) Disease Shapley values are computed to evaluate the hazard that diseases pose to a single patient. With the aid of disease Shapley values, an individual patient's disease condition can be analyzed to help doctors clarify the priority of multiple diseases and facilitate subsequent treatment.
(3) The proposed methods are tested on the MIMIC-III dataset on five post-discharge mortality prediction tasks: whether a patient will die within 30 days, within 90 days, within 180 days, within 365 days and within 5 years. Our method improves the prediction performance of both deep learning models and traditional machine learning models, and the evaluation results with disease Shapley values are in line with clinical experience.
The remainder of this study is organized as follows. In Section 2 (Materials and methods), we first describe the data extraction and preprocessing in detail, and then introduce our deep learning method for predicting ICU patient post-discharge mortality with diagnosis information, together with the calculation of disease Shapley values. The experimental results and analysis are presented in Section 3. Finally, conclusions, limitations and future work are discussed in Section 4.

Materials and methods
In this section, the data preparation and the proposed method are introduced in detail. An overview of our proposed method is presented in Figure 1. In Section 2.1, we first introduce the extraction and preprocessing of vital sign and diagnosis data from the MIMIC-III database. In Section 2.2, we first introduce the method of fusing patient physiological data and diagnosis information, and then the deep learning models, long short-term memory (LSTM) and temporal convolutional network (TCN). In Section 2.3, we introduce the method of computing disease Shapley values.

Data extraction
In this study, we conduct our experiments and analysis on the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) dataset [28], which researchers can access after an approved application. This dataset contains information from electronic health records (EHRs), including patient demographics, vital signs, lab events and diagnosis information. In MIMIC-III, each patient is identified by a unique SUBJECT_ID. Each SUBJECT_ID corresponds to one or more HADM_IDs, i.e., a patient may have one or more hospital admissions, and each HADM_ID can be matched to one or more ICUSTAY_IDs, i.e., a patient may be admitted to the ICU once or several times within one admission. Starting from the clinical data of over 60,000 ICU stays of over 40,000 patients, we apply cohort selection criteria similar to those of a previous benchmark study [29]. First, admissions with multiple ICU stays or with transfers between different ICU units or wards are excluded, to reduce the ambiguity of outcomes that are associated with hospital admissions rather than ICU stays. Then, we exclude admissions of patients who died in hospital, because we aim to train the model for patients who are discharged alive. Admissions with ICU stays shorter than 48 hours are also excluded, because we extract 48 hours of data for mortality prediction. Finally, considering the differences between adult and pediatric physiology, we drop patients younger than 18. This process yields a cohort of 18,324 ICU stays. We then extract basic vital sign data and diagnosis information for the patients who meet the selection criteria. In MIMIC-III, the vital sign data of each admission are represented as a sequence of time-stamped EHR events. Each event has three key elements: the timestamp at which the event is recorded, the name of the event, and its numerical value.
We group the events of each patient into one-hour intervals by timestamp. The time window of patient data is set to 48 hours, which means that 48 values (possibly including missing values) are used for each physiological variable. The diagnosis information in MIMIC-III is recorded as International Classification of Diseases (ICD) codes, version ICD-9. For each patient, we extract a list of ICD-9 codes from the DIAGNOSES_ICD table.
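The cohort selection criteria above are a sequence of filters over admissions. The sketch below assumes each admission has already been flattened into one hypothetical `Admission` record (its field names are illustrative, not MIMIC-III column names) and applies the four exclusions in the order given in the text.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # Hypothetical, pre-joined record for one hospital admission (HADM_ID);
    # field names are illustrative and are not MIMIC-III column names.
    hadm_id: int
    age: float              # age in years
    n_icu_stays: int        # number of ICU stays within the admission
    has_transfer: bool      # transfer between ICU units or wards
    died_in_hospital: bool
    icu_los_hours: float    # ICU length of stay in hours


def select_cohort(admissions, min_hours=48.0, min_age=18.0):
    """Apply the inclusion criteria in the order described in the text."""
    cohort = []
    for adm in admissions:
        if adm.n_icu_stays != 1 or adm.has_transfer:
            continue  # outcome would be ambiguous between ICU stays
        if adm.died_in_hospital:
            continue  # the model targets patients discharged alive
        if adm.icu_los_hours < min_hours:
            continue  # a full 48-hour observation window is required
        if adm.age < min_age:
            continue  # exclude pediatric patients
        cohort.append(adm)
    return cohort


admissions = [
    Admission(1, 65, 1, False, False, 72.0),   # kept
    Admission(2, 70, 2, False, False, 96.0),   # multiple ICU stays
    Admission(3, 55, 1, False, True, 80.0),    # died in hospital
    Admission(4, 40, 1, False, False, 30.0),   # ICU stay shorter than 48 h
    Admission(5, 16, 1, False, False, 60.0),   # younger than 18
]
print([a.hadm_id for a in select_cohort(admissions)])  # [1]
```

In practice the flattened records would be built by joining the MIMIC-III admission, ICU stay, transfer and patient tables; that join is not shown here.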

Patient mortality label preparation
After identifying the patient cohort for modeling and analysis, we prepare the mortality labels for all patients in it. Using the discharge time and death time recorded in the database, we calculate the time interval between each patient's discharge and death. Following previous studies [30][31][32], the prediction time windows are set to 30 days, 90 days, 180 days, 1 year and 5 years. Each patient is assigned 5 binary labels indicating whether the discharged patient dies within 30 days, within 90 days, and so on. The binary labels of patient $i$ are denoted $y^{30d}_i$, $y^{90d}_i$, $y^{180d}_i$, $y^{1y}_i$ and $y^{5y}_i$, respectively.
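As a minimal sketch of the labelling rule, assuming that "within N days" is inclusive of day N and that 1 year and 5 years are counted as 365 and 5 × 365 days (neither convention is stated in the text), the five labels can be derived from the discharge-to-death interval as follows. Patients without a recorded death are labelled as survivors in every window.

```python
from datetime import datetime, timedelta

# Prediction windows in days. Counting 1 year as 365 days and 5 years as
# 5 * 365 days is an assumption of this sketch.
WINDOWS = {"30d": 30, "90d": 90, "180d": 180, "1y": 365, "5y": 5 * 365}


def mortality_labels(discharge_time, death_time):
    """Return the binary labels y_i^{30d}, ..., y_i^{5y} for one patient.

    death_time is None when no death is recorded; the patient is then
    labelled 0 (survived) in every window.
    """
    if death_time is None:
        return {name: 0 for name in WINDOWS}
    interval = death_time - discharge_time
    # Inclusive boundary: death exactly on day N counts as "within N days".
    return {name: int(interval <= timedelta(days=days))
            for name, days in WINDOWS.items()}


# MIMIC-III dates are shifted into the future, hence the year 2130.
labels = mortality_labels(datetime(2130, 1, 1), datetime(2130, 4, 15))
print(labels)  # {'30d': 0, '90d': 0, '180d': 1, '1y': 1, '5y': 1}
```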

Preprocessing of patient vital sign data and diagnosis information
Vital sign multivariate time series. The raw vital sign data are extracted in the form of time-stamped EHR events. In this study, we select 17 predictors for mortality prediction, including capillary refill rate, diastolic blood pressure and fraction of inspired oxygen. The statistical summary of the predictors is shown in Table 1. With a 48-hour observation window, all events occurring within the window are partitioned into 48 one-hour periods according to their timestamps. If a predictor has several values within the same one-hour period, the last observed value is kept. Missing values are imputed with the most recent value of the predictor if there is one, and with a pre-specified normal value otherwise. Binary mask inputs for each predictor at each time step indicate whether a value is observed or imputed. Categorical variables are one-hot encoded, and numeric predictors are standardized. After this process, a matrix $X_i = (x^{(i)}_{nt})_{N \times T}$ represents the values of the $N$ variables during the $T$ hours of patient $i$, where $N = 76$ and $T = 48$ in our study. The element $x^{(i)}_{nt}$ is the value of the $n$th variable at time step $t$. Deep learning models such as LSTM and TCN can directly take this matrix, arranged as time steps × features (48 × 76 in our study), as input.
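The hourly binning, last-value rule, forward-fill imputation and mask described above can be sketched with NumPy. The event format `(hours_since_window_start, variable_index, value)` and the `normal_values` array are hypothetical inputs; one-hot encoding and standardization are left out for brevity. The result is laid out as time steps × variables, i.e. the transpose of $X_i$.

```python
import numpy as np


def bin_and_impute(events, normal_values, n_hours=48):
    """Turn (t_hours, n, value) events into (n_hours, N) values and mask.

    events: iterable of (t_hours, n, value), t_hours measured from the start
        of the observation window (hypothetical input format).
    normal_values: length-N array of pre-specified normal values.
    """
    normal_values = np.asarray(normal_values, dtype=float)
    n_vars = normal_values.shape[0]
    values = np.full((n_hours, n_vars), np.nan)
    # Process events in time order so that the last observation within an
    # hour overwrites earlier ones.
    for t, n, v in sorted(events, key=lambda e: e[0]):
        if 0 <= t < n_hours:
            values[int(t), n] = v
    mask = (~np.isnan(values)).astype(float)  # 1 = observed, 0 = imputed
    # Forward-fill with the most recent value; before the first observation
    # fall back to the normal value.
    last = normal_values.copy()
    for hour in range(n_hours):
        observed = ~np.isnan(values[hour])
        last[observed] = values[hour, observed]
        values[hour] = last  # row assignment copies the data
    return values, mask


events = [(0.2, 0, 80.0), (0.7, 0, 85.0), (3.5, 1, 120.0), (10.0, 0, 99.0)]
vals, mask = bin_and_impute(events, normal_values=[75.0, 118.0], n_hours=5)
print(vals[:, 0])  # [85. 85. 85. 85. 85.]
print(vals[:, 1])  # [118. 118. 118. 120. 120.]
```

The event at hour 10 falls outside the 5-hour toy window and is ignored, just as events after hour 48 would be in the real setting.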
Diagnosis information. In our study, patient diagnosis information is used to facilitate the post-discharge mortality prediction of ICU patients. After data extraction, we obtain a list of ICD-9 codes for each patient admission. ICD-9 contains more than 14,000 diagnosis codes, so representing each diagnosis as a binary variable would lead to extremely high dimensionality and sparsity. To overcome this problem, we use the Clinical Classifications Software (CCS) developed by the Healthcare Cost and Utilization Project (HCUP) to cluster diagnoses into a manageable number of clinically meaningful categories [33]. We use the single-level diagnosis classification scheme, which aggregates illnesses and conditions into 275 mutually exclusive categories, most of which are clinically homogeneous. The numbers of patients in the CCS categories (referred to as diseases in the following) with prevalence above 1% in our dataset are shown in descending order in Figure 2, and the 10 CCS categories with the highest prevalence are listed in Table 2. The complete correspondence between ICD-9 codes and CCS categories is provided in the Supplementary material. There are 34 diseases with prevalence above 10%, 65 with prevalence above 5% and 140 with prevalence above 1%. Figure 3 shows the top 20 diseases with the highest prevalence among patients who died within each time window, together with the prevalence of those diseases among surviving patients for comparison. The prevalence of most diseases is higher among patients who died (red bars) than among survivors (blue bars). Figure 3 also shows that the top 20 diseases are similar across the five mortality time windows, although their prevalence differs somewhat. Figure 4 shows the top 20 diseases (among the 140 diseases) with the highest mortality rate in each time window.
Patients who suffer from critical illnesses such as secondary malignancies (CCS 42), leukemias (CCS 39) and cancer of bronchus and lung (CCS 19) are more likely to die within a year of discharge. The diseases with the highest mortality after one year are quite different from those with the highest mortality within one year. For example, other diseases of bladder and urethra (CCS 162), Parkinson's disease (CCS 79), and nephritis, nephrosis, renal sclerosis (CCS 156) have the highest mortality after one year. These are chronic diseases that are less fatal in the short term but still affect patients' quality of life. Because patients with different survival times suffer from different kinds of diseases, and patients with different diseases have different mortality rates, we take diagnosis information as part of the patient features for mortality prediction. After clustering the large number of ICD-9 codes with CCS, we can represent a patient's diseases with a binary vector in which each entry indicates whether the patient suffers from a certain disease. In this study, we select the diseases that meet a prevalence threshold, thus obtaining disease vectors of different lengths. The prevalence threshold is set to 10%, 5% and 1%, and the corresponding disease vectors of patient $i$ are $x^{10\%}_i \in \{0,1\}^{34}$, $x^{5\%}_i \in \{0,1\}^{65}$ and $x^{1\%}_i \in \{0,1\}^{140}$, where the $j$th entry represents whether the patient suffers from disease $dis_j$.
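A minimal sketch of the prevalence filter and the binary disease vectors, assuming each patient's ICD-9 codes have already been mapped to CCS category ids (the mapping itself comes from the HCUP CCS files and is not shown). Ordering the kept categories by descending prevalence is an assumption of this sketch; the text does not fix the column order.

```python
import numpy as np


def prevalence_filter(patient_ccs, threshold):
    """Return CCS categories whose cohort prevalence exceeds `threshold`."""
    n_patients = len(patient_ccs)
    counts = {}
    for cats in patient_ccs:
        for c in set(cats):  # count each category once per patient
            counts[c] = counts.get(c, 0) + 1
    kept = [c for c, k in counts.items() if k / n_patients > threshold]
    # Descending prevalence, ties broken by category id (an assumption).
    return sorted(kept, key=lambda c: (-counts[c], c))


def disease_vectors(patient_ccs, vocabulary):
    """Binary matrix: entry (i, j) = 1 if patient i has vocabulary[j]."""
    index = {c: j for j, c in enumerate(vocabulary)}
    x = np.zeros((len(patient_ccs), len(vocabulary)), dtype=np.float32)
    for i, cats in enumerate(patient_ccs):
        for c in cats:
            if c in index:  # categories below the threshold are dropped
                x[i, index[c]] = 1.0
    return x


# Toy cohort: CCS category ids per patient.
patients = [[101, 106], [106, 108], [106], [2]]
print(prevalence_filter(patients, threshold=0.3))  # [106]
print(prevalence_filter(patients, threshold=0.2))  # [106, 2, 101, 108]
```

With the real cohort, thresholds of 0.10, 0.05 and 0.01 would give vocabularies of 34, 65 and 140 categories according to the counts reported above.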

Fusion of patient physiological data and diagnosis information
After vital sign preprocessing, each patient has a multivariate time series matrix for the deep learning mortality prediction models, and after diagnosis preprocessing, each patient has a binary vector representing the diseases he suffers from. In this study, we adopt a simple way to fuse the physiological data and the diagnosis information: we repeat the same disease vector at every time step, obtaining three disease matrices (one for each prevalence threshold), which are concatenated with the vital sign time series matrix along the feature dimension, so that every time step carries both the vital signs and the disease vector.
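The fusion step amounts to tiling the static disease vector over the 48 time steps and appending it to the vital sign features. The NumPy sketch below shows both a single-patient and a batched version; the function names are hypothetical.

```python
import numpy as np


def fuse(vitals, disease_vec):
    """(T, N) vital sign matrix + (M,) disease vector -> (T, N + M)."""
    T = vitals.shape[0]
    disease_matrix = np.tile(disease_vec, (T, 1))  # same vector at every hour
    return np.concatenate([vitals, disease_matrix], axis=1)


def fuse_batch(vitals, disease_vecs):
    """Batched version: (B, T, N) + (B, M) -> (B, T, N + M)."""
    T = vitals.shape[1]
    disease = np.repeat(disease_vecs[:, None, :], T, axis=1)  # (B, T, M)
    return np.concatenate([vitals, disease], axis=2)


vitals = np.zeros((48, 76))
disease = np.zeros(34)       # prevalence threshold 10% -> 34 diseases
disease[[0, 5]] = 1.0
print(fuse(vitals, disease).shape)  # (48, 110)
```

Repeating the vector lets the unchanged LSTM and TCN architectures consume the diagnosis information as extra input channels, at the cost of some redundancy.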

Mortality predicting deep learning models
Our proposed mortality prediction method is based on deep learning models. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are two prominent families of deep learning models that have been widely applied in clinical studies such as outcome prediction [15], readmission prediction [34], critical illness prediction [35] and disease progression prediction [36]. The former are well suited to data with temporal structure, and the latter to data with spatial structure. We implement two representative models that can both capture patterns in sequential data: long short-term memory (LSTM) [37] and temporal convolutional network (TCN) [38]. An LSTM captures patterns in sequential data by recurrently updating its hidden state, which can be calculated with the following equations:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $x_t$ is the variable vector at time step $t$; $f_t$, $i_t$ and $o_t$ are the forget, input and output gates; $\tilde{c}_t$ is the candidate cell state; $c_t$ and $h_t$ are the cell state and hidden state; $W$, $U$ and $b$ are learnable weights and biases; $\sigma$ is the sigmoid function; and $\odot$ denotes element-wise multiplication. The architecture of our LSTM model, shown in Figure 5, consists of 2 hidden layers with 256 units each. The dropout rate and the recurrent dropout rate are both set to 0.2 in both hidden layers. Like an RNN, a TCN can take a sequence of any length and map it to an output sequence of the same length. The basis of the TCN is dilated convolution, which enlarges the receptive field of the model while using relatively few parameters. The detailed structure of our TCN is presented in Figure 6. For both LSTM and TCN, the loss function is the binary cross entropy:

$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

where $n$ is the number of training samples, $y_i$ is the label of sample $i$ and $\hat{y}_i$ is the predicted mortality probability.

Hyperparameters are tuned by trial and error with the aim of minimizing the prediction error: we start with coarse initial ranges, measure the prediction error, and then adjust the hyperparameters to obtain the lowest loss on the validation set.
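To make the building blocks concrete, the NumPy sketch below implements one step of the standard LSTM update, a single dilated causal convolution of the kind a TCN stacks, the receptive field of such a stack, and the binary cross-entropy loss. It is an illustration of the techniques, not the authors' Keras code; the stacked weight layout (forget, input, output, candidate) is a convention of this sketch, and dropout is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM update. W: (4H, D), U: (4H, H), b: (4H,).

    Rows are stacked as [forget, input, output, candidate] (sketch convention).
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate
    i = sigmoid(z[H:2 * H])        # input gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def dilated_causal_conv1d(x, w, dilation):
    """y[t] = sum_k w[k] * x[t - k * dilation].

    Left zero-padding keeps the output length equal to the input length and
    ensures y[t] never depends on future inputs.
    """
    K = len(w)
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[k] * xp[pad + t - k * dilation] for k in range(K))
                     for t in range(len(x))])


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked single dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def binary_cross_entropy(y, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Illustrative only: kernel size 2 with dilations 1..32 covers 64 time steps,
# enough for a 48-hour window (not necessarily the configuration in Figure 6).
print(receptive_field(2, [1, 2, 4, 8, 16, 32]))  # 64
```

Doubling the dilation at each layer is what lets a shallow TCN see the whole 48-hour window while each filter keeps only `kernel_size` weights.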
In the final configuration, the number of epochs is set to 100 (with early stopping on the validation loss and a patience of 20 to avoid over-fitting), the batch size is set to 64, and Adam (adaptive moment estimation) [39] is chosen as the optimizer. Because the labels of all five tasks are imbalanced, we oversample the training data.

Fig. 6 Architecture of our temporal convolutional network

By taking all possible interactions between players into consideration, Shapley values provide a fair way of allocating the total output of a coalition. In our study, Shapley values are used to evaluate the mortality risk brought by disease factors in different time windows. In this setting, the output of the coalition is the predicted mortality of a single patient, and the members of the coalition are the diseases the patient suffers from. The disease Shapley value $\varphi^j_i(f(\cdot), X_i, d_i)$, which represents the impact of disease $d^j_i$ on patient $i$, can be calculated with the trained model $f(\cdot)$, the vital sign matrix $X_i$ and the disease set $d_i$ as follows:

$$\varphi^j_i(f(\cdot), X_i, d_i) = \sum_{S \subseteq d_i \setminus \{d^j_i\}} \frac{|S|!\,(m - |S| - 1)!}{m!}\left[f(X_i, x^{j+}_i) - f(X_i, x^{j-}_i)\right] \quad (7)$$

where $m$ is the number of diseases in $d_i$, $x^{j-}_i$ is the disease vector mapped from the subset $S$, and $x^{j+}_i$ is the disease vector mapped from $S \cup \{d^j_i\}$.

In the illustrated example, $d^1_i$ and $d^2_i$ are the second and the fifth diseases in the CCS disease category list, respectively. A loop is run to calculate each disease Shapley value for each patient; in the figure, we only present the process for $j = 1$. When calculating the Shapley value of disease $j$, we first generate the $2^{m-1}$ subsets of $d_i$ that exclude $d^j_i$. Then we obtain $2^{m-1}$ corresponding subsets by adding $d^j_i$ to each subset generated above. Next, a series of disease vectors $x^{j-}_i$ and $x^{j+}_i$ are obtained by mapping the generated disease subsets. Finally, the Shapley value of disease $j$ is calculated with Eq. (7). By repeating the above process for every disease in $d_i$, all disease Shapley values of patient $i$ are obtained.
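The subset enumeration behind Eq. (7) can be written directly with `itertools.combinations`. In this sketch, `f(X, disease_vector)` is a hypothetical stand-in for the trained LSTM or TCN, `diseases` holds the vocabulary indices of the patient's diseases, and the vital sign input `X` is passed through unchanged. The toy model includes an interaction term to show that Shapley values split it equally between the two diseases.

```python
from itertools import combinations
from math import factorial


def disease_shapley(f, X, diseases, n_categories):
    """Exact Shapley value of each disease in `diseases` for one patient.

    f: callable f(X, disease_vector) -> predicted mortality (hypothetical
       interface standing in for the trained model).
    diseases: vocabulary indices of the patient's diseases (the set d_i).
    """
    m = len(diseases)

    def to_vector(subset):
        v = [0.0] * n_categories
        for d in subset:
            v[d] = 1.0
        return v

    phi = {}
    for j in diseases:
        others = [d for d in diseases if d != j]
        total = 0.0
        for size in range(m):  # subsets of sizes 0..m-1: 2^(m-1) in total
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for S in combinations(others, size):
                x_minus = to_vector(S)         # x_i^{j-}
                x_plus = to_vector(S + (j,))   # x_i^{j+}
                total += weight * (f(X, x_plus) - f(X, x_minus))
        phi[j] = total
    return phi


def toy_model(X, v):
    # Toy "model": X is just a base risk; diseases 0 and 3 interact.
    return X + 0.2 * v[0] + 0.05 * v[3] + 0.1 * v[0] * v[3]


phi = disease_shapley(toy_model, 0.1, diseases=[0, 3], n_categories=5)
print(sorted(phi))  # [0, 3]
```

Here disease 0 receives 0.2 + 0.05 and disease 3 receives 0.05 + 0.05, and the values sum to the difference between the predictions with all diseases and with none. Because each disease requires $2 \cdot 2^{m-1}$ model calls, exact enumeration is only practical for patients with a moderate number of diseases; caching predictions per subset or sampling-based approximations would be the usual remedies, neither of which is described in the text.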

Experimental results and analysis
In this section, we conduct experiments on the MIMIC-III dataset to investigate the effect of considering diagnosis information in patient post-discharge mortality prediction. Besides, we use several patients whom the model predicts to have high mortality as examples to present the risk that diseases bring to patient mortality.

Design of experiments
In this study, there are 5 separate tasks: predicting whether a patient will die within 30 days, 90 days, 180 days, 365 days and 5 years. Two main groups of experiments are conducted on these tasks.
Firstly, we compare the mortality prediction performance of traditional scoring systems, traditional machine learning models and deep learning models. The scoring systems implemented are SOFA and LODS. The traditional machine learning models implemented are logistic regression (LR), support vector machine (SVM) [42], random forest (RF) [43] and XGBoost [44]. Because these models cannot take multivariate time series data as input, we extract hand-engineered statistical features from the patient vital signs. The statistical features (mean, standard deviation, maximum, minimum, skewness and kurtosis) are calculated within 7 sub-series: the full 48 hours, the first 5, 12 and 24 hours, and the last 24, 12 and 5 hours. For each patient $i$, a vector $(x_{i,\mathrm{mean}_1}, x_{i,\mathrm{std}_1}, \ldots, x_{i,\mathrm{max}_n}, x_{i,\mathrm{min}_n}, x_{i,\mathrm{kurt}_n}, x_{i,\mathrm{skew}_n})^T$ is obtained. The length of the vector is 714 (17 predictors × 7 sub-series × 6 statistical features) in our study. The training data also undergo missing value imputation, standardization and oversampling.
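The 714-dimensional feature vector can be sketched as below, assuming hourly values with `NaN` where nothing was recorded, statistics computed only over observed values, and excess kurtosis (0 for a normal distribution). These conventions, and defining skewness and kurtosis as 0 for a constant window, are assumptions of this sketch; windows without any observation yield `NaN`, to be imputed afterwards as the text describes.

```python
import numpy as np

STATS = ("mean", "std", "max", "min", "kurt", "skew")


def sub_windows(T=48):
    """Full window; first 5/12/24 hours; last 24/12/5 hours."""
    return [(0, T), (0, 5), (0, 12), (0, 24), (T - 24, T), (T - 12, T), (T - 5, T)]


def _moments(x):
    m, s = x.mean(), x.std()
    if s == 0:
        return 0.0, 0.0  # constant window: define kurtosis and skewness as 0
    z = (x - m) / s
    kurt = float(np.mean(z ** 4) - 3.0)  # excess kurtosis (an assumption)
    skew = float(np.mean(z ** 3))
    return kurt, skew


def statistical_features(series):
    """series: (T, P) hourly values with NaN for missing hours.

    Returns P * 7 * 6 features ordered by predictor, then window, then STATS.
    """
    T, P = series.shape
    feats = []
    for p in range(P):
        for start, end in sub_windows(T):
            x = series[start:end, p]
            x = x[~np.isnan(x)]  # use observed values only
            if x.size == 0:
                feats.extend([np.nan] * len(STATS))
                continue
            kurt, skew = _moments(x)
            feats.extend([x.mean(), x.std(), x.max(), x.min(), kurt, skew])
    return np.asarray(feats, dtype=float)


series = np.random.default_rng(0).normal(size=(48, 17))
print(statistical_features(series).shape)  # (714,)
```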
Then, we present the results of deep learning models that take diagnosis information of different prevalence into consideration. To further verify the effect of adding diagnosis information for mortality prediction, we also implement traditional machine learning models that take diagnosis information into consideration. By changing the prevalence threshold of diseases in our dataset, we can adjust the number of disease categories (namely the length of the disease vector) added as patient features. In this study, we set the prevalence thresholds to 10%, 5% and 1%, which correspond to disease vectors of size (34, 1), (65, 1) and (140, 1).
To avoid results that arise by chance, we conduct 5-fold cross-validation. The original dataset is divided into 5 partitions of roughly equal size, and the ratio of positive to negative samples in each partition is roughly the same as in the original dataset. Random seeds are fixed to eliminate the randomness introduced by the dataset splits. The overall performance is obtained by averaging the results over the folds, i.e.,

$$P = \frac{1}{K}\sum_{k=1}^{K} P_k$$

where $P_k$ is the performance metric on the $k$th fold, which in our study is the area under the receiver operating characteristic curve (AUROC) or the area under the precision-recall curve (AUPRC). Both metrics are adopted to evaluate the discrimination ability of the model on imbalanced data, and higher values indicate a better model. $K$ is the number of folds, which is 5 in our study. Our experiments are conducted with Python 3.7.3. Logistic regression, support vector machine and random forest are implemented with scikit-learn 0.23, XGBoost with xgboost 1.3.3, and the LSTM and TCN models with Keras 2.3.1 and TensorFlow 2.1.0.
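The evaluation protocol (stratified folds with a fixed seed, a per-fold metric, then the fold average $P$) can be sketched in NumPy as follows. The rank-based AUROC is the standard Mann-Whitney formulation; AUPRC would be averaged across folds in the same way. The experiments themselves used scikit-learn, and whether the reported standard deviation uses the population (`ddof=0`, as here) or sample formula is not stated, so that choice is an assumption.

```python
import numpy as np


def stratified_folds(y, k=5, seed=0):
    """Assign each sample to one of k folds, keeping the class ratio per fold."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible splits
    folds = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        folds[idx] = np.arange(len(idx)) % k  # deal samples round-robin
    return folds


def auroc(y, score):
    """Probability that a random positive outranks a random negative
    (ties count 1/2). O(n_pos * n_neg) memory, fine for a sketch."""
    y = np.asarray(y)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y == 1], score[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def cross_validated(metric_per_fold):
    """P = (1/K) * sum_k P_k, returned with the standard deviation."""
    scores = np.asarray(metric_per_fold, dtype=float)
    return scores.mean(), scores.std()


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```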

Results and analysis
In this section, we first present the results of the mortality prediction tasks with different models and different volumes of diagnosis information. Then, several patients who died within the five time windows are shown to present the evaluation of a single patient's mortality risk brought by disease factors.

Prediction performance comparison
The comparison of traditional scoring systems, traditional machine learning models and deep learning models is shown in Table 3. We present the mean and standard deviation of AUROC and AUPRC over the 5 folds of each task. For each task, the two highest AUROC and AUPRC values are marked in bold. Most machine learning models perform better than the scoring systems. SVM performs weakly in the 30-day, 90-day and 180-day mortality prediction tasks, but performs well in the remaining tasks. Random forest performs well in the 180-day, 1-year and 5-year mortality prediction tasks. As for the deep learning models, TCN performs well in 30-day and 90-day mortality prediction, and LSTM achieves the best results in all tasks except 5-year mortality prediction.
The results of traditional machine learning models and deep learning models using different volumes of diagnosis information are compared in Table 4. Instead of marking the highest AUROC and AUPRC overall, we mark in bold the highest AUROC and AUPRC of the best two models. For example, the best models for the 30-day mortality prediction task are TCN and LSTM, and their highest AUROCs are obtained using diagnosis information with prevalence above 5% and 10%, respectively. In addition, the best result of each model is marked with an asterisk. For example, the best result of logistic regression on the 30-day mortality prediction task is obtained using diagnosis information with prevalence above 1%. Adding diagnosis information improves the AUROC and AUPRC of both traditional machine learning models and deep learning models. For the 30-day mortality prediction task, the AUROC scores are improved by up to about 9.74%, 6.56%, 4.22%, 6.66% and 8.67% for logistic regression, SVM, RF, XGBoost, TCN and LSTM by using disease information, respectively. The AUPRC scores of models using diagnosis information also improve. For longer-term post-discharge mortality prediction tasks, the effect of adding diagnosis information is more significant in both AUROC and AUPRC for all models, suggesting that vital sign data from a relatively short period cannot sufficiently reflect the longer-term physiological state of patients, while the diseases patients suffer from provide additional information about it. Additionally, diagnosis information improves performance more for deep learning models than for traditional machine learning models, probably because deep learning models are better at handling raw time series data and capturing potential non-linear relationships between patient vital signs and diseases.
With the help of diagnosis information, deep learning models outperform all traditional machine learning models, including SVM and random forest, which in some tasks even outperform deep learning models that do not use diagnosis information.
In general, our method substantially improves post-discharge mortality prediction performance, indicating that patient diagnosis information helps to better evaluate the physiological state of patients. We believe that methods focusing on designing better model structures and algorithms for mortality prediction are likely to achieve better prediction performance with the help of diagnosis information.

Evaluation of mortality risk brought by disease factors
For each task, we take one fold of the experiment with LSTM as the prediction model to evaluate the mortality risk brought by disease factors, and the length of the disease vector is chosen according to the AUROC on the validation set. For each task, Figures 8 to 12 present three patients who are predicted to have low mortality without diagnosis information and high mortality with diagnosis information added. All of the patients presented actually died within the corresponding time window.
In Figures 8 to 12, the bar plot shows the computed disease Shapley values, which reflect the expected change in the model's predicted risk when a patient suffers from a disease versus when he does not. The line chart shows the mortality predicted with generated patient disease vectors, reflecting the cumulative effect of diseases: the y-value of each dot is the cumulative mortality, predicted with the cumulative disease set that includes the disease at that dot and all diseases to its left. In Figure 8, patients 1 and 2 suffer from acute illnesses such as acute myocardial infarction and acute and unspecified renal failure. The disease Shapley values of these diseases are high for these patients, which means the model predicts high mortality for them largely because of these diseases. For patients 1 to 3, the predicted mortality is relatively low without diagnosis information, which means their vital signs do not indicate a very poor physiological condition; combined with the vital signs, the diagnosis information reveals their high mortality risk. For patient 13 in Figure 12, the predicted mortality is relatively high even without the diagnosis information, which means that his vital sign data already reflect an unfavorable physiological condition.
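The cumulative line chart described above can be sketched as repeated predictions on growing disease sets. Here `f(X, disease_vector)` is again a hypothetical stand-in for the trained model, and the disease order is taken as given (for example, the left-to-right order of the bars). Using an all-zero disease vector as the reference point is an assumption of this sketch; the text does not say whether the "without diagnosis information" prediction comes from a zero vector or from a separate model.

```python
def cumulative_mortality(f, X, ordered_diseases, n_categories):
    """Predicted mortality as diseases are added one at a time, left to right.

    Returns (baseline, curve): baseline uses an all-zero disease vector
    (an assumption); curve[k] uses the disease set ordered_diseases[:k + 1].
    """
    vector = [0.0] * n_categories
    baseline = f(X, list(vector))
    curve = []
    for d in ordered_diseases:
        vector[d] = 1.0                    # add the next disease to the set
        curve.append(f(X, list(vector)))   # pass a copy so f cannot mutate it
    return baseline, curve


def toy_model(X, v):
    # Toy additive "model" used only for illustration.
    return X + 0.2 * v[0] + 0.05 * v[3]


baseline, curve = cumulative_mortality(toy_model, 0.1, [0, 3], n_categories=5)
print(len(curve))  # 2
```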

Discussion
A series of experiments were conducted, and their results are presented in this section. We find that our proposed method can improve performance on the mortality prediction tasks for both traditional machine learning models and deep learning models. The degree of improvement varies with the task and the model chosen. In general, the improvement is more significant for deep learning models such as TCN and LSTM than for traditional machine learning models such as logistic regression, SVM, random forest and XGBoost, probably because deep learning models are better at capturing potential non-linear relationships between patient vital signs and disease information. The improvement is most pronounced for longer-term tasks such as 1-year and 5-year post-discharge mortality prediction, because a snapshot of vital signs is not enough to reflect a patient's longer-term physiological state, while the diseases the patient suffers from provide more information about it. These results are in line with the clinical experience that a patient's physiological state is closely related to the diseases he suffers from.
In addition to using diagnosis information to predict patient mortality, we also provide a way of evaluating the risk that diseases bring to a single patient's mortality. By computing disease Shapley values, we can learn to what extent different diseases affect each patient's physiological state. For the patient examples shown in our study, acute critical illnesses such as acute myocardial infarction and acute and unspecified renal failure bring significant risk to patient mortality within one year, and illnesses such as secondary malignancies and various kinds of cancer also severely affect patient mortality. Although our method cannot provide a causal evaluation, it can better predict post-discharge mortality and better evaluate the long-term physiological state of patients, which could help doctors adopt therapeutic interventions and help patients better understand their health status.

Conclusions, limitations and future work
In this study, we propose a deep learning post-discharge mortality prediction method that considers diagnosis information. Tested on the MIMIC-III dataset, it is shown to improve the performance of deep learning models in several mortality prediction tasks, and it also works for traditional machine learning models. Besides, we provide a way of evaluating the mortality risk that disease factors bring to a single patient through disease Shapley values. With the aid of disease Shapley values, an individual patient's disease condition can be analyzed to help doctors clarify the priority of multiple diseases and facilitate subsequent treatment. Our study has several limitations that point to future work. Firstly, we represent patient diagnosis information with a binary vector in the simplest way, which cannot reflect the severity of diseases or the relations between them. We will try to represent the diseases patients suffer from with methods that better capture the relationships between diseases. With more detailed information about disease severity, the physiological state of patients can be evaluated more accurately. Secondly, mining the unstructured textual content of patients' clinical notes is not investigated in this study, although it may further facilitate post-discharge mortality prediction. Despite these limitations, we think this study is valuable for clinical ICU post-discharge mortality prediction and for follow-up studies on ICU mortality prediction.