Learning to predict in-hospital mortality risk in the intensive care unit with an attention-based temporal convolution network


Background: Dynamic prediction of patients’ mortality risk in the ICU from time series data is limited by high dimensionality, uncertain sampling intervals, and other issues. A new deep learning method, the temporal convolution network (TCN), makes it possible to deal with complex clinical time series data in the ICU. We aimed to develop and validate an attention-based TCN to predict mortality risk using time series data from the MIMIC III dataset. Methods: A total of 21139 records of ICU stays were analyzed, and 17 physiological variables from the MIMIC III dataset were used to predict mortality risk. We then compared the model performance of the attention-based TCN with that of traditional artificial intelligence (AI) methods. Results: The area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUC-PR) of the attention-based TCN for predicting the mortality risk 48h after ICU admission were 0.837 (0.824-0.850) and 0.454, respectively. The sensitivity and specificity of the attention-based TCN were 67.1% and 82.6%, whereas the traditional AI methods yielded low sensitivity (<50%). Conclusions: The attention-based TCN model achieved better performance in predicting mortality risk with time series data than traditional AI methods and conventional score-based models. The attention-based TCN mortality risk model has the potential to support decision-making for critically ill patients.


Background
The in-hospital mortality of patients in the intensive care unit (ICU) is relatively high, ranging from 6.7% to 44.0% worldwide (1,2). With the development of critical care medicine, growing amounts of data help doctors make decisions but can sometimes overwhelm them instead. Thus, tools that help doctors make decisions based on large amounts of monitoring and clinical data are badly needed.
In the past, score-based models, such as the Simplified Acute Physiology Score II (SAPS II) and the Acute Physiology and Chronic Health Evaluation II (APACHE II), were commonly used to evaluate patients by predicting mortality risk (3,4). When applied to larger populations, the diagnostic performance of score-based models is relatively poor (1,2,5-8). Recently, methods based on artificial intelligence (AI), including conventional machine learning (ML) methods and deep learning methods, have been applied to support doctors' decision-making by predicting patients' mortality risk (9-11). Compared with statistical score-based models, AI-based methods usually had better model performance, which may be because AI methods can handle complex non-linear relationships between variables and patients' outcomes.
However, the studies cited above have some limitations. One of the biggest problems is that repeatedly measured variables such as vital signs were replaced by the worst record within a period or by summary statistics (maximum, minimum, or others) when predicting the mortality risk. In ICU practice, the overall trend of different physiological variables and the coupling of changes between them may provide prognostic information, which would also help to improve the accuracy of a prediction model (12). The ideal tool to support doctors' decision-making requires optimal use of all the associated routine variables, especially time series data, and should realize dynamic prediction. However, owing to the complexity of time series data, studies on dynamic prediction using temporal clinical data are limited.
The challenges of predicting mortality risk in the ICU were summarized by Ikaro et al (12). First, the measurements in the time series vary from patient to patient, and the time intervals are irregular. Second, the choice of measurements and the trends of the time series data correlate with each other. As for time series deep learning models, the Long Short-Term Memory (LSTM) (13) and its derivative, the Gated Recurrent Unit (GRU) (14), have been used to predict the mortality risk of ICU patients and achieved a better area under the receiver operating characteristic curve (AUCROC) and area under the precision-recall curve (AUC-PR) than conventional score-based models. However, because data are processed sequentially over time, the LSTM and GRU have the shortcomings of heavy computation, long running time, high hardware requirements, and vanishing gradients, which make it difficult to deal with big data and to popularize these models in clinical practice. It is widely accepted that deep learning has the shortcomings of low interpretability and heavy computation. The attention mechanism simulates the data processing of the human brain and has been combined with LSTM or other deep learning methods to improve computational efficiency or interpretability (7,15,16). However, the inefficiency, particularly when processing long sequences, still existed because of the characteristics of the method itself. A better deep learning method that overcomes the current limitations is seriously needed. Recently, a new deep learning method, the temporal convolution network (TCN), was developed, with the characteristics of parallelism, fixed gradient, and a smaller memory requirement for training. Furthermore, Bai et al (17) reported that the TCN performs even better than LSTM or GRU. Developing an attention-based TCN model may not only improve interpretability and reduce computational complexity, but also extend clinical use owing to its higher efficiency for long sequences.
Therefore, we aimed to develop an attention-based TCN model to predict the in-hospital mortality risk 48h after admission in ICUs with time series data and compare the model performances with conventional ML methods, namely random forest (RF), logistic regression (LR), decision tree and support vector machine (SVM).

Ethics and data extraction
Ethical approval for this study was granted after the author Yu-wen Chen completed the National Institutes of Health (NIH) web-based training course "Protecting Human Research Participants" (Certification Number: 28341490). Data used for the prediction of mortality risk were extracted from the Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC) database. All data in the database had been treated with data masking to protect patients' privacy, so written informed consent was not required for the current study. The database contains 61532 records of ICU stays in Beth Israel Deaconess Medical Center ICUs, including clinical notes, physiological waveforms, laboratory measurements, and nurse-verified numerical data (18). The exclusion criteria were as follows: any hospital admission with multiple ICU stays or transfers between different ICUs or wards, which was excluded to reduce the ambiguity of outcomes associated with hospital admissions rather than ICU stays; patients younger than 16; patients whose initial ICU stay was missing or shorter than 48 hours; and ICU stays with no events in the initial 48 hours. A total of 18094 patients remained for final analysis. As shown in Fig.1, to avoid overfitting, we split the dataset into a training set (15331 patients, 17917 ICU stays) and a testing set (2763 patients, 3222 ICU stays). Five-fold cross-validation was performed on the training set to determine the model parameters. We obtained the best model parameters after cross-validation on the training set and then scored the model on the testing set.
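The partition described above can be illustrated with a minimal sketch. The text does not state how stays were assigned to the two sets, so splitting by patient (so that no patient appears in both sets) is an assumption, as are the function names `split_by_patient` and `k_fold_indices`, the seed, and the default test fraction of 0.15, which is only an approximation of 2763/18094.

```python
import random


def split_by_patient(stay_to_patient, test_fraction=0.15, seed=0):
    """Split ICU stays into train/test so that each patient is in one set only.

    stay_to_patient maps a stay identifier to a patient identifier.
    Both the patient-level grouping and the fraction are assumptions.
    """
    patients = sorted(set(stay_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train = [s for s, p in stay_to_patient.items() if p not in test_patients]
    test = [s for s, p in stay_to_patient.items() if p in test_patients]
    return train, test


def k_fold_indices(n_items, k=5):
    """Return k disjoint lists of indices that together cover range(n_items)."""
    return [list(range(i, n_items, k)) for i in range(k)]
```

In such a scheme, each of the five folds would serve once as the validation part while the remaining four are used for fitting, and the testing set is touched only once at the end.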

Data preprocessing
We used 17 physiologic variables (shown in Tab.1), a subset of those in the Physionet/CinC Challenge 2012 (12). Up to 17 variables were recorded at least once during the first 48 hours after admission; not all variables were available in all cases. We used all raw values for time series measurements included in the score. For the Glasgow Coma Scale (GCS), we included GCS (verbal response), GCS (motor response), GCS (eye opening), and GCS (total) as different features. The remaining variables were weight, height, temperature, respiratory rate (RR), heart rate (HR), diastolic blood pressure (DBP), mean blood pressure (MBP), systolic blood pressure (SBP), fraction of inspired oxygen (FiO2), oxygen saturation (OS), pH, glucose, and capillary refill rate (CRR). A value more than three standard deviations away from the individual mean value was removed.
Twelve of the variables were continuous and five were discrete. All time series variables were resampled at hourly intervals starting from ICU admission. When a continuous variable was missing at a time point, we filled it with the nearest neighbor value. When a variable had no recorded data during the observation time, we assumed that the nurse did not measure it and that it was considered normal, so we filled the data with the normal value of that variable. For discrete variables, we performed one-hot encoding. For continuous variables, we performed Z-score normalization to scale the feature values. Each patient's record was summarized into a 59×48 data matrix covering the 48-hour observation period as the input for deep learning.
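The preprocessing steps above can be sketched for a single variable as follows. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the mean and standard deviation for outlier removal are taken over the values passed in, the latest event is kept when several fall in the same hour, ties between equally distant neighbors are resolved in favor of the earlier hour, and the normalization statistics are assumed to come from the training set.

```python
import statistics


def drop_outliers(values, n_sd=3.0):
    """Remove values more than n_sd standard deviations from the mean of `values`."""
    if len(values) < 2:
        return list(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


def resample_hourly(events, hours=48):
    """Place (hours_since_admission, value) events into hourly bins.

    Empty bins are marked with None. If several events fall into one bin,
    the latest one is kept (an assumption).
    """
    bins = [None] * hours
    for t, value in sorted(events):
        if 0 <= t < hours:
            bins[int(t)] = value
    return bins


def fill_nearest(bins, normal_value):
    """Fill gaps with the nearest observed value; use normal_value if none exists."""
    observed = [i for i, v in enumerate(bins) if v is not None]
    if not observed:
        return [normal_value] * len(bins)
    filled = []
    for i, v in enumerate(bins):
        if v is None:
            # Ties between equally distant hours go to the earlier hour (assumption).
            nearest = min(observed, key=lambda j: (abs(j - i), j))
            v = bins[nearest]
        filled.append(v)
    return filled


def zscore(values, mean, std):
    """Z-score normalization with externally supplied statistics."""
    return [(v - mean) / std for v in values]


def one_hot(value, categories):
    """One-hot encode a discrete value against a fixed list of categories."""
    return [1 if value == c else 0 for c in categories]
```

Applying these steps to every variable and stacking the resulting rows would give one matrix per ICU stay with 48 hourly columns.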

Model construction for Attention-based TCN
In this work, we developed an attention-based TCN model to predict the mortality risk of ICU patients with time series and static data. The TCN is a convolutional network composed of causal convolutions, dilated convolutions, and residual connections. The causal convolution makes the TCN a strictly temporal model: during training, the status at time t is predicted using only data from time t and earlier in the previous layer. Through the dilated convolution, the TCN allows the input of the convolution to be sampled at intervals, which broadens the receptive field (making the most of the information). The residual connections enable the network to transmit information across layers and are commonly used to train deep networks. In addition, the TCN adds dropout after each dilated convolution in the residual module to achieve regularization. An attention mechanism was introduced into the TCN model to improve efficiency and interpretability.
The structure of the attention-based TCN model is shown in Fig.2. Patients' raw data were preprocessed into a data flow for model input. The TCN (17) was directly applied to process the ICU patients' temporal data. The network followed the basic structure in the literature (17) without further structural optimization. Since the number of kernels was 3 and the number of attributes for the patients was 59, the number of stacked temporal convolutional attention layers was set to 7. With 7 layers, the receptive field of the network exactly covered all the patients' input data. The patients' vital signs data were processed by the 7-level TCN and then passed to the attention layer; finally, the mortality risk was predicted by a linear layer.
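To make the causal and dilated convolution concrete, the following minimal sketch computes one output channel for a one-dimensional series. It is a simplified illustration rather than the implementation used in this work: the function name is hypothetical, values before the start of the series are treated as zero (left padding), and residual connections and dropout are omitted.

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """Causal dilated convolution of a 1-D sequence.

    y[t] = sum_j weights[j] * x[t - j * dilation], where positions before
    the start of the sequence contribute zero. The output at time t
    therefore never depends on inputs later than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y
```

With a dilation of 2 and three weights, each output combines the inputs at times t, t-2, and t-4, which is how dilation widens the span of history seen by one layer without adding weights.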
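How far back a stack of dilated causal convolutions can see may be checked with a short calculation. The sketch below assumes that the value 3 above refers to the kernel size and that the dilation doubles at each level (1, 2, 4, ...), as in the generic TCN of (17); neither assumption is stated explicitly in this section. The function name and the `convs_per_level` parameter are hypothetical.

```python
def receptive_field(kernel_size, dilations, convs_per_level=1):
    """Number of input time steps that can influence a single output step.

    Each causal convolution with dilation d extends the history by
    (kernel_size - 1) * d steps; convs_per_level counts the convolutions
    that share one dilation (for example, two per residual block).
    """
    return 1 + convs_per_level * (kernel_size - 1) * sum(dilations)


# Assumed doubling schedule for seven levels: 1, 2, 4, ..., 64.
seven_level_dilations = [2 ** i for i in range(7)]
```

Under these assumptions, `receptive_field(3, seven_level_dilations)` evaluates to 255 time steps, which is larger than the 48 hourly steps of one input record; a different kernel size or dilation schedule would change this figure.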

Non-time series model construction
We also predicted the mortality risk with non-time series ML methods such as RF (19), LR, decision tree (DT), and SVM. Owing to the limitations of these ML methods, the input for these models was not the time series data but the results of feature extraction (summary statistics such as the minimum, maximum, and average of each variable).
Model performance was assessed by overall performance, discrimination, and calibration. Overall performance was determined by the F1 score, defined as the harmonic mean of precision and recall, which weights the two equally. Discrimination is the ability to distinguish between those who survive and those who do not 48h after ICU admission, and was measured by the AUCROC and the area under the precision-recall curve (AUC-PR). The AUC-PR is sensitive to an imbalanced distribution of negative and positive data, especially when positive data make up an extremely small portion.
Calibration was assessed by the Brier score, calculated as the mean squared deviation between the predicted probability and the actual outcome.
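The feature extraction for the non-time series models can be sketched as below. The exact set of statistics is not listed beyond the minimum, maximum, and average, so only these three are computed; the function names and the `variable_statistic` naming of the output features are hypothetical.

```python
import statistics


def summarize_series(series):
    """Collapse one time series into its minimum, maximum, and mean."""
    return {
        "min": min(series),
        "max": max(series),
        "mean": statistics.fmean(series),
    }


def flatten_features(record):
    """Turn {variable: series} into a flat {variable_statistic: value} mapping."""
    features = {}
    for name, series in record.items():
        for stat, value in summarize_series(series).items():
            features[f"{name}_{stat}"] = value
    return features
```

The resulting flat mapping has a fixed length per patient regardless of how many measurements were taken, which is what models such as LR or RF require, at the cost of discarding the order of the measurements.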
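For reference, the measures named above can be computed from labels and predictions as in the following sketch, where 1 denotes death and 0 denotes survival. The function names are hypothetical, the AUCROC is computed by direct pairwise comparison (suitable only for small samples), and the AUC-PR is omitted for brevity.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(y_true, y_pred):
    _, fp, _, tn = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else 0.0


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A lower Brier score indicates better calibration, whereas higher values are better for the other measures.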

Statistical analysis
The statistical analyses were carried out using SPSS software for Windows, V.19.0 (SPSS). Quantitative variables are presented using basic descriptive statistics: mean and SD (for normal distribution data), or median and IQR (for non-normal distribution data).
Comparisons among datasets were performed using the chi-square test, Fisher's exact test, or the Kruskal-Wallis test. All statistical tests were two-sided, and P values less than 0.05 indicated statistical significance.

Data distribution
In total, 18094 patients were included in the analysis. Patient demographics and characteristics of the three datasets are presented in Tab.3. There were no statistically significant differences in age, gender, or ICU length of stay among them. The mortality rate of our cohort was 15.4%. Although the mortality rate of patients in the testing dataset was significantly lower than that in the training dataset, it was similar to that of the whole cohort.

Model performance of time series and non-time series models
We evaluated the new model in three respects. First, we compared the attention-based TCN with traditional score-based methods; second, with models that do not use time series data; and finally, with LSTM, which uses time series data. The purpose of the comparison with traditional ML models was not to pit complex models against simple ones, but to show that models based on patient time series data improve prediction accuracy compared with models that do not use time series data. As shown in Tab.4 and Fig.3 A, compared with the statistical methods, AI methods had larger AUCROC and AUC-PR, which indicated a better capacity for discrimination. Although the AUCROC and AUC-PR of the attention-based TCN were smaller than those of the non-time series ML methods, it still had an acceptable ability of discrimination. Furthermore, compared with the non-time series ML methods, the attention-based TCN had the highest sensitivity (67.1%) and F1 score (0.46). Models with high specificity but lower sensitivity miss patients potentially at risk, which would defeat our initial purpose of helping doctors dynamically evaluate the mortality risk of patients. As for other time series methods, the sensitivity of the attention-based TCN was much higher than that of the LSTM model (46.1%) based on the same database (7), with only a small difference in AUC-PR between them. This indicated that models developed with the attention-based TCN had higher accuracy and a lower missed-diagnosis rate than those with LSTM, which may be related to the difference in input variables.
As for model calibration, the Brier score of the attention-based TCN was higher than that of the other conventional ML models, which may be associated with the high dimensionality of the time series data. Taking the purpose and clinical application into consideration, and given its high sensitivity, F1 score, and relatively satisfactory discrimination, the attention-based TCN performed best among the methods listed in Tab.4.

Visualization of attention weights at different time points
By visualizing the attention weights, we could clearly see which variables and time points the model attended to when predicting the risk of death. Typical heatmaps of the attention weights for non-survival and survival patients are shown in Fig.3 B and C. The larger colored area in the heatmap of the non-survival patient suggested that the patient was less stable. The values of the variables at the time points represented by these colored areas contributed more than other values to the prediction of the patient's death. The time point at which most of the variables were colored may correspond to a rescue event in clinical reality. In addition to good model performance, the attention-based TCN method may also have the potential advantage of better interpretability.
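The kind of weights shown in such heatmaps can be illustrated with a generic attention pooling step. The exact form of the attention layer is not specified in this paper, so the sketch below is an assumption: one score per time step is normalized with a softmax, and the weights are used to average the hidden vectors over time. The heatmaps in Fig.3 span both variables and time points, whereas this simplified version covers time steps only; the function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, scores):
    """Weight hidden vectors (one per time step) by softmax-normalized scores.

    Returns (weights, context): the weights are what a heatmap would display,
    and context is the weighted sum passed on to the output layer.
    """
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, context
```

Because the weights are positive and sum to 1, a time step with a larger weight contributes proportionally more to the pooled representation, which is the basis for reading the heatmap as an indication of what the model attended to.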

Discussion
There are several score-based models for predicting mortality risk, such as SAPS (3), APACHE (20), OASIS (21), and the Sequential Organ Failure Assessment (SOFA) (22). These models are all non-time series models based on statistical methods. Their input data are static or summary data, such as comorbidities and the minimum systolic pressure in the first 24h, which makes it impossible to predict the mortality risk within the first 24h or to update the data for predicting long-term mortality risk. Although the AUCROCs of the score-based models were satisfactory, either the sensitivity or the specificity was poor (23,24). It is not surprising that these models have been modified several times to improve their predictive performance since they were first published (25). Recently, to capture the complex, non-linear relationship between clinical variables and the outcome, non-time series AI methods, such as artificial neural network (ANN), SVM, DT, RF, Naive Bayes, projective adaptive resonance theory (PART), and AutoTriage, have been used to predict the mortality risk of patients in ICUs (5,11,24,26,27) with relatively satisfactory model performance. However, because these are non-time series methods, all the variables are static or extracted from time series data, which makes it impossible to realize dynamic prediction. Herein, the AUCROC and AUC-PR of the attention-based TCN model were larger than those of conventional score-based models in the same database according to Harutyunyan et al's study (8). Unfortunately, Harutyunyan et al did not report the sensitivity and specificity of the conventional models. Despite the slight difference in AUCROC and AUC-PR between the attention-based TCN and the other non-time series ML methods, the sensitivity of the attention-based TCN was much higher than that of the others. In clinical work, when making decisions, doctors should take medical history, physical examination, and the trend of vital signs into consideration.
The ideal model for predicting mortality risk takes both time series and static clinical data into consideration and simultaneously realizes dynamic prediction. Furthermore, because of the unstable status of ICU patients, sensitivity seems more important than specificity, since missing patients potentially at risk might be fatal. In brief, the attention-based TCN method was preferable to non-time series methods in predicting the mortality risk of ICU patients. In addition, Hao et al (28) applied attention-based TCN to language models, resulting in a significant improvement in model performance, which suggests that attention-based TCN is a promising method for sequence modeling.
Recently, Yu et al (7), Harutyunyan et al (8), and Song et al (16) combined two AI methods (including one time series method) to predict the mortality risk of ICU patients, with large AUCROCs and AUC-PRs but lower sensitivity (the variables and sensitivity were not presented in Harutyunyan's study). Besides the low sensitivity, these studies had other shortcomings. First, Yu et al's and Harutyunyan's methods were based on LSTM, which processes time series data sequentially from beginning to end, whereas TCN can process data in parallel through the causal convolutions in its architecture (17). Given the limitations of LSTM, attention-based TCN methods would be better suited to data of higher dimension and volume and would require less hardware, making them more appropriate for clinical extension. Second, Yu et al's study included the vital signs HR, SBP, and temperature, whereas ours included RR, HR, DBP, MBP, SBP, and temperature. MBP and DBP are now widely accepted as important predictors for ICU patients (29-31), so it may be insufficient to predict the mortality risk without them. Moreover, some of the variables in Yu et al's study, such as urinary output, are sums or means of clinical data over a set period and have a longer acquisition interval than vital signs. The vital signs in our study were more reasonable and easier to obtain than those in Yu et al's, and more frequently collected variables would help more with dynamic prediction. Third, Harutyunyan et al's and Song et al's studies focused on the algorithms, and the clinical value was somewhat overlooked. Fourth, in these three studies the attention mechanism was mainly intended to improve computing efficiency rather than interpretability. Compared with time series methods combined with other AI methods for predicting the mortality risk of ICU patients, our attention-based TCN method also had the advantages of higher efficiency, better interpretability, and easier promotion.
As shown in Fig.4, we drew a diagram for the clinical use of the attention-based TCN method in predicting the mortality risk of ICU patients. For a new critical patient, the patient's baseline information and monitoring data are fed into the attention-based TCN model as a data flow after automatic data preprocessing. The mortality risk is then predicted at different time points according to the patient's specific condition (here we predict the mortality risk 48h after ICU admission). If the estimated mortality risk is high, the patient will receive intensive monitoring and intensive treatment; if the estimated mortality risk is low, the patient will receive intensive monitoring and routine treatment. In brief, the whole process is Warning → Intervention → Warning → Intervention → … → Patient outcome.
There are several limitations in this study. First, although the variables in our study were routine and most of them were time series, some other routine and frequently collected variables, such as lactic acid and the results of arterial blood gas analysis, should be included to help improve prediction accuracy. Second, the clinical data were extracted from a single medical center, so the generalization ability of the model and its feasibility for clinical application have not been validated. Prospective multi-center studies should be carried out to investigate the clinical value of combining TCN with an attention mechanism to predict patients' mortality risk using temporal clinical data.

Conclusion
Attention-based TCN methods achieved better performance in predicting mortality risk with time series data than non-time series models, which suggests that they have the potential to support decision-making in the ICU through dynamic prediction of mortality risk with a continuous data flow.

Availability of data and materials
The data that support the findings of this study are available from the MIMIC III dataset, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available.

Tables
Please see the supplementary files section to view the tables.

Figure 1 Data partition and verification.
Figure 2 The structure of the attention-based TCN model for prediction of mortality risk in ICU.
