Dynamic prediction of in-hospital mortality risk in the intensive care unit with a new deep learning method of artificial intelligence

Abstract

Background
Dynamic prediction of patients' mortality risk in the ICU with time series data is limited by high dimensionality, uncertain sampling intervals, and other issues. A new deep learning method, the temporal convolution network (TCN), makes it possible to deal with complex clinical time series data in the ICU. We aimed to develop and validate a TCN-based model to predict mortality risk using time series data from the MIMIC III dataset.

Methods
In total, 21139 records of ICU stays were analyzed, and 17 physiological variables from the MIMIC III dataset were used to predict mortality risk. We then compared the performance of the attention-based TCN with that of traditional artificial intelligence (AI) methods.

Results
The area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUC-PR) of the attention-based TCN for predicting mortality risk 48 h after ICU admission were 0.837 (0.824-0.850) and 0.454, respectively. The sensitivity and specificity of the attention-based TCN were 67.1% and 82.6%, whereas the traditional AI methods yielded low sensitivity (< 50%).

Conclusions
The attention-based TCN model achieved better performance in predicting mortality risk from time series data than traditional AI methods and conventional score-based models. It has the potential to support decision-making for critically ill patients.

Background
The in-hospital mortality of patients in the intensive care unit (ICU) is relatively high, ranging from 6.7% to 44.0% worldwide (1,2). With the development of critical care medicine, ever larger amounts of data help doctors make decisions, but they can also overwhelm them. Tools that help doctors make decisions based on large amounts of monitoring and clinical data are therefore badly needed.
In the past, score-based models, such as the Simplified Acute Physiology Score II (SAPS II) and the Acute Physiology and Chronic Health Evaluation II (APACHE II), were commonly used to evaluate patients by predicting mortality risk (3,4). When applied to large populations, the predictive performance of score-based models is relatively poor (1, 2, 5-8). Recently, methods based on artificial intelligence (AI), including conventional machine learning (ML) methods and deep learning methods, have been applied to support doctors' decision-making by predicting patients' mortality risk (9)(10)(11). Compared with statistical score-based models, AI-based methods usually perform better, which may be because they can handle complex non-linear relationships between variables and patient outcomes. However, these studies have limitations. One of the biggest is that repeatedly measured variables such as vital signs were replaced by the worst record within a period or by summary statistics (maximum, minimum, or others) when predicting mortality risk. In the ICU, the overall trend of physiological variables and the coupling of changes between them may provide prognostic information that would improve the accuracy of a prediction model (12). The ideal decision-support tool would make optimal use of all routinely collected variables, especially time series data, and would deliver dynamic prediction. However, because time series data are complex, studies of dynamic prediction using temporal clinical data are limited.
The challenges of predicting mortality risk in the ICU were summarized by Ikaro et al (12). First, the measurements in the time series vary from patient to patient, and the sampling intervals are irregular. Second, the choice of measurements and the trends of the time series are correlated with each other. As time series deep learning models, the Long Short-Term Memory (LSTM) network (13) and its derivative, the Gated Recurrent Unit (GRU) (14), have been used to predict the mortality risk of ICU patients and achieved a better area under the receiver operating characteristic curve (AUCROC) and area under the precision-recall curve (AUC-PR) than conventional score-based models. However, because they process data sequentially over time, LSTM and GRU suffer from heavy computation, long training time, high hardware requirements, and vanishing gradients, which make it difficult to handle big data and to deploy them widely in clinical practice. It is widely accepted that deep learning has the shortcomings of low interpretability and heavy computation. The attention mechanism simulates how the human brain processes data and has been combined with LSTM and other deep learning methods to improve computational efficiency or interpretability (7,15,16). However, inefficiency, particularly when processing long sequences, remains because of the characteristics of the method itself. A better deep learning method that overcomes these limitations is badly needed. Recently, a new deep learning method, the temporal convolution network (TCN), was developed, with the characteristics of parallelism, fixed gradients, and a smaller memory requirement for training. Furthermore, Bai et al (17) reported that the TCN can outperform LSTM and GRU. Developing an attention-based TCN model may not only improve interpretability and reduce computational complexity, but also extend clinical use because of its higher efficiency on long sequences.
Therefore, we aimed to develop an attention-based TCN model to predict in-hospital mortality risk 48 h after ICU admission from time series data and to compare its performance with that of conventional ML methods, namely random forest (RF), logistic regression (LR), decision tree, and support vector machine (SVM).

Methods
Ethics and data extraction
Ethical approval for this study was granted after the author Yu-wen Chen completed the National Institutes of Health (NIH) web-based training course "Protecting Human Research Participants" (Certification Number: 28341490). Data used for the prediction of mortality risk were extracted from the Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC) database. All data in the database had been treated with data masking to protect patients' privacy, so written informed consent was not required for the current study. The database contains 61532 records of ICU stays in Beth Israel Deaconess Medical Center ICUs, including clinical notes, physiological waveforms, laboratory measurements, and nurse-verified numerical data (18). The exclusion criteria were as follows: any hospital admission with multiple ICU stays or transfers between different ICU units or wards (excluding these reduces the ambiguity of outcomes associated with hospital admissions rather than ICU stays); patients younger than 16; patients whose initial ICU stay was missing or shorter than 48 hours; and ICU stays with no events in the initial 48 hours.
As shown in Fig. 1A, 18094 patients were included in the final analysis. We randomly divided the enrolled patients into three datasets: a training dataset (12565 patients), a validation dataset (2766 patients), and a testing dataset (2763 patients).

Data preprocessing
We used 17 physiological variables (shown in Table 1), a subset of those in the PhysioNet/CinC Challenge 2012 (12). Up to 17 variables were recorded at least once during the first 48 hours after admission; not all variables were available in all cases. We used all raw values of the time series measurements included in the score. For the Glasgow Coma Scale (GCS), we included GCS (verbal response), GCS (motor response), GCS (eye opening), and GCS (total) as separate features. The remaining variables were weight, height, temperature, respiratory rate (RR), heart rate (HR), diastolic blood pressure (DBP), mean blood pressure (MBP), systolic blood pressure (SBP), fraction of inspired oxygen (FiO2), oxygen saturation (OS), pH, glucose, and capillary refill rate (CRR). Twelve of the variables were continuous and five were discrete. A value more than three standard deviations away from the individual patient's mean for that variable was removed. All time series variables were resampled to an hourly rate starting from ICU admission. Forward imputation was applied to missing values. When a variable was entirely missing during the observation period, the mean value from the dataset was used. Discrete variables were one-hot encoded, and continuous variables were scaled by Z-score normalization. Each patient's record was summarized into a 59 × 48 data matrix covering the 48-hour observation period, which served as the input for deep learning.
The TCN combines causal convolution, dilated convolution, and residual connections (Fig. 1-3). The causal convolution makes the TCN a strictly temporal model: during training, the status at time t is predicted using only data from time t and earlier. Dilated convolution lets the convolution input be sampled at intervals, which broadens the receptive field so that the model makes the most of the available information. Residual connections, which are commonly used to train deep networks, enable the network to pass information across layers.
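The causal convolution, dilated convolution, and residual connection described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the filter taps in `weights`, and the choice of one convolution per residual block are assumptions made for illustration, and dropout and learned parameters are omitted.

```python
def causal_dilated_conv(x, weights, dilation=1):
    """1-D causal convolution: y[t] uses only x[t], x[t-d], x[t-2d], ...

    x:       list of floats (one variable over time)
    weights: filter taps (hypothetical values); weights[k] multiplies x[t - k*d]
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # positions before the start count as zero padding
                acc += w * x[idx]
        y.append(acc)
    return y


def residual_block(x, weights, dilation):
    """Add ReLU(conv(x)) back onto x through the residual (skip) connection."""
    conv = causal_dilated_conv(x, weights, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


def receptive_field(kernel_size, dilations):
    """Time steps visible to one output, assuming one convolution per level."""
    return 1 + (kernel_size - 1) * sum(dilations)


# With two taps and dilation 2, each output is x[t] + x[t-2].
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(causal_dilated_conv(signal, [1.0, 1.0], dilation=2))
# Doubling the dilation at each level widens the view quickly.
print(receptive_field(3, [1, 2, 4, 8, 16]))
```

Because no output position reads a later input, changing a value at hour t cannot alter any prediction made before hour t, which is what makes the model usable for dynamic prediction.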
In addition, the TCN applies dropout after each dilated convolution in the residual module for regularization. An attention mechanism was introduced into the TCN model to improve efficiency and interpretability. Detailed information about the TCN and the attention mechanism is presented in the supplementary method.
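The exact attention design is given only in the supplementary method, so the following is a sketch of one common form, softmax attention over time steps, and may differ from what the authors used. The names `attention_pool`, `hidden`, and `scores` are hypothetical; in a trained model the scores would come from learned parameters.

```python
import math


def attention_pool(hidden, scores):
    """Softmax-weighted sum of per-hour hidden vectors.

    hidden: list of T vectors (lists of floats), one per hour
    scores: list of T raw attention scores (assumed to be produced elsewhere)
    Returns the pooled context vector and the attention weights.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    context = [sum(w * h[j] for w, h in zip(weights, hidden)) for j in range(dim)]
    return context, weights


context, weights = attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
print(context, weights)
```

The weights sum to one, so they can be read as the share of the prediction attributed to each hour, which is one way such a mechanism can support interpretability.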
The structure of attention-based TCN model was shown in Fig
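Before reaching the model, each variable passes through the steps listed under Data preprocessing. A minimal sketch for a single variable follows, assuming that the last value within an hour represents that hour and that hours before the first measurement fall back to the dataset mean; neither choice is stated in the text. The function names are hypothetical, and a real pipeline would more likely use a data-frame library.

```python
import math
from statistics import fmean, pstdev


def preprocess_series(times_h, values, dataset_mean, dataset_std, hours=48):
    """Turn one irregularly sampled variable into `hours` hourly z-scores.

    times_h: measurement times in hours since ICU admission
    values:  measured values, same length as times_h
    """
    # 1. Drop values more than three SDs from this patient's own mean.
    if len(values) > 1:
        mu, sd = fmean(values), pstdev(values)
        kept = [(t, v) for t, v in zip(times_h, values) if abs(v - mu) <= 3 * sd]
    else:
        kept = list(zip(times_h, values))

    # 2. Resample to hourly bins (assumption: the last value in an hour wins).
    hourly = [None] * hours
    for t, v in sorted(kept):
        h = math.floor(t)
        if 0 <= h < hours:
            hourly[h] = v

    # 3. Forward imputation; hours with no earlier value, including a fully
    #    missing variable, use the dataset mean.
    last = dataset_mean
    filled = []
    for v in hourly:
        if v is not None:
            last = v
        filled.append(last)

    # 4. Z-score normalization with dataset-level statistics.
    return [(v - dataset_mean) / dataset_std for v in filled]


def one_hot(value, categories):
    """One-hot encode a discrete value such as a GCS component."""
    return [1.0 if value == c else 0.0 for c in categories]


row = preprocess_series([0.5, 1.2, 3.7], [80.0, 90.0, 100.0], 85.0, 10.0)
print(len(row), row[:4])
```

Stacking one such row per feature gives a features-by-hours matrix of the kind used as model input.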

Non-time series model construction
We also predicted mortality risk with non-time series ML methods, namely RF (19), LR, decision tree (DT), and SVM. Because these ML methods cannot take time series as input, their input was the result of feature extraction (summary statistics such as the minimum, maximum, and average of each variable). The preprocessed data were then used for model construction and evaluation.
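The feature extraction step for the non-time series models can be sketched as below. The text names the minimum, maximum, and average as examples, so the sketch uses exactly those three; the full feature set used in the study is not specified, and the function names are hypothetical.

```python
from statistics import fmean


def summarize(series):
    """Collapse one variable's hourly values into static summary features."""
    return {"min": min(series), "max": max(series), "mean": fmean(series)}


def feature_vector(stay):
    """Build a flat feature list from a stay.

    stay: dict mapping variable name -> list of hourly values
    """
    features = []
    for name in sorted(stay):  # fixed column order across patients
        s = summarize(stay[name])
        features.extend([s["min"], s["max"], s["mean"]])
    return features


print(feature_vector({"hr": [80, 90, 100], "sbp": [120, 110]}))
```

A vector built this way discards the ordering of the measurements, which is why such models cannot use trends over time.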

Model evaluation
Model performance was assessed in terms of overall performance, discrimination, and calibration. Overall performance was measured by the F1 score, defined as the harmonic mean of precision and recall, which weights the two equally. Discrimination, the ability to distinguish between those who survive and those who do not 48 h after ICU admission, was measured by the AUCROC and the area under the precision-recall curve (AUC-PR). The AUC-PR is sensitive to imbalance between negative and positive cases, especially when positive cases make up a very small portion of the data. Calibration was assessed by the Brier score, the average squared deviation between the predicted probability and the actual outcome.
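For concreteness, the metrics named above can be computed as in the following dependency-free sketch. The function names and the 0.5 default threshold are assumptions for illustration; the study's decision threshold is not stated. The AUCROC is computed here through its pairwise-ranking interpretation rather than by integrating a curve.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    return num / den if den else 0.0


def classification_metrics(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # recall is the same quantity as sensitivity
    specificity = _ratio(tn, tn + fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "sensitivity": recall,
            "specificity": specificity, "f1": f1}


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.4, 0.6, 0.2, 0.1]
print(classification_metrics(labels, probs), brier_score(labels, probs), auroc(labels, probs))
```

A lower Brier score indicates better calibration, whereas higher values are better for the other metrics.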

Statistical analysis
The statistical analyses were performed using SPSS software for Windows, V. 19.0 (SPSS). Quantitative variables are presented using basic descriptive statistics: mean and SD (for normally distributed data) or median and IQR (for non-normally distributed data). Comparisons among datasets were performed using the chi-square test, Fisher's exact test, or the Kruskal-Wallis test. All statistical tests were two-sided, and P values less than 0.05 indicated statistical significance.

Data distribution
In total, 18094 patients were included in the analysis. Patient demographics and characteristics of the three datasets are presented in Table 2. There were no statistically significant differences in age, gender, or ICU length of stay among them. The mortality rate of our cohort was 15.4%. Although the mortality rate in the testing dataset was significantly lower than in the training and validation datasets, it was similar to that of the whole cohort.

Model performance of time series and non-time series models
As shown in Table 3 and Fig. 2A, compared with the statistical methods, the AI methods had larger AUCROC and AUC-PR, indicating better discrimination. Although the AUCROC and AUC-PR of the attention-based TCN were smaller than those of the non-time series ML methods, its discrimination was still acceptable. Furthermore, compared with the non-time series ML methods, the attention-based TCN had the highest sensitivity (67.1%) and F1 score (0.46). Models with high specificity but low sensitivity miss patients who are potentially at risk, which would defeat our initial purpose of helping doctors dynamically evaluate patients' mortality risk. As for other time series methods, the sensitivity of the attention-based TCN was much higher than that of an LSTM model (46.1%) built on the same database (7), with only a small difference in AUC-PR between them. This indicates that models developed with the attention-based TCN had higher accuracy and a lower missed-diagnosis rate than those developed with LSTM, which may be related to differences in the input variables. As for model calibration, the Brier score of the attention-based TCN was higher than that of the other conventional ML models, which may be associated with the high dimensionality of time series data.
Taking the purpose and clinical application into consideration, the attention-based TCN performed best among the methods listed in Table 3, owing to its high sensitivity and F1 score and its relatively satisfactory discrimination. In addition to good model performance, attention-based TCN methods may also have the potential advantage of better interpretability.
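Because the F1 score is the harmonic mean of precision and recall, the reported sensitivity (67.1%) and F1 score (0.46) together imply a precision. The sketch below works this out, assuming both figures were computed at the same decision threshold on the same test set, which the text does not state explicitly. The function name is hypothetical.

```python
def precision_from_f1(f1, recall):
    """Solve F1 = 2PR / (P + R) for the precision P."""
    return f1 * recall / (2 * recall - f1)


implied = precision_from_f1(0.46, 0.671)
print(round(implied, 2))  # about 0.35
```

Under that assumption, roughly one in three patients flagged as high risk would die in hospital, which is the price paid for the higher sensitivity.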

Discussion
There are several score-based models for predicting mortality risk, such as SAPS (3), APACHE (20), OASIS (21), and the Sequential Organ Failure Assessment (SOFA) (22). These models are all non-time series models based on statistical methods. Their inputs are static or summary data, such as comorbidities and the minimum systolic pressure in the first 24 h, which makes it impossible to predict mortality risk within the first 24 h or to update the data for predicting long-term mortality risk. Although the AUCROCs of the score-based models were satisfactory, either the sensitivity or the specificity was poor (23,24). It is not surprising that these models have been modified several times to improve their predictive performance since they were first published (25). Recently, to capture the complex, non-linear relationship between clinical variables and outcome, non-time series AI methods, such as artificial neural networks (ANN), SVM, DT, RF, Naive Bayes, projective adaptive resonance theory (PART), and AutoTriage, have been used to predict the mortality risk of ICU patients (5,11,24,26,27) with relatively satisfactory model performance. However, because these are non-time series methods, all the variables are static or extracted from time series data, which makes dynamic prediction impossible. In our study, the AUCROC and AUC-PR of the attention-based TCN model were larger than those of conventional score-based models on the same database according to the study by Harutyunyan et al (8). Unfortunately, Harutyunyan et al did not report the sensitivity and specificity of the conventional models. Despite the slight differences in AUCROC and AUC-PR between the attention-based TCN and the non-time series ML methods, the sensitivity of the attention-based TCN was much higher than that of the others. In clinical practice, doctors making decisions should take the medical history, the physical examination, and the trend of vital signs into consideration.
The ideal model for predicting mortality risk takes both time series and static clinical data into consideration and, at the same time, delivers dynamic prediction. Furthermore, because the status of ICU patients is unstable, sensitivity seems more important than specificity, since missing patients who are potentially at risk might be fatal. In brief, the attention-based TCN method was better than non-time series methods at predicting the mortality risk of ICU patients. As shown in Fig. 3, we drew a diagram of the clinical use of attention-based TCN methods for predicting the mortality risk of ICU patients. For a new critical patient, the baseline information and monitoring data are fed into the attention-based TCN model as a data flow after automatic preprocessing. The mortality risk is then predicted at different time points according to the patient's specific condition (here we predict the mortality risk 48 h after ICU admission). If the estimated mortality risk is high, the patient receives intensive monitoring and intensive treatment; if it is low, the patient receives intensive monitoring and routine treatment. Overall, the whole process is Warning → Intervention → Warning → Intervention → …… → Patient outcome.
This study has several limitations. First, although the variables in our study were routine and most of them were time series, other routine and frequently collected variables, such as lactic acid and arterial blood gas results, should be included to improve prediction accuracy. Second, the clinical data were extracted from a single medical center, so the generalizability of the model and the feasibility of its clinical application have not been validated. Prospective multi-center studies should be carried out to investigate the clinical value of combining TCN with an attention mechanism to predict patients' mortality risk using temporal clinical data.
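The warning and intervention cycle described above can be sketched as a loop that re-scores the patient each time a new observation arrives. This is an illustration only: `monitor`, the `predict` callable standing in for the trained model, and the `threshold` value are all hypothetical, and no clinical cut-off is given in the text.

```python
def monitor(stream, predict, threshold):
    """Re-estimate mortality risk as each hourly observation arrives.

    stream:    iterable of hourly observations (the incoming data flow)
    predict:   callable mapping the history so far to a risk in [0, 1];
               a stand-in for the trained model
    threshold: hypothetical cut-off separating high from low risk
    Yields (hour, risk, recommended treatment level).
    """
    history = []
    for hour, observation in enumerate(stream, start=1):
        history.append(observation)
        risk = predict(history)
        action = "intensive treatment" if risk >= threshold else "routine treatment"
        yield hour, risk, action


# Toy stand-in model: the latest observation is taken as the risk itself.
for step in monitor([0.1, 0.7, 0.3], predict=lambda h: h[-1], threshold=0.5):
    print(step)
```

Intensive monitoring continues in both branches, so only the treatment level changes with the estimated risk.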

Conclusion
Attention-based TCN methods achieved better performance in predicting mortality risk from time series data than non-time series models, which suggests that they have potential for supporting decision-making in the ICU through dynamic prediction of mortality risk from a continuous data flow.