Using Gated Recurrent Unit Models for Early Prediction of Sepsis in the Intensive Care Unit

Background: Sepsis is one of the major causes of mortality in hospitalized patients, so a reliable means of predicting its onset is of great importance. The purpose of this study was to develop a gated recurrent unit (GRU) based model and to explore whether it predicts sepsis up to 6 h from the time of admission to the intensive care unit (ICU) better than traditional sepsis prediction methods. Methods: The data used for model development came from the retrospective MIMIC-III dataset, restricted to ICU patients aged between 15 and 89. The performance of the GRU model was compared with that of logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost) models. The area under the receiver operating characteristic curve (AUROC) was used to measure the predictive performance of the models. Results: A total of 31297 MIMIC-III cases were included in this article, of which 4008 had sepsis and 27289 did not. The AUCs of the models were 0.801 (95% CI: 0.760-0.841), 0.782 (95% CI: 0.743-0.821), 0.775 (95% CI: 0.736-0.813), 0.771 (95% CI: 0.732-0.809), and 0.749 (95% CI: 0.711-0.886); the GRU model performed best in predicting sepsis. Conclusions: The present study concluded that a more accurate prediction model can be established by using the GRU deep learning method. The GRU model we built may prove clinically helpful and assist physicians in tailoring management and treatment for patients with sepsis.


Introduction
Millions of patients are admitted to intensive care units (ICUs) around the world every year. Patient health information, including vital signs, laboratory test results, and demographic details, is recorded in ICUs to help medical personnel make life-saving decisions [1]. However, high levels of uncertainty and severe time constraints mean that decisions derived from a tremendous volume of complicated clinical data tend to be inaccurate. Artificial intelligence algorithms can integrate and interpret these clinical data and can handle repeated patient evaluations in real time far more effectively, thereby improving timely and targeted diagnosis [2].
The early prediction of sepsis remains challenging in the ICU setting. Sepsis is a long-standing and life-threatening condition that arises from the patient's response to infection and can lead to tissue damage, organ dysfunction, and even death [3]. It happens when the patient's immune system releases chemicals into the bloodstream to fight infection, which leads to inflammation throughout the body. To this day, sepsis carries high morbidity and mortality and is a major cause of death in the USA. It is also the most expensive condition associated with in-hospital stay, accounting for nearly $24 billion annually [4]. During the second decade of the 21st century, the worldwide incidence of sepsis reached 0.43%, and approximately 6 million deaths every year were caused by sepsis. Furthermore, 4.2 out of every 30 sepsis cases occurred in newborns and children annually [5].
Early prediction and timely antibiotic treatment are crucial for improving sepsis outcomes, because even a short delay in intervention leads to a considerable rise in mortality. However, prognosis of sepsis in its early stages is difficult for medical personnel because of the heterogeneous nature of infectious insults and the diversity of host responses. In 2016, the definition of sepsis was revised in The Third International Consensus Definitions for Sepsis (Sepsis-3) [6] to clarify the state of sepsis and thereby promote earlier prediction; however, this effort has not removed the hurdles to early prediction and treatment of sepsis.
Previous researchers have defined a number of disease scoring systems and diagnostic criteria to detect the emergence of sepsis in hospitalized patients. One of the earliest proposed diagnostic criteria is the Systemic Inflammatory Response Syndrome (SIRS) [6] (Figure 1); SIRS is confirmed when at least two of the four symptoms in Figure 1 are present. Sepsis-3 is a recent redefinition of sepsis that emphasizes the primacy of the non-homeostatic host response to infection, a potential lethality considerably higher than that of an ordinary infection, and the urgent need for recognition. The Sequential Organ Failure Assessment (SOFA) [7] is a scoring system proposed in 1996 as a screening mechanism to record the condition of patients in the ICU and to quantify organ function; Sepsis-3 adopts it to identify organ dysfunction (Table 1). Several clinical indexes are used in SOFA: the mean arterial pressure, Glasgow Coma Scale score, bilirubin, PaO2/FiO2 ratio, platelets, and creatinine. If the SOFA score changes acutely by two or more points, organ dysfunction due to the infection can be concluded. A higher SOFA score is associated with a higher probability of death.
These methods tabulate a large number of patient vital signs, demographics, and laboratory test results to calculate risk scores. However, most of these scoring systems were proposed a long time ago. Their strong performance reflected the patient population and the standard of care at that time, which means that changes in the medical setting have reduced their predictive ability. Moreover, some researchers have found that these scoring systems perform poorly when estimating the risk of in-hospital death in sepsis cases and can lead to misdiagnosis and even death [8].
Considering the poor performance of the previous scoring systems, other models have been presented for predicting the risk of in-hospital death among ICU patients with sepsis. Machine learning (ML) methods are widely used in the early prediction of the risk of sepsis. In 2018, Shamim Nemati [9] presented an interpretable machine learning model for accurate prediction of sepsis in the ICU: a total of 65 variables from the electronic medical record and real-time clinical data were collected and used as input features to a Weibull-Cox proportional hazards model. The areas under the receiver operating characteristic curve (AUROC) reported for the model are higher than 0.79. Calvert [10] developed a high-performance early sepsis prediction method for the general patient population called InSight. InSight is a machine learning-based workflow for sepsis prognosis which computes, in real time, the risk that a patient will develop sepsis. InSight uses vital signs, age, Glasgow
Several limitations can be found in the aforementioned prediction models. Firstly, most previous studies do not account for the temporal development of sepsis or for real-time data assessment, and rely on conventional modeling schemes such as support vector machines and artificial neural networks. Secondly, the most recent definition, Sepsis-3, the criterion introduced by Singer, should be used instead of older gold standards such as SIRS. Finally, sepsis develops so subtly over a long period that previous works can hardly learn its discriminative patterns or analyze the informative changes during a patient's stay; for example, only small changes in white blood cell count and body temperature can be noticed at the early stage of sepsis.
To address these limitations, we present a new method for real-time early prediction of sepsis onset in patients admitted to ICUs. Our main contributions are summarized as follows.

i) To the best of our knowledge, this article is the first attempt to use a Gated Recurrent Unit method to predict progressive sepsis and to compare it against other traditional ML methods.
ii) We feed our models a wide range of input variables, including vital signs, demographics, and laboratory results, which can yield more accurate predictions, whereas most prior research focused on only part of these features.
iii) We apply the newest definition of sepsis, Sepsis-3, as the gold standard for our predictive algorithm, which makes the prediction more valuable for clinicians and patients.

Research Participants", the Record ID of the Completion Report is 40043867.

Gold Standard
The gold standard used in our study is the sepsis definition promulgated by Singer et al. in 2016 [6]. That study defined sepsis as "life-threatening organ dysfunction caused by a dysregulated host response to infection signified by an acute change in total SOFA score ≥2 points consequent to the infection." For the MIMIC-III database, we used the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9) [14] diagnosis codes to label the patients who had been infected. However, ICD-9 codes are recorded for only a limited number of complications, and very different diseases can share the same code when they incur the same cost. Hence, ICD-9 codes may produce false positives for sepsis and can hardly reflect the real condition of patients. As a result, it has been widely argued that ICD-9 codes alone cannot serve as a sound gold standard for various diseases. For this reason, the SOFA score was used as a second criterion to label the patients with sepsis within the infected group.
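The two-step labeling described above (an infection flag from ICD-9 codes, then an acute SOFA change) can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the function names are hypothetical, the threshold is set to the Sepsis-3 criterion of two or more points, and measuring the "acute change" as the largest rise above the lowest earlier SOFA value is an assumption made here.

```python
from typing import Sequence


def acute_sofa_change(sofa_scores: Sequence[int]) -> int:
    """Largest rise of the total SOFA score above its lowest earlier value.

    Assumption: the scores are ordered in time for a single ICU stay.
    """
    best_rise = 0
    lowest_so_far = sofa_scores[0] if sofa_scores else 0
    for score in sofa_scores:
        lowest_so_far = min(lowest_so_far, score)
        best_rise = max(best_rise, score - lowest_so_far)
    return best_rise


def label_sepsis(has_infection_code: bool,
                 sofa_scores: Sequence[int],
                 threshold: int = 2) -> bool:
    """Label a stay as septic when it is flagged as infected (e.g. by ICD-9
    codes) AND its SOFA score rises acutely by at least `threshold` points."""
    return has_infection_code and acute_sofa_change(sofa_scores) >= threshold
```

For example, `label_sepsis(True, [1, 1, 3])` flags the stay, while the same SOFA trajectory without an infection code does not.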

Inclusion Criteria
The criteria shown in Figure served as a plausibility filter on the MIMIC-III dataset. Firstly, we set an age threshold so that only patients aged 18 to 89 were included. Next, the patient's stay in the ICU had to be longer than 12 hours to ensure the value of the data. Finally, because some data are missing in the MIMIC-III dataset, variables with a missing rate higher than 20% were excluded.
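The three filters above can be sketched in plain Python. The record layout (one dictionary per ICU stay with the keys `age` and `icu_hours`) and the function names are illustrative assumptions, not the structure of the MIMIC-III tables.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def select_stays(stays: List[Record]) -> List[Record]:
    """Keep stays of patients aged 18 to 89 whose ICU stay exceeds 12 hours."""
    return [
        s for s in stays
        if s["age"] is not None and 18 <= s["age"] <= 89
        and s["icu_hours"] is not None and s["icu_hours"] > 12
    ]


def drop_sparse_variables(stays: List[Record],
                          variables: List[str],
                          max_missing_rate: float = 0.20) -> List[str]:
    """Return the variables whose missing rate does not exceed the limit."""
    kept = []
    for var in variables:
        missing = sum(1 for s in stays if s.get(var) is None)
        if stays and missing / len(stays) <= max_missing_rate:
            kept.append(var)
    return kept
```

Applying `select_stays` first and `drop_sparse_variables` second means the missing rate is computed on the included cohort only; whether the study computed it before or after cohort selection is not stated.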

Data Collection and Data Preprocessing
For feature extraction, this study selected a total of 46 variables as the input features for the early sepsis prediction models, including demographics, laboratory data, vital signs, and others (Table 2). We also aggregated a variety of features from the MIMIC-III dataset to calculate scores used for prediction, such as SOFA. All data were extracted from the MIMIC-III dataset using Python 3.7 (Anaconda Inc.; Austin, Texas, USA) and its packages. Missing data were handled with a "carry-forward" method, in which the most recent observed value is carried forward to fill each gap.
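A minimal sketch of the carry-forward imputation for one variable of one patient is given below. The function name is hypothetical, and leaving values missing when no earlier observation exists is an assumption, since the text does not say how leading gaps were treated.

```python
from typing import List, Optional


def carry_forward(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing entry (None) with the most recent observed value.

    Entries before the first observation stay None, because there is
    nothing to carry forward yet.
    """
    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled
```

The series must be ordered in time and processed separately for each patient, so that a value is never carried over from one patient to the next.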

LR
LR is a traditional statistical method used to model the probability of a certain category [15]. Mathematically, a logistic regression model has a dependent variable with two possible values, which can be labeled "0" and "1". The relationship between the predictor variables $x_i$ and the logit of the event can be expressed by the following equation (where $\ell$ is the logit, $b$ is the base of the logarithm, $p$ is the probability of the event labeled "1", and $\beta_i$ are the parameters of the model):

$$\ell = \log_b \frac{p}{1-p} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$$

and the odds can be recovered by exponentiating the logit:

$$\frac{p}{1-p} = b^{\beta_0 + \sum_{i=1}^{n} \beta_i x_i}$$
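The relation between the logit, the odds, and the probability can be checked numerically with a short sketch. The helper names and the coefficient values in the usage note are illustrative only and are not fitted values from this study.

```python
import math
from typing import Sequence


def linear_logit(intercept: float,
                 coefficients: Sequence[float],
                 features: Sequence[float]) -> float:
    """Logit as a linear combination of the predictor variables."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def odds_from_logit(logit: float, base: float = math.e) -> float:
    """Recover the odds p / (1 - p) by exponentiating the logit."""
    return base ** logit


def probability_from_logit(logit: float, base: float = math.e) -> float:
    """Convert the logit to the probability of the event labeled "1"."""
    odds = odds_from_logit(logit, base)
    return odds / (1.0 + odds)
```

With an intercept of -1.0, coefficients (0.5, 0.25), and features (2.0, 4.0), the logit is 1.0, so with the natural base the odds equal e and the probability is e / (1 + e).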

SVM
The support vector machine is a supervised learning model whose associated learning algorithms analyze variables from the dataset for classification and regression analysis [16]. In addition to performing linear classification, SVM can efficiently perform non-linear classification with the so-called kernel trick [17,18], implicitly mapping its input features into high-dimensional feature spaces. SVM can deal with unlabeled data as well: it can find natural clusters in the data and map the data to distinct groups.
In this study, to perform non-linear classification, we used the kernel trick of the SVM and assumed a kernel function $K$ that satisfies $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$, where $\varphi$ is the mapping into the high-dimensional feature space.
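The kernel identity can be illustrated with a small worked example. The text does not state which kernel was used, so the homogeneous degree-2 polynomial kernel below is chosen purely for illustration: it gives the same value as an explicit feature map followed by a dot product, without ever building the higher-dimensional vectors.

```python
import math
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def phi(x: Tuple[float, float]) -> Tuple[float, float, float]:
    """Explicit feature map of the degree-2 polynomial kernel for 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x: Tuple[float, float], y: Tuple[float, float]) -> float:
    """Kernel evaluated directly in the input space: (x . y) squared."""
    return dot(x, y) ** 2
```

For x = (1, 2) and y = (3, 4), the input-space dot product is 11, so the kernel value is 121, which matches the dot product of the two mapped vectors.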

RF
Random forest is an ensemble learning method used for classification, regression, and other tasks [19,20]. During training, RF constructs a large number of decision trees, each of which is a weak classifier, and outputs the class that is the mode of the classes (for classification) or the mean of the predictions (for regression) of the individual trees. Random forest corrects for the tendency of decision trees to overfit, which can harm the accuracy of the outcome, so it has been widely used in medical prediction and diagnostic decision making. For tree growing, it uses the Gini index as the criterion for selecting suitable features. Moreover, RF is a classic black-box machine learning model [21], owing to its ability to generate accurate results from a wide range of variables and the little intervention it needs in packages such as scikit-learn.
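The Gini criterion mentioned above can be made concrete with a short sketch. It computes the Gini impurity of a set of class labels and the size-weighted impurity of a candidate split; a tree prefers the split with the lowest weighted value. The function names are illustrative, and this is a simplified view of what library implementations do internally.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Impurity of a binary split, weighted by the size of each child node."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
```

A split that separates the classes perfectly, such as `[0, 0]` versus `[1, 1]`, has a weighted impurity of 0, whereas a split that leaves both children evenly mixed stays at 0.5.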

XGBoost
Extreme Gradient Boosting (XGBoost) is an ML method that handles missing values effectively and combines weak prediction algorithms to build a strong one [22]. Since its invention, XGBoost has been widely used as a benchmark algorithm in many ML and data mining competitions.
The models in this article were built with Python 3.7 (Anaconda Inc.; Austin, Texas, USA) and its packages. We hypothesize that the GRU algorithm performs better than the other ML methods in predicting sepsis.

Statistical Analysis
All cases involved in this study were split into two groups depending on whether they met the criteria for sepsis. For all models, the train_test_split function of the sklearn.model_selection library in Python 3.7 (Anaconda Inc.; Austin, Texas, USA) was used to divide the data set into a training set and a test set; the test ratio was 0.3 (7:3) and the random seed was set to 5 [25]. The patients were thus randomly divided into a training set for training the prediction models and a test set for evaluating their performance. Continuous variables are reported as median and interquartile range (25%-75%), and categorical variables are reported as counts or percentages.
The performance of the ML models in predicting the cases was evaluated based on the confusion matrix. Metrics such as error rate, sensitivity, precision, and specificity were measured. Besides, the receiver operating characteristic
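The split and the descriptive statistics described above can be sketched as follows. The feature matrix here is synthetic random data standing in for the cohort; only the `test_size=0.3` and `random_state=5` arguments come from the text, and everything else is an assumption made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort: 100 cases, 4 features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# 7:3 split with the random seed reported in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=5
)

# Median and interquartile range (25%-75%) of one continuous variable.
median = np.median(X_train[:, 0])
q1, q3 = np.percentile(X_train[:, 0], [25, 75])
print(f"train={len(X_train)} test={len(X_test)} "
      f"median={median:.2f} IQR=({q1:.2f}, {q3:.2f})")
```

Fixing `random_state` makes the partition reproducible, so every model is trained and evaluated on the same cases.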
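The confusion-matrix metrics named above, together with the area under the ROC curve, can be computed as in the sketch below. The function names are illustrative, the metric functions assume that every denominator is non-zero, and the AUROC is computed with the pairwise-ranking formulation (the share of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half), which may differ from the routine used in the study.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means sepsis."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Error rate, sensitivity (recall), precision, and specificity."""
    total = tp + fp + tn + fn
    return {
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of correctly ranked positive-negative pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC is quadratic in the number of cases, which is fine for a small illustration but slow for a cohort of tens of thousands of stays.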

Baseline Characteristics
After excluding the cases that did not meet the determined criteria or that lacked clinical data, a total of 31297 patients were included in this study, of whom 4008 had sepsis and 27289 did not. Figure 2 is a flow chart that depicts the procedure for case selection. The average ages of the cases with and without sepsis were 61.12+/-11.24 and 59.11+/-13.91 years, respectively.
Of the cases with sepsis, 2052 were male, which accounted for 51.2% of all cases with sepsis; of the cases without sepsis, 14081 were male, which accounted for 48.8% of all cases without sepsis.
Thus, age appears to be a relatively significant variable for sepsis: patients with sepsis may be somewhat older than patients without sepsis. In contrast, gender is not statistically significant in this study. Table 3 summarizes the comparison of the statistical results between the patients with and without sepsis from the MIMIC-III database. Table 4 shows the AUC, specificity, recall, precision, and error rate of each model in this research. Figure 4 shows the confusion matrices of all the models.

Discussion
This study extracted the features of cases that potentially had sepsis from the MIMIC-III database. Then, four ML models, SVM, RF, LR, and XGBoost, were developed for comparison with the GRU model regarding the capability of predicting sepsis. The present study aims to demonstrate the potential of using GRU algorithms to predict sepsis in patients. According to the experimental results, the GRU model presents a higher AUC than the traditional ML methods in all computed scenarios. This result suggests that applying GRU can improve the prediction of sepsis in the ICU, which may in turn help decrease sepsis mortality.
Sepsis is a life-threatening disease with high mortality and draws much attention from physicians and scientists [26]. The incidence of severe sepsis has been increasing around the world. However, the existing prediction methods for sepsis all have certain limitations. The early prediction of sepsis is a long-standing challenge because of its multifactorial nature [27][28][29]. Other than that, the definition of sepsis
ML is a prevalent computational approach for processing large data sets and complex relationships between features [30]. ML can construct a model from labeled data, learn from the data through algorithm iteration, train the model with the data, and then apply the model to make predictions for a given task. In the 21st century, ML methods have been widely applied in medical prediction [24,31]. In 2019, Hidehisa Nishi [32] used ML methods to predict the outcomes of patients with anterior circulation large vessel occlusion (LVO) who undergo mechanical thrombectomy and showed that they were more accurate than previously presented pretreatment scoring systems. In 2017, Oluwarotimi Williams Samuel [33] presented a comprehensive decision support system based on ANN and Fuzzy_AHP for heart failure risk prognosis, which achieved a high prediction accuracy using a total of 297 cases obtained from datasets of potential heart failure patients. Philipp Kickingereder [34] presented an ANN-based tool for quantitative tumor response evaluation of MRI in neuro-oncology and showed that it can serve as a blueprint for the application of ML methods in radiology. There are also many studies on prediction based on ML methods such as SVM, Random Forest,

Limitations
This study has some limitations. Firstly, the data used to train and verify the model all came from the MIMIC-III data set, so there is no guarantee that the model generalizes to real patient data from around the world; a large amount of new data would need to be collected to improve its generalization ability.
Secondly, our study does not take into account the impact of timing factors on the predicted results [41]; in future work we will set up different control groups for different time intervals before sepsis onset. Finally, in the era of weak artificial intelligence, many deep learning algorithms, such as RNN, are still black-box algorithms. We can only control the input and output and cannot grasp the relationships between the internal variables. An interpretable model can clearly cooperate better with physicians in the prognosis and treatment of patients. In future research, we should pay attention to understanding the process of model building and to making the model interpretable.

Conclusion
The prediction of sepsis is a long-standing challenge, and we proposed a novel prediction model framework for sepsis using the GRU deep learning method. The results of the present study show that the GRU deep learning method can serve as an effective predictor that uses readily available case data for its prediction. In the future prognosis of sepsis, deep learning methods can assist clinicians in maximizing the patient's chance of survival.