In this study, we developed and validated three deep learning models that dynamically predict the probability of severe hemorrhage at three time points after trauma, using vital signs data of trauma patients from a large-scale public database. The models were further validated in the Trauma database of the University Teaching Hospital. Moreover, we provide an open and accessible data interface that the public can use to validate our models. Our predictive models can help pre-hospital and in-hospital clinicians with early identification, dynamic prediction, and decision making for patients with severe hemorrhage from trauma, which may help save more lives.
Several scoring systems for TASH already exist. For example, the TASH score [8], PWH score [10], traumatic bleeding severity score (TBSS) [20], and modified TBSS [21] require clinical assessment, laboratory values, and ultrasound assessment. Scores such as the Hsu [22], Larson [23], McLaughlin [24], and Vandromme scores [25] require clinical assessment and laboratory values. The ABC score [9] requires clinical and ultrasound assessment. Because the above scoring systems require laboratory values or ultrasound assessment, they are more complex and usually cannot be calculated until the patient arrives at the hospital; they are therefore time-consuming for in-hospital evaluation and unsuitable for pre-hospital evaluation [26]. Furthermore, most of these scoring systems are static evaluations based on a single measurement, and the long detection intervals and invasive operations of laboratory or ultrasound examinations make dynamic monitoring difficult to achieve.
The TASH dynamic predictive models developed in our study only depend on vital signs, which can be easily obtained in pre-hospital or in-hospital environments, and medical staff can easily record the data regularly. Simple feature selection also ensures that the predictive models can be continuously and automatically recalculated before or during hospitalization, providing valuable information on whether patients are responding to treatment, thus making it easier for medical professionals to modify their treatment plans. In addition, the simplicity of the input and output improves the interpretability of the predictive models, thus increasing the possibility that health care providers trust their predictions [27–32].
Comparing the evaluation indexes of the models on the MIMIC-IV database, the GRU-D model generally outperforms the GRU model; the GRU model outperforms the AdaBoost, RF, and SVM models; there is no obvious difference among the AdaBoost, RF, and SVM models; and these three models outperform the LR model. One possible reason for these differences is that LR, SVM, RF, and AdaBoost are traditional machine learning algorithms whose input is a single five-dimensional vector, whereas GRU and GRU-D are deep learning algorithms whose input is a time series of five-dimensional vectors comprising five vital signs. Moreover, the GRU model is a variant of the traditional recurrent neural network that mitigates the vanishing gradient problem. The GRU-D model is a GRU-based variant proposed by Che et al. in 2018 that can handle irregularly sampled time series data with missing values. Its input includes the time series data, the missing-value mask, and the time interval information. During training, it uses the time interval between consecutive recorded observations to capture the relationships within the time series, fills in the missing data, and makes predictions at the same time. In GRU-D, data filling and prediction are both performed within the neural network; thus, the parameters related to data filling are continuously optimized during training, which improves the predictive results [19].
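To make the roles of the mask and the time interval concrete, the following is a minimal NumPy sketch of the input-decay imputation described for GRU-D by Che et al. It is an illustration under stated assumptions, not the implementation used in this study: the decay parameters `w` and `b` are fixed placeholder values here (in GRU-D they are learned jointly with the network), the two features and their values are invented, and the hidden-state decay of GRU-D is omitted.

```python
import numpy as np

def time_intervals(timestamps, mask):
    """delta[t, d] = time elapsed since feature d was last observed before step t."""
    T, D = mask.shape
    delta = np.zeros((T, D))
    for t in range(1, T):
        gap = timestamps[t] - timestamps[t - 1]
        # If the feature was missing at t-1, the previous interval accumulates.
        delta[t] = gap + (1 - mask[t - 1]) * delta[t - 1]
    return delta

def decay_impute(x, mask, delta, w, b, x_mean):
    """Fill missing entries by decaying the last observation toward the feature mean."""
    T, D = x.shape
    # Decay rate in (0, 1]: the longer since the last observation, the smaller gamma.
    gamma = np.exp(-np.maximum(0.0, w * delta + b))
    x_hat = np.empty_like(x, dtype=float)
    last = x_mean.copy()  # before any observation, fall back to the mean
    for t in range(T):
        decayed = gamma[t] * last + (1 - gamma[t]) * x_mean
        x_hat[t] = np.where(mask[t] == 1, x[t], decayed)
        last = np.where(mask[t] == 1, x[t], last)
    return x_hat

# Hypothetical example: two vital signs recorded at irregular times (hours).
timestamps = np.array([0.0, 1.0, 3.0])
x = np.array([[100.0, 90.0],
              [np.nan, 85.0],
              [np.nan, np.nan]])
mask = np.array([[1, 1],
                 [0, 1],
                 [0, 0]])
x_mean = np.array([80.0, 100.0])   # assumed empirical means of the features
w = np.array([0.5, 0.5])           # placeholder decay weights (learned in GRU-D)
b = np.array([0.0, 0.0])           # placeholder decay biases (learned in GRU-D)

delta = time_intervals(timestamps, mask)
x_hat = decay_impute(x, mask, delta, w, b, x_mean)
print(delta)
print(x_hat)
```

Because the imputed value is a differentiable function of `w` and `b`, gradients from the prediction loss can flow back into the decay parameters, which is why data filling and prediction can be optimized together.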
The GRU-D model, which performed best in our study, was compared with traditional scoring systems. The shock index has been recommended for predicting massive blood transfusion and emergency operation after trauma and is widely used in pre-hospital and battlefield environments [33, 34]. The Vandromme score was proposed by Vandromme and colleagues in 2012 to identify patients at risk of massive blood transfusion [25]. The Larson score was proposed by Larson and colleagues in 2010, based on a combat database, to predict the massive blood transfusion needs of combat casualties [23]. The OASIS, SAPS II, and SOFA scores are commonly used severity-of-illness scoring systems, often used to predict illness severity or in-hospital mortality. Using the MIMIC-IV cohort, we compared the GRU-D model with the above scoring systems. The GRU-D model had the highest AUC, which reflects its advantage in the dynamic prediction of TASH.
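As an illustration of how such an AUC comparison works, the sketch below computes the shock index and its AUC on a handful of invented patients. It assumes the standard definition of the shock index (heart rate divided by systolic blood pressure), which is not spelled out above, and it computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The numbers are made up for demonstration and are not taken from MIMIC-IV or from the results of this study.

```python
import numpy as np

def shock_index(heart_rate, systolic_bp):
    """Shock index = heart rate (beats/min) / systolic blood pressure (mmHg)."""
    return np.asarray(heart_rate, dtype=float) / np.asarray(systolic_bp, dtype=float)

def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Invented values for five hypothetical patients (1 = severe hemorrhage).
heart_rate = [120, 95, 80, 110, 70]
systolic_bp = [80, 100, 120, 130, 115]
labels = [1, 0, 0, 1, 0]

si = shock_index(heart_rate, systolic_bp)
si_auc = auc(si, labels)
print(si, si_auc)
```

The same `auc` function could be applied to the output probabilities of any model, so that a score and a model are ranked on the same cohort with the same metric.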
We also compared our results with those of similar studies. For example, Brockamp et al. used the TraumaRegister DGU database to compare six scoring systems and algorithms for predicting persistent hemorrhage and blood transfusion needs after trauma, namely the TASH, PWH, Vandromme, Larson, Schreiber, and ABC scores, whose AUCs were 0.889, 0.860, 0.840, 0.823, 0.800, and 0.763, respectively. The TASH score performed better than the other scoring systems in that study [35], but not as well as the GRU-D model in the MIMIC-IV cohort. In addition, Mitra et al. used data from the Alfred Trauma Registry to compare three scoring systems for predicting massive blood transfusion after trauma: the TASH score (AUC 0.899), PWH score (AUC 0.842), and ABC score (AUC 0.782). The TASH score again performed better than the other two scoring systems [36], but not as well as the GRU-D model in the MIMIC-IV cohort. Comparing AUCs across studies, the GRU-D model based on vital signs alone still shows good predictive performance, which supports its advantage in predicting TASH.
In addition to internal validation, we externally validated the models using the Trauma database of the General Hospital of the PLA. As shown in Figures 4a, 4b, and 4c, the AUC of the GRU-D model was larger than that of the other models, indicating that our models generalize well and have clinical value. To help clinicians use our models, we developed a web-based predictive system with a user-friendly interface. After the variables are entered, the system displays the probability of severe hemorrhage at three time points after trauma. These results can help clinical decision-makers understand the condition of patients and prepare appropriate treatment strategies.
This study has certain limitations. First, the study population included only adult patients, and no further subdivision by age was considered. However, age plays an important role in predicting the risk of severe hemorrhage. In the Trauma dataset in our study, patients in the experimental group were significantly younger than those in the control group. Some studies have shown that elderly patients are more likely to have severe hemorrhage [20]. In future studies, we will divide the patients into age-based subgroups for further analysis [37]. Second, the severe hemorrhage predictive models can only guide the doctor's clinical decision-making process and cannot replace the doctor's clinical judgment and other diagnostic tests. Finally, this is a retrospective observational study. Although the quality of the MIMIC-IV and Trauma databases is very high, they still contain missing data and input errors. Therefore, prospective validation is still needed. Future studies should also determine whether the use of dynamic predictive models for TASH reduces the waiting time before massive blood transfusion or damage control surgery, and how this affects the prognosis of trauma patients.