It is critical to identify individuals at high risk for SAP and to tailor timely prophylactic interventions or therapeutic antibiotics accordingly. However, early prediction of SAP in sICH populations remains challenging because widely accepted prediction tools, which are important for modern precision medicine and evidence-based medicine (EBM) in this field, are lacking. Thus, we aimed to derive more effective and automatic sICH-SAP prediction tools in this work. Novel ML prediction models were derived and validated in an attempt to combine AI medical engineering and clinical practice in this field. The major findings were as follows. (1) The incidence rate of sICH-SAP was close to 30%, and sICH-SAP events significantly contributed to prolonged hospital stays, increased hospital costs, and higher mortality. (2) Six independent predictors for sICH-SAP were identified: nasogastric feeding, airway support, unconscious onset, surgery of EVD, sICH volume, and ICU stay. (3) ML prediction models were successfully derived and showed better performance metrics than traditional scoring systems from previous studies; the GNB and LR models showed the highest AUC values of 0.861 (95% CI: 0.793–0.930) and 0.867 (95% CI: 0.812–0.923) on the internal and external validation datasets, respectively. (4) No single algorithm showed dominant ability and robustness across the cross- and external validations, whereas the ESVM showed moderate metrics and remained robust in different populations after multiple validations.
Various predictors for SAP were identified in prior literature [4, 5, 8–16]. This work screened for independent variables for sICH-SAP events by using univariate and multivariate analyses in the FAHFMU subcohort. Nasogastric feeding, airway support, and unconscious onset were identified as strongly associated risk predictors, which overlapped with the results of previous studies [4, 8–16]. Nasogastric feeding and airway support measures are recognized SAP predictors that might bring about secretion disturbances in the nasal/oral/tracheal cavities, decreased air filtration, and even aspiration events [4, 8, 15, 16]. These early interventions were secondary to the manifestation of unconsciousness. Previous studies mainly included the ranked variable of the Glasgow Coma Scale (GCS) score and rarely adopted the onset manifestations [4, 10, 11, 14–16]. In this work, the admission GCS score and unconscious onset were simultaneously introduced into the analyses, and the categorical variable of unconscious onset was independently significant for sICH-SAP. The predictors of sICH volume and ICU stay were also reported in previous studies [4, 11, 15] and contributed the least to predicting SAP in this work. The sICH volume contributed to SAP as a primary factor influencing stroke severity, whereas ICU stay was a comprehensive intervention secondary to stroke severity that exposed patients to infectious environments. These predictors usually cannot be modified to actively prevent SAP in clinical practice. Nevertheless, the subgroup analysis yielded a novel finding: of all surgical approaches in this work, only the surgery of EVD was a significant independent predictor (P < 0.001 in the FAHFMU subcohort and P = 0.001 in the external subcohort), whereas EVD was previously reported only as a univariate factor for overall infections [4].
On the other hand, the surgery of sICH catheter evacuation did not significantly contribute to SAP events in any univariate analysis (P = 0.089 in both the FAHFMU and external subcohorts), which was in accordance with the undifferentiated non-neurologic infections reported in the MISTIE III trial [26]. This suggests that we should continue to focus on the stratification of surgical approaches in the prospective cohort of the Risa-MIS-ICH study to obtain convincing evidence.
We observed populational heterogeneity across the multiple centers, which might result in variations in demographic features, stroke severity, and even baseline laboratory results, and in turn in inconsistently significant results in the univariate analyses. Interestingly, heterogeneity was not found in the six independent variables, which remained effectively predictive in external validation. Therefore, these six independent variables were considered robust in different populations, which made robustness evaluations of the subsequent ML models possible in this work.
To date, no SAP prediction model has been widely available in clinical practice, and only the ICH-APS score has been developed for sICH populations as a mature SAP prediction model [8–13]. The ICH-APS score also included early indicators, and its AUC value was 0.74 (95% CI: 0.72–0.75) on its original validation dataset from the China National Stroke Registry (CNSR). In this work, our optimal ML prediction models achieved higher AUC values of 0.861 (95% CI: 0.793–0.930) and 0.867 (95% CI: 0.812–0.923) in the internal and external validations, respectively. Our ML prediction models thus showed greater predictive abilities than the ICH-APS score when each was evaluated on its own original validation dataset.
Li et al. developed ML models to predict SAP events in Chinese AIS populations; their highest AUC value of 0.843 (95% CI: 0.803–0.882) exceeded those of other AIS-SAP prediction scores (0.835 for A2DS2, 0.786 for PNA, 0.785 for AIS-APS, and 0.78 for ISAN) [27]. According to metrics from the literature and this work [27–30], ML prediction models for SAP showed better performance metrics than traditional scoring systems in both sICH and AIS populations. However, because of incomplete variable collection, horizontal comparisons of different prediction models on the same validation dataset were not possible. Despite this limitation, prediction models usually perform better in internal validation than in external validation because of the intrinsic consistency of the original datasets and populational heterogeneity, so comparisons on their respective original validation datasets remain informative [31].
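As an illustration of the metric used in these comparisons, the AUC can be computed as the probability that a randomly chosen SAP case receives a higher predicted score than a randomly chosen non-SAP case, with ties counted as one half. The following is a minimal sketch under that standard definition; the function name `roc_auc` and the example labels and scores are hypothetical and are not taken from the study data or from the authors' code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for an event (e.g. SAP), 0 for a non-event.
    scores: predicted risk for each individual, higher means higher risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute an AUC")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # case ranked above non-case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six patients, three with the event.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.3f}")
```

In this toy example, eight of the nine case/non-case pairs are ranked correctly, so the AUC is 8/9. The sketch does not cover the confidence intervals reported alongside the AUC values.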
In this work, we used six separate algorithms and an ensemble model for SAP prediction, each of which was validated and compared to identify the optimal prediction tool. Among the six basic algorithms, LR, GNB, RF, and XGB generally showed better performance metrics than KNN or SVM in the internal/cross-/external validations, and LR and GNB required less training time than the other algorithms. It is worth mentioning that no single model showed dominant ability and robustness across the internal/cross-/external validations, and this indeterminacy has hindered model selection and implementation in clinical practice. Therefore, a general and robust model is required for stable predictive ability. Traditional research mainly focused on mutually separate algorithms and chose only the optimal algorithm, whereas ensemble ML models have been reported in the literature as successful classifiers with greater performance outcomes [29, 30]. In the real world, ethical considerations leave no margin for error in model selection. The predictive ability of any single algorithm was uncertain because the performance outcomes of the ML algorithms were inconsistent among the internal/cross-/external validations. Thus, we additionally derived the ESVM, based on a soft voting system incorporating the six basic ML algorithms, which was moderate but surprisingly robust in each metric. Although the computational resources occupied by the ESVM equal the sum of those of the six basic algorithms, this disadvantage could be mitigated by scheduled training followed by on-demand invocation.
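The soft voting principle behind such an ensemble can be sketched as follows: each basic model outputs a predicted probability of SAP, the probabilities are averaged with equal weights, and the average is compared with a decision threshold. This is a minimal illustration only; the function name `soft_vote`, the equal weighting, the 0.5 threshold, and the example probabilities are assumptions made for demonstration and do not reproduce the authors' implementation or data.

```python
from typing import Dict, List, Tuple


def soft_vote(
    probabilities: Dict[str, List[float]], threshold: float = 0.5
) -> Tuple[List[float], List[int]]:
    """Equal-weight soft voting over per-model positive-class probabilities.

    probabilities maps a model name to its predicted probabilities,
    one value per patient, in the same patient order for every model.
    Returns the averaged probabilities and the thresholded labels.
    """
    if not probabilities:
        raise ValueError("at least one model is required")
    lengths = {len(p) for p in probabilities.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same patients")

    n_models = len(probabilities)
    n_patients = lengths.pop()
    averaged = [
        sum(p[i] for p in probabilities.values()) / n_models
        for i in range(n_patients)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical probabilities from six basic models for three patients.
example = {
    "LR":  [0.80, 0.30, 0.45],
    "GNB": [0.90, 0.20, 0.45],
    "RF":  [0.70, 0.40, 0.95],
    "XGB": [0.75, 0.35, 0.95],
    "KNN": [0.60, 0.40, 0.45],
    "SVM": [0.65, 0.15, 0.45],
}
averaged, labels = soft_vote(example)
print([round(p, 3) for p in averaged], labels)
```

For the third hypothetical patient, four of the six models fall below 0.5, so a hard majority vote would predict no SAP, whereas the averaged probability exceeds 0.5 because two models are highly confident. This difference is what distinguishes soft voting from majority voting.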
A practical ML prediction model requires high accuracy and automation in the real world, which might represent the main directions of cross-nested algorithm deepening and AI medical engineering development in the coming decades. The synthetic minority oversampling technique (SMOTE), which aims to adjust imbalanced classifications, and principal component analysis (PCA), which aims to reduce data dimensions and thereby overfitting and training time, were tried but abandoned in this work [32, 33]. Nevertheless, the techniques that failed here and other advanced AI techniques not yet explored could be employed in other AI medical research and clinical practice in the near future. By dynamically evaluating the keyed-in clinical manifestations, the resulting values from the laboratory information system (LIS), and the captured data from the picture archiving and communication system (PACS), sophisticated algorithms installed within the EMR system could continuously learn and then calculate predictions for high-risk individuals in the future. The attempts and prospects in this work might be considered a step toward automatic clinical evaluations and AI-assisted decision-making.
This work has strengths that deserve comment. An external subcohort and multiple forms of validation were introduced; therefore, the results are supported by both populational and algorithmic robustness. On this basis, we derived novel ensemble models for generalizability, which showed moderate but robust predictive abilities in different populations and were fit for real-world practice. However, several limitations should be acknowledged. First, the observational retrospective design might introduce unmanageable bias. Uncontrollable baseline characteristics in the observational study might confound SAP risks and the subsequent model derivations/validations. Second, some important variables were missing because of the retrospective collection of data. The National Institutes of Health Stroke Scale (NIHSS) score, uniform CT scan parameters, scanning timing, and other unrecorded details were unavailable in the retrospective cohort of the Risa-MIS-ICH study, which made horizontal comparisons with external models impossible in this work. Third, in-depth analyses of SAP were lacking. Subgroup analyses of pneumonia severity, radiological features, and pathogenic agents were all absent. A simple overall SAP analysis might be rather coarse for complex and heterogeneous pulmonary infections. Future studies on our ongoing prospective cohort might resolve the aforementioned problems.
To the best of our knowledge, this was the first reported attempt to develop an ML prediction model for sICH-SAP. We not only aimed to derive superior statistical models for SAP prediction but also attempted to combine AI medical engineering and clinical practice in this field. We anticipate that this technique will be developed into an effective and automatic tool for predicting sICH-SAP in the near future.