In this retrospective cohort study, we selected 9 relevant features using the Boruta algorithm and developed 8 ML models to predict the risk of PSD in stroke patients. Among the 8 models, the XGB model achieved the highest AUC and showed good clinical applicability. The most influential features for predicting PSD, in descending order of importance, were age, high-sensitivity CRP, right-sided lesion, temporal lobe involvement, and cerebral hemorrhage.
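The model comparison rests on the area under the ROC curve. As a minimal sketch, assuming binary PSD labels and one predicted risk per patient for each model, the AUC can be computed as the probability that a randomly chosen PSD case receives a higher score than a randomly chosen non-PSD case, which equals the normalized Mann-Whitney U statistic. The model names and scores below are hypothetical toy values, not data from this study.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as one half, matching the Mann-Whitney U formulation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model(labels: Sequence[int],
               model_scores: Dict[str, Sequence[float]]) -> Tuple[str, Dict[str, float]]:
    """Compute the AUC of every model and return the name of the best one."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    return max(aucs, key=aucs.get), aucs


# Hypothetical toy example: 1 = developed PSD, 0 = did not
labels = [0, 0, 1, 1, 0, 1]
model_scores = {
    "model_a": [0.2, 0.4, 0.35, 0.8, 0.1, 0.7],
    "model_b": [0.1, 0.3, 0.6, 0.9, 0.2, 0.8],
}
best, aucs = best_model(labels, model_scores)
print(best, aucs)
```

The pairwise loop costs O(n_pos * n_neg) comparisons, which is adequate for a quick check; a rank-based formula gives the same value in O(n log n).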
A growing body of research recognizes that the risk and progression of PSCID are determined by multiple factors, including age, comorbidities, stroke type (ischemic or hemorrhagic), education level, and stroke location and size [3, 19]. Because this study included patients with both stroke types, it offers a more comprehensive basis for understanding and predicting PSD. In developing a risk model for MCI after stroke, Yan et al. found that the logistic regression model achieved the highest AUC (0.8595) [17]. However, given the small sample size (n = 199), the generalizability of that model may be limited. Because the factors influencing PSCID are complex and varied, predicting it in clinical settings is challenging, and accuracy is lower than for post-stroke functional outcomes. ML methods are often better suited to settings with many interacting factors. Strokes are classified as hemorrhagic or ischemic, with ischemic strokes accounting for the large majority of cases, approximately 87% according to data from Johns Hopkins Medical Center [20]. Consequently, research attention and resources have focused mainly on ischemic stroke. Lee et al. developed four ML models to predict the risk of PSCI in patients with acute ischemic stroke, of which the XGB model showed the best discrimination (AUC = 0.7919) [16]. Although PSCI and PSD may overlap clinically, they differ in definition, severity, treatment strategy, and management. Dementia is a more severe outcome that substantially affects patients' daily lives and independence. By focusing on PSD as the outcome, accurate prediction could therefore allow prevention and intervention measures to be implemented at an earlier stage.
Although the SIGNAL2 and CHANGE risk scores show good discriminative ability in predicting PSCI, both define the outcome using raw-score cut-offs of Mini-Mental State Examination (MMSE) ≤ 25 or Montreal Cognitive Assessment (MoCA) ≤ 22 and assign high weights to age and education level. Because MMSE and MoCA scores are themselves highly dependent on age and education [10, 11], these two variables may dominate the prediction, so that the patient's other clinical characteristics contribute little to the predicted outcome. In contrast to the fixed SIGNAL2 and CHANGE scores, the ML models constructed in our study can be retrained as new data become available, which may improve their accuracy over time. Using the Boruta algorithm, we identified the most important features in the dataset for the predictive models. Because this approach is not restricted to particular data types, it allows comprehensive feature selection; eliminating irrelevant features reduces the risk of overfitting and improves model accuracy and interpretability. We entered the features confirmed by Boruta (the green zone) into the 8 models, of which the XGB model achieved the highest AUC. Finally, we used SHAP to quantify the contribution of each feature to the predicted outcome. In the XGB model, the most important features were age, high-sensitivity CRP, right-sided lesion, temporal lobe involvement, and cerebral hemorrhage. These key features are partly consistent with, and partly different from, previous research on PSCID risk factors [15–17, 21].
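The Boruta logic can be illustrated with a simplified sketch. Each real feature is compared with "shadow" features, which are randomly permuted copies that cannot carry outcome information; a feature scores a hit when its importance exceeds the best shadow importance, and a binomial test on the hit count decides whether it is confirmed (the green zone), rejected, or left tentative. Two assumptions keep the example self-contained: absolute Pearson correlation stands in for the random-forest importance that Boruta actually uses, and the number of iterations is fixed instead of progressively dropping rejected features. The function name `boruta_like` and the toy data are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation (stand-in for random-forest importance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def binom_upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def boruta_like(features: Dict[str, List[float]], y: List[float],
                n_iter: int = 50, alpha: float = 0.05, seed: int = 0) -> Dict[str, str]:
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features keep each column's distribution but break its link to y
        shadow_max = 0.0
        for col in features.values():
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_corr(shadow, y))
        # A hit means the real feature beats the best shadow in this iteration
        for name, col in features.items():
            if abs_corr(col, y) > shadow_max:
                hits[name] += 1
    threshold = alpha / len(features)  # Bonferroni correction across features
    decisions = {}
    for name, h in hits.items():
        if binom_upper_tail(h, n_iter) < threshold:            # too many hits for chance
            decisions[name] = "confirmed"
        elif binom_upper_tail(n_iter - h, n_iter) < threshold:  # P(X <= h), by symmetry
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


# Hypothetical toy data: one informative feature, one with zero correlation to y
y = [i % 2 for i in range(200)]
features = {
    "informative": [t + 0.1 * (i % 3) for i, t in enumerate(y)],
    "irrelevant": [1.0 if i % 4 < 2 else -1.0 for i in range(200)],
}
decisions = boruta_like(features, y)
print(decisions)
```

Using the maximum over all shadows, rather than a fixed cut-off, is what makes the test adaptive: a feature must outperform the best importance that pure noise achieves in the same data.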
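SHAP attributions are Shapley values from cooperative game theory. The sketch below computes them exactly by enumerating every coalition of features, where features outside a coalition are set to a single baseline (reference) patient. The risk function `toy_risk`, its coefficients, and the patient values are invented for illustration and do not represent the study's XGB model; the shap library uses a much faster tree-specific algorithm for XGBoost and typically averages over a background dataset rather than one baseline.

```python
import itertools
import math
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[List[float]], float], x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature of x relative to a baseline input."""
    n = len(x)

    def value(coalition: set) -> float:
        # Coalition members take the patient's values; the rest take the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: List[float]) -> float:
    # Hypothetical additive score plus one interaction between features 2 and 3
    return 0.04 * z[0] + 0.3 * z[1] + 0.5 * z[2] + 0.2 * z[2] * z[3]


x = [70.0, 5.0, 1.0, 1.0]
baseline = [60.0, 2.0, 0.0, 0.0]
phi = exact_shapley(toy_risk, x, baseline)
print(phi)  # ≈ [0.4, 0.9, 0.6, 0.1] up to floating-point rounding
```

The interaction term is split equally between features 2 and 3, and the attributions sum to `toy_risk(x) - toy_risk(baseline)`. Exact enumeration needs on the order of n * 2^(n-1) model evaluations, which is still cheap for 9 features. A global ranking of the kind reported for the XGB model is commonly obtained by averaging each feature's absolute SHAP value over all patients.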
Most studies agree that age is closely associated with cognitive decline. For instance, the REGARDS study found that each additional year of baseline age increased the likelihood of cognitive impairment during follow-up by 17% [21], in line with studies of non-stroke populations that identified older age as a significant risk factor for cognitive impairment [22]. However, Yan et al. found no correlation between age and the occurrence of MCI after stroke [17]. A systematic review and meta-analysis that assessed various blood-derived proteins as biomarkers for PSCI recommended Hcy, CRP, total cholesterol, and low-density lipoprotein as potential biomarkers [23]. The high-sensitivity CRP test, which can measure low concentrations of CRP in blood, is useful for assessing low-grade inflammation and cardiovascular risk. A large study of 5257 participants observed a significant association between high-sensitivity CRP concentrations and long-term cognitive decline [24], consistent with our findings. Our results provide additional evidence supporting high-sensitivity CRP as a potential biomarker for PSD prediction and monitoring. By contrast, we found no association between Hcy and PSD, although elevated Hcy levels have been confirmed as a risk factor for cerebrovascular events and cognitive decline [23]. Possible reasons for these discrepancies include sample selection bias and differences in data collection or analytical methods.
The main strength of our study is that, to our knowledge, it is the first to construct and compare 8 predictive models for PSD risk. In addition, it integrates ML techniques with demographic and imaging features to predict PSD. However, the study has several limitations. First, it is a single-center retrospective cohort study, so data quality and diversity may be limited, and external validation and further optimization are needed. Second, not all patients underwent every examination, so some features were missing in some cases. Although we excluded patients with more than 10% missing values and used multiple imputation for features with less than 10% missing values, residual bias from missing data cannot be ruled out.
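The missing-data handling described above involves two steps that can be sketched briefly: excluding patients whose proportion of missing values exceeds 10%, and, after fitting the analysis model on each of the m imputed datasets, pooling the results with Rubin's rules. The imputation step itself (for example, chained equations) is not shown, and the text does not state which pooling procedure was used, so Rubin's rules here are an assumption. The record layout (a dict with `None` marking a missing value) and all numbers are hypothetical.

```python
from statistics import mean, variance
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[float]]  # hypothetical layout: None marks a missing value


def drop_sparse_records(records: List[Record], max_missing: float = 0.10) -> List[Record]:
    """Keep patients whose fraction of missing features is at most max_missing."""
    kept = []
    for rec in records:
        frac = sum(v is None for v in rec.values()) / len(rec)
        if frac <= max_missing:
            kept.append(rec)
    return kept


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one estimate and its squared standard error from each of m imputed datasets."""
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b  # Rubin's total variance
    return q_bar, total


records = [
    {f"f{i}": (None if i == 0 else 1.0) for i in range(10)},  # 10% missing: kept
    {f"f{i}": (None if i < 2 else 1.0) for i in range(10)},   # 20% missing: excluded
]
kept = drop_sparse_records(records)
pooled_est, pooled_var = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(len(kept), pooled_est, pooled_var)
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations, so the pooled variance reflects uncertainty about the missing values as well as ordinary sampling error.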