Interpretable machine learning model to predict rupture of small intracranial aneurysms and facilitate clinical decision

Deciding whether to treat small intracranial aneurysms (IAs; ≤ 7 mm in diameter) requires estimating their rupture risk, which is difficult but crucial. We aimed to construct and externally validate a convenient machine learning (ML) model for assessing the rupture risk of small IAs. A total of 1004 patients with small IAs recruited from two hospitals were included in this retrospective study. Patients at hospital 1 were randomly stratified into training (70%) and internal validation (30%) sets, and patients at hospital 2 were used for external validation. We selected predictive features using the least absolute shrinkage and selection operator (LASSO) method and constructed five ML models using the random forest classifier (RFC), categorical boosting (CatBoost), support vector machine (SVM) with linear kernel, light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost) algorithms. Shapley Additive Explanations (SHAP) analysis was used to interpret the best ML model. The training, internal validation, and external validation cohorts included 658, 282, and 64 IAs, respectively. The SVM model performed best, with an AUC of 0.817 (95% confidence interval [CI], 0.769–0.866) in the internal validation cohort and 0.893 (95% CI, 0.808–0.979) in the external validation cohort, significantly outperforming the PHASES score (all P < 0.001). SHAP analysis showed that maximum size, location, and irregular shape were the three most important features for predicting rupture. Our SVM model, based on readily accessible features, showed satisfactory discrimination in predicting rupture of small IAs. Morphological parameters made important contributions to the prediction results.


Introduction
Intracranial aneurysms (IAs), which occur in around 3% of adults, are relatively common in the general population [1]. Ruptured IAs lead to aneurysmal subarachnoid hemorrhage, with high case morbidity and disability [2]. Of note, most incidentally detected IAs are small (≤ 7 mm in diameter) [3]. Small IAs account for more than 40% of all ruptured IAs [4], which may push patients with small IAs to accept preventive treatment and endure additional treatment risks. Therefore, early evaluation of the rupture risk of small IAs is important in helping physicians and patients formulate treatment strategies.
Scoring systems for evaluating the rupture risk of IAs have been reported [5,6]. They usually analyze aneurysms of all sizes together. However, because the pathophysiological characteristics of small and large IAs differ [7], these scoring systems might not apply well to small IAs. In addition, as research has progressed, various rupture risk factors have been identified, but the complex relationships among these factors mean that assessing aneurysm rupture risk remains a thorny problem. As a result, new approaches to rupture risk prediction for small IAs are needed.
Machine learning (ML), a relatively novel modeling approach, can identify correlations between features in large multivariate datasets [8]. Studies have suggested that it is superior to conventional statistical methods in dealing with complex pattern problems and multivariable interactions [9]. The potency and effectiveness of ML approaches in predicting the rupture risk of IAs have been demonstrated. Liu et al. proposed an ML model that achieved an overall prediction accuracy of 94.8% in evaluating the rupture risk of IAs located at the anterior communicating artery [10]. Another study used ML to stratify the risk of developing IAs among people undergoing health examinations and recommended further screening tests for those at high risk [11].
In this study, we constructed prediction models for the rupture risk of small IAs based on ML methods and clinical data and validated them on a dataset from another cross-regional center. To improve interpretability, we introduced a model interpretation technique to rank the importance of the selected input features. We aimed to develop a convenient tool to facilitate clinical decision-making and optimize treatment options.

Study population
We recruited a consecutive series of patients with IAs from two hospitals (Hunan Provincial People's Hospital and the Second Affiliated Hospital of Nanjing Medical University) between September 2015 and December 2020 and obtained data retrospectively from cerebrovascular images and medical records. The ethics committee of Hunan Provincial People's Hospital approved this study ([2015]-10). The inclusion criteria were as follows: (1) patients with IA(s) confirmed by digital subtraction angiography (DSA), (2) patients ≥ 18 years old, (3) patients with IAs ≤ 7 mm in size, and (4) patients with available clinical information and imaging data. Patients who were diagnosed with malignant brain tumors, fusiform or dissecting IAs, arteriovenous fistulas, moyamoya disease, or other cerebrovascular diseases, or who had incomplete clinical or imaging data, were excluded.

Data collection and data pre-processing
The baseline data of patients were as follows: age, gender, drinking, smoking, presence of hypertension, coronary heart disease (CHD) and diabetes mellitus (DM), and history of subarachnoid hemorrhage (SAH). Morphological parameters (such as size, location, and shape) were extracted from 3D-DSA images and measured by two researchers, who were supervised by two senior neurosurgeons. The maximum neck width, neck-to-dome length (from the neck center to the IA dome), and IA width (perpendicular to the neck-to-dome axis) were measured on a 0.1-mm scale. The size of an IA was defined as the neck-to-dome length or the largest distance within the aneurysm sac. IAs were categorized as narrow neck aneurysms (NNA) or wide neck aneurysms (WNA; neck width exceeding 4 mm or ratio of maximum diameter to neck width less than 2). According to their position relative to the parent vessel, IAs were divided into sidewall and bifurcation types. The shape of IAs was categorized as regular or irregular (presence of aneurysm wall protrusions, bi- or multi-lobular shape, or small blebs). The location of IAs was divided into internal carotid artery (ICA), anterior communicating artery (ACOA), anterior cerebral artery (ACA), posterior cerebral artery (PCA), middle cerebral artery (MCA), posterior communicating artery (PCOA), vertebral artery (VA), basilar artery (BA), posterior inferior cerebellar artery (PICA), and others, and was further dichotomized as anterior vs posterior circulation. When a patient had two or more IAs, the largest IA was used for analysis.
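The wide-neck criterion described above can be written as a small predicate. This is a minimal sketch rather than the authors' code: the function name `is_wide_neck` and its argument names are hypothetical, and both measurements are assumed to be in millimetres.

```python
def is_wide_neck(neck_width_mm, max_diameter_mm):
    """Return True for a wide neck aneurysm (WNA), False for a narrow neck one (NNA).

    Criterion taken from the text: neck width exceeding 4 mm, or a ratio of
    maximum diameter to neck width below 2.
    """
    if neck_width_mm <= 0:
        raise ValueError("neck width must be positive")
    dome_to_neck_ratio = max_diameter_mm / neck_width_mm
    return neck_width_mm > 4.0 or dome_to_neck_ratio < 2.0


# Illustrative values only, not patient data.
print(is_wide_neck(2.0, 5.0))  # ratio 2.5 and neck 2 mm, so narrow neck
print(is_wide_neck(3.0, 5.0))  # ratio about 1.67, so wide neck
```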

Outcome measure
We aimed to predict a binary outcome (i.e., ruptured or unruptured IAs). Our outcome measure was the rupture status of IAs, confirmed by three-dimensional computed tomographic angiography (CTA), magnetic resonance angiography (MRA), or digital subtraction angiography (DSA).

Feature selection and model development
The eligible patients at hospital 1 were assigned to a derivation cohort (70%) and an internal validation cohort (30%) using stratified random sampling, and the eligible patients at hospital 2 were used for external validation. Feature selection, model derivation, and hyper-parameter tuning, as described below, were performed using the training cohort only. Before developing the ML models, z-score normalization was applied to the continuous data [12], and one-hot encoding was used to transform the categorical data [13]. The least absolute shrinkage and selection operator (LASSO) method was applied to select predictive features [14]; features with non-zero coefficients were retained to train the ML models. We constructed ML models to classify ruptured versus unruptured IAs with the random forest classifier (RFC), extreme gradient boosting (XGBoost), support vector machine (SVM) with linear kernel, light gradient boosting machine (LightGBM), and categorical boosting (CatBoost) algorithms, and tuned model hyper-parameters using tenfold cross-validation combined with grid search [15]. In tenfold cross-validation, the training dataset was randomly stratified into 10 smaller subsets. For each fold, 9 subsets were used to build a model with a specific set of hyper-parameters, and the remaining subset was used to evaluate it. Finally, the models were retrained using the optimal hyper-parameters, that is, the set with the best average AUC across the 10 folds.
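The tenfold cross-validation and grid search procedure described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' pipeline: the names `stratified_folds`, `grid_search_cv`, and the callback `fit_and_score` are hypothetical, and the callback is assumed to train a model on the given training indices and return a validation score such as the AUC.

```python
import random
from collections import defaultdict
from itertools import product
from statistics import mean


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def grid_search_cv(labels, grid, fit_and_score, k=10, seed=0):
    """Return the hyper-parameter set with the best mean score over k folds.

    fit_and_score(params, train_idx, valid_idx) is a hypothetical callback
    that fits a model on train_idx and returns its score on valid_idx.
    """
    folds = stratified_folds(labels, k, seed)
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(len(labels)) if i not in held]
            scores.append(fit_and_score(params, train_idx, held_out))
        avg = mean(scores)
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score
```

After the search, the chosen hyper-parameters would be used to refit a single model on the whole training cohort, as the text describes.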

Model evaluation
Model performance was measured by the area under the curve (AUC) of the receiver operating characteristic (ROC). We also compared our five ML models with the PHASES score [5], in which a higher score denotes a higher rupture risk. For instance, patients scoring 2 have a 5-year rupture risk of 0.4%, while those scoring 11 have a risk of 7.2%. The method of DeLong et al. was used to compute confidence intervals (CIs) of the AUC values and to compare the ROC curves [16]. The threshold corresponding to the maximum Youden index was selected as the optimal cut-off for dichotomizing the predictions [17]. Specificity, sensitivity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were calculated at the optimal threshold.
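The evaluation metrics above can be illustrated with a short standard-library sketch. It computes the AUC as the proportion of ruptured/unruptured pairs ranked correctly, picks the cut-off that maximizes the Youden index (sensitivity + specificity − 1), and reports the metrics at that cut-off. The function names and toy scores are hypothetical, a prediction is assumed positive when the score is at or above the threshold, and the DeLong confidence intervals are not covered here.

```python
def auc(scores, labels):
    """AUC as the chance that a ruptured case scores above an unruptured one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called ruptured."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp, fp, tn, fn


def youden_cutoff(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j


def metrics_at(scores, labels, threshold):
    """Sensitivity, specificity, accuracy, PPV, and NPV at a given cut-off."""
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


# Toy example (1 = ruptured, 0 = unruptured); not study data.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
cutoff, j_stat = youden_cutoff(toy_scores, toy_labels)
print(auc(toy_scores, toy_labels), cutoff, metrics_at(toy_scores, toy_labels, cutoff))
```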

Model interpretation
ML models are often criticized as black boxes because the function linking input features to model output is not visible to researchers. To improve interpretability and trustworthiness, we applied a model interpretation technique named Shapley Additive Explanations (SHAP) [18] to our best-performing model to reveal the importance of each included feature. In addition, we randomly sampled 2 correctly predicted and 2 falsely predicted cases from the derivation set to explain individual predictions and clarify the reasons for the model's correct and incorrect predictions. Because the linear SVM is a linear model, we also provide a direct interpretation by extracting the weight coefficient of every variable in the SVM model, in addition to the SHAP analysis.
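For a linear model, the link between weight coefficients and SHAP values can be made explicit. Under the common simplifying assumption that features are treated as independent, the SHAP value of feature j for a linear decision function f(x) = w·x + b is w_j (x_j − mean_j), where mean_j is the feature's average in a background dataset. The sketch below illustrates this on the decision-function scale; the function name `linear_shap` and the toy numbers are hypothetical, and this is not necessarily how the SHAP values in this study were computed.

```python
def linear_shap(weights, bias, background, x):
    """SHAP values of a linear model under the feature-independence assumption.

    Returns (base_value, contributions), where base_value is the model output
    at the background mean and contributions[j] = w_j * (x_j - mean_j).
    """
    n_rows = len(background)
    means = [sum(row[j] for row in background) / n_rows for j in range(len(weights))]
    base_value = bias + sum(w * m for w, m in zip(weights, means))
    contributions = [w * (xi - m) for w, xi, m in zip(weights, x, means)]
    return base_value, contributions


# Toy two-feature example; the numbers are illustrative only.
weights, bias = [2.0, -1.0], 0.5
background = [[0.0, 0.0], [2.0, 4.0]]
x = [3.0, 1.0]
base, contrib = linear_shap(weights, bias, background, x)

# Additivity: base value plus contributions recovers the model output for x.
output = bias + sum(w * xi for w, xi in zip(weights, x))
print(base, contrib, output)
```

With z-score-normalized inputs, the background means of continuous features are zero on the training data, so each such contribution is simply the weight times the normalized value. This is why the coefficients and the SHAP ranking of a linear SVM tend to tell a consistent story.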

Statistical analyses
Statistical analysis was performed to compare patient and IA characteristics across the training, internal validation, and external validation sets; analysis of variance (ANOVA) was used for continuous data and Fisher's exact test for categorical data. In the univariable analysis of differences in clinical features between the ruptured and unruptured groups, the Mann-Whitney U test or Student's t-test was applied to continuous data and the chi-squared test to categorical data. A two-tailed P < 0.05 was considered statistically significant. Data were analyzed with SPSS software (IBM Corporation, USA).

Study population
In total, 1004 patients were included in this retrospective study. Their mean age was 59.06 ± 10.48 years, and 70.2% of them were female. Patient and IA characteristics of the training (n = 658) and internal validation (n = 282) cohorts from hospital 1 and the external validation cohort (n = 64) from hospital 2 are presented in Table 1. Significant differences in age, history of DM, irregular shape, and rupture status were observed across the three groups (all P < 0.05, Table 1), while no significant differences in any variable were found between the training and internal validation sets (P = 0.060–1.000, Supplementary Table 1). The univariable analysis showed that maximum size, location, irregular shape, presence of hypertension, and DM were significantly related to IA rupture (all P < 0.05, Supplementary Table 2).

Model performance
The prediction models were trained using 9 predictive features and 5 ML algorithms (RFC, SVM, XGBoost, LightGBM, and CatBoost). The predictive features, determined by LASSO analysis, were age, hypertension, DM, irregular shape, NNA, maximum size, location at ACOA, location at ICA, and location at PCOA (Table 2). The hyper-parameters of each algorithm are presented in Supplementary Table 3.
Values of AUC, specificity, sensitivity, accuracy, PPV, and NPV derived from the five ML models are summarized in Table 3. ROC curves and AUC values of the PHASES score and our four models are shown in Fig. 1a-

Model interpretation
SHAP analysis was used to reveal the contribution of each feature to the prediction outcome of the SVM model. The model tended to correlate larger size and location at ACOA, among other features, with rupture (Fig. 2). In order of importance, the top three features contributing to the classification of ruptured and unruptured IAs were maximum size, location (ACOA and ICA), and irregular shape (Fig. 3). We also randomly sampled 2 correctly predicted and 2 falsely predicted cases from the training dataset, as plotted in Supplementary Fig. 2. The true positive prediction, in which the first case was correctly classified as a ruptured IA, mainly resulted from a maximum size of 6.3 mm, irregular shape, no hypertension, and location at PCOA. The true negative prediction, in which the second case was classified as an unruptured IA, mainly relied on a maximum size of 2.7 mm, regular shape, and absence of hypertension. A maximum size of 6.1 mm, location at PCOA, and irregular shape were the main reasons for the false positive prediction that the third case was a ruptured IA, while a maximum size of 3 mm and regular shape were the main reasons for the false negative prediction in the fourth case.

Discussion
Physicians and patients are often caught in a dilemma when making treatment decisions for unruptured IAs, especially small ones. Traditional statistical methods are usually unsatisfactory when dealing with complex nonlinear relationships among large numbers of variables and observations. In this study, we combined simple variables obtained in routine clinical practice with ML algorithms to establish a model for predicting the rupture risk of small IAs. The SVM model showed satisfactory discrimination and performed best, with AUC values of 0.817 and 0.893 in the internal and external validation, respectively. According to the SHAP analysis, size, location, shape, and presence of hypertension strongly influenced the predicted outcome.
The prime advantage of our model is its convenience for physicians and patients. Because it can be difficult for busy physicians to spend much time collecting complex additional information, we used only patient and morphological characteristics that are accessible in routine clinical practice for modeling. This design should make our model more convenient to use in the clinical environment. In contrast, two previous studies constructed ML models based on complex hemodynamics and pyradiomics-derived morphological features, which may limit their clinical adoption [19,20]. Two other studies employed convolutional neural networks to develop prediction models that worked by identifying information from 3D-DSA [21,22]. However, ignoring important patient characteristics could affect the clinical efficacy of such models in the real world. Another advantage of this study is that we interpreted the prediction results of our model. ML has gradually become a research hotspot because of its excellent ability to handle large samples and nonlinear relationships. However, a significant drawback of ML models is that they tend to operate like "black boxes," which makes them seem less reliable to experts. To address this flaw, we introduced the SHAP algorithm to interpret how each feature contributed to the prediction result [18]. In this way, our ML model not only gives a predicted value for the rupture risk of IAs but also reveals how each variable contributed to the prediction. In general, if an aneurysm is large, irregularly shaped, located at the anterior or posterior communicating artery, and narrow-necked, our model is more inclined to assign it a higher risk of rupture and to suggest that physicians take preventive measures. Physicians can also validate the interpretation of the ML model against their professional knowledge.
Most of the variables included in our model have been widely studied. Many studies have reported that larger size [3,23] and irregular shape [24] are associated with higher rupture risk. Our study reached similar conclusions. Furthermore, the location of IAs is significantly associated with rupture risk. One study suggested that location outweighs size in predicting the risk of rupture, even after adjustment for the parent artery [25]. That study found that IAs in the posterior circulation are at the greatest rupture risk, followed by those at the ACA, MCA, and ICA. In our study, IAs located at the ACOA were more likely to rupture, while the opposite was true for those located at the ICA.
Another interesting finding was that patients with a history of hypertension in our cohort showed a lower risk of rupture, which differs from some studies. This may be attributable to the effects of antihypertensive drugs. A previous animal study found that normalization of blood pressure by antihypertensive drugs can reduce the rupture rate of aneurysms in mice [26]. In addition, a Finnish study indicated that drug-treated hypertension may be related to the formation of IAs rather than to their rupture and brings a higher rupture risk only if left untreated [27]. Similarly, several studies regarded DM as a protective factor and attributed this to the use of hypoglycemic agents [28,29]. More well-designed studies are required to sufficiently investigate the connection between IA rupture and drug-treated hypertension.
Our study has several limitations. First and foremost, the retrospective nature of this study may have introduced bias into our analysis. Second, most IAs of the patients had ruptured during the study period. Although ruptured IAs are indeed unstable, some reports have argued that post-rupture morphology should not be considered an adequate surrogate for evaluating rupture risk [30]. However, once some risk factors are known, it is unethical to leave aneurysms with a high risk of rupture untreated. In addition, one study has shown that the growth and development of aneurysms are irregular and discontinuous [31], which results in periods of high and low rupture risk. This seems to imply that predicting rupture from data on unruptured aneurysms may have pitfalls of its own. Third, we only took into account clinically accessible factors. Some complex factors, such as morphology and hemodynamics parameters, were rarely included in the current study. Finally, although our model performed satisfactorily in external validation, the external validation dataset is relatively small. Moreover, although both of our datasets come from general hospitals, differences in some inherent characteristics of the two centers may have affected the results. For example, hospitals of different levels and prestige attract different types of patients. Going forward, further longitudinal research is needed to validate the performance of our model.

Conclusions
Our study combined readily accessible clinical and morphological features to derive ML models for predicting the risk of small IA rupture. In internal and external validation, our SVM model showed satisfactory discrimination. Morphological parameters (size, location, and shape) made important contributions to the prediction results.

Author contribution
JJZ, HWF, and ZZH conceived and designed the study. WGX, TTC, and JL contributed equally to this work. WGX, TTC, and JL conducted the literature review. TTC performed data analysis. WGX drafted the manuscript. LX, CZ, LiX, YBL, and DC collected the data. YZW, QJ, RZQ, and ZYX polished this article. All the authors have read and agreed to the published version of the manuscript.

Declarations
Ethical approval
The ethics committee of Hunan Provincial People's Hospital has approved this study ([2015]-10) and waived the requirement of written informed consent.

Conflict of interest
The authors declare no competing interests.