White matter hyperintensities (WMH), also known as leukoaraiosis (LA), are magnetic resonance imaging (MRI) abnormalities caused by degenerative changes in the white matter; they usually appear as hyperintense signals on T2-weighted or FLAIR sequences. WMH is the most common and earliest brain tissue change in chronic small vessel ischaemic disease (CSVD), and its burden increases with age [1]. The prevalence of WMH reaches 50–90% in community-dwelling older adults [2]. WMH is associated with decline in cognitive and executive function, and a higher WMH load confers a higher risk of vascular cognitive impairment [3–6]. At present, however, the early diagnosis of WMH-related cognitive impairment relies mainly on neuropsychological scales, which suffer from strong subjectivity, inconsistent diagnostic criteria, and a complicated diagnostic process. In addition, there is no specific diagnostic basis for vascular-related cognitive dysfunction, which leads to missed diagnoses and misdiagnoses in clinical practice and thereby increases the risk of CSVD and even Alzheimer's disease (AD) [7, 8]. An objective and reliable detection method is therefore important for the early identification of WMH-related cognitive impairment and for supporting early clinical intervention and treatment.
Traditional shallow statistical analyses frequently fail to capture the heterogeneity behind psychiatric phenotypes, particularly in experiments with small sample sizes. To overcome this limitation, machine learning (ML) algorithms have been widely applied to MRI analysis in various neurological diseases [9, 10], providing a promising tool both for analyzing these variables and for revealing inherent disease-related patterns. For example, a support vector machine (SVM) trained on abnormal brain structure information extracted from whole-brain MRI data can classify AD with excellent accuracy, which can aid early AD diagnosis [11, 12]. One study performed ML analyses based on diffusion tensor imaging (DTI) metrics that differed between groups [13]; using random forest models, it achieved an accuracy of 80.5% in identifying WMH with mild cognitive impairment (WMH-MCI) within WMH populations. However, the diagnostic value of gray matter atrophy for cognitive impairment in WMH remains unknown.
Machine learning algorithms are typically trained to maximize overall classification accuracy. With imbalanced data, the classifier therefore tends to favor the majority class, which leads to misclassification of the minority class and an untrustworthy model. Existing methods for imbalanced classification fall into two main groups: data-level resampling and algorithm-level cost-sensitive learning. Data-level methods reduce the degree of imbalance by under-sampling the majority class or over-sampling (augmenting) the minority class [14, 15]. However, over-sampling may increase the probability of over-fitting, while under-sampling may cause under-fitting, especially with small samples. At the algorithm level, cost-sensitive learning integrates the misclassification costs of the different classes into the training objective through a weighting strategy, so that the classification algorithm itself is biased toward the costlier class [16, 17]. A prior study implemented weighted cross-entropy and focal loss functions in XGBoost and demonstrated the competitive performance of this approach on five imbalanced datasets [17]. In clinical research, patient samples are usually fewer than healthy control samples. Handling small-sample imbalance effectively is therefore a key issue in model construction.
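To make the algorithm-level idea concrete, the following is a minimal sketch of a class-weighted cross-entropy and a focal loss for binary labels, written in plain Python. It is an illustration under stated assumptions, not the implementation used in [17] or in this study: the function names, the `pos_weight` and `gamma` parameters, and the toy data are hypothetical, and the minority class is assumed to be labeled 1.

```python
import math


def weighted_cross_entropy(y_true, p_pred, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with an extra weight on the positive class.

    pos_weight > 1 makes errors on positive (assumed minority) samples
    more costly; pos_weight = 1 gives the ordinary cross-entropy.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def focal_loss(y_true, p_pred, gamma=2.0, eps=1e-12):
    """Mean binary focal loss.

    The factor (1 - p_t) ** gamma down-weights samples that are already
    classified well; gamma = 0 gives the ordinary cross-entropy.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(y_true)


# Hypothetical toy data: one minority sample (label 1) that is poorly predicted.
labels = [1, 0, 0, 0]
probs = [0.3, 0.1, 0.2, 0.1]

plain = weighted_cross_entropy(labels, probs, pos_weight=1.0)
weighted = weighted_cross_entropy(labels, probs, pos_weight=3.0)
focal = focal_loss(labels, probs, gamma=2.0)
```

On this toy input, raising `pos_weight` increases the penalty contributed by the misclassified minority sample, while the focal loss shrinks the contribution of the well-classified majority samples. Using either loss as a custom objective in a gradient boosting library would additionally require its first and second derivatives with respect to the raw model output, which are omitted here.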
This study aimed to develop an objective and effective classification framework that automatically identifies patients with cognitive impairment in WMH populations. To provide the model with rich disease-related information, we extracted and fused multi-scale features characterizing gray matter structure, including macroscopic gray matter volume and fine-grained morphological measurements based on the cortical surface. To address the small-sample class imbalance in this study, we examined both the data-level resampling method and the algorithm-level cost-sensitive learning method. Moreover, we implemented an ensemble learning strategy based on a performance-weighted voting mechanism to combine the two methods. In this way, the advantages of data-level resampling and algorithm-level cost-sensitive learning complement each other, yielding a WMH-MCI diagnostic model with higher classification performance and better stability, which could provide a basis for clinical diagnosis.
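A performance-weighted voting scheme of the kind described above can be sketched as follows. This is only an illustration under assumptions that the text does not confirm: it uses soft voting over predicted probabilities, takes each base model's weight to be proportional to a validation score (for example, balanced accuracy), and applies a 0.5 decision threshold. The function name and the example numbers are hypothetical.

```python
def performance_weighted_vote(probabilities, scores, threshold=0.5):
    """Fuse base-model probabilities with weights proportional to performance.

    probabilities: one list of positive-class probabilities per base model,
                   all of the same length (one entry per sample).
    scores:        one non-negative validation score per base model.
    Returns the fused probabilities and the thresholded class labels.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one base model needs a positive score")
    weights = [s / total for s in scores]  # normalize so the weights sum to 1

    n_samples = len(probabilities[0])
    fused = [
        sum(w * model_probs[i] for w, model_probs in zip(weights, probabilities))
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in fused]
    return fused, labels


# Hypothetical outputs of a resampling-based model and a cost-sensitive model
# on two samples, together with their (assumed) validation scores.
resampling_probs = [0.8, 0.4]
cost_sensitive_probs = [0.4, 0.2]
fused, labels = performance_weighted_vote(
    [resampling_probs, cost_sensitive_probs], scores=[0.75, 0.25]
)
```

Because the weights are normalized, a base model with a higher validation score has proportionally more influence on the fused probability, which is one simple way to let the two imbalance-handling strategies complement each other.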