A strategy for identifying the risk of developing Alzheimer’s disease from healthy controls: distance to novelty detection boundary

Background: Machine learning (ML) techniques are expected to help tackle the high prevalence of Alzheimer's disease (AD) that we are facing worldwide. However, few studies have applied novelty detection (ND), a typical ML technique for safety-critical systems, especially in healthcare, to identify the risk of developing cognitive impairment in the healthy control (HC) population. Materials and Methods: Two independent datasets were used in this study: the Australian Imaging Biomarkers and Lifestyle Study of Ageing (AIBL) dataset and the Fujian Medical University Union Hospital (FMUUH), China, dataset. Multiple feature selection methods were applied to identify the features most relevant to predicting the severity of AD. Four easily interpretable ND algorithms, namely k nearest neighbor (KNN), Mixture of Gaussian (MoG), KMEANS, and support vector data description (SVDD), were used to construct predictive models. The models were visualized by drawing their decision boundaries tightly surrounding the HC data. A distance to boundary (DtB) strategy was proposed to differentiate individuals with mild cognitive impairment (MCI) and AD from HC. Results: The best overall MCI&AD detection performance in both AIBL and FMUUH was obtained with MoG-based ND using the cognitive and functional assessments (CFA) modality alone, with AUCs of 0.8757 and 0.9443, respectively. The highest sensitivity for MCI was achieved with a combination of the CFA and brain imaging modalities. The DtB value reflects the risk of developing cognitive impairment for HC and the dementia severity of MCI/AD. Our findings suggest that a small set of non-invasive and cost-effective features can effectively detect cognitive decline at an early stage. The visualized decision boundary and the proposed DtB strategy illustrated the severity of cognitive decline of potential MCI&AD patients at an early stage.
The results would help inform future guidelines for developing a clinical decision-making support system aimed at the early diagnosis and prognosis of MCI&AD.


Background
With an ageing population, we are challenged by the growing impact of neurodegenerative diseases such as Alzheimer's disease (AD), the most common type of dementia [1]. In particular, a steep worsening of the neuropsychiatric symptoms of dementia has been reported during the COVID-19 pandemic [2]. Neurodegeneration incrementally diminishes a patient's quality of life and imposes a heavy economic burden on healthcare. In 2019, Alzheimer's Disease International (ADI) reported that over 50 million people worldwide were living with dementia. The total estimated worldwide cost of dementia in 2018 was US$1 trillion, a figure projected to double by 2030 [3]. The number of people with dementia in China was 9.5 million in 2015 and was projected to increase to 14.1 million by 2020 and 23.3 million by 2030 [4]. The total costs of dementia in China were predicted to reach US$69 billion in 2020 and US$114.2 billion in 2030 [5].
There is an urgent need to develop AI-enabled clinical diagnosis support systems (AI-CDSS) to accelerate AD diagnosis and prognosis and to improve the quality of healthcare. AI-CDSSs can outperform traditional CDSSs, improve the clinical decision-making process, reduce medical errors, and decrease costs [6]. AI development has been raised to the level of national strategy in many countries, including the US, China, and the UK. For example, the UK Health Secretary recently announced a £250 million investment to establish a new national AI lab that will use the power of AI to improve the health and lives of patients [7]. AI is widely seen as having the potential to improve efficiency across many sectors, and healthcare is one of the major and most critical domains expected to be revolutionized by AI [8].
Novelty detection (ND) aims to identify behaviour that is not consistent with normal expectations. As an ML method, it could be incorporated into an AI-CDSS because it is accurate, efficient, and particularly applicable to monitoring safety-critical systems [9,10]. ND has gained much attention in application domains including anomalous behaviour detection in elderly care [11-13], prediction of new disease-causing genes [14], and cancer image classification [15].
Although ND has been widely applied in the medical and healthcare fields, to the best of our knowledge no study has evaluated the risk of developing AD using ND techniques. Research in this area has great potential for the prevention and treatment of dementia. Unlike other AI methods that need balanced data across classes to build models, ND techniques are applicable when data from only one class (i.e. the normal class, here HC data) are available. The resulting ND model can be used to detect whether healthy elderly adults are at risk of developing AD at an early stage, so that they can then be referred by clinicians for follow-up treatment.
This study aims to construct an interpretable novelty detector that builds a tightly closed decision boundary to differentiate between healthy controls and patients with dementia. We also propose a novel distance to boundary (DtB) strategy that evaluates the severity of a patient's risk of developing AD based on the distance of the corresponding data point to the decision boundary.

Data description
Two independent datasets were used in this study, namely, the Australian Imaging Biomarkers and Lifestyle Study of Ageing (AIBL) and the Fujian Medical University Union Hospital (FMUUH) datasets.

The AIBL data
Patient records from the AIBL database (https://www.aibl.csiro.au/) were used in this study to build the ND models. The data were collected by the AIBL study group, and the AIBL study methodology has been reported previously [16].
The primary goal of the AIBL study is to discover which cognitive characteristics, biomarkers, and health and lifestyle factors determine the subsequent development of symptomatic AD. The AIBL clinical data were extracted mainly from seven categories: cerebrospinal fluid biomarkers (CSF), cognitive and functional assessments (CFA), magnetic resonance imaging (MRI), positron emission tomography (PET), blood tests (BLO), demographics (DEM), and medical history (MH). In our study, the CFA comprised the Mini-Mental State Examination (MMSE) [17] and the logical memory immediate/delayed recall assessments (LMIR/LMDR). The brain imaging data consisted of coarse-grained structural MRI and PET with [11C]-Pittsburgh compound B (PIB), namely the total volumes of grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF), and the total number of active pixels in the PIB-PET image (PIB.PET). In total, we used 32 features as potential predictors of cognitive decline associated with AD: 3 CFA, 12 BLO, 4 neuroimaging, 2 sociodemographic, 10 medical history (MH), and 1 ApoE genotype feature (Table 1). The Clinical Dementia Rating (CDR) scores were used to characterize the severity of AD and served as the response measure in the prediction models. Accordingly, participants were categorized into five groups based on the CDR scale: healthy controls (CDR = 0), mild cognitive impairment (MCI) (CDR = 0.5), and mild (CDR = 1), moderate (CDR = 2), and severe (CDR = 3) AD. CDR has previously been used as an objective measure of AD severity and has been shown to be highly correlated with the clinical diagnosis [18]. Subjects in the mild, moderate, and severe AD categories were combined into a single AD category. Note that only healthy control (HC) data were used to train the ND models. The non-healthy data were treated as abnormal and were used to optimize the model parameters and to test the predictive performance of the models.
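A minimal Python sketch of the labelling rule described above follows. The helper names and the `(features, cdr)` record layout are hypothetical; only the CDR thresholds and the rule that HC data alone are used for training come from the text.

```python
def cdr_to_group(cdr):
    """Map a Clinical Dementia Rating to the study group used for modelling.

    CDR 0 -> HC, CDR 0.5 -> MCI, CDR 1/2/3 (mild/moderate/severe AD) -> AD.
    """
    if cdr == 0:
        return "HC"
    if cdr == 0.5:
        return "MCI"
    if cdr in (1, 2, 3):
        return "AD"
    raise ValueError(f"unexpected CDR value: {cdr}")


def split_normal_abnormal(records):
    """Split (features, cdr) records into normal (HC) and abnormal (MCI/AD) lists.

    Only the normal list is used to fit a novelty detector; the abnormal list
    is kept for hyperparameter tuning and testing.
    """
    normal, abnormal = [], []
    for features, cdr in records:
        if cdr_to_group(cdr) == "HC":
            normal.append(features)
        else:
            abnormal.append(features)
    return normal, abnormal


if __name__ == "__main__":
    toy = [([0.9, 0.8], 0), ([0.5, 0.4], 0.5), ([0.1, 0.2], 2)]  # hypothetical records
    hc, non_hc = split_normal_abnormal(toy)
    print(len(hc), len(non_hc))  # 1 2
```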
[19], and the Neuropsychiatric Inventory (NPI) [20], and three demographics (Age, Education Level, and Gender). Detailed analyses of the AIBL and FMUUH demographic characteristics are provided in Additional file 1: Table S1 and Fig. S1.

Feature selection
Feature selection techniques were applied to minimize computational costs, decrease analytical complexity, and identify significant features associated with the severity of AD. First, min-max normalization was conducted to rescale clinical measurements of diverse scales into the range [0, 1]. To avoid bias associated with the choice of any one feature selection technique for establishing the feature ranking, we used three different univariate feature selection approaches to select the optimal features, namely those based on the information gain ratio (IGR) [21], Pearson's correlation [22], and the Chi-square statistic [23]. The Cross-Entropy Monte Carlo rank aggregation algorithm [24] was then used to aggregate the rankings produced by these three filters and obtain the top-10 significant features.
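The feature selection pipeline can be sketched in plain Python as follows, assuming features are stored as a dict of named columns. Only min-max normalization and the Pearson filter are implemented; the IGR and Chi-square rankings are taken as given (hypothetical) inputs, and the mean-rank (Borda-style) aggregation is a deliberately simple stand-in for the Cross-Entropy Monte Carlo algorithm [24], not a reimplementation of it. The toy values are invented for illustration.

```python
import math


def min_max_normalize(column):
    """Rescale one feature column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_by_pearson(features, target):
    """Rank feature names by |r| with the target (e.g. CDR), strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


def aggregate_ranks(rankings, top=10):
    """Mean-rank (Borda-style) aggregation of several rankings of the same features.

    Simple stand-in for Cross-Entropy Monte Carlo rank aggregation.
    """
    names = rankings[0]
    mean_rank = {n: sum(r.index(n) for r in rankings) / len(rankings) for n in names}
    return sorted(names, key=lambda n: mean_rank[n])[:top]


if __name__ == "__main__":
    raw = {"MMSE": [30, 28, 22, 18], "LMDR": [14, 12, 5, 2], "Age": [70, 72, 71, 73]}
    cdr = [0, 0, 0.5, 1]
    normed = {name: min_max_normalize(col) for name, col in raw.items()}
    pearson_rank = rank_by_pearson(normed, cdr)
    igr_rank = ["LMDR", "MMSE", "Age"]   # hypothetical IGR filter output
    chi2_rank = ["MMSE", "Age", "LMDR"]  # hypothetical Chi-square filter output
    print(aggregate_ranks([pearson_rank, igr_rank, chi2_rank], top=2))  # ['MMSE', 'LMDR']
```

Because Pearson's r is invariant to positive affine rescaling, min-max normalization does not change this particular filter's ranking; it matters for filters that discretize or compare raw magnitudes.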

Novelty detection modelling approaches
Novelty detection is also referred to as one-class classification [25], anomaly detection [26], or data description [27]. The ND modelling process is described in Fig. 1. The preprocessed data of each category were split into five folds: four folds for model development (80% of the normal and 80% of the abnormal data) and one fold for model testing (20% normal + 20% abnormal). To avoid bias introduced by random partitioning and to improve repeatability, each of the five folds was used in turn as the testing set, with the remaining four folds forming the development set (see the outer 5-fold CV loop in Fig. 1). Next, the normal data in each development set were further split into five folds for model training and validation. The training set included only four folds of normal data, which were used to construct an ND model, while the remaining fold of normal data was combined with the 80% abnormal data to form a validation set for the constructed model. This process was iterated five times (the inner 5-fold CV loop in Fig. 1) to tune the model's hyperparameters and obtain an unbiased evaluation of the model fitted on the training set. Finally, the tuned hyperparameters and all normal data in the development set were used to construct an optimal ND model, whose performance was then assessed on the testing set, which was unseen during training. The experimental results are averaged over the five outer folds.
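The nested cross-validation scheme can be sketched as an index generator. This is an illustrative reconstruction under stated assumptions: random shuffling with fixed seeds, and folds obtained by striding over the shuffled indices. The function names are hypothetical.

```python
import random


def kfold(indices, k, seed=0):
    """Shuffle a list of indices and cut it into k folds of (almost) equal size."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def _merge(folds, skip):
    """Concatenate all folds except the one at position `skip`."""
    return [i for f, fold in enumerate(folds) if f != skip for i in fold]


def nested_cv(n_normal, n_abnormal, k=5, seed=0):
    """Yield (outer, inner, split) following the nested scheme described above.

    Normal (HC) and abnormal (MCI/AD) indices are split into k outer folds
    separately, so each test fold holds about 1/k of each class.
    """
    normal_folds = kfold(range(n_normal), k, seed)
    abnormal_folds = kfold(range(n_abnormal), k, seed + 1)
    for o in range(k):
        test = {"normal": normal_folds[o], "abnormal": abnormal_folds[o]}
        dev_normal = _merge(normal_folds, o)      # refits the final model
        dev_abnormal = _merge(abnormal_folds, o)  # used only for validation
        inner_folds = kfold(dev_normal, k, seed + 2 + o)
        for j in range(k):
            yield o, j, {
                "train_normal": _merge(inner_folds, j),  # HC data only
                "val": {"normal": inner_folds[j], "abnormal": dev_abnormal},
                "dev_normal": dev_normal,
                "test": test,
            }
```

With 50 HC and 20 non-HC records, each outer test fold contains 10 HC and 4 non-HC, each inner training set holds 32 HC, and each validation set combines 8 held-out HC with the 16 development non-HC records.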
Following [28], and for interpretability, four representative ND methods, namely k nearest neighbor (KNN), Mixture of Gaussian (MoG), KMEANS, and support vector data description (SVDD), were employed for ND modelling in this study. These methods were selected for their interpretability, wide applicability in various domains [29], outstanding historical contributions to the development of ND methods [30], and potential extensibility for further research [31]. Their computational principles are easy to understand and are briefly introduced below.

KNN
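KNN is a representative distance-based ND method that assumes normal data points are close to each other, whereas anomalies are far from the normal points [32]. The KNN method first calculates the distance between a data point $\mathbf{x}$ and its k-th nearest neighbor in the training set, $NN_k(\mathbf{x})$, and then calculates the distance from this neighbor to its own k-th nearest neighbor, $NN_k(NN_k(\mathbf{x}))$. It decides whether $\mathbf{x}$ is normal or abnormal by comparing these two distances. The acceptance function for a test data point can be defined as [32]:

$$f_{\mathrm{KNN}}(\mathbf{x}) = I\left(\frac{\lVert \mathbf{x} - NN_k(\mathbf{x}) \rVert}{\lVert NN_k(\mathbf{x}) - NN_k\big(NN_k(\mathbf{x})\big) \rVert} \le \theta\right)$$

where $I(\cdot)$ is the indicator function and $\theta$ is the acceptance threshold; $\mathbf{x}$ is accepted as normal when $f_{\mathrm{KNN}}(\mathbf{x}) = 1$.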
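A minimal pure-Python sketch of this acceptance function follows (Python 3.8+ for `math.dist`). The default `theta = 1.0` is an assumption for illustration; in this study hyperparameters were tuned in the inner CV loop, and the function names are hypothetical.

```python
import math


def _kth_neighbor(point, data, k, exclude=None):
    """Index of the k-th nearest element of `data` to `point` (k is 1-based)."""
    ranked = sorted(
        (math.dist(point, q), i) for i, q in enumerate(data) if i != exclude
    )
    return ranked[k - 1][1]


def knn_ratio(x, train, k=1):
    """rho(x) = ||x - NN_k(x)|| / ||NN_k(x) - NN_k(NN_k(x))||."""
    j = _kth_neighbor(x, train, k)
    nn = train[j]
    # k-th neighbor of the neighbor, excluding the neighbor itself
    jj = _kth_neighbor(nn, train, k, exclude=j)
    num, den = math.dist(x, nn), math.dist(nn, train[jj])
    if num == 0:
        return 0.0
    return math.inf if den == 0 else num / den


def knn_accept(x, train, k=1, theta=1.0):
    """Acceptance function: True (normal) when rho(x) <= theta."""
    return knn_ratio(x, train, k) <= theta
```

For training HC points on the unit square, the centre (0.5, 0.5) gives a ratio of about 0.71 and is accepted, while (5, 5) gives a ratio of about 5.66 and is rejected.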

MoG
MoG is a commonly used density-based ND method that models the given data as a linear combination of $N$ Gaussian components. The probability density of data $\mathbf{x}$ can be estimated with [33]:

$$p(\mathbf{x}) = \sum_{j=1}^{N} \alpha_j \, \mathcal{N}\left(\mathbf{x};\, \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j\right), \qquad \sum_{j=1}^{N} \alpha_j = 1,$$

where $\alpha_j$ are the mixture coefficients, $\boldsymbol{\mu}_j$ is the mean of the $j$-th Gaussian component, and $\boldsymbol{\Sigma}_j$ is its covariance matrix. Data lying in a high-density area are accepted as normal; otherwise, they are detected as abnormal.
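The MoG acceptance rule can be sketched as below. The sketch assumes the mixture parameters have already been estimated (e.g. by expectation-maximization, not shown) and, for brevity, that each covariance matrix is diagonal. Setting the density threshold so that a chosen fraction of training HC falls outside the boundary is one common convention; the `reject_fraction` value and the function names are hypothetical.

```python
import math


def gaussian_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance (var = per-dimension variances)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def mog_density(x, weights, means, variances):
    """p(x) = sum_j alpha_j N(x; mu_j, Sigma_j), with each Sigma_j diagonal here."""
    return sum(a * gaussian_diag(x, m, v) for a, m, v in zip(weights, means, variances))


def fit_threshold(train, params, reject_fraction=0.1):
    """Density threshold leaving about `reject_fraction` of training HC outside."""
    scores = sorted(mog_density(x, *params) for x in train)
    return scores[int(reject_fraction * len(scores))]


def mog_accept(x, params, threshold):
    """Accept as normal when the estimated density reaches the threshold."""
    return mog_density(x, *params) >= threshold
```

Here `params` is a `(weights, means, variances)` tuple, for example `([0.5, 0.5], [[0, 0], [4, 4]], [[1, 1], [1, 1]])` for two unit-variance components in two dimensions.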

KMEANS
KMEANS, a representative clustering-based ND method, is one of the most popular techniques owing to its simplicity of implementation [33]. This method clusters the normal data using a small number $k$ of prototypes. The centroids of the $k$ prototypes are optimized by minimizing the squared error:

$$\varepsilon = \sum_{i} \min_{j} \lVert \mathbf{x}_i - \boldsymbol{\mu}_j \rVert^2,$$

where $\boldsymbol{\mu}_j$ is the centroid associated with the $j$-th cluster. Any data point not covered by any of the clusters is detected as abnormal.
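A small sketch of KMEANS-based novelty detection: Lloyd's algorithm reduces the squared error above, and a point is accepted when its squared distance to the nearest centroid does not exceed a threshold. The deterministic initialization (first k points), the threshold value, and the function names are assumptions chosen to keep the example reproducible.

```python
import math


def kmeans(data, k, iters=100):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in data[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in data:
            # assign each point to its nearest centroid
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        new = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def squared_error(data, centroids):
    """epsilon = sum_i min_j ||x_i - mu_j||^2."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in data)


def kmeans_accept(x, centroids, threshold):
    """Normal if the squared distance to the nearest centroid is within the threshold."""
    return min(math.dist(x, c) ** 2 for c in centroids) <= threshold
```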

SVDD
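SVDD is a support vector machine-based ND method. It employs a hypersphere to define a closed decision boundary around the normal data. In simplified form, SVDD can be written as [27]:

$$\min_{R,\,\mathbf{a}} \; R^2 \quad \text{subject to} \quad \lVert \mathbf{x}_i - \mathbf{a} \rVert^2 \le R^2 \;\; \forall i,$$

where $\mathbf{x}_i$ is a data point from the training set, $\mathbf{a}$ is the centre of the hypersphere, and $R$ is the distance from $\mathbf{a}$ to the support vectors on the decision boundary. In this study, the radial basis function (RBF) [35] was used as the kernel of SVDD, so that the hypersphere in the kernel feature space yields a flexible, non-spherical boundary around the target data in the input space.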
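The sketch below fits the simplified (hard-margin) SVDD with an RBF kernel by solving its dual, minimizing $\alpha^\top K \alpha$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i = 1$ (this form holds because $K(\mathbf{x},\mathbf{x}) = 1$ for the RBF kernel), with the Frank-Wolfe method. The solver, the kernel width `sigma`, the iteration count, and the function names are assumptions for illustration, not the implementation used in the study. Setting $R^2$ to the largest training distance keeps every training point inside the boundary.

```python
import math


def rbf(x, y, sigma=1.0):
    """RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def svdd_dist2(z, model):
    """||Phi(z) - a||^2 = K(z, z) - 2 sum_i alpha_i K(z, x_i) + alpha^T K alpha."""
    cross = sum(a * rbf(z, x, model["sigma"]) for a, x in zip(model["alpha"], model["train"]))
    return 1.0 - 2.0 * cross + model["aKa"]


def svdd_fit(train, sigma=1.0, iters=500):
    """Approximate hard-margin SVDD: Frank-Wolfe on min alpha^T K alpha over the simplex."""
    n = len(train)
    K = [[rbf(a, b, sigma) for b in train] for a in train]
    alpha = [1.0 / n] * n
    for t in range(iters):
        Ka = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        # the gradient is 2*K*alpha: step toward the simplex vertex with the smallest entry
        s = min(range(n), key=lambda i: Ka[i])
        gamma = 2.0 / (t + 2.0)
        alpha = [(1 - gamma) * a for a in alpha]
        alpha[s] += gamma
    aKa = sum(alpha[i] * K[i][j] * alpha[j] for i in range(n) for j in range(n))
    model = {"train": train, "alpha": alpha, "aKa": aKa, "sigma": sigma}
    # squared radius: largest feature-space distance of a training point to the centre
    model["R2"] = max(svdd_dist2(x, model) for x in train)
    return model


def svdd_accept(z, model):
    """Normal if z lies inside the hypersphere (small tolerance for rounding)."""
    return svdd_dist2(z, model) <= model["R2"] + 1e-9
```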

Performance metrics
To measure ND performance, we employed three metrics: sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve (AUC) [36]. In the context of ND in the medical domain, positive (MCI/AD) and negative (HC) represent abnormal and normal, respectively.
Sensitivity is the proportion of abnormal data correctly detected. We therefore used sensitivity to assess the power of an ND model to detect individuals at different levels of AD severity. The higher the sensitivity, the less likely an AD patient is to be missed in diagnosis and prognosis. Specificity is the proportion of normal data correctly identified. A higher specificity indicates that the novelty detector is less likely to misdiagnose healthy controls. AUC summarizes as a single value the ROC curve, which plots sensitivity against 1 − specificity at various thresholds. It can be used to measure the overall performance of a novelty detector across all thresholds.
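The three metrics can be computed as follows, with abnormal (MCI/AD) coded as 1 and normal (HC) as 0. The AUC is computed in its rank-statistic (Mann-Whitney) form, which equals the area under the ROC curve; `scores` is assumed to be a novelty score in which larger values mean more abnormal. The function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """y = 1 for abnormal (MCI/AD, positive), 0 for normal (HC, negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Probability that a random abnormal case scores higher than a random normal one.

    Ties count 1/2; this equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```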

Feature selection results
To determine whether cost-effective and non-invasive AD markers have high discriminative power for detecting potential AD patients, all features in the AIBL data were grouped into four modalities: 1) CFA, including LMDR, LMIR, and MMSE; 2) brain imaging features (IMG); 3) medical history and demographics (MH&DEM); and 4) the two-allele apolipoprotein E (ApoE) genotype and blood tests (BLO&ApoE). Table S2.
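Evaluating "all possible modality combinations", as in Tables S2 and S3, amounts to enumerating the 15 non-empty subsets of the four modalities. A short sketch with `itertools.combinations` follows; the `feature_map` argument, which maps each modality to its feature names, and the function names are hypothetical.

```python
from itertools import combinations

MODALITIES = ["CFA", "IMG", "MH&DEM", "BLO&ApoE"]


def modality_combinations(modalities):
    """All non-empty combinations of modalities, single modalities first."""
    return [
        combo
        for r in range(1, len(modalities) + 1)
        for combo in combinations(modalities, r)
    ]


def features_for(combo, feature_map):
    """Concatenate the feature names of the chosen modalities."""
    return [f for m in combo for f in feature_map[m]]
```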

Model performance results
A thorough comparison between the ND models constructed on both datasets was conducted to verify the applicability and robustness of the applied methods and the proposed DtB strategy.

FMUUH data
Consistent with the experimental results on AIBL, those on the FMUUH data show that (

The analysis of AIBL data
All trained boundaries shown in Fig. 3A enclose at least 86% of the HC data, but MoG produced a tighter and smoother boundary than the other methods. In terms of the testing results (

The analysis of FMUUH data
Because the FMUUH data are more dispersed, KNN generated multiple boundaries, whereas MoG, KMEANS, and SVDD each produced a single loose boundary (Fig. 3C), all aiming to enclose at least 88% of the HC data. The overlap between HC and AD (Fig. 3D) resulted in high specificity for HC but low sensitivity for AD (Table 4). For example, MoG obtained the lowest specificity for HC (87.99%) but the highest sensitivity for AD (76.96%). We therefore proposed a distance to boundary (DtB) strategy to address the low sensitivity that the overlap between HC and non-HC (MCI/AD) inevitably causes, and to further detect potential non-HC individuals among HC.

The DtB strategy
The DtB strategy is based on the idea that the ND decision boundary can objectively reflect the severity of an individual's risk of developing AD through the distance of the corresponding data point to the boundary. The strategy quantifies the severity of cognitive decline by calculating the distance of each data point to its nearest point on the boundary. To illustrate the strategy, we chose the MoG algorithm, which has a stable boundary and the best overall performance and can enclose the data precisely, to calculate the DtB values of all testing data. The DtB calculation was carried out on the 5-fold CV assessment results. Fig. 4 depicts box plots of the distances of the HC, MCI, and AD data points to the decision boundary constructed by MoG (Fig. 4A for AIBL and Fig. 4B for FMUUH). By convention, the distance is negative for data points inside the boundary and positive for those outside it. Table 5 lists descriptive statistics of the results in Fig. 4. For data points lying outside the boundary, the farther from the boundary, the more severely the corresponding patients may develop AD; for data points inside the boundary, the nearer to the boundary, the higher the risk of cognitive decline they reflect.
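A minimal sketch of a DtB computation for a two-feature detector follows. It assumes the boundary is approximated on a regular grid by locating grid edges where the detector's accept/reject decision flips, and then takes the distance to the nearest such point, signed negative inside and positive outside as defined above. The grid approach, the `step` value, and the function names are assumptions; the text does not state how the nearest boundary point was found. Any accept/reject rule of a 2-D detector can be passed as `accept`.

```python
import math


def _axis(lo, hi, step):
    """Evenly spaced grid coordinates from lo to hi (inclusive)."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def boundary_points(accept, x_range, y_range, step=0.01):
    """Midpoints of grid edges where the accept/reject decision changes."""
    xs, ys = _axis(*x_range, step), _axis(*y_range, step)
    grid = [[accept((x, y)) for y in ys] for x in xs]
    pts = []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if i + 1 < len(xs) and grid[i][j] != grid[i + 1][j]:
                pts.append(((x + xs[i + 1]) / 2, y))
            if j + 1 < len(ys) and grid[i][j] != grid[i][j + 1]:
                pts.append((x, (y + ys[j + 1]) / 2))
    return pts


def dtb(point, accept, boundary):
    """Signed distance to boundary: negative inside (accepted), positive outside."""
    if not boundary:
        raise ValueError("no boundary found on the grid")
    d = min(math.dist(point, b) for b in boundary)
    return -d if accept(point) else d
```

For example, with a unit-circle rule `lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0`, the point (2, 0) gets a DtB close to +1 and the origin a DtB close to -1; the approximation error shrinks with `step`.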
The DtB box plot of the FMUUH data shows a trend similar to that of AIBL. The AD box is longer than that of HC and has higher first and third quartiles, median, mean, and maximum DtB values than HC (Fig. 4B). The difference is that the AD box of the FMUUH data slightly straddles the ND boundary (the horizontal dotted line), similar to the MCI box of AIBL. Although an AD data point may be misdiagnosed as HC because it lies inside the boundary, we can still detect its risk of developing AD from its DtB value. Note, however, that the different features used in Fig. 3 (A-B) and (C-D) and the different population cohorts from Australia and China may account for the differences between Fig. 4A and 4B. Meanwhile, Fig. 4B suggests that some MCI patients may have been included in the AD group, which also reflects an urgent need to integrate data from various resources at the Fujian Medical University Union Hospital. Repeat visitors who participated in the AIBL study were treated as different participants.
However, some features, such as medical history, ApoE genotype, and gender, do not change over time. This may explain the poor performance of the models trained on the MH&DEM and BLO&ApoE modalities. We are further investigating the ND technique on larger datasets, e.g. ADNI (the Alzheimer's Disease Neuroimaging Initiative) [40] or NACC (the National Alzheimer's Coordinating Center) [41] data, and are working on integrating more FMUUH data from the local hospital. Furthermore, the current study and our previous work [39] provide a solid foundation for the next phase: developing a CDSS using the ND technique. Finally, Ding et al. [42] have proposed a level set method-based ND approach (LSM-ND), namely level set boundary description (LSBD). Compared with traditional methods (e.g. those based on probability, distance, clustering, statistics, and support vector machines), LSBD offers several attractive properties for boundary construction: it can address nonlinear problems without a kernel trick, is non-parametric, evolves dynamically over time to better fit the data distribution, allows the boundary shape to be easily controlled, and is straightforward to implement in the given data space. Therefore, building on the current work, we will further investigate the LSM-ND approach for the early detection of MCI and AD in HC populations.

Conclusion
In this study, we first built novelty detectors on two different Alzheimer's datasets (AIBL and FMUUH) using four representative and easily interpretable ND algorithms.
The intrinsic patterns underlying AD were investigated in different study cohorts by comparing the performance of models trained not only on individual modalities but also on different combinations of modalities. We found that the best overall performance was obtained when only CFA features were used. Hence, a small set of non-invasive and easily accessible features can effectively detect cognitive decline at an early stage. Moreover, the highest detection sensitivity for MCI was obtained with a combination of CFA and IMG. More importantly, we proposed a DtB strategy that illustrates and quantifies the decision boundary produced by each novelty detector, using two (for visualization purposes) of the most significant CFA features from each dataset. The strategy can objectively reflect the severity of an individual's risk of developing AD. These results would help inform future guidelines for the development of a clinical decision-making support system aimed at the early diagnosis and prognosis of MCI/AD.

Additional file 1
Fig. S1. Demographic feature distribution of the FMUUH data.
Table S1. Demographic distribution of the AIBL data.
Table S2. Performance of the ND models using all possible modality combinations after feature selection.
Table S3. Performance of the ND models using all possible modality combinations without feature selection.