A Novel Approach to Perform Linear Discriminant Analyses for a 4-way Alzheimer's Disease Diagnosis Based on an Integration of Pearson’s Correlation Coefficients and Empirical Cumulative Distribution Function

The diagnosis of Alzheimer’s disease (AD) in its prodromal stage is a major topic. About 50% of the well-known Mild Cognitive Impairment (MCI) cohort are estimated to develop AD

multi-class diagnosis. One of the standard methods for supervised classification is Linear Discriminant Analysis (LDA). In contrast to the traditional method based on Principal Component Analysis (PCA) followed by LDA, the current paper proposes a novel approach in which the optimal LDA subspace is integrated with the Pearson's correlation coefficient (PCC) method to overcome the singularity problem that arises with an underdetermined dataset. As an innovation, we choose to exploit the brain connectivity reconstructed from the diffusion-weighted imaging modality. First, Diffusion Tensor and Magnetic Resonance brain images of 229 subjects were preprocessed, their corresponding brain connectivity maps were obtained, and connections inside and between hemispheres were extracted. Second, correlation coefficients between features and classes were determined, and empirical cumulative distribution functions (ECDF) were constructed. The features transformed into the LDA space are those exhibiting a cumulative frequency above a determined percentile of the ECDF, chosen so as to guarantee the non-singularity of the within-class variance matrix. Finally, different machine learning algorithms were trained and evaluated using a repeated five-fold cross-validation procedure. Compared to other methods, the originality of this work is that an accuracy of 100% was achieved for the LMCI class diagnosis. Furthermore, the connectivity between hemispheres was identified as a biomarker for disease diagnosis.

Introduction
The growing proportion of the population over 65 years of age has brought attention to multiple neurodegenerative diseases, especially Alzheimer's. It has been recognized as the most frequent of these pathologies, affecting about 10 percent of the target population. Being incurable, it has considerable implications, both economically for the state and socially for the patient's family and relatives. All these reasons motivate the scientific community to devote considerable effort to diagnosing the disease in its different stages. Multiple techniques such as structural Magnetic Resonance Imaging (MRI), perfusion MRI, Electroencephalography (EEG), and resting-state functional MRI have contributed efficiently to AD diagnosis. Much of this success has been attributed to machine learning algorithms. For instance, based on MR images, deep learning algorithms were deployed for left and right hippocampus segmentation [1]. The generated binary mask helped to compute the centroid and to extract the corresponding 3D patches in such a way that surrounding regions are included. These features were used to train 3-D densely connected convolutional networks. In addition to performing hippocampus shape analysis, the model extracts local features and then uses them for the classification task through a Multi-Layer Perceptron (MLP) model.
Demented vs. non-demented subjects have been predicted effectively using MRI scans. Notably, in [2], optimal classification accuracy was achieved. The authors present a hybrid classical-quantum machine learning model for Alzheimer's detection. ResNet34 and a quantum variational circuit (QVC) were used for feature extraction and binary classification, respectively. In a 4-way classification context [3], the authors applied a deep multi-task multi-channel learning (DM2L) framework. They employed a data-driven algorithm proposed in [4] to automatically detect anatomical landmarks. Image patches around the identified regions, combined with demographic factors, contributed to jointly performing regression and classification.
Classic machine-learning models have also contributed to efficient prediction. For instance, four binary classifications between adjacent classes were established in [5]. The purpose was to compare the effectiveness of asymmetry scores of regions of interest (ROI) with the voxel-based morphometry (VBM) method. Feature extraction was based on an independent-samples t-test, and the highest accuracies were achieved with a Random Forest classifier. In [6], unsupervised stacked autoencoders (SAE) were used to extract high-level features from MR images. The generated biomarkers serve for either binary or multiclass AD diagnosis. The authors demonstrated the effectiveness of SAE combined with a zero-masking strategy for diverse neuroimaging modalities, notably anatomical MRI and positron emission tomography (PET). The method gained efficiency thanks to its ability to detect shared information between different modalities. Furthermore, a clean high-level representation was reconstructed from the corrupted inputs. For the same purpose, multimodal data were preprocessed and then scored through a projection onto the first linear discriminant analysis (LDA) vector in [7]. These scores, representing the progress of the pathology, served to develop a multiclass AD diagnosis framework based on an extreme learning machine (ELM)-based decision tree.
Another imaging modality that has emerged in the last decade is diffusion-weighted imaging (DWI), a non-invasive modality that measures the Brownian motion of water molecules in brain tissues [8]. The diffusivity in white matter is anisotropic, so radial diffusivity is restricted. An increase in this measure reflects a demyelination phenomenon, which can be considered an indicator of Alzheimer's pathology. The apparent diffusion coefficient (ADC) allows accurate evaluation of diffusion anomalies. ADC mapping and ultra-high b-values (ADC-uh) have been used to investigate diffusivity and pathological changes in patients' brains. ADC mapping has clearly outperformed voxel-based morphometry in detecting regions affected by AD [9]. Regions altered by AD were also revealed through research focused on voxel-wise measures extracted from Diffusion Tensor Imaging (DTI), such as fractional anisotropy (FA) and mean diffusivity (MD) [10]. The progress of computer-aided diagnosis systems has made it possible to combine the DWI modality with tractography algorithms and reconstruct brain connectivity.
The connectome has been exploited in different ways. One study demonstrated that a network-based approach was effectively predictive for NC, MCI, and AD diagnosis, as well as EMCI and LMCI classification [11]. In another study [12], topological properties related to brain organization, such as the weighted clustering coefficient, weighted shortest path length, and node betweenness, were combined with node strength and the inverse participation ratio, characterizing single and multiple subjects, to test different machine-learning algorithms. The highest performance was provided by the support vector machine (SVM) model. Combined with VBM, topological properties have also proven effective in a 3-way classification. One more approach using the connectome is the concept of communicability in the whole brain, an alternative that overcomes the gap occurring when depending only on shortest path-based models [13]. This graph metric has proved robust under testing with different machine-learning models. In addition, it was informative about the regions playing a key role in predicting the disease.
In the current paper, we propose a new way to exploit the complex network of the brain, in which we get direct access to the connectome sites. Furthermore, we develop a novel approach for feature selection and dimensionality reduction to achieve an accurate multiclass AD diagnosis. First, diffusion and anatomical images are automatically preprocessed to provide an 84 x 84 weighted symmetric connectivity matrix. Three sub-matrices are created following Equation (1) given in Section 2.2. The obtained datasets represent the node connectivity within the left hemisphere, within the right hemisphere, and between hemispheres. The schematic overview of the proposed method is shown in Figure 1. Second, Pearson correlation coefficients are calculated to determine the correlation between features and classes. Next, the empirical cumulative distribution functions representing each data set are built.
While searching for relevant features, different quantiles were tested. The corresponding percentiles are selected to satisfy the non-singularity of the within-class scatter matrices. Then a five-fold cross-validation procedure is repeated over 100 trials. Each time, the corresponding 3D-LDA features are retrieved and used for a 4-way classification. The chart clarifying the proposed methodology is shown in Figure 2. Finally, a comparison with other methods is established, and the strengths and limitations of the proposed method are discussed.

Materials and Methods
The data used in this study were collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (https://www.adni.loni.usc.edu). The ADNI community provides DWI for GE MRI sessions from ADNI2 and all MRI sessions from ADNI3. Images were chosen from different manufacturers, provided that they include three-dimensional (3D) T1-weighted (T1W) imaging and two-dimensional echo planar DWI. For the present study, a total of 229 subjects were analyzed. The corresponding clinical and demographic information is shown in Table 1.

Image Pre-processing
The downloaded DICOM brain images are converted into NIFTI files using the Heudiconv software and organized into structured directory layouts (ANAT, DWI) that follow the Brain Imaging Data Structure (BIDS) format. Diffusion and anatomical images are automatically preprocessed with the MRTRIX3 software package (http://www.mrtrix.org), the FreeSurfer software package (http://surfer.nmr.mgh.harvard.edu/), the Advanced Normalization Tools (http://stnava.github.io/ANTs/) and the FMRIB Software Library (FSL) (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/). In line with recent works in the field [5,14], multiple steps are carried out. First, denoising takes place [6]. The b0 images are extracted from the diffusion data acquired in the anterior-to-posterior (AP) direction. Eddy currents are corrected along the AP phase-encoding direction of the b0 images using FSL. Then bias field correction and skull stripping are performed with the Advanced Normalization Tools (ANTs).
Second, different basis functions for each tissue (WM, GM, and CSF) were estimated to perform multi-shell multi-tissue constrained spherical deconvolution and to create an image of the fiber orientation densities (FOD) overlaid on the estimated tissues. The FODs were normalized to enable comparison between subjects. It is worth noting that, in our experiments, normalization is limited to two tissues (WM, CSF) for 2-shell (b=0 and b=1000) DW images.
Third, the GM/WM boundary was created for seed analysis by converting the anatomical image to MRTRIX3 format, segmenting five tissue categories (1=GM; 2=Subcortical GM; 3=WM; 4=CSF; 5=Pathological tissue), coregistering the averaged b0 diffusion images, and finally creating the boundary separating the grey matter from the white matter.
Finally, streamlines are generated by the default probabilistic tractography approach of MRTRIX3 and then refined. They serve to create a weighted symmetric connectivity matrix W (84x84), where 84 is the number of ROIs (42 parcellations for each hemisphere), parcellated from T1-weighted images by the recon-all command from FreeSurfer according to the Desikan-Killiany atlas. Each element $W_{i,j}$ of the connectivity matrix represents the strength of the connectivity between nodes, a normalization of the number of fibers between the i-th and j-th nodes [15].

Features Selection
The connectivity matrix is a description of the weighted graph [5]. In particular, each element $W_{i,j}$ of the connectome represents the normalized weight between two nodes, where $0 \le W_{i,j} \le 1$. Three sub-matrices were extracted from each connectome: the first two represent the interconnections within the left and the right hemisphere, whereas the third concerns the connections between hemispheres. Given that the sub-matrices are symmetric, we propose to remove the upper triangle and the diagonal elements and to flatten the remaining part into an n-dimensional vector. All in all, three m by n matrices were obtained, where m is the number of subjects in the dataset and n denotes the number of sites in each triangle, determined as follows:
$$n = \sum_{j=1}^{N-1} j = \frac{N(N-1)}{2} \qquad (1)$$
where j indexes the sites of the elements of W in the triangle and N = 42 is the number of nodes per hemisphere, which gives n = 861.
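The extraction step described above can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' code. It assumes that the first 42 rows and columns of W correspond to left-hemisphere nodes and the last 42 to right-hemisphere nodes (the real ordering depends on the parcellation lookup table), and the function names `split_hemispheres`, `flatten_lower_triangle`, and `build_feature_matrices` are hypothetical.

```python
import numpy as np

N_PER_HEMI = 42  # 42 parcellations per hemisphere (84 ROIs in total)


def split_hemispheres(W, n=N_PER_HEMI):
    """Return the left, right, and between-hemisphere n x n blocks of W.

    Assumption: nodes 0..n-1 are left-hemisphere, nodes n..2n-1 are right.
    """
    left = W[:n, :n]
    right = W[n:, n:]
    between = W[:n, n:]
    return left, right, between


def flatten_lower_triangle(block):
    """Keep the strictly lower triangle (no diagonal) as a 1-D vector."""
    rows, cols = np.tril_indices(block.shape[0], k=-1)
    return block[rows, cols]


def build_feature_matrices(connectomes):
    """Stack per-subject vectors into three (m, n_features) matrices."""
    feats = {"left": [], "right": [], "between": []}
    for W in connectomes:
        for name, block in zip(feats, split_hemispheres(W)):
            feats[name].append(flatten_lower_triangle(block))
    return {name: np.vstack(rows) for name, rows in feats.items()}


# Synthetic example: 5 random symmetric connectomes with weights in [0, 1].
rng = np.random.default_rng(0)
connectomes = []
for _ in range(5):
    A = rng.random((84, 84))
    connectomes.append((A + A.T) / 2)

X = build_feature_matrices(connectomes)
print({name: mat.shape for name, mat in X.items()})  # each matrix is (5, 861)
```

With 42 nodes per block, each flattened triangle has 42 x 41 / 2 = 861 entries, which matches the number of sites reported in the experimental section.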

Pearson's correlation coefficients
Following M. Grana and co-authors [10], Pearson's correlation coefficients are computed between each vector and the class label $y_i = 0, 1, 2, 3$, respectively for normal control, EMCI, LMCI and AD patients. The matrices described above were treated independently during this process, and Pearson's correlation of the j-th vector is computed as follows:
$$r_j = \frac{\sum_{i=1}^{m} (V_{i,j} - \bar{V}_j)(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (V_{i,j} - \bar{V}_j)^2}\,\sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2}} \qquad (2)$$
where the vector $V_j$ represents the weighted connections at the j-th site across all the subjects, $V_{i,j}$ is the value of the j-th vector for the i-th subject, $y_i$ is the class to which that subject belongs, and $\bar{V}_j$ and $\bar{y}$ are the corresponding means over the m subjects. Once the vectors summarizing the correlation between sites and classes are obtained, the empirical cumulative distribution functions are constructed. First, the absolute correlation values are sorted. Then the x-axis is scaled from minimum to maximum with a step size of one per n. Finally, the y-axis is constructed in such a way that each point on the x-axis is associated with the ratio of the number of its predecessors plus one to the set cardinality n. As seen in Figure 2, we randomly choose a first percentile, then select the sites whose absolute correlation value has a cumulative frequency above it. Following the formula in the next section, we compute the within-class variance matrix and check it for singularity. The procedure is repeated until a non-singular matrix is obtained.
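The selection rule above can be sketched as follows. This is an illustrative reading of the procedure under stated assumptions, not the authors' implementation: the ECDF assigns the k-th smallest absolute correlation the cumulative frequency k/n, the percentile is passed as a number between 0 and 100, and the function names and the synthetic data are hypothetical.

```python
import numpy as np


def pearson_with_labels(X, y):
    """Pearson correlation between each column of X and the label vector y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; report a correlation of 0 for them.
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, (Xc * yc[:, None]).sum(axis=0) / safe, 0.0)


def ecdf(values):
    """Return the sorted values and their cumulative frequencies k/n, k = 1..n."""
    x = np.sort(values)
    freq = np.arange(1, x.size + 1) / x.size
    return x, freq


def select_features(X, y, percentile):
    """Indices of columns whose |r| has a cumulative frequency above the percentile."""
    abs_r = np.abs(pearson_with_labels(X, y))
    order = np.argsort(abs_r)          # column indices, ascending |r|
    _, freq = ecdf(abs_r)              # freq[k] belongs to column order[k]
    return np.sort(order[freq > percentile / 100.0])


# Synthetic example: 40 subjects in 4 classes, 20 sites.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2, 3], 10)
X = rng.random((40, 20))
X[:, 0] = y + 0.1 * rng.standard_normal(40)  # one site strongly tied to the class

selected = select_features(X, y, percentile=80)
print(selected)
```

In the procedure described in the text, the percentile would then be adjusted and the selection repeated until the within-class variance matrix of the retained columns is non-singular.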

Linear Discriminant Analysis
Fisher linear discriminant analysis (LDA) is one of the standard methods used for supervised classification. With LDA, features are linearly projected onto an optimal subspace. The new space guarantees maximum separation between classes and minimum intra-class variability. To find the LDA projection W, we must maximize the generalized Rayleigh quotient:
$$J(W) = \frac{\left| W^{T} S_B W \right|}{\left| W^{T} S_W W \right|} \qquad (3)$$
For four-class classification, the between-class and within-class scatter matrices are computed with the following formulas:
$$S_B = \sum_{i=1}^{4} n_i (\mu_i - \mu)(\mu_i - \mu)^{T} \qquad (4)$$
$$S_W = \sum_{i=1}^{4} \sum_{j=1}^{n_i} (x_{i,j} - \mu_i)(x_{i,j} - \mu_i)^{T} \qquad (5)$$
where $\mu$ is the overall mean, $\mu_i$ is the i-th class mean, $n_i$ is the number of samples in the i-th class, n is the dimension of the selected features, and $x_{i,j}$ is the j-th sample in the i-th class.
The solution of the generalized Rayleigh quotient reduces to the generalized eigenvalue problem below:
$$S_B W = \lambda S_W W \qquad (6)$$
where $\lambda$ is the eigenvalue. Given that $S_W$ is a non-singular matrix, the within-class variance matrix is inverted, and equation (6) is simplified to:
$$S_W^{-1} S_B W = \lambda W \qquad (7)$$
Since the $S_W^{-1} S_B$ matrix has no more than c − 1 non-zero eigenvalues, where c = 4 is the number of classes, we finally obtain three distinct eigenvectors.
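As a worked illustration of this eigen-decomposition, the sketch below builds the two scatter matrices, checks that the within-class matrix has full rank, and keeps the eigenvectors associated with the largest eigenvalues. It is a minimal NumPy sketch on synthetic data. The rank test via `np.linalg.matrix_rank` is one possible way to check non-singularity and is an assumption, as are the function names.

```python
import numpy as np


def scatter_matrices(X, y):
    """Between-class (S_B) and within-class (S_W) scatter matrices."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for label in np.unique(y):
        Xi = X[y == label]
        mu_i = Xi.mean(axis=0)
        diff = (mu_i - mu)[:, None]
        S_B += Xi.shape[0] * (diff @ diff.T)
        S_W += (Xi - mu_i).T @ (Xi - mu_i)
    return S_B, S_W


def is_nonsingular(S_W):
    """True when S_W has full numerical rank, so that its inverse exists."""
    return np.linalg.matrix_rank(S_W) == S_W.shape[0]


def lda_projection(X, y):
    """Eigenvectors of S_W^{-1} S_B for the (number of classes - 1) largest eigenvalues."""
    S_B, S_W = scatter_matrices(X, y)
    if not is_nonsingular(S_W):
        raise ValueError("S_W is singular; select fewer features")
    # Solve S_W M = S_B instead of forming the inverse explicitly.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    n_components = np.unique(y).size - 1
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real, eigvals[order].real


# Synthetic example: 120 samples, 6 selected features, 4 classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2, 3], 30)
X = rng.standard_normal((120, 6)) + y[:, None] * np.array([1.0, 0.5, 0, 0, 0, 0])

W, eigvals = lda_projection(X, y)
Z = X @ W  # samples projected into the three-dimensional LDA space
print(W.shape, Z.shape)
```

When there are more features than the samples can support, `lda_projection` raises an error instead of returning a projection, which mirrors the role of the percentile search in keeping the within-class matrix invertible.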
So, during 100 trials, a five-fold cross-validation is performed.Each time, the eigenvector matrix W serves to transform samples into the new subspace with respect to the following formulas.
For each partition, the 3D-LDA features used for the classification task are determined following the equation (10):

Experimental results and Discussion
Limited by the singularity problem of the within-class scatter matrix, we need to test different percentiles. Each time, $S_W$ is computed and its non-singularity is checked. The empirical choice of the percentile, followed by experimentation with the proposed method, determined that a percentile of 80% extracts the relevant features. Sites were reduced from n = 861 to 12 for the matrix representing the left hemisphere, to only six for the matrix corresponding to the right hemisphere, and to 31 features for the connectivity between hemispheres.
Once a non-singular $S_W$ matrix is obtained, we proceed to the LDA method for dimensionality reduction. However, the fact that LDA is a supervised learning method calls for caution. For instance, a traditional training-to-testing split, as described in a previous work [7], leads to an overfitting problem and affects the reliability in clinical trials. Therefore, the repeated five-fold cross-validation procedure was used in the current work to guarantee the effectiveness of the proposed method. The concept is to split the data set into five parts: four parts serve for training and the remaining part for validation. This procedure was repeated over a hundred trials. Each time, the LDA subspace was determined from the training set, and the validation samples were then projected onto it. Once the LDA features are determined, the classification task takes place.
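The evaluation loop described above can be sketched with scikit-learn. This is an illustration under assumptions, not the authors' pipeline: scikit-learn's `LinearDiscriminantAnalysis` is swapped in for the eigen-decomposition given earlier, the folds are stratified by class (the text does not say whether they were), logistic regression stands in for the classifier, and the data are synthetic.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import make_pipeline


def repeated_cv_accuracy(X, y, n_repeats=100, seed=0):
    """Mean accuracy per repetition of a stratified five-fold cross-validation."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=n_repeats, random_state=seed)
    fold_acc = []
    for train_idx, test_idx in cv.split(X, y):
        # The pipeline fits LDA on the training folds only, then projects the
        # held-out fold with that same transform before classifying it.
        model = make_pipeline(
            LinearDiscriminantAnalysis(n_components=3),
            LogisticRegression(max_iter=1000),
        )
        model.fit(X[train_idx], y[train_idx])
        fold_acc.append(model.score(X[test_idx], y[test_idx]))
    # Splits are produced repetition by repetition, five folds at a time.
    return np.asarray(fold_acc).reshape(n_repeats, 5).mean(axis=1)


# Synthetic example: 120 subjects, 10 selected features, 4 classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2, 3], 30)
X = rng.standard_normal((120, 10))
X[:, :3] += y[:, None] * 0.8  # make the first three features class-dependent

acc = repeated_cv_accuracy(X, y, n_repeats=10)
print(f"{acc.mean():.3f} +/- {acc.std():.3f}")
```

Fitting the projection inside each fold is the design choice that matters here: computing the LDA subspace once on the full data set would let label information from the validation samples leak into the features.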
To check the reliability of the proposed method, six machine learning algorithms were tested separately on the node connectivity within the left hemisphere, within the right hemisphere, and then between hemispheres. The classifiers used were the Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), Decision Tree, Logistic Regression (LR), and K-nearest neighbors. An evaluation of their respective accuracies is shown in Figure 3, and the highest values are highlighted in orange.
Inspecting Figure 3, we can observe that both the MLP and Logistic Regression classifiers achieved an average accuracy of 66%. Owing to its lower computational cost, the LR algorithm was chosen to evaluate the multi-class prediction. The averaged confusion matrices are reported in Figure 4. A comparison with the PCA+LDA method was carried out on the between-hemisphere LDA features to emphasize the improvement made by the proposed method. The variation of the recorded accuracies across trials is shown in Figure 5.

Comparison With Other Methods
To the best of the authors' knowledge, the diffusion-weighted imaging modality has not yet been used for either 3-way or 4-way disease diagnosis. Inspecting previous studies [5,12,16] on this matter, we notice that the multiclass diagnosis was decomposed into binary classification problems, which decreases the reliability and rigor of the proposed approaches in clinical assessment. Furthermore, regarding other imaging modalities, we also found a lack of research on four-class discrimination. Therefore, we extend the experimentation to compare the results achieved by the proposed method for 4-way Alzheimer's diagnosis. The reported performances of the relevant studies conducted on the ADNI dataset for multiclass differentiation are listed in Table 2. From the displayed results, we can observe that the proposed method achieved the highest performance among the approaches that use traditional classification algorithms. In contrast, the machine learning-based approaches [17][18][19] have proven more accurate in multi-class classification.
However, we consider our method both competitive and promising, given the 100% rate delivered for the prediction of the LMCI class and the benefits of an earlier diagnosis in a clinical context.

Discussion
The results shown in Figure 3 indicate that, whichever classifier is used, the recorded accuracies are significantly improved when the classification is based on the node connectivity between hemispheres. Thus, these connections were identified as the most relevant features. In addition to the benefit of dimensionality reduction, the LDA method plays a key role in the case of categorical feature selection. A general inspection of the first two confusion matrices in Figure 4, where classification was performed based on the connectivity within the left and then the right hemisphere, shows an overlap between adjacent classes and a deficit in the discrimination of different stages of the disease. In contrast, the highest true positive values belong to the third confusion matrix of the same figure, which attests to the effectiveness of connections between hemispheres for multiclass prediction of Alzheimer's disease. Interestingly, a 100% rate is recorded for the discrimination of the LMCI cohort.
Concerning the association of the Pearson's correlation coefficient method with LDA for feature selection, a comparison with the commonly used approach (PCA+LDA) is shown in Figure 5. The proposed method achieved an average accuracy of 65.46 ± 1.94%, which is considerably higher than the 54.71 ± 2.1% recorded for the original approach.
From the same figure, we can clearly observe the variability between trials. We notice that the performance varied roughly between a minimum of 60% and a maximum of 70%. The use of repeated k-fold cross-validation in the current work was particularly crucial, especially since we were limited by a relatively small data set, which excludes the common strategy of external validation. Overall, we notice that the connectivity of the nodes between hemispheres has proven efficient for multi-class prediction. It is therefore essential to determine whether it can be considered a biomarker for disease diagnosis. To that end, a statistical test is performed to investigate a possible dependency on the different classes, and the results are shown in Figure 6.
A visual inspection shows that, except for the EMCI class, there is already a statistical significance revealing the effectiveness of discrimination between different disease stages based on the connections between hemispheres (B.H) in comparison with those within the left (L.H) and the right (R.H). The non-significance (p-value > 0.05) found when comparing the accuracies based on R.H vs. B.H for the diagnosis of the earlier stage, EMCI, suggests that earlier changes have occurred within the right hemisphere.
Regarding the left vs. right hemisphere performance comparison, a highly significant difference (p-value ≤ 1.00e-04) characterizes the CN and the EMCI cohorts. In contrast, non-significance has been found concerning the LMCI and AD classes. Therefore, in accordance with the results found in a region-of-interest volumetry-based anatomical study [5], we conclude that there is an association between brain connectivity changes within the left hemisphere and the disease aggravation.

Conclusion
In this paper, a novel approach using the LDA method for a 4-way Alzheimer's disease diagnosis has been proposed. An integration of Pearson's correlation coefficient and the empirical cumulative distribution function was performed to ensure both feature selection and dimensionality reduction. A judicious choice of percentile was conditioned by the non-singularity of the within-class variance matrix. In addition, repeated five-fold cross-validation was used to compute 3D-LDA features and train multiple classifiers. This procedure allows us to evaluate the model performance across different subsets of the data, avoid overfitting to the training data, and consequently ensure the credibility of the process in clinical trials. Experimental results demonstrate that the proposed method provides promising performance in comparison with the traditional approach (PCA+LDA). Compared with machine learning achievements obtained on huge data sets, the average accuracy remains lower. The proposed approach is not limited to the multiclass diagnosis of Alzheimer's disease; it can be generalized to any multiclass classification context. In addition to an efficient multi-class prediction, our work contributes to identifying the node connectivity between hemispheres as a disease biomarker for medical interpretation.

Fig. 1 The schematic overview of the proposed method: (a) Convert the downloaded brain images to BIDS format, (b) Preprocess the ANAT and DWI images, (c) Create the connectome, (d) Extract the weighted connectivity inside and between hemispheres, (e) Create three matrices, representing weighted connections in the left hemisphere, in the right hemisphere, and between hemispheres, by flattening the triangles for each subject, then concatenating the resulting data of the N subjects along axis 0

Table 1
Clinical and demographic information. 1 NC: normal control. 2 EMCI: Earlier mild cognitive impairment. 3 LMCI: Later mild cognitive impairment.