An Improved Method for Diagnosis of Parkinson’s Disease using Deep Learning Models Enhanced with Metaheuristic Algorithm

Accurate diagnosis of Parkinson's disease (PD) at an early stage is challenging for clinicians because the disease progresses very slowly. Many machine learning and deep learning approaches are currently used to detect PD. This study proposes four deep learning models and a hybrid model for the early detection of PD. To further improve performance, grey wolf optimization (GWO) is used to automatically fine-tune the hyperparameters of the models. The simulation study is carried out on two standard datasets, T1, T2-weighted MRI and SPECT DaTscan. The metaheuristic-enhanced deep learning models used are GWO-VGG16, GWO-DenseNet, GWO-DenseNet + LSTM, GWO-InceptionV3 and GWO-VGG16 + InceptionV3. Simulation results demonstrate that all the models perform well and obtain accuracy close to or above 99%. An AUC-ROC score of 99.99 is achieved by the GWO-VGG16 + InceptionV3 and GWO-DenseNet models on the T1, T2-weighted dataset. Similarly, the GWO-DenseNet, GWO-InceptionV3 and GWO-VGG16 + InceptionV3 models achieve an AUC-ROC score of 100 on the SPECT DaTscan dataset.


Introduction
Parkinson's disease is a neurodegenerative illness characterised by the progressive death of dopamine-producing brain cells. Dopamine is an organic substance produced by neurons that serves as a neurotransmitter in the brain, facilitating communication between neurons. Parkinson's disease results from impaired neuronal communication due to insufficient dopamine production in the brain. It is a long-term, progressive neurological motor illness that affects neurons in the substantia nigra, a small region of the brain [1].
A United Nations report states that nearly 1 billion people worldwide, or approximately one in six, suffer from neurological conditions such as epilepsy, migraine, brain injuries, neuro-infections, Alzheimer's disease, PD, stroke, and multiple sclerosis. About 6.8 million of them die each year [2]. Although the actual cause of Parkinson's disease is unknown, it is believed that a combination of inherited and environmental factors is responsible for it [3]. PD affects 2-3% of people aged 65 and older [4]. Parkinson's disease progresses differently in every patient, and it is impossible to anticipate how quickly the disease may progress in any specific person. While some people may have only minor symptoms for years, others progress quickly to more severe problems. Parkinson's disease often starts with minor tremors or other motor symptoms on one side of the body and progresses slowly over a number of years. The symptoms can spread through the body and worsen, possibly affecting both sides. Even though Parkinson's disease is an ongoing and advancing condition, there are medicines that can help to manage symptoms and improve quality of life. There is not yet a reliable method for early diagnosis or a definitive treatment; current treatments include medication, physical therapy, and lifestyle modifications.
Parkinson's disease progression can be slowed down or stopped, even though there is presently no known cure for it.
These days, artificial intelligence (AI) approaches, namely machine learning (ML) and state-of-the-art deep learning (DL), are greatly assisting medical professionals in the early diagnosis of illnesses. As a result, recent research has sought to identify Parkinson's disease automatically from MRI images using a variety of AI and ML algorithms. Many different diseases and ailments have been diagnosed using deep learning, and the findings frequently outperform traditional benchmarks [5].
Deep learning models deliver state-of-the-art performance and are widely used in image classification problems.
With their ability to learn intricate patterns and features from images, they can often surpass traditional machine-learning approaches in accuracy. They automatically extract relevant features from images, which eliminates the need for manual feature engineering. This feature extraction capability allows a model to learn complicated representations and capture both low-level and high-level features present in the images. Deep learning models can also handle large-scale datasets efficiently: they can learn from vast amounts of labelled data, which is essential for training accurate image classifiers. Deep learning frameworks and libraries are designed to leverage parallel computing resources such as GPUs to accelerate training and inference.
Over the past two decades, meta-heuristic optimization techniques have gained a lot of popularity. A few of these include particle swarm optimization (PSO) [6], grey wolf optimization (GWO) [7], ant colony optimization (ACO) [8], artificial bee colony optimization (ABC) [9], etc. Manually fine-tuning hyperparameters to obtain optimal values is a tedious job. Grey wolf optimisation (GWO) is a population-based metaheuristic algorithm influenced by the way grey wolves hunt. It searches for the best solutions in a problem space by combining exploration (diversification) with exploitation (intensification). It mimics the social behaviour of grey wolves, including their leadership hierarchy and group hunting, and is used here to fine-tune the parameters automatically. The search capacity of GWO helps prevent solutions from becoming stuck in local optima [10]. It also finds the best solution with a quick convergence rate.
The benefits of combining deep learning and GWO in image classification include higher accuracy, autonomous feature extraction, scalability, transfer learning capability, robustness to fluctuations, the ability to find optimal solutions, automatic hyperparameter tuning, and continuous progress through ongoing research and development. A variety of AI approaches using ML and DL models have been created in the past. In this study, a new framework is employed by combining grey wolf optimization (GWO) with four deep learning models known as VGG16 [11], DenseNet [12], InceptionV3 [13], DenseNet-LSTM [14] and a hybrid model. The following is a concise explanation of the paper's main contributions.
I. A number of images produce empty tuples, which cause difficulty when running the deep learning algorithms. These empty tuples are removed using a Python function to obtain better performance.
IV. The proposed models are compared with the existing models using various performance metrics.
The remainder of the paper is organised as follows. Earlier studies are covered in Section 2. Section 3 explains the preprocessing of MRI images and the development of the methodology. Experimental results, discussion, and comparisons between the existing and proposed models are presented in Section 4. Section 5 briefly discusses conclusions and future scope.

Related Literature
In the past few years, academics worldwide have published various studies to help in Parkinson's disease diagnosis. Many of these researchers have used AI methods to analyse and classify MRI brain images in order to detect diseases related to Parkinson's disease. Deep learning techniques are the most frequently used method for classifying MRI images because they can deliver better results than more conventional machine learning techniques. This section reviews research that uses ML and DL methods to diagnose patients with Parkinson's disease.
2.1 Related review literature using T1, T2-weighted dataset
group with the condition and healthy controls. Because gender has a substantial impact on neurobiology and PD is more likely to develop in men than in women, it is advantageous to conduct separate research for men and women.
[18] have examined the viability and usefulness of employing multi-modal MRI datasets to distinguish between PD, PSP-RS, and HC subjects automatically. For this investigation, there are 45 PD, 20 PSP-RS, and 38 HC subjects with available T1-weighted MRI datasets, T2-weighted MRI datasets, and diffusion-tensor (DTI) MRI datasets. Regional values for brain morphology (from the T1-weighted data), brain iron metabolism (from the T2-weighted data), and microstructural integrity (from the DTI data) are determined by an atlas-based approach. These values are used to choose features, and classification is then performed using a variety of well-known machine-learning approaches. [19] have proposed a 3D CNN architecture, applied after data pre-processing, to learn the complex patterns in MRI images and identify Parkinson's disease. 406 individuals from the baseline visit, including 203 in good health and 203 with Parkinson's disease, are selected for the experiment.
A novel method is used by [20], which trains a deep neural network model using data from new patients, specifically with T1 MRI and DaTscan datasets. The information used to model the knowledge retrieved from the PPMI database contains a set of vectors that represent the clustering centres of these representations, along with the matching DNN structure. The unified model created using these datasets is then shown to predict Parkinson's disease in an effective and transparent manner. [21] The authors in [23] have suggested a CNN eight layers deep for 3D T1-weighted MRI images to differentiate between PD and HC individuals. The proposed model additionally makes use of the individuals' ages and genders. In addition, batch and group normalization are applied to the designed model, increasing the accuracy up to 100%. [24] has described an autonomous diagnosis approach that distinguishes between PD and HC with high accuracy. Benchmark T2-weighted MRI scans for both PD and HC are made available to the public by the PPMI. An image registration technique is used to choose and align the middle 500 slices of a T2-weighted MRI scan.
2.2 Related review literature using SPECT DaTscan dataset
required for the network to learn attributes that set them apart from other regions of interest (ROIs). In order to assess the effectiveness of the network model, 10-fold cross validation is used. [27] provide six well-known interpretation techniques and four deep convolutional neural network designs. The authors also suggest a mechanism for evaluating interpretation performance as well as a way to use interpreted input to aid in model selection. [11] suggest a machine learning model that accurately identifies whether any given DaTscan shows PD or not while offering a logical justification for the prediction. Visual indicators are created using the Local Interpretable Model-agnostic Explanations (LIME) approach. Further, transfer learning is used to train a CNN (VGG16) on DaTscans from the PPMI database, and the resulting model has 95.2% accuracy. Finally, the paper concludes that the suggested approach may successfully assist medical professionals in PD detection because of its measured interpretability and accuracy. An ANN-based approach to analysing images from dopamine transporter single-photon emission computed tomography (DAT-SPECT) has been suggested in [28]. With the use of an active contour model, striatal regions are segmented and used as the data for transfer learning on an artificial neural network that is pre-trained to distinguish Parkinson's disease. To serve as a benchmark, a support vector machine is trained using semi-quantitative measurement metrics including the specific binding ratio (SBR) and asymmetry index.
The active contour model is utilized to segment the striatal regions in the images. These segmented regions are then employed as the dataset for an already-trained ANN to perform transfer learning. The goal is to separate PD from Parkinsonism associated with other diseases. [29] have used artificial neural networks (ANN) and image processing techniques to identify Parkinson's disease in its early stages. The images used are 200 SPECT scans from the PPMI dataset, of which 130 are of normal participants and 70 are of Parkinson's disease (PD) patients. Using the sequential grassfire algorithm, the caudate and putamen areas of the images are determined. To distinguish healthy people from those with Parkinson's disease, these features are fed into an ANN. A novel approach is introduced by [30] for the medical treatment of neurodegenerative disorders, like Parkinson's, that utilises trained DNNs to extract and use latent information. The paper uses transfer learning along with k-means clustering, k-NN classification, and DNN-trained representations to enhance disease prediction using MRI data. More recently, [31] have presented a model for the early identification of PD which combines image processing with an ANN in order to improve the imaging diagnosis of PD. The caudate and putamen serve as the study's region of interest (ROI), and the model identifies them by analysing 200 SPECT images from the PPMI database, of which 100 are of healthy people and 100 are of people with PD. The ANN is then fed with the ROI area data, with the expectation that it will recognise patterns in a similar way to a human observer. [32] have suggested a novel method that uses 3-dimensional convolutional neural networks (CNNs) to differentiate between PD and healthy controls. In order to reduce overfitting and boost the neural network's generalisation abilities, the training set, together with the data from this set's sagittal plane obtained using a straightforward data augmentation technique, is given as input to the model. One of the difficult challenges is determining Parkinson's disease in its early stages. To conduct research on the early detection of PD using MRI images, various authors have developed numerous computer-based machine learning and deep learning methods, as described above.
In this study, the authors propose four deep learning models whose hyperparameters are optimized using GWO, namely GWO-VGG16, GWO-DenseNet, GWO-DenseNet + LSTM and GWO-InceptionV3, and a hybrid model (GWO-VGG16 + InceptionV3), which is the novelty of this paper. To the authors' knowledge, these models have not previously been used with T1, T2-weighted and SPECT DaTscan data for PD detection. In addition, a number of images produce empty tuples, which cause difficulty when running the deep learning algorithms. These empty tuples are handled and removed to obtain better performance. This problem has also not been addressed previously in the literature.

Materials and Methods
This section illustrates the proposed methodology, preprocessing of MRI images, and model development. After that, the data is divided into train and test sets using an 80:20 ratio.
The train set is then divided again into train and validation sets. 80% of the train set images are fed to the proposed models for training, and the models are validated using the remaining 20% of the train set samples. Finally, the models are tested using the held-out 20% of the data. The distribution is depicted in Fig. 1 below.
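The two-stage holdout split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `holdout_split`, the fixed seeds, and the use of integer identifiers in place of image files are assumptions made for the example, and the study does not state whether splitting was done per image or per subject.

```python
import random

def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle items and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical identifiers standing in for the 9070 T1, T2-weighted images.
image_ids = list(range(9070))

# Stage 1: hold out 20% of all images for testing.
train_val, test = holdout_split(image_ids, test_fraction=0.2, seed=42)
# Stage 2: hold out 20% of the remaining train set for validation.
train, val = holdout_split(train_val, test_fraction=0.2, seed=42)

print(len(train), len(val), len(test))
```

With 9070 images this gives 5805 training, 1451 validation and 1814 test images. If several images come from the same subject, splitting by subject rather than by image would avoid the same subject appearing in more than one set.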

Proposed Methodology
The proposed methodology consists of the following steps:
Step 1: T1, T2-weighted and SPECT DaTscan MRI datasets are collected from the PPMI website.
also done using batch normalization for scaling.
Step 3: The datasets are divided into train and test sets using the holdout method (80:20 ratio). The train set is then divided again (80:20 ratio) into two sets, i.e. train and validation sets.
The proposed methodology is also graphically presented in Fig. 2.

MRI data collection
The MRI data are extracted from the PPMI website [33]. The PPMI dataset comes from a large-scale longitudinal investigation of Parkinson's disease (PD) conducted by the Michael J. Fox Foundation for Parkinson's Research. The objective of the study is to find biomarkers that can aid in predicting the onset and progression of PD and to create new treatments for the condition. The PPMI dataset contains a variety of information, including clinical evaluations, genetic information, biospecimen samples (blood and CSF), and brain imaging data (MRI and DaTscan). Researchers from all across the world can access and analyse the dataset.
One of the distinguishing characteristics of the PPMI dataset is its longitudinal nature, which monitors patients over a number of years. This feature enables researchers to examine changes in disease development and find potential biomarkers for the illness. The dataset also includes a large control group of healthy individuals, which provides a baseline for comparison. The T1, T2-weighted MRI [34] and SPECT DaTscan [33] datasets used in this study are collected from the PPMI website.

MRI Data Samples
In this study, two datasets are used, i.e. T1, T2-weighted and SPECT DaTscan. A total of 30 subjects are included in the T1, T2-weighted MRI dataset, of which 15 subjects (Male-7, Female-8) have Parkinson's disease (PD) and 15 subjects (Male-7, Female-8) are healthy controls (HC); the dataset contains a total of 9070 MRI images of different sizes. Of the 9070 MRI images, 3620 are from PD subjects and 5450 are from HC subjects. A total of 36 patients are included in the SPECT DaTscan dataset, of which 18 subjects (Male-9, Female-9) are suffering from Parkinson's disease (PD) and 18 subjects (Male-
Finally, unlike some other picture formats, png images don't lose any information when they are compressed (jpg compression, by contrast, is generally lossy), which can be crucial in medical imaging, where even minor data loss can have serious repercussions.
null arrays for which the machine learning models create a huge number of misclassifications. These images are removed based on a threshold value of 30 pixels. The images are then cropped and stripped using Python library functions. After that, the images are normalized using batch normalization. After preprocessing, the final size of the MRI images is 224 x 224 x 3, which is given as input to the models. The original MRI images are shown in Figs.
Each of the 13 convolutional layers has 3x3 filters with a stride of 1. After each max pooling layer, i.e. 6 x 6 with a stride of 2, the number of filters doubles, starting with the first convolutional layer, which includes 64 filters. The max pooling layers help to decrease the number of model parameters and avoid overfitting by reducing the spatial dimensions of the output by a factor of 2.
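The effect of repeatedly halving the spatial dimensions can be illustrated with a short calculation. The sketch below assumes the standard VGG16 layout with five max pooling layers and convolutions that preserve the spatial size; the function name `spatial_sizes_after_pooling` is introduced only for this example.

```python
def spatial_sizes_after_pooling(input_size, num_pool_layers, factor=2):
    """Return the feature-map side length before and after each pooling layer."""
    sizes = [input_size]
    for _ in range(num_pool_layers):
        sizes.append(sizes[-1] // factor)
    return sizes

# Assumption: five pooling layers, as in the standard VGG16 architecture.
vgg16_sizes = spatial_sizes_after_pooling(224, num_pool_layers=5)
print(vgg16_sizes)  # [224, 112, 56, 28, 14, 7]
```

Under these assumptions a 224 x 224 input is reduced to a 7 x 7 feature map before the fully connected layers.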
Padding is used by all convolutional layers to guarantee that the output's spatial dimensions match those of the input. A rectified linear unit (ReLU) activation function, which introduces nonlinearity into the model, comes after each convolutional layer. The model has 2 fully connected layers with 256 and 128 neurons, respectively. There are 128 neurons in the output layer, corresponding to the two classes in the T1, T2-weighted and SPECT DaTscan datasets. In order to output a probability distribution over the classes, it uses a "sigmoid" activation function. The VGG16 algorithm is renowned for its ease of use and capacity to extract intricate information from images. However, it can be computationally expensive to train and use because it is a very deep network with a huge number of parameters.
The images are given as input to the input layer, typically of size 224 x 224 x 3. The stem network extracts features from the input images using three convolutional layers. With a 3x3 kernel, the first, second and third layers consist of 32, 32 and 64 filters, respectively. The max pooling layer, which follows the stem network, has a 3x3 filter with a stride of 2.

DenseNet
There are several inception modules in InceptionV3 that are responsible for feature extraction at various scales. Each inception module is made up of a number of convolutional layers with filters of various sizes (1x1, 3x3, and 5x5) and pooling layers, whose outputs are concatenated along the channel dimension.
Compared to conventional convolutional layers, inception modules are computationally inexpensive. Two auxiliary classifiers are included in InceptionV3 after the 5th and 9th inception modules. The auxiliary classifiers are made up of a dropout layer, a softmax activation function, a ReLU activation function, a fully connected layer with 1024 neurons, and a global average pooling layer. The auxiliary classifiers' role includes supplying the network with additional training signal and mitigating the vanishing gradient issue.
After the last inception module, InceptionV3 utilizes a global average pooling layer to shrink the output's spatial dimensions to a 1x1 feature map. A fully connected layer with 128 neurons is fed with the output of the global average pooling layer, which corresponds to the two classes in the T1, T2-weighted and SPECT DaTscan datasets. The fully connected layer outputs a probability distribution over the classes using a sigmoid activation function.
Proposed Hybrid Model (VGG16 + InceptionV3)
The hybrid model is the combination of VGG16 and InceptionV3. Both models are described individually above. The VGG16 model's output is used as input to the InceptionV3 model, which finally predicts whether a given patient has Parkinson's disease or not. Figure 5 illustrates the hybrid model's entire architecture.

Grey Wolf Optimization (GWO)
Seyedali Mirjalili introduced GWO in 2014 by imitating the social behaviour, leadership hierarchy, and group hunting of grey wolves [7]. The grey wolf (Canis lupus) belongs to the family Canidae. As the top predators in the food chain, grey wolves are known as apex predators. The majority of grey wolves prefer to live in packs. The typical size of a pack is between 5 and 12 wolves. Alpha, beta, delta and omega are the four types of wolves in the hierarchy, denoted by (α), (β), (δ) and (ω), as shown in Fig. 6.
The step-by-step procedure of grey wolf hunting is as follows:
1. Tracking, chasing, and approaching the prey.
2. As soon as the target starts moving, it is pursued, hounded, and surrounded.
3. Attacking the prey.
In this section, social hierarchy, encircling, and attacking are mathematically represented as follows.

Social hierarchy
To express the social hierarchy mathematically, alpha (α) is taken as the best solution, followed by beta (β) and delta (δ) as the next two best solutions. The remaining candidate solutions are the omega (ω) wolves. α, β, and δ guide the hunting (or optimisation) in the GWO algorithm. The remaining ω wolves follow these α, β, and δ wolves.
Encircling/Surrounding Prey
Grey wolves circle their prey during hunting. The encircling behaviour is represented mathematically in terms of the following quantities: the current iteration is denoted by t, the coefficient vectors by S and U, the position of the prey by T_x, and the grey wolf's position by T. The coefficient vectors are computed from q_1 and q_2, which are random vectors with components in the range [0, 1], and from a parameter whose components decrease linearly from 2 to 0 over the course of the iterations.
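For reference, the encircling equations of the original GWO formulation [7] are sketched below in that paper's own notation. How these symbols map onto the ones used here is an assumption: the coefficient vectors A and C would correspond to S and U, the prey position X_p to T_x, the wolf position X to T, the random vectors r_1 and r_2 to q_1 and q_2, and a to the parameter that decreases from 2 to 0.

```latex
\begin{align}
\vec{D} &= \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right| \\
\vec{X}(t+1) &= \vec{X}_p(t) - \vec{A} \cdot \vec{D} \\
\vec{A} &= 2\,\vec{a} \cdot \vec{r}_1 - \vec{a} \\
\vec{C} &= 2\,\vec{r}_2
\end{align}
```

Here the products are taken component-wise, so each dimension of a wolf's position is updated independently.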
Hunting the prey
Grey wolves have the ability to track down and encircle their prey. Typically, the alpha leads the hunt. The beta and delta may occasionally take part in hunting. It is assumed that the most promising candidate solutions, alpha, beta and delta, have better knowledge of the potential prey's location in order to
(11)
The wolves in motion attack the prey when it stops. The coefficient vector takes a randomly chosen value between −2r and 2r; when this value lies between −1 and 1, the search agent's next position falls somewhere between its current position and the position of the prey. Thus, the attacking state is
Algorithm for hyperparameter optimization of deep learning models using GWO
Step-by-step procedure:
Step-1 : Set the ranges of hyperparameter values. The ranges are given in Table 3.
Step-2 : Set the population size of the grey wolves.
Step-3 : Create an objective function that measures how well the deep learning models performed after being trained with the provided hyperparameters.This function gauges the model's effectiveness on a validation set.
Step-4 : Using the wolves' fitness levels, the dominance hierarchy is determined.
Step-5 : Update the positions of the alpha, beta, delta and omega wolves using Eq. (11).
Step-6 : Make sure that the wolves' new positions remain inside each hyperparameter's stated range. A position is modified if it exceeds the limits.
Step-7 : Check whether the termination condition, such as completing the required number of iterations or reaching the target fitness value, is met. The optimisation procedure ends if the condition is satisfied; otherwise, return to Step 5.
Step-8 : Take the optimal set of hyperparameters for the deep learning models, which corresponds to the best solution, i.e. the wolf with the highest fitness value.
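The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: it assumes the standard GWO update rule from [7], it minimises a loss (equivalent to maximising a fitness such as validation accuracy), and the objective `toy_validation_loss`, its two hyperparameters and their ranges are hypothetical stand-ins for training and validating a deep learning model.

```python
import random

def gwo_minimize(objective, bounds, n_wolves=10, n_iter=50, seed=0):
    """Minimise `objective` over box `bounds` with a basic grey wolf optimiser."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Steps 1-2: random initial population inside the allowed ranges.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]

    def leaders(pop):
        # Steps 3-4: rank wolves by objective; the three best are alpha, beta, delta.
        ranked = sorted(pop, key=objective)
        return [list(w) for w in ranked[:3]]

    alpha, beta, delta = leaders(wolves)
    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 towards 0
        for wolf in wolves:
            for d in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * D
                lo, hi = bounds[d]
                # Steps 5-6: move to the average of the three estimates, clipped to range.
                wolf[d] = min(max(estimate / 3.0, lo), hi)
        # Keep the best solutions found so far as the new leaders.
        alpha, beta, delta = leaders(wolves + [alpha, beta, delta])
    # Step 8: the alpha wolf holds the best hyperparameters found.
    return alpha, objective(alpha)

def toy_validation_loss(params):
    """Hypothetical loss with its minimum at log10(lr) = -3 and dropout = 0.2."""
    log_lr, dropout = params
    return ((log_lr + 3.0) / 4.0) ** 2 + ((dropout - 0.2) / 0.5) ** 2

bounds = [(-5.0, -1.0), (0.0, 0.5)]  # hypothetical ranges: log10(lr), dropout
best_params, best_loss = gwo_minimize(
    toy_validation_loss, bounds, n_wolves=20, n_iter=100, seed=1
)
print(best_params, best_loss)
```

In a real run each call to the objective would train a model and return its validation loss, so the evaluations would normally be cached rather than recomputed during ranking.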

Experimental Results and Discussion
Before discussing the outcomes, this section describes the fundamental performance evaluation criteria that are frequently used to evaluate machine learning models in both the training phase and the testing phase.

Performance evaluation with confusion matrix
The confusion matrix [37], which is a two-dimensional table, is used to determine performance metrics. It displays the actual and predicted class values, which are represented by its elements as true positive (T+ve), true negative (T-ve), false positive (F+ve), and false negative (F-ve). The degree of confusion between the classes can be checked by calculating these four elements. Based on the confusion matrix, the five score metrics used in this study are as follows:
Accuracy (Acc) = $\frac{T_{+ve} + T_{-ve}}{T_{+ve} + T_{-ve} + F_{+ve} + F_{-ve}}$ (12)
Specificity (Spe) = $\frac{T_{-ve}}{T_{-ve} + F_{+ve}}$ (14)
The receiver operating characteristic (ROC) curve is plotted as the true +ve rate against the false +ve rate. The area under the curve (AUC) score is also obtained.
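As an illustration, the metrics derived from the confusion matrix can be computed as in the sketch below. It assumes that the five scores are accuracy, sensitivity, specificity, precision and F1-score, which are the metrics named in the comparison with existing models, and it uses their standard definitions. The counts in the example are hypothetical and are not results from this study.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard binary classification scores from confusion matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical counts, chosen only to illustrate the calculation.
scores = confusion_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

The function does not guard against empty classes, so a denominator of zero raises `ZeroDivisionError`.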

Experimental Results analysis of all the proposed models
Experimental results are described in the following subsections, which present the training results by plotting the accuracy and loss curves for each model used. For each model, the confusion matrix is also created and displayed.

Training Results of all the proposed Deep Learning Models
The experiments are carried out in Python using various packages such as keras, opencv, tensorflow 2.1 and scikit-learn [38], on a system with an 8th Generation Intel Core i5 processor, 16 GB RAM, and an NVIDIA GeForce graphics card with 8 GB memory. The standard T1, T2-weighted and SPECT DaTscan datasets are used for the study. The datasets are split into train and test sets using an 80:20 ratio. The training set is then split again into train and validation sets. To train all the proposed deep learning models with GWO using the algorithm given in Section 3.4, the hyperparameters used are shown in Table 2 and the hyperparameters optimized by GWO are shown in Table 3. All the proposed deep learning models, GWO-VGG16, GWO-DenseNet, GWO-DenseNet-LSTM and GWO-InceptionV3, and the hybrid model GWO-VGG16 + InceptionV3 are pre-trained using the above hyperparameters. All the models comprise an input layer, two hidden layers and an output layer. Every model also has its own layers, such as convolutional, max-pooling, stem, and global average pooling layers. The two hidden layers consist of 256 and 128 neurons, respectively. Every hidden layer ends with a dropout layer that drops 20 percent of the neurons to counter the overfitting problem. The ReLU activation function is applied to all the hidden layers. All the models are trained with the 'adam' optimizer and the 'binary cross entropy' loss function. The GWO algorithm is used for hyperparameter optimization with all the proposed models to obtain better performance. The ranges of the parameters are set manually, and the optimized hyperparameters are shown in Table 3.
Training/Validation accuracy/loss for T1, T2-weighted MRI Dataset
The training accuracy and training loss plotted for all the proposed models on the T1, T2-weighted MRI and SPECT DaTscan datasets are exhibited in Figs. 7(a)-(e) and Figs. 8(a)-(e), respectively. It is clearly observed from these figures that the accuracy of some optimized models fluctuates initially, but the curve becomes smoother as the model is trained further. Additionally, it can be seen from the loss plots that GWO provides a better loss in some models than in others.
Training/Validation accuracy/loss for the SPECT DaTscan dataset is shown in Fig. 8. The x axis shows the training epochs, and the y axis represents improvement. The training curve shows how well a model is trained on the train set. In practice, 30 epochs are sufficient for all models to converge. The validation curves reveal whether a particular model is underfitting, overfitting, or fitting well for a particular range of hyperparameter values. There is very little overfitting in any of the models: the accuracy at convergence on the training set is quite close to that on the validation set. At the validation phase, all models perform well and obtain more than 99% accuracy.

Testing Results of all the proposed Deep Learning Models
A fully separate, previously prepared data subset is used to test and evaluate the effectiveness of the proposed models. Figs. 9(a)-(e) and Figs. 10(a)-(e) show the confusion matrices of all the proposed models for the T1, T2-weighted MRI and SPECT DaTscan datasets, respectively.
Confusion Matrix using SPECT DaTscan dataset is shown in Fig. 10.

Experimental Results and Discussions
Before obtaining the results of the proposed models, various preprocessing techniques and a hyperparameter optimization technique are applied to obtain better accuracy and other performance measures. Four deep learning models (VGG16, DenseNet, DenseNet + LSTM, InceptionV3) and a hybrid model (VGG16 + InceptionV3) are trained for 30 epochs on the two standard datasets, T1, T2-weighted and SPECT DaTscan, with 9070 and 20096 images, respectively. The results are briefly explained below for both datasets.

Results using T1, T2-weighted MRI dataset
The evaluation and comparison of all the proposed models are presented in Table 4, and the ROC curve is plotted for each model. Based on the ROC curves, all the proposed models obtain an AUC of approximately 100%. This indicates that the proposed models are highly effective in detecting Parkinson's disease.
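To make the meaning of an AUC close to 100% concrete, the sketch below computes the AUC as the fraction of (PD, HC) pairs in which the PD sample receives the higher score, which is equivalent to the area under the ROC curve. The function `auc_score` and the labels and scores in the example are hypothetical and are not taken from this study.

```python
def auc_score(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]  # 1 = PD, 0 = HC (hypothetical)
perfect = auc_score(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
imperfect = auc_score(labels, [0.9, 0.8, 0.25, 0.3, 0.2, 0.1])
print(perfect, imperfect)
```

An AUC of 1.0 means that every PD sample is scored above every HC sample; in the second example one PD sample falls below one HC sample, so 8 of the 9 pairs are ordered correctly.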

Results using SPECT DaTscan MRI dataset
The result evaluations and comparison of all the proposed models are given in Table 5 and the ROC curve is plotted for each model for DaTscan datasets.

Comparison with the existing models
The proposed models' outcomes are presented in Tables 6 and 7 along with comparisons to previously reported models. The comparison shows that, for both datasets, the proposed deep learning models compare favourably with existing models in terms of performance metrics such as accuracy, sensitivity, specificity, precision, f1-score and AUC score. The above table shows that, in terms of accuracy, the hybrid model GWO-VGG16 + InceptionV3 is the best of the proposed deep learning models and compares favourably with the eleven existing models, obtaining 99.92% accuracy, which is close to the 100% accuracy reported for the model proposed by [23]. The comparisons are graphically represented by a pie chart, as shown in Fig. 13. Multiclass classification can also be done.

Conclusion and Future Work
The detection of Parkinson's disease is becoming more and more crucial today. Because PD is a tremor illness, it is difficult to make an accurate diagnosis of the condition, especially in the early stages. This study proposes a classification approach for Parkinson's disease (PD) detection that enables
Proposed methodology for early detection of PD using GWO and deep learning models.
The hierarchy of grey wolf top to bottom.

3.4 Model development
Four deep learning models combined with the grey wolf optimization technique, GWO-VGG16, GWO-DenseNet, GWO-DenseNet-LSTM and GWO-InceptionV3, and a hybrid model, GWO-VGG16 + InceptionV3, are proposed in this study for accurate detection of PD. All the proposed models are explained briefly below.
VGG16
VGG16 (Visual Geometry Group 16) [11] is a deep CNN architecture that was proposed by the University of Oxford's Visual Geometry Group in 2014. It was created for image classification problems and has achieved state-of-the-art performance on various benchmarks, including the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset. The 16-layer VGG16 is made up of thirteen convolutional (Conv) layers, 3 fully connected dense layers, and other layers. The input layer accepts an image of size 224 × 224 × 3
behaviour of the wolves mathematically. The three best candidate solutions are mathematically represented to update their position as follows - appropriate when [36]. The behaviour of wolves is used to depict the process of finding the best solution. The pseudocode of grey wolf optimization is as follows:
Initialize the population of grey wolves Z_i, where i = 1, 2, ..., n
Initialize p, U and S
Calculate the fitness of every search agent
Z_α = best search agent
Z_β = 2nd best search agent
Z_δ = 3rd best search agent
While (t < Max_iterations)
    For every search agent
        Update the current position using Eq. (11)
    End of the for loop
    Update the values of p, U and S
    Calculate the fitness of all search agents
    Update Z_α, Z_β and Z_δ
    t = t + 1
End of the while loop
Return Z_α
7(a)-(e) and Figs. 8(a)-(e), respectively.

Figure 13: Pie chart comparing the proposed deep learning models with other existing models using the SPECT DaTscan dataset.
The features are extracted from the DaTscan dataset and clinical assessments of motor symptoms in steps 1 and 2, and in step 3 an ensemble of DNNs is trained to predict the 4-year patient outcome. [26] have created a CNN model that can distinguish between PD patients and HC patients based on SPECT images. In this study, 2723 SPECT images are used, of which 1364 samples are from the PD group and 1359 samples are from the HC group. The image normalization method is used to improve the regions of interest (ROIs)

9, Female-9) are healthy controls (HC). In total, there are 20096 MRI images. Out of the 20096 MRI images, 14344 are from PD subjects and 5752 are from HC subjects. The sample size is distributed as shown in Table 1.
The images are available in the DICOM (Digital Imaging and Communications in Medicine) [35] file format, which is used to store and send medical pictures such as X-rays, CT scans, and MRIs. DICOM files include a large amount of image-related metadata, including patient data, information on the image's acquisition, and other medical data. However, the DICOM format is difficult to deal with when employing these pictures for machine learning tasks. Another reason for converting DICOM images to jpg is that DICOM images have different pixel representations and bit depths, depending on the specific equipment and software used to generate them. Jpg images, on the other hand, have a standardized pixel representation and bit depth, making them more consistent and easier to work with.
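The bit-depth mismatch mentioned above can be illustrated with a small rescaling routine. The following is a sketch only: the conversion procedure actually used in this study is not described, so min-max scaling to the 8-bit range is an assumption, and `rescale_to_8bit` is a hypothetical helper name. In practice the raw intensities would first be read from the DICOM file with a dedicated library (for example pydicom), and the rescaled array would then be written out with an image library.

```python
def rescale_to_8bit(pixels):
    """Min-max rescale a 2-D list of raw intensities to integers in 0..255.

    `pixels` stands in for the pixel matrix of one DICOM slice, whose raw
    values may span 12 or 16 bits depending on the scanner.
    """
    flat = [value for row in pixels for value in row]
    lowest, highest = min(flat), max(flat)
    if highest == lowest:
        # A constant image carries no contrast; map it to black.
        return [[0 for _ in row] for row in pixels]
    scale = 255.0 / (highest - lowest)
    return [[int(round((value - lowest) * scale)) for value in row]
            for row in pixels]


if __name__ == "__main__":
    # Toy 2 x 2 slice with 12-bit intensities (0..4095).
    print(rescale_to_8bit([[0, 2048], [4095, 1024]]))
```

Min-max scaling is applied per image here, which maximises contrast but discards absolute intensity; a fixed intensity window would be the alternative if absolute values matter.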
Patients whose age is between 55 and 75 years are included in the study. Only PD and HC subjects are included.

Exclusion Criteria
Patients whose age is less than 55 or greater than 75 are excluded from this study. Subjects in other categories, such as SWEDD and PRODROMAL, are also excluded.

3.3 Image Pre-processing
MRI
Many machine learning libraries and frameworks do not natively support DICOM files, which is one of the reasons DICOM images are generally transformed to other image formats, such as png or jpg, before being used for image classification. Although Python has libraries for reading and manipulating DICOM files, it is often simpler to convert the images to a more widely used format, such as png or jpg, and then use conventional image processing packages to work with them.
DenseNet [12], short for Densely Connected Convolutional Network (Dense CNN), is a deep learning architecture first presented by Huang et al. in 2016. It is designed to address the vanishing gradient problem and to encourage feature reuse in deep neural networks. It creates dense connections between all layers: each layer in this architecture receives the feature maps of all preceding layers as input. This connection structure provides direct access to features at various depths and makes gradient flow throughout the network possible. DenseNet is made up of dense blocks, each of which has several layers. Each layer in a dense block is connected to all layers before it. The overall network is built by connecting dense blocks in sequence. Convolutional and pooling layers are employed as transition layers between dense blocks. They help to preserve connections while lowering computational complexity and feature map sizes. The key advantages of DenseNet are feature reuse, parameter efficiency, and mitigation of the vanishing gradient problem.

InceptionV3 is a variant of the Inception architecture that was introduced by Christian Szegedy et al. in 2015. InceptionV3 is a deep neural network created for image classification and object detection tasks. It consists of an input layer, a stem network, inception modules, auxiliary classifiers, average pooling, fully connected dense layers and a final (output) layer.
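The dense connectivity of DenseNet described above can be made concrete by counting feature-map channels. The sketch below assumes the usual DenseNet convention that every layer adds a fixed number of new feature maps (the growth rate) and that a transition layer compresses the channel count by a fixed factor; neither term appears in the text above, and the function names are hypothetical.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Return the input width of each layer in a dense block and the block output width.

    Each layer sees the block input concatenated with the outputs of all
    earlier layers in the block, so its input grows by `growth_rate` per layer.
    """
    layer_inputs = []
    channels = in_channels
    for _ in range(num_layers):
        layer_inputs.append(channels)
        channels += growth_rate  # this layer's new feature maps are concatenated
    return layer_inputs, channels


def transition_channels(channels, compression=0.5):
    """Channel count after a transition layer with the given compression factor."""
    return int(channels * compression)


if __name__ == "__main__":
    inputs, out_channels = dense_block_channels(64, 32, 6)
    print(inputs, out_channels, transition_channels(out_channels))
```

The linear growth of the input width is what gives every layer direct access to earlier features, and the transition layer is what keeps the total width, and hence the computational cost, bounded between blocks.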
DenseNet is widely used and has produced state-of-the-art results in a number of computer vision applications, such as semantic segmentation, image classification and object recognition. It is now a popular option among deep learning researchers and practitioners.

DenseNet-LSTM
DenseNet with LSTM [14] refers to a network that combines Long Short-Term Memory (LSTM) networks with the DenseNet network. In this hybrid architecture, the LSTM's ability to model sequential data and detect temporal relationships is combined with DenseNet's feature extraction capability.

Table 2
Hyperparameters used in all the proposed models

Table 4
Results of the proposed models using T1, T2-weighted dataset
The results in Table 4 demonstrate that all the proposed models achieved more than 99% testing accuracy except GWO-DenseNet + LSTM, which reached 98.29%. The hybrid model (GWO-VGG16 + InceptionV3) obtained the highest accuracy, 99.94%, with a training loss of 0.0272, the minimum among all models. To examine these results, ROC curves are used, which plot the diagnostic test's sensitivity (sen) against 1 − specificity (spe). Such curves aid in comparing models by the value of the area under the curve (AUC). For each model, the ROC curve is plotted in Figs. 11(a)-(e), which show the true positive rate (TPR) against the false positive rate (FPR).
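The relation between the ROC curve and the AUC value can be illustrated with a short pure-Python computation. This is an explanatory sketch, not the evaluation code of this study; the function names and the toy labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR) obtained by sweeping the decision threshold.

    `y_true` holds 1 for PD and 0 for HC; `scores` holds predicted PD
    probabilities. Both classes must be present.
    """
    pairs = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    positives = sum(y_true)
    negatives = len(y_true) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]
    probabilities = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(auc(roc_points(labels, probabilities)))
```

An AUC of 1.0 corresponds to a model whose scores separate the two classes perfectly, which is why AUC values close to 100% indicate near-perfect ranking of PD above HC samples.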

Table 5
Results of the proposed models using SPECT DaTscan dataset

Table 6
Proposed models comparison with existing models using T1,T2-weighted dataset

Table 7
Proposed models' comparison with the existing models using SPECT DaTscan dataset

Table 7 shows that, in terms of accuracy, among all the proposed models, GWO-DenseNet, GWO-InceptionV3 and the hybrid model GWO-VGG16 + InceptionV3 outperform the other existing models with 100% accuracy. The comparisons are graphically represented by a pie chart in Fig. 14.

The study has certain research limitations. Firstly, a limited sample size is used for the experiment; a bigger sample size is still required to confirm the validity of these experimental models. In addition, working with a larger sample size requires more than 16 GB of RAM and a high-end GPU system. Because of the high-dimensional data, execution takes more time, and both time and space complexity increase. Only two datasets, T1, T2-weighted MRI and SPECT DaTscan, are used in this study. Only a binary classification problem is considered for early prediction of Parkinson's disease.