Artificial Intelligence Model for Parkinson Disease Detection using Machine Learning Algorithms

DOI: https://doi.org/10.21203/rs.3.rs-2098372/v1

Abstract

Background

For Parkinson's disease (PD) treatment and monitoring to be effective, a key requirement is that estimates of disease stage and severity be quantitative, reliable, and repeatable. PD research over the past 50 years has been dominated by subjective human assessment of disease characteristics during clinical visits.

Method

The Parkinson's disease data set contains 23 features and 197 voice-recording instances collected from 31 subjects, of whom 8 are healthy and 23 are diagnosed with PD. The chi2 test, the extra trees classifier, and the correlation matrix are used as feature selection strategies, and Decision Tree, K-Nearest Neighbors, Random Forest, Bagging, AdaBoosting, and Gradient Boosting are used as supervised machine learning classifiers. The goal is to obtain higher classification accuracy, verified with ROC curves.

Results

Each of the three prominent feature selection strategies selects the 10 best-performing features from the 23 available. The DT classifier achieves the same top accuracy of 94.87% on the full 23-attribute dataset as on the reduced 11-attribute datasets. These results are also confirmed by the ROC curve (AUC = 98.7%).

Conclusions

This approach reliably separates PD patients from healthy subjects at the individual level, supporting the use of computer-based diagnosis in clinical practice.

1. Introduction

Parkinson's disease (PD) is a dangerous disease, second in prevalence worldwide only to Alzheimer's disease, and countless people around the world suffer from it. PD is a progressive, long-term degenerative disorder of the central nervous system that seriously affects the elderly [1]. Its most significant symptoms are motor impairments, such as slowed movement, muscle rigidity, impaired posture and balance, loss of automatic movements, changes in speech, and changes in writing. PD patients lack steady production of dopamine in the body. Voice disorder is a potential early symptom in PD patients [2]: such patients have speech problems, for example with volume level and irregular pronunciation. These vocal problems can be evaluated for early PD analysis; diagnosing and monitoring PD through speech signals is accurate and equally powerful. The resulting information, collected through voice recording systems, is often used by neurologists to analyze PD, help patients, and reach clear opinions. A new diagnostic model of Parkinson's disease has been released, and the Movement Disorder Society has established its principal clinical criteria for Parkinson's disease; their goal is to help standardize clinical examinations [3]. Confirmation of Parkinson's disease is usually guided by methods such as observational evaluation and review of patients' clinical records. These strategies, like other ad hoc approaches to distinguishing PD, are not reliable in terms of accuracy and feasibility; medical institutions have reported that current diagnostic frameworks do not yet distinguish Parkinson's disease accurately. To overcome these limitations, we need a reliable technique that can be used to identify and help prevent PD. In this connection, machine learning strategies are critical to the identification, prevention, and treatment of PD [4].

To overcome the aforementioned problems, this article proposes a new combined strategy based on chi2, the extra trees classifier, and the correlation matrix to select a subset of appropriate features [5]. These algorithms process a large number of features and rank them accordingly. Compared with any single method, the combination of the three provides superior performance [6]. The selected features are then used to train and test six classifiers to predict PD patients. The contributions of this article are as follows:

• First, three feature selection methods are proposed: chi2, the extra trees classifier, and the correlation matrix. Each assigns an appropriate weight to every feature in the feature set, ranks the features by weight, and finally resolves correlations.

• Secondly, the performance of Decision Tree, K-Nearest Neighbors, Random Forest, Bagging, AdaBoosting, and Gradient Boosting has been evaluated using the selected features. The results show that, compared with the full feature list, DT produced the key results on features selected by chi2, the extra trees classifier, and the correlation matrix. In addition, an ROC curve has been drawn for each selected subset and for the full feature set to validate the results obtained by the classifiers.

• We have conducted extensive experiments on real-world data sets, and the results show that, compared with its counterparts, the proposed technique (feature selection combined with machine learning classifiers) achieves key results in terms of high accuracy and low computational cost.

The rest of the paper is organized as follows. Section 2 presents the literature review. Section 3 introduces the tools and techniques for each model/algorithm used in the discussion and their importance in advancing the proposed method. Section 4 describes the experimental setup, including the simulation environment, required parameters, and data set. Section 5 discusses the validity and adequacy of the model through analysis of the results, and finally Section 6 presents the conclusions and the scope of future work.

2. Literature review

The PD diagnosis strategies proposed in the literature, together with their limitations and benefits, are summarized in Table 1 for better comparison, as is the relevance of the strategy we propose. Nonetheless, these techniques are limited in selecting an appropriate feature subset, and therefore suffer from reduced PD recognition accuracy and efficiency.

Table 1

Literature Review

| Author | Proposed Method | Accuracy (%) |
| Little et al. [7] | PD diagnosis method using SVM | 91.4 |
| Sakar et al. [8] | PD diagnosis using SVM | 92.75 |
| Li et al. [9] | PD detection method using fuzzy-based non-linear transformation techniques integrated with SVM | 93.47 |
| Spadoto et al. [10] | PD detection method using evolutionary-based methods and optimum-path forest classifiers | 84.01 |
| Gok et al. [11] | PD diagnostic system employing the Rotation Forest Ensemble (RFE) KNN classifier | 98.46 |
| Peker et al. [12] | mRMR-ANN | 98.12 |
| Naranjo et al. [13] | TSCA | 86.20 |
| Cai et al. [14] | RF-BFO-SVM | 97.42 |
| Haq et al. [15] | L1-Norm-SVM and CPD | 99 |
| Yadav et al. [16] | Five basic classifiers & ensemble technique | 93.83, 73.28 |


In Das and Tsanas et al. [17, 18], unique artificial intelligence-based techniques were created to analyze PD patients. Little et al. [7] proposed a technique for detecting Parkinson's disease using speech signal information; they distinguished 23 PD patients from 8 healthy subjects, using SVM to classify Parkinson's patients and healthy individuals. The accuracy of the proposed strategy was 91.4%. In another study [18], 132 features were selected based on measures of dysphonic speech, using feature selection (FS) algorithms such as LASSO, Relief, mRMR, and LLBFS [18]. The model then used feature selection to pick 10 of the 132 features, which were used to classify Parkinson's patients and healthy subjects. In contrast, Sakar et al. [8] collected a large number of voice recordings from 40 subjects, of whom 20 had Parkinson's disease and 20 did not. Twenty-six speech signals containing sustained vowels, words, numbers, and short sentences were recorded, using the Praat acoustic analysis program [19]. In addition, leave-one-subject-out (LOSO) and S-LOO validation strategies were used to check the performance of the K-NN and SVM classifiers [20]. Experimental work [7] proposed a strategy relying on ML algorithms and speech signals to diagnose Parkinson's disease; it employed feature selection algorithms such as Relief, LLBFS, LASSO, and mRMR, and achieved excellent results in terms of accuracy. Sakar et al. [8] established an analysis framework using SVM and achieved an accuracy of 92.75%. In addition, Li et al. [9] proposed a model for analyzing PD using a fuzzy-based non-linear transformation strategy combined with SVM, achieving an accuracy of 93.47%. Spadoto et al. [10] proposed a diagnostic framework for PD recognition using evolutionary-based methods and the optimum-path forest classifier.
The framework achieved an accuracy of 84.01%. Cai et al. [14] designed another intelligent scheme to identify PD: SVM and the Relief algorithm are combined with bacterial foraging optimization, and notable accuracy is achieved. Emarie et al. [19] developed a program that uses fuzzy theory, K-NN, and PCA to diagnose PD, achieving an accuracy of 96.07%. Tsanas [18] designed a PD diagnosis strategy using PSO and an improved FKNN, obtaining an accuracy of 97.47%. Along the same lines, Gok [11] used a Rotation Forest Ensemble (RFE) KNN classifier to propose a PD analysis framework with an accuracy of 98.46%. Das [17] studied the design and performance of ANN, logistic regression (LR), and decision tree (DT); compared with LR and DT, the classification performance of ANN was best in terms of accuracy, reaching 92.9%. A PD detection framework was proposed in [12], using mRMR for feature selection and a complex-valued ANN classifier; the proposed framework achieved 98.12% accuracy. From this review, we infer that effective PD detection requires an intelligent diagnostic framework. In designing PD analysis frameworks, current investigations [21, 22] have used various classification algorithms, for example logistic regression [23], support vector machines [13], k-NN [23], DT, NB [24], and ANN, to detect PD. Among these classifiers, the support vector machine performs particularly well. Given that redundant extra features degrade classification performance and increase the computational complexity of the model, the classification performance of a classifier can be improved by choosing an appropriate feature selection technique.
Notable feature selection and parameter optimization algorithms, including Relief, mRMR, LASSO, LLBFS, the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), the Whale Optimization Algorithm (WOA), Fruit Fly Optimization (FFO), differential flower pollination, and Bacterial Foraging Optimization (BFO), have been used for feature subset selection in existing studies.

3. Tools And Techniques

The specific details and basic ideas of the proposed model are described below and in Fig. 1. This section presents six machine learning classifiers and three feature selection methods.

3.1. Machine Learning Classifiers

Decision Tree

One classifier in which explicit feature selection is an essential part of the learning cycle is the decision table. The whole problem of learning decision tables consists of selecting the right attributes to include. Usually this is done by measuring the cross-validation performance of various feature subsets and choosing the best-performing subset. Fortunately, leave-one-out cross-validation is very cheap for this classifier: obtaining the cross-validation error from a decision table derived from the training data is just a matter of manipulating the class counts associated with each table entry, because the table's structure does not change as instances are added or deleted [25]. The feature space is usually searched by best-first search, because this method is less likely to get stuck in a local maximum than others, such as forward selection.
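The leave-one-out bookkeeping described above, where only class counts change while the table structure stays fixed, can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; the attributes and labels below are invented:

```python
from collections import Counter, defaultdict

def loo_error(instances, labels, attrs):
    """Leave-one-out error of a decision table keyed on the chosen attributes."""
    # Build the table once: each entry holds class counts.
    table = defaultdict(Counter)
    for x, y in zip(instances, labels):
        key = tuple(x[a] for a in attrs)
        table[key][y] += 1
    errors = 0
    for x, y in zip(instances, labels):
        key = tuple(x[a] for a in attrs)
        counts = table[key].copy()
        counts[y] -= 1                        # "leave out" this instance: adjust counts only
        if not any(counts.values()):
            errors += 1                       # empty entry: cannot classify
        elif counts.most_common(1)[0][0] != y:
            errors += 1
    return errors / len(instances)

data = [{"jitter": "hi", "shimmer": "hi"}, {"jitter": "hi", "shimmer": "hi"},
        {"jitter": "lo", "shimmer": "lo"}, {"jitter": "lo", "shimmer": "lo"},
        {"jitter": "lo", "shimmer": "hi"}]
labels = ["PD", "PD", "healthy", "healthy", "PD"]
err = loo_error(data, labels, ["jitter"])
```

Different attribute subsets can then be compared by their `loo_error` values, which is exactly the subset-selection loop the text describes.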

K-Nearest Neighbor

KNN is a supervised algorithm that works from a set of labelled points. To classify a new point, it finds the labelled points nearest to it and lets them vote, assigning the label most common among the nearest neighbours [26]. A distance function between points is used to evaluate KNN.
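A minimal sketch of this voting scheme, assuming the commonly used Euclidean distance (the paper does not reproduce its distance formula); the points and labels are invented:

```python
import math
from collections import Counter

def euclidean(p, q):
    # Standard Euclidean distance between two feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train_points, train_labels, query, k=3):
    # Sort training points by distance to the query, keep the k nearest,
    # and return the majority label among them.
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pl: euclidean(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented two-feature examples (e.g. a jitter-like and an HNR-like value)
points = [(0.01, 21.0), (0.02, 22.5), (0.06, 12.0), (0.07, 11.5)]
labels = ["healthy", "healthy", "PD", "PD"]
pred = knn_predict(points, labels, (0.065, 12.2), k=3)
```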

Random Forest

The random forest classifier is an ensemble learning method for classification, regression, and other tasks that operates by constructing many decision trees. The trees are built during training, and the output can be a classification or a regression. With the help of such randomized forests, the tendency of decision trees to overfit the training set can be corrected [27].

At the random forest level, feature importance is averaged over all trees: the importance of each feature in each tree is summed and divided by the total number of trees:

 \({\text{RFfi}}_{\text{i}}=\frac{\sum _{\text{j}\in \text{all} \text{t}\text{r}\text{e}\text{e}\text{s}}{\text{normfi}}_{\text{ij}}}{\text{T}}\)

Where,

RFfii = the importance of feature i calculated from all trees in the random forest model

normfiij = the normalized importance of feature i in tree j

T = total number of trees
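The averaging in the formula above can be sketched numerically; the per-tree importance values below are invented for illustration:

```python
import numpy as np

# normfi[j, i] = normalized importance of feature i in tree j
# (each tree's importances sum to 1; values are invented)
normfi = np.array([
    [0.50, 0.30, 0.20],   # tree 1
    [0.40, 0.40, 0.20],   # tree 2
    [0.60, 0.25, 0.15],   # tree 3
])
T = normfi.shape[0]
rf_fi = normfi.sum(axis=0) / T   # RFfi_i = sum over trees of normfi_ij, divided by T
```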

Bagging

The idea of bagging (bootstrap aggregating; applicable to classification as well as regression problems with continuous dependent variables) belongs to the predictive data mining domain: it combines the predicted classifications (predictions) from many models, or from models built on resampled versions of the same learning data. It is also used to address the inherent instability of results when applying complex models to small data sets. Suppose the data mining task is to build a model for predictive classification and the data set available to train the model is small. We can repeatedly draw sub-samples (with replacement) from the data set and apply, for example, a tree classifier (such as CART or CHAID) to the successive samples. In practice, very different trees are often grown for the different samples, illustrating the instability of models that is typical with small data sets [28]. One strategy for deriving a single prediction (for a new observation) is to use all the trees found in the various samples and apply simple voting: the final classification is the one most often predicted by the different trees.
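A bagging sketch using scikit-learn, with a tree classifier fitted to bootstrap resamples and majority voting at prediction time; the synthetic data and settings are illustrative only, not the paper's configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a small data set (invented, not the UCI voice data)
X, y = make_classification(n_samples=197, n_features=22, random_state=1)

# 25 trees, each trained on a bootstrap resample (sampling with replacement);
# predict() aggregates the trees' votes.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        bootstrap=True, random_state=1)
bag.fit(X, y)
preds = bag.predict(X[:5])
```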

AdaBoosting

AdaBoosting, or Adaptive Boosting, is a machine learning meta-algorithm. It is usually combined with other learning algorithms to further improve performance. The outputs of the other learners are combined into a weighted sum that represents the final output of the boosted classifier. AdaBoosting is adaptive in that subsequent weak learners are adjusted in favor of instances misclassified by previous classifiers [29]. AdaBoosting can be sensitive to noisy data and outliers; on some problems, however, it is less prone to overfitting than other learning methods. The individual learners may be weak, but as long as each performs slightly better than random guessing, the final model converges to a strong learner.

$${E}_{t}=\sum _{i}E[{F}_{t-1}\left({x}_{i}\right)+{\alpha }_{t}h\left({x}_{i}\right)]$$

Among them, \({\text{F}}_{\text{t}-1}\left(\text{x}\right)\) = the boosted classifier built in previous stages, E(F) = the error function, \({\text{f}}_{\text{t}}\left(\text{x}\right)={{\alpha }}_{\text{t}}\text{h}\left(\text{x}\right)\) = the weak learner being added, \(\text{h}\left({\text{x}}_{\text{i}}\right)\) = the weak learner's output for training sample \({\text{x}}_{\text{i}}\), t = the iteration number, \({{\alpha }}_{\text{t}}\) = the coefficient assigned to the weak learner, and \({\text{E}}_{\text{t}}\) = the training error of the boosted classifier at stage t.
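One round of the classic AdaBoost weight update can be worked through numerically; the labels and weak-learner outputs below are invented. A weak learner with weighted error e receives coefficient α = ½·ln((1−e)/e), and sample weights are increased on mistakes and decreased on correct predictions, then renormalized:

```python
import math

y_true = [1, 1, -1, -1, 1]
y_weak = [1, -1, -1, -1, 1]          # the weak learner misses sample index 1
w = [0.2] * 5                        # initial uniform sample weights

# Weighted error of the weak learner
err = sum(wi for wi, yt, yp in zip(w, y_true, y_weak) if yt != yp)   # 0.2
alpha = 0.5 * math.log((1 - err) / err)

# Reweight: misclassified samples (yt*yp = -1) are scaled up by e^alpha,
# correct ones scaled down by e^-alpha, then weights are renormalized.
w = [wi * math.exp(-alpha * yt * yp)
     for wi, yt, yp in zip(w, y_true, y_weak)]
total = sum(w)
w = [wi / total for wi in w]
```

After this round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.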

Gradient Boosting

Gradient boosting is a machine learning method for regression and classification problems. It produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Like other boosting methods, it builds the model in a stage-wise fashion and generalizes it by allowing optimization of an arbitrary differentiable loss function [30].

Gradient boosting follows a steepest-descent strategy to minimize the objective function. In each iteration, a base learner is fitted to the negative gradient of the loss, scaled by a step size, and added to the model accumulated in previous iterations:

$${\text{F}}_{\text{m}}\left(\text{x}\right)={\text{F}}_{\text{m}-1}\left(\text{x}\right)-{{\gamma }}_{\text{m}}\sum _{\text{i}=1}^{\text{n}}{\nabla }_{{\text{F}}_{\text{m}-1}}\text{L}\left({\text{y}}_{\text{i}},{\text{F}}_{\text{m}-1}\left({\text{x}}_{\text{i}}\right)\right),$$
$${{\gamma }}_{\text{m}}=\underset{{\gamma }}{\text{arg min}}\sum _{\text{i}=1}^{\text{n}}\text{L}\left({\text{y}}_{\text{i}},{\text{F}}_{\text{m}-1}\left({\text{x}}_{\text{i}}\right)-{\gamma }{\nabla }_{{\text{F}}_{\text{m}-1}}\text{L}\left({\text{y}}_{\text{i}},{\text{F}}_{\text{m}-1}\left({\text{x}}_{\text{i}}\right)\right)\right)$$

Where \(\text{L}\left(\text{y},\text{F}\left(\text{x}\right)\right)\) is a differentiable loss function.
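As a concrete instance of the update rule above: under squared-error loss L = ½(y − F)², the negative gradient with respect to F is simply the residual y − F, so one boosting step moves the predictions toward the targets. The values below are invented:

```python
import numpy as np

y = np.array([3.0, -1.0, 2.0])        # targets (invented)
F_prev = np.array([2.5, 0.0, 2.5])    # predictions from the previous stage

residual = y - F_prev                 # negative gradient of 0.5*(y - F)^2
gamma = 0.5                           # step size (found by line search in general)
F_new = F_prev + gamma * residual     # F_m = F_{m-1} + gamma * (-gradient)

loss_before = 0.5 * np.sum((y - F_prev) ** 2)
loss_after = 0.5 * np.sum((y - F_new) ** 2)   # strictly smaller here
```

In practice the residuals are approximated by a regression tree rather than used directly, which is what makes the base learners "weak".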

3.2. Feature Selection Method

Suppose the feature set to be processed is x with n features. Feature selection is the discrete optimization problem of picking m out of the n features, with m ≤ n [24]. The goal is to build and run a classifier that is essentially unaffected by irrelevant features; it is therefore fundamentally important to remove unimportant features from the feature set [31].

Chi2 Test

The χ2 (chi2) test involves computing the χ2 statistic between each feature and the target, and selecting the desired number of features with the best χ2 scores using the following equation [32]:

$${\chi }2=\sum _{\text{i}=1}^{\text{n}}\frac{{({\text{O}}_{\text{i}}-{\text{E}}_{\text{i}})}^{2}}{{\text{E}}_{\text{i}}}$$

Where \({\text{O}}_{\text{i}}\) = observations in class i, and \({\text{E}}_{\text{i}}\) = expected observations in class i if there were no relationship between the feature and the target.
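The equation can be evaluated directly; the observed and expected counts below are invented for illustration:

```python
# Chi-square statistic: sum of (O_i - E_i)^2 / E_i over classes.
observed = [30, 10]    # e.g. a feature bucket's counts split across PD / healthy
expected = [24, 16]    # counts expected if feature and target were unrelated

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# (30-24)^2/24 + (10-16)^2/16 = 1.5 + 2.25 = 3.75
```

A larger χ2 value means the feature's distribution deviates more from independence with the target, so features are ranked by χ2 and the top-scoring ones are kept.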

Extra Trees Classifier

The extra trees (ET) classifier extracts salient features from the data set by using the model's feature importances: the model scores each feature, and the higher the score, the more relevant the feature is to the output variable [33]. We apply the ET classifier to rank the main features of the data set.
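A minimal sketch of this ranking step, assuming scikit-learn's ExtraTreesClassifier and a synthetic stand-in for the voice data (the settings are illustrative, not the paper's configuration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Invented data with the same shape as the voice data set (197 x 22)
X, y = make_classification(n_samples=197, n_features=22, n_informative=6,
                           random_state=0)

et = ExtraTreesClassifier(n_estimators=100, random_state=0)
et.fit(X, y)

# feature_importances_ sums to 1; argsort descending gives the ranking
ranking = np.argsort(et.feature_importances_)[::-1]
top10 = ranking[:10]   # indices of the 10 highest-scoring features
```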

Correlation Matrix

Correlation measures whether the features of the data set are associated with the target variable. The relationship may be positive (increasing the feature's value tends to increase the target value) or negative (increasing the feature's value tends to decrease the target value) [34]. Through a heat map, one can readily discover which features are most related to the target variable.
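A minimal correlation-ranking sketch with NumPy, using invented data rather than the actual voice features; features are ranked by the absolute value of their correlation with the target, so strong negative correlations count as much as strong positive ones:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=100)                          # invented target values
features = np.column_stack([
    target * 0.9 + rng.normal(scale=0.2, size=100),    # strongly positive
    -target + rng.normal(scale=0.5, size=100),         # strongly negative
    rng.normal(size=100),                              # unrelated
])

# Pearson correlation of each feature column with the target
corrs = np.array([np.corrcoef(features[:, j], target)[0, 1]
                  for j in range(features.shape[1])])
ranking = np.argsort(-np.abs(corrs))   # most associated feature first
```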

4. Experimental Setup

The data set used in this study is available online in the UCI machine learning repository [35]; it contains the acoustic features of 31 patients, 23 of whom have PD. The data set has 197 instances, each described by 23 acoustic features extracted from the patients' voices. The experiments aim to discover the features that improve PD prediction performance (see Table 2). The analysis was done in Jupyter Notebook (Anaconda3) with Python 3.8 on a 32-bit Windows 7 system with 4 GB RAM and an Intel® Core™ i3-4600U CPU @ 2.10 GHz. The training and test sets comprise 80% and 20% of the data, respectively. To evaluate the performance of each classifier, accuracy is used. Finally, we analyzed the results obtained from the experiments. Table 2 describes the details of the PD patient data set.
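The evaluation protocol described above (an 80%/20% train/test split scored by accuracy) can be sketched as follows; a synthetic data set stands in for the UCI voice data, and the classifiers shown are illustrative, not the paper's exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Invented stand-in with the same shape as the voice data (197 x 22, binary target)
X, y = make_classification(n_samples=197, n_features=22, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)   # 80% train / 20% test

scores = {}
for name, clf in [("DT", DecisionTreeClassifier(random_state=0)),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)   # accuracy on the held-out 20%
```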


5. Results And Discussion

Pre-processing methods, for example deletion of missing values, the standard scaler, and the min-max scaler, have been applied to the data set for effective training and testing with the classifiers. These statistical procedures provide a basic understanding of the data set. The data set has 197 instances, 22 real-valued features, and one output class. Figure 2 shows the correlation matrix, a two-dimensional depiction of the data in which colors indicate values. The correlation matrix provides a quick visual summary of the data, and a well-organized matrix helps observers understand complex data sets. The links between variables indicate that when the value of one variable changes, the other usually moves in a particular direction. Understanding this relationship is helpful because the value of one variable can be used to predict the value of another.

5.1. Result Based on Feature Selection

In this section, the experimental results of the feature selection algorithms chi2, extra trees classifier, and correlation matrix are explained and discussed in detail. The features MDVP:Jitter(%), MDVP:RAP, MDVP:PPQ, Jitter:DDP, and RPDE were not selected by any of the feature selection algorithms; consequently, these features contribute little to the confirmation of PD.

The features selected by the chi2 algorithm are shown in Table 3.

 
Table 3

Feature selected by chi2 algorithm

| Attributes | Score |
| MDVP:Flo(Hz) | 456.626628 |
| MDVP:Fo(Hz) | 316.985398 |
| MDVP:Fhi(Hz) | 227.402656 |
| HNR | 22.691579 |
| MDVP:Shimmer(dB) | 3.210348 |
| PPE | 2.151107 |
| D2 | 1.381600 |
| spread2 | 1.232614 |
| Shimmer:DDA | 0.462793 |
| NHR | 0.457699 |


The features selected by the extra trees classifier are shown in Fig. 3.

In addition, the features selected by the correlation matrix are presented in Table 4 and Fig. 4, respectively.

 
Table 4

Features selected by correlation matrix

| Feature Name | Score |
| PPE | 0.53 |
| spread2 | 0.45 |
| MDVP:Shimmer | 0.37 |
| MDVP:APQ | 0.36 |
| Shimmer:APQ5 | 0.35 |
| Shimmer:APQ3 | 0.35 |
| MDVP:Shimmer(dB) | 0.35 |
| Shimmer:DDA | 0.35 |
| MDVP:Fo(Hz) | -0.38 |
| MDVP:Flo(Hz) | -0.38 |


After applying the three feature selection techniques, the 10 best features plus the output class are selected by each method, as listed in Tables 3 and 4 and Figs. 3 and 4.

Chi2, the extra trees classifier, and the correlation matrix have been used for effective training and testing of the classifiers DT, KNN, RF, Bagging, AdaBoosting, and Gradient Boosting. The experimental results for the features selected by these three methods, combined with the classifiers, are reported in Table 5. They show that the classification performance of DT is the same, 94.87%, on the reduced feature sets as on the full feature set. The other classifiers reach their best accuracy as follows: KNN, 82.05% without feature selection and with chi2 and extra trees selection; RF, 92.30% without feature selection and with chi2 selection; Bagging, 92.30% without feature selection; AdaBoosting, 89.74% with the correlation matrix; and Gradient Boosting, 94.87% with chi2 and correlation matrix selection.

 
Table 5

Performance measurement of the classifiers with full features vs. reduced features (% accuracy)

| Classifier | Without feature selection (23 attributes) | Chi2 feature selection (11 attributes) | Extra trees classifier feature selection (11 attributes) | Correlation matrix feature selection (11 attributes) |
| DT | 94.87 | 94.87 | 94.87 | 94.87 |
| KNN | 82.05 | 82.05 | 82.05 | 79.48 |
| RF | 92.30 | 92.30 | 89.74 | 87.17 |
| Bagging | 92.30 | 89.74 | 89.74 | 87.17 |
| AdaBoosting | 87.17 | 84.61 | 87.17 | 89.74 |
| Gradient Boosting | 92.30 | 94.87 | 92.30 | 94.87 |


Based on these statistical results, we conclude that DT outperforms its peers in accuracy, so the proposed method is suitable for PD identification. Therefore, selecting the most relevant features with the chi2, extra trees classifier, and correlation matrix FS algorithms, combined with the classifiers (DT, KNN, RF, Bagging, AdaBoosting, and Gradient Boosting), helps the model diagnose PD effectively. The features selected by the proposed FS algorithms include MDVP:Fo(Hz), MDVP:Fhi(Hz), MDVP:Flo(Hz), MDVP:Jitter(Abs), MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA, NHR, HNR, DFA, spread2, D2, and PPE. In short, the proposed method can be used to detect PD, especially in its early stages. Figure 5 shows the performance of the classifiers with and without feature selection.

5.2. Result Based on ROC Curve

ROC curves were computed for each feature set and subset of the PD data to measure sensitivity (true positive rate) against the study groups. The area under the ROC curve (AUC) was assessed to gauge how well the classifiers distinguish the study groups using the full feature set versus the reduced feature sets [36]. Figure 6 shows the area under each classifier's performance curve, where:

Figure (6a) Represents ROC curve with 23 features

Figure (6b) Represents chi2 feature selection (11 features)

Figure (6c) Represents extra trees classifier feature selection (11 features)

Figure (6d) Represents correlation matrix feature selection (11 features)

By analyzing the ROC curves, the gradient boosting classifier has the highest performance, an AUC of 98.7%, on the 11 features obtained by the chi2 method.
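The AUC computation can be sketched with scikit-learn on invented scores (the value here is illustrative, not the paper's 98.7%). The ROC curve plots the true-positive rate against the false-positive rate as the decision threshold varies, and the AUC summarizes it as a single number:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Invented binary labels and classifier scores
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_score = np.array([0.1, 0.3, 0.35, 0.4, 0.8, 0.9, 0.7, 0.6])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points of the ROC curve
auc = roc_auc_score(y_true, y_score)                # area under that curve
```

An AUC of 1.0 means perfect separation of the two groups and 0.5 means chance-level discrimination, which is why AUC is used here to compare the full and reduced feature sets.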

6. Conclusion

This research addresses the detection of Parkinson's disease from speech by using feature elimination and multiple classifiers. Parkinson's disease is a dangerous human disease experienced by people all over the world, so a reliable method for confirming PD is needed. In this article, we propose a reliable technique that uses appropriate machine learning to confirm Parkinson's disease. In particular, DT, KNN, RF, Bagging, AdaBoosting, and Gradient Boosting have been applied to classify Parkinson's patients and healthy subjects. Chi2, the extra trees classifier, and a merging technique based on the correlation matrix have been adopted for selecting relevant features. In addition, the K-fold cross-validation strategy has been used to determine the ideal values of the best model's hyperparameters, and evaluation metrics have been used to assess the proposed model. The test results show that the DT classifier effectively discriminates PD and healthy subjects. The strong performance of our strategy is due to the feature selection algorithms, which identify sufficiently informative features. In terms of accuracy, the proposed strategy achieved excellent results: 94.87% accuracy and an AUC of 98.7%. The suggested strategy can also be readily used in medical service organizations. In future work, since deep neural networks select appropriate features for classification automatically, whereas classical machine learning algorithms require explicit feature selection, deep learning methods will be used to classify Parkinson's patients and healthy subjects. The proposed strategy will also be applied to other data sets to identify comparable diseases. Treatment and recovery after diagnosis are of the utmost importance; accordingly, we will extend this work toward disease reduction and recovery in the future.

Declarations

Conflict of Interest The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.

Data Availability The data set used in this exploration work can be found on UCI machine learning repository.

Compliance with ethical standards

Funding Declaration The authors received no financial support for the research, authorship and/or publication of this article.

References

  1. Lim, S. Y., Fox, S. H., & Lang, A. E. (2009). Overview of the extranigral aspects of Parkinson disease. Archives of neurology, 66(2), 167–172.
  2. Perez-Lloret, S., Rey, M. V., Pavy-Le Traon, A., & Rascol, O. (2013). Emerging drugs for autonomic dysfunction in Parkinson's disease. Expert opinion on emerging drugs, 18(1), 39–53.
  3. Seppi, K., Weintraub, D., Coelho, M., Perez-Lloret, S., Fox, S. H., Katzenschlager, R., … Sampaio, C. (2011). The Movement Disorder Society evidence‐based medicine review update: treatments for the non‐motor symptoms of Parkinson's disease. Movement disorders, 26(S3), S42-S80.
  4. Yu, K. H., Beam, A. L., & Kohane, I. S. (2018). Artificial intelligence in healthcare. Nature biomedical engineering, 2(10), 719–731.
  5. Ma, L., Fu, T., Blaschke, T., Li, M., Tiede, D., Zhou, Z., … Chen, D. (2017). Evaluation of feature selection methods for object-based land cover mapping of unmanned aerial vehicle imagery using random forest and support vector machine classifiers. ISPRS International Journal of Geo-Information, 6(2), 51.
  6. Macleod, A. D., Dalen, I., Tysnes, O. B., Larsen, J. P., & Counsell, C. E. (2018). Development and validation of prognostic survival models in newly diagnosed Parkinson's disease. Movement Disorders, 33(1), 108–116.
  7. Little, M., McSharry, P., Hunter, E., Spielman, J., & Ramig, L. (2008). Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. Nature Precedings, 1–1.
  8. Sakar, C. O., & Kursun, O. (2010). Telediagnosis of Parkinson’s disease using measurements of dysphonia. Journal of medical systems, 34(4), 591–599.
  9. Li, D. C., Liu, C. W., & Hu, S. C. (2011). A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets. Artificial intelligence in medicine, 52(1), 45–52.
  10. Spadoto, A. A., Guido, R. C., Carnevali, F. L., Pagnin, A. F., Falcão, A. X., & Papa, J. P. (2011). Improving Parkinson's disease identification through evolutionary-based feature selection. In 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 7857–7860). Ieee.
  11. Gök, M. (2015). An ensemble of k-nearest neighbours algorithm for detection of Parkinson's disease. International Journal of Systems Science, 46(6), 1108–1112.
  12. Peker, M., Sen, B., & Delen, D. (2015). Computer-aided diagnosis of Parkinson’s disease using complex-valued neural networks and mRMR feature selection algorithm. Journal of healthcare engineering, 6(3), 281–302.
  13. Naranjo, L., Perez, C. J., Martin, J., & Campos-Roca, Y. (2017). A two-stage variable selection and classification approach for Parkinson’s disease detection by using voice recording replications. Computer methods and programs in biomedicine, 142, 147–156.
  14. Cai, Z., Gu, J., & Chen, H. L. (2017). A new hybrid intelligent framework for predicting Parkinson’s disease. IEEE Access, 5, 17188–17200.
  15. Haq, A. U., Li, J. P., Memon, M. H., Malik, A., Ahmad, T., Ali, A., … Shahid, M. (2019).Feature selection based on L1-norm support vector machine and effective recognition system for Parkinson’s disease using voice recordings. IEEE access, 7, 37718–37734.
  16. Yadav, S., & Singh, M. K. (2021). Hybrid Machine Learning Classifier and Ensemble Techniques to Detect Parkinson’s Disease Patients. SN Computer Science, 2(3), 1–10.
  17. Das, R. (2010). A comparison of multiple classification methods for diagnosis of Parkinson disease. Expert Systems with Applications, 37(2), 1568–1572.
  18. Tsanas, A., Little, M. A., McSharry, P. E., & Ramig, L. O. (2011). Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson's disease symptom severity. Journal of the Royal Society Interface, 8(59), 842–855.
  19. Howell, J. (2017). When technology is too hot, too cold or just right. The Emerging Learning Design Journal, 5(1), 2.
  20. Hsu, C. W., & Lin, C. J. (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2), 415–425.
  21. Chen, H. L., Wang, G., Ma, C., Cai, Z. N., Liu, W. B., & Wang, S. J. (2016). An efficient hybrid kernel extreme learning machine approach for early diagnosis of Parkinson's disease. Neurocomputing, 184, 131–144.
  22. Singh, N., Pillay, V., & Choonara, Y. E. (2007). Advances in the treatment of Parkinson's disease. Progress in Neurobiology, 81(1), 29–44.
  23. Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., … Steinberg, D. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37.
  24. Pernkopf, F. (2005). Bayesian network classifiers versus selective k-NN classifier. Pattern Recognition, 38(1), 1–10.
  25. Chaurasia, V., & Pal, S. (2020). Applications of machine learning techniques to predict diagnostic breast cancer. SN Computer Science, 1(5), 1–11.
  26. Soumaya, Z., Taoufiq, B. D., Benayad, N., Achraf, B., & Ammoumou, A. (2020). A hybrid method for the diagnosis and classifying Parkinson's patients based on time–frequency domain properties and K-nearest neighbor. Journal of Medical Signals and Sensors, 10(1), 60.
  27. Byeon, H. (2020). Best early-onset Parkinson dementia predictor using ensemble learning among Parkinson's symptoms, rapid eye movement sleep disorder, and neuropsychological profile. World Journal of Psychiatry, 10(11), 245.
  28. Tiwari, A. K. (2016). Machine learning based approaches for prediction of Parkinson’s disease. Mach Learn Appl, 3(2), 33–39.
  29. Ali, L., Zhu, C., Golilarz, N. A., Javeed, A., Zhou, M., & Liu, Y. (2019). Reliable Parkinson’s disease detection by analyzing handwritten drawings: construction of an unbiased cascaded learning system based on feature selection and adaptive boosting model. IEEE Access, 7, 116480–116489.
  30. Karabayir, I., Goldman, S. M., Pappu, S., & Akbilgic, O. (2020). Gradient boosting for Parkinson’s disease diagnosis from voice recordings. BMC Medical Informatics and Decision Making, 20(1), 1–7.
  31. Chaurasia, V., & Pal, S. (2021). Stacking-Based Ensemble Framework and Feature Selection Technique for the Detection of Breast Cancer. SN Computer Science, 2(2), 1–13.
  32. Chaurasia, V., & Pal, S. (2014). Data mining techniques: to predict and resolve breast cancer survivability. International Journal of Computer Science and Mobile Computing (IJCSMC), 3(1), 10–22.
  33. Chaibub Neto, E., Bot, B. M., Perumal, T., Omberg, L., Guinney, J., Kellen, M., … Trister, A. D. (2016). Personalized hypothesis tests for detecting medication response in Parkinson disease patients using iPhone sensor data. In Biocomputing 2016: Proceedings of the Pacific Symposium (pp. 273–284).
  34. Zhan, A., Mohan, S., Tarolli, C., Schneider, R. B., Adams, J. L., Sharma, S., … Saria, S. (2018). Using smartphones and machine learning to quantify Parkinson disease severity: the mobile Parkinson disease score. JAMA Neurology, 75(7), 876–880.
  35. UCI Machine Learning Repository: Parkinsons Data Set. https://archive.ics.uci.edu/ml/datasets/parkinsons. Accessed 4 July 2021.
  36. Sawada, H., Oeda, T., Yamamoto, K., Kitagawa, N., Mizuta, E., Hosokawa, R., … Kawamura, T. (2009). Diagnostic accuracy of cardiac metaiodobenzylguanidine scintigraphy in Parkinson disease. European Journal of Neurology, 16(2), 174–182.