Using Fuzzy-Rough Subset Evaluation for Feature Selection and Naive Bayes to Classify Parkinson’s Disease

Feature selection is a central problem in machine learning and statistical pattern recognition. It matters in many tasks, such as classification, because datasets often contain many features that are either irrelevant or carry little information. Keeping these features does not reduce the available information, but it increases the computational burden of the intended application and forces a large amount of useless data to be stored alongside the useful data. A particular difficulty for machine learning arises when there are many candidate features but only a few training samples. One remedy is to first identify the attributes that are most useful for prediction and then rank the features by a measure of their dependence. In this study, fuzzy-rough subset evaluation is used to select a core set of features from groups of similar features. Fuzzy-rough set-based feature selection (FS) has been shown to be highly effective at reducing dataset size, but it has several problems that make it impractical for large datasets. The fuzzy-rough subset evaluation algorithm greatly reduces dimensionality while preserving classification accuracy. This paper ranks attributes using fuzzy set similarity measures together with the dependency degree as a measure of relevance. Artificial Neural Network (ANN) and Naïve Bayes classifiers are used, and their performance is compared in terms of accuracy, precision, recall, and F-measure.


INTRODUCTION
With the advance of computer and database technology, large numbers of features can now be acquired and stored in databases for a wide range of real-world applications. Some of these features may be irrelevant or redundant for classification learning; they can markedly reduce the efficiency and accuracy of classifiers and add considerable computational complexity. Consequently, before a dataset is used, it must be preprocessed to remove inessential attributes. Attribute selection, or feature selection, is an important technique for reducing the number of unneeded attributes: it seeks an optimal attribute subset on which to perform classification while preserving classification accuracy. In recent years, attribute selection has been widely used in data processing, pattern recognition, and machine learning [1][2][3][4][5][6][7][8][9][10][11][12][13].
The success of machine learning algorithms usually depends on the quality of the machine-readable data on which they operate. Quality problems include irrelevant, redundant, unreliable, or noisy data. Data mining, which has recently gained great importance, has been highly successful at interpreting big data; it supports effective prediction for major problems in many sectors, especially finance, health, communication, and education [14].
Feature selection, artificial intelligence, and machine learning are applied in many sub-fields of health care, such as medical diagnosis and disease tracking, cost estimation, imaging analysis, resource planning, emergency management, and the processing of unstructured data [15][16][17]. Artificial intelligence models, which are also used to make large-scale patient data usable, play an active role in increasing data reliability and quality [18], [19]. However, the accuracy of clinical data, data management, and the legal and ethical processes surrounding data protection limit the use of artificial intelligence in health care. In the health sciences, machine learning techniques are most commonly used for prediction, diagnosis, and the identification of post-illness complications, with the aim of providing patients with better-quality healthcare services while saving time and workload [20].
Indeed, it is stated that machine learning-based classification methods play a decisive role in both decision support systems and disease diagnosis in today's medical research [21,22].
Recently, there has been a great deal of interest in developing methods that can deal with imprecision and uncertainty, and a significant share of this work lies in the field of fuzzy and rough sets. The success of rough set theory is due to several properties of the theory. Only the facts contained in the data are analyzed. No further information about the inputs, such as thresholds or expert knowledge of a specific domain, is needed for data analysis. In addition, it finds a minimal knowledge representation for the data. Although rough set theory handles only one type of imperfection found in data, it is complementary to other concepts for this purpose, such as fuzzy set theory. Fuzzy sets deal with vagueness, whereas rough sets deal with indiscernibility [23].
Parkinson's disease (PD) arises from the death of dopaminergic neurons in the substantia nigra pars compacta of the midbrain. This neuronal damage causes a range of symptoms, including coordination problems, bradykinesia, vocal changes, and rigidity. Dysarthria is also observed in PD patients; it is characterized by weakness, paralysis, and a lack of coordination in the motor-speech system, affecting respiration, phonation, articulation, and prosody.
Because symptoms and the course of the disease vary, PD often goes undiagnosed for many years. Accordingly, more accurate diagnostic tools for detecting PD are needed because, as the disease progresses, more symptoms appear that make PD harder to treat. Considerable effort has therefore been devoted to developing methods for early detection, especially at pre-symptomatic stages, in order to slow or stop disease progression. Rapid advances in machine learning techniques have been accompanied by the challenge of combining large-scale, high-dimensional data. As a result, computer-aided machine learning approaches for combined analysis have expanded quickly. Well-known pattern analysis methods, such as the Artificial Neural Network (ANN) and Naive Bayes (NB), have been used for the early detection of PD and the prediction of PD progression [24][25][26].

MATERIALS and METHODS
Fuzzy sets were introduced by L.A. Zadeh in 1965, who expressed them through membership functions, and they have many practical applications [27,28]. To apply fuzzy set similarity measures for determining the agreement between two distinct features, each feature is expressed as a fuzzy set over the patient records. The dataset must be normalized so that each value specifies a degree of membership in [0, 1]. A patient's membership degree in the fuzzy set specifies the level of certainty for that patient's data.
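The representation of a feature as a fuzzy set over the patients can be illustrated with a minimal sketch. It assumes min-max scaling as the normalization and a Jaccard-style ratio (sum of minima over sum of maxima) as the similarity measure; the paper does not state which similarity measure was used, and the function names and the sample values are hypothetical.

```python
from typing import Sequence


def min_max_normalize(values: Sequence[float]) -> list[float]:
    """Rescale raw feature values to membership degrees in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; assign 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuzzy_similarity(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Jaccard-style similarity of two fuzzy sets over the same patients.

    Assumption: sum of pointwise minima divided by sum of pointwise maxima.
    """
    num = sum(min(a, b) for a, b in zip(mu_a, mu_b))
    den = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    return 1.0 if den == 0 else num / den


# Hypothetical raw values of two features for four patients.
feature_1 = [2.0, 4.0, 6.0, 10.0]
feature_2 = [1.0, 2.0, 3.0, 5.0]

mu_1 = min_max_normalize(feature_1)
mu_2 = min_max_normalize(feature_2)
print(mu_1, mu_2, fuzzy_similarity(mu_1, mu_2))
```

Two features that differ only by scale map to the same fuzzy set, so a similarity close to 1 flags them as candidates for redundancy.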
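Suppose $U$ is a universe (a finite, non-empty set of objects) and $\mathbb{A}$ is a non-empty finite set of attributes.

Definition 1. In classical set theory an element either belongs or does not belong to a set. In fuzzy set theory, an element can belong to a set to a degree $\mu$ ($0 \le \mu \le 1$). The fuzzy membership function is written $\mu_A(x) \in [0, 1]$, where $U$ is the collection of elements, $x$ is an object, and $A$ is a fuzzy set, given by [29,30]:

$$A = \{(x, \mu_A(x)) \mid x \in U\}.$$

With any $P \subseteq \mathbb{A}$ there is an associated indiscernibility relation

$$IND(P) = \{(x, y) \in U^2 \mid \forall a \in P,\ a(x) = a(y)\}.$$

The partition of $U$ generated by $IND(P)$ is denoted $U/IND(P)$ (or $U/P$ for simplicity) and can be computed as

$$U/IND(P) = \otimes\{U/IND(\{a\}) : a \in P\},$$

where $\otimes$ is defined as follows for sets $S$ and $T$:

$$S \otimes T = \{X \cap Y \mid X \in S,\ Y \in T,\ X \cap Y \neq \emptyset\}.$$

Let $X \subseteq U$. $X$ can be approximated using only the information contained in $P$ by constructing the $P$-lower and $P$-upper approximations of $X$:

$$\underline{P}X = \{x \mid [x]_P \subseteq X\}, \qquad \overline{P}X = \{x \mid [x]_P \cap X \neq \emptyset\},$$

where $[x]_P$ is the equivalence class of $x$ under $IND(P)$. The pair $\langle \underline{P}X, \overline{P}X \rangle$ is called a rough set. Let $P$ and $Q$ be sets of attributes inducing equivalence relations over $U$; then the positive, negative, and boundary regions can be defined as:

$$POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X,$$
$$NEG_P(Q) = U - \bigcup_{X \in U/Q} \overline{P}X,$$
$$BND_P(Q) = \bigcup_{X \in U/Q} \overline{P}X - \bigcup_{X \in U/Q} \underline{P}X.$$

The positive region comprises all objects of $U$ that can be classified into classes of $U/Q$ using the information in the attributes $P$. The boundary region is the set of objects that can possibly, but not certainly, be classified in this way. The negative region is the set of objects that cannot be classified into classes of $U/Q$.

The fuzzy-rough dependency function can be defined as follows:

$$\gamma'_P(Q) = \frac{\sum_{x \in U} \mu_{POS_P(Q)}(x)}{|U|},$$

where $\mu_{POS_P(Q)}(x)$ is the degree to which $x$ belongs to the fuzzy positive region. If the fuzzy-rough reduction process is to be useful, it must be able to deal with multiple features by calculating the dependency between various subsets of the original attribute set. In the fuzzy case, objects may belong to several equivalence classes, so for $P = \{a, b\}$ the Cartesian product of $U/IND(\{a\})$ and $U/IND(\{b\})$ must be considered when determining $U/P$.

For a subset of attributes $P$,

$$SIM_{P,\tau} = \{(x, y) \in U \times U \mid SIM_P(x, y) \ge \tau\},$$

where $SIM_P(x, y)$ is the fuzzy similarity of $x$ and $y$ over the attributes in $P$ and $\tau$ is a similarity threshold. Tolerance classes are generated by the fuzzy similarity relation as:

$$SIM_{P,\tau}(x) = \{y \in U \mid (x, y) \in SIM_{P,\tau}\}.$$

The lower and upper approximations of $X \subseteq U$ are then expressed as:

$$\underline{P_\tau}X = \{x \mid SIM_{P,\tau}(x) \subseteq X\}, \qquad \overline{P_\tau}X = \{x \mid SIM_{P,\tau}(x) \cap X \neq \emptyset\}.$$

The pair $(\underline{P_\tau}X, \overline{P_\tau}X)$ is called a tolerance fuzzy-rough set. The positive region and dependency degree are then defined as before.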
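The crisp rough-set notions in this paragraph (equivalence classes, lower and upper approximations, positive region, and dependency degree) can be made concrete with a small sketch. It assumes discrete attribute values stored as dictionaries; the table, the attribute names `a`, `b`, `d`, and the function names are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def partition(table, attrs):
    """Group object indices into the equivalence classes of IND(attrs)."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def lower_upper(table, attrs, target):
    """Return the P-lower and P-upper approximations of the set `target`."""
    lower, upper = set(), set()
    for eq_class in partition(table, attrs):
        if eq_class <= target:   # class lies entirely inside the target
            lower |= eq_class
        if eq_class & target:    # class overlaps the target
            upper |= eq_class
    return lower, upper


def dependency(table, cond_attrs, dec_attrs):
    """Crisp dependency degree: size of the positive region divided by |U|."""
    positive = set()
    for dec_class in partition(table, dec_attrs):
        positive |= lower_upper(table, cond_attrs, dec_class)[0]
    return len(positive) / len(table)


# Hypothetical decision table: conditional attributes a, b and decision d.
table = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 0, "d": "yes"},
]

print(lower_upper(table, ["a", "b"], {0, 2}))
print(dependency(table, ["a", "b"], ["d"]))
```

Objects 2 and 3 share the same conditional values but carry different decisions, so they fall in the boundary region and only half of the table lies in the positive region.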
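A rough sketch of the tolerance-based construction follows. It assumes data already normalized to [0, 1], a per-attribute similarity of 1 - |a(x) - a(y)|, and aggregation over the subset by the mean; none of these choices is stated in the text, and the names `similarity`, `tolerance_class`, and `tolerance_dependency` are hypothetical.

```python
def similarity(x, y, attrs):
    """Mean per-attribute similarity 1 - |a(x) - a(y)| (assumed form)."""
    return sum(1.0 - abs(x[a] - y[a]) for a in attrs) / len(attrs)


def tolerance_class(data, i, attrs, tau):
    """Indices of all objects at least tau-similar to object i."""
    return {
        j for j, y in enumerate(data)
        if similarity(data[i], y, attrs) >= tau
    }


def tolerance_dependency(data, labels, attrs, tau):
    """Share of objects whose tolerance class lies in one decision class.

    An object is in the positive region when every object in its
    tolerance class carries the same label as the object itself.
    """
    positive = 0
    for i in range(len(data)):
        members = tolerance_class(data, i, attrs, tau)
        if all(labels[j] == labels[i] for j in members):
            positive += 1
    return positive / len(data)


# Hypothetical normalized data: one attribute, two decision classes.
data = [[0.0], [0.25], [1.0], [0.75]]
labels = [0, 0, 1, 1]

print(tolerance_dependency(data, labels, [0], tau=0.75))
print(tolerance_dependency(data, labels, [0], tau=0.5))
```

Lowering the threshold enlarges the tolerance classes, so objects near the class border start to mix labels and drop out of the positive region.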
In each step, one feature is added to the reduct and the degree of dependency is recomputed; when the dependency no longer increases, the algorithm terminates.
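The greedy search just described can be sketched as follows. For brevity the sketch plugs in a crisp dependency measure on discrete data rather than the fuzzy-rough one; this substitution, the function names, and the toy data are assumptions made for illustration, and any dependency function with the same signature could be passed in instead.

```python
from collections import defaultdict


def crisp_dependency(rows, labels, attrs):
    """Share of objects whose attribute-value group has a single label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)].append(label)
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)


def greedy_select(rows, labels, n_attrs, dependency=crisp_dependency):
    """Add one attribute per step until the dependency stops increasing."""
    selected = []
    best = dependency(rows, labels, selected)
    while True:
        candidate, candidate_score = None, best
        for a in range(n_attrs):
            if a in selected:
                continue
            score = dependency(rows, labels, selected + [a])
            if score > candidate_score:
                candidate, candidate_score = a, score
        if candidate is None:  # no attribute improves the dependency
            return selected, best
        selected.append(candidate)
        best = candidate_score


# Hypothetical data: attribute 2 duplicates attribute 0.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
labels = ["n", "y", "y", "y"]

print(greedy_select(rows, labels, n_attrs=3))
```

The redundant third attribute is never selected, because adding it does not raise the dependency once the first attribute is in the subset.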

THE RESEARCH FINDINGS AND DISCUSSION
Fuzzy logic employs linguistic variables, defined as fuzzy sets, to approximate human reasoning. The feature selection technique takes the pre-processed dataset as input and produces ranked attributes based on a combined F-Score and accuracy approach, applied to dataset values normalized by a fuzzy Gaussian membership function. We used fuzzy-rough subset evaluation for feature selection. The top-ranked attributes are then used for classification by the fuzzy-discernibility and neural network methods. The process is explained in the following steps.

The dataset used in this study comprises data for a set of attributes and is available at https://www.kaggle.com/c/parkinsons-detection. This work was carried out using Weka version 3.9.5.
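The normalization by a fuzzy Gaussian membership function mentioned above can be sketched as below. The text does not say how the centre and width of the Gaussian were chosen, so the sketch assumes the column mean and the population standard deviation; the function names and sample values are hypothetical.

```python
import math


def gaussian_membership(x, center, sigma):
    """Gaussian membership degree of x, equal to 1.0 at the centre."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def fuzzify_column(values):
    """Map one feature column to membership degrees in (0, 1].

    Assumption: centre = column mean, sigma = population standard deviation.
    """
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0.0:
        # A constant column sits exactly on the centre.
        return [1.0] * n
    return [gaussian_membership(v, mean, sigma) for v in values]


# Hypothetical raw values of one feature.
column = [1.0, 2.0, 3.0]
print(fuzzify_column(column))
```

Values at the column mean receive full membership, and membership decays symmetrically as a value moves away from the mean.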

ii. Attribute Selection
After the dataset has been normalized, attribute selection is performed. As stated above, a fuzzy-rough subset evaluation technique is applied for attribute selection, with Hill Climber as the search method. In this method, the fuzzy-rough set similarity is computed for every feature, and the top n attributes are then selected.

iii. Classifying
In this study, Artificial Neural Network (ANN) and Naive Bayes (NB) are used as classifiers to assess the performance of the attribute selection technique. Naive Bayes is a simple learning algorithm that applies Bayes' rule together with a strong assumption that the attributes are conditionally independent given the class. Although this independence assumption is often violated in practice, Naive Bayes nevertheless frequently delivers competitive classification accuracy. Combined with its computational efficiency and many other desirable features, this has led to Naive Bayes being widely used in practice [37].
An ANN is adaptive in nature because it changes its structure and adjusts its weights to minimize the error. The adjustment of the weights is based on the information that flows internally and externally through the network during the learning phase. The advantages of ANNs are that they require less formal statistical training, can implicitly detect complex nonlinear relationships between dependent and independent variables, can detect all possible interactions among predictor variables, and can be trained with a variety of algorithms [38].
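The Naive Bayes decision rule can be illustrated with a minimal Gaussian variant for numeric attributes. This is a sketch under stated assumptions, not the Weka implementation used in the study: each attribute is modelled as a per-class normal distribution, a small variance floor `var_floor` avoids division by zero, and all names and sample values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the class prior and per-attribute mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the largest log posterior for one instance."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        # Conditional independence: per-attribute log-likelihoods add up.
        for v, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    return max(model, key=lambda c: log_posterior(*model[c]))


# Hypothetical two-attribute training data with two classes.
X = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.2]]
y = [0, 0, 1, 1]
model = fit_gaussian_nb(X, y)

print(predict(model, [1.1, 2.0]), predict(model, [3.1, 4.1]))
```

Working in log space keeps the product of many small likelihoods numerically stable, which matters as the number of attributes grows.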

iv. Measurement Metrics
As mentioned earlier, Artificial Neural Network (ANN) and Naive Bayes (NB) are used for classification, and accuracy, precision, recall, F-measure, and computational time are used as evaluation measures.
Accuracy: the proportion of instances that are classified correctly,

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$

Here, TP is a true positive, a positive instance that is correctly identified; TN is a true negative that has been correctly rejected; FP is a false positive that is incorrectly identified; and FN is a false negative that has been incorrectly rejected.
Precision: the proportion of instances predicted as positive that are actually positive. Precision is also known as the positive predictive value:

$$p = \frac{TP}{TP + FP}.$$

Recall: also known as the true positive rate or sensitivity, it measures the fraction of the relevant samples that are retrieved:

$$r = \frac{TP}{TP + FN}.$$

The precision-recall curve is a useful measure of prediction success when the classes are very imbalanced. A large area under the curve indicates both high recall and high precision, where high precision corresponds to a low false-positive rate and high recall corresponds to a low false-negative rate.
F-Measure: a measure of a test's accuracy. It combines the precision $p$ and the recall $r$ of the test into a single score, their harmonic mean:

$$F = \frac{2pr}{p + r}.$$

Computation time: the time required to complete the computation, assessed as the run time of the classification [39].
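The measures listed above can be computed from the confusion counts as in the following sketch. The function names, the choice of 1 as the positive label, and the sample predictions are hypothetical; undefined ratios are returned as 0.0 by convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def evaluate(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F-measure as a dict."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / denom if denom else 0.0,
    }


# Hypothetical ground truth and predictions (1 = Parkinson's, 0 = healthy).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = evaluate(*counts)
print(counts, scores)
```

Computation time can be measured separately, for example by reading `time.perf_counter()` before and after the classification call and taking the difference.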

Result:
The study was carried out to determine the classification accuracy for different numbers of attributes from the ranking list. Table 1 shows the Artificial Neural Network classification performance for Parkinson's detection before and after attribute selection. As the table shows, there is a relative increase in performance; reducing the dimensionality of the dataset thus improved performance while also significantly reducing the computation time.

Conclusion:
In this study, the Fuzzy-Rough Subset Evaluation (FRSE) algorithm has been used for feature selection; it can select a small set of attributes that supports highly accurate classification of the instances.
The dataset was normalized by a fuzzy Gaussian membership function. The F-Score and Fuzzy-Rough Subset Evaluation were applied to the normalized dataset to rank the attributes. The F-Score is used to identify relevant attributes, and FRSE is applied to remove redundancy among the features. In all feature selection results on the Parkinson's data, the selected attributes may or may not be a subset of a disease progression signature. The top n attributes are therefore selected for classification with ANN and NB. Training set validation is used to obtain the average classification accuracy. It gives a 0.04% average classification accuracy for the ANN method and 1.01% for the NB classifier. It also gives the highest average ROC accuracy for NB compared with the ANN algorithm. In summary, normalizing a dataset with the fuzzy Gaussian membership function can influence the classification accuracy under the suggested measure. Overall, Naive Bayes performs better than the Artificial Neural Network algorithm. The proposed method is therefore effective and consistent for the detection of Parkinson's disease with a small number of features.