Effective Prediction of Rheumatoid Arthritis (RA) Diagnosis Using Hybrid Harmony Search with Adaptive Neuro Fuzzy Inference System


Rheumatoid Arthritis (RA) is a severe autoimmune disease that affects the entire human body. The disease triggers the immune system to attack the inner linings of bones and causes severe inflammation of the synovium. Continuous erosion of the bone lining leads to permanent loss of the joint; given this severity, early prognosis of the disease is essential. However, the signs and symptoms of the disease are often uncertain. The symptoms of RA resemble those of other inflammatory diseases, so only highly experienced experts can identify the disease in its early stage. To support clinicians and technicians in the early prognosis of the disease, this study presents a computer-aided decision support model based on the Harmony Search-Adaptive Neuro Fuzzy Inference System (HS-ANFIS). The Harmony Search algorithm is employed to select the optimal features, and ANFIS is adopted to perform classification. To demonstrate the effectiveness of the model, metrics such as Accuracy, Sensitivity, Specificity, Precision, Recall, F-measure, Positive Predictive Value, Negative Predictive Value, Root Mean Square Error, and Mean Absolute Error are evaluated in the MATLAB simulation environment. The proposed HS-ANFIS outperformed the other models developed in this research and those in the existing literature.


INTRODUCTION
Rheumatoid Arthritis (RA) is a chronic systemic inflammatory disease [1, 2] that affects joints and muscles, resulting in visible impairment of joint structure and function.
According to the World Health Organization (WHO), the prevalence of RA is 0.3% to 1% worldwide, and the disease is becoming a major threat in developed countries. In India, the prevalence of RA is estimated to be 0.92%, and women are more vulnerable to the disease than men [3]. The WHO report also estimates that there are more than 100 types of arthritis [4] and that arthritis affects more than one in four adults [5]. According to the Arthritis Foundation in America, one in three adults has arthritis. Beyond the challenges of treating RA, clinicians also face considerable difficulty in diagnosing it.
RA is primarily diagnosed using the American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria for RA [6]. Suspected patients are diagnosed through a series of laboratory tests and an analysis of their case history. RA initially presents as severe pain without any swelling; this symptom is non-specific and also occurs in conjunction with other immune diseases. The major clue for suspecting RA is a prolonged period of morning stiffness in a limited number of joints. Laboratory test results can improve diagnostic sensitivity; they include the levels of ESR (erythrocyte sedimentation rate), CRP (C-reactive protein), RF (rheumatoid factor), anti-CCP (anti-cyclic citrullinated peptide), and ANA (antinuclear antibody), the blood count, and imaging tests.
The ESR measures the rate at which blood settles in a liquid column and helps to distinguish inflammatory from non-inflammatory diseases. However, the ESR level varies with age [7] and may also be raised by other conditions such as malignancy, pathogenic infections, and related disorders. The rheumatoid factor represents autoimmune proteins present in the body; about 5% to 10% of people have an elevated rheumatoid factor level, which rises with age [8]. The most effective laboratory test for RA diagnosis measures anti-CCP antibodies. ELISA is the test used to measure the anti-CCP level, and it is more specific for reporting positive RA than the rheumatoid factor [9]. The CRP test estimates the level of C-reactive protein, which is released by the liver during inflammatory diseases. With this test, the patient may be reported as affected by Systemic Lupus Erythematosus (SLE) or RA. ANA has a specific disease association with RA: about 98% of patients with SLE test positive for ANA, compared with 40% to 70% of those with other connective tissue diseases.
Still, ANA has been reported to be present in about 5% of healthy individuals [10]. Therefore, the patient's health history should be analysed together with the ANA test before reporting positive RA. Clearly, a single laboratory test is not enough to diagnose RA; the patient must undergo a series of laboratory tests. Based on all the test results, together with the patient's health history, age, genetic factors and so on, the clinician can reach a valid conclusion of RA or non-RA. Meanwhile, if diagnosis is delayed, the course of the disease may become severe, which may pave the way for other risk factors and even loss of life.
The design of an intelligent decision-making model for early diagnosis of RA remains an open field of research. Numerous studies have been presented in the existing literature. Still, the search for the best model continues because achieving high classification accuracy remains a challenge. In this research article, a computer-aided decision support model based on the Harmony Search-Adaptive Neuro Fuzzy Inference System is developed, and its performance is evaluated using metrics such as classification accuracy, sensitivity, specificity, precision, recall, F-measure, PPV, NPV, RMSE, and MAE.

Research Contribution
The contributions made in the proposed work are as follows:
• A real-time dataset that includes both RA and non-RA patients is collected and preprocessed.
• Feature selection is performed by employing the Harmony Search strategy.
The remainder of this paper is organised as follows: related works are presented in section 2, the methodology is discussed in section 3, the experimental modelling is discussed in section 4, the performance metrics are presented in section 5, the simulation results are summarised in section 6, and the inferences and conclusion drawn from the results are presented in section 7.

RELATED WORKS
RA is an autoimmune disease that affects the joints of the hands, legs, wrists, knees, ankles, etc. The disease is also termed a systemic disease because it affects not only the joints but also other organs of the body such as the lungs and heart. Reporting a patient with RA requires numerous clinical examinations across the essential domains and tests of rheumatology. Indeed, experienced and skilled personnel are needed to avoid late and wrong diagnosis [11]. The mortality risk associated with ILD and RA has been studied, and it was found that patients affected by both ILD and RA showed higher mortality than patients affected by RA without ILD [12]. The risk factors associated with RA include the development of malignant lymphoma, cardiovascular disease, atherosclerosis, temporomandibular disorders, depression among women, and voice disorders [13][14][15][16][17][18][19][20][21][22][23][24].
Considering the major stress that patients undergo, researchers have proposed many intelligent decision support models over the past few decades for prognosis of the disease.
These include machine learning methods for early prediction of RA based on electronic health records [25][26][27][28][29], a deep learning strategy applied to X-ray images [30], and an ensemble approach for disease gene identification, in which EPU achieved an accuracy of 84.8% [31]. CS-Boost, which combines a Decision Stump as the weak learner with Cuckoo search, was proposed for early prognosis of the disease [32]. An AdaBoost-based classifier model was proposed for early diagnosis of fibromyalgia and arthritis [33]. Numerous neural network-based models have been proposed for arthritis diagnosis, and neural network-based RA diagnosis is investigated in [34]. A Java-based software tool that performs pairwise alignment and analysis based on a mutability score has also been presented [35].
The survey above covers the risk factors associated with RA and the existing clinical decision support models for RA diagnosis. The surveyed models rely mainly on classification accuracy as the performance metric, which is inadequate for validating a medical diagnosis procedure. In this article, numerous metrics are employed to validate the model. Further, few studies are available in the field of RA diagnosis because the complex nature of the disease symptoms makes it difficult to obtain a reliable outcome. In this article, an ANFIS model is developed to perform classification because of its adaptive ability to handle the high level of uncertainty in disease diagnosis [36][37][38][39][40][41][42][43][44]. Feature selection is also a significant step in the design of a classification model. Here, the Harmony Search algorithm is employed to select the necessary features because of its simple implementation, better learning ability, and improved convergence [45][46][47][48][49].

Harmony Search algorithm
The Harmony Search (HS) algorithm is a meta-heuristic algorithm that emulates the improvisation process of a musician. Every performer plays a note while searching for the best overall harmony. The fundamental goal of the algorithm is to reduce the overall complexity that occurs during the search process. The HS method is inspired by the underlying principles of harmony improvisation [50]. The algorithm uses a parameter called the harmony memory considering (or accepting) rate, HMCR. If the HMCR is too low, only a few of the best harmonies are selected and the algorithm may converge too slowly. If this rate is extremely high (close to 1), almost all the harmonies in the harmony memory are used and other harmonies are not explored well, potentially leading to wrong solutions. Therefore, HMCR = 0.9 is typically used. In principle, the pitch can be adjusted linearly or nonlinearly; here, linear adjustment is used:
X_new = X_old + PB × ε
where X_old is the existing pitch or solution from the harmony memory and X_new is the pitch after adjustment. The pitch adjustment is determined by a pitch bandwidth PB and a pitch adjusting rate PAR. Here ε is a random number in the range [-1, 1]. The pitch adjusting rate PAR controls the degree of adjustment; PAR = 0.1 to 0.5 is generally used in most applications. The three components of harmony search are summarised as pseudo code in Algorithm 1 and as a flow diagram in Figure 1. We can see that the probability of randomisation
Figure 1 Harmony search flow diagram
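The three components described above (harmony memory consideration, pitch adjustment, and randomisation) can be sketched as follows. This is a minimal illustration on a toy continuous objective, not the authors' Algorithm 1: the function name `harmony_search`, the memory size `hms`, the bandwidth `pb = 0.1`, the iteration count, and the sphere objective are all assumptions chosen for the example, while HMCR = 0.9 and a PAR within 0.1 to 0.5 follow the text.

```python
import random

def harmony_search(objective, n_dims, lower, upper, hms=10, hmcr=0.9,
                   par=0.3, pb=0.1, iterations=2000, seed=0):
    """Minimise `objective` with a basic harmony search (illustrative sketch)."""
    rng = random.Random(seed)
    # Harmony memory: hms random candidate solutions and their scores.
    memory = [[rng.uniform(lower, upper) for _ in range(n_dims)]
              for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d in range(n_dims):
            if rng.random() < hmcr:
                # Harmony memory consideration: reuse a stored value.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: X_new = X_old + PB * eps, eps in [-1, 1].
                    value += pb * rng.uniform(-1.0, 1.0)
                    value = min(upper, max(lower, value))
            else:
                # Randomisation: draw a fresh value from the full range.
                value = rng.uniform(lower, upper)
            new.append(value)
        new_score = objective(new)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst] = new
            scores[worst] = new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

def sphere(x):
    # Toy objective (assumption): sum of squares, minimum 0 at the origin.
    return sum(v * v for v in x)

best_x, best_score = harmony_search(sphere, n_dims=3, lower=-5.0, upper=5.0)
```

For feature selection, each dimension would instead encode whether a feature is kept, and the objective would be the classifier-based cost function; that adaptation is not shown here.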

Adaptive Neuro-Fuzzy Inference System (ANFIS)
The Adaptive Neuro-Fuzzy Inference System (ANFIS) is a hybrid structure that combines a neural network with fuzzy logic [51]. The rule base contains fuzzy if-then rules of the Takagi and Sugeno type, as follows: If x is A and y is B then z is f(x, y), where A and B are the fuzzy sets in the antecedents and z = f(x, y) is a crisp function. Usually f(x, y) is a polynomial in the input variables x and y, but it can be any other function that describes the output of the system within the fuzzy region. When f(x, y) is constant, a zero-order Sugeno fuzzy model is formed, which may be regarded as a special case of the Mamdani fuzzy inference system in which each rule's consequent is specified by a fuzzy singleton. If f(x, y) is a first-order polynomial, a first-order Sugeno fuzzy model is formed. A first-order, two-rule Sugeno fuzzy inference system is shown in Figure 2; the two rules may be stated as:
Rule 1: If x is A1 and y is B1 then f1 = p1x + q1y + r1 (4)
Rule 2: If x is A2 and y is B2 then f2 = p2x + q2y + r2 (5)
In this inference system, the output of each rule is a linear combination of the input variables plus a constant term. The final output is the weighted average of each rule's output.
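The two-rule first-order Sugeno inference described above can be illustrated with a short sketch. The Gaussian membership functions, the product used for the firing strength, and every parameter value below are assumptions made for the example; the text specifies only the rule form f = px + qy + r and the weighted-average output. No training of the parameters is shown.

```python
import math

def gaussian(x, centre, sigma):
    # Assumed membership function shape; the paper does not specify one here.
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def sugeno_two_rule(x, y, mf_a, mf_b, consequents):
    """First-order Sugeno output for rules 'If x is Ai and y is Bi then fi'."""
    # Firing strength of each rule: w_i = mu_Ai(x) * mu_Bi(y).
    weights = [gaussian(x, *a) * gaussian(y, *b) for a, b in zip(mf_a, mf_b)]
    # Rule outputs: f_i = p_i * x + q_i * y + r_i.
    outputs = [p * x + q * y + r for p, q, r in consequents]
    # Final output: weighted average of the rule outputs.
    return sum(w * f for w, f in zip(weights, outputs)) / sum(weights)

# Hypothetical parameters: (centre, sigma) per fuzzy set, (p, q, r) per rule.
mf_a = [(0.0, 1.0), (1.0, 1.0)]          # A1, A2
mf_b = [(0.0, 1.0), (1.0, 1.0)]          # B1, B2
consequents = [(1.0, 1.0, 0.0), (2.0, -1.0, 1.0)]

z = sugeno_two_rule(0.5, 0.5, mf_a, mf_b, consequents)
```

At x = y = 0.5 both rules fire equally under these symmetric membership functions, so the output is the plain average of f1 = 1.0 and f2 = 1.5.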

Layer 5: Overall Output
Where n is the dimension of the problem.
Stage 5: Improvise a selected feature from the harmony memory (HM) using three rules: harmony memory consideration, pitch adjustment, and random selection.

Dataset Construction:
Reliable dataset construction is an essential part of developing a disease diagnosis model using artificial intelligence techniques. Currently, no existing dataset is available for Rheumatoid Arthritis, so data collection is the first step of the proposed study. In this work, the clinical data is collected from various outpatient units in Coimbatore, India. The dataset is described in Table 2. The dataset has 20 features, of which the proposed strategy optimally selected six.

Normalisation
Normalisation techniques map the data to a different scale. This work uses Z-score normalisation, also called zero-mean normalisation. Here, the RA data is normalised based on the mean and standard deviation.
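As an illustration, Z-score normalisation replaces each value v of a feature with (v - mean) / standard deviation. The sketch below applies this to a single feature column; the function name, the use of the population standard deviation, and the sample values are assumptions for the example rather than details taken from the study.

```python
import statistics

def z_score_normalise(column):
    """Return the column rescaled to zero mean and unit standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation (assumed)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

# Hypothetical feature values with mean 5 and standard deviation 2.
normalised = z_score_normalise([2, 4, 4, 4, 5, 5, 7, 9])
```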

Data Segregation
After completing the normalisation process, the dataset is divided into two portions on a percentage (%) basis. The first portion is the training data, which contains 80% of the dataset.
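A percentage-based split of this kind can be sketched as follows. The 80% training fraction comes from the text; treating the remaining records as the testing portion, shuffling before the split, and the fixed seed are assumptions made for the example.

```python
import random

def split_dataset(rows, train_fraction=0.8, seed=42):
    """Shuffle row indices and split them into training and testing portions."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)  # assumed: random assignment of records to portions
    cut = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

# Hypothetical dataset of 100 record identifiers.
records = list(range(100))
train_rows, test_rows = split_dataset(records)
```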

Training and Testing phase
The pre-processed data is fed into the Harmony Search algorithm, which selects the optimal features. The selected features must satisfy the constraint that the number of selected features is minimised while the accuracy is maximised. In the cost function, 'Acc' is the accuracy of the classification model, 'L' is the selected attribute length, N represents the number of features, and α and β indicate the weights of classification accuracy and feature selection quality, with α ∈ [0, 1] and β = 1 − α.
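The exact cost function is not reproduced in the text. One commonly used form that is consistent with the definitions given, shown here purely as an assumption, rewards accuracy and the fraction of features removed: fitness = α × Acc + β × (N − L) / N, with β = 1 − α. The function name and the value α = 0.9 below are hypothetical.

```python
def feature_subset_fitness(acc, n_selected, n_total, alpha=0.9):
    """Assumed fitness: alpha * Acc + beta * (N - L) / N, with beta = 1 - alpha.

    Higher is better: accuracy is rewarded, and so is selecting fewer features.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must be between 1 and n_total")
    beta = 1.0 - alpha
    return alpha * acc + beta * (n_total - n_selected) / n_total

# Example with 6 of 20 features selected and a hypothetical accuracy of 0.9.
score = feature_subset_fitness(acc=0.9, n_selected=6, n_total=20)
```

With these inputs the accuracy term contributes 0.81 and the feature-reduction term contributes 0.07.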
In the testing phase, each new RA record is analysed, and its principal features are located and compared with the principal features of the trained RA data. If matches are found, the data is classified by HS-ANFIS according to the previously defined rules. First, the test query RA data is received from the user, and feature extraction is performed. Next, the proposed HS-ANFIS classifier is applied to the query data to label it as 'Normal' or 'RA'. The parameters of the proposed HS-ANFIS technique for RA disease classification are presented in Table 3.

PERFORMANCE METRICS
The PPV and NPV describe the performance of a diagnostic test. The comparison is shown in Figure 5. The accuracy of HS-ANFIS is 4.47% higher than that of GWO-ID3, 7.47% higher than that of PSO-ID3, and 32.84% higher than that of GWO-SVM. The sensitivity of HS-ANFIS is 2.18% higher than that of GWO-ID3, 3.38% higher than that of PSO-ID3, and 5.2% higher than that of GWO-SVM. The specificity of HS-ANFIS is 22.18% higher than that of GWO-ID3, about 32.79% higher than that of PSO-ID3, and 44.39% higher than that of GWO-SVM.
The recall is improved by 2.18%, 3.38%, and 5.214% over GWO-ID3, PSO-ID3, and GWO-SVM, respectively. The precision is 2.082% higher than that of GWO-ID3, 4.89% higher than that of PSO-ID3, and 33.34% higher than that of GWO-SVM. The F-measure is 2.43% higher than that of GWO-ID3, 4.14% higher than that of PSO-ID3, and 21.84% higher than that of the GWO-SVM strategy.
Similarly, the RMSE value is significantly lower than that of the other models, with a better kappa score. The performance of HS-ANFIS is compared with existing works in the literature, and the metric results obtained for the training and testing datasets are reported in Tables 6 and 7. The training accuracy of the proposed HS-ANFIS is 17.1% better than that of C4.5, 8.8% higher than that of PSO-C4.5, 9.3% higher than that of GWO-C4.5, and 3.7% higher than that of HGWO-C4.5 for the same datasets [25], and 16.9% higher than that of CS-Boost [27]. The sensitivity of the proposed HS-ANFIS is 20% better than that of C4.5, 11.8% higher than that of PSO-C4.5, 9.9% better than that of GWO-C4.5, 3.9% higher than that of HGWO-C4.5, 10.1% higher than that of REACT, and 19% higher than that of CS-Boost. The specificity is 6.6% better than that of C4.5, 7.4% higher than that of PSO-C4.5, 5.6% better than that of GWO-C4.5, 5.7% higher than that of HGWO-C4.5, and 4.6% higher than that of the CS-Boost strategy. The computational time of the proposed HS-ANFIS is significantly lower than that of the other models employed in the study.

The training response of the proposed HS-ANFIS is presented in Table 6. The testing accuracy is 24% better than that of C4.5, 14% higher than that of PSO-C4.5, 13.3% higher than that of GWO-C4.5, 14.4% better than that of HGWO-C4.5, 20.2% higher than that of REACT, and 23.3% higher than that of the CS-Boost strategy employed for comparison. The sensitivity is 24.1% higher than that of C4.5, 16.9% better than that of PSO-C4.5, 15.2% higher than that of GWO-C4.5, 6% higher than that of HGWO-C4.5, 15% higher than that of REACT, and 23.7% higher than that of CS-Boost. The specificity of the proposed HS-ANFIS is 4.06% better than that of C4.5, 1.36% higher than that of PSO-C4.5, 1.1% better than that of GWO-C4.5, 1.46% higher than that of HGWO-C4.5, 4.06% higher than that of REACT, and 2.36% higher than that of CS-Boost. The testing time is considerably improved for the proposed strategy. Moreover, comparing the training and testing performance shows that the proposed HS-ANFIS has better generalisation ability than the other models in the existing literature.
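The metrics compared above follow from the confusion matrix of a binary classifier. The sketch below shows the standard definitions for labels coded as 1 (RA) and 0 (Normal); the function names, the label coding, and the sample predictions are assumptions for illustration and do not reproduce any result reported in this study.

```python
import math

def _ratio(numerator, denominator):
    # Return 0.0 when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else 0.0

def diagnostic_metrics(y_true, y_pred):
    """Compute standard binary diagnostic metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)   # also called recall
    precision = _ratio(tp, tp + fp)     # also called PPV
    errors = [abs(t - p) for t, p in pairs]
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "npv": _ratio(tn, tn + fn),
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "mae": _ratio(sum(errors), len(errors)),
        "rmse": math.sqrt(_ratio(sum(e * e for e in errors), len(errors))),
    }

# Hypothetical labels: 4 RA and 6 Normal cases, with 2 misclassifications.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
```

Reporting sensitivity, specificity, PPV, and NPV alongside accuracy matters for diagnosis because accuracy alone hides how errors are distributed between missed RA cases and false alarms.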

AUTHOR CONTRIBUTIONS
The authors contributed to each part of this paper equally. The authors read and approved the final manuscript.

COMPLIANCE WITH ETHICAL STANDARDS
Funding: No funds, grants, or other support was received.