An Adaptive Neuro-Fuzzy Expert System for Clinical Staging of Patients with Chronic Kidney Disease

Background: Chronic kidney disease (CKD) is a health complication faced by almost every nation, and Ghana is no exception. The Ghana Dialysis Service Foundation (DSF) indicated that an average of 12,000 kidney-failure cases are diagnosed among Ghanaians every year. An adequate diagnosis of suspected CKD will play a critical role in saving many lives. This study widens the predictive factors, grouped as personal lifestyle, laboratory findings, and medical history of a patient. Widening the factors addresses the controversy of diagnosing CKD from the glomerular filtration rate alone, without considering aetiology.
Methods: The study adopted a structured, guided questionnaire to obtain the dataset from the health records of 180 patients diagnosed with CKD. The Renal Clinic of Komfo Anokye Teaching Hospital gave access to the patients' health records following clearance from the ethical committee of Kwame Nkrumah University of Science and Technology. The health records were categorized into Personal Lifestyle, Laboratory findings and Medical history. Eleven factors that influence the incidence of chronic kidney disease were identified and analyzed in this work. The developed model is based on a hybrid of Artificial Neural Network and Fuzzy logic (ANFIS) techniques.
Results: Experiments with the proposed predictive model recorded an average Root Mean Square Error of 0.85014 for training and 1.3983 for testing. The result showed a close relationship between the actual and predicted outcomes for both training and testing. The Modification of Diet in Renal Disease (MDRD) model was applied to the dataset to estimate the glomerular filtration rate and so determine the stage of chronic kidney disease. The developed model (NANFIS) performed better than the MDRD model.
Conclusions: Considering the aetiology, which widens the predictive factors, provides an improved perspective in predicting the stage of the disease. The MDRD model considers only age, race, sex and creatinine to determine the stage of chronic kidney disease. An increase in data would help to further improve the performance of the developed model.


Background
Chronic kidney disease (CKD) is a health complication faced by almost every nation, and Ghana is no exception [1]. There were approximately 35 million deaths in 2005, as stated by the World Health Organization (WHO) [2]. Over two million people worldwide receive dialysis or a kidney transplant to stay alive, yet these patients form only about 10% of the people who need these treatments to live [3]. The individual's lifestyle remains a crucial determinant for a person diagnosed with chronic kidney disease [4]. Milder CKD (stages 1-3) affects approximately 7% of the world's population, with the highest number of cases recorded in developing countries. Predicting the clinical stage of chronic kidney disease, as well as the progression of the disease, with some level of accuracy requires an elaborate consideration of the factors that may influence the incidence of the disease. The stage of CKD indicates the severity of the disease. The rate of progression and the definition of the stage are critical because they give several indications that support the required interventions and treatments.
The lifestyle decisions and practices of an individual directly influence health and wellbeing. Practices such as consuming high amounts of salt or alcohol and self-medication do not promote healthy kidney life. The human body is most active during the day, and so is its ability to carry out its metabolic activities. An individual's lifestyle burdens the kidney when practices that promote good health are neglected, that is, when an unhealthy way of life is maintained. Hence, the absence of physical activity or exercise, eating late into the night, and having bedtime snacks can cause chronic kidney disease, an outcome that would be different if the person maintained a healthy lifestyle [5] (Michishita et al., 2016). Besides lifestyle decisions and practices, many medical conditions affect the health of the kidney, e.g. diabetes, high blood pressure (hypertension), glomerulonephritis, polycystic kidney disease and kidney stones; some of these are themselves a direct product of one's lifestyle practices.

Institutional Background
The data source is the renal clinic department of the Komfo Anokye Teaching Hospital in the Ashanti Region. The hospital's geographic location makes it accessible to all the towns or areas that share boundaries with the region. Referrals are received from all the upper areas of the country, including the Northern, Upper East and Upper West regions, Bono and Ahafo (formerly Brong Ahafo), and some parts of the Volta Region [6].

Methods
Data
A retrospective review of patients' health records spanning over ten years was carried out to obtain the dataset needed to build the predictive model. The risk factors of chronic kidney disease identified from the literature were used to collect the data. These attributes were divided into three main categories: Personal Lifestyle, Laboratory findings, and Medical history.
The dataset is made up of the following attributes: Age, Gender, Family History (of CKD), Alcohol History, and rate of Salt Intake for Personal Lifestyle; Serum Creatinine, Specific Gravity, Albumin and Urea/BUN for Laboratory findings; and year of diagnosis of Hypertension and Diabetes for Medical history. These attributes are the data components required to predict chronic kidney disease. Table 1 gives details on the features or risk factors used for the development of the predictive model.
Ageing, or Age, as a risk factor influences the level of the glomerular filtration rate. An increase in Age causes a reduction in the value of the estimated glomerular filtration rate (eGFR), which increases the risk of developing CKD. The genetic predisposition that results from specific genetic variations, often inherited from a parent (Family History), also increases an individual's susceptibility to developing chronic kidney disease. Examples are the rare types of kidney disease inherited from one's parents, e.g. polycystic kidney disease (PKD) and Fabry disease [7].
Creatinine is a waste product formed by the normal breakdown of muscle cells in the body [8]. A laboratory test of serum creatinine serves to alert people who may not be aware of the danger that the condition of their kidneys poses to them.
Symptoms identified with too little kidney function include loss of appetite, vomiting, itching, weakness, and flu-like symptoms. Swelling in the legs and shortness of breath may occur if water builds up in the body [9].
Salt Intake, or the amount of salt consumed, also plays a vital role in the proper functioning of various organs in the body. Salt is a significant source of electrolytes for controlling the fluids entering and leaving the body's tissues and cells. The human body excretes unwanted fluid by filtering the blood through the kidneys. Extra fluid not needed by the body is drawn out and passed into the bladder to be removed as urine. The kidneys use osmosis to draw the extra water out of the blood. This process uses a delicate balance of sodium and potassium to pull the water across a wall of cells from the bloodstream into a collecting channel that leads to the bladder [10]. Properly managing the salt intake of patients also plays a fundamental role in maximizing the beneficial effect of angiotensin-converting enzyme (ACE) inhibitors on CKD progression [11].
The Specific Gravity measurement allows physicians to know the concentration of all particles present in the patient's urine. A high Specific Gravity may indicate the presence of glucose in the urine. The measurement can also indicate diabetes insipidus, glomerulonephritis, etc.
Albumin, a significant protein present in the blood, helps determine how well the kidney is functioning. Kidney disease is also associated with reduced urea (Urea/BUN) excretion, leading to a consequent rise in its blood concentration.

Chronic Kidney Disease (CKD)
The Glomerular Filtration Rate (GFR) is the most common criterion for measuring how well one's kidney is performing, and it is estimated using the Modification of Diet in Renal Disease (MDRD) model. MDRD is the most preferred method for estimating the GFR of CKD patients with a GFR of less than 60 mL/min/1.73 m² [12] (Matsushita et al. from Chiu et al., 2013). The severity of a patient's CKD depends on the stage of the disease (Table 2). A person with a persistently malfunctioning kidney for three months or more qualifies to be labelled as a CKD patient. Evidence relates CKD to other diseases such as cancer, cardiovascular disease and diabetes. These diseases share common modifiable lifestyle risk factors, that is, aspects of an individual's lifestyle exhibited in daily activities: alcohol consumption, Body Mass Index (BMI), cigarette smoking, diet (daily fruit and vegetable consumption), and physical inactivity [13].
The MDRD method estimates the GFR using only creatinine measurement, sex, race and Age, which limits the number of factors available to predict the stage of CKD efficiently. This challenge has been identified as a controversy, that is, diagnosing CKD without any consideration of aetiology [14]. This work increases the number of features used to determine the stage of Chronic Kidney Disease.
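To make the MDRD-based staging concrete, the following is a minimal sketch of the four-variable MDRD estimate followed by a stage lookup. The coefficient 175 assumes IDMS-standardized creatinine (the originally published coefficient is 186), the division by 88.4 converts µmol/L to mg/dL, and the stage thresholds are the commonly used GFR categories, which are assumed here to match Table 2. The function names `mdrd_egfr` and `ckd_stage` are illustrative and do not come from the study.

```python
def mdrd_egfr(scr_umol_per_l, age_years, female, black, coefficient=175.0):
    """Estimate GFR in mL/min/1.73 m^2 with the four-variable MDRD equation.

    Assumption: coefficient=175 applies to IDMS-standardized creatinine;
    pass coefficient=186 for the originally published form.
    """
    scr_mg_per_dl = scr_umol_per_l / 88.4  # convert umol/L to mg/dL
    egfr = coefficient * scr_mg_per_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def ckd_stage(egfr):
    """Map an eGFR value to a CKD stage (1 to 5) by GFR thresholds only.

    Assumption: these are the commonly used thresholds. Clinically, stages 1
    and 2 also require other evidence of kidney damage, which is ignored here.
    """
    if egfr >= 90:
        return 1
    if egfr >= 60:
        return 2
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5


# Hypothetical patient: serum creatinine 529 umol/L, 49 years old, male.
egfr = mdrd_egfr(529.0, 49, female=False, black=True)
print(f"eGFR = {egfr:.1f} mL/min/1.73 m^2, stage {ckd_stage(egfr)}")
```

For the hypothetical patient above, the estimate falls well below 15 mL/min/1.73 m², so the lookup returns stage 5. Only four inputs enter the estimate, which is the limitation the wider feature set is meant to address.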

Features/Variable Statistics
The statistical analysis of the collected data reveals that the average Age of the patients considered in this work is forty-nine years (49yrs), with an average serum creatinine of 529 µmol/L, which is very high and has a significant influence on the incidence of chronic kidney disease. The mean values for diabetes also show that patients diagnosed with a diabetic condition are most likely to have a high serum creatinine value. The standard deviation of the various attributes, especially Age, shows that, even though the minimum Age is eighteen years (18yrs), most of the patients with chronic kidney disease fall between the ages of thirty-four (34yrs) and sixty-four (64yrs). Table 6 gives the exact values of the statistical results for the collected data.

Machine Learning Techniques
The Adaptive Neuro-Fuzzy Inference System (ANFIS) is the machine learning technique used in this study. It is a hybrid of an Artificial Neural Network and a Fuzzy Inference System. The strength of fuzzy logic in reasoning under uncertainty and that of the artificial neural network in learning from a given dataset are combined to help predict an individual's risk of chronic kidney disease. The Artificial Neural Network and the Fuzzy Inference System are supervised and unsupervised learning techniques, respectively. The rules of the Fuzzy Inference System are generated using the subtractive clustering algorithm [15].
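As an illustration of how subtractive clustering can propose rule centers, the sketch below computes a density-like potential for every data point, repeatedly picks the point with the highest potential as a cluster center, and subtracts that center's influence from its neighbours. It is a simplified version: the stopping test is a single ratio against the first peak rather than the full accept/reject criteria of the original algorithm, and the parameter names and defaults (`ra`, `rb_factor`, `stop_ratio`) are assumptions, not values from this study.

```python
import numpy as np


def subtractive_clustering(X, ra=0.5, rb_factor=1.5, stop_ratio=0.15, max_centers=20):
    """Return cluster centers chosen among the rows of X (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    # Scale every feature to [0, 1] so that one radius applies to all dimensions.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Z = (X - X.min(axis=0)) / span

    alpha = 4.0 / ra ** 2                 # neighbourhood width for the potential
    beta = 4.0 / (rb_factor * ra) ** 2    # wider width for the subtraction step

    # Pairwise squared distances and the initial potential of every point.
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    potential = np.exp(-alpha * sq_dist).sum(axis=1)
    first_peak = potential.max()

    centers = []
    while len(centers) < max_centers:
        k = int(potential.argmax())
        if potential[k] < stop_ratio * first_peak:
            break  # remaining points are not dense enough to form a cluster
        centers.append(k)
        # Remove the influence of the new center from all potentials.
        potential = potential - potential[k] * np.exp(-beta * sq_dist[:, k])
    return X[np.array(centers, dtype=int)]


# Synthetic example: two well-separated groups of points.
rng = np.random.default_rng(1)
group_a = rng.normal([0.0, 0.0], 0.03, size=(40, 2))
group_b = rng.normal([1.0, 1.0], 0.03, size=(40, 2))
centers = subtractive_clustering(np.vstack([group_a, group_b]))
print(centers)
```

Each returned center would seed one fuzzy rule, so the number of rules is determined by the data rather than fixed in advance.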
The machine-learning algorithm uses two phases: the training (learning) phase and the testing/validation phase. In the training phase, the predictive or classification model is built using the training dataset. In the testing/validation phase, the performance of the built model is estimated on the testing dataset.
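The two phases can be sketched as follows, with the Root Mean Square Error computed separately on each split. The data are synthetic, the 70/30 split is an assumption (the split used in the study is not stated in this section), and an ordinary least-squares model stands in for ANFIS purely to keep the example short.

```python
import numpy as np


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]


# Synthetic stand-in data: three features and a noisy linear target.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.1, size=200)

X_train, y_train, X_test, y_test = train_test_split(X, y)

# Training phase: fit the stand-in model on the training split only.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Testing phase: evaluate the fitted model on data it has not seen.
A_test = np.column_stack([np.ones(len(X_test)), X_test])
train_rmse = rmse(y_train, A_train @ coef)
test_rmse = rmse(y_test, A_test @ coef)
print(f"training RMSE = {train_rmse:.3f}, testing RMSE = {test_rmse:.3f}")
```

A testing error close to the training error suggests that the model generalizes; a much larger testing error suggests overfitting or too little data.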
Artificial Neural Network.

The work of Warren McCulloch and Walter Pitts in 1943 laid the foundation of the Artificial Neural Network (ANN), which aimed to describe how the basic processing elements of the brain function. The ANN is made up of an input layer, hidden layer(s) and an output layer. Each layer consists of many nodes with their associated parameters. The nodes are connected through directional links, analogous to the connections between the neurons that make up the brain. An ANN can implicitly detect complex nonlinear relationships between dependent and independent variables, can detect all possible interactions between predictor variables, and can be trained with multiple training algorithms. The adaptive ability of the neural network comes from its nodes: the output of each node depends on its associated parameters, which are computed from the data set and updated according to a learning rule to minimize an error measure. The learning rule for a given ANN is usually gradient descent with the chain rule. However, because gradient descent is slow and tends to become trapped in local minima, the Hybrid Learning Rule is preferred.
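The gradient-descent learning rule with the chain rule can be illustrated on a very small network. The sketch below trains one hidden layer of eight tanh nodes on a synthetic nonlinear target; the layer size, learning rate, number of epochs and target function are all arbitrary choices made for illustration and are not taken from this study.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)  # inputs, shape (50, 1)
y = x ** 2                                     # nonlinear target

# Node parameters: input-to-hidden and hidden-to-output weights and biases.
W1 = rng.normal(0.0, 0.5, size=(1, 8))
b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=(8, 1))
b2 = np.zeros(1)

learning_rate = 0.1
losses = []
for epoch in range(500):
    # Forward pass through the hidden layer and the output layer.
    h = np.tanh(x @ W1 + b1)
    y_hat = h @ W2 + b2
    error = y_hat - y
    losses.append(float(np.mean(error ** 2)))  # error measure (MSE)

    # Backward pass: apply the chain rule layer by layer.
    grad_y_hat = 2.0 * error / len(x)
    grad_W2 = h.T @ grad_y_hat
    grad_b2 = grad_y_hat.sum(axis=0)
    grad_h = grad_y_hat @ W2.T
    grad_z = grad_h * (1.0 - h ** 2)           # derivative of tanh
    grad_W1 = x.T @ grad_z
    grad_b1 = grad_z.sum(axis=0)

    # Gradient-descent update of every parameter.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

print(f"MSE before training: {losses[0]:.4f}, after training: {losses[-1]:.4f}")
```

Every parameter is updated by small steps against its gradient, which is why pure gradient descent can be slow; the hybrid rule replaces part of this search with a direct least-squares solution.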

Fuzzy Inference System.
Fuzzy inference is the process of formulating the mapping from a given input to an output using fuzzy logic. Fuzzy logic is a technique for reasoning about the fuzziness, or vagueness, with which a phenomenon or quantity is described. Rather than relying on precise quantitative analysis, the fuzzy inference system employs fuzzy if-then rules to model the qualitative aspects of human knowledge and reasoning processes. This provides the mathematical tools to build a fuzzy model. Different inference systems are distinguished by the type of defuzzification method used: Type 1 uses the weighted average; Type 2 uses the centroid of area, mean of maxima, bisector of area, or maximum criterion. Type 3 is the Takagi-Sugeno-Kang inference system, in which the output of each rule is a linear combination of the input variables plus a constant term. The final output of the Takagi-Sugeno-Kang system is the weighted average of the output of each rule.

[Figure: fuzzy inference system, with components Crisp Input, Fuzzification, Fuzzy Inference Engine, Fuzzy Rule Database, and Defuzzification]
The Adaptive Neuro-Fuzzy Inference System is based on the Takagi-Sugeno-Kang (TSK) inference system, which, besides the Mamdani fuzzy system, is one of the most widely used fuzzy models. The TSK-style fuzzy system produces a crisp output directly, because the rule consequents are polynomials.
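A TSK-style fuzzy rule base comprising $n$ rules with $m$ antecedents each takes the form:
$$R_i:\ \text{IF } x_1 \text{ is } A_{i1} \text{ AND } \dots \text{ AND } x_m \text{ is } A_{im} \text{ THEN } z_i = \beta_{i0} + \beta_{i1} x_1 + \dots + \beta_{im} x_m$$
where $n$ and $\beta_{i0}, \beta_{ij}$ are the number of rules and the constant parameters of the linear function of the rule consequents, respectively, $i \in \{1, 2, \dots, n\}$, $j \in \{1, 2, \dots, m\}$.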

Adaptive Neuro-Fuzzy Inference System.
The ANFIS technique as applied in this work is identified by the style of the rule and the consequent, which is based on the TSK fuzzy style (type 3 fuzzy inference system). Figure 2 shows the generalized ANFIS architecture.
Assume a fuzzy inference system with two inputs, x and y, one output, z, and a rule base containing two rules.

LAYER 1
Every node $i$ in this layer is a square node with a node function
$$O^1_i = \mu_{A_i}(x)$$
where $x$ is the input to node $i$ and $A_i$ is the linguistic label/value. In other words, $O^1_i$ is the membership function of $A_i$, and it specifies the degree to which the given $x$ satisfies the quantifier $A_i$. The bell-shaped function is chosen for $\mu_{A_i}(x)$:
$$\mu_{A_i}(x) = \frac{1}{1 + \left[\left(\frac{x - c_i}{a_i}\right)^2\right]^{b_i}}$$
or
$$\mu_{A_i}(x) = \exp\left\{-\left(\frac{x - c_i}{a_i}\right)^2\right\}$$
where $\{a_i, b_i, c_i\}$ is the parameter set. Changes in the parameters change the shape of the bell-shaped function on the linguistic label $A_i$. Other membership functions, such as trapezoidal or triangular-shaped functions, can also be used. Parameters in this layer are referred to as premise parameters.

LAYER 2
Every node in this layer is a circle node labelled Π, which multiplies the incoming signals and sends the product out as an output:
$$w_i = \mu_{A_i}(x) \times \mu_{B_i}(y), \quad i = 1, 2$$
The output of every node represents the firing strength of a rule. Any T-norm operator that performs a generalized AND can be used as the node function in this layer.

LAYER 3
Every node in this layer is a circle node labelled N. The $i$-th node calculates the ratio of the $i$-th rule's firing strength to the sum of all rules' firing strengths:
$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2$$

LAYER 4
Every node $i$ in this layer is a square node with a node function
$$O^4_i = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)$$
where $\bar{w}_i$ is the output of layer three and $\{p_i, q_i, r_i\}$ is the parameter set. Parameters in this layer are referred to as consequent parameters.

LAYER 5
The single node in this layer is a circle node labelled ∑ that computes the overall output as the summation of all incoming signals.
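The five layers can be traced with a short forward-pass sketch for the two-input, two-rule system assumed above. It uses the generalized bell membership function for Layer 1 and first-order TSK consequents of the form f = p·x + q·y + r for Layer 4. All parameter values are made up for illustration; they are not fitted values from this study.

```python
import numpy as np


def bell(x, a, b, c):
    """Generalized bell membership function with premise parameters a, b, c."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2.0 * b))


def anfis_forward(x, y, premise_x, premise_y, consequent):
    """Forward pass of a two-input ANFIS with one rule per (A_i, B_i) pair."""
    # Layer 1: membership degrees of x in each A_i and of y in each B_i.
    mu_a = np.array([bell(x, a, b, c) for a, b, c in premise_x])
    mu_b = np.array([bell(y, a, b, c) for a, b, c in premise_y])
    # Layer 2: firing strength of each rule (product T-norm).
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths.
    w_bar = w / w.sum()
    # Layer 4: each rule's linear consequent, weighted by its normalized strength.
    f = np.array([p * x + q * y + r for p, q, r in consequent])
    weighted = w_bar * f
    # Layer 5: overall output as the sum of all incoming signals.
    return float(weighted.sum()), w_bar


# Illustrative (made-up) parameters for two rules.
premise_x = [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)]   # (a, b, c) for A_1, A_2
premise_y = [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)]   # (a, b, c) for B_1, B_2
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]  # (p, q, r) for rules 1, 2

z, w_bar = anfis_forward(1.0, 1.0, premise_x, premise_y, consequent)
print(z, w_bar)  # z is 2.5: both rules fire equally, so z = 0.5 * 2 + 0.5 * 3
```

At the input (1, 1) every membership degree equals 0.5, so both rules receive a normalized strength of 0.5 and the output is the plain average of the two rule outputs, 2 and 3.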
The learning algorithm implemented by the ANFIS technique is the Hybrid Learning Algorithm. In the forward pass of the Hybrid Learning Algorithm, functional signals go forward up to Layer 4, and the consequent parameters are identified by the Least Squares Estimate (LSE). In the backward pass, the error rates propagate backwards, and gradient descent updates the premise parameters. There are four methods of updating the parameters in the ANFIS technique: using gradient descent only; using gradient descent and one pass of LSE; using both gradient descent and LSE; and using sequential LSE only. The classification of a given data point by the ANFIS technique is determined by the design of the fuzzy rules. The TSK fuzzy inference system generates the rules for a fuzzy system automatically from the available dataset. TSK attempts to obtain fuzzy rules by categorizing and organizing the given dataset into clusters using a data clustering algorithm [16]. The algorithm proposed in [17] outlines the step-by-step sequence followed by the TSK for the generation of the rules of the fuzzy system.
p is the dimension of the input vector.
2. Generate the initial rule base, which consists of just one rule, by the Least Squares method. The fuzzy antecedents in the initial rule are determined for each input dimension $j = 1, \dots, p$. The Least Squares technique is used to estimate the consequent parameters.
3. Construct a new rule and add it to the fuzzy rule base. The input vector with the worst error is identified and taken as the candidate center for the new rule.
Each fuzzy antecedent in a rule is characterized by four parameters for each input dimension $j = 1, \dots, p$, and the center of the new rule is the candidate center identified in this step. The consequent parameters are estimated using Locally Weighted Regression (LWR), a way to construct a regression surface through a multivariate smoothing procedure. The parameters of the new rule are computed by minimizing a weighted objective function in which the weighting factor is taken to be the same value as the degree of membership. The weighting factor ensures that the consequent parameters are influenced only by the data points within the fuzzy set that defines the region of validity of the new rule. The fuzzy rule therefore acts like an independent model that is related only to a subset of the training data.
4. The Hybrid learning algorithm combines a recursive singular value decomposition-based least-squares estimator and gradient descent to tune or refine the parameters of the rule base.
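5. The last step is to measure the performance of the generated model using the Mean Square Error (MSE) evaluation criterion, given as:
$$MSE = \frac{1}{N} \sum_{k=1}^{N} (y_k - \hat{y}_k)^2$$
where $y_k$ and $\hat{y}_k$ are the actual and predicted outputs for the $k$-th of $N$ training data points. If MSE ≥ ε and the number of rules M is below its maximum, the algorithm returns to step 3; otherwise, the given termination conditions are satisfied, and the fuzzy model is constructed.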
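The locally weighted estimation of a rule's consequent parameters and the MSE check from the last step can be sketched as follows. The weights are membership degrees from an assumed Gaussian-shaped fuzzy set, the data are synthetic, and the threshold `epsilon` is an arbitrary illustrative value; none of these come from [17] or from this study.

```python
import numpy as np


def weighted_least_squares(X, y, weights):
    """Fit y ~ theta[0] + X @ theta[1:] by minimizing sum_k w_k * (y_k - y_hat_k)^2."""
    A = np.column_stack([np.ones(len(X)), X])
    root_w = np.sqrt(weights)
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one.
    theta, *_ = np.linalg.lstsq(A * root_w[:, None], y * root_w, rcond=None)
    return theta


def mse(actual, predicted):
    """Mean Square Error between actual and predicted outputs."""
    return float(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))


# Synthetic data with two regimes: y = 1 + 2x below 0.5 and y = 5 - 3x above.
x = np.linspace(0.0, 1.0, 101)
y = np.where(x < 0.5, 1.0 + 2.0 * x, 5.0 - 3.0 * x)

# Membership degrees of an assumed fuzzy set centered at 0.2 act as weights,
# so only points near the rule center influence its consequent parameters.
membership = np.exp(-((x - 0.2) / 0.08) ** 2)
theta = weighted_least_squares(x.reshape(-1, 1), y, membership)
print("local consequent parameters:", theta)

# Termination check: one local rule cannot describe both regimes, so the
# global MSE stays above the threshold and another rule would be added.
epsilon = 0.01
global_mse = mse(y, theta[0] + theta[1] * x)
needs_new_rule = global_mse >= epsilon
print(f"MSE = {global_mse:.3f}, add another rule: {needs_new_rule}")
```

The fitted parameters recover the left-hand regime (intercept near 1, slope near 2) because points in the right-hand regime receive almost no weight, which is the sense in which each rule behaves like an independent local model.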

Results
This section presents the data, the evaluation matrix, and the performance of the predictive model generated in this research work. The dataset consisted of the records of 180 patients, which generated 294 data points.

Evaluation Matrix
The key performance indicators (KPIs) for the system's predictive model are the Root Mean Square Error (RMSE) and the Confusion Matrix.
The NANFIS model identified the most influential features and the relationship between those features and the stage of chronic kidney disease. As indicated in Fig. 3 below, the features with the most influence include Age, Serum Creatinine, Gender, and the amount of Salt-Intake.
In conjunction with Serum Creatinine, the Age of a person influences all stages of chronic kidney disease, as shown in Figure 3(a). The graph for the Age of the patient indicates that the younger the patient, the lower the stage of chronic kidney disease. Older patients with the disease usually have cases at stage three and above. The Gender of the patient, as shown in Figure 3(b), indicates that most men are diagnosed with chronic kidney disease at a later stage than their female counterparts. This calls for men to have periodic medical screening to know how their kidneys are performing and to get help as early as possible, before their kidneys become worse.
The Salt-Intake also influences the incidence of chronic kidney disease across all the stages. Consuming too much salt poses a significant challenge to the filtering ability of the kidney. Hypertension, on the other hand, accelerates the rate at which the kidney is damaged, as shown in Figure 3(c). The CKD stages associated with hypertension begin at stage 3 and extend beyond stage 4. Thus, the most likely stage of prediction based on these variables in conjunction with Serum Creatinine is across all stages of chronic kidney disease.