The kidneys are among the most important organs of the body: they filter waste and excess water from the blood to produce urine. Chronic Kidney Disease (CKD), also commonly known as chronic renal disease or chronic kidney failure, is a life-threatening condition in which the kidneys fail to perform their routine function. It is marked by a continuous decrease in the Glomerular Filtration Rate (GFR) over a period of three months or more, and it is a global health problem. Common symptoms include hypertension, irregular foamy urine, vomiting, shortness of breath, itching and cramps [1], whereas high blood pressure and diabetes are the main causes of the disorder.
CKD is often diagnosed at a late stage, when dialysis or a kidney transplant is the only option left to save the patient’s life, whereas an early diagnosis can help prevent kidney failure [2]. The best way to measure kidney function or to predict the stage of kidney disease is to monitor the GFR on a regular basis [3]. GFR is estimated from a person's age, gender, race and blood creatinine value. Based on the GFR value, CKD may be categorized into six stages, as shown in Table 1.
Table 1
CKD Stages According to GFR Measurement Values
Stage | GFR | Description |
1 | 90–100 mL/min | Normal kidney function or structural abnormalities |
2 | 60–89 mL/min | Mildly reduced kidney function |
3A | 45–59 mL/min | Moderately reduced kidney function |
3B | 30–44 mL/min | Moderately reduced kidney function |
4 | 15–29 mL/min | Severely reduced kidney function |
5 | < 15 mL/min or dialysis | End stage kidney failure |
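As an illustration of how GFR can be estimated from age, gender, race and blood creatinine and then mapped to the stages of Table 1, the following minimal sketch may be helpful. It assumes the 4-variable MDRD equation (IDMS-traceable form), which is one common estimator that uses exactly these inputs; the text above does not state which equation is intended. The function names `mdrd_egfr` and `ckd_stage` are hypothetical, and the stage boundaries are treated as continuous thresholds (for example, any value of 90 or above is assigned to stage 1), which is also an assumption.

```python
def mdrd_egfr(serum_creatinine_mg_dl, age_years, female, black):
    """Estimate GFR (mL/min/1.73 m^2) with the 4-variable MDRD equation.

    Assumption: the source does not name an equation; MDRD is used here
    only because it takes the same four inputs (age, gender, race, creatinine).
    """
    if serum_creatinine_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 175.0 * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742   # MDRD correction factor for female patients
    if black:
        egfr *= 1.212   # MDRD correction factor for Black patients
    return egfr


def ckd_stage(gfr):
    """Map a GFR value to the stage labels used in Table 1.

    Assumption: boundaries are applied as continuous thresholds, so that
    non-integer values such as 59.5 still fall into a stage.
    """
    if gfr < 0:
        raise ValueError("GFR cannot be negative")
    if gfr >= 90:
        return "1"
    if gfr >= 60:
        return "2"
    if gfr >= 45:
        return "3A"
    if gfr >= 30:
        return "3B"
    if gfr >= 15:
        return "4"
    return "5"


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    gfr = mdrd_egfr(1.0, 50, female=False, black=False)
    print(round(gfr, 1), ckd_stage(gfr))
```

Note that the MDRD estimate is normalized to body surface area (mL/min/1.73 m^2), whereas Table 1 lists mL/min; the sketch ignores this difference. Stage 1 additionally requires evidence of structural abnormality, and stage 5 may be assigned on the basis of dialysis, neither of which can be derived from the GFR value alone.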
The symptoms of CKD are not disease-specific. They develop gradually, and some patients have no symptoms at all. Hence, it is very difficult to detect the disease at an early stage.
Machine Learning (ML) has recently come to play a significant role in the diagnosis of diseases: a model is trained on the records of existing patients and then used to predict outcomes for new patients [3]. ML is a branch of Artificial Intelligence in which a machine learns automatically from data, so that its predictions improve with training experience. Supervised learning is a category of ML that may be used for regression or classification on a dataset. ML is used effectively in many domains, especially in the biomedical field, for the detection and classification of several diseases. Different ML algorithms may be used to predict diseases, each having its own strengths and weaknesses. Among these, the decision tree has been reported to classify kidney-related diseases with higher accuracy [3]. It therefore seems well suited to building a prediction system that diagnoses kidney disease at an early stage.
CKD has been recognized as a leading public health issue. Millions of people die each year due to inadequate provision of healthcare, lack of health education [25] and the high cost of CKD treatment. Globally, an estimated 13.4% of the population is affected by CKD [24]. Many studies have used different classification algorithms to predict the stages of CKD and have reported the results of their proposed models. S. Ramya et al. [7] applied Random Forest, Radial Basis Function and Back Propagation Neural Network to the classification of CKD. Their comparative study revealed that Radial Basis Function provided the best accuracy, at 85.3%. Jing Xiao [8] established nine models and compared their performance in predicting CKD stages according to severity. The predictive models were ridge regression, lasso regression, logistic regression, Elastic Net, XGBoost, neural network, k-nearest neighbor, random forest and support vector machine. The experiments in that study showed that the Elastic Net model produced the highest sensitivity, 0.85, while logistic regression provided the best overall results, with sensitivity, specificity and Area Under the Curve (AUC) of 0.83, 0.82 and 0.873, respectively. El-Houssainy et al. [12] applied Probabilistic Neural Networks (PNN), Support Vector Machine (SVM) and Multilayer Perceptron (MLP) to a dataset to predict the severity of CKD. PNN achieved the highest classification accuracy, 96.7%, with an execution time of 12 seconds, whereas MLP was the most time-efficient, with a minimum execution time of 3 seconds.
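The studies above are compared on accuracy, sensitivity and specificity. As a reminder of how these measures relate to a binary confusion matrix, here is a minimal sketch; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from any of the cited studies. AUC is omitted because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity from paired labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # diseased and flagged as diseased
            else:
                fp += 1   # healthy but flagged as diseased
        else:
            if truth == positive:
                fn += 1   # diseased but missed
            else:
                tn += 1   # healthy and correctly cleared
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else nan,  # true negative rate
    }


if __name__ == "__main__":
    # Made-up labels for illustration: 1 = CKD, 0 = no CKD.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(binary_metrics(truth, preds))
```

For multi-stage prediction, the same quantities are usually computed per stage in a one-versus-rest fashion and then averaged; which averaging the cited studies used is not stated here.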
However, to the best of our knowledge, no work has been conducted on detecting the stages of CKD using the age, sex, race and serum creatinine attributes. In this study, we focus on using two machine learning algorithms, i.e., J48 and Random Forest, to predict the stages of CKD. Our study yields more accurate results than most existing studies: we achieved 85.5% accuracy using the J48 algorithm within 0.03 seconds and 78.25% accuracy using the Random Forest algorithm within 0.28 seconds.
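J48 is Weka's implementation of the C4.5 decision tree, which selects splits by gain ratio. To give a sense of the core step, the sketch below scores candidate thresholds on a single numeric attribute. It is a simplified illustration and not the implementation used in this study: pruning, multi-attribute selection and C4.5's additional gain constraints are left out, the function names are hypothetical, and the creatinine values and "early"/"late" labels are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold vs. value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * log2(w_left) + w_right * log2(w_right))
    return gain / split_info


def best_threshold(values, labels):
    """Return the observed value with the highest gain ratio, or None."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split the data
    if not candidates:
        return None
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


if __name__ == "__main__":
    # Toy serum creatinine values (mg/dL) with made-up stage groups.
    creatinine = [0.8, 0.9, 1.0, 1.1, 2.5, 3.0, 4.2, 5.1]
    group = ["early"] * 4 + ["late"] * 4
    t = best_threshold(creatinine, group)
    print(t, gain_ratio(creatinine, group, t))
```

A full tree applies this selection recursively to each resulting subset. Random Forest, by contrast, trains many such trees on bootstrap samples with randomly restricted attribute choices and combines their votes.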