Heart Disease Detection Using Machine Learning

This paper analyzes the detection of heart disease using machine learning algorithms and Python programming. Over the past decades, heart disease has become a common and dangerous disease, associated with the accumulation of fat and with high pressure in the human body. Using the different parameters in a dataset, we can predict cardiac disease. We examined a dataset consisting of 12 parameters and 70000 individual data values [5] to analyze the condition of patients. The main objective of the paper is to achieve better accuracy in detecting heart disease, where the target output indicates whether or not a person has heart disease.


Introduction
Python is a powerful programming language with numerous libraries, and it is used in this project together with machine learning models. Machine learning is a subset of artificial intelligence that uses algorithms ranging from simple classifiers to deep neural networks. Cardiovascular disease is widespread across all regions. It may be caused by smoking, high blood pressure, diabetes, excess weight, hypertension, cholesterol and similar factors, which build up because of fatty foods, overeating or physical inactivity. It may present as various heart problems such as coronary artery disease, stroke, heart failure and many more.
Chest pain (cp), resting blood pressure, cholesterol, resting electrocardiographic results, fasting blood sugar (fbs), maximum heart rate achieved, exercise-induced angina, ST depression induced by exercise relative to rest, slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, etc. are major indicators of heart problems. Our dataset, however, contains attributes of each individual such as height, weight, systolic blood pressure, diastolic blood pressure, cholesterol, glucose, smoking, alcohol intake and physical activity. Python libraries are the prerequisites for making predictions, and scikit-learn (sklearn) is the main library used for machine learning predictions. With sklearn, we can preprocess the data by separating the attributes from the labels, splitting the data into training and test sets, and scaling the values with StandardScaler, which standardizes each attribute to zero mean and unit variance rather than mapping it to the interval from 0 to 1. Seaborn is another library used in our prediction, to visualize the correlation between every pair of attributes. Finally, accuracy is assessed from the confusion matrix, computed with sklearn's confusion_matrix function.
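The preprocessing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the data are synthetic, and the column names (`ap_hi`, `ap_lo`, `target` and so on) are hypothetical placeholders for the attributes listed above. Note that StandardScaler standardizes each column to zero mean and unit variance; sklearn's MinMaxScaler would be the choice for a 0 to 1 range.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "height": rng.normal(165, 8, n),
    "weight": rng.normal(74, 14, n),
    "ap_hi": rng.normal(128, 17, n),        # systolic blood pressure
    "ap_lo": rng.normal(82, 10, n),         # diastolic blood pressure
    "cholesterol": rng.integers(1, 4, n),   # ordinal levels 1..3
    "glucose": rng.integers(1, 4, n),       # ordinal levels 1..3
    "smoke": rng.integers(0, 2, n),
    "alcohol": rng.integers(0, 2, n),
    "active": rng.integers(0, 2, n),
})
# Artificial binary label, loosely tied to systolic pressure for illustration only.
df["target"] = (df["ap_hi"] + rng.normal(0, 15, n) > 130).astype(int)

# 1. Separate attributes (X) from labels (y).
X = df.drop(columns="target")
y = df["target"]

# 2. Split into training and test sets (80% / 20%).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 3. Standardize: fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the training set alone keeps information from the test set out of the model, which is why `fit_transform` is used only once.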
The work in [3] compares KNN, SVM, random forest and decision tree classifiers for a Heart Disease Prediction System (HDPS). The decision tree method achieved the best accuracy, 98.83%, ahead of the other methods (O. E. Taylor) [3].
Animesh Hazra surveys data mining techniques for predicting heart disease. In that paper, C4.5, k-means, decision tree, SVM, naïve Bayes and other machine learning algorithms are compared to find the best accuracy for heart disease prediction [1].
Praveen Kumar Reddy (2019), on the other hand, tries to reduce the occurrence of heart disease using the decision tree algorithm. In that work, the support vector machine algorithm classifies the data values using a hyperplane, and the decision tree is built with the Gini index method, in which the attributes with the highest gain give a better representation of the decision tree [2].
iii. Applying Algorithms: Comparing four machine learning algorithms (SVM, decision tree, random forest classifier and K-nearest neighbor) to obtain the best accuracy and to identify which parameter contributes most to the disease.

Methodology And Analysis
For each algorithm, pseudocode is helpful for developing an implementation in any programming language. In Python, each of these algorithms can be implemented in short, simple code, which makes it easy to compute the prediction accuracy.
The algorithms used in this project help to predict heart disease accurately and to detect the factors that cause the disease. The following algorithms were built in this project.
i. K-Nearest Neighbor algorithm: KNN is a supervised classifier that uses the labeled observations nearest to a test sample to predict its classification label. KNN is one of the classification techniques that can be used whenever there is a classification problem. It makes a few assumptions: the dataset has little noise, is labeled, and contains relevant features.
Applying KNN to large datasets takes a long time to process. The accuracy gained with this algorithm is 63.4%.
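To make the mechanism concrete, the following is a minimal from-scratch sketch of a KNN prediction. It is an illustration on a tiny hand-made dataset, not the implementation used in this work; the function name `knn_predict` and the toy points are hypothetical. It also shows why KNN is slow on large datasets: every query computes a distance to every training row.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training row (cost grows with dataset size).
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training rows.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups with labels 0 and 1.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # 0
print(knn_predict(X_train, y_train, np.array([3.0, 3.0])))  # 1
```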
ii. Random Forest Classifier: The random forest classifier is a powerful tool in the machine learning library. With this classifier, we can obtain higher accuracy with a short training time. First, we build a model and split the variables into a training set and a test set. After splitting the data, we fit the model on the training set and predict the response for the test set. With the random forest classifier, the accuracy obtained is 71.4% (approximately 71%).
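The random forest workflow described above (split, fit, predict) can be sketched as follows. This is an illustrative sketch on synthetic data generated with sklearn's `make_classification`, so the printed accuracy has no relation to the figure reported here; the hyperparameters (`n_estimators=100`, `random_state=0`) are assumptions. The feature importances are included because they give one way to see which parameter contributes most to the prediction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data with 11 features (a stand-in, not the real dataset).
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5, random_state=0)

# Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the forest on the training set and predict the response for the test set.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.3f}")

# Rank features from most to least important according to the fitted forest.
ranking = np.argsort(forest.feature_importances_)[::-1]
print("features ranked by importance:", ranking.tolist())
```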
iii. Decision tree classifier: In this algorithm, preprocessing is done first by splitting the data into training and test data. Feature scaling is applied to normalize the values before prediction. A decision tree classifier is imported and fitted to the training sets of dependent and independent variables, using the Gini index criterion, and is then used to predict the response for the test set. The accuracy gained with this algorithm is 68.4%.
iv. Support Vector Machine (SVM): SVM is another classification algorithm in machine learning, and one with which better accuracy can be obtained. Compared with the other algorithms, it predicts more accurately, as expected.
In our prediction, the accuracy is 72.5% using the linear SVM kernel.
In our prediction, the highest accuracy is 86.2%, using the Gaussian SVM kernel.
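The decision tree and the two SVM variants can be compared side by side with a sketch like the one below. It assumes synthetic data from `make_classification` and default hyperparameters, so the accuracies it prints will differ from those reported above. In sklearn, the Gaussian kernel is selected with `kernel="rbf"`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (a stand-in, not the real dataset).
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "decision tree (gini)": DecisionTreeClassifier(criterion="gini", random_state=0),
    "SVM (linear kernel)": SVC(kernel="linear"),
    "SVM (Gaussian/RBF kernel)": SVC(kernel="rbf"),
}

results = {}
for name, model in models.items():
    # Feature scaling is fitted on the training data inside the pipeline.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, pipeline.predict(X_test))

for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline applies identical preprocessing to every model, which keeps the comparison fair.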

A. Results and Visualization:
Our main goal is to determine how accurately future cases of the disease can be predicted, and which algorithm gives the highest accuracy for the target output, which indicates whether or not a person has heart disease.
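For a binary target such as this one, accuracy can be read directly from the confusion matrix mentioned in the introduction. The sketch below uses a small hand-made set of labels and predictions (hypothetical values, not results from this work) to show the calculation.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions (1 = heart disease, 0 = no heart disease).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# For labels {0, 1}, sklearn returns [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Accuracy is the fraction of correct predictions: (TP + TN) / total.
accuracy = (tp + tn) / cm.sum()

print(cm.tolist())   # [[3, 1], [1, 3]]
print(accuracy)      # 0.75
```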
The imported dataset is processed, and the correlation of each attribute with every other attribute is computed and visualized with a heat map, which shows the highest correlation between cholesterol and glucose.
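A correlation heat map of this kind can be produced with pandas and seaborn as sketched below. The data here are synthetic, and the glucose column is deliberately constructed to correlate with cholesterol so that the sketch mirrors the pattern described above; it does not reproduce the actual result. The column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; glucose is built to correlate with cholesterol (an assumption for illustration).
rng = np.random.default_rng(0)
n = 500
cholesterol = rng.normal(size=n)
df = pd.DataFrame({
    "cholesterol": cholesterol,
    "glucose": 0.8 * cholesterol + 0.6 * rng.normal(size=n),
    "weight": rng.normal(size=n),
    "ap_hi": rng.normal(size=n),
})

# Pairwise Pearson correlation between all attributes.
corr = df.corr()

# Draw the heat map with the correlation value written in each cell.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Attribute correlation")

# Find the most strongly correlated pair, ignoring the diagonal.
abs_corr = corr.abs().to_numpy().copy()
np.fill_diagonal(abs_corr, 0.0)
i, j = np.unravel_index(abs_corr.argmax(), abs_corr.shape)
strongest_pair = (corr.index[i], corr.columns[j])
print(strongest_pair)
```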
For the KNN classifier above, with scores computed for k in the range 1 to 11, the accuracy rate is 69.8%. For any value of k (the assumed number of neighbors), the prediction rate stays at about 69%-70% because a large dataset is used.
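Scoring KNN over k = 1 to 11 can be sketched as follows. This assumes synthetic data from `make_classification` and a single train/test split, so the scores it prints are illustrative only and will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (a stand-in, not the real dataset).
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Test accuracy for each number of neighbors k in 1..11.
scores = {}
for k in range(1, 12):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn.fit(X_train, y_train)
    scores[k] = knn.score(X_test, y_test)

best_k = max(scores, key=scores.get)
for k, acc in scores.items():
    print(f"k={k:2d}: accuracy={acc:.3f}")
print("best k:", best_k)
```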