A Machine Learning Technique to Analyze Depressive Disorders

Depression is a common mental health problem and a leading cause of disability worldwide. It is also a leading cause of morbidity and death. Over the last 50–60 years, a large number of studies have been published on various aspects of depression, including its impact. This research examines how depression affects the life of an individual; its main purpose is to determine whether a person is suffering from depression or not. The depression dataset was taken from the Kaggle website. Supervised machine learning classifiers were used to find the highest accuracy on the dataset: XGBoost Tree, Random Trees, Neural Network, SVM, Random Forest, C5.0, and Bayes Net. The results show that the C5.0 classifier gives the highest accuracy, 83.94%, and that for each classifier the result is derived without pre-processing.


Introduction
Depression has become a common disease. It is seen especially in young people, for several reasons. Sufferers feel moody and weak, lose energy, cannot sleep properly, and feel disturbed by society. They are unable to handle their responsibilities. Many people suffer from it, whether because of responsibilities, neglect in life, or many other reasons.
A mood disorder can also be a symptom of depression: people do not feel fresh and become moody. Antidepressant medicines and psychotherapies are effective treatments for depression. If the problem continues for a long time, it badly affects relationships and also leads to mental weakness. That is why it is recommended to treat mental disorders as soon as possible.
Various machine learning algorithms are used to predict whether a person is suffering from depression or not. Our main aim is to determine the accuracy of various machine learning classifier algorithms and to find the algorithm that is best fitted to our dataset. We selected the following classifiers for finding the accuracy on the test data: XGBoost Tree, Random Trees, Neural Network, SVM, Random Forest, C5.0, and Bayes Net.

Issue Statement And Background Knowledge
One study presented an algorithm based on facial and verbal analysis techniques with improved classification of the data; the system recorded an average detection rate of 82.2% in males and 70.5% in females [1]. Stolar et al. derived an optimized spectral roll-off feature set from acoustic spectral features. The feature set that included the best individual spectral features gave an average classification accuracy of 71.4% in males and 70.6% in females [2].
Soundariya et al. applied four common classification models, namely Bayes Network, C4.5 Decision Tree, Support Vector Machine (SVM), and Artificial Neural Network (ANN), to identify elderly patients suffering from early symptoms of depression, and found that the ANN gave the best results [3].
To detect signs of depression, Tsugawa et al. examined the activities of users on social media. Through experiments, they showed that features obtained from user activities helped to predict depression in users with 69% accuracy [4].
Haque et al. explored 3D facial features and spoken language to gauge the severity of depression. Their research evaluated an embedded Convolutional Neural Network (CNN) model, which showed a sensitivity of 83.3% and a specificity of 82.6% [5].
Aldarwish and Ahmed applied Naive Bayes and SVM models to pre-processed posts from social networking sites (SNS). To classify SNS users, they developed a web application that could be used by depressed patients and psychiatrists. The accuracy of this model could be increased in future revisions by training and formulating better models [6].
De Choudhury et al. estimated the accuracy that could be achieved by using the Twitter activity of depressed users. They obtained the training data for machine learning through crowdsourcing. Using SVM on the users' recorded Twitter activity, they predicted the risk of depression among them. Experimental results showed an accuracy of approximately 70% [7].

Random Trees
It is an algorithm that is useful for both regression and classification. It creates multiple decision trees, gets a prediction from each of them, and finally selects the best solution. It is less accurate than the XGBoost Tree. It is built using the random subspace method, in which multiple deep trees are trained on different parts of the same dataset to achieve lower variance.
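The random subspace idea can be illustrated with a minimal sketch. It is not the IBM SPSS Modeler implementation: single-split decision stumps stand in for deep trees, and the function names and toy data are assumptions made for illustration only.

```python
import random
from collections import Counter

def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) among `features` with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            # The stump predicts class 1 when the feature value exceeds the threshold.
            errors = sum((r[f] > t) != bool(y) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def fit_random_subspace(rows, labels, n_models=5, n_features=2, seed=0):
    """Train each stump on its own random subset of the feature columns."""
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    return [fit_stump(rows, labels, rng.sample(all_features, n_features))
            for _ in range(n_models)]

def predict(models, row):
    """Combine the individual predictions by majority vote."""
    votes = Counter(int(row[f] > t) for f, t in models)
    return votes.most_common(1)[0][0]

# Invented toy data: two informative columns and one constant column.
rows = [[1, 1, 5], [2, 1, 5], [8, 9, 5], [9, 8, 5]]
labels = [0, 0, 1, 1]
models = fit_random_subspace(rows, labels)
print(predict(models, [9, 9, 5]), predict(models, [1, 1, 5]))
```

Because every model sees a different subset of features, their errors are less correlated, which is what lowers the variance of the combined vote.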

SVM
The SVM algorithm creates a decision boundary that segregates n-dimensional space into classes so that a new data point can be put in the correct category. The best decision boundary is called a hyperplane. The algorithm also chooses the extreme points/vectors that define the hyperplane; these are called support vectors, which is why the algorithm is called Support Vector Machine.
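As a rough sketch of how a hyperplane separates classes, the example below classifies points by the sign of w·x + b. The weights `w` and offset `b` are hand-picked for illustration, not learned from data, and the helper names are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign +1 on one side of the hyperplane and -1 on the other."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def support_vectors(w, b, points, tol=1e-9):
    """Points lying exactly on the margin, where |w.x + b| equals 1."""
    return [p for p in points if abs(abs(decision_value(w, b, p)) - 1.0) <= tol]

# Hand-picked hyperplane x1 = 3 and invented points.
w, b = (1.0, 0.0), -3.0
points = [(2, 0), (4, 1), (0, 0), (6, 2)]
print([classify(w, b, p) for p in points])
print(support_vectors(w, b, points))
```

Only the points on the margin determine where the boundary sits; moving the other points slightly would not change it.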

C5.0
It is an algorithm used to create a decision tree, based on Quinlan's earlier ID3 algorithm. It is comparatively easy to understand and deploy.
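ID3 chooses each split by information gain. The sketch below computes that quantity on invented data; it assumes C5.0 follows the same principle, since the exact criterion is not given in the text, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on the attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
print(information_gain(["a", "a", "b", "b"], labels))  # attribute separates the classes
print(information_gain(["a", "b", "a", "b"], labels))  # attribute carries no information
```

A tree builder would evaluate this gain for every candidate attribute and split on the one with the largest value.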

Random Forest
This algorithm generates multiple decision trees, each of which gives a class prediction. All the trees in the forest vote, and the class chosen by the majority of the trees becomes the final result. The method used by this algorithm is "bagging".
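The two ingredients named here, bagging and majority voting, can be sketched as follows. The helper names and sample data are assumptions for illustration; training the individual trees is left out.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Bagging step: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the class predicted by most trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = ["r1", "r2", "r3", "r4", "r5"]
sample = bootstrap_sample(rows, rng)  # some rows repeat, others are left out
print(sample)
print(majority_vote(["depressed", "not depressed", "depressed"]))
```

Each tree in the forest would be trained on its own bootstrap sample, and the vote is taken over the trees' predictions for a new record.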

Bayes Net
It is a probabilistic graphical model, also known as a belief network or Bayesian network classifier, and recognized by many other names. It is based on Bayes' theorem. In its simplest (naive Bayes) form, it assumes that the presence or absence of a feature is unrelated to the presence or absence of any other feature, given the class.
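A minimal sketch of Bayes' theorem under the independence assumption is shown below. All probabilities are invented for illustration and are not taken from the dataset; the function name is hypothetical.

```python
def posterior(prior, likelihoods):
    """Posterior class probabilities under the naive independence assumption.

    prior:       {class: P(class)}
    likelihoods: {class: [P(feature_i | class), ...]} for the observed features
    """
    unnormalized = {}
    for cls, p in prior.items():
        score = p
        for likelihood in likelihoods[cls]:
            score *= likelihood  # independence lets the likelihoods multiply
        unnormalized[cls] = score
    evidence = sum(unnormalized.values())
    return {cls: s / evidence for cls, s in unnormalized.items()}

# Invented numbers: two observed features per class.
prior = {"depressed": 0.2, "not depressed": 0.8}
likelihoods = {"depressed": [0.9, 0.5], "not depressed": [0.3, 0.5]}
print(posterior(prior, likelihoods))
```

The classifier would predict whichever class has the larger posterior probability.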

XGBoost Tree
XGBoost means "Extreme Gradient Boosting". This algorithm uses a gradient boosting framework. It is used for supervised learning in machine learning. It performs well on structured (tabular) data; for unstructured data such as images and text, neural networks are usually preferred.
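The core of gradient boosting can be sketched in a few lines: each new weak learner is fitted to the residuals of the current ensemble. This is plain gradient boosting with squared loss on one feature, not XGBoost itself, which adds regularization and second-order gradient information; the names and data are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Find the threshold split of xs that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]

def fit_boosted(xs, ys, n_rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if x <= t else rm) for t, lm, rm in stumps)

model = fit_boosted([1, 2, 3, 4], [0, 0, 1, 1])
print(predict(model, 1), predict(model, 4))
```

The learning rate shrinks each correction, so the ensemble approaches the targets gradually rather than in one step.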

Neural Network
This algorithm establishes relationships within the dataset in the way the human brain does; a neural network is similar to a system of neurons, which may be organic or artificial. It adapts to changing input to generate the best possible result, so there is no need to redesign the output criteria.
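To make the idea of adapting to input concrete, here is a minimal sketch of a single artificial neuron trained by gradient descent. It is far smaller than the networks a modeling tool would build, and the toy data (logical OR) and hyperparameters are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Output of one neuron: a weighted sum passed through a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, epochs=2000, learning_rate=0.5):
    """Adjust the weights after every sample to reduce the log loss."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = predict(weights, bias, x) - target  # gradient w.r.t. the weighted sum
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

# Toy data (logical OR), purely illustrative.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(samples)
print([round(predict(weights, bias, x), 2) for x, _ in samples])
```

A full neural network stacks many such units in layers, but the weight-update principle is the same.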

Confusion Matrix
A confusion matrix (also called a classification matrix) describes the performance of a classification model in tabular format on a set of data for which the true values are known.
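For a binary problem the matrix has four cells. The sketch below counts them from invented labels; the function name and the choice of 1 as the positive class are assumptions.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts

# Invented labels: 1 = depressed, 0 = not depressed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_matrix(y_true, y_pred))
```

Accuracy, precision, recall, and the F-measure can all be derived from these four counts.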

Classification Accuracy
It is the rate of correct classification. It may be computed on an independent test set or by using some variation of the cross-validation idea.
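As a small sketch, accuracy on a held-out test set is the fraction of matching predictions, and the error rate is its complement. The labels below are invented and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def error_rate(y_true, y_pred):
    """Fraction of predictions that are wrong."""
    return 1.0 - accuracy(y_true, y_pred)

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred), error_rate(y_true, y_pred))
```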

Classification Error
A classification error occurs when g(X) ≠ Y. The best classifier g*, known as the Bayes classifier, is the one that minimizes the probability of classification error.
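In the standard formulation, which uses the same symbols g, X, and Y, the error probability L(g) and the Bayes classifier can be written as follows. The symbol L is introduced here for illustration only.

```latex
L(g) = P\bigl(g(X) \neq Y\bigr), \qquad
g^{*}(x) = \arg\max_{y} P(Y = y \mid X = x), \qquad
L(g^{*}) \le L(g) \ \text{for every classifier } g.
```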

Precision
In classification, precision is the fraction of cases predicted as positive that are actually positive, TP / (TP + FP). Precision is not dependent on accuracy: a classifier may be precise yet have low overall accuracy.

Recall
It is the ratio of correctly identified positive cases to all actual positive cases, TP / (TP + FN).
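Using the standard classification definitions, precision, recall, and the F-measure (their harmonic mean) can be computed from confusion-matrix counts. The counts below are invented for illustration.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

tp, fp, fn = 8, 2, 4  # invented counts
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f_measure(p, r))
```

The F-measure is high only when both precision and recall are high, which makes it useful when the classes are imbalanced.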

AUC
AUC means "Area Under the ROC Curve". It measures the entire two-dimensional area under the ROC curve and summarizes performance across all classification thresholds.
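One way to compute AUC without drawing the curve is the equivalent pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses that formulation on invented scores; the function name is illustrative.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]  # invented classifier scores
print(auc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.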

GINI
It is the probability that a randomly chosen instance is wrongly classified.
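Two quantities go by this name, and the sketch below shows both. The definition above matches Gini impurity, while the "GINI coefficient" reported next to AUC by evaluation tools is commonly computed as 2 × AUC − 1. Which of the two the authors' tool reports is an assumption, and the function names are illustrative.

```python
from collections import Counter

def gini_impurity(labels):
    """Probability of mislabeling a random item when labels are drawn from the class distribution."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_coefficient(auc):
    """Gini coefficient derived from AUC, as commonly used in model evaluation."""
    return 2.0 * auc - 1.0

print(gini_impurity([1, 1, 0, 0]))  # evenly mixed classes
print(gini_impurity([1, 1, 1]))     # a pure group
print(gini_coefficient(0.75))
```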

Result And Discussion
The dataset is divided into two partitions, training and testing. IBM SPSS Modeler is used to obtain the results, with 70% of the dataset used for training and 30% for testing. Seven classifiers were used to find the most accurate result. For each classifier, results are noted for: (i) without SMOTE, (ii) without SMOTE AUC, (iii) without SMOTE F-measure, and (iv) without SMOTE PRA.
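The 70/30 partition can be sketched as below. The study itself used IBM SPSS Modeler, so this Python version is only an illustration; the function name, the fixed seed, and the stand-in records are assumptions.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows, then cut them into training and testing partitions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))  # stand-in for dataset records
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before the cut keeps any ordering in the source file from leaking into one partition only.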
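The results of the models are as follows. The following table consists of precision, recall, F-measure, AUC, GINI coefficient, and accuracy values:
[Table: precision, recall, F-measure, AUC, GINI coefficient, and accuracy for each classifier]
According to the AUC values of all the classifiers, the graph can be represented as:
[Figure: AUC values of all the classifiers]
According to the F-measure values of all the classifiers, the following graph can be represented as:
[Figure: F-measure values of all the classifiers]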

Conclusion
Depression has become a widespread disease among people around the globe. Around 75% of people suffering from depression in developing countries remain untreated [11]. This paper aims to predict whether a person is suffering from depression or not. To achieve the best result, seven classifiers are used: Random Trees, SVM, Neural Network, Bayes Net, Random Forest, XGBoost Tree, and C5.0. The results are noted without applying any filters.

Future Work
This type of study can help to prevent depression in the future. The data helps to convey the serious effects of depression and to spread health awareness among people. If this study continues, it will give us a better understanding of depression and better treatment for the people who are suffering from it. In the future,