Intelligent SMOTE Based Machine Learning Classification for Fetal State on Cardiotocography Dataset

A major contributor to under-five mortality is the death of children in the first month of life. Intrapartum complications are one of the major causes of perinatal mortality. Fetal cardiotocographs (CTGs) can be used as a monitoring tool to identify high-risk women during labor. The objective of this study was to assess the precision of machine learning techniques on CTG data in identifying high-risk fetuses. CTG data of 2126 pregnant women were obtained from the University of California Irvine Machine Learning Repository. Of the 2126 CTG records, 78% were normal, 14% were suspect, and 8% had a pathological fetal state. To reduce the class imbalance, SMOTE was applied, after which five different machine learning classification models were trained on the CTG data. Sensitivity, precision, and F1 score for each class and the overall accuracy of each model were obtained for predicting normal, suspect, and pathological fetal states. To validate the models, two statistical parameters, MCC and kappa (k), were used. With SMOTE, all the classification algorithms achieved an accuracy of at least 96%, and the RF algorithm had the highest prediction accuracy, about 98.01%. The highest model validation values of MCC and kappa were about 0.968 and 1 for RF and 0.977 and 1 for SVC, respectively. Finally, the proposed work is also compared with previous state-of-the-art techniques.


Introduction
Globally, 2.4 million children died in the first month of life in 2019. There are approximately 7,000 newborn deaths every day, amounting to 47% of all deaths of children under 5 years of age, up from 40% in 1990. During pregnancy, the fetal heart rate (FHR) is one of the most important sources of evidence about the condition of the fetus [1]. Obstetricians use the cardiotocograph (CTG) to obtain information about the fetus, including FHR and uterine contractions (UC). The CTG is intended not only to record FHR but also to observe the mother's contractions and support other kinds of fetal monitoring [2]. CTG is a medical test used throughout pregnancy that records UC and FHR. The test can be performed using either external or internal techniques. In the internal test, a catheter is placed in the uterus after a certain amount of dilation has taken place. In the external test, a pair of sensors is attached to the mother's abdomen. A CTG trace usually shows two lines: the upper line records the FHR in beats per minute, and the lower line records uterine contractions [3]. Machine learning techniques are increasingly used to build decision support systems in medicine, including systems that identify fetal risk from CTG. Several studies have addressed the classification of CTG data [4].
The information obtained from CTG is used for early identification of a pathological state and can help the obstetrician anticipate problems and intervene before permanent impairment to the fetus occurs. A baby exposed to hypoxia during delivery may suffer temporary impairment or death. Misinterpretation of FHR pattern recordings and inappropriate treatment account for more than half of these deaths [5]-[7]. Despite its practicality, CTG monitoring is not consistently successful, particularly in low-risk pregnancies. If fetal distress is assessed inaccurately, unnecessary treatment may result; if fetal wellbeing is assessed incorrectly, essential treatment may be withheld [8].
One study applied three different machine learning techniques to CTG data to predict fetal distress [9]. Another used statistical features extracted by Empirical Mode Decomposition (EMD) [10]. The features extracted from the sub-band decomposition were classified as normal or at risk, and the authors achieved 86% accuracy on the test data.
Another study presented a two-step examination of fetal heart rate data that permits effective prediction of acidemia risk. The FHR signals were classified by Support Vector Machines (SVM), fuzzy methods, and a multilayer perceptron. A further model used an artificial neural network (ANN) to classify the CTG data [11]. Recall and F-score were used to assess performance, and the authors also proposed k-means clustering for CTG classification. Adaptive neuro-fuzzy inference systems (ANFIS) have been used for CTG classification [12], and a classification method based on SVM and a Genetic Algorithm (GA) has been implemented [13]. [20]. FHR prediction has been performed with a hybrid of ADB and SVM applied to a CTG dataset containing 2126 records with 21 attributes. That work was carried out in two stages: in the first stage, PCA was used to select the most relevant attributes, and in the second, the hybrid classification algorithm was applied [21]. The maximum accuracy obtained by that model was 98.6% for the selected attributes.
In this paper, several ensemble machine learning models are examined to classify the CTG data as unhealthy or healthy based on the decisions of three obstetricians. The contribution of this paper is to implement the Bagging ensemble method to classify the CTG data. To the best of the authors' knowledge, Bagging ensemble classifiers have not previously been employed for CTG classification. This paper therefore compares the performance of single and ensemble learners in terms of F-measure, accuracy, and ROC area. Section 2 presents the materials and methods, Section 3 presents the results and discussion, and Section 4 concludes.

Methods
The dataset was obtained from the University of California Irvine Machine Learning Repository. It comprised 2126 pregnant women who were in the third trimester of pregnancy. The dataset consisted of 35 attributes used in the measurement of FHR and uterine contractions (UCs) on CTG (Figure 1).
According to the Child Health and Human Development, the core risk variables used to derive the state of the fetus include qualitative and quantitative descriptions of FHR and UCs [22]. The machine learning algorithms used in this study were Decision Tree, Random Forest, KNN, SVC, and Linear SVC. The dataset was split into training and testing folds using the K-Fold Cross Validation technique to test the performance of each machine learning model in the training phase.

Feature Selection Approaches:
Rising diagnosis costs and the large volume of data produced by different sources mean that datasets contain many attributes. Not all attributes are useful, so the irrelevant ones should be removed during data preprocessing or feature selection. The selected attributes, in turn, help build a better-performing classifier. Various feature selection methods, such as filter, wrapper, embedded, ensemble, and hybrid methods, have been applied to fetal heart rate and CTG analysis [23]. In this research we used Symmetrical Uncertainty based feature selection [24] and five classification algorithms (Decision Tree, Random Forest, KNN, SVC, and Linear SVC) to analyze the CTG data.
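Symmetrical Uncertainty scores a feature by how much information it shares with the class label, normalized to the range [0, 1]: SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The following is a minimal Python sketch of that score for discrete values. The function names are hypothetical, and it is assumed that continuous CTG attributes would first be discretized; the binning scheme used in the study is not stated here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(feature, target):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for two discrete sequences."""
    h_x, h_y = entropy(feature), entropy(target)
    if h_x + h_y == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    h_xy = entropy(list(zip(feature, target)))  # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


# Toy illustration (made-up values, not CTG data):
labels = ["N", "N", "P", "P"]
informative = [0, 0, 1, 1]    # mirrors the label exactly
uninformative = [0, 1, 0, 1]  # independent of the label
scores = {
    "informative": symmetrical_uncertainty(informative, labels),
    "uninformative": symmetrical_uncertainty(uninformative, labels),
}
```

Features would then be ranked by this score and the lowest-scoring ones dropped; the cut-off is a design choice that this sketch does not fix.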

Data Sources:
The Cardiotocograph (CTG) dataset was retrieved from the publicly available UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Cardiotocography. The multivariate dataset consists of 2126 instances with 35 numeric attributes. The class attribute takes 3 distinct values: Normal, Suspect, and Pathologic. The class frequencies of the 2126 instances are 1655 normal, 295 suspect, and 176 pathologic, an uneven distribution of observations across the classes known as class imbalance. Imbalanced datasets require special attention because the accuracy of regular classifiers is a misleading measure under class imbalance [5], [25], [26]: these classifiers generally favor the majority class, i.e., the class with the largest number of instances. The performance of a classifier can be improved by using an ensemble of classifiers.
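SMOTE, the oversampling step this study applies to class imbalance, creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal Python sketch of that idea, not the implementation used in the study. The function name `smote`, the neighbour count `k`, the fixed seed, and the toy points are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of feature vectors (lists of floats), at least two of them.
    k: number of nearest minority neighbours considered for each base point.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy illustration with four made-up two-dimensional minority points.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_synthetic = smote(toy_minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Oversampling would normally be applied to the training folds only, so that synthetic points do not leak into the test data.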
However, the majority of ensembles are static and cannot be applied to imbalanced datasets [27]. In addition, experimental results show that performance on a balanced dataset is better than on an imbalanced one [28]. In view of the above, the dataset used in this study consists of 248 normal fetal state instances (randomly drawn from the 1655 normal instances of the UCI repository CTG dataset), with the other classes kept unchanged (i.e., 295 suspect and 176 pathologic), and 23 attributes, as shown in

Attributes Description:
The dataset consists of 23 attributes. The target attribute is "NSP: Fetal state class code (N = normal; S = suspect; P = pathologic)". The attributes are described in Table 2.

Result Analysis
To the best of the authors' knowledge, most classification studies have been carried out on the original UCI Machine Learning Repository CTG dataset [29], [30], and no previous study has addressed the derived dataset with the five machine learning techniques used here. Accuracy was used to measure the performance of each classification algorithm. The key outcome of this study was a comparison of the major machine learning algorithms (listed above) with regard to their precision, accuracy, and sensitivity in predicting normal, suspect, or pathologic fetal state based on CTG attributes. Several metrics were used to compare the performance of the algorithms: precision, sensitivity (recall), F1 score, and overall accuracy ([true positive + true negative]/[true positive + true negative + false positive + false negative]).
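For a three-class problem, these metrics are usually computed per class from a confusion matrix, treating each class in turn as the positive class, while overall accuracy is the number of correct predictions divided by the total. The Python sketch below illustrates this under stated assumptions: the function name is hypothetical, and the confusion matrix is made up for illustration and is not a result of this study.

```python
def per_class_metrics(confusion, labels):
    """Compute overall accuracy and per-class precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                 # true class i, predicted otherwise
        fp = sum(row[i] for row in confusion) - tp  # predicted i, true class otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


# Made-up confusion matrix (rows: true class, columns: predicted class).
example_labels = ["Normal", "Suspect", "Pathologic"]
example_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
example_accuracy, example_report = per_class_metrics(example_confusion, example_labels)
```

Reporting recall per class matters here because a model can reach high overall accuracy while still missing many pathologic cases.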
The experiments were run on this dataset and the results recorded. Each experiment was evaluated with stratified K-fold validation to ensure that the results are free of bias. The main goal is to remove bias from the outcomes, since feature engineering sometimes omits specific characteristics, which can affect overall prediction results. Furthermore, feature engineering is typically costly. The machine learning algorithms were therefore given the raw data after some preparation. The findings were then compared with current state-of-the-art systems. The dataset was examined, preprocessing methods were applied where needed, and the models were trained to improve precision.
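Stratified K-fold validation keeps the class proportions roughly the same in every fold, which matters when one class is much rarer than the others. The following is a minimal Python sketch of how such folds can be built. The function name, the choice of k = 5, and the toy labels are assumptions; the number of folds used in the study is not stated here.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with class-balanced test folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share of it.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        test_indices = sorted(folds[i])
        train_indices = sorted(
            index for j, fold in enumerate(folds) if j != i for index in fold
        )
        yield train_indices, test_indices


# Toy labels with a 10:5:5 class ratio (made up for illustration).
toy_labels = ["N"] * 10 + ["S"] * 5 + ["P"] * 5
toy_splits = list(stratified_k_fold(toy_labels, k=5))
```

Each sample appears in exactly one test fold, so averaging a metric over the k folds uses every record for testing once.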

Performance Measure
A common misconception in machine learning model assessment is that every dataset, regardless of its type, can be evaluated using the same metrics. The majority of machine learning models are judged on their accuracy [31]-[39]. When working with an imbalanced dataset, this practice is misleading. As a result, several other appropriate evaluation metrics were employed alongside accuracy. Precision, recall, the F1 measure, and the ROC curve were used to evaluate the proposed study [40], [41].

Classification model evaluation
The purpose of assessing a classification model is to obtain a reliable estimate of the model's predictive performance. Various performance measures can be used.
The model is fitted to the training set, and its ability to generalize is the basis for quality assessment. For any evaluation measure, it is important to distinguish its value on a specific dataset, particularly the training set, from its true generalization performance. The training performance of a model is determined by evaluating the model on the training set.
However, the aim of a classification model is not to classify the training data. Suitable evaluation procedures are required to reliably estimate the unknown values of the chosen performance measures over the whole domain [43], [44].

Statistical Measurement
The Matthews correlation coefficient (MCC) is a metric for evaluating binary classification quality [45]-[47]. It is computed from the contingency matrix as the Pearson product-moment correlation coefficient between actual and predicted values, and it is unaffected by the imbalanced dataset issue. MCC is the only binary classification rate that awards a high score only if the binary predictor correctly predicts the majority of both positive and negative instances. It has a range of [-1, +1], with the extreme values -1 and +1 corresponding to perfect misclassification and perfect classification, respectively; a coin-tossing classifier has MCC = 0. Equation (1) defines the MCC.
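For reference, the standard binary definition of the MCC in terms of the confusion-matrix counts TP, TN, FP, and FN is given below. It is assumed here to be the form intended by Equation (1), and the equation number is taken from the reference in the text.

```latex
\begin{equation}
\mathrm{MCC} =
\frac{TP \cdot TN - FP \cdot FN}
     {\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
\tag{1}
\end{equation}
```

When any of the four sums in the denominator is zero, the MCC is conventionally set to 0.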
The kappa (k) statistic is a key parameter for judging the model's consistency [48]-[50]. It compares the outcome of the proposed model with the outcome expected from random classification. The kappa statistic ranges from -1 to 1. A value near 1 indicates that the model performs as intended, whereas a value of 0 indicates agreement no better than chance, i.e., that the model is flawed. Equations (2), (3), and (4) define the kappa statistic.
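For reference, the standard form of Cohen's kappa for a binary confusion matrix is given below, with N = TP + TN + FP + FN, observed agreement p_o, and chance agreement p_e. The split into three equations and their numbering are assumptions based on the references (2), (3), and (4) in the text.

```latex
\begin{align}
\kappa &= \frac{p_o - p_e}{1 - p_e} \tag{2} \\
p_o &= \frac{TP + TN}{N} \tag{3} \\
p_e &= \frac{(TP + FP)(TP + FN) + (TN + FN)(TN + FP)}{N^2} \tag{4}
\end{align}
```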
In the present research, kappa values range from 0.702 to 1, which indicates that the proposed models attain high consistency.
The values of MCC and kappa for all the algorithms are shown in Table 4.
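Both statistics extend to the three-class case by working from the full confusion matrix. The Python sketch below computes the multiclass generalization of MCC and Cohen's kappa from such a matrix; for a 2x2 matrix both reduce to the usual binary definitions. The function name and the example matrix are hypothetical and are not taken from Table 4.

```python
import math


def mcc_and_kappa(confusion):
    """Return (MCC, kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    true_counts = [sum(row) for row in confusion]
    pred_counts = [sum(row[j] for row in confusion) for j in range(n_classes)]
    cross = sum(t * p for t, p in zip(true_counts, pred_counts))

    # Multiclass MCC: covariance of true and predicted labels over its maximum.
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = (correct * total - cross) / denominator if denominator else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = correct / total
    p_expected = cross / total ** 2
    kappa = ((p_observed - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)
    return mcc, kappa


# Made-up binary example: TP = 45, FN = 10, FP = 5, TN = 40.
example_mcc, example_kappa = mcc_and_kappa([[45, 10], [5, 40]])
```

Unlike accuracy, both values drop towards 0 when a model simply predicts the majority class, which is why they are useful checks on imbalanced data.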

Comparison with existing system
The findings of the proposed work are compared with the results of other state-of-the-art systems in order to assess the reliability of the proposed work.

Conclusion
Accurate classification of CTG data is one of the major challenges in medical diagnosis systems. Delayed detection of a pathologic fetal state based on CTG attributes may cause serious health issues for mother and baby, so early diagnosis is important. In recent research, ML techniques have been introduced for early detection in medical diagnosis. ML is a subfield of AI with the capability to learn from large amounts of unlabeled and unstructured data quickly. In this research we applied existing techniques for the early detection of pathologic fetal state on the CTG dataset.
In the last decades, several studies have addressed the detection of pathologic fetal state, reporting results in terms of accuracy. All the approaches used the same dataset (CTG) for training and testing their models. In ML there are a number of shortcomings for the prediction of diabetes, such as accuracy and the identification of relevant attributes. Hence, future models must be designed to overcome these shortcomings. In this research, the work was carried out in three stages: in the first stage, the imbalanced CTG data were oversampled by SMOTE; in the second stage, hyperparameter tuning on the training dataset was used to reduce the model's complexity and make a trade-off between these components; and in the final stage, six machine learning techniques were applied to classify the test data.

Figure 1 Attributes of CTG datasets
Flowchart of the proposed model
Comparative study of proposed ML techniques on ROC curve
Figure 5 Comparative study based on statistical parameters
Figure 6 Comparative study between state-of-the-art techniques and the present research