To the best of the authors' knowledge, most classification-model studies have been carried out on the UCI Machine Learning Repository CTG dataset [29], [30]; no prior study has addressed the derived dataset with the five machine learning techniques considered here. To measure the performance of each classification algorithm, accuracy was taken into account. The key outcome of this study is a comparison of the major machine learning algorithms listed above with regard to their precision, accuracy, and sensitivity in predicting the normal, suspect, or pathologic fetal state from CTG attributes. Several statistical measures were used to compare the performance of the algorithms: precision, sensitivity (recall), F1 score, and overall accuracy ([true positive + true negative] / [true positive + true negative + false positive + false negative]).
The experiment was run on the provided dataset and the results produced. Each experiment was evaluated using stratified k-fold cross-validation to ensure that the results are free of bias. The major goal is to remove any bias in the outcomes, as feature engineering sometimes leads to the omission of specific characteristics, which might affect the overall prediction results; furthermore, the process of feature engineering is typically highly costly. The machine learning algorithms were therefore given raw data after only some preparation. The findings were then obtained and compared with current state-of-the-art systems. The dataset was examined, methods were applied where needed, and the model was trained to improve precision.
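The stratified k-fold procedure described above can be sketched as follows. This is a minimal illustration on synthetic data (the actual 23 CTG attributes and three fetal-state classes are not reproduced here); the class weights and random seeds are assumptions for the example.

```python
# Sketch: evaluating a classifier with stratified k-fold cross-validation.
# Stratification preserves the class proportions in every fold, which matters
# for imbalanced data such as the CTG fetal-state classes.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Imbalanced three-class toy data standing in for the 540 CTG records.
X, y = make_classification(n_samples=540, n_features=23, n_informative=10,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
print("fold accuracies:", scores.round(3))
print("mean accuracy:  ", scores.mean().round(3))
```

Averaging the per-fold accuracies gives an estimate that does not depend on any single train/test split.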
3.1 Performance Measure
One of the most common misconceptions about machine learning model assessment is that every dataset, regardless of its type, can be evaluated with the same metrics. The majority of machine learning models are judged on their accuracy [31]–[39]; when working with an imbalanced dataset, however, accuracy alone proves deceptive. As a result, several appropriate evaluation metrics are employed alongside accuracy: precision, recall, the F1 measure, and the ROC curve were used to evaluate the proposed study [40], [41]. Accuracy is the number of correct predictions divided by the total number of inputs. The confusion matrix is obtained by counting the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values. From these, the true positive rate and the false positive rate are computed as TP/(TP + FN) and FP/(FP + TN), respectively. Another statistic commonly used to assess a model's classification performance is the receiver operating characteristic (ROC) curve [42].
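The metrics above can be computed directly with scikit-learn. The labels below are hypothetical placeholders (0 = normal, 1 = suspect, 2 = pathologic); for the multi-class case, precision, recall, and F1 are macro-averaged here, which is one reasonable choice among several.

```python
# Sketch: confusion matrix and the evaluation metrics discussed above,
# computed on hypothetical true/predicted fetal-state labels.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 2, 2, 1])

print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))   # correct / total
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1 score :", f1_score(y_true, y_pred, average="macro"))
```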
Table 3. Performance of SMOTE based Classifier Algorithms
Algorithm        Accuracy   Precision   Recall   F1 Score   AUC
Decision Tree    96.8%      96.8%       96.5%    96.4%      0.89
Random Forest    98.01%     97.8%       97.7%    97.5%      0.96
KNN              96.2%      96.2%       96%      96%        0.92
SVC              97.7%      97.5%       97%      97%        0.96
Linear SVC       97%        97%         97%      97%        0.64
Table 3 shows the result analysis, in tabular form, of average accuracy, precision, recall, F1 score, area under the ROC curve, and computational time for the SMOTE-based Random Forest, Decision Tree, KNN, SVC, and Linear SVC models when trained and tested on the tabular data consisting of the actual 540 records. Results were obtained after principal component analysis and SMOTE, and the average parameters were calculated over both the negative and positive classes. We found that the SMOTE-based Random Forest performed best among all the SMOTE-based algorithms, with average accuracy, precision, recall, and F1 score values of 98.01%, 97.8%, 97.7%, and 97.5%, respectively. The SMOTE-based Random Forest and SVC also achieved the maximum area under the ROC curve (0.96), with the Random Forest having the least computational time (0.10 sec).
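The SMOTE-plus-PCA pipeline summarized in Table 3 can be sketched as below. For self-containment this uses a deliberately simplified re-implementation of SMOTE (interpolating each minority sample toward a random one of its k nearest minority neighbours) rather than the imbalanced-learn library the study would typically rely on; the toy data, class sizes, and component count are assumptions.

```python
# Simplified sketch of the SMOTE oversampling step, followed by PCA and a
# Random Forest, mirroring the preprocessing described above.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def smote_minority(X_min, n_new, k=5):
    """Generate n_new synthetic minority samples by interpolating each
    chosen sample toward one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]          # k nearest neighbours, excluding self
        j = rng.choice(nn)
        gap = rng.random()                    # random point on the connecting segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Imbalanced toy data: 100 majority vs 20 minority samples, 23 features.
X_maj = rng.normal(0.0, 1.0, size=(100, 23))
X_min = rng.normal(2.0, 1.0, size=(20, 23))
X_new = smote_minority(X_min, n_new=80)       # oversample to balance the classes

X = np.vstack([X_maj, X_min, X_new])
y = np.array([0] * 100 + [1] * (20 + 80))

X_pca = PCA(n_components=10).fit_transform(X)  # dimensionality reduction
clf = RandomForestClassifier(random_state=0).fit(X_pca, y)
print("balanced class counts:", np.bincount(y))
```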
Figure 3 shows a graphical comparison of all the SMOTE-based machine learning algorithms on the basis of computational time. From the plot we found that the SMOTE-based Random Forest has the least computational time, 0.010 sec, while the SMOTE-based Decision Tree has the maximum, 0.031 sec.
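A per-model timing comparison of the kind plotted in Figure 3 could be obtained with a simple wall-clock loop; the data and model set below are illustrative stand-ins, not the study's actual configuration.

```python
# Sketch: measuring per-classifier computational time (fit + predict)
# with a wall-clock timer, as would underlie a plot like Figure 3.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=540, n_features=23, random_state=0)

timings = {}
for name, clf in [("Decision Tree", DecisionTreeClassifier(random_state=0)),
                  ("Random Forest", RandomForestClassifier(random_state=0))]:
    t0 = time.perf_counter()
    clf.fit(X, y).predict(X)
    timings[name] = time.perf_counter() - t0
print(timings)
```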
3.2 Classification model evaluation
The purpose of assessing a classification model is to obtain a reliable estimate of what is known as the model's predictive performance. Diverse performance parameters can be utilized.
The model is built from the training set, and its generalization property is the basis for quality assessment. For any assessment measure, it is important to distinguish its value on a specific dataset, particularly the training-set performance, from its true generalization performance. The created model's training performance is determined by assessing the model on the training set. However, the aim of a classification model is not to categorize the training data; suitable evaluation procedures are required to reliably estimate the unknown values of the assumed performance measures over the whole domain [43], [44].
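The gap between training performance and generalization performance described above can be demonstrated with a held-out split. The data here are synthetic with deliberate label noise (`flip_y=0.1`), an assumption chosen so that an unpruned tree visibly overfits.

```python
# Sketch: training-set accuracy versus held-out accuracy. An unpruned
# decision tree memorises the training data, so only the held-out score
# is an honest estimate of generalization performance.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=540, n_features=23, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("training accuracy:", tree.score(X_tr, y_tr))   # typically 1.0 (memorised)
print("held-out accuracy:", tree.score(X_te, y_te))   # the honest estimate
```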
3.3 Statistical Measurement
The Matthews correlation coefficient (MCC) is a metric for evaluating binary classification quality [45]–[47]. It is a contingency-matrix technique for calculating the Pearson product-moment correlation coefficient between actual and predicted values that is unaffected by the imbalanced-dataset issue. MCC is the only binary classification rate that awards a high score only if the predictor correctly classifies the majority of both positive and negative data instances. It has a range of [−1, +1], with the extreme values −1 and +1 corresponding to perfect misclassification and perfect classification, respectively, and MCC = 0 for a coin-tossing classifier. Equation (1) gives the MCC.
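The MCC of Equation (1) can be computed directly from the confusion-matrix counts; the labels below are hypothetical, and the hand computation is checked against scikit-learn's implementation.

```python
# Worked sketch of the MCC:
#   MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
from math import sqrt
from sklearn.metrics import matthews_corrcoef

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

TP = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # = 3
TN = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # = 5
FP = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # = 1
FN = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # = 1

mcc = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
print(round(mcc, 3), round(matthews_corrcoef(y_true, y_pred), 3))
```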
The kappa (k) statistic is a key parameter for judging the model's consistency [48]–[50]. It compares the outcome of the suggested model to the outcome of a random classification. The kappa statistic's value ranges from 0 to 1: a value close to 1 represents the model's expected effect, whereas a value close to 0 indicates that the model performs no better than chance. Equations (2), (3), and (4) give the kappa statistic.
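The kappa computation of Equations (2)–(4) reduces to observed agreement versus chance agreement; the labels below are hypothetical, and the hand computation is checked against scikit-learn's `cohen_kappa_score`.

```python
# Worked sketch of the kappa statistic:
#   p_o = observed agreement, p_e = chance agreement,
#   kappa = (p_o - p_e) / (1 - p_e)
from collections import Counter
from sklearn.metrics import cohen_kappa_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]

n = len(y_true)
p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n        # observed agreement
ct, cp = Counter(y_true), Counter(y_pred)
p_e = sum(ct[c] * cp[c] for c in ct) / n ** 2                # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 3), round(cohen_kappa_score(y_true, y_pred), 3))
```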
In the present research, kappa values in the range of 0.702 to 1 indicate that the proposed models attain great consistency. The MCC and kappa values for all the algorithms are shown in Table 5.
Table 5. Statistical Measure after experimentation
Algorithm        MCC     Kappa (k)
Decision Tree    0.741   0.958
Random Forest    0.968   1
KNN              0.789   0.95
SVC              0.957   1
Linear SVC       0.541   0.857
3.4 Comparison with existing system
The proposed work's findings are compared with the results of other state-of-the-art existing systems in order to ascertain the proposed work's trustworthiness.
Table 6. Comparative study between the proposed model and existing models

Reference | Algorithm used | Outcomes from the research
[14] | PSO-based KNN & SVM | PSO feature-selection-based KNN achieved the maximum overall accuracy of 88.5%
[15] | SVM & hybrid K-means SVM | Maximum accuracy of 90.64% obtained by the K-means SVM, with k = 10
[17] | Random Forest | Random Forest with seven important features classified the CTG data with a maximum accuracy of 93.6%
[18] | Bagging approach with three decision tree algorithms (Random Forest, REP Tree & J48) and correlation feature selection | All the proposed algorithms achieved an overall accuracy of about 90%
[20] | Naive Bayes, Decision Tree, Multi-Layer Perceptron and Radial Basis Function | Maximum accuracy of about 93.3% obtained by the Decision Tree with 15 potential attributes
Present research | Decision Tree, Random Forest, SVC, KNN, Linear SVC | Maximum accuracy of about 98.01% obtained by the SMOTE-based Random Forest with 23 attributes