Setup and Experimentation of Models
A brief description of the hardware and software environments used to carry out the experiments is given below. The development environment consists of an Intel Core i5-7600 CPU at 3.50 GHz, 32 GB of RAM, and an NVIDIA GeForce GTX 1060 6 GB with 10 streaming multiprocessors (SMs), 1280 CUDA cores, and 6144 MB of GDDR5 memory (192.19 GB/s bandwidth) connected via a PCI-Express x16 Gen 3 slot. On the software side, we use Python 3.7 and its associated third-party libraries for building the models, processing the data, and visualizing the results.
Stratified K-Fold Cross Validation Technique
The stratified variant of k-fold cross-validation is preferable here, because plain k-fold cross-validation is not recommended for imbalanced datasets [44], while cross-validation itself is suggested to address the generalization problem of a prediction model on imbalanced data [45]. Since the datasets employed in this study are imbalanced, we utilized the stratified k-fold cross-validation technique. In stratified k-fold cross-validation, the dataset is split so that each fold preserves the same class proportions as the full dataset [44, 45]. We also shuffle the data before splitting, since shuffling tends to produce a better result.
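The splitting strategy above can be sketched with scikit-learn's `StratifiedKFold`; the dataset below is a hypothetical imbalanced stand-in, not the study's data, and the 90/10 class split is an assumption made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced dataset (roughly 90% / 10% class split).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# shuffle=True randomizes sample order before the stratified split.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each fold preserves (approximately) the overall class proportions.
    minority_share = y[test_idx].mean()
    print(f"fold {fold}: minority share in test = {minority_share:.2f}")
```

Because stratification constrains each fold's class ratio, the minority share printed for every fold stays close to the overall ratio, which is exactly what plain (unstratified) k-fold cannot guarantee on imbalanced data.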
Educational Data Analysis
The correlations between the features of the dataset are shown in Fig. 5. All of the pairwise correlation scores are small in magnitude, so they indicate only weak linear relationships. “Age” correlates weakly positively with “Study time” (+0.086), “Health” (+0.03), and “CGPA” (+0.0058), and weakly negatively with “Miss-Class” (-0.069) and “Entrance” (-0.0035). “Study time” correlates weakly positively with “CGPA” (+0.011), “Miss-Class” (+0.002), and “Entrance” (+0.0011), and weakly negatively with “Health” (-0.076). “Miss-Class” correlates weakly positively with “Entrance” (+0.00097) and “CGPA” (+0.0075), and weakly negatively with “Health” (-0.072). “Entrance” correlates weakly positively with “CGPA” (+0.037) and weakly negatively with “Health” (-0.0033), while “Health” correlates weakly positively with “CGPA” (+0.064). In summary, “CGPA” has a positive relationship with “Age”, “Study time”, “Miss-Class”, “Entrance result”, and “Health”, whereas “Health” has a negative relationship with “Study time”, “Miss-Class”, and “Entrance result” but a positive relationship with “CGPA” and “Age”.
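A correlation matrix like the one in Fig. 5 can be produced with `pandas.DataFrame.corr`. The sketch below uses randomly generated placeholder columns named after the features in the text; the value ranges are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for the student dataset; only the column
# names follow the text, the values are random.
df = pd.DataFrame({
    "Age": rng.integers(17, 30, 200),
    "Study time": rng.integers(1, 10, 200),
    "Miss-Class": rng.integers(0, 15, 200),
    "Health": rng.integers(1, 5, 200),
    "Entrance": rng.normal(500, 50, 200),
    "CGPA": rng.normal(3.0, 0.5, 200),
})

# Pearson correlation matrix; values near 0 indicate weak linear association.
corr = df.corr()
print(corr.round(3))
```

Each cell of the resulting 6x6 matrix is the Pearson coefficient for one feature pair, with 1.0 on the diagonal; a heatmap of this matrix is the usual way to render a figure like Fig. 5.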
Performance Comparison of Developed Models before Smoothing
A performance comparison of the five machine learning classifiers (SVM, RF, KNN, GBOOST, and Decision Tree) applied to predict students’ performance was carried out before smoothing, and the models were ranked by accuracy score (see Fig. 6).
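A comparison of this kind can be sketched by cross-validating each of the five classifiers and sorting them by mean accuracy. The dataset below is synthetic and the default model settings are assumptions; the study's actual features, class labels, and configurations may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic multi-class dataset standing in for the student data.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           random_state=0)

models = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "GBOOST": GradientBoostingClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
}

# Stratified 5-fold accuracy for each model, then rank best-first.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: cross_val_score(m, X, y, cv=cv).mean()
          for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Ranking by cross-validated accuracy, rather than a single train/test split, reduces the chance that the ordering in Fig. 6 reflects one lucky split.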
Performance Comparison of Developed Models after Smoothing
A performance comparison of the same five machine learning classifiers (SVM, RF, KNN, GBOOST, and Decision Tree) was carried out after smoothing, and the models were again ranked by accuracy score (see Fig. 7).
Figure 8 Confusion matrix of (a) Random Forest (RF), (b) Gradient Boosting Classifiers Model (GB), (c) Support Vector Machine (SVM) Model (d) K-Nearest Neighbor Classifier (KNN) Model and (e) Decision Tree (DT) Classifier Model
To develop a model, an experiment was conducted on the student performance dataset using the random forest classifier with its default parameters, and its confusion matrix was examined (see Fig. 8(A)). The confusion matrix shows that out of 17,826 instances, 17,622 were correctly classified by the model and 204 were misclassified. Out of 3,528 samples of students supposed to be warned, 3,488 were predicted correctly and 40 were misclassified. Out of 3,514 samples of students supposed to pass, 3,404 were predicted correctly and 110 were misclassified. Out of 3,633 samples of students supposed to drop out, 3,630 were predicted correctly and only 3 were misclassified. Out of 3,561 samples of students supposed to receive academic dismissal with readmission (ADR), 3,539 were predicted correctly and 22 were misclassified.
Out of 3,590 samples of students supposed to receive academic dismissal (AD), 3,561 were predicted correctly and 29 were misclassified. The Random Forest classifier obtained a training accuracy of 89.5% and a testing accuracy of 89% (see Fig. 8(A)); since the two are comparable, there is no overfitting in the model. To develop the next model, an experiment was conducted on the student performance dataset using the gradient boosting classifier with its default parameters, and its confusion matrix was examined (see Fig. 8(B)). The confusion matrix shows that out of 17,835 instances, 13,798 were correctly classified by the model and 4,037 were misclassified. Out of 3,537 samples of students supposed to be warned, 2,455 were predicted correctly and 1,082 were misclassified. Out of 3,514 samples of students supposed to pass, 2,500 were predicted correctly and 1,014 were misclassified. Out of 3,633 samples of students supposed to drop out, 3,558 were predicted correctly and only 75 were misclassified. Out of 3,561 samples of students supposed to receive academic dismissal with readmission (ADR), 2,471 were predicted correctly and 1,090 were misclassified. Out of 3,590 samples of students supposed to receive academic dismissal (AD), 2,814 were predicted correctly and 776 were misclassified.
The gradient boosting classifier obtained a training accuracy of 77.7% and a testing accuracy of 77.3%; since the two are comparable, there is no overfitting in the model. To develop the next model, an experiment was conducted on the student performance dataset using the SVM classifier, and its confusion matrix was examined (see Fig. 8(C)). The confusion matrix shows that out of 17,835 instances, 17,633 were correctly classified by the model and 202 were misclassified. Out of 3,537 samples of students supposed to be warned, 3,480 were predicted correctly and 57 were misclassified. Out of 3,514 samples of students supposed to pass, 3,417 were predicted correctly and 97 were misclassified. Out of 3,633 samples of students supposed to drop out, 3,632 were predicted correctly and only 1 was misclassified. Out of 3,561 samples of students supposed to receive academic dismissal with readmission (ADR), 3,537 were predicted correctly and 24 were misclassified. Out of 3,590 samples of students supposed to receive academic dismissal (AD), 3,567 were predicted correctly and 23 were misclassified. With the SVM classifier model, an overall accuracy of 98.7% was obtained (see Fig. 8(C)).
The results of the SVM classifier revealed that the training accuracy of 99.7% and the testing accuracy of 98.1% were comparable; thus, there is no overfitting in the model. To develop the next model, an experiment was conducted on the student performance dataset using the KNN classifier, and its confusion matrix was examined (see Fig. 8(D)). The confusion matrix shows that out of 17,835 instances, 15,270 were correctly classified by the model and 2,565 were misclassified. Out of 3,537 samples of students supposed to be warned, 3,396 were predicted correctly and 141 were misclassified. Out of 3,514 samples of students supposed to pass, 1,336 were predicted correctly and 2,178 were misclassified. Out of 3,633 samples of students supposed to drop out, 3,561 were predicted correctly and only 72 were misclassified. Out of 3,561 samples of students supposed to receive academic dismissal with readmission (ADR), 3,456 were predicted correctly and 105 were misclassified. Out of 3,590 samples of students supposed to receive academic dismissal (AD), 3,521 were predicted correctly and 69 were misclassified. With the KNN classifier model, an accuracy of 85.6% was achieved (see Fig. 8(D)). The training accuracy of 86% and the testing accuracy of 85.6% were comparable; thus, there is no overfitting in the model. Finally, an experiment was conducted on the student performance dataset using the Decision Tree classifier, and its confusion matrix was examined (see Fig. 8(E)).
The confusion matrix shows that out of 17,835 instances, 17,633 were correctly classified by the model and 202 were misclassified. Out of 3,537 samples of students supposed to be warned, 3,480 were predicted correctly and 57 were misclassified. Out of 3,514 samples of students supposed to pass, 3,417 were predicted correctly and 97 were misclassified. Out of 3,633 samples of students supposed to drop out, 3,632 were predicted correctly and only 1 was misclassified. Out of 3,561 samples of students supposed to receive academic dismissal with readmission (ADR), 3,537 were predicted correctly and 24 were misclassified. Out of 3,590 samples of students supposed to receive academic dismissal (AD), 3,567 were predicted correctly and 23 were misclassified. With the decision tree classifier model, an accuracy of 98.7% was obtained, as illustrated in Fig. 7 and Table 4(D).
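The per-class counts discussed above are read directly off a confusion matrix: diagonal entries are correct predictions per class, and the off-diagonal entries in row i are the misclassified samples of class i. A minimal sketch with scikit-learn, using hypothetical labels for the five outcome classes (warned, pass, dropout, ADR, AD):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for the five outcome classes,
# encoded 0..4 (warned, pass, dropout, ADR, AD), for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
y_pred = np.array([0, 1, 1, 1, 2, 2, 3, 0, 4, 4])

cm = confusion_matrix(y_true, y_pred)
# Trace (sum of the diagonal) counts all correctly classified samples;
# everything off the diagonal is a misclassification.
correct = int(np.trace(cm))
total = int(cm.sum())
print(cm)
print(f"{correct}/{total} correct, accuracy = {accuracy_score(y_true, y_pred):.2f}")
```

Summing a single row of `cm` recovers the class support (e.g. the 3,537 "warned" samples in the text), while the row's diagonal entry gives that class's correct predictions.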
Table 4 Classification Report of (A) Random Forest (RF), (B) Support Vector Machine (SVM) Model, (C) Gradient Boosting (GB) Classifiers Model, (D) Decision Tree (DT) Classifier Model and (E) K-Nearest Neighbor (KNN) Classifier Model
(A) Random Forest Model

| Class        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.86      | 0.90   | 0.88     | 3590    |
| 1            | 0.86      | 0.90   | 0.89     | 3561    |
| 2            | 0.94      | 1.00   | 0.97     | 3633    |
| 3            | 0.91      | 0.80   | 0.85     | 3514    |
| 4            | 0.85      | 0.84   | 0.85     | 3537    |
| Accuracy     |           |        | 0.89     | 17835   |
| Macro avg    | 0.89      | 0.89   | 0.89     | 17835   |
| Weighted avg | 0.89      | 0.89   | 0.89     | 17835   |
(B) Support Vector Machine Model

| Class        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.99      | 0.99   | 0.99     | 3590    |
| 1            | 0.99      | 0.99   | 0.99     | 3561    |
| 2            | 1.00      | 1.00   | 1.00     | 3633    |
| 3            | 0.97      | 0.97   | 0.97     | 3514    |
| 4            | 0.98      | 0.99   | 0.98     | 3537    |
| Accuracy     |           |        | 0.99     | 17835   |
| Macro avg    | 0.99      | 0.99   | 0.99     | 17835   |
| Weighted avg | 0.99      | 0.99   | 0.99     | 17835   |
(C) Gradient Boosting Classifiers Model

| Class        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.73      | 0.78   | 0.76     | 3590    |
| 1            | 0.72      | 0.69   | 0.71     | 3561    |
| 2            | 0.90      | 0.98   | 0.94     | 3633    |
| 3            | 0.77      | 0.71   | 0.74     | 3514    |
| 4            | 0.73      | 0.69   | 0.71     | 3537    |
| Accuracy     |           |        | 0.77     | 17835   |
| Macro avg    | 0.77      | 0.77   | 0.77     | 17835   |
| Weighted avg | 0.77      | 0.77   | 0.77     | 17835   |
(D) Decision Tree Classifier Model

| Class        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.99      | 0.99   | 0.99     | 3590    |
| 1            | 0.98      | 0.99   | 0.99     | 3561    |
| 2            | 1.00      | 1.00   | 1.00     | 3633    |
| 3            | 0.97      | 0.94   | 0.96     | 3514    |
| 4            | 0.97      | 0.98   | 0.98     | 3537    |
| Accuracy     |           |        | 0.98     | 17835   |
| Macro avg    | 0.98      | 0.98   | 0.98     | 17835   |
| Weighted avg | 0.98      | 0.98   | 0.98     | 17835   |
(E) KNN Classifier Model

| Class        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.86      | 0.98   | 0.92     | 3590    |
| 1            | 0.84      | 0.97   | 0.90     | 3561    |
| 2            | 0.88      | 0.98   | 0.93     | 3633    |
| 3            | 0.97      | 0.38   | 0.55     | 3514    |
| 4            | 0.81      | 0.96   | 0.88     | 3537    |
| Accuracy     |           |        | 0.86     | 17835   |
| Macro avg    | 0.87      | 0.85   | 0.83     | 17835   |
| Weighted avg | 0.87      | 0.86   | 0.83     | 17835   |
Hyperparameter Tuning
Selecting the best hyperparameters can greatly affect the performance of a model [46, 47]. A given algorithm has several optimisation techniques, each with its own drawbacks and pitfalls, so in this study experiments were conducted on several optimisation methodologies to choose the best hyperparameters, which were then employed in the K-Nearest Neighbours, Support Vector Machine, Gradient Boosting, Decision Tree, and Random Forest models. Basically, there are two types of parameters: the default parameters of the model and hyperparameters. Hyperparameters regulate the learning process; under the same machine learning model, different learning rates or weights can be utilised to govern the learning process and uncover hidden knowledge in the data. In this study, we employed grid search-based hyperparameter tuning. We selected grid search over random search because grid search-based tuning goes through every possible combination of hyperparameters [48–50], which boosts the testing accuracy of the model.
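An exhaustive search of this kind can be sketched with scikit-learn's `GridSearchCV`. The KNN grid below is a hypothetical example; the actual search spaces used in the study may differ.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic dataset standing in for the student data.
X, y = make_classification(n_samples=400, random_state=1)

# Hypothetical grid: 3 neighbour counts x 2 weighting schemes = 6 candidates.
param_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}

# Grid search evaluates every combination with stratified cross-validation
# (matching the validation scheme used in the study) and keeps the best one.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The trade-off named in the text is visible here: the cost grows multiplicatively with each added hyperparameter axis, which is why random search is the usual fallback when the grid becomes large.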
Performance Comparison of Developed Models after Hyperparameter Tuning
A performance comparison of the five machine learning classifiers (SVM, RF, KNN, GBOOST, and Decision Tree) was carried out after hyperparameter tuning, and the models were again ranked by accuracy score (see Fig. 9).