Student Performance Prediction Using Machine Learning Techniques

DOI: https://doi.org/10.21203/rs.3.rs-1455610/v1

Abstract

With the emergence of the COVID-19 pandemic, e-learning became the only way to avoid study interruption in educational institutions and universities, and the field has therefore garnered significant attention in recent times. In this paper, we used ten machine-learning algorithms (Logistic Regression, Decision Tree, Random Forest, SGD Classifier, Multinomial NB, K-Neighbors Classifier, Ridge Classifier, Nearest Centroid, Complement NB, and Bernoulli NB) to build a prediction system based on artificial intelligence techniques that predicts the difficulties students face in using the e-learning management system and supports related decision-making, which in turn contributes to the sustainable development of technology at the university. From the results obtained, we identified the most important factors that affect the use of e-learning to address students' learning difficulties with an LMS.

Introduction

Researchers have long used applications of Artificial Intelligence (AI) to solve problems in learning and teaching environments. AI, sometimes defined as machine intelligence, is a well-known branch of computer science that aims to give software the ability to analyze its environment, using either predetermined rules and search algorithms or pattern-recognizing machine learning models, and then make decisions based on those analyses (Amrieh, Hamtini, & Aljarah, 2016).

E-learning and its systems (learning management systems, LMS) have become standard tools in universities, and the fast development of information and communication technologies (ICTs) and their application to education has made this possible. Despite the great benefits of using an LMS, many problems stand in the way of achieving learning goals, the most important of which is raising student performance rates. Predicting students' performance and classifying the data is prominent in the education sector, and using machine learning (ML) algorithms to understand learners' learning is popular in the educational community (Ko & Leu, 2020).

Educational Data Mining (EDM) is the application of data mining methods, such as ML algorithms, to data available at educational institutions in order to reach results that support educational decision-making (Ghorbani & Ghousi, 2020).

In this paper, we used educational data from a Saudi Arabian educational institution (KFU) to provide a dataset for prediction: the records of students who studied fully online and used Blackboard (LMS) to learn. We then compared the performance of ten machine-learning algorithms (Logistic Regression, Decision Tree, Random Forest, SGD Classifier, Multinomial NB, K-Neighbors Classifier, Ridge Classifier, Nearest Centroid, Complement NB, and Bernoulli NB) to find the best technique for prediction, and then extracted the most important features affecting students' performance.

Related Work

In this paper, we used machine-learning algorithms to predict students' learning difficulties with an LMS, while much prior research used such algorithms to predict student performance. Jayaprakash et al. (2020) focused on identifying students at risk, building a model that used Random Forest, Naïve Bayes, and other ensemble methods to classify the attributes and serve as an early-warning mechanism for improving student performance. They concluded that factors such as gender, family size, parental status, maternal and paternal education, and maternal and paternal occupation are among the factors that can negatively affect student achievement. Kime et al. (2019) used interactive machine learning to classify students' skills in calculus; they applied machine-learning techniques to classify skills and predict the skills added by an expert teacher in order to improve students' performance.

Bajpai et al. (2019) compared several machine learning algorithms to choose the most appropriate and efficient algorithm for predicting researchers' academic performance. They found that some algorithms were easy to implement and understand, some were cumbersome, and some could take huge computation time. Hussain et al. (2018) used several machine learning algorithms to predict the most beneficial e-learning sessions for students. They found that the RF and DL algorithms are suitable for predicting beneficial sessions during e-learning, and from the prediction results they identified factors that affect the effectiveness of sessions, such as family commitment, study environment, and teaching style.

Athani et al. (2017) proposed a prediction system using the SVM algorithm. The results showed that the SVM algorithm can predict the performance of students and provide the institution's departments with information about students' status, so that students can be given appropriate additional educational tasks that help them improve their academic performance. Sorour et al. (2014) used the SVM and ANN algorithms to predict students' scores from their comments, and the results showed that prediction accuracy with SVM was higher than with ANN.

Machine Learning

There are many data mining classification algorithms (classifiers); in this paper we chose the following ten well-known ones for their useful properties, which we briefly describe in the next paragraphs:

1-Logistics Regression (LR)

LR is a supervised algorithm that can involve multiple dependent features. Logistic Regression is a linear model used to approximate the relationship between variables by means of a logistic function (Peng et al., 2002).
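As a minimal sketch of the logistic function at the core of LR (the weights below are illustrative, not fitted on the paper's data):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# A linear score w.x + b is squashed into a class probability
w, b = np.array([0.8, -0.4]), 0.1   # illustrative weights, not fitted
x = np.array([2.0, 1.0])
p = sigmoid(w @ x + b)              # probability of the positive class
```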

2-K-Nearest Neighbor (KNN)

KNN selects the K nearest neighbors of x0 and uses a plurality vote to decide the class label of x0. Euclidean distance is applied as the distance metric, no prior knowledge is required, and all classes are treated as equally important (Song et al., 2007).
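A toy sketch of the KNN plurality vote with Euclidean distance (the points and labels are invented for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, k=3):
    # Euclidean distance from x0 to every training sample
    dists = np.linalg.norm(X_train - x0, axis=1)
    # The k nearest neighbours decide the label by plurality vote
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Two invented clusters of students, labelled "L" and "H"
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array(["L", "L", "H", "H"])
print(knn_predict(X, y, np.array([5.0, 6.0])))  # nearest neighbours are "H"
```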

3-Decision Tree (Dt)

DT is based on a model for regression and classification. The dataset is split into smaller subsets, and these smaller subsets can make predictions with a high degree of accuracy. Decision tree methods include C4.5, CART, conditional trees, and C5.0 (Sharma & Kumar, 2016).

4-Naive Bayes Algorithm (Nb)

NB computes probabilities using Bayes' theorem. It offers a very simple implementation and little training time with high accuracy when computing the probabilities of noisy data. NB methods include Multinomial NB, Bernoulli NB, and Complement NB (Wu et al., 2015).

5-Random Forest (Rf)

The random forest algorithm contains many decision trees, each built from samples with randomly sampled features. The decision is made by applying majority voting over the decision trees, which is a successful method for merging unstable learners with random variable selection (Breiman, 2001).
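The majority-voting step can be sketched in isolation (the tree predictions below are hypothetical):

```python
from collections import Counter

# Each tree in the forest casts one vote per sample; the plurality wins.
tree_predictions = [
    ["H", "M", "L"],   # hypothetical tree 1 predictions for 3 students
    ["H", "M", "M"],   # tree 2
    ["M", "M", "L"],   # tree 3
]
forest_prediction = [
    Counter(votes).most_common(1)[0][0]
    for votes in zip(*tree_predictions)   # iterate per sample, not per tree
]
print(forest_prediction)  # ['H', 'M', 'L']
```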

6-Stochastic Gradient Descent (SGD)

SGD is an iterative technique for optimizing an objective function with suitable smoothness properties (e.g. differentiable or sub-differentiable). The method uses shuffled samples to evaluate the gradients, and hence can be considered a stochastic approximation of gradient descent optimization (Sathyadevan & Chaitra, 2015).

7-Ridge Classifier

This is a linear classifier trained with ridge regression: the K class labels are encoded in a one-of-K scheme, and the scores are transformed into estimated class labels with the winner-take-all method (Peng et al., 2002).

8-Nearest Centroid

A classification model that assigns to each observation the label of the class whose centroid (the mean of its training samples) is nearest to that observation (Kiranmayee et al., 2012).

Dataset Description

The dataset contains 500 records with 10 features for each student. Table 1 shows the feature name, values, and description.


Table 1. Dataset features description.

| # | Feature | Values | Description |
|---|---------|--------|-------------|
| 1 | Gender | M/F | Student's gender |
| 2 | Level | L-01, L-02, L-03, L-04 | Level the student belongs to |
| 3 | Section | A, B, C | Section the student belongs to |
| 4 | Raised Hands | integer (0–100) | Number of times the student requested to speak in the lecture |
| 5 | Visited_Resources | integer (0–100) | Number of times the student accessed the contents of a course |
| 6 | Announcements_View | integer (0–100) | Number of times the student checked new announcements |
| 7 | Discussion | integer (0–100) | Number of times the student participated in discussion groups |
| 8 | Acadimc_Adivsor_Satisfaction | good, bad | Degree of academic advisor satisfaction with the student's performance |
| 9 | Absence_Days | above-7, under-7 | Number of days the student was absent |
| 10 | Topic | English, Programmimg, Calculus, Arabic, IT, Math, Chemistry, Biology, Physics, History, Quran, Geology | Course topic |


The target feature is the student level (class). Table 2 shows the distribution over the levels High (H), Medium (M), and Low (L).


Table 2. Distribution of students to levels

| Level | Frequency | Percent |
|-------|-----------|---------|
| High | 152 | 30.4 |
| Low | 127 | 25.4 |
| Medium | 221 | 44.2 |
| Total | 500 | 100.0 |

Methodology

We worked on methods to enhance the accuracy of ML classification methods for student performance. The performance of the classifiers was tested on all attributes and on selected features separately, to compare the achieved accuracy. We identify the important features and improve the efficiency of the classification process. Figure 1 shows the framework for predicting student performance.

The classification technique is applied to predict the performance of students, as classification is a tool commonly used for prediction. The classifiers used in this paper are based on the algorithms commonly used in the literature.

Data Transformation

Data transformation is a critical step for eliminating inconsistencies in the dataset, making it more appropriate for data mining (Osborne, 2003).

Convert strings to numeric variables: most data mining algorithms work only on numeric variables, so non-numerical data must be converted into numerical variables. The most common method is to encode each string with a value between 0 and N-1, where N is the number of distinct values. For example, the gender feature (F/M) is encoded as 0 and 1.
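Assuming scikit-learn (whose classifier names the paper's list matches), this encoding can be sketched with `LabelEncoder`; the values below are illustrative, not drawn from the actual dataset:

```python
from sklearn.preprocessing import LabelEncoder

# Encode a string feature into integers in [0, N-1]
gender = ["M", "F", "F", "M", "F"]     # illustrative values for the Gender feature
encoder = LabelEncoder()
encoded = encoder.fit_transform(gender)
# Classes are sorted alphabetically, so F -> 0 and M -> 1
print(list(encoder.classes_))  # ['F', 'M']
print(list(encoded))           # [1, 0, 0, 1, 0]
```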

Data Partitioning

Data partitioning divides the dataset into two parts: training data and test data. The training data represent the greater portion of the dataset and are used to fit the classifier, while the test data are used to assess the classifier's output (Han et al., 2012). In our experiments we use 80% for training and 20% for testing.
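A sketch of the 80/20 split, assuming scikit-learn's `train_test_split` and stand-in data of the same size as the paper's dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(500).reshape(500, 1)               # stand-in for the 500 student records
y = np.random.RandomState(0).randint(0, 3, 500)  # stand-in H/M/L class labels

# 80% of the records train the classifier, 20% are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 400 100
```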

Performance Evaluation

In our experiments, the performance of the classification algorithms is determined using four standard evaluation metrics, namely accuracy, recall, precision, and F-score, defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F-score = 2 × (Precision × Recall) / (Precision + Recall)

where TP, TN, FP, and FN refer to True Positives, True Negatives, False Positives, and False Negatives, respectively (Alyahyan & Düştegör, 2020).
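Assuming scikit-learn, the four metrics can be computed as follows; the labels below are invented, and weighted averaging is one common choice for a three-class target like H/M/L:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy multi-class labels; weighted averaging combines per-class scores
# in proportion to class frequency
y_true = ["H", "M", "M", "L", "H", "M"]
y_pred = ["H", "M", "L", "L", "M", "M"]

acc  = accuracy_score(y_true, y_pred)                                      # 4/6 correct
prec = precision_score(y_true, y_pred, average="weighted", zero_division=0)
rec  = recall_score(y_true, y_pred, average="weighted", zero_division=0)
f1   = f1_score(y_true, y_pred, average="weighted", zero_division=0)
```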

Test Classifiers

To determine the best classifier for predicting student performance on the dataset, we examine ten machine learning classifiers, namely: Logistic Regression, Decision Tree, Random Forest, SGD Classifier, Multinomial NB, K-Neighbors Classifier, Ridge Classifier, Nearest Centroid, Complement NB, and Bernoulli NB; the results are shown in Table 3.

Table 3. Classifiers performance comparisons

| # | Classifier | Accuracy | F1_score | Recall | Precision |
|---|------------|----------|----------|--------|-----------|
| 1 | Logistic Regression | 0.719 | 0.716 | 0.719 | 0.726 |
| 2 | Decision Tree | 0.781 | 0.781 | 0.781 | 0.786 |
| 3 | Random Forest | 0.844 | 0.842 | 0.844 | 0.842 |
| 4 | SGD Classifier | 0.573 | 0.478 | 0.573 | 0.492 |
| 5 | Multinomial NB | 0.583 | 0.580 | 0.583 | 0.591 |
| 6 | K-Neighbors Classifier | 0.646 | 0.642 | 0.646 | 0.656 |
| 7 | Ridge Classifier | 0.677 | 0.667 | 0.677 | 0.686 |
| 8 | Nearest Centroid | 0.677 | 0.660 | 0.677 | 0.747 |
| 9 | Complement NB | 0.438 | 0.334 | 0.438 | 0.709 |
| 10 | Bernoulli NB | 0.667 | 0.666 | 0.667 | 0.668 |

From the previous table, Random Forest gives the best performance, with accuracy, F1_score, recall, and precision equal to 0.844, 0.842, 0.844, and 0.842, respectively.
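A comparison loop of this kind could be sketched as follows, assuming scikit-learn and synthetic stand-in data (the scores it produces will not match Table 3):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, RidgeClassifier, SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.naive_bayes import MultinomialNB, ComplementNB, BernoulliNB
from sklearn.metrics import accuracy_score

# Synthetic three-class data stands in for the 500-student dataset
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X = X - X.min()  # shift to non-negative: Multinomial/Complement NB require it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SGD Classifier": SGDClassifier(random_state=0),
    "Multinomial NB": MultinomialNB(),
    "K-Neighbors Classifier": KNeighborsClassifier(),
    "Ridge Classifier": RidgeClassifier(),
    "Nearest Centroid": NearestCentroid(),
    "Complement NB": ComplementNB(),
    "Bernoulli NB": BernoulliNB(),
}
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                             # train on the 80% split
    scores[name] = accuracy_score(y_test, clf.predict(X_test))  # test on the 20%
```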

Parameter Tuning

Parameter tuning is important because default values may not be suitable for every task and do not always produce the best results (Smit & Eiben, 2009), so we apply parameter tuning to the classifier with the best performance in the previous experiment (i.e., Random Forest). The Random Forest hyperparameters we tune are n_estimators, min_samples_split, min_samples_leaf, max_depth, and random_state.

We used the grid search technique, which tests a collection of hyperparameter values to determine the best ones for a given task based on validation accuracy. This method is more computationally expensive than simply using the model's default parameter values. Table 4 and Fig. 1 show the effect of hyperparameter tuning on prediction performance; it is clear that tuning improves the classifier's performance.

Table 4. Performance comparison after tuning the parameters

| Measure | Default parameters | Tuned parameters |
|---------|--------------------|------------------|
| Accuracy | 0.844 | 0.864 |
| F1_score | 0.842 | 0.862 |
| Recall | 0.844 | 0.864 |
| Precision | 0.842 | 0.863 |

It is clear that the hyperparameter tuning improves the prediction performance.

The best hyperparameter values were as follows:

| n_estimators | min_samples_split | min_samples_leaf | max_depth | random_state |
|--------------|-------------------|------------------|-----------|--------------|
| 300 | 2 | 1 | None | 0 |
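A grid search over these hyperparameters could be sketched with scikit-learn's `GridSearchCV`; the grid and data below are illustrative, not the paper's actual search space:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Synthetic stand-in data
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Small illustrative grid; the paper's best values were n_estimators=300,
# min_samples_split=2, min_samples_leaf=1, max_depth=None
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_split": [2, 4],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)  # exhaustively evaluates every combination by cross-validation
print(search.best_params_, round(search.best_score_, 3))
```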

Feature Importance

As mentioned earlier, the dataset contains 10 features, but are all features equally influential in the prediction process? To find out, we computed the importance of each feature using the Random Forest classifier. Table 5 shows the scores of the top four features (score > 0.1), and Fig. 2 shows the weight of each feature in the dataset in descending order.

Table 5. Top 4 feature scores

| Feature | Score |
|---------|-------|
| Visited Resources | 0.28234 |
| Student Absence Days | 0.21433 |
| Raised Hands | 0.17795 |
| Viewing Announcements | 0.11615 |

From Fig. 2 we can see that visited resources (how many times a student visits the contents of a course) is the feature with the greatest effect on student performance, followed by student absence days (the total number of days the student was absent). The third feature in terms of importance is raised hands (the number of times the student requested to speak in the lecture), and the fourth is announcements view (the number of times the student checked new announcements). The features that affect students' performance are thus related to students' activities, their follow-up of academic content, and their regularity in attendance.
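Feature importances of the kind shown in Table 5 and Fig. 2 can be extracted from a fitted Random Forest; the data here are synthetic, so the scores will differ from the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in data with 10 features, like the student dataset
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort descending as in Fig. 2
order = np.argsort(rf.feature_importances_)[::-1]
for i in order[:4]:
    print(f"feature {i}: {rf.feature_importances_[i]:.3f}")
```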

Conclusion

Student performance prediction can help an educational institution take timely actions, such as planning appropriate training to increase students' success rate. Analyzing educational data can help realize the desired educational goals. By applying data mining techniques, prediction models can be built to enhance student performance. In this paper, we collected a dataset representing a sample of students to study the possibility of predicting student performance. We applied data mining techniques to the dataset and tested ten classification algorithms, of which Random Forest gave the best results with an accuracy of 0.844. We applied parameter tuning to obtain the parameter values that give the best results; indeed, the accuracy improved to 0.864. We then extracted the most important features affecting students' performance: visited resources, absence days, raised hands, and announcements view.

Through the predictive model that has been built, the performance and success of any student can be anticipated before taking the test, revealing whether the student's performance during the academic semester will ultimately lead to success, and thus allowing any defect in the student's performance to be corrected, or the student's academic level to be improved, before failure occurs.

Declarations

Funding:  Not applicable. 

Availability of data and material: https://github.com/tarekhemdan/Student_DataSet/blob/main/Student_dataset.csv

Disclosure of potential Conflict of Interest: The authors declare that they have no conflict of interest.

Ethical Statement: “All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.”

Consent Statement: “Informed consent was obtained from all individual participants included in the study.”

References

Alyahyan, E., & Düştegör, D. (2020). Predicting academic success in higher education: literature review and best practices. In International Journal of Educational Technology in Higher Education (Vol. 17, Issue 1). https://doi.org/10.1186/s41239-020-0177-7

Amrieh, E. A., Hamtini, T., & Aljarah, I. (2016). Mining educational data to predict student's academic performance using ensemble methods. International Journal of Database Theory and Application, 9(8), 119–136.

Athani, S. S., Kodli, S. A., Banavasi, M. N., & Hiremath, P. G. S. (2017). Student performance predictor using multiclass support vector classification algorithm. 2017 International Conference on Signal Processing and Communication (ICSPC), 341–346.

Bajpai, P., Chaturvedi, R., & Singh, A. (2019). Conjecture of Scholars Academic Performance using Machine Learning Techniques. 2019 International Conference on Cutting-Edge Technologies in Engineering (ICon-CuTE), 141–146.

Breiman, L. (2001). Random Forests. Mach. Learn., 45, 5–32.

Ghorbani, R., & Ghousi, R. (2020). Comparing Different Resampling Methods in Predicting Students Performance Using Machine Learning Techniques. IEEE Access, 8, 67899–67911.

Han, J., Kamber, M., & Pei, J. (2012). 3 - Data Preprocessing. In J. Han, M. Kamber, & J. Pei (Eds.), Data Mining (Third Edition) (Third Edit, pp. 83–124). Morgan Kaufmann. https://doi.org/https://doi.org/10.1016/B978-0-12-381479-1.00003-4

Hussain, M., Zhu, W., Zhang, W., Ni, J., Khan, Z. U., & Hussain, S. (2018). Identifying beneficial sessions in an e-learning system using machine learning techniques. 2018 IEEE Conference on Big Data and Analytics (ICBDA), 123–128.

Jayaprakash, S., Krishnan, S., & Jaiganesh, V. (2020). Predicting Students Academic Performance using an Improved Random Forest Classifier. 2020 International Conference on Emerging Smart Computing and Informatics (ESCI), 238–243.

Kime, K., Hickey, T., & Torrey, R. (2019). Refining Skill Classification with Interactive Machine Learning. 2019 IEEE Frontiers in Education Conference (FIE), 1–8.

Kiranmayee, A. H., Panchariya, P. C., Prasad, P. B., & Sharma, A. L. (2012). Biomimetic classification of juices. 2012 Sixth International Conference on Sensing Technology (ICST), 551–556.

Ko, C.-Y., & Leu, F.-Y. (2020). Examining Successful Attributes for Undergraduate Students by Applying Machine Learning Techniques. IEEE Transactions on Education.

Osborne, J. W. (2003). Notes on the use of data transformations. Practical Assessment, Research and Evaluation, 8(6).

Peng, C.-Y. J., Lee, K. L., & Ingersoll, G. M. (2002). An introduction to logistic regression analysis and reporting. The Journal of Educational Research, 96(1), 3–14.

Sathyadevan, S., & Chaitra, M. A. (2015). Airfoil self noise prediction using linear regression approach. In Computational Intelligence in Data Mining-Volume 2 (pp. 551–561). Springer.

Sharma, H., & Kumar, S. (2016). A survey on decision tree algorithms of classification in data mining. International Journal of Science and Research (IJSR), 5(4), 2094–2097.

Smit, S. K., & Eiben, A. E. (2009). Comparing parameter tuning methods for evolutionary algorithms. 2009 IEEE Congress on Evolutionary Computation, 399–406.

Song, Y., Huang, J., Zhou, D., Zha, H., & Giles, C. L. (2007). Iknn: Informative k-nearest neighbor pattern classification. European Conference on Principles of Data Mining and Knowledge Discovery, 248–264.

Sorour, S. E., Mine, T., Godaz, K., & Hirokawax, S. (2014). Comments data mining for evaluating student’s performance. 2014 IIAI 3rd International Conference on Advanced Applied Informatics, 25–30.

Wu, W., Nagarajan, S., & Chen, Z. (2015). Bayesian machine learning: EEG/MEG signal processing measurements. IEEE Signal Processing Magazine, 33(1), 14–36.