Student’s Academic Performance Prediction – A Review

—Providing quality education to students is the main objective of higher education institutions. Identifying students with weak performance has become a pressing problem, and most teachers still rely on averaging exam grades. The main objective of our project is to predict and identify the students who might fail their semester examinations, which would help teachers provide additional assistance to such students. This review examines how different researchers have approached this problem, what the outcomes of their studies are, and how those findings can help us improve students' performance. The review shows that machine learning techniques such as classification and clustering are useful for these predictions, but there is still much work to be done with this technology.


A. Background Information
Prediction by analogy is so pervasive that we normally do not notice it. Higher education institutions always aim to provide the best education and tutoring to their students. The alarming increase in drop-out rates at many institutions has created a need to identify students with weak performance in their courses. Such prediction would help teachers provide additional assistance to weak students and also encourage the dedicated ones. Student data is collected and utilized to meet students' needs; other approaches fail to notice students' performance patterns over the course of successive semesters.
Machine learning algorithms have proven to be a helpful tool in predicting students' performance from various factors and in foreseeing poor performance over the course of the semesters. At-risk students can be detected using their demographic data, and applying data mining algorithms to these datasets could benefit all participants in educational institutions. Some studies report the Naïve Bayes classifier as the most accurate machine learning algorithm for such prediction. The variables used for prediction include academic achievements from high school, entrance exam results and attitude toward studying, including marks obtained in assignments; social and school-related features have also been used. An incremental approach to machine learning is important for real-world prediction because the trained system must be updated as new data arrives, so that previously unlearned knowledge can be put to use. It is well established that student academic achievement is highly influenced by past grades and scores, but other relevant variables also contribute to accurate predictions.

B. Problem Statement
To predict and identify students' academic performance in order to guide them toward better results and provide quality education.

C. Problem Background
The main objective of higher education institutions is to provide quality education to their students. A good prediction of students' performance helps identify low-performing students early. The intention behind this research is the identification and extraction of knowledge for foreseeing poor and good performance.

A. Machine Learning: Classification, Prediction and Clustering
Universities have their own management systems for students and their grades. By applying data science techniques to that data, we can classify students by their grades, and a decision tree is a good classifier for achieving this goal [1].
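As a minimal sketch of this idea, a decision tree can be fitted to student grade records as follows; the grades, pass/fail labels and feature choice are hypothetical illustrations, not data from [1]:

```python
# Sketch: classifying students from hypothetical grade features with a
# decision tree. Features: [assignment average, midterm score]; label: pass=1.
from sklearn.tree import DecisionTreeClassifier

X = [[85, 80], [90, 88], [70, 65], [40, 35], [30, 45], [55, 20]]
y = [1, 1, 1, 0, 0, 0]  # 1 = pass, 0 = fail

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
prediction = clf.predict([[78, 72]])[0]  # classify a new student's record
```

On data this cleanly separated, a single split suffices and the new student is classified with the passing group.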
We can predict improvements in students' performance and group students into various categories to recommend future options. Clustering techniques such as K-means are fruitful for categorizing students on the basis of their performance, as discussed in [3].
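The K-means grouping described above can be sketched as follows; the grade values and the choice of three clusters are illustrative assumptions, not taken from [3]:

```python
# Sketch: grouping students into performance clusters with K-means.
# Each row: [average quiz score, average exam score] for one student (0-100).
import numpy as np
from sklearn.cluster import KMeans

grades = np.array([
    [92, 88], [85, 90], [78, 75],   # strong performers
    [55, 60], [48, 52], [60, 58],   # average performers
    [25, 30], [35, 28], [20, 40],   # weak performers
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(grades)
labels = kmeans.labels_  # cluster index assigned to each student
```

Students whose grade patterns are similar end up in the same cluster, which is the basis for the categorization discussed in the study.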
In another study, the authors stated that the difficult part of predicting students' performance comes when we have features such as students' behavior and attitude toward their studies, for which vast amounts of data and advanced data mining tools are required to analyze, predict and visualize the patterns in the data. Students' behavioral issues can be addressed using data mining techniques and tools such as clustering, association rules and decision trees [4].
Students' academic performance predictions change significantly with the feature selection algorithm used, as concluded by Samuel M., Nor Bahiah A. and Siti Mariyam S. (2019). Several classification algorithms, such as K-nearest neighbors, Naïve Bayes, decision trees and discriminant analysis, are paired with these feature selection algorithms. Differential evolution is proposed as the selection algorithm that predicts students' performance with better accuracy than the other feature selection algorithms [5]. Hashmia Hamsa, Simi Indiradevi and Jubilant J. Kizhakkethottam used decision trees with a fuzzy genetic algorithm for the same purpose [25].
According to Ihsan A. Abu Amra and Ashraf Y. A. Maghari (2018), classifying students and predicting their performance can also help the local ministry of education take appropriate actions by foreseeing students' performance, and can help teachers evaluate students and focus on the areas that need attention. K-nearest neighbor and Naïve Bayes classifiers are the techniques used in this study [6]. KNN and SVM were also used by Huda Al-Shehri, Amani Al-Qarni, Leena Al-Saati, Arwa Batoaq, Haifa Badukhen, Saleh Alrashed, Jamal Alhiyafi and Sunday O. Olatunji to determine student performance [22].
We can also use data mining to understand students' learning process through learning analytics. Learning analytics and EDM are useful for identifying the settings in which students learn, in order to improve their performance and learning outcomes, as discussed by Cristobal Romero and Sebastian Ventura (2013) [7].
According to a study (Adam A. & Mitch R., 2012), degree completion time is a major factor in determining the performance of students. The authors demonstrated that using Bayesian networks with a prerequisite graph yields better predictions of student performance, and that data on graduated students and their completion times can be used to train a model that predicts struggling students, their performance and their degree completion time [11]. Yu Su, Qingwen Liu, Qi Liu, Zhenya Huang, Yu Yin, Enhong Chen, Chris Ding, Si Wei and Guoping Hu proposed a solution based on a relational neural network using a bidirectional LSTM that takes both a student's exercising records and the text of each exercise to predict performance [19]. LSTM, used by Boran Sekeroglu, Kamil Dimililer and Kubra Tuncal, is a similar technique but is unidirectional, unlike BLSTM [27].
Different machine learning algorithms are used for predicting students' performance in the articles discussed above, but this is not always easy. We often get data with imbalance among the classes and observations, which can cause a significant decline in accuracy rates. The research done by S. Tanveer J., Raisul Islam R., Naheena H. and Rashedur M. R. (2015) addresses this issue. If the dataset is imbalanced, the synthetic minority over-sampling technique (SMOTE) can be used; it can improve accuracy and predictions by generating synthetic observations for the minority class until it is balanced with the majority class [14].
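A minimal sketch of SMOTE-style oversampling follows, assuming a hypothetical two-feature minority class ("fail" students). This hand-rolled interpolation illustrates the idea only; the study in [14] uses the full SMOTE algorithm on its own data:

```python
# Sketch of SMOTE-style oversampling: synthesize new minority-class samples
# by interpolating between a minority sample and one of its nearest minority
# neighbours. All data and parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def smote_oversample(X_min, n_new, k=3):
    """Generate n_new synthetic samples from minority-class matrix X_min."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)  # distances to sample i
        neighbours = np.argsort(d)[1:k + 1]           # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                            # interpolation factor
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Hypothetical imbalance: many "pass" students but only 4 "fail" students
X_fail = np.array([[30.0, 35.0], [28.0, 40.0], [35.0, 30.0], [25.0, 38.0]])
X_new = smote_oversample(X_fail, n_new=16)  # synthetic "fail" samples
```

Each synthetic sample lies between two real minority samples, so the minority class is enlarged without simply duplicating observations.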
Prediction becomes easier when we have large amounts of data to train and then evaluate the models: the more data we have, the more accurate the results. In contrast, there are situations in which no such large dataset is available, which makes it harder for the models to predict performance accurately. Taking this issue into consideration, Lubna Mahmoud Abu Zohair (2019) conducted research on a small dataset and investigated the feasibility of training and modeling it; different models were trained and evaluated to find out whether an accurate prediction model is possible [15]. Farshid Marbouti, Heidi A. Diefes-Dux and Krishna Madhavan used an early prediction technique to identify students who are below average and at risk [20]. Jesus Silva, Karina Rojas, Alexa Senior Naveda, Rosio Barrios, Carlos Vargas Mercado and Claudia Medina's model is based on assembling multiple classifiers to determine the academic profile of students; in other words, an ensembling technique that combines multiple classifications to decide the final outcome [21]. Mrinal Pandey and S. Taruna also used ensembling techniques on multiple classifiers to predict student performance [26].
In the research done by Haotian Li, Huan Wei, Yong Wang, Yangqiu Song and Huamin Qu, online learning is considered, and performance prediction is used to show the impact of online learning on dropout rates, which can be helpful in implementing adaptive online learning [28].
Feature selection plays a very important role in the final prediction. The work done by Havan Agrawal and Harshil Mavani shows the significance of different features and their impact on the final outcome; the model used in this approach is a neural network [29].
Association rule mining, an unsupervised machine learning technique, is used by Anwar Ali Yahya and Addin Osman to find patterns in academic data and propose a program design. However, the study is more relevant to academic program design and assessment than to grade prediction [23].
J.-P. Vandamme, N. Meskens and J.-F. Superby used discriminant analysis, neural networks, random forests and decision trees to identify students at risk of dropping out during the first year of their degree [31].

B. Statistical Analysis and Matrix Factorization
We can also use statistical analysis to make predictions about students' performance. The significant relationship between scores in different courses can be verified by applying statistical analysis (a paired t-test) to students' examination scores in different subjects, as discussed by Mazwin T., Halina H. and Nurul N. (2012) [2].
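The paired t-test idea can be illustrated as follows; the score lists are hypothetical, not the examination data from [2]:

```python
# Sketch: paired t-test on the same students' scores in two related subjects.
# A small p-value suggests the score difference is statistically significant.
from scipy import stats

math_scores    = [72, 65, 80, 58, 90, 77, 63, 85]   # hypothetical scores
physics_scores = [70, 60, 78, 55, 88, 75, 60, 82]   # same 8 students

t_stat, p_value = stats.ttest_rel(math_scores, physics_scores)
```

Because each physics score is consistently a few points below the paired math score, the test reports a significant difference for this illustrative data.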
In another study (Asmaa El. & George K., 2016), the authors took domain knowledge into account. They selected certain features from the data to characterize the students and courses, then converted them into predictions using matrix factorization and popularity-based ranking approaches. This approach is also useful for predicting the top-n course rankings for a student [12].
According to the study conducted by Agoritsa P. and George K. (2016), sparse linear and low-rank matrix factorization methods are applied to predict performance. These predictions are specific to students and courses; the methods identify predictive subsets of preceding courses on a course-by-course basis [13]. Aderibigbe Israel Adekitan and Odunayo Salau used linear and quadratic regression models to predict the final graduation grade, using the results of the first three years as prior knowledge [18].
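A minimal sketch of the low-rank factorization idea on a hypothetical student-by-course grade matrix follows; it uses a plain truncated SVD, not the exact sparse or regularized methods of [12] and [13]:

```python
# Sketch: approximate a student-by-course grade matrix with a small number
# of latent factors (rank-k truncated SVD). All grades are hypothetical.
import numpy as np

# rows = students, columns = courses (grades on a 0-100 scale)
G = np.array([
    [90.0, 85.0, 88.0],
    [60.0, 55.0, 58.0],
    [75.0, 70.0, 72.0],
    [40.0, 35.0, 38.0],
])

U, s, Vt = np.linalg.svd(G, full_matrices=False)
k = 1                                            # number of latent factors
G_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k reconstruction
```

Because the hypothetical grades follow one dominant pattern (stronger students score higher in every course), even a single latent factor reconstructs the matrix closely, which is the intuition behind grade prediction via matrix factorization.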

C. Visualization
Using visualization tools is also a good approach for finding patterns and cause-effect relations in data. A particularly effective approach is mapping students' performance in prerequisites to their performance in later courses. In one study, the authors conducted two surveys at different times of the semester and used graphs and plots to find students' course-chain performance patterns [8].
In another study, the authors found that visualizing trends and events such as course repetition, failure and success is helpful in isolating the causes of dropouts in universities and of students leaving a major [9].

D. Presentation
Much research has been done on the subject, but very few authors have thought of presenting a tool that can help students improve their performance. A grade predictor tool is useful for students to explore their options and pathways. In one study, the authors described the usefulness of the GradeCraft tool and its usability for students in planning their coursework and future courses [10].
The Students' Performance Analysis System (SPAS), provided by Chew Li S., Dayang H., Emmy D. and Mohammad H. (2014), likewise presents their analysis and can assist instructors in analyzing student performance [1]. SPRAR is also a tool, one that uses rule-based association mining for predictions [24], while GritNet, proposed by Byung-Hak Kim, Ethan Vizitei and Varun Ganapathi, is based on deep learning [16].
Mack Sweeney, Huzefa Rangwala, Jaime Lester and Aditya Johri suggested a recommender system that predicts next-semester results by taking the previous semester's grades as input [30]. Another system, proposed by Mudasir Ashraf, Majid Zaman and Muheet Ahmed, makes predictions based on ensembling and filtering techniques [17].

III. RESEARCH METHODS
In this section, we discuss the approach we have taken to conduct this research, and then see how different researchers have approached the same problem.

A. Literature Review: Methodology
To conduct a systematic literature review, a plan is a must. The first part of the review is formulating the problem, which has already been discussed in Section I. The second part is searching for and collecting the research papers to review; the third is reviewing the papers; and the last is the analysis and classification of the papers.
We have collected existing literature relevant to our subject from many different sources, of which the three most used are:

1) IEEE Xplore 2) ACM Digital Library 3) SpringerLink
The articles are then divided into different categories according to the phases and techniques of data science lifecycle and then they are reviewed to get more insights about the tools and techniques used in the research process to predict the performance of students.

B. Methods and Techniques used by Researchers
The papers discussed use various techniques to predict, classify, visualize and present students' performance in different settings. In most of the studies, the data is collected from a university database. Table I shows the techniques used in each study.
In the above studies, the most used technique is data mining. Fig. 1, given by Cristobal R. and Sebastian V. (2012) [7], shows how EDM works for predicting student performance. In EDM we generally formulate the hypothesis first; raw data is then collected from educational institutions and transformed into the required form. The data is analyzed and converted into a form the model accepts. The model is trained with around 60%-80% of the data, and the remaining data is kept for testing and validating the model. The model generates results, which are then evaluated and refined; it is an iterative process.

The decision tree classifier is a graph-based model that classifies objects on the basis of decisions taken considering chances, costs, event outcomes and other conditions. It has been used as frequently as Naïve Bayes. Another technique used in the papers is clustering, one of the most used methods in data science. In this approach we group the data points (students in this case) into clusters: sets of points with similar patterns. Each student is placed in the cluster whose attributes and behaviors are closest to their own [3][4][7]. Association rule mining is also useful for finding hidden patterns in grade data and the relationships between them [23][24].
For classification and clustering, an important factor is the train-test split. Typically we use more than half of the data for training the model and the remaining data to validate its accuracy. There are many ways to split the data for training and testing. In the 70-30 (hold-out) split, we take 70% of the observations as training data and the remaining 30% to test the model. In K-fold cross-validation, we divide the data into K splits, use one split as testing data and the rest as training data; the validation is repeated with each fold serving as the testing data, as shown in Fig. 2. Leave-one-out cross-validation is K-fold cross-validation where K equals the sample size, meaning each item is tested against a model trained on all the other items.

Figure 2: K Fold Cross Validation
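The hold-out and K-fold schemes described above can be sketched as follows, using a hypothetical dataset and an arbitrary choice of classifier (KNN) for illustration:

```python
# Sketch: 70-30 hold-out split and 5-fold cross-validation on hypothetical
# student data. The features, labels and classifier are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((100, 4))                     # 100 students, 4 features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)    # hypothetical pass/fail label

# 70-30 hold-out: train on 70%, test on the held-out 30%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
holdout_acc = model.score(X_test, y_test)

# 5-fold cross-validation: every observation is tested exactly once
cv_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=3), X, y, cv=KFold(n_splits=5))
```

Setting `n_splits` equal to the number of samples would turn the same loop into leave-one-out cross-validation.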
Graph-based analysis helps in finding the cause-effect structure of situations, and it is used by a few researchers in their studies [8][9][11]. In one study, regression is used to analyze performance from two different angles: course-specific regression and student-specific regression. The course-specific analysis is restricted to the courses themselves, while the other explores course-student tuples [13]. In another study, linear regression is used for early predictions built upon a bidirectional long short-term memory (BLSTM) network [16]. The quadratic regression model used by Aderibigbe Israel Adekitan and Odunayo Salau achieved 89.15% accuracy with an R² value between 0.955 and 0.957 [18].
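As an illustration of the quadratic regression idea, a final grade can be fitted as a quadratic function of an earlier-year average; the CGPA values below are hypothetical, not the data of [18]:

```python
# Sketch: quadratic regression of final CGPA on early-year CGPA.
# All numbers are hypothetical illustration data.
import numpy as np

early_cgpa = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5])  # first-years average
final_cgpa = np.array([2.1, 2.7, 3.1, 3.6, 4.0, 4.4])  # graduation CGPA

coeffs = np.polyfit(early_cgpa, final_cgpa, deg=2)      # fit a quadratic
predicted = np.polyval(coeffs, 3.2)                     # predict for a student
```

For this near-linear toy data the quadratic term is small, but the same fitting call captures curvature whenever the relationship between early and final grades is nonlinear.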
For analyzing the data for students' academic performance prediction, different authors have used different techniques; classifiers and machine learning algorithms are favorites among many researchers. The accuracy score is a very widely used evaluation measure in data mining and data science: we compare the results from our model with the actual results, and the proportion of correct predictions, expressed as a percentage, is the accuracy of the model. In Fig. 4, the best accuracy from each relevant study is plotted to show how accurate the results are. According to this figure, Naïve Bayes has the best accuracy, at 94%.
KNN is also a good classifier, which finds the nearest neighbors of a data point and categorizes it on the basis of those neighbors [5]. The selection of features used to train the model significantly affects its outcomes. The study conducted by Samuel M., Nor Bahiah A. and Siti Mariyam S. (2019) [5] shows that the choice of features and algorithms affects the accuracy and performance of the classifier. Table II shows the accuracy of different feature selection algorithms against the classifiers from that study; according to this table, differential evolution is the best feature selection algorithm and KNN is the best classifier.
One more interesting analysis, done by Lubna M. (2019), relates to small datasets. This study is a comparative analysis of predicting students' performance when the data is not large. Table III shows the comparison of the classifiers; SVM performed best in terms of accuracy on the small dataset [15].
In the study conducted by Asmaa El. and George K. (2016), matrix factorization has improved the results and student groups defined using the majors and academic level gave the best top-n ranking. The detailed results are discussed in their article [12].
Analyzing the performance of a classifier or regressor is important for researchers to draw conclusions. We must know different metrics of the classification to decide whether a classifier is good or not. The confusion matrix is a table that gives us those metrics: true positives, false positives, true negatives and false negatives. True positives are positive observations that were predicted correctly; true negatives are negative values predicted accurately; false positives and false negatives are values that were not predicted accurately.
From the matrix in Table IV, we can calculate the following metrics for measuring the performance of the classifiers.
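As an illustration, the standard metrics can be computed from a confusion matrix as follows; the counts below are invented for the example, not taken from Table IV:

```python
# Sketch: computing accuracy, precision, recall and F1 from a hypothetical
# 2x2 confusion matrix (counts chosen for illustration only).
tp, fp, fn, tn = 40, 5, 10, 45               # true/false positives/negatives

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # share of correct predictions
precision = tp / (tp + fp)                   # of predicted positives, correct
recall    = tp / (tp + fn)                   # of actual positives, found
f1        = 2 * precision * recall / (precision + recall)
```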
Accuracy is the rate of correctly predicted classes.

This review has shown that there are several points in the data science life cycle where the selection of techniques and approach can significantly change the outcomes of a study. Educational data mining requires considerable effort at every stage: it starts with collecting the data, cleaning it and making it ready to process; it goes through feature selection algorithms and exploratory analysis; the data is then passed through classifiers, clustering algorithms and other machine learning techniques for training; different types of cross-validation are needed to test the models; and finally the outcomes and results can be presented. In some rare cases, these results are turned into tools.
Data collection has a great impact on the final results of a study. If we have a good dataset, or in other words, data more relevant to our study, we get better insights and better predictions. As we have already seen, accuracy can suffer because of problems such as incorrect, inconsistent or imbalanced data, so it is necessary to clean the data and make it clear enough for the model. For feature selection we can use several methods, such as singular value decomposition, Cholesky decomposition and principal component analysis; these methods identify the features that are more relevant to the study and have more impact on the final results.
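A brief sketch of principal component analysis, one of the methods mentioned above, on hypothetical student feature vectors; the data and number of retained components are illustrative assumptions:

```python
# Sketch: PCA-based dimensionality reduction of hypothetical student features.
# The first feature dominates and the second is correlated with it, so most
# of the variance is captured by very few components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((50, 6))                     # 50 students, 6 features (noise)
X[:, 0] = np.linspace(0, 100, 50)           # dominant feature (e.g. a score)
X[:, 1] = X[:, 0] * 0.5 + rng.random(50)    # correlated with the first

pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)                # keep the 2 strongest components
explained = pca.explained_variance_ratio_.sum()
```

Only the most informative directions are kept, which is how such decompositions reduce a study to its most relevant features.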
Ensembling and filtering techniques are also well suited to educational data mining: multiple models are used, and their predictions are combined into a final model [17][26].
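A minimal sketch of a voting ensemble in this spirit follows, using synthetic data and an arbitrary choice of base classifiers (not the exact ensembles of [17] or [26]):

```python
# Sketch: combine several classifiers by majority vote on synthetic data.
# The dataset and the choice of base models are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

ensemble = VotingClassifier(estimators=[
    ("nb", GaussianNB()),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])  # default hard voting: the majority class among the three models wins
ensemble.fit(X_tr, y_tr)
ensemble_acc = ensemble.score(X_te, y_te)
```

The vote smooths over the individual models' errors, which is the rationale for ensembling in the cited studies.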

FUTURE WORK
The future work includes possible transformations of the dataset and obtaining more information about students' performance, extracurricular activities and so on, in order to achieve more accurate results. This will lead to extracting more meaningful information from the available data and providing reliable predictions. Another important point is the implementation of tools for real-world users; very few tools are available to the general public. Dynamic data loading can also improve the accuracy and results of the models, because new data arrives continuously, bringing new observations and better insights. We can also extend this work by adding techniques that merge the results from different classifiers. Ensemble classifiers such as AdaBoost (adaptive boosting) and random forest are examples where predictions from different machine learning models are combined to find the best prediction. This is a promising subject for researchers, and many new things can be discovered on this topic.