This section outlines the methodology used to carry out the prediction task in this study. The methodology comprises three primary steps: the construction of the dataset, the application of preprocessing methods, and the implementation of artificial intelligence techniques, specifically machine learning and deep learning. Figure 1 presents the step-by-step AI workflow conducted in this study.
3.1. Dataset Description and Merging Process
We utilized a dataset that was constructed by merging three publicly available datasets related to university student performance:
Dataset 1: Student Performance Prediction Dataset
Dataset 2: Students' Adaptability Level in Online Education
Dataset 3: xAPI-Edu-Data
Additionally, we obtained a dataset from the Kaggle website that specifically focuses on the academic accomplishments of college students in the Middle East. This dataset consists of 34 features encompassing a range of demographic, academic, and behavioral attributes.
The merging process involved organizing the content of each dataset into a single Excel file, ensuring that each dataset retained its distinct features and instances. Given that each dataset varied in the number of features and instances, we adopted a systematic approach to integrate them cohesively.
Firstly, Dataset 1, comprising 15 features, was placed in the initial columns of the Excel file. Following this, Dataset 2 was appended, starting from a new set of columns to avoid overlapping with the first dataset’s columns. Finally, Dataset 3 was added, ensuring that its features and instances followed sequentially from where Dataset 2 ended.
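To make the merging procedure concrete, the sketch below shows how such a block-wise combination could be performed with pandas. The file names and data frame variables are hypothetical; because the three datasets have non-overlapping column sets, each dataset's rows receive missing values in the columns belonging to the other two:

```python
import pandas as pd

# Hypothetical file names; the actual sources are the three Kaggle datasets.
df1 = pd.read_excel("dataset1.xlsx")  # 15 features
df2 = pd.read_excel("dataset2.xlsx")
df3 = pd.read_excel("dataset3.xlsx")

# Stacking datasets with disjoint column sets places each dataset's
# features in its own block of columns; cells outside a dataset's own
# block become NaN (the missing values addressed in the next step).
merged = pd.concat([df1, df2, df3], axis=0, ignore_index=True)
merged.to_excel("merged_dataset.xlsx", index=False)
```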
This method of merging resulted in a comprehensive dataset but introduced numerous missing values due to the differing feature sets across the original datasets. To address these missing values and maintain consistency, we employed specific imputation techniques. For numerical features, missing values were filled using the mean of the existing values in the respective column, keeping the numerical data statistically coherent. For categorical features, we used the mode of the respective column, thereby preserving the categorical distributions within the dataset.
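A minimal sketch of this imputation step, continuing from the merged data frame above, might look as follows; column types are inferred with pandas, and the first mode is used when a categorical column has several:

```python
import pandas as pd

for col in merged.columns:
    if pd.api.types.is_numeric_dtype(merged[col]):
        # Numerical feature: fill gaps with the column mean
        merged[col] = merged[col].fillna(merged[col].mean())
    else:
        # Categorical feature: fill gaps with the column mode
        merged[col] = merged[col].fillna(merged[col].mode()[0])
```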
By applying these preprocessing steps, we created a unified and robust dataset that facilitated a consistent and reliable foundation for our AI algorithms to predict and analyze student performance effectively. This meticulous merging and preprocessing process ensured that the final dataset was comprehensive, with minimized biases and inaccuracies, ultimately enhancing the reliability of our study’s outcomes.
The primary attributes include gender (male or female), age, institution type (government or non-government), and IT Student (whether the student is currently enrolled in an IT program: Yes/No), among others. Figure 2 depicts the dataset that specifically emphasizes the academic achievements of college students.
The dataset also contains information on the duration of classes attended, whether the student uses the Learning Management System (LMS) for self-study, the number of resources visited by the student, the number of discussion posts made, the number of days the student was absent, the student's class level, and the student's final grade, among other attributes.
Additionally, it includes the Adaptivity Level, which can be classified as Low, Moderate, or High. It also incorporates the Emotional State, which can be categorized as either Happy or Sad, as well as the Focus Status, which can be described as either Focused or Unfocused.
To facilitate predictive modeling, the features in this paper have been classified into five distinct class labels: Performance Level (Low, Medium, High), Final Grade (Pass, Fail), Adaptivity Level (Low, Medium, High), Emotional State (Happy, Sad), and Focus Status (Focused, Unfocused). The analysis and results sections of this paper will refer to these class labels as Class One, Class Two, Class Three, Class Four, and Class Five, respectively. This extensive dataset allows for the utilization of sophisticated machine learning and deep learning algorithms to forecast and improve different facets of student performance.
3.2. Preprocessing Methods
Once the dataset was gathered, we applied various preprocessing techniques to ensure its suitability for the prediction process. The following subsections detail these steps.
These methods are crucial for enhancing the quality and precision of the predictions. The preprocessing stages encompass the following:
Missing Values Check: This stage entails detecting any missing data in the dataset. Missing values can have a substantial impact on the performance of machine learning models; therefore, they were appropriately addressed through imputation or removal.
Label Encoding: To transform categorical features into numerical values, we employed the LabelEncoder technique. This approach replaces each distinct value in a categorical feature with a corresponding numerical value, starting from 0. This transformation is essential because most machine learning algorithms require numerical input.
Normalization: To ensure that all features fall within a consistent range, we implemented the MinMaxScaler technique. This method rescales the features to the range [0, 1], which accelerates gradient descent convergence and ensures that all features carry equal weight in the AI techniques, as illustrated in the sketch below.
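The following sketch shows how the encoding and normalization steps could be implemented with scikit-learn; the target column name "Performance Level" is used here only as an example of one of the five class labels, not a confirmed column name:

```python
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Encode every categorical (object-typed) column as integers 0, 1, 2, ...
for col in merged.select_dtypes(include="object").columns:
    merged[col] = LabelEncoder().fit_transform(merged[col])

# Separate the features from one of the class labels (example name)
X = merged.drop(columns=["Performance Level"])
y = merged["Performance Level"]

# Rescale all features to the [0, 1] range
X_scaled = MinMaxScaler().fit_transform(X)
```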
After completing the preprocessing, we utilized various artificial intelligence techniques to forecast multiple student-related outcomes, such as the final grade, adaptivity level, emotional state (Happy or Sad), and focus status (Focused or Unfocused) during lectures. As illustrated in Fig. 4, the machine learning algorithms used are Random Forest (RF), Decision Tree (DT), XGBoost (XGB), and K-Nearest Neighbors (KNN). These algorithms were selected for their resilience and efficiency in handling diverse datasets. In addition, four deep learning models were employed to improve prediction accuracy: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Artificial Neural Network (ANN).
A brief description of each algorithm is as follows:
Random Forest (RF)
The Random Forest methodology is a prominent option in machine learning, falling under supervised learning methods. It is flexible and can handle both classification and regression problems. This strategy leverages the power of ensemble learning, which combines numerous classifiers to solve complicated problems and improve model performance. A Random Forest classifier comprises many decision trees trained on different subsets of the dataset. The classifier increases prediction accuracy by combining the predictions of the individual trees, typically by majority voting for classification (or averaging for regression). Increasing the number of trees in the forest generally improves accuracy while reducing overfitting. Essentially, Random Forest combines the strengths of multiple decision trees to refine prediction accuracy [4].
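As an illustration, a Random Forest classifier for one of the class labels could be trained with scikit-learn as follows, continuing from the preprocessed X_scaled and y above; the hyperparameter values are placeholders, not the study's actual settings:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)

# An ensemble of 100 decision trees; each tree is trained on a bootstrap
# sample, and predictions are combined by majority vote.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print("RF accuracy:", rf.score(X_test, y_test))
```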
Decision Tree (DT)
Decision trees are supervised machine-learning approaches used for regression and classification. They function by learning a series of nested if-else questions based on the data's attributes and then predicting the outcome. The objective is to build a model that predicts the value of a target variable by applying simple decision rules derived from the data attributes. The data is split recursively on feature values to maximize information gain or minimize impurity at each step. This process continues until a stopping criterion is met, such as a maximum tree depth, or until further splits no longer improve the model's performance. Decision trees that are not appropriately pruned may overfit the training data.
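A corresponding decision tree sketch is shown below, reusing the train/test split from the Random Forest example; max_depth and ccp_alpha are illustrative pruning controls that limit the recursive splitting described above:

```python
from sklearn.tree import DecisionTreeClassifier

# Limiting depth and applying cost-complexity pruning (ccp_alpha)
# both curb the overfitting an unpruned tree is prone to.
dt = DecisionTreeClassifier(max_depth=8, ccp_alpha=0.001, random_state=42)
dt.fit(X_train, y_train)
print("DT accuracy:", dt.score(X_test, y_test))
```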
XGBoost (XGB)
XGBoost, short for Extreme Gradient Boosting, is an optimized and distributed gradient boosting library celebrated for its efficiency, flexibility, and portability. It implements gradient-boosted decision trees engineered for rapid and high-performance computation. The fundamental process of XGBoost involves iteratively constructing multiple decision trees and refining them by minimizing a predefined loss function at each step. This iterative enhancement process underpins XGBoost's reputation for speed and accuracy, making it a preferred tool in machine learning competitions and diverse applications.
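A minimal XGBoost sketch, assuming the xgboost Python package and the same train/test split as above, is given below; n_estimators controls the number of boosting rounds, and learning_rate shrinks each tree's contribution to the loss minimization (both values are placeholders):

```python
from xgboost import XGBClassifier

# Each boosting round fits a new tree to the gradient of the loss,
# iteratively refining the ensemble's predictions.
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=6)
xgb.fit(X_train, y_train)
print("XGB accuracy:", xgb.score(X_test, y_test))
```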
K-Nearest Neighbors (KNN)
The K-Nearest Neighbors (KNN) algorithm is a simple and intuitive method used for classification and regression. It works on the notion that data points with comparable features belong to the same class or have similar values. When given a new unlabeled data point, KNN selects the k nearest labeled data points from the training set and utilizes them to create predictions. It selects the most prevalent class label among its neighbors for classification but computes the average for regression. The KNN algorithm calculates the distance between the query instance and all training examples, sorts the distances to find the top k nearest neighbors, and assigns the class label by majority vote or finds the average of the closest neighbors for regression.
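The distance-sort-vote procedure described above can be written in a few lines of NumPy; this is a didactic sketch rather than the library implementation used in the study:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    # 1. Distance between the query and every training example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # 2. Sort the distances and keep the indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among the neighbors' labels (classification);
    #    for regression, return y_train[nearest].mean() instead.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```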
Convolutional Neural Network (CNN)
Convolutional Neural Networks (CNNs) are a type of neural network specifically built to interpret structured grid data such as images. CNNs have grown in popularity due to their ability to learn spatial hierarchies of features from input data in an automated and adaptive manner. A CNN's major components are convolutional, pooling, and fully connected layers. Convolutional layers apply a series of filters to the input, extracting features such as edges, textures, and shapes. Pooling layers reduce the spatial dimensions of the data, lowering computational cost and helping to produce representations insensitive to small translations. Fully connected layers at the end of the network use these high-level features to perform the final classification or regression task.
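Since the study's data are tabular rather than images, a 1D convolution over the feature vector is one plausible realization; the Keras sketch below treats the 34 features as a length-34 sequence with one channel, and the layer sizes are illustrative only:

```python
import tensorflow as tf
from tensorflow.keras import layers

n_features, n_classes = 34, 3  # e.g., the Performance Level label

cnn = tf.keras.Sequential([
    layers.Input(shape=(n_features, 1)),
    layers.Conv1D(32, kernel_size=3, activation="relu"),  # convolutional layer
    layers.MaxPooling1D(pool_size=2),                     # pooling layer
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                  # fully connected layer
    layers.Dense(n_classes, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
# Usage: cnn.fit(X_train.reshape(-1, n_features, 1), y_train, epochs=20)
```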
Recurrent Neural Network (RNN)
Recurrent Neural Networks (RNNs) are designed to handle sequential data in which the order of the data points is critical. RNNs maintain a hidden state that carries information about previous items in the sequence, allowing them to model temporal dependencies. Recent advancements in RNNs have concentrated on tackling challenges such as vanishing and exploding gradients, which can impede learning over long sequences. Techniques such as gating mechanisms have broadened the application of RNNs to complex sequence problems, including speech recognition, language modeling, and even real-time forecasting in financial markets.
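Under the same assumption that the feature vector is scanned as a sequence, a simple Keras RNN with one recurrent layer could look as follows; the hidden state carries context forward as the layer steps through the input:

```python
import tensorflow as tf
from tensorflow.keras import layers

n_features, n_classes = 34, 3  # illustrative sizes

rnn = tf.keras.Sequential([
    layers.Input(shape=(n_features, 1)),
    layers.SimpleRNN(64),  # hidden state summarizes earlier steps
    layers.Dense(n_classes, activation="softmax"),
])
rnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```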
Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) networks are a subset of RNNs that capture long-term dependencies in sequence data. LSTMs solve the vanishing gradient problem of standard RNNs by incorporating a memory cell that retains its state over long durations. The LSTM design has three gates controlling the information flow: an input gate, a forget gate, and an output gate. These gates enable the model to retain important information over extended periods, making LSTMs particularly effective for tasks such as language modeling, machine translation, and sentiment analysis.
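Replacing the simple recurrent layer with an LSTM layer adds the gated memory cell described above; in Keras, the input, forget, and output gates are internal to layers.LSTM, so the sketch differs from the RNN one only in the recurrent layer:

```python
import tensorflow as tf
from tensorflow.keras import layers

n_features, n_classes = 34, 3  # illustrative sizes

lstm = tf.keras.Sequential([
    layers.Input(shape=(n_features, 1)),
    layers.LSTM(64),  # gated memory cell: input, forget, and output gates
    layers.Dense(n_classes, activation="softmax"),
])
lstm.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])
```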
Artificial Neural Network (ANN)
Artificial Neural Networks (ANNs) comprise layers of interconnected nodes (neurons), with each connection assigned a weight. ANNs typically consist of an input layer, one or more hidden layers, and an output layer. Each neuron's output is calculated by applying a weighted sum of its inputs followed by an activation function such as ReLU (Rectified Linear Unit) or sigmoid. Recent advances in ANNs include deeper and more sophisticated architectures such as Deep Residual Networks (ResNets) and the use of transfer learning, which involves fine-tuning a pre-trained ANN model for a specific task. These advancements have broadened the applicability of ANNs to tasks including image recognition, natural language processing, and complex pattern recognition.
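A basic fully connected network matching this description is sketched below; it takes the flat feature vector directly, with ReLU hidden layers and a softmax output (a sigmoid output would suit the binary labels), and all sizes are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

n_features, n_classes = 34, 3  # illustrative sizes

ann = tf.keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),  # hidden layer 1
    layers.Dense(32, activation="relu"),  # hidden layer 2
    layers.Dense(n_classes, activation="softmax"),
])
ann.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```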