Recently, AI has been adopted extensively and effectively in the computing field. The benefits and enhancements due to AI in the education sector have been highlighted in the literature. Examples of AI applications in the educational sector include, but are not limited to, data analytics, prediction of student enrolments, recommendation systems for career pathways or resource management, adaptive tutoring, prediction of student readiness for employment, monitoring and prediction of student academic performance, and identification of struggling students. Table 1 presents a brief overview of related previous work.
Table 1
A brief overview of related work
| Study | Summary and limitations |
|---|---|
| Early prediction of undergraduate Student’s academic performance in completely online learning: A five-year study (15) | Proposed a collection of AI models to predict student academic progress from LMS interaction data and student academic data such as GPA and enrolment test data. The data consists of LMS log files, demographics, and academic achievement. No research methodology is identified. |
| Predicting Students’ Academic Performance Through Supervised Machine Learning (24) | Developed an AI based system to predict student performance from demographic and LMS interaction data. The dataset comprises demographic characteristics and LMS interaction data including gender, country, birthplace, views of the LMS content, quiz attempts, and assessment submissions. The nature of the dataset does not allow early prediction. The research methodology is not clear. |
| Predicting Students’ Academic Procrastination in Blended Learning Course Using Homework Submission Data (25) | Developed an algorithm to enhance students’ academic progress by detecting struggling students through their homework submission behaviours, e.g., no submission or late submission. The nature of the dataset does not allow enough time to offer timely interventions and support to enhance student academic performance. No research methodology, e.g., DSR or DBR, is identified for constructing the predictive model. |
| An Efficient Approach for Multiclass Student Performance Prediction based upon Machine Learning (26) | Predicted students’ performance by using four classification algorithms. The same dataset is used in other studies as well but with different ML classifiers (27, 28). The study used secondary school students, not HE students, and did not use LMS data. Used socio-economic attributes of students, which do not allow timely identification of at-risk students. The research approach is not based on the similarities of DSR and DBR principles. |
| Design, development, and evaluation of a mobile learning application for computing education (29) | Applied the DSR approach to develop a mobile learning application for HE for better student learning. The research approach is based only on DSR and not on DBR principles or similarities between DSR and DBR. No AI (DL or ML) models are used to predict student academic performance. |
Existing studies do not focus on LMS big data to predict academic performance early in the learning pathway. Most studies have used data generated in traditional on-campus or completely online educational settings, and few have studied data generated by student interaction with an LMS in blended learning. Also, most existing research does not highlight the significance of identifying at-risk students in the early stages of study. There is a need to investigate a real-time automated analytical solution that identifies students at risk of failing earlier in a blended learning environment, so that strategies and remedial measures can be offered in time to keep student academic progress on track. Furthermore, most related studies are insufficient with respect to research methodology and DSR artefact construction and evaluation: they did not use an integrated DSR and DBR methodology to lay out the study and to design and develop an artefact; they apply big data analytics approaches but do not employ DSR, DBR or an integrated DSR paradigm; and they did not evaluate the DSR artefacts according to their complexity. However, the existing literature can be leveraged and extrapolated to achieve the objective of this study, thus forming its foundation.
Integrated Design Science Research Methodology
Research methodology defines the guides and boundaries within which a study can be conducted, ensuring its scientific value and significance. Researchers highlight research methodology as the most significant step in accomplishing the purposes of the research. This study developed and used an innovative IS research methodology based on the similarities of two research approaches: DSR methodology from IS and design-based research (DBR) methodology. DBR is considered a realization of DSR in the education sector and is used here to develop and evaluate a BDAS as an IT and DSR artefact. DSR complements DBR and provides multi-paradigm perspectives to construct fundamental knowledge by researching social pragmatisms (30–32).
The DSR approach suits studies that justify the research requirement and contribute to knowledge and to the development of the artefact (33). For example, Miah et al. (34) used the DSR framework to design a mobile based application for education; Carstensen and Bernhard (35) designed and improved teaching in the engineering education sector by utilizing the DSR methodology; Miah et al. (36) utilized the DSR approach to extend a mobile health information system; and Miah et al. (37) described the development of the design of a DSS as a method artefact. DBR methodology aims to achieve outcomes that improve student learning or enhance understanding of teaching and learning or other educational phenomena (38). The similarities between the two methodologies are:
- Both are problem solving methodologies
- Both approaches design from a viable practical perspective
- Both approaches contribute to the knowledge base
- Both reflect on the nature of the theory
- Both produce theoretical and practical artefacts
- Both have an iterative cycle of design and rigorous evaluation
The study followed an integrated DSR methodology (39) consisting of five phases based on the similarities of DSR and DBR, leveraging a variation of Peffers’ DSR Methodology (33). The five phases, as shown in Fig. 1, are: (1) Problem Identification; (2) Solution Analysis; (3) Artefact Design and Development; (4) Evaluation; (5) Outcome Communication.
The study begins with a detailed problem description and an analysis of existing studies to derive the design requirements and objectives for a BDAS from the literature. By executing a systematic literature review and meta-analysis, this phase formulates the design principles for the design and development of the DSR artefact in a later phase. Next, the study evaluates the findings to establish design considerations for the BDAS. In the third phase, the BDAS as a DSR artefact is designed, developed, and evaluated formatively by using AI data analysis techniques (ML and DL algorithms). In the final phases, the summative evaluation is carried out and the outcomes of the study are communicated as a contribution to the knowledge area.
Problem Identification and Objectives of the Artefact
In the initial phases of our integrated DSR research methodology, an extensive systematic literature review and meta-analysis (SLRM) was conducted on the application of AI based technology in HE with regard to student academic progress. The SLRM aims to understand the trends in applying AI based technology across the wide spectrum of monitoring and predicting student academic performance, and to identify the different AI algorithms and the process of developing AI models. The SLRM was conducted by using the PRISMA (40) framework, with a defined search protocol incorporating inclusion and exclusion criteria, and it provided rich findings. The SLRM highlighted the phases, algorithms and evaluation metrics used in the studies. These algorithms and evaluation metrics form the foundation of the design and development of the BDAS.
The objective of designing and developing the BDAS is to train and evaluate a predictive model with classified data to predict student academic progress. The predictive model must be sufficiently accurate to identify students who are at risk of failing. The prediction can assist educators in implementing strategies to enhance student learning and improve academic performance. The BDAS can be integrated into coursework for timely and accurate identification of student academic progress, especially for students at risk. This timely identification supports earlier intervention to improve their academic performance. The generic computational model consists of data collection, data pre-processing, data analysis with algorithms, and evaluation. This generic model is tailored for each iteration of the design and development phase of the BDAS. Each iteration utilized different pre-processing techniques and algorithms to achieve the objective of the BDAS. In the case of educational big data, a large amount of real-time data is generated by the LMS. The BDAS predictive model is trained on a training dataset and will be deployed and integrated with the LMS using a data processing framework that sections the real-time big data stream into small segments via pipelines. These segments are fed to the BDAS to predict student academic performance, supporting enhanced student academic progress and better decision making. Fig. 2 shows the process of design and development of the BDAS as a DSR artefact.
Big Data, LMS and Big Data Analytics
Big data technologies can play a significant role in improving data processing, data storage, data analytics and visualization (41). Big data significantly impacts the transformation of the learning process and the adoption of relevant innovative technologies (13). An overview of big data analytics in HE is illustrated in Fig. 3. LMS platforms, e.g., Moodle, Blackboard, Canvas, Forma LMS and OpenOLAT, are considered a major source of big data and are essential applications to plan, deliver, monitor, and assess the learning process. Moodle and Blackboard are the most popular LMS platforms. An LMS platform has three key purposes: (i) management of digital content material and student access records, (ii) management of assessments and student progress, and (iii) management of student feedback and interaction (42).
An LMS generates a rich and huge volume of data, which increases the need for innovative solutions to improve learning and education management. There is also an emerging requirement for LMS integrated tools to interpret and manipulate the data generated by the LMS (42, 43).
Big data is produced by users (educators, administrators, and students) interacting with the LMS in different ways. For example, educators upload digital course materials for their students, students access these materials for learning, students attempt LMS based tests related to a specific concept, and students submit assessment documents on the LMS. Big data analytics applies a set of analytical techniques to extract useful information and provide insight from big educational data related to students’ learning behaviours, assessment scores, learning styles, login information, time spent on a task or module, assessment submission patterns, most visited pages or content, completion of a task or module, and posted details about extracurricular activities (44–46).
Big data analytics makes it possible to identify the real learning patterns of students more accurately than traditional practices. It supports HE in making better and more informed decisions based on the big data generated by the LMS. It supports (42, 45, 47–49):
- Customized and adaptive learning for a better learning path
- Plagiarism detection in student submissions to improve academic integrity
- Student performance prediction for better course delivery planning
- Course selection or recommendation systems
- Identification of students at risk based on their behaviour patterns to plan and deliver appropriate and timely interventions
- Dropout prediction
- Student participation and engagement tracking to enhance the learning experience
- Strategic planning to achieve HE goals
AI algorithms conventionally take all input data at once and process it to produce output, which is not feasible in big data analytics because of the high velocity and huge volume of big data. There are multiple approaches to address this issue and apply AI algorithms to educational big data, e.g., high-performance computing infrastructure, parallel processing and/or data processing platforms for data segmentation. In this study, a data processing platform is suggested for deploying the BDAS artefact (42, 45).
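The data segmentation idea can be illustrated with a minimal Python sketch. It is not the deployment used in this study: the function name `segment_stream`, the event tuples and the batch size are hypothetical, and a production system would rely on a data processing platform rather than a hand-written generator.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def segment_stream(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_size` events.

    The stream is consumed lazily, so it never has to fit in memory at once.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical LMS events: (student_id, activity, duration_seconds).
events = ((i % 3, "quiz", 10 + i) for i in range(7))
batches = list(segment_stream(events, batch_size=3))
```

Each batch could then be passed to a trained predictive model in turn, so the model only ever sees a small, bounded segment of the stream.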
Artefact Design and Development
An AI based DSR artefact is a complex artefact and is designed according to the requirements and objectives identified in previous phases. Design approaches developed around contextual knowledge and general practices lead to enhanced artefact design (50). This study used two sets of iterations to design and develop the BDAS as a predictive model based on existing approaches in the literature: an ML based predictive model and a DL based predictive model. In this phase, we apply ML and DL algorithms to design and develop ML based and DL based predictive models as DSR artefacts that accurately identify potential students at risk of failing from a dataset based on student LMS interaction. The iterative approach in this phase provides continuous improvement in the construction of the DSR artefact by evaluating various performance metrics with the confusion matrix in each iteration. The performance metrics of the different AI algorithms in each iteration are compared to select the best predictive model.
The BDAS as a DSR artefact is constructed through a series of tasks consisting of data collection, data pre-processing, data analysis with AI algorithms, evaluation and successful decision making (13, 51). All these tasks are tailored to develop and evaluate ML and DL based predictive models. The workflow of training an AI based artefact is illustrated in Fig. 4.
This study has sourced a freely available dataset comprising 230,318 instances of students’ activities and interactions with LMS to train the predictive model. The dataset consists of 13 features including time-series based features i.e., Session number, Student number, Exercise number, Activity name abbreviation, Start time of the activity, End time of the activity, Idle time during activity, Mouse wheel movement count, Mouse wheel click count, count of Mouse left click, count of Mouse right click, Mouse movement count and count of Keystroke. The dataset is pre-processed and normalized, and features are selected by correlational analysis to build a dimensional vector including categorised features. This transformed dataset is then used to train the predictive model by using ML and DL algorithms to detect students at risk of failing.
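The normalization and correlation-based feature selection steps can be sketched as follows. This is an illustrative assumption rather than the exact procedure of the study: min-max scaling, Pearson correlation against the class label, the threshold value, and the toy feature values are all hypothetical choices.

```python
from math import sqrt
from typing import Dict, List


def min_max_normalise(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def select_features(columns: Dict[str, List[float]],
                    target: List[float],
                    threshold: float) -> List[str]:
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(min_max_normalise(values), target)) >= threshold
    ]


# Toy values only; the feature names echo the dataset description above.
columns = {
    "keystrokes": [10.0, 20.0, 30.0, 40.0],
    "idle_time": [5.0, 5.0, 5.0, 5.0],
    "mouse_left_clicks": [3.0, 1.0, 4.0, 1.0],
}
at_risk = [1.0, 1.0, 0.0, 0.0]  # hypothetical class labels
selected = select_features(columns, at_risk, threshold=0.5)
```

In this toy example only the keystroke count is retained, because the idle time is constant and the click count is only weakly correlated with the label.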
In the first iteration, five tree based supervised ML algorithms (J48, Random Forest, OneR, Decision Stump and NBTree) are used to train and evaluate the predictive model. These tree based algorithms use a series of if-then decisions to generate highly accurate, easily interpretable predictions that identify potential students at risk of failing. A boosting ensemble technique is applied to the transformed dataset to further fine-tune it. The predictive model is trained and tested with k-fold cross validation, using each of the five supervised ML algorithms in turn. In the final step, performance metrics are compared across the predictive models built with the five ML algorithms to select the most accurate predictive model for constructing the BDAS. In the real-time implementation of the BDAS, a data processing framework, e.g., Apache Spark, will be used to receive the real-time big data stream from the LMS and decompose it into small batches to be processed by the BDAS predictive model.
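To make the k-fold procedure concrete, the sketch below cross-validates a single-feature decision stump written from scratch. It is a simplified stand-in for the tooling used in the study: the stump implementation, the interleaved fold assignment and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, int]  # (single feature value, class label 0 or 1)
Stump = Tuple[float, int]  # (threshold, label predicted when value <= threshold)


def predict_stump(model: Stump, value: float) -> int:
    threshold, left_label = model
    return left_label if value <= threshold else 1 - left_label


def train_stump(rows: Sequence[Row]) -> Stump:
    """Pick the threshold and orientation with the most correct training predictions."""
    best_model: Stump = (0.0, 0)
    best_correct = -1
    for threshold in sorted({value for value, _ in rows}):
        for left_label in (0, 1):
            candidate = (threshold, left_label)
            correct = sum(predict_stump(candidate, v) == y for v, y in rows)
            if correct > best_correct:
                best_model, best_correct = candidate, correct
    return best_model


def k_fold_accuracy(rows: Sequence[Row], k: int) -> float:
    """Mean test accuracy over k folds; row i is assigned to fold i % k."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    scores: List[float] = []
    for fold in range(k):
        test = [r for i, r in enumerate(rows) if i % k == fold]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        model = train_stump(train)
        hits = sum(predict_stump(model, v) == y for v, y in test)
        scores.append(hits / len(test))
    return sum(scores) / k


# Toy data: a hypothetical idle-time feature and an at-risk label.
rows = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
mean_accuracy = k_fold_accuracy(rows, k=4)
```

Repeating the same loop with each candidate algorithm in place of `train_stump` yields comparable mean scores from which the most accurate model can be selected.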
In the second iteration of continuous improvement of the design of the AI based artefact, two different data pre-processing techniques are used to modify the class distribution and augment the dataset to resolve the implications of an imbalanced dataset. DL algorithms are made up of neural networks with several layers of differentiable nonlinear nodes. Three DL algorithms, Long Short-term Memory (LSTM), Multi-layer Perceptron (MLP) and Sequential Model (SM), are trained on the augmented dataset, which yielded higher classification accuracy of the prediction model and fewer false predictions. Higher classification accuracy and fewer false predictions mean that fewer students are misclassified, i.e., fewer at-risk students are missed and fewer students who are not at risk are wrongly flagged, thereby addressing the objective in the general description of the BDAS as a DSR artefact.
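The two pre-processing techniques are not named here, so the following sketch uses random oversampling purely as one common, assumed way of modifying the class distribution. The function name, the seed and the toy samples are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, class label)


def random_oversample(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Duplicate randomly chosen minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Sample]] = {}
    for sample in samples:
        by_class.setdefault(sample[1], []).append(sample)
    target_size = max(len(group) for group in by_class.values())
    balanced: List[Sample] = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw extra copies with replacement to reach the majority-class size.
        balanced.extend(rng.choice(group) for _ in range(target_size - len(group)))
    return balanced


# Toy imbalanced data: six students not at risk (0), two at risk (1).
samples = [([float(i)], 0) for i in range(6)] + [([10.0], 1), ([11.0], 1)]
balanced = random_oversample(samples)
class_counts = Counter(label for _, label in balanced)
```

Oversampling should be applied to the training portion only, otherwise duplicated samples can leak into the test data and inflate the measured accuracy.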
Artefact Evaluation
The evaluation phase focuses on whether the developed artefact has achieved the purpose it was designed for, and it is a vital phase of a study in the DSR domain. The evaluation of the developed artefact within its context is a vital component of the evaluation strategy (52). In this study, the BDAS artefact is evaluated with an innovative DSR evaluation framework that assesses the utility, efficacy, and effectiveness (53, 54) of an artefact with hybrid evaluation requirements by using the confusion matrix, given in Fig. 5. In addition, to train, test and evaluate an AI based predictive model, the original dataset is divided into three (3) sections, i.e., training dataset, testing dataset and validation dataset. The predictive model is trained on the training dataset and tested on the testing dataset during the construction of the predictive model. The trained predictive model is then evaluated on the validation dataset to obtain a generalized predictive model.
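The performance metrics derived from a binary confusion matrix can be computed as in the sketch below. The choice of metrics (accuracy, precision, recall and F1) and the toy labels are assumptions for illustration; the at-risk class is treated as the positive class.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            counts["tp"] += 1
        elif predicted == positive:
            counts["fp"] += 1
        elif actual == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 1 = at risk, 0 = not at risk.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
scores = metrics(counts)
```

For identifying students at risk, recall is of particular interest because a false negative corresponds to an at-risk student who is not flagged for intervention.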
The summative evaluation episodes highlight the outcome and impact of the implemented artefact in a context and are thus performed towards the completion of the study. One summative episode was performed to evaluate the effectiveness and efficacy of the predictive model in accurately identifying students at risk early in the semester. The validation dataset is used to execute this terminal evaluation episode, to evaluate the effectiveness of the BDAS predictive model and to produce a generalized BDAS predictive model. The second and final summative episode, an ex-post evaluation of the utility of the artefact with real users and live unseen data, is left for future work.