Using Machine Learning to Identify At-risk Students in an Introductory Programming Course at a Two-year Public College

Nationally, more than one-third of students enrolling in introductory computer science programming courses (CS101) do not succeed. To improve student success rates, this research team used supervised machine learning to identify students who are “at risk” of not succeeding in CS101 at a two-year public college. The resultant predictive model accurately identifies ≈ 99% of “at-risk” students in an out-of-sample test data set. The programming instructor piloted the use of the model’s predictive factors as early alert triggers to intervene with individualized outreach and support across three course sections of CS101 in fall 2020. The outcome of this pilot study was a 23% increase in student success and a 7.3 percentage point decrease in the DFW rate. More importantly, this study identified academic early alert triggers for CS101. Specifically, the first two graded programs are of paramount importance for student success in the course.


Introduction
A 2007 study found student success rates at colleges and universities in CS101 to be approximately 67%. Seven years later, via meta-analysis, Watson and Li (2014) found student success rates in CS101 to be essentially unchanged at 67%. This article discusses the development of an accurate early alert system using a neural network-based predictive system. Specifically, this system utilizes a probabilistic neural network to accurately identify students who are "at risk" of not succeeding in their introductory programming course. The author defines an at-risk outcome as any course grade less than or equal to a 72% course average. The research found five graded measures (i.e., predictive factors) which, combined, provided accurate predictions for students who were unlikely to succeed in CS101. These measures can be treated as triggers for an early alert system that allows an instructor to approach an identified at-risk student with extra one-on-one course assistance, with the goal of changing the student's trajectory toward course success.
The programming instructor piloted the early alert system during fall 2020. The pilot implementation of the early alert system described in this article resulted in a 7.3 percentage point decrease in the D grade, Fail, Withdraw (DFW) rate and a 23% increase in student success for CS101 at the researchers' home institution.

Background
According to the College's 2020-21 Strategic Plan, one of the strategic goals of the author's home institution is to "create an agile and responsive business model that responds to economic changes and focuses on helping all students achieve a high level of success in learning completion." This study directly facilitates the attainment of this goal by potentially helping computer science students succeed in the most significant gateway course in the school's two-year associate degree program. The average student success rate in CS101 at this college historically stands at 61.8%, five percentage points below the national average. As almost 40% of the students willing to consider computer science by taking CS101 are unable to move on to the next course, improving the success rate in introductory computer science becomes all the more pressing, especially given the economic need for software developers. According to the U.S. Bureau of Labor Statistics (2021), the job outlook is expected to grow by 22% over the next 10 years. The opportunity cost for students unable to advance in a field ranked by U.S. News and World Report as providing the best jobs is substantial. Efforts to improve student success must be undertaken. This study proposes an early alert system in which, as the academic semester progresses, key assignments trigger alerts for an instructor to step in and intervene. Ideally, interventions should occur early enough during the semester to improve student outcomes by the end of the semester.

Literature Review
Neural networks have been employed to predict student success in numerous contexts dating back to the mid-1990s. Hardgrave and Wilson (1994) used neural networks to predict graduate student success. Naik and Ragothaman (2004) utilized neural networks to predict MBA student success. More recently, in 2007, the mentor for this project found neural networks to be an effective method for predicting, and thereby improving, student success in developmental mathematics at a four-year public institution of higher education. Van Heerden, Aldrich, and du Plessis (2008) demonstrated the ability of neural networks to predict student success in medical school.

Hanover Research (2014) offers a comprehensive overview of early alert systems in higher education. Important findings from Hanover relevant to this research include the following:
1. "Early alert systems may be most effective when targeting specific student populations, such as…at-risk students" (p. 3).
2. An early alert system "entails a 'systematic program' that comprises at least 'two key components': alerts and intervention" (p. 5). This study focuses on the former component and contributes to the student success literature by considering how "alerts" (i.e., triggers) are determined. The author believes that the accuracy of the alert component is vital to an effective system's success. The author employs neural networks to accurately classify students as either at risk or not at risk; thus, the most impactful factors/inputs into the neural network are treated as triggers for the early alert system.
3. Early alert systems are utilized by the majority of institutions of higher education (p. 6). Specifically, Noel Levitz found that 87.5% of public, two-year colleges have early alert systems in place. However, only 57.1% of these schools found their systems to be "very or somewhat effective." This study aims to improve the efficacy of early alert systems at the course level and hopefully improve the 57.1% perceived efficacy at two-year institutions.
4. Metrics/factors to consider in predictive systems can be categorized as either "pre-enrollment" or "post-enrollment" factors (p. 11). This study utilizes post-enrollment factors (i.e., student performance data on specific graded items in introductory computer science).
Probabilistic neural networks (PNNs), the type of neural network employed for this research, have been shown to be accurate in many diverse contexts, such as stock market index forecasting (Kim & Hak Chun, 1998), various signal processing applications (Zaknich, 1998), plant classification using leaf structures (Wu et al., 2007), and bankruptcy prediction (Yang et al., 1999). This study demonstrates the applicability of a PNN to accurately predict student success in order to assist in targeting interventions.
Machine learning has been successfully applied to identify at-risk students in previous studies. Er (2012) utilized a combination of three machine learning techniques (an instance-based learning classifier, decision trees, and naïve Bayes) to accurately predict student success in the field of information systems. Kotsiantis (2012) demonstrated how individual student assignments can be incorporated into the creation of a decision support system for tutors. This study differs from the existing body of research in several aspects:
1. This study demonstrates the applicability of machine learning to predict student success in an introductory programming course.
2. This study demonstrates the applicability of neural networks to predict student success with a high degree of accuracy.
3. The pilot study detailed in this work offers evidence that the identified early alert triggers can be successfully used to increase student success.
4. The findings provide other computer science educators with a framework for the development of their own "at-risk" early alert systems.
An additional outcome of this study is the identification of factors that predict student success in introductory computer science courses. The identification of predictive factors impacting student success has been addressed by multiple researchers. Dalton, Moore, and Whittaker (2009) studied the impact of being a first-generation and low-income student on student success. Hamman (2016) published a study of factors that contribute to academic recovery. Millea, Willis, Elder, and Molina (2018) presented factors determining college retention and graduation rates.
After the predictive factors are determined and an accurate predictive system is constructed, an early alert system needs to be employed to improve student outcomes. Akhtar, Warburton, and Xu (2017) created a computer-based teaching system that employed a computer-supported collaborative learning environment designed to support lab-based CAD teaching. The findings of Akhtar et al. suggest that embedded predictive analytics to target timely learning interventions could improve class performance. Faulconer, Geissler, Majewski, and Trifilo (2013) found that a campus-wide early alert system "has the potential to impact student success by enhancing in real time the lines of communication among student, instructor, and advisor" (p. 47).

Methodology
The steps in this research project were as follows:
1. Data collection - Data collection for the pilot project entailed the collection, cleaning, and coding of CS101 student records from the instructor's gradebooks to create the training and testing dataset for the predictive system. The research team cleaned and organized the data according to the corresponding assignments across semesters. For example, the programming assignments, problem sets, and exams were aligned across all semesters to create a single compiled gradebook. The author obtained the data from the instructor's archived gradebooks for the past seven years and used these data in the creation of the neural networks. A student's record was included in the dataset only if the student had a recorded outcome at the end of the semester (i.e., a letter grade or a W for withdrawal). All students were enrolled in CS101 at the author's home institution. Demographically, the student records were approximately evenly divided with regard to gender: 52% female and 48% male. Additionally, approximately 40% of the students enrolled were Native American, and approximately 90% were majoring in one of the STEM fields. In the end, a total of 592 student records were compiled into the final dataset for neural network training and testing. This sample size is the maximum number of complete student records and was not based upon any statistical calculation.
When training and testing neural networks, the goal is to have as much data as possible so that training yields an accurate network.
2. Neural network type identification - Numerous neural network topologies exist, and they can perform differently on a given dataset. The research team utilized the NeuroSolutions neural network software by nDimensional to create the neural networks tested for this study. NeuroSolutions offers a robust list of neural network topologies; representative topologies from all the major neural network types (i.e., multilayer perceptrons, support vector machines, probabilistic neural networks, regression networks, and principal component analysis networks) were tested, and the top 25 performers are shown in Table 2.
3. Neural network refinement - Once a neural network topology is determined, incremental improvements in accuracy can be realized via refinements.
a. Backward elimination - The researchers first pruned the input space via backward elimination. Backward elimination involves removing a single predictor/factor, rebuilding the neural network, and retesting to determine whether an improvement in accuracy is realized. If the network's accuracy improves when a factor is omitted, that factor is removed from the input space. The goal is to retain only inputs that improve predictive accuracy. With fewer predictors, a model is less prone to noise within the data and more generalizable in a production setting.
b. Threshold determination - Once a neural network with high predictive accuracy is identified, the threshold for determining whether a student is at risk or not at risk can be varied to find an acceptable balance between Type I and Type II errors. For example, if a threshold of 0.5 is used, then a network output less than 0.5 is interpreted as "at risk." Threshold values near 0.5 can then be tested to see how the overall neural network accuracy responds.
c. Sensitivity analysis - Finally, when a neural network with an acceptable balance between false positives and false negatives is found, the researchers perform a sensitivity analysis to identify the most impactful predictors. Sensitivity analysis involves varying each predictor by a given number of standard deviations and examining how the neural network output responds.
4. Pilot experiment - The researchers piloted the final neural network in a pilot study during the fall 2020 academic semester.
The results from each of these steps are summarized in the next section.

Data Collection
The first step in developing a predictive system via supervised learning is the acquisition of data for neural network training and testing. For this project, the mentor's course gradebooks for the past seven academic years were collected and compiled. The mentor teaches five sections of Introduction to Computer Programming I each academic year. After cleaning and coding the data, the author collected 592 complete rows of student data. A significant amount of time and care was spent aligning assignments (i.e., course topics) from one semester to the next and from one academic year to the next. In all, the author found 12 graded items common across all course sections. The author deemed the data both reliable and valid. Regarding reliability, the mentor of this research was
1. the only person who graded the 12 graded items,
2. the only person who entered the data into the grading program, and
3. the only instructor for all course sections.
In addition,
1. the same grading scale and assignment weighting were used for all seven years, and
2. the same textbook was used for all seven years (note: several new editions were released, but no significant change was made to course content).
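As an illustration of this compilation step, the merge-and-filter logic can be sketched in Python. The record layout, assignment names, and sample grades below are invented for the sketch and are not the actual gradebook schema:

```python
# Sketch of the gradebook compilation step, assuming hypothetical
# per-semester record dicts; field names are illustrative only.

# Graded items common to every section (12 items in the final dataset).
COMMON_ITEMS = ["Bookwork 1", "MadLib Program", "Property Tax Program",
                "Exam 1", "Exam 2"]  # ...truncated for the sketch

def compile_gradebooks(semesters):
    """Merge per-semester gradebooks, keeping only students with a
    recorded end-of-semester outcome (letter grade or W)."""
    rows = []
    for records in semesters:
        for rec in records:
            if rec.get("outcome") is None:       # no final grade or W
                continue
            # keep only the graded items shared across all sections
            row = {item: rec.get(item) for item in COMMON_ITEMS}
            row["outcome"] = rec["outcome"]
            # code success as 1 if course average > 72%, else 0 (at risk)
            row["success"] = 0 if rec["average"] <= 72 else 1
            rows.append(row)
    return rows

fall = [{"Bookwork 1": 95, "MadLib Program": 88, "Property Tax Program": 90,
         "Exam 1": 85, "Exam 2": 80, "outcome": "B", "average": 84.0}]
spring = [{"Bookwork 1": 50, "outcome": "W", "average": 40.0},
          {"Bookwork 1": 70}]                    # no outcome -> dropped
dataset = compile_gradebooks([fall, spring])
print(len(dataset))  # 2 records survive the outcome filter
```

The outcome filter mirrors the inclusion rule above: records without a final letter grade or W never enter the training set.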

Neural Network Type Identification
This project used NeuroSolutions Professional by NeuroDimensional to construct and test neural networks. Forty-two neural network architectures were tested, with the PNN performing best on an out-of-sample dataset of 207 rows of student data. The PNN correctly identified 91.3% of the test data (see summary in Table 1). The top 25 performing neural networks and their corresponding accuracy on the out-of-sample dataset are listed in Table 2.
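The study's networks were built in NeuroSolutions, but the core idea of a PNN can be illustrated independently: it is essentially a Parzen-window classifier that sums a Gaussian kernel over the training examples of each class and reports the class with the highest estimated density. The toy sketch below uses invented feature values (two illustrative graded items per student, scaled 0-1), not the study's data:

```python
import math

def pnn_predict(x, train, sigma=0.25):
    """Minimal Parzen-window PNN: a pattern layer of Gaussian kernels,
    one summation unit per class, and a winner-take-all output.
    `train` maps class label -> list of feature vectors (scaled 0-1)."""
    scores = {}
    for label, examples in train.items():
        s = 0.0
        for ex in examples:
            d2 = sum((a - b) ** 2 for a, b in zip(x, ex))
            s += math.exp(-d2 / (2 * sigma ** 2))
        scores[label] = s / len(examples)    # class-conditional density
    return max(scores, key=scores.get)

# toy training set: [first-program %, second-program %] scaled to 0-1
train = {
    "success": [[0.95, 0.90], [0.85, 0.88], [0.78, 0.80]],
    "at_risk": [[0.40, 0.35], [0.55, 0.20], [0.30, 0.50]],
}
print(pnn_predict([0.82, 0.85], train))  # near the success cluster
print(pnn_predict([0.45, 0.30], train))  # near the at-risk cluster
```

The smoothing parameter `sigma` plays the role the PNN literature assigns to the kernel width: smaller values track the training data more closely, larger values generalize more aggressively.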
Neural Network Refinement

Backward Elimination. The number of inputs was refined/reduced using backward elimination, in which each input was withheld in turn to determine whether the predictive accuracy improved without it. The goal of backward elimination is to retain only inputs that add to the final predictive accuracy, thereby increasing the generalizability of the final predictive system. After backward elimination, the input space consisted of 12 inputs.
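Backward elimination is straightforward to express in code. The sketch below is a generic greedy loop, not the NeuroSolutions procedure; the factor names and the toy scoring function standing in for "retrain and score the network" are invented for illustration:

```python
def backward_eliminate(inputs, accuracy_of):
    """Greedy backward elimination: drop any input whose removal
    improves held-out accuracy. `accuracy_of(subset)` is assumed to
    retrain and score the model on that subset of inputs."""
    kept = list(inputs)
    best = accuracy_of(kept)
    improved = True
    while improved:
        improved = False
        for factor in list(kept):
            trial = [f for f in kept if f != factor]
            score = accuracy_of(trial)
            if score > best:         # accuracy improved with omission
                kept, best = trial, score
                improved = True
                break
    return kept, best

# toy surrogate for retraining: useful factors add signal, extras add noise
USEFUL = {"MadLib", "PropertyTax", "Exam2"}
def toy_accuracy(subset):
    s = set(subset)
    return 0.5 + 0.12 * len(s & USEFUL) - 0.02 * len(s - USEFUL)

factors = ["Bookwork2", "MadLib", "PropertyTax", "Exam2", "Exam3"]
kept, acc = backward_eliminate(factors, toy_accuracy)
print(sorted(kept))  # the noisy factors are trimmed from the input space
```

In the toy run, the two noise-only factors are dropped, mirroring how the second bookwork assignment and the third exam were trimmed in the study.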
The initial neural network included all graded items across the entire semester for the Fundamentals of Computer Programming I course, for a total of 15 inputs. After backward elimination, the final neural network had 12 inputs, with the second bookwork assignment and the third exam being trimmed from the input space. The inputs of the final PNN are summarized in Table 3.
The resulting neural network had an overall accuracy of 90.8% and is summarized in Table 3. Although this predictive accuracy was slightly less than that of the original neural network type identification (91.3%), a neural network with fewer inputs is likely to be more generalizable in a production setting, thereby performing better with new, unseen data.
Threshold Determination. The most impactful incremental improvement came from adjusting the threshold of the neural network to the point that maximizes the area under the receiver operating characteristic (ROC) curve. The default threshold for the PNN output is 0.5: a network output greater than 0.5 is interpreted as a student predicted likely to succeed, and an output less than 0.5 as a student who is not likely to succeed.
The threshold maximizing the area under the ROC curve is shown in Figure 1. Using a threshold of 0.51 resulted in a sizable increase in predictive accuracy, to 99.2%. This last refinement produced the final neural network, summarized in Table 5.
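The effect of moving the threshold can be illustrated with a small sweep over hypothetical network outputs. The probabilities and labels below are invented, and the "at risk if output < threshold" convention follows the text:

```python
def confusion_at(threshold, probs, labels):
    """Classify 'at risk' when the network output is below the
    threshold and tally the two error types against true outcomes,
    where label 1 = succeeded and 0 = at risk."""
    fp = fn = correct = 0
    for p, y in zip(probs, labels):
        at_risk = p < threshold
        if at_risk and y == 1:
            fp += 1           # Type I: flagged a student who succeeded
        elif not at_risk and y == 0:
            fn += 1           # Type II: missed an at-risk student
        else:
            correct += 1
    return correct / len(labels), fp, fn

# invented network outputs and true outcomes for six students
probs  = [0.9, 0.8, 0.55, 0.49, 0.3, 0.1]
labels = [1,   1,   0,    1,    0,   0]
for t in (0.45, 0.50, 0.51, 0.60):
    acc, fp, fn = confusion_at(t, probs, labels)
    print(t, round(acc, 2), fp, fn)
```

Sweeping the threshold this way makes the Type I/Type II trade-off explicit; the study's large accuracy jump at 0.51 suggests many students' outputs sat just on one side of the default 0.5 cut.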
Sensitivity Analysis. To create an early alert system, checkpoints/triggers need to be established at which students should be contacted regarding their progress in the course. The author conducted a sensitivity analysis to determine possible checkpoints across the 16-week course. Sensitivity analysis entails varying each neural network input by plus and minus two standard deviations about the mean and measuring the resulting output change in the current PNN across 50 steps on each side of the mean. The outcomes of the sensitivity analysis are described in Figure 2.
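The ±2 standard deviation sweep can be sketched generically as follows; the toy linear model stands in for the trained PNN, and the input names, means, and standard deviations are invented:

```python
def sensitivity(predict, means, stds, steps=50, k=2.0):
    """Vary one input at a time across mean ± k standard deviations
    (`steps` points on each side), holding the other inputs at their
    means, and report the spread of the model output per input."""
    spans = {}
    for name in means:
        lo, hi = float("inf"), float("-inf")
        for i in range(-steps, steps + 1):
            x = dict(means)
            x[name] = means[name] + (i / steps) * k * stds[name]
            y = predict(x)
            lo, hi = min(lo, y), max(hi, y)
        spans[name] = hi - lo     # larger span = more impactful input
    return spans

# toy model standing in for the trained PNN output
def toy_model(x):
    return 0.6 * x["MadLib"] + 0.3 * x["Exam2"] + 0.1 * x["Bookwork2"]

means = {"MadLib": 0.8, "Exam2": 0.75, "Bookwork2": 0.9}
stds  = {"MadLib": 0.15, "Exam2": 0.10, "Bookwork2": 0.05}
spans = sensitivity(toy_model, means, stds)
ranked = sorted(spans, key=spans.get, reverse=True)
print(ranked)  # most impactful input first
```

Ranking inputs by output span is what lets the most sensitive graded items be promoted to early alert triggers.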
Fortuitously, three of the top five graded inputs, with regard to sensitivity, occur within the first three weeks of class. Three weeks into the semester should allow an instructor sufficient time to individually help the identified at-risk students change their predicted course. Given the timing of these three inputs, the Bookwork 1, MadLib Program, and Property Tax Program assignments most likely set the tone of the course for the students. If a student has initial success in her first programming endeavor, then this trend is more likely to continue. The author hypothesizes that if an instructor focuses heavily on students' success in the first two programming assignments in COSC 118, then a sizable increase in student success can be realized. The 8.4 percentage point increase in accuracy accompanying the slight modification of the PNN's threshold from 0.50 to 0.51 suggests that many students are on the cusp of being successful. The author believes that a focused effort to enhance student performance on the first couple of programming assignments could result in a sizable increase in student success rates for introductory programming courses.
By more closely examining the sensitivity analysis for the three most impactful factors, the MadLib Program, Property Tax Program, and Exam 2, one can see how various scores on these items change the neural network output. These relationships are depicted graphically in Figure 3. The sensitivity graphs for all three inputs have sigmoid "S"-shaped curves, suggesting that even slight increases in scores on these three assignments yield corresponding incremental increases in the neural network output.
These findings suggest that beginning computer science students could benefit greatly from having initial success in their programming efforts. Making struggling students aware of the school's student success resources relating to programming early in the semester (e.g., tutoring, office hours, open lab time) could have a dramatic, positive impact on student outcomes.

Pilot Study
Pilot Intervention. During the fall 2020 semester, in an effort to assess the effectiveness of the early alert system, the first author of this study used the first three graded items as triggers for interventions taken by the instructor to assist students in their coursework. The three triggers, Bookwork 1, the MadLib Program, and the Property Tax Program, were all completed and graded within the first three weeks of class.
The instructor began the semester by telling the students about the paramount importance of a strong start, achieved by making perfect submissions for the first couple of programs. The instructor repeatedly emphasized and ultimately demonstrated the use of the posted rubrics in Canvas to ensure that the students understood how their programs would be graded. Then, if a student failed to submit one of the three trigger assignments, the instructor individually contacted the student via email and, if no response was received, by phone to remind the student of the impact not submitting one of these assignments could have on her or his course outcome. The instructor sent similar emails to students who did poorly on any of the trigger assignments, reminding them about the use of the rubrics and the need to submit complete work to optimize their final course grade.
Pilot Study Results. The student outcomes from fall 2020 are compared to the outcomes from the fall 2019 semester. It should be mentioned upfront that the 2020 semester fundamentally differed from the 2019 semester due to COVID-19. In response to the pandemic, the instructor opted to offer the sections of CS101 in a live online format, where the class met via Zoom twice a week during the regularly scheduled class time. Fall 2020 marked the first time the instructor taught online and the first time CS101 was offered online at the school. However, the instructor had recently completed a Quality Matters course entitled "Improving Your Online Course" in anticipation of the need to move his courses online. Given the situation, one would reasonably expect the course success rate to drop precipitously for fall 2020. The opposite, however, occurred: student success rates actually increased, as detailed in the 2×2 contingency table shown in Table 6. Using a freely available online 2×2 contingency table calculator from Vassarstats.net (http://vassarstats.net/tab2x2.html), a chi-square test of independence showed that there was no significant association between academic semester and course outcome, χ²(1, N = 93) = 0.62, p = 0.43. The lack of statistical significance (at the p < 0.05 level) may be attributable to the sample size, the minimal treatment undertaken, or the extraordinary learning environment resulting from being a student during the COVID-19 pandemic.
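For readers wishing to reproduce the reported test, the chi-square statistic for a 2×2 table can be computed directly. The cell counts below are reconstructed from the reported DFW rates under the assumption of 44 students in fall 2019 and 49 in fall 2020; they are illustrative and are not taken from Table 6:

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test of independence for a 2x2 table
    [[a, b], [c, d]], without continuity correction (matching the
    reported statistic); returns (chi2, p) with df = 1."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for obs, row, col in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        exp = row * col / n
        chi2 += (obs - exp) ** 2 / exp
    p = math.erfc(math.sqrt(chi2 / 2))   # chi-square survival fn, df = 1
    return chi2, p

# assumed counts: (success, DFW) per semester, reconstructed from
# 31.8% DFW of 44 students (fall 2019) and 24.5% of 49 (fall 2020)
chi2, p = chi2_2x2(30, 14, 37, 12)
print(round(chi2, 2), round(p, 2))       # 0.62 0.43
```

With these assumed counts the uncorrected statistic matches the reported χ²(1, N = 93) = 0.62, p = 0.43.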
While the pilot study did not yield a statistically significant result, its outcome suggests that the treatment may be effective. The DFW rate dropped from 31.8% in fall 2019 to 24.5% in fall 2020. The 23% increase in student success and the 7.3 percentage point decrease in the DFW rate support the continued use of the system. Additionally, 83% of the students who had a DFW outcome in fall 2020 had not submitted at least one of the three trigger assignments, supporting the validity of the early alert system and the identified early alert triggers. This provided the instructor with adequate evidence to continue the early alert system for spring 2021 and fall 2021. By doing so, the instructor attained a 16.7% DFW rate for spring 2021 and a 15.6% rate for fall 2021.

Further Research
This paper describes the creation of a highly accurate predictive system for identifying at-risk students. Hopefully, an increase in student success rates will be realized. Further research is needed to determine the most appropriate/successful interventions that will work for students at the author's home institution. Other institutions of higher education wanting to create their own predictive system will need to do so using a similar methodology but with the data from their own introductory computer science courses. Additional research needs to be performed to determine the applicability of this study to other fields of study (i.e., other gateway courses with high DFW rates).
Ideally, this study could be treated as a general framework for identifying academic, early alert triggers for other disciplines.

Conclusion
This study demonstrated the ability of a neural network-based predictive system to accurately identify students who were at risk of not succeeding in the introductory programming class at a two-year public institution of higher education. A probabilistic neural network was used to accurately classify 99% of students in an out-of-sample test dataset of 207 students.
The author views this study as a first step toward increasing student success rates in introductory CS courses at two-year public colleges. This article can serve as a framework for other early alert systems for other gateway courses. The next step is to explore treatment options and determine their efficacy. The first attempt at treatment options was piloted by the mentor of this study in fall 2020. While the pilot study did not yield statistically significant results, it provided sufficient evidence for the mentor to continue using the early alert system.
The ultimate goal of this study is to increase student success rates in introductory CS courses, thereby increasing the number of degrees and certificates awarded. As over one-third of beginning CS students are stopped by the first course in the program of study, it is incumbent upon computer science educators to find solutions that help all students interested in computer science be successful. Given the tremendous shortage of qualified information technology professionals both nationally and globally, these efforts can result in positive social change by helping students who have already expressed an interest in computer science to succeed.

Declarations
Availability of Data and Material: The data that support the findings of this study are available on request.

Acknowledgments: Not applicable
Compliance with Ethical Standards Ethical Statement. The author declares that he has no conflicts of interest. All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Consent Statement. Informed consent was obtained from all individual participants included in the study.

Table 1. Overall accuracy: (53 + 136)/207 = 91.30%.
Table 2. Top 25 performing neural network topologies. Note. * = p = 0.43.
Figure 1. Increase in area under the ROC curve across different thresholds.

Figures
Note. (N = 592 student records used in threshold calculations).

Figure 2
Sensitivity about the Mean.
Note. (N = 592 student records used in sensitivity about the mean calculations).