Data Collection
The first step in developing a predictive system via supervised learning is acquiring data for neural network training and testing. For this project, the mentor's course grade books for the past seven academic years were collected and compiled. The mentor teaches five sections of Introduction to Computer Programming I each academic year. After cleaning and coding the data, the authors had 828 complete rows of student data. Considerable time and care were spent aligning assignments (i.e., course topics) from one semester to the next and from one academic year to the next; in all, the authors found 12 graded items common across all course sections (a sketch of this compilation step follows the lists below). The authors deemed the data both reliable and valid. Regarding reliability, the mentor of this research was the only person:
- Who graded the 12 graded items
- Who entered the data into the grading program
- Who served as the instructor for all course sections
In addition,
- The same grading scale and assignment weighting were used for all seven years.
- The same textbook was used for all seven years (several new editions were released, but no significant changes were made to course content).
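The compilation step described above might look like the following pandas sketch. This is illustrative only: the file paths, the item column names, and the `passed` outcome label are hypothetical placeholders, not the actual gradebook fields.

```python
import glob
import pandas as pd

# The 12 graded items found to be common across all sections
# (hypothetical placeholder names, not the actual assignment labels).
COMMON_ITEMS = [f"item_{i}" for i in range(1, 13)]

frames = []
for path in glob.glob("gradebooks/*.csv"):  # one exported gradebook per section
    df = pd.read_csv(path)
    # Keep only the graded items shared by every semester, after the manual
    # work of aligning assignment names from one term to the next.
    frames.append(df[COMMON_ITEMS + ["passed"]])

data = pd.concat(frames, ignore_index=True)
# Complete cases only: rows missing any common item are dropped,
# yielding the 828 usable student records described above.
data = data.dropna()
```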
Neural Network Type Identification
This project used NeuroSolutions Professional by NeuroDimension to construct and test neural networks. Forty-two neural network architectures were tested, with the probabilistic neural network (PNN) performing best on an out-of-sample dataset of 207 rows of student data. The PNN correctly classified 91.3% of the test data (see summary in Table 1). The top 25 performing neural networks and their corresponding accuracy on the out-of-sample dataset are listed in Table 2.
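Because NeuroSolutions is a commercial GUI tool, its exact configuration cannot be reproduced here; conceptually, however, a PNN is a Parzen-window (Gaussian kernel) classifier. The following is a minimal sketch of that idea, not the authors' actual implementation; the smoothing parameter `sigma` stands in for what the software would tune internally.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.1):
    """Minimal probabilistic neural network (Parzen-window classifier).

    Each test point is scored against every training exemplar with a
    Gaussian kernel; the class whose exemplars give the largest average
    kernel response wins.
    """
    preds = []
    for x in X_test:
        # Squared Euclidean distance to every training exemplar.
        d2 = np.sum((X_train - x) ** 2, axis=1)
        k = np.exp(-d2 / (2.0 * sigma ** 2))          # kernel responses
        # Average response per class (the PNN summation layer).
        scores = {c: k[y_train == c].mean() for c in np.unique(y_train)}
        preds.append(max(scores, key=scores.get))
    return np.array(preds)

# Out-of-sample accuracy, as reported above for the 207-row test set:
# accuracy = np.mean(pnn_predict(X_train, y_train, X_test) == y_test)
```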
Neural Network Refinement
Backwards Elimination. The number of inputs was reduced using backwards elimination: each input was withheld in turn to determine whether predictive accuracy improved with its inclusion in the predictive system. The goal of backwards elimination is to retain only inputs that add to the final predictive accuracy, thereby increasing the generalizability of the final predictive system. After backwards elimination, the input space consisted of 12 inputs.
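A minimal sketch of this greedy elimination loop follows. Here `evaluate` is a placeholder for training the PNN on a given input subset and returning its out-of-sample accuracy; the stopping rule (drop an input whenever accuracy does not fall without it) matches the goal stated above.

```python
def backwards_elimination(inputs, evaluate):
    """Greedy backwards elimination over a list of input names.

    `evaluate(subset)` is a placeholder that trains the model on the
    given inputs and returns held-out accuracy. An input is removed
    whenever accuracy does not drop without it, so only inputs that
    add predictive value survive.
    """
    current = list(inputs)
    best = evaluate(current)
    improved = True
    while improved:
        improved = False
        for item in list(current):
            trial = [i for i in current if i != item]
            score = evaluate(trial)
            if score >= best:          # the input did not help; drop it
                current, best = trial, score
                improved = True
                break
    return current, best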
The initial neural network included all graded items across the entire semester of the Fundamentals of Computer Programming I course, for a total of 15 inputs. After backwards elimination, the final neural network had 12 inputs, with the second bookwork assignment and the third exam trimmed from the input space. The final PNN, with the inputs summarized in Table 3, had an overall accuracy of 90.8%, slightly below the 91.3% achieved during the original neural network type identification. However, a neural network with fewer inputs is likely to be more generalizable in a production setting, thereby performing better on new, unseen data.
Threshold Determination. The most impactful incremental improvement came from adjusting the classification threshold of the neural network to the operating point on the receiver operating characteristic (ROC) curve that maximizes predictive accuracy. The default threshold for the PNN output is 0.5: a network output greater than 0.5 is interpreted as a student predicted to succeed, while an output less than 0.5 is interpreted as a student unlikely to succeed.
The selected operating point on the ROC curve is shown in Figure 1. Using a threshold of 0.51 rather than 0.50 produced a sizable increase in predictive accuracy, to 99.2%. This last refinement resulted in the final neural network summarized in Table 5.
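The authors performed this selection with NeuroSolutions' ROC tooling; a minimal sketch of the underlying idea, sweeping candidate cutoffs over the PNN's continuous outputs on held-out data, might look like this:

```python
import numpy as np

def best_threshold(outputs, labels, thresholds=np.linspace(0.0, 1.0, 101)):
    """Sweep candidate thresholds over the model's continuous outputs
    and return the one that maximizes accuracy on held-out data.

    Each threshold corresponds to one operating point on the ROC curve;
    in this study the selected cut was 0.51 rather than the default 0.50.
    """
    best_t, best_acc = 0.5, 0.0
    for t in thresholds:
        acc = np.mean((outputs > t).astype(int) == labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```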
Sensitivity Analysis. To create an early alert system, checkpoints/triggers must be established at which students are contacted regarding their progress in the course. The authors conducted a sensitivity analysis to determine possible checkpoints across the 16-week course. The analysis entails varying each neural network input by plus and minus two standard deviations about its mean, in 50 steps on each side of the mean, and measuring the resulting change in the current PNN's output. The outcomes of the sensitivity analysis are detailed in Figure 2.
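A sketch of this one-at-a-time procedure follows; `model` is a placeholder for the trained PNN's continuous-output function, and the output range over each sweep serves as that input's sensitivity score.

```python
import numpy as np

def sensitivity(model, X, n_steps=50):
    """One-at-a-time sensitivity analysis as described above.

    Each input is swept plus/minus two standard deviations about its
    mean (50 steps per side) while all other inputs are held at their
    means; the spread of the model's output over the sweep is that
    input's sensitivity.
    """
    mu, sd = X.mean(axis=0), X.std(axis=0)
    results = {}
    for j in range(X.shape[1]):
        grid = np.linspace(mu[j] - 2 * sd[j], mu[j] + 2 * sd[j], 2 * n_steps + 1)
        probe = np.tile(mu, (grid.size, 1))   # other inputs fixed at their means
        probe[:, j] = grid
        out = model(probe)
        results[j] = out.max() - out.min()    # output range attributable to input j
    return results
```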
Fortuitously, three of the top five graded inputs, with regard to sensitivity, occur within the first three weeks of class. Three weeks into the semester should allow an instructor sufficient time to individually help identified at-risk students change their predicted outcome. Given the timing of these three inputs (Bookwork 1, the MadLib program, and the Property Tax program), these assignments most likely set the tone of the course for the students: if a student has initial success in their first programming endeavors, that trend is more likely to continue. The authors hypothesize that if an instructor focuses heavily on students' initial success on the first two programming assignments in COSC 118, then a sizable increase in student success can be realized. The 8.4 percentage point increase in accuracy produced by the slight modification of the PNN's threshold from 0.50 to 0.51 suggests that many students are on the cusp of being successful. The authors believe that a focused effort to increase student performance on the first couple of programming assignments could produce a sizable increase in student success rates for introductory programming courses.
Examining the sensitivity analysis more closely for the three most impactful factors (the MadLib program, the Property Tax program, and Exam 2) shows how various scores on these items change the neural network output. These relationships are depicted graphically in Figure 3. The sensitivity graphs for all three inputs have sigmoid, "S"-shaped curves, suggesting that even slight score increases on these three assignments will produce a corresponding incremental increase in the neural network output.
These findings suggest that beginning computer science students could benefit greatly from having initial success in their programming efforts. Making struggling students aware of the school's student success resources related to programming early in the semester (e.g., tutoring, office hours, open lab time) could dramatically improve student outcomes.
Pilot Study
Pilot Intervention. During the fall 2020 semester, to assess the effectiveness of the early alert system, the first author of this research used the first three graded items as triggers for instructor interventions to assist students with their coursework. The three triggers (Bookwork 1, the MadLib program, and the Property Tax program) were all completed and graded within the first three weeks of class.
The instructor began the semester by stressing the paramount importance of a strong start, with perfect submissions for the first couple of programs. The instructor repeatedly and strongly emphasized, and ultimately demonstrated, the use of the posted rubrics in Canvas to ensure that students understood how their programs would be graded. If a student failed to submit one of the three trigger assignments, the instructor contacted the student individually via email, and then by phone if email was unsuccessful, reminding the student of the impact that not submitting one of these assignments could have on his or her course outcome. The instructor also sent similar emails to students who did poorly on any of the trigger assignments, reminding them to use the rubrics and to submit complete work to optimize their final course grade.
Pilot Study Results. The student outcomes from fall 2020 are compared to those from the fall 2019 semester. It should be noted upfront that the 2020 semester fundamentally differed from the 2019 semester due to COVID-19. In response to the pandemic, the instructor opted to offer the sections of CS101 in a live online format in which the class met via Zoom twice a week during the regularly scheduled class time. Fall 2020 marked the instructor's first time teaching online and the first time CS101 was offered online at the school, although the instructor had recently and fortuitously completed a Quality Matters course entitled "Improving Your Online Course" in anticipation of the need to move his courses online. Given the situation, one would reasonably expect the course success rate to drop precipitously for fall 2020. The opposite, however, occurred: student success rates actually increased, as detailed in the 2×2 contingency table shown in Table 6. Using a freely available online 2×2 contingency table calculator from Vassarstats.net (http://vassarstats.net/tab2x2.html), a chi-square test of independence showed no significant association between academic semester and course outcome, χ2(1, N = 93) = 0.62, p = .43. The lack of statistical significance (at p < 0.05) may be attributable to the sample size, the minimal treatment undertaken, or the extraordinary learning environment students faced during the COVID-19 pandemic.
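The same test can be reproduced with scipy rather than the online calculator. The cell counts below are not given in the text; they are reconstructed from the reported N = 93 and the DFW rates (31.8% ≈ 14 of 44 in fall 2019; 24.5% ≈ 12 of 49 in fall 2020) and should be treated as an assumption, though they do recover the reported statistic.

```python
from scipy.stats import chi2_contingency

# Reconstructed (assumed) counts consistent with N = 93 and the
# reported DFW rates; Table 6 holds the authoritative values.
table = [[14, 30],   # fall 2019: DFW, success
         [12, 37]]   # fall 2020: DFW, success

# correction=False gives the uncorrected chi-square, matching the
# value reported above from the Vassarstats calculator.
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2({dof}, N = 93) = {chi2:.2f}, p = {p:.2f}")  # 0.62, p = 0.43
```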
While the pilot study did not yield a statistically significant result, its outcome suggests that the treatment may be effective. The DFW rate dropped from 31.8% in fall 2019 to 24.5% in fall 2020, a 7.3 percentage point drop and a 23% relative decrease ((31.8 − 24.5) / 31.8 ≈ 0.23), supporting the continued use of the system. Additionally, 83% of the students with a DFW outcome in fall 2020 had failed to submit at least one of the three triggers, supporting the validity of the early alert system and the identified early alert triggers. This gave the instructor adequate evidence to continue the early alert system for spring 2021, which yielded a 16.7% DFW rate.
Further Research
This paper details the creation of a highly accurate predictive system for identifying at-risk students; the authors hope that its use will yield an increase in student success rates. Further research is needed to determine the most appropriate and effective interventions for students at the authors' home institution. Other institutions of higher education wanting to create their own predictive system will need to follow a similar methodology using data from their own introductory computer science courses.
This research could also serve as a general framework for identifying academic early alert triggers in other disciplines.