School Attendance Prediction among Students with Autism: A Deep Learning-based Framework

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects social communication and behavior. The term “spectrum” refers to the wide range of symptoms observed across individuals with ASD. Many children with ASD experience difficulty with daily functioning at school and home. ASD prevalence is increasing in the United States, with the most recent prevalence estimated at 1.9%. Given the wide range of social and learning difficulties experienced by children with ASD, it is paramount that they are able to attend school to receive the appropriate range of interventions.

School absenteeism (SA) is a significant concern given its association with many negative consequences such as school drop-out. Early prediction of SA would help school districts implement effective interventions to ameliorate this issue. Because ASD is heterogeneous, students with ASD show within-group differences in their SA.

This research introduces a deep learning-based framework for predicting short- and long-term SA of students with ASD. The Long Short-Term Memory (LSTM) algorithm is used to predict short-term SA. Similarly, Multilayer Perceptron (MLP) and Random Forest (RF) algorithms are used to predict long-term SA. The proposed framework achieves accuracies of 89% and 90% in predicting short-term and long-term SA, respectively.

Some individuals diagnosed with ASD experience a range of significant impairments that adversely impact their quality of life [4]. Thus, it is paramount that school-age children with ASD are able to consistently access interventions and specialized educational instruction in school settings [5]. For example, attending public school enables students with ASD to interact with their typically developing peers, which may increase their social development, a key area of need for children with ASD. In the same regard, there are many special education schools that provide intensive, specialized interventions and learning opportunities for children with ASD [6,7].

Recent reports suggest that children with ASD miss school more than other clinical populations, leading to fewer opportunities for these students to benefit from school-based interventions. For example, 5%-28% of typically developing students are reported to have missed school days, while this percentage jumps to 40%-53% among students with ASD [6]. Similarly, the percentage of chronic absenteeism (CA), defined as missing more than 10% of the annual school days, among typically developing students and students with ASD is 13% and 23%, respectively [6]. These statistics clearly illustrate that SA disproportionately affects children with ASD and can negatively impact the effectiveness of ASD school-based interventions [6,7,8].

At present, there is limited research on risk factors for SA in children with ASD. Some areas that require further attention include: (1) examination of gender, anxiety, depression, and challenging home settings [9]; (2) examination of possible associations between SA and specific child characteristics.
For instance, students with particular health conditions (e.g., asthma) miss more school days when they experience severe symptoms [9]; and (3) group-level comparison between students with ASD and typically developing students with respect to SA risk factors. Beyond these studies, the SA prediction problem itself deserves attention because accurate predictions make it possible to provide timely interventions that improve attendance. To the best of our knowledge, this is the first study that introduces Machine Learning (ML)- and Deep Learning (DL)-based frameworks for predicting SA of students with ASD. The following paragraphs discuss the challenges of the SA prediction problem and the advantages of using ML and DL techniques over conventional statistical analysis tools.

SA prediction aims to establish, for each student, the probability of the number of school days missed in the future. If sufficiently accurate, such information could allow school districts and at-home caregivers to understand SA patterns and perhaps divert attention and resources to specific children with enough time to intervene adequately. This in turn would enable students to attend school regularly and benefit from school-based interventions and services.

Students with ASD are heterogeneous: they show different symptoms and symptom severities that might also change with time, which can make the prediction of SA challenging [9]. For instance, in our case, Fig. 1 shows that students with equal or similar attendance rates show very different individual attendance patterns. The figure also shows that students with ASD who have different risk factors (e.g., food allergy) are similar with respect to their attendance rate.
These findings suggest the hypothesis that group-level analysis of SA risk factors does not necessarily explain the SA behavior of students with ASD at the individual level, which is important in designing customized school-based interventions for SA.

Predicting SA at the individual level requires mining the SA history of each student. To accomplish this, we recast the SA prediction problem as a time-series-based sequence prediction problem. We therefore used the students' attendance and maladaptive behaviors, modeled as time series, as input data to predict their future SA and CA behaviors. Methodology-wise, we used ML and DL techniques because they outperform conventional techniques (e.g., ARIMA). More details in this regard are given in the following section.

The main hypothesis of this research is twofold: (1) each student with ASD shows different SA patterns (as shown in Fig. 1); and therefore, (2) SA is better predicted at the individual level. These hypotheses led the authors to utilize a framework that employs a combination of DL, ML, and time series modeling techniques to model and predict individual SA and CA. These techniques are adopted because they outperform conventional statistical techniques (e.g., ARIMA) in learning the complex patterns and long-range dependencies of temporal data (e.g., SA behavior).

The results are expected to provide early predictions of when each student will be absent and which students might be at risk of CA in the future. The present research project uses a real dataset for a population of 120 students with ASD. The data was collected at a private special education school in a mid-Atlantic state. More details about the data are provided in the following sections.
The first objective of this research is to propose a short-term prediction framework to predict the

association between ASD and other factors, such as parental, perinatal, prenatal, and neonatal, are also investigated and discussed in the literature [11,12,13].

Advances in genetics research have led to a growing interest in discovering what causes ASD from a genetic perspective. This question is still challenging, and its answer is debated. While many studies show that autism traits are heritable, the responsible gene factor(s) are not commonly defined [4]. Some research shows that different gene expressions cause different traits or symptoms of ASD. In contrast, other studies concluded that different traits could be linked to the same underlying genetic expression [11].

As a parallel research stream to ASD diagnosis, a significant amount of research investigates multiple comorbid symptom patterns, psychiatric disorders, and medical conditions. For example, children with ASD are reported to have different sleeping patterns, such as bedtime resistance, night waking, sleep anxiety, and many others [14]. Food aversion (e.g., eating refusal), social anxiety, and aggressive behavior are all considered phenotypes of ASD [15,16].

The learning capabilities of students with ASD, in relation to the academic goals of school, are another research focus. For example, multiple studies show that reading comprehension skills are negatively affected by ASD [17]. Many other studies show that students with ASD have different school-related behaviors compared to typically developing students [18]. Students with ASD tend to miss more school days than typically developing students [10]. To the best of our knowledge, the risk factors of SA behavior are not yet well defined. The following sections discuss this area in more detail.

SA and CA risk factors
SA is problematic for its long-term impact on a range of student outcomes. Recent reports show that 13%-16% of US students are chronically absent. This percentage represented eight million students in 2015 [4]. The percentage of CA among autistic students is twice that of typically developing students. Given these alarming figures, the association between SA and ASD has been inadequately

The school environment has a significant effect in this regard. For example, classroom behavior management is significantly associated with SA. Also, transitions between classes, grades, and developmental stages, as well as learning demands, are other challenging risk factors for ASD [7,18].

Awareness has been raised of the schools' role in managing and controlling SA through early and well-designed interventions. SA prediction is critical for schools to effectively improve their students' attendance. To accomplish this, schools need to know, in advance, when and for how long each student might be absent. This will give the schools enough time to plan proper and effective interventions.

Statistical models for SA and CA prediction
Many research studies investigate SA and school refusal factors using different statistical techniques. For example, the chi-square test has been used to assess the association between SA and other factors such as gender, presence of siblings, and single parents [19]. ANOVA and chi-square tests have been employed to thoroughly investigate the social, emotional, and behavioral characteristics of autistic students who have SA [9]. Simple hypothesis-testing techniques are also used to demonstrate the significance of school-related anxiety as a predictor of SA [20]. Simple statistical analysis has been deployed to address the relationship between CA and some developmental disabilities of students with ASD [10]. Regression analysis techniques are also used to test the significance of different factors as SA predictors [21,22,23].

Machine learning in education
ML is a set of powerful techniques widely used to analyze and obtain useful insights from multivariate and complex data. Interest is growing in harnessing ML capabilities in education research. For example, the association mining algorithm is used to discover the students' behavioral factors homogeneous groups of similar learning styles [24]. Also, the likelihood of student drop-out is predicted using logistic regression and decision tree algorithms [24]. Many other applications have been developed to introduce ML algorithms into other school-related applications such as students' achievement prediction [24].

SA is another focus of the education research literature. For example, intensive research work has been directed at defining the risk factors of SA [25]. To the best of our knowledge, ML and DL algorithms have not been used to predict the SA behavior of students with ASD or any other child population. This research aims to fill this gap in the literature by introducing an ML/DL framework for SA and CA prediction among students with ASD.

Short-term SA prediction (univariate and multivariate)
This research proposes a DL-based framework for predicting the short-term SA of students with ASD. First, a univariate LSTM forecasting model is proposed to provide early predictions of the students' SA behavior based on their attendance history. Expanding upon this, a multivariate LSTM model is then employed by enriching the data source with the students' maladaptive (socially inappropriate) behavior history (e.g., aggressive behavior). As shown in Fig. 3, adding maladaptive behavior improves prediction accuracy and precision while slightly decreasing prediction recall. These results encourage us to investigate further the causal relationship between maladaptive behavior and SA of students with ASD. Such an investigation will help design more customized SA interventions that consider these two essential phenotypes of ASD. For example, more customized in-class learning activities could be implemented to improve the students' adaptive behavior, which could result in better school attendance.

From a practical perspective, it is of value to know how far ahead the proposed model can satisfactorily predict SA. Fig. 3 shows that the model can be recommended for a time horizon of 10 school days ahead, with an accuracy of 90% and an acceptable precision of 80%. As expected, the overall quality of the prediction decreases as the forecasting lead value increases. This implies that the SA of students with ASD might change over time. So, consistent updating mechanisms (e.g., mobile apps) should be in

Long-term SA prediction (scenario I + scenario II)
Fig. 4 shows examples of MLP and RF's capability to satisfactorily predict long-term CA even when only a short attendance history is available. The two algorithms are used to learn the CA history of 120 students with ASD to predict whether each student will be chronically absent over the upcoming three months.
Using different train/test splits, MLP consistently provides better results in terms of all the performance metrics. The prediction performance of the two algorithms is summarized in Table 1, considering the 3-month and 12-month scenarios.

Our results also highlight the causal relationship between maladaptive behavior and SA of students with ASD. More research effort is needed to address this issue quantitatively through different techniques, such as social networks and association mining algorithms. In our opinion, the more the dynamics of ASD phenotypes are investigated, the more customized and efficient SA interventions will become. Moreover, these research results are expected to encourage school districts to collect, track, and intelligently analyze school-related data, which would result in the improvement of overall education quality.

Conclusion
In this research, an ML- and DL-based framework is proposed for the SA and CA prediction of students with ASD. First, the input data is modeled as a time series to represent the students' attendance and maladaptive behavior history. The LSTM algorithm is used for short-term SA prediction. MLP and RF algorithms are then used for long-term CA prediction. Both models show a promising capability to predict SA and CA behavior for 10 school days and three months ahead, respectively. The results are expected to help in designing customized interventions to manage SA effectively. Future research includes (1) improving the adopted algorithms' performance through hyperparameter optimization, and (2) enriching the proposed framework's data source using other characteristics and behaviors to predict SA and CA.

Methods
This research introduces an ML- and DL-based framework to handle short-term SA and long-term CA problems for students with ASD. The LSTM algorithm is used for the first problem, for which univariate and multivariate forecasting models are built. The univariate model takes students' attendance history as its only input and predicts SA from attendance behavior alone, while the multivariate model additionally uses the history of students' maladaptive behavior. For the CA prediction problem, individual characteristics are added to the attendance history to enrich the data source. Two different scenarios are also hypothesized for students with long and short attendance history, as detailed later.

Data description
This research's targeted population includes 120 students with ASD and significant impairment, with an average age of six years; 79% are male and 21% are female. The population has an attendance rate of 90%, while 23% are reported chronically absent. The data was collected from the Institute for Child Development (ICD) in Binghamton, NY. The research presented in this study was approved by Binghamton University's Institutional Review Board (IRB). All methods utilized in this study for data collection were carried out in accordance with relevant regulations. Informed consent was waived in this study, and this was approved by the Binghamton University Human Subjects Research Review Committee (HSRRC), which is the IRB responsible for the review of research.

Table 2 provides more details about the demographic characteristics of the targeted population.

We first investigated whether the students' individual characteristics (e.g., communication skills, motor skills, emotional control, and others) are significant predictors of their SA behavior. This investigation is motivated by the lack of research that addresses the relationship between individual characteristics and SA [2]. Statistical hypothesis testing is applied, and the results, depicted in Fig. 1, show no association between these characteristics and the SA of the targeted population. The results also support our hypotheses that (1) SA is heterogeneous and should be predicted at the individual level, and (2) SA is better predicted from its own history. The association between maladaptive behavior and SA is discussed in the literature [4]. Therefore, maladaptive behavior is also used in this research to predict SA. This is also expected to help design customized interventions to improve SA behavior that consider different ASD phenotypes.

Short-term SA prediction
Data preprocessing for short-term SA prediction
To predict short-term SA, the history of students' attendance and maladaptive behavior is first modeled as a time series. Data transformation includes binary encoding of the attendance time series (1: attendance, 0: absence) and normalizing the time series of maladaptive behavior. Then, the data is restructured into a supervised-learning format using a rolling forecasting technique, such that a sequence of n past events (from time t − n to t − 1) is used to predict the future event x_t at time t, where n is the value of the lag parameter. Thus, the entire time series of each student is partitioned into m sequences, each of length n, as features, in addition to m events x_t to be predicted as labels. For validation purposes, the data is split using three training-testing thresholds, as will be illustrated later. Other secondary data cleaning steps are also performed.
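
The restructuring step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `min_max_normalize` and `make_windows`, the sample attendance record, and the choice of min-max scaling are all assumptions, since the text does not state which normalization was used.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(series: Sequence[float]) -> List[float]:
    """Scale a numeric series to [0, 1] (assumed normalization scheme).

    A constant series is mapped to all zeros to avoid division by zero.
    """
    low, high = min(series), max(series)
    if high == low:
        return [0.0 for _ in series]
    return [(value - low) / (high - low) for value in series]


def make_windows(
    series: Sequence[float], lag: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn one student's time series into (features, labels) pairs.

    Each feature row holds `lag` consecutive past events, and the label
    is the event that immediately follows them.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    features, labels = [], []
    for t in range(lag, len(series)):
        features.append(list(series[t - lag:t]))
        labels.append(series[t])
    return features, labels


# Hypothetical attendance record: 1 = attended, 0 = absent.
attendance = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]
X, y = make_windows(attendance, lag=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

A series of length 10 with a lag of 3 yields 7 feature windows, each paired with the next event as its label. For the multivariate case, the same windowing could be applied to the normalized maladaptive-behavior series and the two windows combined per time step.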

LSTM algorithm
LSTM is a popular recurrent DL algorithm that is used to mine the hidden patterns of sequential data [23]. Many LSTM variations have been introduced to enhance its capability (e.g., diamond LSTM and bidirectional LSTM) [23]. LSTM has manifold areas of application, including time series analysis, natural language processing, and others [26]. In this research, LSTM is used for the first time to predict SA behavior among students with autism.

In this research, the SA of each student is modeled as a time series. Unlike typical forecasting techniques (e.g., ARIMA and SARIMA), LSTM is known for its capability to learn the long-term dependencies of sequential and temporal data [26]. For this reason, LSTM is used in this research for short-term SA modeling and prediction. It is worth mentioning that typical forecasting techniques (e.g., ARIMA) perform well on seasonal and linear time series. However, they are less powerful than DL techniques (e.g., LSTM) in their ability to capture the long-term dependencies of sequential data [26].

Unlike typical DL algorithms, LSTM replaces the neurons at each hidden layer with memory cells that work together with three types of gates: input, forget, and output gates. This characteristic enables the LSTM algorithm to avoid the vanishing gradient problem. In this sense, LSTM has been shown in the literature to be superior at learning and predicting long sequential data [27].

To fulfill the scope of this research, univariate and multivariate LSTM forecasting models are built. The time series of students' attendance history is used as a single input to train the univariate model. The dataset of the multivariate model, however, is enriched by adding the time series of students' maladaptive behavior to school attendance. Fig. 5 illustrates how the proposed model works.
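
To make the role of the memory cell and its three gates concrete, the sketch below runs the standard LSTM cell update for a single scalar unit. It is illustrative only: the names `lstm_step` and `run_sequence` and the weight values are hypothetical, nothing here is trained, and it is not the architecture reported in Table 3.

```python
import math
from typing import Dict, Sequence, Tuple

# Each gate has (input weight, recurrent weight, bias); values are made up.
Params = Dict[str, Tuple[float, float, float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(
    x: float, h_prev: float, c_prev: float, params: Params
) -> Tuple[float, float]:
    """One time step of a single-unit LSTM cell."""

    def pre_activation(name: str) -> float:
        w, u, b = params[name]
        return w * x + u * h_prev + b

    i = sigmoid(pre_activation("input"))         # how much new information to write
    f = sigmoid(pre_activation("forget"))        # how much old memory to keep
    o = sigmoid(pre_activation("output"))        # how much memory to expose
    g = math.tanh(pre_activation("candidate"))   # candidate memory content
    c = f * c_prev + i * g                       # updated cell (memory) state
    h = o * math.tanh(c)                         # updated hidden state
    return h, c


def run_sequence(xs: Sequence[float], params: Params) -> float:
    """Feed a window of past events through the cell; return the last hidden state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h


# Hypothetical, untrained weights for illustration.
example_params: Params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.3, 0.2, 1.0),
    "output": (0.4, 0.1, 0.0),
    "candidate": (0.8, 0.2, 0.0),
}
print(run_sequence([1, 1, 0, 1, 1], example_params))
```

The cell state is updated additively (`f * c_prev + i * g`), which is the property usually credited with easing gradient flow over long sequences. In a full model, the final hidden state would feed a small output layer that produces the attendance prediction.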
The LSTM algorithm with a rolling forecasting technique is employed in this research to predict future SA. As with any DL algorithm, LSTM performance is a function of multiple architectural parameters (a.k.a. hyperparameters). Tuning these parameters is critical to optimizing LSTM accuracy. Multiple optimization algorithms have been introduced in the literature for this purpose [28]. Parameter optimality is beyond our scope in this research because the main focus is the introduction of a new framework for SA prediction for students with ASD.

SA prediction is addressed as a forecasting problem in this research. Therefore, LSTM performance is also a function of two main forecasting parameters: lag and lead. While lag refers to the amount of history needed to predict the next future event, the lead parameter's value represents the number of future events that can be predicted at once using the given lag value. Table 3 summarizes all the LSTM hyperparameter values adopted in this research, including the forecasting lag/lead values.

Three training-testing split settings are employed for better model validation. Each of these settings is combined with a rolling forecasting technique that trains the LSTM model using different data portions. Accuracy, precision, and recall are adopted to evaluate the model's performance for each of the validation settings. Accuracy reflects the model's overall prediction quality, while the other two metrics check the model's capability to predict the attendance events correctly. Fig. 4 shows the model performance over different validation settings.
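
The three evaluation metrics can be computed directly from predicted and observed events. The sketch below assumes that attendance (encoded as 1) is the positive class, following the description above; the function name `classification_metrics` and the sample vectors are hypothetical and do not reproduce any reported result.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary event predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Of the days predicted as attended, how many were attended.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the days actually attended, how many were predicted as such.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical observed and predicted attendance over ten school days.
observed = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
predicted = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
metrics = classification_metrics(observed, predicted)
print(metrics)
```

Because attendance events dominate the data (the population's attendance rate is 90%), accuracy alone can look high even for a model that rarely predicts absences, which is one reason to report precision and recall alongside it.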

Long-term CA prediction
Data preprocessing
In long-term CA prediction, the main objective is to predict whether a particular student will be chronically absent over the upcoming three months. This problem is handled as a pattern recognition problem using MLP and RF algorithms. A combination of a 12-month attendance history and 15 individual characteristics (e.g., medical restrictions, allergy restrictions, and atypicality score) is used as features. Binary encoding is used to model the monthly attendance history as a binary sequence, in addition to the individual binary characteristics (e.g., medication and allergy restrictions). Moreover, the individual numerical features (e.g., age) are normalized. The future CA status is labeled as a binary sequential pattern. For example, (100) means the student will be chronically absent in the second and third months.

Data balancing is necessary to avoid learning bias. Therefore, the input data is also balanced using the standard oversampling technique. Different training-testing splitting thresholds are applied to validate the model. This step is discussed in detail later in this section. To further validate its robustness, we applied our model to a hypothesized scenario where some students have a short history of school enrollment (three months). The results show our framework's ability to predict CA even for recently enrolled students with a relatively short CA history.
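
The labeling and balancing steps can be illustrated with a short sketch. Several points are assumptions rather than details from the study: the 10% threshold from the CA definition is applied per month, the balancing method is plain random oversampling (the text only says "standard oversampling"), and the names `is_chronically_absent` and `random_oversample` and the toy rows are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def is_chronically_absent(
    days_absent: int, school_days: int, threshold: float = 0.10
) -> bool:
    """Flag a period as CA when more than `threshold` of its school days are missed."""
    if school_days <= 0:
        raise ValueError("school_days must be positive")
    return days_absent / school_days > threshold


def random_oversample(
    rows: Sequence[Sequence[float]], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[Sequence[float]], List[Hashable]]:
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, row_label in zip(rows, labels) if row_label == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical feature rows (binary monthly history) and three-month label patterns.
rows = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 1]]
labels = ["111", "111", "111", "100"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

If oversampling is used, applying it only to the training portion after the train/test split keeps duplicated rows from appearing in both portions and inflating test scores.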

MLP and RF algorithms
In this research, long-term CA behavior is also predicted. The problem is formulated as a pattern recognition problem. Each pattern represents the status of students' CA for three months ahead. MLP and RF are two commonly used algorithms for pattern classification problems in the literature [27,28].

MLP is one of the most common artificial neural networks (ANNs), with a broad spectrum of applications. It has a powerful capability to approximate non-linear functions by learning the hidden complex patterns in large, complex, and noisy data [27]. The MLP architecture generally consists of one input and one output layer in addition to at least one hidden layer. Inspired by the structure of the human brain, each layer includes multiple neurons that work as knowledge processing units. Neurons in each layer are connected to the neurons of other layers through artificial links that hold weight values. The backpropagation algorithm is commonly used to train MLP and optimize its weights such that the error function converges to a global or local minimum.

RF is a state-of-the-art machine learning algorithm with outstanding prediction and feature selection performance. RF works as an ensemble learning algorithm that aggregates multiple independent, deep tree predictors into one powerful final model. In this sense, RF has an outstanding capability to learn complicated and irregular patterns [28]. In more detail, the RF algorithm trains

MLP and RF have been used to handle long-term CA prediction as a pattern recognition problem. We applied both algorithms considering two scenarios of twelve- and three-month-long histories of school attendance. These scenarios are hypothesized to investigate the robustness of the proposed framework in predicting CA for students with different attendance history lengths. The hyperparameter optimization step is not considered, as it is beyond the scope of this research.
Table 3 summarizes the model parameters that are used for each algorithm.

To validate the adopted models' performance, we tested the results using different data splits to train the models using different data portions. In addition, accuracy, recall, and precision metrics are used to investigate the quality of our predictions.

The model can reliably be used to predict ten school days ahead with 90% accuracy and 80% precision.