In this study, we demonstrated the utility of a deep learning classification model that predicts the presence of DD from mobile-based video data by distinguishing the facial characteristics of children with DD from those of children without. The model extracts 68 facial landmarks from each face and generates derived features such as head pose estimates (pitch, yaw, roll) and landmark point distances. It predicted the presence of DD with an average accuracy of 88%. Through the model's interpretation process, we identified important predictive variables, including the pitch (head nodding) variables, all of which showed statistically significant differences in distribution between children with DD and those without; notably, children with DD had a significantly wider pitch distribution than those without.
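The two derived feature families described above can be illustrated with a minimal sketch. The conventions here (pitch/yaw/roll axis assignment, pairwise distances over all 68 landmark points) are illustrative assumptions, not necessarily the exact definitions used in our pipeline:

```python
import numpy as np

def euler_angles_from_rotation(R):
    """Convert a 3x3 head-pose rotation matrix to (pitch, yaw, roll) in degrees.

    Axis convention assumed for illustration: pitch = rotation about the
    x-axis (head nodding), yaw = about the y-axis, roll = about the z-axis.
    """
    pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    yaw = np.degrees(np.arcsin(-R[2, 0]))
    roll = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    return pitch, yaw, roll

def landmark_distances(landmarks):
    """Pairwise Euclidean distances between 2-D facial landmarks.

    landmarks: array of shape (68, 2), one (x, y) point per landmark.
    Returns the upper-triangular distances as a flat feature vector.
    """
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(landmarks), k=1)
    return dist[iu]

# An identity rotation corresponds to a neutral head pose: all angles zero.
p, y, r = euler_angles_from_rotation(np.eye(3))
print(p, y, r)  # 0.0 -0.0 0.0

# 68 landmarks yield 68 * 67 / 2 = 2278 pairwise distance features.
feats = landmark_distances(np.random.rand(68, 2))
print(feats.shape)  # (2278,)
```

In practice the rotation matrix would come from a head-pose solver applied to the detected landmarks, and the resulting per-frame angles and distances would form the input sequence to the classifier.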
Caregiver questionnaires, including the Quantitative Checklist for Autism in Toddlers (Q-CHAT), the Autism Behavior Checklist (ABC), the Ages & Stages Questionnaire (ASQ), and the Parents’ Evaluation of Developmental Status (PEDS), are the most commonly used screening methods for DD. Among these, the Q-CHAT showed a sensitivity of 71–75% and a specificity of 63–65%.31,32 The ABC was reported to have a sensitivity of 78.4%.33 The PEDS, which consists of two open-ended questions and eight yes/no questions completed by parents, showed sensitivities of 78.9% and 54.9% for severe and moderate-to-severe delays, respectively, and a specificity of 79.6%.34 The ASQ-3 showed sensitivities of 60.0% and 53.1% for severe and moderate-to-severe delays, respectively, and a specificity of 89.4%.35 Thus, in terms of detection accuracy, our classification model (88%) appears to perform comparably to these existing screening methods.
Several digital screening methods for DDs have been suggested in previous studies. Most web-based developmental surveillance programs are trials of online versions of established questionnaires. The Web-Based Modified Checklist for Autism in Toddlers with Follow-up interview (M-CHAT/F) is a checklist scored by parents and implemented as a two-stage screening test, in which a positive result prompts a follow-up interview to clarify or correct the failed items. When administered by primary care pediatricians, the web-based M-CHAT/F had a sensitivity of 59% and a specificity of 71%.36 In another study that used the digital M-CHAT-Revised with Follow-up, accurate documentation of screening results in the electronic health record increased from 54% to 92%, and appropriate action for children screening positive increased from 25% to 85%, compared with the results from the paper form of the M-CHAT.37 In addition, the smartphone application PEDS operated by community healthcare workers was shown to correspond closely with the gold standard paper-based PEDS tools operated by health professionals.38 Most smartphone screening applications likewise focus on questionnaires answered by parents or medical professionals.39 ASDTests is an application based on the Autism-Spectrum Quotient and the Quantitative Checklist for Autism in Toddlers that evaluates the possibility of having autistic traits.15 Cognoa is a mobile screening application that combines parental questionnaires with home video recording, and has a sensitivity of 75% and a specificity of 62%.13,39 These studies suggest that web-based or mobile-based screening tools could be reliably used for screening DD. Because such tools are quicker, cheaper, and more accessible, they could help improve the early identification of DD.
Some recent studies have evaluated DD using digital observational methods, analyzing gazes, faces, or behaviors. Eye-tracking algorithms have shown promise for screening ASD in rural areas.16,17 Vargas-Cuentas and colleagues16 recorded videos of participants watching social or non-social videos and analyzed the image frames from the recordings. Fujioka and colleagues17 used infrared light sources and cameras to record eye position. In one study from Bangladesh, a machine learning classifier trained on data from the ADOS and ADI-R was able to detect developmental delay and autism by analyzing the behavior portrayed in home videos, with a sensitivity and accuracy of 76%.40 Strobl and colleagues14 also developed a smartphone application in which the participants’ gaze was analyzed by an eye-tracking algorithm. These studies show that digital methods could be used for the screening of DD.
Our study showed that, among mobile-based methods, facial landmark analysis could play a significant role in the detection of DD. In previous studies examining head pose and facial expressions, Happy and Routray24 used the Facial Action Coding System (FACS), which classifies facial expressions using salient facial patches, and achieved an expression recognition accuracy of 94.14%. Their study differs from ours in that the FACS approach extracts a maximum of 19 facial patches, while our kit extracts 68 facial landmarks; in addition, they used facial expression databases comprising 329 images in total, while our study directly collected data from 89 children. Another study used a computer vision-based head tracking program (Zface) to demonstrate differences between typically developing children and children with ASD.41 Their results differ from ours in that they found differences in the speed and quantity of head movement in yaw and roll, but not in pitch. In another study, children with ASD and those with ADHD were differentiated with an accuracy of 94% using a Red-Green-Blue-Depth sensor from a depth measurement camera.42 That study is similar to ours in that it found differences in facial expressions using FACS, but it differs from our results in that it targeted adults aged 18 and older and found differences in head movement in yaw. While these studies relied on computer-based programs requiring special-purpose equipment, our study used a mobile-based application, which can be more convenient and easier to use. In one study, children watched movies on a smart tablet while the embedded camera recorded their facial expressions. Computer vision analysis then automatically tracked the facial landmarks and used them to classify the facial expressions into three types (Positive, Neutral, and Other) with a maximum sensitivity of 73%, with results varying by the type of movie shown; notably, children with ASD displayed neutral expressions more often than children without ASD.43 That study differs from ours in that we evaluated not only children with ASD but children with DD more broadly.
Based on our results, we cautiously suggest that facial landmarks and head poses may be used as screening tools for children with DD. A recent study that quantified head movement dynamics (displacement and velocity) showed that children with ASD had greater head movement dynamics than those without ASD.41 Several papers have hypothesized that turning away may be an adaptive strategy for individuals with ASD to regulate an overwhelming amount of information,44,45 which may explain the atypical head movement of individuals with ASD. Therefore, using facial landmarks as a method of screening could aid the early identification of children with DD.
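Head movement dynamics of the kind cited above can be quantified directly from a per-frame head-pose angle trace. The sketch below is a minimal illustration, assuming a constant frame rate and an already-extracted pitch series; the specific summary statistics are our illustrative choices, not the exact measures used in the cited study:

```python
import numpy as np

def head_movement_dynamics(angles, fps=30.0):
    """Summarize frame-to-frame head movement from an angle trace.

    angles: 1-D array of a head-pose angle (e.g. pitch) per frame, in degrees.
    fps: video frame rate, assumed constant.
    Returns total angular displacement (degrees) and mean angular speed (deg/s).
    """
    step = np.abs(np.diff(angles))   # per-frame angular displacement
    displacement = step.sum()        # total path length of the movement
    mean_speed = step.mean() * fps   # average angular speed
    return displacement, mean_speed

# A head that nods from 0 to 10 degrees and back over 21 frames:
trace = np.concatenate([np.linspace(0, 10, 11), np.linspace(9, 0, 10)])
disp, speed = head_movement_dynamics(trace, fps=30.0)
print(disp, speed)  # 20.0 30.0
```

A wider distribution of the raw pitch values, as we observed in children with DD, would typically be accompanied by larger values of such displacement and speed summaries.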
There are several limitations to this study. First, we were unable to find significant differences in facial landmarks or head pose between children shown social videos and those shown non-social videos. Second, our study did not analyze the results of the subgroups of DD (i.e., ASD, ID, LD). Third, because children with invalid data were excluded, the sample size was relatively small, limiting generalizability. Fourth, we do not know whether these findings are limited to certain age groups. Fifth, our study did not consider body motion information because the videos recorded only the children's faces.
Despite these caveats, our study evaluated the utility of digital methods, especially mobile-based methods, for the screening of DD in community-based preschool children. Our results provide preliminary evidence that a deep learning classification model using mobile-based children’s video data could be used for the early detection of children with DD.