Our study aims to improve diagnostic precision for Autism Spectrum Disorder (ASD) by applying advanced computational methods to Quantitative Checklist for Autism in Toddlers (QCHAT) datasets obtained from Kaggle. This section describes the experimental protocols in enough detail for reproducibility. It covers text data extraction and analysis, classification of ASD versus typically developing (TD) individuals, and topic modeling. As illustrated in Fig. 1, the core of our approach is a six-step workflow built on deep learning techniques.
Data Acquisition
The QCHAT is a key tool for the early identification of ASD. Its questions are designed to capture the broad range of ASD-related behaviours. Its structured format and emphasis on quantifiable behaviours make it well suited to machine learning, because patterns that may indicate ASD can be extracted and analysed directly from the responses. We selected QCHAT datasets from Kaggle for their relevance to ASD screening and their potential to improve diagnostic processes through deep learning. We used two Kaggle ASD QCHAT datasets: one covering toddlers (https://www.kaggle.com/datasets/fabdelja/autism-screening-for-toddlers) and one covering children (https://www.kaggle.com/datasets/uppulurimadhuri/dataset). Table 1 compares the demographic and clinical characteristics of the toddler ASD group (n=728) and TD group (n=326). Table 2 provides the same comparison for children, between the ASD group (n=1,074) and the TD group (n=911). Together, these tables describe the composition of the data and the populations it represents.
Table 1. QCHAT Data1 (toddlers). Comparison of the Autism Spectrum Disorder (ASD) group (n=728) and the Typically Developing (TD) group (n=326) across demographic and clinical variables: age in months, sex, ethnicity (nine categories), and family history of ASD. Age is grouped by developmental stage: 12-18 months (early), 19-24 months (middle), 25-30 months (late), and 31-36 months (transition). Data are presented as counts (n) and percentages within each group (%). Chi-square tests compared the distribution of each variable between the ASD and TD groups. Age, sex, and ethnicity differ significantly between the groups (p < 0.001), which is strong evidence against the null hypothesis of no difference. For family history of ASD, p = 0.728 exceeds the alpha threshold of 0.05, indicating no significant difference between the groups.
Variable | ASD Group (n = 728) | TD Group (n = 326) | p-value
Age (months), n (%) | | |
12-18 | 99 (13.60) | 77 (23.62) | < 0.001
19-24 | 137 (18.82) | 43 (13.19) |
25-30 | 164 (22.53) | 54 (16.56) |
31-36 | 328 (45.05) | 152 (46.63) |
Sex, n (%) | | |
Male | 534 (73.35) | 201 (61.66) | < 0.001
Female | 194 (26.65) | 125 (38.34) |
Ethnicity, n (%) | | |
African | 39 (5.36) | 14 (4.29) | < 0.001
Hispanic | 30 (4.12) | 10 (3.07) |
Latino | 20 (2.75) | 6 (1.84) |
Middle Eastern | 96 (13.19) | 92 (28.22) |
Other Asians | 212 (29.12) | 87 (26.69) |
Others | 34 (4.67) | 9 (2.76) |
Pacifica | 7 (0.96) | 1 (0.31) |
South Asian | 40 (5.49) | 23 (7.06) |
White European | 250 (34.34) | 84 (25.77) |
Family History with ASD, n (%) | | |
No | 613 (84.20) | 271 (83.13) | 0.728
Yes | 115 (15.80) | 55 (16.87) |
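The 2x2 comparisons in Table 1 can be re-derived from the reported counts. The sketch below assumes a Pearson chi-square test with Yates' continuity correction. This is the default that SciPy's `scipy.stats.chi2_contingency` applies to 2x2 tables, but the analysis code itself is not shown, so the correction is an assumption. With one degree of freedom, the p-value has the closed form erfc(sqrt(chi2 / 2)), so the standard library is enough. Under this assumption, the family-history p-value comes out at about 0.728, matching the table. The function name `yates_chi2_2x2` is illustrative.

```python
import math


def yates_chi2_2x2(table):
    """Pearson chi-square test with Yates' continuity correction for a 2x2 table.

    `table` is [[a, b], [c, d]]; rows are categories, columns are the ASD and TD groups.
    Returns (statistic, p_value) for one degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Yates' correction shrinks each |observed - expected| by 0.5 (never below 0).
            diff = max(abs(table[i][j] - expected) - 0.5, 0.0)
            stat += diff * diff / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts from Table 1; columns are (ASD, TD).
family_history = [[613, 271],   # No
                  [115, 55]]    # Yes
sex = [[534, 201],              # Male
       [194, 125]]              # Female

for name, counts in [("Family history", family_history), ("Sex", sex)]:
    stat, p = yates_chi2_2x2(counts)
    print(f"{name}: chi2 = {stat:.3f}, p = {p:.4g}")
```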
Table 2. QCHAT Data2 (children). Comparison of the Autism Spectrum Disorder (ASD) group (n=1,074) and the Typically Developing (TD) group (n=911) across demographic and clinical variables. The variables are age in years, sex, ethnicity (nine categories), and family history of ASD. Age ranges from 0 to 18 and is grouped into five developmental stages: 0-2 years (infancy), 2-5 years (early childhood), 6-8 years (middle childhood), 9-12 years (late childhood), and 13-18 years (adolescence). Data are presented as counts (n) and percentages within each group (%). Chi-square tests compared the distribution of each variable between the ASD and TD groups, with statistical significance set at alpha = 0.05. The age distribution does not differ significantly between the groups (p = 0.076), so there is no strong evidence against the null hypothesis for age. Sex, ethnicity, and family history of ASD differ significantly (p < 0.001), which is strong evidence against the null hypothesis of no difference for these variables.
Variable | ASD Group (n = 1,074) | TD Group (n = 911) | p-value
Age (years), n (%) | | |
0-2 | 20 (1.86) | 28 (3.07) | 0.076
2-5 | 224 (20.86) | 152 (16.68) |
6-8 | 275 (25.61) | 236 (25.91) |
9-12 | 147 (13.69) | 124 (13.61) |
13-18 | 408 (37.99) | 371 (40.72) |
Sex, n (%) | | |
Male | 963 (89.66) | 484 (53.13) | < 0.001
Female | 111 (10.34) | 427 (46.87) |
Ethnicity, n (%) | | |
African | 39 (3.63) | 14 (1.54) | < 0.001
Hispanic | 30 (2.79) | 10 (1.10) |
Latino | 20 (1.86) | 6 (0.66) |
Middle Eastern | 96 (8.94) | 307 (33.70) |
Other Asians | 343 (31.94) | 262 (28.76) |
Others | 34 (3.17) | 9 (0.99) |
Pacifica | 7 (0.65) | 1 (0.11) |
South Asian | 40 (3.72) | 218 (23.93) |
White European | 465 (43.30) | 84 (9.22) |
Family History with ASD, n (%) | | |
No | 590 (54.93) | 740 (81.23) | < 0.001
Yes | 484 (45.07) | 171 (18.77) |
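Variables with more than two categories, such as the five age bands in Table 2, need the general r x c Pearson statistic and the chi-square survival function. The sketch below computes both with the standard library. It evaluates the survival function through the series expansion of the regularized lower incomplete gamma function. It assumes no continuity correction, which matches SciPy's behaviour for tables with more than one degree of freedom. Function names are illustrative. With the Table 2 age counts, it gives a p-value of about 0.076 with 4 degrees of freedom, in line with the reported value.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with `df` degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(df/2, x/2);
    adequate for the moderate statistics used here.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pearson_chi2(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Age bands from Table 2; columns are (ASD, TD).
age_years = [
    [20, 28],     # 0-2
    [224, 152],   # 2-5
    [275, 236],   # 6-8
    [147, 124],   # 9-12
    [408, 371],   # 13-18
]
stat, df, p = pearson_chi2(age_years)
print(f"Age: chi2 = {stat:.3f}, df = {df}, p = {p:.3f}")
```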
Text Data Extraction and Analysis
- Sentence Transformer Mapping: We used sentence-transformer models (e.g., 'all-MiniLM-L6-v2') to embed each QCHAT questionnaire item and each candidate ASD-specific term. For each item, we ranked the candidate terms by the cosine similarity of their embeddings. The candidate set comprised the 3,336 ASD-related terms identified by Zhao et al. (2022) [4], which grounds the mapping in detailed ontological knowledge of ASD.
- Expert Review and Selection: For each questionnaire item, an ASD clinical expert reviewed the top-ranked term mappings (those with the highest cosine similarity) and selected the most accurate term. This review improved the quality of the dataset for subsequent analysis. Supplementary Table S1 lists the predefined ASD terms mapped to the QCHAT items.
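The mapping step amounts to a nearest-neighbour search in embedding space. The sketch below ranks candidate ASD terms for each questionnaire item by cosine similarity and keeps the top k for expert review. To stay self-contained, it uses small hand-made vectors. In the actual pipeline, the vectors would come from a sentence-transformer model such as 'all-MiniLM-L6-v2', for example via `SentenceTransformer('all-MiniLM-L6-v2').encode(texts)` from the `sentence-transformers` package. The item wordings, term names, vectors, and the helper `top_k_terms` are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_terms(item_vectors, term_vectors, k=3):
    """Map each questionnaire item to its k most similar candidate terms.

    Both arguments are dicts of name -> embedding vector. Returns
    item -> [(term, similarity), ...] sorted by decreasing similarity.
    """
    mapping = {}
    for item, item_vec in item_vectors.items():
        scored = [(term, cosine_similarity(item_vec, term_vec))
                  for term, term_vec in term_vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        mapping[item] = scored[:k]
    return mapping


# Hypothetical toy embeddings standing in for sentence-transformer output.
items = {
    "looks at you when called by name": [0.9, 0.1, 0.0],
    "tries to comfort someone who is upset": [0.1, 0.2, 0.95],
}
terms = {
    "poor eye contact": [0.85, 0.2, 0.05],
    "lack of response to name": [0.95, 0.05, 0.0],
    "impaired empathy": [0.05, 0.1, 0.9],
    "repetitive behaviour": [0.1, 0.9, 0.1],
}

for item, candidates in top_k_terms(items, terms, k=2).items():
    print(item)
    for term, score in candidates:
        print(f"  {term}: {score:.3f}")
```

Only the ranked shortlist is produced automatically. The final choice per item remains with the clinical reviewer, as described above.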
ASD vs. TD Classification
- Fine-Tuning Process: We fine-tuned RoBERTa models on the Kaggle QCHAT toddler dataset, adjusting the learning rate, batch size, and number of training epochs to optimize performance on ASD-related language patterns. Fine-tuning adapted the pretrained models to ASD diagnostic language and questionnaire responses, so that they recognize ASD-related patterns in toddler evaluation data. The model was initialized for sequence classification with two classes, and environment variables controlled optional GPU usage. Training used 5 epochs, a batch size of 16 for training and 8 for evaluation, and a learning rate of 2e-5. We chose these values because they yielded the highest classification accuracy in preliminary experiments.
- Transfer Learning: We used a transfer learning approach to carry the knowledge acquired during pretraining over to the second ASD-related dataset. A pretrained RoBERTa model was fine-tuned on the second Kaggle dataset. We selected the following hyperparameters to help the model adapt to the target dataset while mitigating overfitting: batch size 8 (training and evaluation), learning rate 3e-5, and weight decay 1e-8 for regularization.
- Model Application: We then applied the fine-tuned RoBERTa models to the second Kaggle QCHAT dataset (children) to classify ASD versus typically developing (TD) cases. This refinement was intended to improve the models' ability to discern subtle differences in ASD-related responses, and hence their classification performance on the new dataset.
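The two training configurations above can be recorded side by side, together with the number of optimizer steps they imply for the training splits in Table 3: 589 toddler records for RoBERTa and 1,111 children records for Model 1. The sketch assumes a Hugging Face `Trainer`-style setup, with dictionary keys named after `transformers.TrainingArguments` parameters. The text does not state which training framework was used. The number of epochs for the transfer-learning run and the weight decay for the toddler run are not reported, so they are left unset. The `FineTuneConfig` class and its methods are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class FineTuneConfig:
    """Hyperparameters reported for one RoBERTa training run (None = not reported)."""
    learning_rate: float
    train_batch_size: int
    eval_batch_size: int
    num_train_epochs: Optional[int] = None
    weight_decay: Optional[float] = None

    def training_arguments_kwargs(self):
        """Keyword arguments named after transformers.TrainingArguments fields."""
        kwargs = {
            "learning_rate": self.learning_rate,
            "per_device_train_batch_size": self.train_batch_size,
            "per_device_eval_batch_size": self.eval_batch_size,
        }
        if self.num_train_epochs is not None:
            kwargs["num_train_epochs"] = self.num_train_epochs
        if self.weight_decay is not None:
            kwargs["weight_decay"] = self.weight_decay
        return kwargs

    def steps_per_epoch(self, n_train):
        """Optimizer steps per epoch on one device, keeping the last partial batch."""
        return math.ceil(n_train / self.train_batch_size)


# Fine-tuning on Data1 (toddlers): 5 epochs, batch 16 / 8, learning rate 2e-5.
toddler_cfg = FineTuneConfig(learning_rate=2e-5, train_batch_size=16,
                             eval_batch_size=8, num_train_epochs=5)
# Transfer learning on Data2 (children): batch 8 / 8, learning rate 3e-5, weight decay 1e-8.
children_cfg = FineTuneConfig(learning_rate=3e-5, train_batch_size=8,
                              eval_batch_size=8, weight_decay=1e-8)

print(toddler_cfg.steps_per_epoch(589) * toddler_cfg.num_train_epochs)  # -> 185
print(children_cfg.steps_per_epoch(1111))                               # -> 139
print(children_cfg.training_arguments_kwargs())
```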
Topic Modeling
We chose Latent Dirichlet Allocation (LDA) for topic modeling because it is effective at uncovering hidden thematic structure in large text corpora. To reflect the complexity of ASD behavioural patterns in the dataset, we tuned the LDA hyperparameters using a combination of grid search and expert judgment. The goal was coherent, interpretable topics that offer insight into the behavioural dimensions of ASD. A log-perplexity score of -3.46 and a coherence score of 0.79 guided the selection of 5 as the optimal number of topics.
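A perplexity is always at least 1, so the reported value of -3.46 is most plausibly a per-word log-likelihood bound. gensim's `LdaModel.log_perplexity`, for example, returns such a bound and logs a perplexity estimate of 2 raised to the negative bound. The text does not name the topic-modeling library, so this is an assumption. Under it, the reported value corresponds to a perplexity of about 11. The helper name is hypothetical.

```python
def perplexity_from_log_bound(per_word_bound):
    """Convert a per-word log-likelihood bound into a perplexity estimate.

    Assumes the convention gensim uses when logging LdaModel.log_perplexity:
    perplexity = 2 ** (-bound).
    """
    return 2.0 ** (-per_word_bound)


reported_bound = -3.46   # value reported for the 5-topic LDA model
print(f"perplexity ~ {perplexity_from_log_bound(reported_bound):.1f}")
```

Lower perplexity and higher coherence both favour a model. Because the two criteria can disagree, the choice of 5 topics also relied on expert judgment.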
Model Performance Evaluation
- Performance Measure: We selected evaluation metrics that capture the multifaceted performance of ASD diagnostic models. The area under the receiver operating characteristic curve (AUROC) assessed the model's discrimination ability across all decision thresholds. The confusion matrix, F1 score, precision, and recall (sensitivity) quantified performance at a fixed threshold, and accuracy and specificity can also be derived from the confusion matrix. Together, these metrics provide a robust evaluation framework.
- Model Validation Techniques: To validate our models, we split each dataset into training, validation, and test sets (Table 3). The training set was used to fit the model weights. The validation set was used to tune hyperparameters and guard against overfitting. The test set served as the final benchmark of performance on unseen data. This three-way split provides a rigorous validation process, supporting both the credibility of our findings and the model's practical diagnostic potential.
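All of the metrics listed above can be computed from true labels, predicted labels, and predicted scores. The sketch below derives precision, recall (sensitivity), specificity, F1, and accuracy from the confusion matrix. It computes AUROC through its rank interpretation: the probability that a randomly chosen ASD case receives a higher score than a randomly chosen TD case, with ties counted as one half. It uses only the standard library. In practice, scikit-learn's `confusion_matrix`, `f1_score`, and `roc_auc_score` provide the same quantities. Labels follow Table 3 (ASD = 1, TD = 0). The example labels and scores are made up.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) with ASD = 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_pred):
    """Metrics at a fixed decision threshold, derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "accuracy": accuracy}


def auroc(y_true, scores):
    """AUROC as P(score of an ASD case > score of a TD case); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            wins += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
    return wins / (len(pos) * len(neg))


# Made-up example: true labels and predicted ASD probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.92, 0.81, 0.40, 0.77, 0.30, 0.65, 0.10, 0.58, 0.45, 0.88]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(threshold_metrics(y_true, y_pred))
print(f"AUROC = {auroc(y_true, scores):.3f}")
```

The pairwise AUROC loop is quadratic in the number of cases. That cost is acceptable for test sets of a few hundred records, but a rank-based formula scales better.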
All analyses were performed in Python 3.9. Partial results of the pipeline are available in a public repository (https://github.com/skwgbobf/ASD_Kaggle.git).
Table 3. Distribution of ASD and TD cases across the test, training, and validation sets used for ASD versus Typically Developing (TD) classification. BERT and RoBERTa were fine-tuned on the toddler data (Data1), and Model 1 was applied through transfer learning to the children data (Data2). Data are presented as counts (n) and percentages (%), with percentages computed within each split.
Condition | ASD (1) Group | TD (0) Group
ASD vs. TD Classification using BERT fine-tuned on Data1 (Toddler Data), n (%) | |
Test Data | 70 (66.04) | 36 (33.96)
Train Data | 579 (68.68) | 264 (31.32)
Validation Data | 79 (75.24) | 26 (24.76)
ASD vs. TD Classification using RoBERTa fine-tuned on Data1 (Toddler Data), n (%) | |
Test Data | 219 (69.09) | 98 (30.91)
Train Data | 409 (69.44) | 180 (30.56)
Validation Data | 100 (67.57) | 48 (32.43)
ASD vs. TD Classification using Model 1, fine-tuned and applied as transfer learning on Data2 (Children Data), n (%) | |
Test Data | 322 (54.03) | 274 (45.97)
Train Data | 607 (54.64) | 504 (45.36)
Validation Data | 145 (52.16) | 133 (47.84)
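The Table 3 split sizes for RoBERTa (317 test, 589 train, 148 validation out of 1,054 toddlers) and Model 1 (596 test, 1,111 train, 278 validation out of 1,985 children) follow a two-stage pattern. First, 30% of the records are held out for testing. Then 20% of the remainder is held out for validation. Held-out sizes are rounded up, as scikit-learn's `train_test_split` does for a fractional `test_size`. This is an inference from the counts, not a documented setting. The sketch below reproduces those sizes with a seeded random split and tabulates the ASD/TD distribution within each split in the n (%) format of Table 3. The function names and seed are hypothetical, so the per-split class counts will not match the table exactly.

```python
import math
import random


def two_stage_split(records, test_frac=0.3, val_frac=0.2, seed=42):
    """Shuffle, hold out ceil(test_frac * n) for test, then ceil(val_frac * rest) for validation."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = math.ceil(test_frac * len(shuffled))
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = math.ceil(val_frac * len(rest))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


def class_distribution(labels):
    """Format ASD (1) and TD (0) counts as 'n (%)', with percentages within the split."""
    total = len(labels)
    asd = sum(1 for y in labels if y == 1)
    td = total - asd
    return f"{asd} ({100 * asd / total:.2f})", f"{td} ({100 * td / total:.2f})"


# Children data (Data2): 1,074 ASD and 911 TD labels.
labels = [1] * 1074 + [0] * 911
train, val, test = two_stage_split(labels)
print(len(train), len(val), len(test))       # -> 1111 278 596
for name, split in [("Train", train), ("Validation", val), ("Test", test)]:
    print(name, *class_distribution(split))
```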