In this machine learning study, we differentiated help-seeking children and adolescents with ADHD from those without the condition with an accuracy of 66.1%, using real-life clinical data from hospital records. Excluding demographic features (age and gender) resulted in comparable accuracy. Automated feature selection achieved the best performance using a combination of the 19 most predictive features across attention and intelligence domains and symptom ratings. The accuracy might be further increased using datasets without missing data. Features describing the consistency of parent and teacher ratings ('consistency index') did not outperform conventional features. Our study suggests that ADHD can be identified using data from clinical records even in a mixed, help-seeking population of children and adolescents.
Machine learning studies require large amounts of data31, which may be challenging to collect by recruiting participants for a specific study but are readily available in clinical databases. Moreover, results from experimental studies might not generalize to a clinical setting, where clinicians are commonly confronted with multiple/concurrent disorders and/or various potential differential diagnoses. Thus, we showed that an SVM, in combination with real-life, comprehensive clinical data, can yield above-chance classification accuracy and detect individuals with ADHD among those with no or different condition(s).
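The general shape of such a pipeline (an SVM combined with automated feature selection and cross-validated accuracy estimation) could be sketched as follows. This is a minimal illustration on synthetic data, not the study's actual configuration: the sample size, feature count, linear kernel, recursive feature elimination, and five-fold cross-validation are all assumptions made for the example.

```python
# Illustrative sketch only: SVM classification with automated feature
# selection on synthetic data. All sizes and hyperparameters are
# assumptions, not the study's actual setup.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n, p = 120, 30                       # hypothetical sample and feature counts
X = rng.normal(size=(n, p))
# Make the first 5 features informative for the (binary) diagnostic label.
y = (X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=n) > 0).astype(int)

# Recursive feature elimination with cross-validation picks a feature subset.
selector = RFECV(SVC(kernel="linear"), step=1,
                 cv=StratifiedKFold(5), scoring="accuracy")
selector.fit(X, y)

# Cross-validated accuracy of an SVM restricted to the selected features.
acc = cross_val_score(SVC(kernel="linear"),
                      X[:, selector.support_], y, cv=5).mean()
print(f"selected {selector.n_features_} features, CV accuracy {acc:.3f}")
```

Because the synthetic labels depend on only a few features, the selector typically retains a small subset and the cross-validated accuracy stays well above chance; with real clinical data, class overlap would lower both numbers.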
To our knowledge, the highest accuracy achieved in studies of ADHD patients and healthy individuals was 89.5%8. Although we used more features than this study, the resulting accuracy was lower. This might be because many help-seeking individuals in our sample received other diagnoses associated with symptoms that may mimic ADHD (such as attention deficits in depression, increased activity in tic disorders, etc.). Thus, the two groups (ADHD vs. "something else") are not as clearly differentiated from each other as would be the case when distinguishing between individuals with confirmed ADHD diagnoses and those not showing any symptoms at all. Previous studies aiming to distinguish more than one disorder from typically developing controls reported lower classification accuracies than studies aiming to classify typically developing individuals and patients with one condition36.
Age and gender were shown to be useful for diagnostic and prognostic tools based on machine learning in previous studies37,38. This was also the case in the current study. In this study, instead of identifying physiological patterns typical of ADHD, we aimed to train a classifier to identify ADHD based on data available from medical records. As typical age and gender distributions of ADHD may naturally be reflected in this data structure, potentially constituting a sampling bias, conducting a second analysis without these features was essential. This analysis without age and gender still revealed a significant classification accuracy, demonstrating that the neuropsychological features and ADHD-specific ratings on their own are sufficient to identify ADHD in a mixed patient sample.
Previous studies have opted not to include clinical ratings in the analysis to avoid possible subjective biases8. We addressed this issue by using the consistency index described above, which did not outperform conventional ADHD-specific features like parent/teacher-rated symptoms. The automatic feature selection also emphasized only rather unspecific symptoms such as peer relations, aggression, and teacher negative impression bias. These results suggest that clinical ratings capturing broader ADHD-related behavioral irregularities (i.e., not simply pertaining to ADHD core symptoms) as reported by different sources using the Conners-3 questionnaire are informative when aiming to identify ADHD amongst a help-seeking clinical population. This may reflect the notion that the rather qualitative "clinical impression" of ADHD plays a significant role in the diagnostic process39. Similarly, this may also be interpreted as showing that a rather broad functional impairment associated with ADHD symptoms (with regard to social interactions, for example) is indicative of diagnostic classification in the clinical setting. This issue could be examined further by including clinician rating scales or those capturing the degree of functional impairment40.
Among the neuropsychological measures, the total IQ score did not rank among the most predictive features. Previous machine learning studies suggesting IQ to be a predictive feature5,6 included IQ scores as part of an overall "phenotypic" feature set that also contained aspects like age and gender, making a specific interpretation impossible. In addition, these studies only focused on the distinction between individuals with ADHD and typically developing controls, thus reducing the validity of the results for clinical practice, where the goal is to distinguish ADHD from disorders or norm variants of behavior that mimic ADHD symptoms. Interestingly, the processing speed subscale ranked highest of all IQ-related features in the single-feature classification. This may reflect the previously reported relevance of this aspect of neuropsychological processing16 when comparing individuals with ADHD and healthy controls. Within the automatic feature selection, reaction time variability and accuracy in tests capturing tonic/phasic alertness and inhibition ranked numerically higher than mean reaction times. While a previous study suggested that objective neuropsychological measures performed considerably worse than rating scales in distinguishing ADHD from healthy participants8, our results show that these scores do contribute to classification when identifying individuals with ADHD in a mixed help-seeking population. This supports the notion that objective measures like those employed in the current context are indeed important elements of the diagnostic process of ADHD, as has been suggested previously41.
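The single-feature classification mentioned above (ranking each feature by how well it alone separates the groups) can be illustrated with a minimal threshold-rule sketch. The data below are synthetic and the choice of a simple threshold classifier is an assumption for illustration; the study's classifier was an SVM.

```python
# Illustrative sketch only: rank features by the accuracy a single
# feature achieves on its own, using a simple threshold rule on
# synthetic data. Feature roles and noise levels are assumptions.
import random

random.seed(1)
n = 200
labels = [random.randint(0, 1) for _ in range(n)]
features = [
    [l + random.gauss(0, 0.8) for l in labels],  # informative feature
    [random.gauss(0, 1) for _ in range(n)],      # pure noise
    [random.gauss(0, 1) for _ in range(n)],      # pure noise
]

def threshold_accuracy(values, labels):
    """Best accuracy of the rule 'predict 1 if value > t', searching
    thresholds over the observed values (flipped rule allowed)."""
    best = 0.0
    for t in values:
        acc = sum((v > t) == bool(l) for v, l in zip(values, labels)) / len(labels)
        best = max(best, acc, 1 - acc)
    return best

# Rank feature indices by their stand-alone classification accuracy.
ranking = sorted(range(len(features)),
                 key=lambda i: -threshold_accuracy(features[i], labels))
print("feature ranking:", ranking)
```

As expected, the informative feature ranks first, while the noise features hover near chance; in the study, differences between features were small relative to the variability of the accuracy estimates, which is why such rankings must be interpreted cautiously.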
This study has the following limitations. First, we could not include broader clinical measures such as the CBCL as possible features due to too many missing values. These measures might have provided more specific information on differences between diagnostic entities. Similarly, father ratings also needed to be excluded due to missing data (although father ratings were included in the consistency index where possible). Retraining the classifier without missing data achieved a further increase in classification accuracy. This suggests that an effort to simplify the diagnostic process, thereby reducing the probability of missing data, might increase the performance of automated classifiers. Second, although we tested generalizability indirectly using the permutation test, an independent validation sample would provide more precise information on the generalizability of our classifier. Third, the relative importance of single features needs to be interpreted carefully, considering the small differences in classification accuracy between features and the relatively high standard deviations of the achieved accuracies. Although our results suggest that some features might be superior to others, we cannot conclude that there are single outstanding features in our sample that distinguish individuals with a definite ADHD diagnosis from those with another or no psychiatric diagnosis. Overall, a further increase in classification performance might be achieved by using larger samples with more complete data on all clinically relevant features rather than by adding new ones. Our results do not provide a sufficient basis for excluding and/or prioritizing specific clinical ratings in future studies.
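The label-permutation test referred to above can be sketched in a few lines: labels are shuffled many times, the accuracy is recomputed for each shuffle, and the p-value is the fraction of permutations reaching at least the observed accuracy. The predictions below are simulated, and the 10% error rate is an arbitrary assumption for the example.

```python
# Illustrative sketch only: permutation test for classification
# accuracy. Predictions are simulated; the flip rate is an assumption.
import random

random.seed(0)
n = 100
true = [random.randint(0, 1) for _ in range(n)]
# Simulated classifier predictions: correct except for 10 flipped cases.
preds = list(true)
for i in random.sample(range(n), 10):
    preds[i] = 1 - preds[i]

def accuracy(pred, lab):
    return sum(p == l for p, l in zip(pred, lab)) / len(lab)

observed = accuracy(preds, true)  # 0.9 by construction

def permutation_p_value(preds, labels, observed, n_perm=1000):
    """Fraction of label shuffles whose accuracy reaches the observed
    value (with the standard +1 correction to avoid a p-value of 0)."""
    count = 0
    lab = list(labels)
    for _ in range(n_perm):
        random.shuffle(lab)
        if accuracy(preds, lab) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

p = permutation_p_value(preds, true, observed)
print(f"observed accuracy {observed:.2f}, permutation p = {p:.4f}")
```

Because shuffled labels destroy any real association, permuted accuracies cluster around chance level, and an observed accuracy well above chance yields a small p-value. As noted above, however, such a test assesses only whether the classifier exceeds chance on the present sample; an independent validation sample would be needed to assess generalizability directly.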