Computer-aided diagnosis of psychiatric distress in children and adolescents using deep interaction networks: the CASPIAN-IV Study

Early diagnosis of psychiatric disorders among children can reduce the risk of adverse psychosocial outcomes in adulthood. We aimed to design a computer-aided screening tool to examine the association between modifiable risk factors and psychiatric disorders in a developing country. Ten thousand three hundred fifty students, aged 6–18 years, from all provinces of Iran participated in this study. We used feature discretization and encoding, stability selection, and the regularized group method of data handling (GMDH) to classify the comprehensive risk factors for depression (prevalence of 20.1%), worriedness (23.7%), and emotional problems (11.1%). Self-rated health was the most important feature. The selected modifiable factors were eating breakfast, screen time, and salty snack consumption for depression; physical activity and salty snack consumption for worriedness; and (abdominal) obesity, sweetened beverage consumption, and sleep hours for emotional problems. The area under the ROC curve (AUC) of the GMDH was 0.88 [95% CI: 0.87-0.89], 0.79 [0.77-0.80], and 0.70 [0.68-0.72] for the depression, worriedness, and emotional problem outcomes, respectively. The GMDH provided a deep interaction network that introduced important features that univariate modeling had not identified. It significantly outperformed the state-of-the-art (adjusted p<0.05; McNemar's test). It is thus a promising new psychiatric screening tool for children and adolescents.


Mental health and illness are a public health concern 1 . Nowadays, the prevalence of non-communicable diseases (NCDs), such as mental distress, is rapidly increasing, and the prevention of their associated

Psychiatric distress is the most common mental health issue, affecting many children, and is considered one of the leading causes of the global burden of disease 5 . The National Comprehensive Cancer Network (NCCN) defines distress as an unpleasant emotional experience of a psychological problem, such as depression, worriedness, and panic 6 . If children's psychological distress remains untreated, their development is significantly affected.

Children worldwide are affected by mental disorders similar to those of adults 7 . Psychiatric distress in childhood is related to an increased risk of harmful events, including drug addiction and poor educational performance 8 . Mental disorders are common in the Eastern Mediterranean Region (EMR), including Iran and neighboring countries, and are the leading cause of years lived with disability (YLDs). In the EMR, depression accounted for the most Disability-Adjusted Life Years (DALYs), and worriedness ranked second in 2013 9 . In summary, depression and worriedness, two essential components of psychiatric distress, are among the leading causes of illness and disability in adolescents 10 .

Many studies have attempted to investigate the association between several factors and psychiatric distress among children. Risk factors associated with such psychiatric distress appear to be modifiable, at least partly, through the link between these characteristics and lifestyle factors. In general, the literature review shows that the spread of psychiatric distress changes depending on various factors such as gender,

Few studies have used data mining for the diagnosis of psychiatric disorders, and to the best of our knowledge, none of them comprehensively considered various determinants and their interactions 3,12,13 . Thus, this study aims to classify the risk factors associated with psychiatric distress based on the demographic, lifestyle, socioeconomic status, and family history of diseases in a large sample of children and adolescents. The Group Method of Data Handling (GMDH), proposed by

enrolled subjects) were analyzed. Overall, the percentages of the 6-10, 11-14, and 15-19-year-old age groups were 33.7, 35.0, and 31.3, respectively, and 50.1% of the population was boys. The prevalence of worriedness, emotional problems, and depression was 23.7%, 11.1%, and 20.1%, respectively. Specifically, the prevalence of worriedness was 20.6% in boys and 26.8% in girls. Emotional problems affected 8.9% of boys and 13.2% of girls, while 18.5% of boys and 21.7% of girls experienced depression. The distribution of demographic variables, family history of diseases, and lifestyle factors is presented for the different psychiatric groups (Tables 1 and 2).

The pairwise association between the input features and the outcome variables is shown in Figure 1.

Tables 1 and 2 about here.

relationship between poor SRH, depressive and anxiety symptoms among university students with high academic stress 18 .

Moreover, a longitudinal study of adolescent health in the United States demonstrated that one of the main factors associated with persistent depressive symptoms was poor self-rated general health 19 .

Although there was no significant correlation between SRH and depression in our dataset (rank-biserial r_rb=0.002; p=0.819) (Figure 1), its interactions with screen time, diet, and breakfast were selected by the GMDH network (Figure 2). This is how the interaction network identifies hidden factors that the univariate analysis could not identify. The same held for emotional problems: there was no significant correlation between SRH and emotional problems in our dataset (rank-biserial r_rb=0.008; p=0.390) (Figure 1), yet its interactions with milk type, abdominal obesity, and beverage consumption were selected by the GMDH network (Figure 3).
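The point that a feature can show almost no marginal association with an outcome and still matter through an interaction can be illustrated with a small sketch. The data below are synthetic and purely hypothetical (they are not the CASPIAN-IV data), and the rank-biserial correlation is computed with the simple pairwise-difference formula, which is assumed here to match the statistic reported in Figure 1.

```python
import itertools

def rank_biserial(group1, group0):
    """Rank-biserial correlation: (favorable pairs - unfavorable pairs) / all pairs."""
    pairs = list(itertools.product(group1, group0))
    wins = sum(a > b for a, b in pairs)
    losses = sum(a < b for a, b in pairs)
    return (wins - losses) / len(pairs)

# Synthetic toy data (hypothetical): the outcome is 1 only when the two
# binary features differ, so neither feature is informative on its own.
rows = [(x1, x2, int(x1 != x2)) for x1 in (0, 1) for x2 in (0, 1)] * 25

def r_for(feature):
    """Rank-biserial correlation between a derived feature and the outcome."""
    pos = [feature(r) for r in rows if r[2] == 1]
    neg = [feature(r) for r in rows if r[2] == 0]
    return rank_biserial(pos, neg)

r_x1 = r_for(lambda r: r[0])             # marginal association of x1
r_x2 = r_for(lambda r: r[1])             # marginal association of x2
r_inter = r_for(lambda r: r[0] * r[1])   # association of the product term
```

In this toy setting both marginal correlations vanish while the product term is clearly associated with the outcome, which is the kind of structure a pairwise polynomial neuron can pick up.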

One of the most important risk factors for children, the physical activity level, was selected for those suffering from worriedness in our study (Figure 4). The beneficial effects of regular physical activity on health are indisputable in modern medicine 20 . Furthermore, a large amount of exercise plays an essential role in minimizing worry in clinical settings 21 . Although the correlation between physical activity and worriedness was very low in our study (rank-biserial r_rb=-0.087; p<0.001) (Figure 1), its interactions with the other features were selected by the GMDH network (Figure 4).

In our study, sleep hour was selected as one of the factors for emotional problems (Figure 3). This is in agreement with previous studies 22 . Adverse general health outcomes are associated with indicators of sleep problems, such as short sleep duration. Another study on 11,788 pupils from 11 different European countries showed a negative association between hours of sleep per night and emotional problems 23 . Although the correlation between sleep hour and emotional problems was very low in our study (rank-biserial r_rb=-0.080; p<0.001) (Figure 1), its interaction with the milk type during infancy was selected by the GMDH network (Figure 4). The literature has shown a relationship between breastfeeding and sleep quality in infants 24 . Also, breastfeeding is related to behavior disorders in children and adolescents 25 . However, there was no significant correlation between sleep hour and milk type in our study (rank-biserial r_rb=0.005; p=0.634) (Figure 1).

One of the notable factors in our research was screen time (Figure 2). Some studies showed that children who watch TV for more than two hours a day usually have lower self-esteem, lower school performance, and unhealthy eating habits 26 . Such consequences would lead to psychological distress in young children 27 . The consequences reported by these articles are in agreement with our finding of a positive association between screen time and depression. Although the correlation between screen time and depression was very low in our study (rank-biserial r_rb=0.056; p<0.001) (Figure 1), its interactions with SRH and breakfast were selected by the GMDH network (Figure 4).

Salty snack consumption was a predictor of depression and worriedness (Figures 2 and 4). In general, there were few studies in this field 28 . It has also been demonstrated that 12- to 13-year-old Norwegian adolescents with healthy dietary patterns have better mental health 29 . Although the correlation between salty snack consumption and depression or worriedness was very low in our study (depression: rank-biserial r_rb=0.030; p=0.002, worriedness: rank-biserial r_rb=0.044; p<0.001) (Figure 1), its interactions with SRH and breakfast were selected for depression, and its interactions with age category, SES category, and physical activity were selected for worriedness (Figures 2 and 4).

Breakfast is one of the most important meals. The prevalence of breakfast skipping is increasing among adolescents. Previous studies showed that breakfast intake is related to mental problems 30 . Another study showed that skipping breakfast at least four times a week was significantly associated with a higher depressed mood score 31 . Our finding is consistent with such results on the association of breakfast consumption with depression (Figure 2). Although the correlation between breakfast consumption and depression was relatively low in our study (rank-biserial r_rb=0.107; p<0.001) (Figure 1), its interactions with the other features were selected (Figure 2).

In our study, the GMDH network was used as a classifier. This network is incremental and expands with regularized least squares (RLS), a convex algorithm, but it also generates the interaction

classifications. The false alarm (FA) rate of the proposed system ranged from 3% to 31% when classifying depression and emotional problems. However, the statistical power of the proposed system was always higher than 70% for all outcomes. Moreover, the false discovery rate (i.e., 1-precision) ranged from 13% to 78% for classifying depression and emotional problems.
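The performance indices named above can be derived from a 2x2 confusion matrix. The following is a minimal sketch, assuming that statistical power is interpreted as sensitivity and the false alarm rate as 1-specificity; the function name and the counts are hypothetical and are not results from the study.

```python
import math

def screening_metrics(tp, fp, tn, fn):
    """Return false alarm rate, power, false discovery rate, and MCC."""
    false_alarm = fp / (fp + tn)   # 1 - specificity
    power = tp / (tp + fn)         # sensitivity
    fdr = fp / (fp + tp)           # 1 - precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"FA": false_alarm, "power": power, "FDR": fdr, "MCC": mcc}

# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=80, fp=20, tn=180, fn=20)
```

Because the false discovery rate depends on prevalence while the false alarm rate does not, a low-prevalence outcome can combine a moderate false alarm rate with a high false discovery rate.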

The proposed GMDH network had the best and worst classification performance (MCC) for depression and emotional problems, respectively (Table 4). The dataset was highly imbalanced for the emotional problem outcome (prevalence of 11.1%). However, all performance indices were consistent across the different test folds (Table 3). It must be mentioned that the accuracy of diagnostic classification in mental disorder studies is not usually high 32 . Meanwhile, there could be two reasons why the GMDH significantly outperformed the MLP classifier. First, the MLP is a fully connected network, while the GMDH is not (Figures 2-4), resulting in more parameters in the MLP. Second, the cost function of the GMDH was customized for the imbalanced data, while cross-entropy was used for the MLP, which could be improved by using weighted cross-entropy 33 .
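As an illustration of the weighted cross-entropy mentioned above, the sketch below up-weights the minority (positive) class. The inverse-frequency weight is one common choice and is an assumption here, not the weighting of reference 33; the function and variable names are hypothetical.

```python
import math

def weighted_cross_entropy(y_true, p_pred, w_pos, w_neg=1.0, eps=1e-12):
    """Mean binary cross-entropy with separate weights for the two classes."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

# Assumed inverse-frequency weight for an outcome with 11.1% prevalence.
prevalence = 0.111
w_pos = (1.0 - prevalence) / prevalence

plain = weighted_cross_entropy([1, 0], [0.5, 0.5], w_pos=1.0)
doubled = weighted_cross_entropy([1, 0], [0.5, 0.5], w_pos=2.0)
```

With `w_pos=1.0` the function reduces to the ordinary cross-entropy; larger values penalize missed positive cases more strongly.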

The advantage of our study is the large sample size. Moreover, it analyzed the comprehensive factors related to mental illnesses to design a screening program to monitor direct and indirect modifiable risk factors. However, it is a cross-sectional study, and no causality can be inferred. The other limitation was the possible bias in the self-reported answers of participants. In conclusion, our study emphasized the modifiable risk factors of psychiatric distress, including breakfast, salty snack, and sweet beverage consumption, screen time, (abdominal) obesity, sleep hour, and physical activity. Implementing the proposed screening tool could help the early diagnosis and monitoring of psychiatric distress in children and adolescents.

A large population from the fourth study of a national surveillance program, entitled "Childhood and Adolescence Surveillance and Prevention of Adult Non-communicable disease" (CASPIAN-IV), was analyzed in our project. The detailed methodology is published elsewhere 34 . We briefly describe the study population.

The population and sampling method

A sample of 14,880 students aged 6-18 years was selected by multi-stage sampling from schools of

Outcome variables

The World Health Organization-Global School-based Student Health Survey (WHO-GSHS) was used in our study. It covers alcohol, tobacco use, hygiene, physical activity, mental health, dietary behaviors, violence, protective factors, and unintentional injuries among children and youths. After translating the questions into Persian and simplifying the hard questions, the questionnaire's reliability and validity were assessed 35 . We considered psychiatric distress as depression, worriedness, and emotional problems, where the latter included confusion, insomnia, anxiety, anger, and worthlessness 36 . These three psychiatric distress outcomes were taken into consideration in this study. The psychiatric distress components were assessed by the questions presented in Supplementary Table S1. The first five questions were used to identify emotional problems in our research. Those who experienced at least 3 out of 5 problems every day, more than once a week, or once a week were defined as "adolescents with emotional problems." An indicator of depression was a positive answer to the 6th question. In the last question, students who were worried most of the time or always, so that they could not sleep at night, were considered worried 37 . Thus, our goal is to design and implement three binary classifiers to diagnose depression, emotional problems, and worriedness using the following input variables.
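The outcome definitions above can be expressed as simple coding rules. The sketch below is illustrative only: the exact wording of the answer options in Supplementary Table S1 is not shown in this section, so the frequency labels and function names are hypothetical placeholders.

```python
# Hypothetical answer labels; the real questionnaire wording may differ.
FREQUENT = {"every day", "more than once a week", "once a week"}
WORRIED = {"most of the time", "always"}

def has_emotional_problems(answers):
    """answers: frequency labels for the first five questions (Q1-Q5)."""
    if len(answers) != 5:
        raise ValueError("expected answers to exactly five questions")
    # At least 3 of the 5 problems must occur weekly or more often.
    return sum(a in FREQUENT for a in answers) >= 3

def is_depressed(q6_positive):
    """Depression indicator: a positive answer to the 6th question."""
    return bool(q6_positive)

def is_worried(q7_answer):
    """Worriedness indicator: worried most of the time or always."""
    return q7_answer in WORRIED

example = ["every day", "once a week", "rarely", "more than once a week", "never"]
flag = has_emotional_problems(example)
```

Each function yields one binary label, matching the three binary classification targets described above.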

Input variables

The input variables considered in our psychiatric distress classification system are as follows 38

Measurements

In this study, age was categorized as 6-10, 11-14, and 15-19 years 39 . Screen time was considered a categorical variable and consisted of the time spent watching television (TV)/video and playing computer games during leisure time; less than or equal to 4 hours (≤4 h) was defined as low, and greater than 4 hours (>4 h) as high

habit was considered using tobacco products (cigarettes, pipe, hookah, etc.) every day, while passive smoking was considered exposure to tobacco smoke from others (second-hand smoke) 36 . Subjects with either passive or active smoking were considered smokers, and non-smokers otherwise. The general state of a participant's health was determined by the self-rated health (SRH) variable, asking "How would you describe your general state of health?" on the GSHS questionnaire, with the categories "good," "moderate," and "bad" 37 . Life satisfaction was evaluated by asking questions about the degree of satisfaction with their life, using a ten-point scale from 10 = very satisfied to 1 = very dissatisfied. Scores below 6 signified low satisfaction, and high satisfaction otherwise 42 . Body image was assessed using the question, "What do you think regarding your body size?"; the answer to this question was obtained with the following options: "much too fat," "a bit too fat," "about the right size," "a bit too thin," and "much too thin." For the analysis, the variable was divided into overweight (much too fat and a bit too fat) and underweight (much too thin and a bit too thin), versus normal weight perception 43 . Breakfast consumption was categorized into three groups: non-skipper (those eating breakfast 5-7 days a week), semi-skipper (those eating breakfast 3-4 days a week), and skipper (those eating breakfast 0-2 days a week) 44 . The students were asked about the frequency of salty snack consumption, categorized as "seldom or never," "weekly," and "daily" consumption 45 . The family size was categorized as "less than or equal to 4" or "greater than 4". The number of close friends was categorized as none, one, two, or three or more. The nutrition plan was assessed as "adherence to a weight-modifying plan based on a special diet" or not 43 . Sugar-sweetened beverage consumption (i.e., soda, soft drinks) was categorized as "daily," "weekly," and "seldom or never" 45 . The consumption of fast foods (pizza, fried chicken, cheeseburgers, hamburgers, and hot dogs) was categorized into three groups: daily, weekly, and seldom or never. The education level of mothers was categorized into three groups: illiterate, diploma, and university degree. Participants' birth weight (BW; g) was asked from their parents and then categorized into three groups: low (BW <2,500 g), normal (BW: 2,500-4,000 g), and high (BW >4,000 g) 46 . We also assessed whether the children and adolescents were breastfed during their infancy 47 , and the variable milk type was categorized as breast milk (1) or others (0). Moreover, we considered the family history of sudden death (yes or no) and the family history of cancer (yes or no) in the first-degree relatives of the subjects enrolled in the study.
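Several of the categorizations described above reduce to threshold rules. A minimal sketch follows, using the cut-offs stated in the text; the function names are hypothetical and the input validation is an added assumption.

```python
def breakfast_group(days_per_week):
    """Breakfast consumption: 5-7 days, 3-4 days, or 0-2 days a week."""
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    if days_per_week >= 5:
        return "non-skipper"
    if days_per_week >= 3:
        return "semi-skipper"
    return "skipper"

def birth_weight_group(bw_grams):
    """Birth weight: low (<2,500 g), normal (2,500-4,000 g), high (>4,000 g)."""
    if bw_grams < 2500:
        return "low"
    if bw_grams <= 4000:
        return "normal"
    return "high"

def life_satisfaction_group(score):
    """Life satisfaction on the 1-10 scale: scores below 6 are low."""
    return "low" if score < 6 else "high"

def screen_time_group(hours):
    """Screen time: up to 4 hours is low, more than 4 hours is high."""
    return "low" if hours <= 4 else "high"
```

Keeping each rule in its own small function makes the boundary cases (for example, exactly 2,500 g or exactly 4 hours) explicit and easy to check.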

In our study, trained healthcare providers performed anthropometric measurements at school. All measurements were conducted with calibrated instruments, according to standard protocols 36 . Height was measured in the standing position, barefoot, with the shoulders touching the wall. It was recorded to the nearest 0.2 cm. We measured weight without shoes and in light clothing to the nearest 200 g. Waist circumference (WC) was measured with a non-elastic tape to the nearest 0.2 cm.

We calculated the BMI as weight in kilograms divided by height in meters squared (kg/m²). The subjects were classified as underweight, healthy weight, or overweight or obese if the BMI was below the 5th percentile, between the 5th and 85th percentiles, or higher than the 85th percentile (i.e., BMI categories), respectively 48 . Abdominal obesity was defined as a WC-to-height ratio (WHtR) of more than 0.5 49 .
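The anthropometric indices above can be sketched as follows. The BMI percentile requires age- and sex-specific reference data that are not given in this section, so the percentile is taken as an input here; the function names are hypothetical.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)

def bmi_category(bmi_percentile):
    """BMI category from an externally computed age- and sex-specific percentile."""
    if bmi_percentile < 5:
        return "underweight"
    if bmi_percentile <= 85:
        return "healthy weight"
    return "overweight or obese"

def abdominal_obesity(waist_cm, height_cm):
    """Abdominal obesity: waist-to-height ratio (WHtR) above 0.5."""
    return waist_cm / height_cm > 0.5

example_bmi = bmi(50.0, 160.0)
```

The waist-to-height criterion needs no reference tables, which makes it convenient for a screening setting.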

Feature extraction

The interval variables (e.g., age, BMI, birth weight, family size, number of close friends, and sleep and

Twenty-five input variables were used in the GMDH (i.e., layer zero), while the outcomes depression, worriedness, and emotional problems were separately used as outputs. At the first layer, each pairwise interaction of the inputs was considered as a neuron. Suppose that the pair x_{i,j} and x_{i,k} (features no. j and k from subject no. i) is combined to generate the estimated outcome ỹ_i at the first layer using the second-order polynomial model shown in equation (1).
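Equation (1) is not reproduced in this section. For orientation, the second-order polynomial commonly used for a two-input GMDH neuron is sketched below; the coefficient names a_0 to a_5 are notation assumed here and may differ from those of the original equation.

```latex
\tilde{y}_i = a_0 + a_1 x_{i,j} + a_2 x_{i,k} + a_3 x_{i,j} x_{i,k} + a_4 x_{i,j}^2 + a_5 x_{i,k}^2
```

The product term x_{i,j} x_{i,k} is what allows a neuron to capture an interaction between two features.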

At the first layer, each neuron's RLS coefficients were estimated on the estimation set using the above procedure. Each neuron's performance was then assessed on the validation set, and the corresponding MCC values were calculated. At most, the top 10 neurons that performed better than the previous layer's neurons were selected, and their pairwise interactions were analyzed at the next layer. The network was built up layer by layer during training until the stopping criterion, based on the "early-stopping" strategy, was met: whenever the performance on the validation set decreased at the next layer, the output of the current layer's best neuron was selected as the output of the entire GMDH network.
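The construction of one layer can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes a standard second-order polynomial neuron, a plain ridge penalty `lam` for the RLS step, and a 0.5 decision threshold, none of which are specified in this section. All function names are hypothetical.

```python
import itertools
import math

def terms(xj, xk):
    """Second-order polynomial terms for one pair of inputs."""
    return [1.0, xj, xk, xj * xk, xj * xj, xk * xk]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for q in range(c, n + 1):
                M[r][q] -= f * M[c][q]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][q] * w[q] for q in range(r + 1, n))) / M[r][r]
    return w

def fit_neuron(X, y, j, k, lam=1e-2):
    """Regularized least squares: solve (Phi'Phi + lam*I) w = Phi'y."""
    Phi = [terms(row[j], row[k]) for row in X]
    d = len(Phi[0])
    A = [[sum(p[a] * p[b] for p in Phi) + (lam if a == b else 0.0)
          for b in range(d)] for a in range(d)]
    rhs = [sum(p[a] * t for p, t in zip(Phi, y)) for a in range(d)]
    return solve(A, rhs)

def predict(w, X, j, k):
    return [sum(wi * ti for wi, ti in zip(w, terms(row[j], row[k]))) for row in X]

def mcc(y_true, scores, threshold=0.5):
    """Matthews correlation coefficient after thresholding the neuron output."""
    tp = fp = tn = fn = 0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred and t:
            tp += 1
        elif pred:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_neurons(X_est, y_est, X_val, y_val, prev_best=-1.0, top=10):
    """Fit every pairwise neuron; keep at most `top` that beat the previous layer."""
    scored = []
    for j, k in itertools.combinations(range(len(X_est[0])), 2):
        w = fit_neuron(X_est, y_est, j, k)
        score = mcc(y_val, predict(w, X_val, j, k))
        if score > prev_best:
            scored.append((score, j, k, w))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top]
```

To grow the network, the outputs of the selected neurons would serve as inputs to the next layer, and training would stop once the best validation MCC no longer improves.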

Comparison with the state-of-the-art

The validation framework

In our study, 3-fold cross-validation with stratified sampling 56 was used.
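Stratified sampling keeps the outcome prevalence roughly equal across folds, which matters for imbalanced outcomes. The sketch below shows one simple way to build such folds; the seed, the round-robin assignment, and the toy labels are assumptions for illustration and are not taken from the study.

```python
import random

def stratified_folds(labels, n_folds=3, seed=0):
    """Split sample indices into folds with near-equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin to the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

# Toy labels with 11.1% positives (hypothetical, 1,000 subjects).
labels = [1] * 111 + [0] * 889
folds = stratified_folds(labels)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each fold then serves once as the test set while the remaining two are used for training.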

Statistical analysis

All variables were reported as frequencies and percentages, since they were categorical. The χ² analysis

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Note. N: number of people in each category.