Participants and procedures
Participants were 70,777 schoolchildren and adolescents aged 8 to 18 years. They completed online questionnaires, including the RCADS-25, as part of a screening program for health and socio-emotional problems that ran from September 2017 to August 2018 at 345 schools. These schools voluntarily registered for the screening program and were located in urban and rural areas throughout the Netherlands.
Additionally, a convenience sample of the above-mentioned schools was approached for extra data collection procedures by youth health care professionals, who carried out the screening program at the schools. The selection of schools was based on their screening program planning and with the aim of including a variation of urban and rural schools in this study. If schools granted permission, informed consent and assent for the data collection procedures were sent to parents and children or adolescents.
The additional data collection procedures varied per school, class, age, and level of education, aiming to obtain maximum variation in the various subsamples for examining different psychometric properties. To examine test-retest reliability, 277 participants completed the RCADS-25 a second time under the same test conditions two to four weeks after completing the RCADS-25 for the first time during the usual screening. To examine criterion validity, 110 other participants completed a semi-structured interview (i.e., the Schedule for Affective Disorders and Schizophrenia for School-Age Children Present and Lifetime Version [K-SADS-PL] [21, 22]) after the usual screening. Within this group of interviewees, the aim was to obtain maximum variation in scores below, between, and above the 90th and 95th percentiles of the RCADS-25 subscales in order to avoid spectrum bias. To examine hypotheses for construct validity, another 545 participants completed an extra questionnaire during the usual screening: 269 participants completed the SCARED-NL [23-25] and 276 participants completed the CDI-2 [11, 26, 27]. All participants received a small gift (€1-5 in value); participating schools received a gift voucher of €25-50 depending on the number of participating classes.
Revised Child Anxiety and Depression Scale - short versions (RCADS-25, RCADS-20)
The RCADS-25 is a short self-report questionnaire for children and adolescents aged 8 to 18 years that measures broad anxiety through 15 items and MDD through ten items in accordance with the DSM-IV descriptions of anxiety disorders and MDD. All 15 anxiety items and five out of ten MDD items were used to examine the psychometric properties of the RCADS-20. Items are scored on a four-point Likert scale: 0 (never), 1 (sometimes), 2 (often), and 3 (always), resulting in a range of total scores from 0 to 45 for the broad anxiety scale, from 0 to 30 for the RCADS-25 MDD scale with ten items (MDD-10), and from 0 to 15 for the RCADS-20 MDD scale with five items (MDD-5); higher total scores indicate a more severe level of anxiety or MDD.
Schedule for Affective Disorders and Schizophrenia for School-Age Children Present and Lifetime Version (K-SADS-PL)
The K-SADS-PL [21, 22] is a semi-structured interview in which symptoms of DSM-IV diagnoses are assessed in children and adolescents aged 6 to 18 years. The K-SADS-PL starts with an unstructured introductory interview, followed by a screening interview, and ends with five supplementary interviews that may or may not be conducted depending on the scores in the screening interview. Symptoms are scored by presence and severity on a 0 to 3 rating scale: 0 indicates no information is available, 1 indicates a symptom is not present, 2 indicates a subthreshold presence of a symptom, and 3 indicates a threshold presence of a symptom. Previous studies have found a sufficient inter-rater agreement of the screening interview (99.7%) and of assigning present diagnoses (98%), a good test-retest reliability (kappa = 0.76 to 0.90 for MDD; kappa = 0.80 to 0.84 for any anxiety disorder [21, 28]), and a moderate to sufficient convergent validity (r = 0.45 to 0.47 for MDD, p < 0.01; t = 4.52, p < 0.01 for any anxiety disorder).
In the present study, three trained interviewers, who were blinded to the RCADS-25 scores, conducted the screening interview. Regardless of the scores on the screening interview, they also conducted the supplementary interviews on current episodes of panic disorder, separation anxiety disorder, avoidant disorder/social phobia, overanxious/generalized anxiety, obsessive-compulsive, and depressive disorders. Interviewers had a Bachelor’s or Master’s degree in psychology or psychiatric nursing, and they regularly held peer meetings to give feedback and to promote uniformity in the administration of the interviews. All interviews were audio-recorded; five randomly selected recordings per interviewer were scored a second time by an external fourth trained interviewer to determine the inter-rater agreement. The observed inter-rater agreement of the screening interview ranged from 87% to 100% for the anxiety disorders and was 87% for MDD. The observed inter-rater agreement of derived diagnoses ranged from 87% to 100% for the anxiety disorders and was 100% for the MDD diagnosis.
Screen for Child Anxiety Related Emotional Disorders-Dutch Version (SCARED-NL)
The SCARED-NL is a self-report questionnaire for children and adolescents aged 7 to 19 years that measures childhood anxiety disorders in conformance with the DSM-IV-TR (i.e., separation anxiety, panic disorder, specific phobia, social phobia, obsessive-compulsive disorder, posttraumatic and acute stress disorder, and generalized anxiety disorder) through 69 items. Respondents are asked to indicate how often they experience the described situation: “never or almost never,” “sometimes,” or “often,” which are scored as 0, 1, or 2. By adding up these item scores, a total score is calculated that ranges from 0 to 138; higher total scores indicate a more severe level of anxiety. Concerning these total scores, previous studies have demonstrated a sufficient test-retest reliability (ICC = 0.8) and a moderate to good convergent validity (r = 0.67 to 0.88) [31-34]. In the present study, alpha was 0.96 for the total score.
Children’s Depression Inventory-2 (CDI-2)
The Dutch translation of the revised CDI [26, 27], which is a self-report screening instrument for a depressive syndrome in children and adolescents aged 8 to 21 years, was used. It consists of 28 items, with each item presenting three sentences that describe different severity levels of a symptom; respondents are asked to report which sentence describes their situation best. The severity levels per item are scored as 0, 1, and 2, which can be added up to a total score that ranges from 0 to 56; a higher total score indicates a more severe level of a depressive syndrome. Previous research findings regarding the psychometric properties of the total score have revealed moderate to sufficient results in a general population concerning the test-retest reliability (r = 0.60) and convergent validity (r = 0.77). In the present study, alpha was 0.85 for the total score.
Missing data and selection bias
Missing data for RCADS-25 scores were handled in accordance with the RCADS-25 Child Version Scoring Program 3.1. This program prescribes mean replacement when there are three or fewer missing items on the broad anxiety scale and two or fewer missing items on the MDD-10 scale. Concerning the MDD-5 scale, mean replacement was performed when there was one missing item. Missing data for the SCARED-NL and the CDI-2 were handled the same way: mean replacement was performed when there was no more than one missing value per five items per subscale. Cases with more than the allowable missing items were excluded from analyses.
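As an illustration only, the mean replacement rule for the RCADS scales can be sketched in Python. The sketch assumes that mean replacement substitutes the respondent's own mean over the answered items of the same scale; the text does not specify this detail. The names scale_score and MAX_MISSING are hypothetical and are not part of the scoring program.

```python
from typing import Optional, Sequence

# Maximum number of missing items per scale, as stated in the text.
MAX_MISSING = {"anxiety": 3, "mdd10": 2, "mdd5": 1}


def scale_score(items: Sequence[Optional[int]], max_missing: int) -> Optional[float]:
    """Sum score of one scale with mean replacement for missing items.

    items: item responses coded 0-3, with None for a missing response.
    Returns None when the case has to be excluded from the analyses.
    Assumption: each missing item is replaced by the respondent's own
    mean over the answered items of this scale.
    """
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    if n_missing > max_missing or not answered:
        return None  # too many missing items: exclude the case
    person_mean = sum(answered) / len(answered)
    return sum(answered) + n_missing * person_mean


# Hypothetical MDD-5 response pattern with one missing item.
print(scale_score([1, 2, None, 3, 0], MAX_MISSING["mdd5"]))
```

Under this assumption a scale score is no longer necessarily an integer, which matters when scores are compared with percentile cut-offs.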
Potential selection bias in the additional data collection procedures was examined through multilevel logistic regression analyses. These analyses were adjusted for school and class by means of a random intercepts model. Odds ratios were calculated for the association between completing the additional data collection procedures (yes or no) and gender, age group (i.e., 8-12 years and 13-18 years), and RCADS-25 scale scores. The scale scores were dichotomized as below or above the 90th percentile, since they were skewed to the right. This is in line with current practice, as children and adolescents scoring above the 90th percentile are invited for further investigation by school nurses and physicians. The 90th percentiles of the RCADS-25 broad anxiety and MDD-10 scale were determined in a nationally representative sample with respect to gender, age, region, ethnicity, household size, and social class, based on Statistics Netherlands data from 2017 (see Supplementary file 1).
Construct validity (structural validity and hypotheses testing), reliability (internal consistency and test-retest reliability), and criterion validity were assessed for the RCADS-25/RCADS-20 broad anxiety scale, RCADS-25 MDD-10 scale, and RCADS-20 MDD-5 scale separately, since anxiety and MDD are considered to be different constructs in the DSM.
Construct validity: structural validity
Confirmatory factor analyses (CFAs) were conducted to examine the unidimensionality of the separate subscales. Unidimensionality refers to the extent to which item responses on a scale are driven by the latent trait the scale purports to measure. First, a one-factor model fit was examined with the broad anxiety data, the MDD-10 data, and the MDD-5 data. Second, a bi-factor model fit was examined with the broad anxiety data only, since the anxiety scale was developed by exploratory bi-factor modeling. In the bi-factor model, all anxiety items were allowed to load on a general broad anxiety factor as well as on one of five orthogonal group factors (i.e., SAD, SOC, GAD, OCD, and PD) in accordance with the description of Ebesutani and colleagues. In both the one-factor and bi-factor model fit tests, item responses were indicated as ordered, the diagonally weighted least squares model estimation was used, and mean and variance adjusted test statistics were calculated. Model fit was assessed through four indices: a scaled comparative fit index (CFI), a scaled Tucker-Lewis index (TLI), a scaled root mean square error of approximation (RMSEA), and a standardized root mean square residual (SRMR). A CFI or TLI close to or higher than 0.95, and an RMSEA close to or lower than 0.06 or an SRMR close to or lower than 0.08 were considered indicators of a good fit. Further, an RMSEA between 0.06 and 0.1 was considered mediocre, and CFI or TLI values between 0.90 and 0.95 were considered an acceptable fit; all remaining scores were considered indicators of an unacceptable fit.
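The cut-offs above can be summarized per fit index, as in the following Python sketch. It is a simplification: the text speaks of values "close to" the cut-offs, whereas the sketch applies them as strict boundaries, and the function names are hypothetical.

```python
def classify_cfi_tli(value: float) -> str:
    """Classify a scaled CFI or TLI value (strict cut-offs assumed)."""
    if value >= 0.95:
        return "good"
    if value >= 0.90:
        return "acceptable"
    return "unacceptable"


def classify_rmsea(value: float) -> str:
    """Classify a scaled RMSEA value (strict cut-offs assumed)."""
    if value <= 0.06:
        return "good"
    if value <= 0.10:
        return "mediocre"
    return "unacceptable"


def classify_srmr(value: float) -> str:
    """Classify an SRMR value (strict cut-off assumed)."""
    return "good" if value <= 0.08 else "unacceptable"


# Hypothetical fit indices, for illustration only.
print(classify_cfi_tli(0.93), classify_rmsea(0.08), classify_srmr(0.05))
```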
If the one-factor model did not fit and the bi-factor model fit sufficiently, it was examined whether the broad anxiety scale was unidimensional in essence. Essential unidimensionality was examined by calculating the omega hierarchical, which is a statistic that estimates the proportion of variance in raw scores attributable to the general factor; an omega hierarchical of at least 0.8 was considered an indicator of a scale that is unidimensional in essence. Essential unidimensionality can also be assessed by the explained common variance (ECV), which refers to the ratio of the variance explained by the general factor to the variance explained by the general factor and the group factors together. However, the interpretation of ECV, if used as a unidimensional measure in the context of a bi-factor model, depends on the percentage of uncontaminated correlations (PUC). PUC is a statistic of the percentage of inter-item correlations accounted for by the general factor only. The PUC of the broad anxiety scale is considered high (i.e., 0.86), since PUC values greater than 0.8 indicate a low risk of bias when treating a multidimensional scale as unidimensional. Since the PUC is high, the ECV was considered less important as an indicator of unidimensionality.
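The three statistics can be illustrated with a short Python sketch. It assumes the usual formulas for an orthogonal bi-factor model with standardized loadings, and it assumes that each of the five group factors has three items, which reproduces the reported PUC of 0.86. The loadings in the example are hypothetical and are not results of this study.

```python
from math import comb


def puc(group_sizes):
    """Share of item pairs that belong to different group factors."""
    total_pairs = comb(sum(group_sizes), 2)
    within_pairs = sum(comb(k, 2) for k in group_sizes)
    return (total_pairs - within_pairs) / total_pairs


def ecv_and_omega_h(general, group, membership):
    """ECV and omega hierarchical from standardized bi-factor loadings.

    general[i], group[i]: loadings of item i on the general factor and
    on its own group factor; membership[i]: label of that group factor.
    """
    gen_sq = sum(g * g for g in general)
    grp_sq = sum(s * s for s in group)
    ecv = gen_sq / (gen_sq + grp_sq)

    general_var = sum(general) ** 2
    group_var = sum(
        sum(s for s, m in zip(group, membership) if m == label) ** 2
        for label in set(membership)
    )
    # Unique variance of a standardized item: 1 minus its communality.
    unique_var = sum(1 - g * g - s * s for g, s in zip(general, group))
    omega_h = general_var / (general_var + group_var + unique_var)
    return ecv, omega_h


# 15 anxiety items in five group factors of three items (assumed).
print(round(puc([3, 3, 3, 3, 3]), 2))  # 0.86

# Hypothetical loadings: 0.6 on the general factor, 0.4 on the group factor.
labels = [name for name in ("SAD", "SOC", "GAD", "OCD", "PD") for _ in range(3)]
print(ecv_and_omega_h([0.6] * 15, [0.4] * 15, labels))
```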
Internal consistency was assessed by calculating a Cronbach’s alpha per subscale. An alpha equal to or greater than 0.70 was considered sufficient. However, in the case of a bi-factor model fit to the anxiety data, Cronbach’s alpha can be misleading. In that case, omega hierarchical in combination with omega total can be regarded as appropriate model-based reliability indicators. Omega total refers to the proportion of the total variance attributable to the general and group factors. Omega hierarchical and omega total were derived from the standardized factor loadings from the confirmatory bi-factor model; they were considered sufficient if they were equal to or greater than 0.8.
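For reference, Cronbach's alpha can be computed from item scores as in the following Python sketch. It uses the standard formula based on the item variances and the variance of the sum score; the function name and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of item scores per respondent, all of equal length.
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of four respondents to two items.
alpha = cronbach_alpha([[0, 1], [1, 0], [2, 3], [3, 2]])
print(alpha, alpha >= 0.70)
```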
Test-retest reliability was assessed by calculating an intraclass correlation coefficient (ICC) and its 95% confidence interval. ICCs were calculated using a single-rater, absolute-agreement, two-way mixed-effects model. ICCs of 0.70 or higher were considered sufficient.
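A minimal Python sketch of this type of ICC is given below. It assumes the single-measurement, absolute-agreement formula based on the mean squares of a two-way analysis of variance, and it does not compute the confidence interval. The function name and the example scores are hypothetical.

```python
def icc_absolute_single(data):
    """Single-measurement, absolute-agreement ICC from a two-way ANOVA.

    data: one row per participant, one column per occasion
    (e.g., [test score, retest score]).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between participants
    ms_cols = ss_cols / (k - 1)              # between occasions
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical scores: every participant scores one point higher at retest.
print(icc_absolute_single([[1, 2], [2, 3], [3, 4]]))
```

In this example the rank order of participants is perfectly stable, but the ICC is well below 1 because absolute agreement also penalizes the systematic shift between occasions.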
Criterion validity was assessed by calculating receiver operating characteristic (ROC) curves. One ROC curve was calculated for the broad anxiety scale in comparison with any anxiety disorder according to the K-SADS-PL (i.e., panic disorder, separation anxiety disorder, avoidant disorder/social phobia, overanxious/generalized anxiety, and/or obsessive-compulsive disorder). Two ROC curves were calculated for the MDD-10 and MDD-5 scales in comparison with depressive disorders according to the K-SADS-PL.
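The construction of such a curve can be sketched in Python as follows. The sketch assumes that a score at or above a cut-off counts as a positive screening result. The area under the curve is added as a common summary of a ROC curve, although the text does not state which summary measure was used. Function names and example data are hypothetical.

```python
def roc_points(scores, has_disorder):
    """Sensitivity and specificity at every observed cut-off.

    scores: scale scores; has_disorder: True if the interview
    (reference standard) indicated a disorder.
    A score >= cut-off is treated as a positive result (assumption).
    """
    cases = [s for s, d in zip(scores, has_disorder) if d]
    noncases = [s for s, d in zip(scores, has_disorder) if not d]
    points = []
    for cutoff in sorted(set(scores)):
        sensitivity = sum(s >= cutoff for s in cases) / len(cases)
        specificity = sum(s < cutoff for s in noncases) / len(noncases)
        points.append((cutoff, sensitivity, specificity))
    return points


def auc(case_scores, noncase_scores):
    """Area under the ROC curve as the share of case/non-case pairs in
    which the case has the higher score (ties count as one half)."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical scores of four participants, two with a disorder.
for point in roc_points([1, 4, 6, 8], [False, False, True, True]):
    print(point)
```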
Construct validity: hypotheses testing
Construct validity was assessed by testing four hypotheses based on inferences from previous studies. It was determined a priori that construct validity of the separate subscales would be sufficient if at least three out of four hypotheses were confirmed.
Our first hypothesis was that girls have higher mean scores than boys on the broad anxiety scale, MDD-10 scale, and MDD-5 scale [1, 2, 6, 14]. This hypothesis was tested by multilevel linear regression analyses, adjusted for school and class by the use of a random intercepts and random slope model. Mean differences were expected of at least one point on the anxiety scale, one point on the MDD-10 scale [12, 14], and 0.5 point on the MDD-5 scale.
Our second hypothesis was a positive correlation of 0.6 to 0.7 between the broad anxiety scale and the two MDD scales, since anxiety and depression have been found to be comorbid [1, 2, 6, 44]. This range is based on correlations reported in previous RCADS research [13, 15].
Our third hypothesis was a positive correlation of at least 0.70 between the broad anxiety scale and the SCARED-NL, of at least 0.65 between the MDD-10 scale and the CDI-2, and of at least 0.6 between the MDD-5 and the CDI-2. These correlations were expected, since related constructs are measured and previous studies have reported comparable results [14, 16]. In addition, fewer items decrease reliability, and the test-retest reliability of the CDI-2 has been shown to be moderate.
Our fourth hypothesis was a positive correlation of at least 0.6 between the broad anxiety scale and the CDI-2, and between the MDD scales and the SCARED-NL; in addition, the correlations between the broad anxiety scale and the CDI-2 and between both MDD scales and the SCARED-NL were expected to be lower than the correlations between the broad anxiety scale and the SCARED-NL and between the MDD scales and CDI-2. This hypothesis was based on the expectation that these scales measure slightly different constructs that nevertheless correlate highly.
All analyses were conducted in SPSS version 21, with the exception of the CFAs, which were performed using the lavaan package in RStudio version 1.1.463, and of the multilevel regression analyses, which were performed in Stata Intercooled 15.