2.1 Search strategy and selection criteria
This review was structured in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [15]. A study protocol was published in the Open Science Framework (OSF) in advance of the study (https://osf.io/g7w3v).
All Norwegian repeated cross-sectional surveys were eligible for inclusion, provided they assessed general mental health problems (i.e., symptoms of anxiety and depression) among 13- to 24-year-olds and attempted to obtain a representative sample of the general youth population. Each survey had to include data collected on at least two occasions using similar recruitment procedures and outcome measures. Because this study concerns mental health problems in the general youth population, surveys of clinical or at-risk samples were not included.
2.2 Data collection process
The studies eligible for inclusion in this review were known in advance to Norwegian youth health researchers and experts. The principal investigators of all Norwegian youth surveys were invited in advance to collaborate on this study. In 2022, the study data were drawn from open repositories or obtained through direct contact with the principal investigators of each youth survey.
A major advantage of this approach was that it allowed us to request information on relevant outcomes or indicators directly from the individual youth survey administrators, including information that is not widely available or has not been reported previously.
Data extraction and verification were performed by three authors (TP, SAN and LB).
2.3 Operationalization of mental health problems
Two operationalizations of mental health problems were used in the surveys included in this study, both of which were self-report symptom checklists: the Hopkins Symptom Checklist (HSCL) and the Health Behaviour in School-aged Children Symptom Checklist (HBSC-SCL).
2.3.1 The Hopkins Symptom Checklist (HSCL)
The HSCL was developed as a broad measure of mental health problems, defined by the frequency of symptoms of mental health problems in clinical and non-clinical samples. The instrument originally consisted of 90 items [16]. However, shorter formats (5-25 items) have since been developed, focusing mainly on symptoms of anxiety and depression. These short versions are comparable with the longer versions and perform well (correlations between the different versions of the HSCL range from .91 to .97) [17]. Respondents were asked to what extent the symptoms had bothered them over the past seven or fourteen days. Sample items were “feeling hopeless about the future”, “feeling everything is an effort”, “suddenly scared for no reason” and “feeling tense or keyed up”. Responses were recorded on a 1-4 scale (“not at all”, “a little”, “quite a lot”, “extremely”), with higher scores signifying more severe mental health problems. The responses were averaged to produce a total HSCL score.
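The averaging step can be illustrated with a minimal sketch. The function name `checklist_score` and the example responses are hypothetical and are not taken from any of the included surveys; the sketch only assumes that each item is coded 1-4 and that the total score is the arithmetic mean of the items.

```python
def checklist_score(item_responses, low=1, high=4):
    """Average item responses (each coded low..high) into a total score.

    Hypothetical helper for illustration; the surveys' own scoring
    syntax is not reproduced here.
    """
    if not item_responses:
        raise ValueError("at least one item response is required")
    for response in item_responses:
        if not low <= response <= high:
            raise ValueError(f"response {response} is outside {low}-{high}")
    return sum(item_responses) / len(item_responses)


# Hypothetical respondent answering a five-item version.
example_score = checklist_score([1, 2, 3, 2, 2])
```

The same helper would apply to a 1-5 response scale by passing `high=5`.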
2.3.2 Health Behaviour in School-aged Children Symptom Checklist (HBSC-SCL)
The HBSC-SCL was designed to assess mental health problems according to the frequency of symptoms in non-clinical youth samples. The HBSC-SCL measures eight symptoms: headache, abdominal pain, backache, feeling low, irritability or a bad mood, feeling nervous, experiencing sleeping difficulties and dizziness. Young people were asked how often they had experienced these symptoms over the past six months. The responses were recorded on a 1-5 scale, namely, “about every day”, “more than once a week”, “about every week”, “about every month” and “rarely or never”. Greater symptom frequency indicated more significant mental health problems [18]. The responses were averaged to produce a total HBSC-SCL score. Previous research supports the validity and reliability of the instrument [19,20].
2.4 Data analysis
2.4.1 Individual survey analyses
We extracted key variables from each survey in a harmonized manner. These included each participant's mean symptom score, together with the means and standard deviations of these scores stratified by survey year, sex, and age.
To examine secular changes in mental health problems within each survey, we fitted a series of linear and logistic regression models. First, we performed linear regression analyses to investigate the secular change in the mean symptom scores for each survey separately. To provide a standardized measure of effect size, we z-transformed the symptom scores (setting the grand mean to zero and the standard deviation to one) within each survey, separately for males and females. In these models, the mean symptom scores (dependent variable) were regressed on survey year and age (independent variables). The survey year was coded using a backward difference contrast coding scheme, whereby each survey year was compared to the preceding one (e.g., 2002 vs. 1998; 2005 vs. 2002). This generated n-1 contrasts, where n is the total number of survey years. All models were fitted separately for males and females.
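The two preparatory steps described above, z-transformation and backward difference coding, can be sketched as follows. This is an illustration under stated assumptions, not the SPSS syntax used in the analyses: the function names are hypothetical, and the use of the sample standard deviation (n-1 denominator) is an assumption.

```python
from statistics import mean, stdev


def z_transform(scores):
    """Rescale scores to mean 0 and (sample) standard deviation 1."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


def backward_difference_coding(k):
    """Return a k x (k-1) coding matrix for k ordered survey years.

    Column j (1-based) contrasts level j+1 with level j: rows up to j
    get -(k-j)/k and rows after j get j/k, so each regression
    coefficient equals the difference between adjacent year means.
    """
    return [
        [(j / k) if i > j else -(k - j) / k for j in range(1, k)]
        for i in range(1, k + 1)
    ]


# Three survey years, as in the example in the text (1998, 2002, 2005).
coding = backward_difference_coding(3)
# Row 0 codes 1998, row 1 codes 2002, row 2 codes 2005.
```

With this coding, moving from one survey year to the next changes exactly one contrast by 1 and leaves the others unchanged, which is why each coefficient can be read as a year-to-previous-year difference.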
Second, we performed binary logistic regression analyses to investigate the secular change in the proportion of individuals scoring above a problematic symptom score threshold. We defined a problematic symptom score as a mean score >2. This threshold is commonly used in Norwegian youth surveys to indicate a high symptom load and is the suggested threshold for identifying a mental disorder in the shortest version of the Hopkins Symptom Checklist (HSCL-5) [17]. The dependent variable in these models was a binary variable denoting whether an individual scored above or below the cut-off. The independent variables were survey year and age, and the models were run separately for males and females.
For each survey, we plotted the mean symptom scores and the proportions of individuals scoring above the cut-off threshold of >2 across time. As a cut-off threshold of >3 is also commonly used in the included surveys, we plotted the proportions of individuals scoring >3 across time for descriptive purposes. For each model, we reported the associations of the dependent variable with survey year and age. SPSS [21] was used for these analyses, and alpha was set to .05 for all analyses.
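The dichotomization behind these models can be illustrated with a small sketch. The scores below are invented for illustration, the function names are hypothetical, and the odds ratio shown is unadjusted for age, so it is a simplification of the logistic models described above rather than a reproduction of them.

```python
def proportion_above(scores, cutoff=2.0):
    """Share of individuals whose mean score is strictly above the cut-off."""
    flags = [1 if s > cutoff else 0 for s in scores]
    return sum(flags) / len(flags)


def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)


# Hypothetical mean symptom scores from two consecutive survey years.
scores_earlier = [1.2, 1.8, 2.0, 2.4, 1.6]  # a score of exactly 2.0 is not above
scores_later = [1.4, 2.2, 2.6, 2.0, 3.0]

p_earlier = proportion_above(scores_earlier)
p_later = proportion_above(scores_later)

# Unadjusted odds ratio for the later vs. the earlier survey year.
odds_ratio = odds(p_later) / odds(p_earlier)
```

Passing `cutoff=3.0` gives the proportions for the alternative >3 threshold.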
2.4.2 Meta-analysis
For each individual survey, we collected means and standard deviations of the outcome measures. These estimates were then pooled and synthesized using multilevel meta-regression as the primary meta-analytic technique, using the metafor package in the R statistical environment [22,23]. As the mean symptom scores from different surveys were measured on different scales, they were log-transformed to express the outcome on a comparable metric, so that differences indicate relative change on the log scale. This procedure “normalizes” the relative differences between the surveys, thereby rendering the differences between the surveys interpretable and ensuring the validity of inference [24].
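A minimal sketch of the log transformation follows. The sampling variance uses the usual delta-method approximation for a log-transformed mean, sd² / (n · mean²); whether the analyses used exactly this variance formula is an assumption, and the function name and input values are hypothetical.

```python
import math


def log_mean_effect(mean_score, sd, n):
    """Return (log mean, approximate sampling variance of the log mean).

    The variance is the delta-method approximation sd^2 / (n * mean^2).
    """
    if mean_score <= 0 or sd < 0 or n <= 0:
        raise ValueError("mean and n must be positive, sd non-negative")
    yi = math.log(mean_score)
    vi = sd ** 2 / (n * mean_score ** 2)
    return yi, vi


# Hypothetical summary data for one survey at two data collections.
y_first, v_first = log_mean_effect(mean_score=1.50, sd=0.50, n=1000)
y_second, v_second = log_mean_effect(mean_score=1.65, sd=0.55, n=1200)

# A difference on the log scale corresponds to a ratio of means (here 1.65 / 1.50).
ratio_of_means = math.exp(y_second - y_first)
```

Because a log difference depends only on the ratio of the means, the same relative increase yields the same value regardless of the instrument's original scale.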
The data had a natural multilevel structure with three levels: outcomes from individual participants in the primary surveys (level 1), outcomes summarized from individual surveys at each data collection (level 2) and outcomes clustered within each survey (level 3). We expected that the outcomes within surveys would be fairly homogeneous, but with substantial variance between surveys, due to differences in the instruments used, the sampling procedure and the time period of the data collection. To account for this three-level structure, we fitted a three-level meta-regression, examining within-cluster heterogeneity at level 2 (the variation of the true effect size within surveys) and between-cluster heterogeneity at level 3 (variation between surveys) using restricted maximum-likelihood estimation. The I² statistic was computed as an indicator of heterogeneity in percentages, with values 0-50% indicating no heterogeneity, 50-75% indicating moderate heterogeneity and 75-100% indicating substantial heterogeneity [25].
The amount of (residual) heterogeneity accounted for in the full three-level model can be regarded as a (pseudo) R² value (corresponding to the interpretation of a traditional adjusted R² value) and is the percentage of the variance explained [26]. The full model (three levels) was compared to a reduced model (two levels) to assess model fit using the likelihood ratio test.
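One common way to express heterogeneity at each level of a three-level model is to divide each estimated variance component by the total variance, where the total adds a "typical" sampling variance to the two components. The sketch below follows that approach; it is an illustration with hypothetical function names and invented variance values, and it is not confirmed to be the exact computation used for the reported I² values.

```python
def typical_sampling_variance(variances):
    """Typical within-study sampling variance (Higgins and Thompson form)."""
    k = len(variances)
    if k < 2:
        raise ValueError("at least two sampling variances are required")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    sum_w_sq = sum(w ** 2 for w in weights)
    return (k - 1) * sum_w / (sum_w ** 2 - sum_w_sq)


def three_level_i2(variances, sigma2_level2, sigma2_level3):
    """Return (I2 at level 2, I2 at level 3) as percentages."""
    v_typical = typical_sampling_variance(variances)
    total = sigma2_level2 + sigma2_level3 + v_typical
    return 100.0 * sigma2_level2 / total, 100.0 * sigma2_level3 / total


# Hypothetical sampling variances and variance components.
i2_within, i2_between = three_level_i2(
    variances=[0.01, 0.01, 0.01, 0.01],
    sigma2_level2=0.01,
    sigma2_level3=0.02,
)
```

The remaining share of the total (here the part attributable to `v_typical`) reflects sampling error at level 1.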
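Both quantities in this paragraph reduce to short formulas, sketched here under stated assumptions. The pseudo R² is taken as the proportional reduction in the summed variance components when moderators are added (truncated at zero), and the likelihood ratio test is referred to a chi-square distribution with one degree of freedom because the reduced model drops one variance component. The function names, log-likelihoods and variance values are hypothetical.

```python
import math


def pseudo_r2(total_variance_null, total_variance_model):
    """Proportional reduction in heterogeneity variance, floored at 0."""
    if total_variance_null <= 0:
        raise ValueError("null-model variance must be positive")
    reduction = (total_variance_null - total_variance_model) / total_variance_null
    return max(0.0, reduction)


def likelihood_ratio_test_df1(loglik_full, loglik_reduced):
    """Return (LRT statistic, p-value) for a one-parameter difference.

    For 1 degree of freedom the chi-square survival function equals
    erfc(sqrt(x / 2)), so no external statistics library is needed.
    """
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical values for illustration only.
r2 = pseudo_r2(total_variance_null=0.04, total_variance_model=0.01)
lrt, p = likelihood_ratio_test_df1(loglik_full=-10.0, loglik_reduced=-14.0)
```

A small p-value would favor retaining the additional level; because a variance component is tested at the boundary of its parameter space, this chi-square reference tends to be conservative.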
The final three-level meta-regression results are presented as the relative change in the (log-) mean symptom scores (95% CI) per year, including differences by sex and the interaction effect between sex and year. In addition, the regression estimates were adjusted for age, which was centered at the mean age across the samples (age 17). Alpha was set to .05 for all analyses.
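Because the outcome is on the log scale, a regression coefficient b for survey year can be read as a relative change of exp(b) - 1 per year. The sketch below shows this conversion together with age centering and a sex-by-year interaction. The coefficient values, the function names and the coding of sex as a 0/1 indicator are all assumptions made for illustration; they are not results from this study.

```python
import math


def percent_change_per_year(b_year):
    """Convert a log-scale slope into a percentage change per year."""
    return (math.exp(b_year) - 1.0) * 100.0


def center_age(age, reference_age=17):
    """Center age so that the intercept refers to the reference age."""
    return age - reference_age


def slope_for_group(b_year, b_interaction, group_indicator):
    """Year slope for a group coded 0 or 1 on the interacting variable."""
    return b_year + b_interaction * group_indicator


# Hypothetical coefficients: yearly slope and sex-by-year interaction.
b_year, b_interaction = 0.010, 0.005

change_reference_group = percent_change_per_year(
    slope_for_group(b_year, b_interaction, group_indicator=0)
)
change_other_group = percent_change_per_year(
    slope_for_group(b_year, b_interaction, group_indicator=1)
)
```

Confidence limits can be converted in the same way by applying `percent_change_per_year` to the lower and upper bounds of the log-scale interval.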