Participants
Participants were from the longitudinal Zurich Project on Social Development from Childhood to Adulthood (z-proso) study [14]. Z-proso began in 2004, when the participants were aged 7 and entering primary school. For the present analyses we focus on the most recent waves of data, when participants were aged 15, 17, and 20. Information on the sociodemographic composition of the sample and the Ns available for each variable (smallest N = 762) are provided in Table S1 of Supplementary Materials. The sample included approximately equal numbers of males and females and participants from a wide range of sociodemographic backgrounds, reflecting the diversity of the underlying same-aged population of Zurich, Switzerland. Previous research suggests that the sample is representative of this population and has been subject to little non-random attrition over time [15].
Relevant data were available for two sets of waves for the counterfactual analysis. Specifically, for age 17 anxiety and depression we had treatment (reading engagement) data at age 15 and covariate data at age 13. In addition, for age 20 anxiety, depression, and psychosis-like symptoms we had treatment (reading engagement) data at age 17 and covariate data at age 15. We, therefore, conducted two sets of analyses using data from the age 13–17 waves and age 15–20 waves.
For the RI-CLPM analysis, data were available for four waves (ages 13, 15, 17, and 20) for the anxiety and depression outcomes; however, psychosis-like symptoms were measured at age 20 only. As such, only the effects of reading on depression and anxiety could be explored using this approach.
Measures
Outcomes
Anxiety and depression were measured using the self-reported Social Behaviour Questionnaire (SBQ) [16], administered at the age 17 and 20 waves of z-proso. At age 20, four items captured anxiety (e.g., fearfulness) and nine items captured depression (e.g., unhappiness, lethargy). Responses were recorded on a five-point Likert-type scale from never to very often.
Slightly different sets of items were available at different ages for these outcomes. At ages 11–17, four items each measured anxiety and depression. Previous psychometric analyses have supported the reliability and validity of SBQ scores in the current sample [17–20]. Omega reliability for anxiety and depression, respectively, was .69 and .65 at age 11; .72 and .74 at age 13; .74 and .74 at age 15; .77 and .77 at age 17; and .80 and .90 at age 20. Psychosis-like symptoms were measured at age 20 only, using an abbreviated six-item version of the Community Assessment of Psychic Experiences Scale [21] (e.g., hearing voices). Omega reliability for this scale was .73.
Treatment Variable
Reading engagement was measured using a single-item self-report of reading frequency administered at the age 13, 15, and 17 waves of z-proso. The item asked participants how often they ‘read a book or magazine’ between Monday and Friday, with responses on a five-point scale from never to (almost) every day.
For the purposes of the counterfactual analyses, we dichotomised the item such that those who answered ‘never’ or ‘less than once a week’ were designated the control group and those who answered ‘on 1 to 2 days’, ‘on 3 to 4 days’, or ‘(almost) every day’ were designated the treated group. As a sensitivity analysis, we also implemented non-bipartite matching, which treats the treatment variable as ordinal (see below).
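As an illustration, this recoding can be sketched as follows; the variable name reading_freq and the response labels are placeholders rather than the actual z-proso codings:

```r
# Sketch of the dichotomisation of the reading frequency item.
# 'reading_freq' and its level labels are hypothetical placeholders.
library(dplyr)

dat <- dat %>%
  mutate(
    reading_treat = if_else(
      reading_freq %in% c("never", "less than once a week"),
      0L,   # control group
      1L    # treated group
    )
  )
```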
Counterfactual Analysis Matching Variables
Matching variables were selected on the basis of plausibly being associated with both the treatment and the outcomes and included sociodemographic characteristics, mental health and wellbeing markers, prosocial/antisocial traits, social risk/protective factors, and school-specific risk/protective factors. The measures used for matching are comprehensively described in Supplementary Materials and included gender, migration background, socioeconomic status, internalising problems, ADHD symptoms, general trust, self-control, substance use, aggression, prosociality, delinquency, parental involvement, social support, bullying victimisation, teacher bond, class bond, school difficulties, school achievement in German and maths, and school motivation. All covariates used for the age 20 outcome analyses were measured prior to age 17 (at the age 15 wave or earlier) and all covariates used for the age 17 outcome analyses were measured prior to age 15 (at the age 13 wave or earlier), except ISEI and migration status, which were based on data from both the age 13 and 15 waves. School achievement variables were measured at age 12 because, once youth transition to secondary school, achievement ratings are only valid when comparing youth within the same stream.
Statistical Procedure
R code is provided at: https://osf.io/f23jb
Missingness
Unless data are missing completely at random (MCAR), treatment effects estimated using propensity scores to match or weight the data will be biased and less efficient than those from alternative methods that model the missingness [22, 23]. We therefore implemented, where possible, methods that can provide unbiased parameter estimates under a missing at random (MAR) assumption (see Bottigliengo et al., 2021). Specifically, for the matching-based analyses with the dichotomised treatment variable, we used multiple imputation with a ‘within’ approach, in which the propensity score analysis is conducted within each imputed dataset and the results are pooled using Rubin’s rules. For this we used a Bayesian multivariate chained equations approach (a fully conditional specification approach) implemented in the mice package in R statistical software [25], with 10 imputations. All covariates and outcome variables were included in the imputation model [26]; however, we restricted the analyses to cases with complete treatment data, given that missingness in the treatment variables was minimal. All variables in the imputation model were predicted from all other covariates measured at all waves using predictive mean matching. A separate imputation model was built for each set of outcome analyses (age 17 versus age 20).
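A minimal sketch of the imputation step is given below, assuming a data frame dat17 containing the covariates, treatment, and age 17 outcomes; the object names and seed are placeholders:

```r
# Sketch of the multiple imputation step using mice.
# 'dat17' is a hypothetical placeholder; variables are assumed to be coded
# numerically/ordinally so that predictive mean matching applies throughout.
library(mice)

imp <- mice(dat17, m = 10, method = "pmm", seed = 2023)

# 'Within' approach: the matching and outcome analyses described below are
# run separately in each of the 10 completed datasets and the estimates are
# pooled using Rubin's rules (e.g., via mice::pool()).
completed_list <- complete(imp, action = "all")
```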
We used a single-imputation approach for the non-bipartite matching analyses because multiple imputation is not straightforward to implement for non-bipartite matching. For the RI-CLPM analyses we used full information maximum likelihood (FIML) estimation to handle missingness, which also provides unbiased parameter estimates under MAR.
Group Comparisons
We began by conducting independent samples t-tests to compare those with and without frequent reading engagement. We also examined group differences using nearest neighbour propensity score matching based on a logistic regression propensity model. Given that we had relatively even numbers of participants in each group, we used 1:1 matching, implemented using the MatchIt package in R statistical software [27].
If adequate matching could be achieved (defined as standardised mean differences between the matched treated and control groups of <|.15| for the covariates), we fit a separate linear regression for each outcome: anxiety, depression, and psychosis-like symptoms. There has been some debate regarding whether paired or ‘matched’ analyses are optimal, given that the dependency in the data is induced by the propensity model and does not necessarily imply that outcomes will be highly correlated within pairs [28]. Indeed, paired analyses can inflate type 2 error rates, whereas alternatives such as including the matching variables in the main analysis model can achieve better type 2 error rates [28]. We therefore implemented the approach suggested in [28], in which the mental health outcomes were regressed on the treatment and all matching variables within the matched sample. This also has the advantage of removing residual confounding related to the measured variables that remains after matching. We implemented two versions of these analyses: one in which prior reading engagement was included in the set of matching variables and one in which it was not.
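To illustrate, within a single completed dataset (e.g., completed_list[[1]] from the imputation sketch above), the matching and covariate-adjusted outcome model could be specified as follows; the variable names are placeholders and only a subset of the matching variables is shown for brevity:

```r
# Sketch of 1:1 nearest neighbour propensity score matching (logistic
# regression propensity model) and the covariate-adjusted outcome regression
# in the matched sample. Variable names are hypothetical; the full covariate
# set described above would be included in practice.
library(MatchIt)

imp_dat <- completed_list[[1]]

m_out <- matchit(
  reading_treat ~ gender + migration + ses + internalising + adhd + self_control,
  data = imp_dat, method = "nearest", distance = "glm", ratio = 1
)
summary(m_out)  # inspect standardised mean differences (< |.15| required)

matched_dat <- match.data(m_out)

# Outcome model: treatment plus all matching variables in the matched sample
fit_dep <- lm(
  depression_17 ~ reading_treat + gender + migration + ses +
    internalising + adhd + self_control,
  data = matched_dat, weights = weights
)
summary(fit_dep)
```

In practice this step is repeated across the 10 imputed datasets and the treatment-effect estimates are pooled using Rubin’s rules.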
We also implemented an optimal non-bipartite matching version of the analysis, in which the treatment variable is modelled as ordinal [29]. In this method, participants with different levels of the treatment variable but similar propensity scores can be matched. Further, rather than using a greedy matching approach, which can result in poor matches for later-matched pairs, optimal non-bipartite matching uses an algorithm based on Derigs’ shortest augmenting path algorithm [30]. To implement the method, we first estimated treatment propensity scores for the participants using an ordinal logistic regression model. We then used these to derive the following distance measure:
$$\Delta\left(x_{i},x_{j}\right)=\frac{\left(\hat{\beta}^{T}x_{i}-\hat{\beta}^{T}x_{j}\right)^{2}+\epsilon}{\left(Z_{i}-Z_{j}\right)^{2}} \tag{1}$$
where \(\hat{\beta}^{T}x\) is the propensity score (the linear predictor from the ordinal model), \(Z\) is the treatment dose level, and \(\epsilon\) is a vanishingly small but strictly positive number. Because the numerator is strictly positive, pairs with the same treatment dose have infinite distance, which disallows such matches. To control the amount of residual confounding, matches were also disallowed between pairs differing by more than 0.15 SD units by setting the relevant entries in the distance matrix to very large numbers. We then implemented optimal non-bipartite matching based on the distance matrix constructed using this measure, using the R package nbpMatching [31].
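A minimal sketch of this step in a single (singly imputed) dataset is shown below; the variable names, the use of MASS::polr for the ordinal propensity model, and the reading of the 0.15 SD rule as applying to the propensity score are illustrative assumptions:

```r
# Sketch of optimal non-bipartite matching on the ordinal reading item.
# 'si_dat', 'reading_freq' (an ordered factor), and the covariate names
# are hypothetical placeholders.
library(MASS)          # polr(): ordinal logistic regression
library(nbpMatching)   # distancematrix(), nonbimatch()

ps_mod   <- polr(reading_freq ~ gender + migration + ses + internalising +
                   adhd + self_control, data = si_dat)
lin_pred <- ps_mod$lp                      # estimated beta'x for each person
dose     <- as.numeric(si_dat$reading_freq)

# Distance measure from Equation (1)
eps <- 1e-6
d <- (outer(lin_pred, lin_pred, function(a, b) (a - b)^2) + eps) /
      outer(dose, dose, function(a, b) (a - b)^2)

# Disallow matches: same dose (infinite distance above) and pairs whose
# propensity scores differ by more than 0.15 SD
too_far <- abs(outer(lin_pred, lin_pred, "-")) > 0.15 * sd(lin_pred)
d[!is.finite(d) | too_far] <- 1e5          # 'very large' distance

# Optimal non-bipartite matching (assumes an even number of participants;
# otherwise a phantom unit can be added, see the nbpMatching documentation)
matched <- nonbimatch(distancematrix(d))
```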
RI-CLPM
Random-intercept cross-lagged panel models (RI-CLPMs) were fit to the longitudinal reading engagement and mental health data. The RI-CLPM can be thought of as an extension of the standard CLPM, which models the lag-1 autoregressive and cross-lagged relations between constructs over time. However, it has been noted that the standard CLPM conflates between-person and within-person effects, where it is the latter that is typically of interest in developmental theory and for informing interventions [8]. The RI-CLPM was proposed as a way to disaggregate these between- and within-person effects. It does so by specifying random intercept factors for each variable (person random effects representing individual differences in the overall levels of each variable) and fitting the cross-lagged structure to the resulting time-specific residuals. The random intercept factors are allowed to covary; their covariance estimates the between-person association between constructs (e.g., the extent to which people with higher overall levels of reading engagement tend to have lower overall levels of mental health problems). The time-specific residuals that remain after modelling these random intercepts represent individuals’ deviations from their own baseline level of a construct and can be used to test whether being higher or lower than usual on one construct is associated with escalations or declines in another. For example, the model can test whether reading more than usual at a given time is associated with better mental health concurrently and at the next wave; these are the within-person cross-lagged effects of reading on mental health.

All lag-1 autoregressive and cross-lagged effects between reading and the relevant mental health variable were modelled, as were the concurrent (residual) covariances between reading and the relevant mental health variable at each time point. Models were fit using the lavaan package in R statistical software with maximum likelihood estimation, and a separate model was fit for each mental health outcome.
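For transparency, a minimal sketch of the lavaan specification for the reading–depression model is given below (the anxiety model is analogous). The variable names are hypothetical placeholders and the precise wave structure is an assumption based on the measures described above; the full specification is available in the code linked earlier:

```r
# Sketch of the RI-CLPM for reading (ages 13-17) and depression (ages 13-20).
# Variable names (read13, ..., dep20) are hypothetical placeholders.
library(lavaan)

riclpm <- '
  # Between-person random intercepts
  RIx =~ 1*read13 + 1*read15 + 1*read17
  RIy =~ 1*dep13 + 1*dep15 + 1*dep17 + 1*dep20
  RIx ~~ RIx
  RIy ~~ RIy
  RIx ~~ RIy                 # between-person association

  # Within-person (time-specific) components
  wx1 =~ 1*read13
  wx2 =~ 1*read15
  wx3 =~ 1*read17
  wy1 =~ 1*dep13
  wy2 =~ 1*dep15
  wy3 =~ 1*dep17
  wy4 =~ 1*dep20

  # Lag-1 autoregressive and cross-lagged effects
  wx2 + wy2 ~ wx1 + wy1
  wx3 + wy3 ~ wx2 + wy2
  wy4 ~ wx3 + wy3

  # Concurrent (residual) covariances
  wx1 ~~ wy1
  wx2 ~~ wy2
  wx3 ~~ wy3

  # (Residual) variances of the within-person components
  wx1 ~~ wx1
  wy1 ~~ wy1
  wx2 ~~ wx2
  wy2 ~~ wy2
  wx3 ~~ wx3
  wy3 ~~ wy3
  wy4 ~~ wy4

  # Observed scores are decomposed exactly into the two components
  read13 ~~ 0*read13
  read15 ~~ 0*read15
  read17 ~~ 0*read17
  dep13 ~~ 0*dep13
  dep15 ~~ 0*dep15
  dep17 ~~ 0*dep17
  dep20 ~~ 0*dep20
'

fit <- lavaan(riclpm, data = dat, missing = "fiml",
              meanstructure = TRUE, int.ov.free = TRUE)
summary(fit, fit.measures = TRUE)
```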