Literature Search
Figure 1 presents the PRISMA flow diagram describing the identification and selection of studies for inclusion in the systematic review. The initial database search identified 9,105 studies, with 5 additional studies identified through searches of previous studies’ bibliographies. From these, 493 duplicate studies were removed and 8,270 studies were judged not to meet the eligibility criteria on review of titles and abstracts, leaving 342 studies eligible for full-text review. Inter-rater agreement between the two primary reviewers on inclusion of studies based on titles and abstracts was 96.6%. A further 318 studies were excluded at the full-text review stage after applying the inclusion/exclusion criteria (see Figure 1 for reasons). Inter-rater agreement between the two primary reviewers on inclusion of studies based on full texts was 93.8%. In total, 24 full-text RCTs were identified as eligible. These studies were published between 2005 and 2020.
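The counts in the selection flow can be checked with simple arithmetic. The sketch below is illustrative only, and its variable names are hypothetical. It assumes the subtraction starts from the 9,105 database records, because that starting point reproduces the reported 342 and 24; how the 5 additional records enter the flow is not stated in this section, so they are left out of the calculation.

```python
# Record counts as reported in the text.
database_records = 9105
additional_records = 5          # from bibliographies; NOT used below (assumption)
duplicates_removed = 493
excluded_on_title_abstract = 8270
excluded_on_full_text = 318

# Records remaining for full-text review after de-duplication and screening.
full_text_reviewed = (
    database_records - duplicates_removed - excluded_on_title_abstract
)

# Studies remaining after full-text exclusions.
included = full_text_reviewed - excluded_on_full_text

print(full_text_reviewed, included)
```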
Study Characteristics
The characteristics of each study are summarised in Table 1. The pooled sample size across the 24 studies was N = 2,329 parents (range 14-161). Where parent gender was reported, the majority were mothers [45-58]. The remaining studies reported on measures provided by a ‘parent’ but did not specify gender [59-67]. Most children treated across the studies were male (64%). The average age of the children was 5.88 years (SD = 1.82; range 3.81 to 10.35 years). Selection of children into individual studies was based on a diagnosis of ADHD, CD or ODD [37, 47, 52, 59, 67], externalising scores in the clinical range [37, 46, 48, 50-53, 61, 63, 65, 66, 68, 69], or elevated symptoms of externalising behaviours that caused parental concern about the child and/or parenting [37, 45, 48, 50, 51, 56, 60-62, 64, 70-72]. Some studies applied additional inclusion criteria: premature birth [68], giftedness [64], being at risk for maltreatment [72], or placement in a licensed nonrelative foster home [53].
Of the interventions evaluated, five studies evaluated Parent-Child Interaction Therapy [45, 52, 53, 68, 72], eight studies evaluated Incredible Years Parent Program [37, 47, 51, 56, 59, 62, 65, 71], five studies evaluated Triple P Positive Parenting Program [60, 63, 64, 66, 67], three studies evaluated Parent Management Training, the Oregon model [48, 61, 70], one study evaluated ACT Raising Kids Safe Parenting Programme [46], one study evaluated Well Parent Japan [69], and one study evaluated Brief Parent Training [50]. Eight studies delivered the intervention in an individual-based format [45, 50-52, 68-70, 72] and 16 studies delivered it in a group-based format [37, 46-49, 53, 56, 59-67, 71]. Studies were conducted in a variety of settings, including community mental health services/clinics [45, 62, 71], community family services [46, 50, 52, 61, 70], child centres [63, 65], child psychiatric outpatient clinics [47, 56], social service centres and nursery schools [52], child welfare agencies [53], and university clinics [37, 59, 67]. Seven studies did not report the study setting [48, 60, 64, 66, 68, 69, 72]. Most studies compared the intervention group to a waitlist control group (79%). One study compared BPT to an alternative active treatment (Family Creative Therapy) [45], and four compared BPT to service as usual [50, 53, 63, 70].
Only 15 of the 24 studies reported information on the number of treatment sessions attended by parents. The number of treatment sessions ranged from 1 to 23. Eight studies reported the mean number of sessions attended [45, 51, 52, 56, 65, 70-72], five reported the percentage of parents attending all or some of the sessions [48, 50, 62, 66, 69], and two reported the average hours of intervention [50, 61]. Session length ranged from 1 to 5 hours.
Instruments used to measure internalising symptoms were the Child Behaviour Checklist (CBCL) [73] and the Strengths and Difficulties Questionnaire (SDQ) [74]. Specifically, 12 studies [37, 45, 48, 52, 53, 56, 59, 62, 66, 68, 69, 72] used the CBCL internalising subscale, which combines scores on the anxious/depressed, withdrawn, and somatic complaints scales [73], while two studies [50, 61] used the CBCL anxious/depressed scores. Two studies [46, 70] used the SDQ internalising subscale, calculated from scores on the emotional and peer problems scales [75]. A further eight studies [47, 51, 60, 63-65, 67, 71] used only scores from the SDQ’s emotional problems scale. No studies assessed internalising outcomes using diagnostic instruments. Only two studies [37, 56] reported fathers’ assessment of internalising symptoms separately from the ‘mother’ or ‘parent’ report.
Study Quality: Risk of Bias
Figures 1 and 2 in the supplementary information show the risk of bias ratings for the included studies. Studies were judged to carry some degree of risk of bias. For example, some studies were unclear about how random sequences were generated to avoid selection bias [37, 56, 72]. Further, several studies did not specify how the sequence was concealed from study personnel [37, 48, 51-53, 56, 59, 64, 67, 69, 72]. Two studies [48, 64] had evidence of attrition bias, as no loss-to-follow-up data were reported. All studies appeared to be free from other sources of possible bias, apart from one study [62], which had evidence of recruitment bias arising from cluster randomisation, and one study [37], which had a conflict of interest.
Narrative review of child internalising outcomes following BPT
A summary of internalising outcomes following participation in BPT across studies is provided in Table 2. A total of 12 studies reported statistically significant findings that favoured BPT over control for internalising symptoms, with treatment effects ranging from small to large as measured by Cohen’s d (d = 0.004-1.4), medium to large for partial eta squared (η² = 0.084-0.449) and small for R squared (R² = 0.08). Interventions that reported statistically significant change in internalising symptoms following treatment were Parent-Child Interaction Therapy [52, 53, 68, 72], Brief Parent Training [50], Incredible Years [56, 62, 69, 71], Triple P [66, 67] and Well Parent Japan [69]. One of these studies [50] reported statistically significant findings that favoured BPT over primary care services as usual, while all other studies reported findings that favoured BPT over waitlist control. Ten of these studies evaluated outcomes using the CBCL internalising outcome measure [37, 50, 52, 53, 56, 62, 66, 68, 69, 72]. All studies that reported significant findings relied on ‘mother’ or ‘parent’ report of outcomes. Where fathers’ assessment of outcomes was reported, BPT did not significantly reduce internalising symptoms. Three of the four studies coded as having samples of children with high internalising symptoms reported statistically significant findings that favoured intervention over control [52, 56, 67]. No other patterns of findings were identified in relation to study inclusion criteria, child age, treatment delivery format (individual versus group), or treatment duration.
Meta-analysis of treatment effects of BPT on internalising outcomes
Of the 24 studies, 16 reported post-treatment outcomes and were included in the meta-analysis, ensuring homogeneity in assessing the immediate treatment effect of BPT on internalising symptoms compared to waitlist control. Four studies [51, 63, 65, 71] were removed because they evaluated only follow-up outcomes at 6 and 12 months. A further four studies [45, 50, 53, 70] that compared two active treatments, rather than using a waitlist control, were removed. Finally, because few studies provided fathers’ assessment separately, only scores identified as ‘parent’ or ‘mother’ reported were included. This also ensured that the children included in the analysis were independent observations. Random-effects meta-analysis (see Figure 2) demonstrated a statistically significant small post-treatment effect for BPT in reducing internalising symptoms compared to waitlist controls (g = -0.41, 95% CI -0.57, -0.25, z = 4.99, p < 0.00001). Heterogeneity was moderate and significant (I² = 44%, p = 0.03). Inspection of the funnel plot showed no evidence of publication bias (Figure 3 in supplementary information).
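To make the pooling step concrete, the following is a minimal sketch of a random-effects meta-analysis. It assumes the DerSimonian-Laird estimator of between-study variance, which this section does not specify, and it runs on hypothetical effect sizes rather than data from the included studies. The function name `random_effects_pool` and the example values are illustrative assumptions.

```python
import math

def random_effects_pool(effects, variances):
    """Pool >= 2 study effects with the DerSimonian-Laird estimator (assumed)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                     # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = pooled / se
    return {
        "g": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "z": z,
        "p": math.erfc(abs(z) / math.sqrt(2)),        # two-sided normal p-value
        "tau2": tau2,
        "i2": 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0,
    }

# Hypothetical Hedges' g values and sampling variances, NOT data from the review.
example_g = [-0.20, -0.55, -0.80, -0.30, -0.45]
example_var = [0.04, 0.05, 0.09, 0.03, 0.06]
result = random_effects_pool(example_g, example_var)
print(result)
```

The returned dictionary mirrors the quantities reported in the text (g, 95% CI, z, p, and I²), so the same function could be applied to the study-level estimates in Figure 2 if they were available.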
Moderation analyses examining the influence of child, treatment, and study factors
Tests for subgroup differences indicated no statistically significant moderating effects of child age, baseline internalising scores, treatment format, measurement scale, or programme type. More specifically, subgroup meta-analyses indicated that BPT had: (i) statistically significant small treatment effects for studies coded as having younger and older age group samples, 2-6 years (k = 11, g = -0.37, 95% CI -0.56, -0.17, p = 0.0002) and 7-12 years (k = 5, g = -0.52, 95% CI -0.83, -0.20, p = 0.001); (ii) small significant effects among studies coded as having low average internalising scores at baseline (k = 12, g = -0.34, 95% CI -0.52, -0.17, p = 0.0001) and moderate significant effects among studies coded as having high average internalising scores at baseline (k = 4, g = -0.62, 95% CI -0.96, -0.27, p = 0.0005); (iii) a small significant effect among studies with group treatment formats (k = 13, g = -0.33, 95% CI -0.49, -0.18, p = 0.0001, I² = 28%) and a significant large treatment effect among studies with individual treatment formats (k = 3, g = -0.82, 95% CI -1.34, -0.30, p = 0.06); (iv) significant small treatment effects among studies that used the CBCL (k = 11, g = -0.43, 95% CI -0.64, -0.22, p = 0.02, I² = 54%) and studies that used the SDQ (k = 5, g = -0.36, 95% CI -0.61, -0.11, p = 0.005, I² = 20%); and (v) a significant small effect for studies using the Incredible Years programme (k = 5, g = -0.35, 95% CI -0.56, -0.14, p = 0.001), a significant small effect for studies using Triple P (k = 4, g = -0.46, 95% CI -0.74, -0.19, p = 0.001) and a significant large effect for studies using PCIT (k = 3, g = -0.82, 95% CI -1.34, -0.30, p = 0.002).
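The logic of a test for subgroup differences can be illustrated with the subgroup estimates reported above. The sketch below is an approximation under stated assumptions, not the authors' computation: it recovers each standard error from the rounded 95% CI (assuming a symmetric normal interval) and compares two subgroups with a chi-square statistic on one degree of freedom. The function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, crit=1.96):
    """Recover a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * crit)

def subgroup_difference(g1, ci1, g2, ci2):
    """Chi-square test (df = 1) for a difference between two subgroup estimates."""
    se1, se2 = se_from_ci(*ci1), se_from_ci(*ci2)
    q_between = (g1 - g2) ** 2 / (se1 ** 2 + se2 ** 2)
    # Upper tail of chi-square(1) equals the two-sided normal tail at sqrt(Q).
    p = math.erfc(math.sqrt(q_between / 2))
    return q_between, p

# Group versus individual format, using the values reported in the text.
q_format, p_format = subgroup_difference(
    -0.33, (-0.49, -0.18), -0.82, (-1.34, -0.30)
)

# Younger (2-6 years) versus older (7-12 years) samples.
q_age, p_age = subgroup_difference(
    -0.37, (-0.56, -0.17), -0.52, (-0.83, -0.20)
)

print(round(q_format, 2), round(p_format, 3))
print(round(q_age, 2), round(p_age, 3))
```

Under these assumptions both comparisons give p above 0.05, which is in line with the reported absence of significant moderation by treatment format or child age. Because the inputs are rounded, the values will not exactly match those from the original software.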
Sensitivity analysis
A post-hoc sensitivity analysis was conducted to guard against overestimation of the final effect size due to substantial heterogeneity in treatment effects. Effect sizes and moderation analyses were recalculated after removing the study with the largest SMD in each subgroup in each analysis. The recalculated overall treatment effect, tests comparing subgroup effects, and forest plots are provided in Figures 4-8 in the supplementary information. The significance of the overall treatment effect and subgroup effects did not change after removing the studies with the largest SMD in each subgroup. Further, the results of the subgroup comparisons remained the same as those from the moderation analyses reported above.
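The sensitivity procedure described above can be sketched as follows. This is an illustration on hypothetical data, not the review's analysis. It assumes that "largest SMD" means largest in absolute magnitude and that pooling uses the DerSimonian-Laird random-effects estimator; neither detail is stated in this section, and the function names and study values are made up for the example.

```python
import math

def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooled estimate and standard error (assumed method)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re))

def drop_largest(studies):
    """Remove the study with the largest absolute effect size (assumption)."""
    largest = max(studies, key=lambda s: abs(s["g"]))
    return [s for s in studies if s is not largest]

def pooled_g(studies):
    g, _ = pool_random_effects([s["g"] for s in studies],
                               [s["var"] for s in studies])
    return g

# Hypothetical subgroup of four studies, NOT data from the review.
subgroup = [
    {"id": "A", "g": -0.20, "var": 0.03},
    {"id": "B", "g": -0.35, "var": 0.04},
    {"id": "C", "g": -0.90, "var": 0.06},
    {"id": "D", "g": -0.30, "var": 0.05},
]

reduced = drop_largest(subgroup)
before, after = pooled_g(subgroup), pooled_g(reduced)
print([s["id"] for s in reduced], round(before, 2), round(after, 2))
```

Repeating this within each subgroup and re-running the subgroup comparison shows whether any single large effect is driving the pooled result.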