The search strategy yielded 3580 citations. After screening for duplicates, 1722 were removed. One further article was identified by hand-searching the reference lists of key known references (Jørgensen et al., 2021), leaving 1859 records to be screened at title and abstract level. Articles screened out at this stage were not related to the target population (e.g., did not specifically examine BPD); were not randomised controlled trials (e.g., reviews, qualitative studies, single-cohort case studies, observational studies, or questionnaire/survey studies); and/or did not test a psychological intervention (e.g., pharmacological studies). Of the remaining studies, 67 were reviewed in full text for eligibility. Articles excluded at this stage were randomised controlled trials that were not related to the target population (e.g., analyses did not include children, adolescents, or young adults with BPD symptoms) or that did not include measures of functioning (e.g., Schuppert et al., 2012). The reference lists of the final seven articles were scanned and possible further articles were screened; however, no further studies were identified. See Fig. 1 for a PRISMA flow diagram of this process (Page et al., 2021). Table 1 provides details on the seven eligible articles included in this systematic review.
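The record counts in the paragraph above can be checked with simple arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the two derived exclusion counts are not reported in the text. They assume that records left the flow only at the stages named above.

```python
# Counts reported in the search results paragraph.
citations_retrieved = 3580
duplicates_removed = 1722
hand_search_additions = 1
full_text_reviewed = 67
studies_included = 7

# Records screened at title and abstract level.
records_to_screen = citations_retrieved - duplicates_removed + hand_search_additions

# Derived values (assumption: no other exit points in the flow).
excluded_at_title_abstract = records_to_screen - full_text_reviewed
excluded_at_full_text = full_text_reviewed - studies_included

print(records_to_screen)           # 1859, matching the reported figure
print(excluded_at_title_abstract)  # derived, not reported in the text
print(excluded_at_full_text)       # derived, not reported in the text
```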
Table 1
Study | Trial Design & Setting | BPD criteria met (diagnostic framework) | Total n (%f/m) | Age, years (range and mean, SD) | Intervention | Comparison | Duration | Functional Outcome (primary or secondary) | Time points | Effect on functioning and between group effect sizes at post-treatment |
Chanen et al., 2008 | Outpatient, Australia, RCT | > 2 and additional risk factor (DSM–IV) | 78 76/24% | 15–18 (16.4, 0.9) | CAT | GCC | 24 weeks | SOFAS (primary) | 0, 6, 12, and 24 months | Both groups improved from baseline in functioning, which was sustained at 24 months. No significant difference between groups at follow-up until 24 months at which point GCC was better. Rate of change was quicker in CAT. |
Gleeson et al., (2012) | Outpatient, Australia, RCT | > 4 (DSM–IV) | 16 81.25/18.75% | 15–25 (18.4, 2.9) | CAT + SFET | SFET | 17 weeks + 2 booster sessions | SOFAS (secondary) | 0, EOT and 6 months | Significant improvement in functioning from baseline to EOT and 6 months. Experimental group had better functioning at 6 months and EOT. |
Pistorello et al., (2012) | Outpatient, USA, RCT | > 3 and at least one act of lifetime NSSI and/or suicide attempt (DSM–IV) | 63 80.95/19.05% | 18–25 (20.86, 1.92) | DBT | O-TAU | 12 months | SAS-SR (secondary) | 0, 3, 6, 9, 12, and 18 months | Significant improvement between baseline and all time points in both conditions (symptoms and functioning). Greater improvement in the experimental condition than in the comparison condition at post-treatment and final follow-up. |
Mehlum et al., (2016) | Outpatient, Norway, RCT | > 2 and history of at least 2 episodes of self-harm, at least 1 episode within the last 16 weeks; (DSM–IV) | 77 88.31/11.69% | 12–18 (15.6, 1.5) | DBT-A | EUC | 19 weeks | C-GAS (secondary) | 0, 19 weeks, and 71 weeks | Both groups showed significant improvement in functioning at post-treatment and at 71 weeks. Minimal difference between experimental and control group in functioning. |
Asarnow et al., (2021) | Outpatient, USA, RCT | > 3 and at least 1 lifetime suicide attempt, elevated past-month suicidal ideation (DSM–IV) | 173 94.22/5.78% | 12–18 (14.89, 1.47) | DBT | IGST | 6 months | SAS-SR (secondary) | 0, 3, 6, 9, and 12 months. | Both groups showed significant improvement post-treatment and at 12 months. DBT group showed better functioning but were less severe at baseline. |
Jørgensen et al., (2021) | Outpatient, Denmark, RCT | > 4 (DSM–5) + > 67 on BPFS-C | 111 99.1/0.9% | 14–17 (15.8, 1.1) | MBT-G | TAU | 12 months | C-GAS (secondary) | 0, 3 times during treatment phase, EOT, 3 and 12 months post-treatment | Both groups showed improved function between baseline and 12 months. No difference found between experimental condition and TAU on functioning. At end of trial both groups were rated as having “variable functioning with sporadic difficulties or symptoms” |
Chanen et al., (2022) | Outpatient, Australia, RCT | > 5 (DSM-IV-TR) | 139 80.58/19.42% | 15–25 (19.1, 2.8) | HYPE + CAT | HYPE + BEF; YMHS + BEF | 16 sessions (16–25 weeks) | IIP-C; SAS-SR (primary) | 0, 3, 6, 12, and 18 months | All groups improved significantly on both measures of functioning at 12 months. These benefits were sustained, with the comparison group (YMHS + BEF) outperforming the active therapy conditions on the IIP-C, but not the SAS-SR, at the end of the trial follow-up. |
Abbreviations: BEF = Befriending; BPFS-C = Borderline Personality Features Scale for Children; CAT = Cognitive Analytic Therapy; C-GAS = Children's Global Assessment Scale; DBT = Dialectical Behavior Therapy; DBT-A = Dialectical Behavior Therapy for Adolescents; DSM = Diagnostic and Statistical Manual of Mental Disorders; EOT = End of Treatment; GCC = General Clinical Care; HYPE = Helping Young People Early; IGST = Intensive Group Skills Training; IIP-C = Inventory of Interpersonal Problems - Circumplex version; MBT-G = Mentalization-Based Treatment - Group Format; O-TAU = Optimized Treatment as Usual; SAS-SR = Social Adjustment Scale - Self-Report; SFET = Specialist First Episode Treatment; SOFAS = Social and Occupational Functioning Assessment Scale; TAU = Treatment as Usual; YMHS = Youth Mental Health Service. |
Table 2 Overview of measures of functioning
Study | Measure | Administration | Functional Domains Assessed | Scoring |
Chanen et al., 2008 Gleeson et al., (2012) | SOFAS | Clinician or observer-rated based on knowledge of patient or interview | Social (interpersonal) and occupational performance | On the SOFAS, the individual is provided with a score out of 100 which considers social, occupational and/or academic functioning. A higher score indicates a higher level of functioning. |
Pistorello et al., (2012) Asarnow et al., (2021) Chanen et al., (2022) | SAS-SR | Self-reported structured questionnaire with 5-point Likert scale | Social (relationships with family and extended family, and leisure activities), emotional adjustment, and school or work | The SAS-SR contains 54 items that assess role performance over the past two weeks. Six domains are reviewed: work/school, social/leisure activities, extended family, primary relationship, parental role, and family unit. |
Chanen et al., (2022) | IIP-C | Self-reported structured questionnaire with 5-point Likert scale | Social (interpersonal) | The IIP-C is a 64-item measure designed to assess interpersonal difficulties. Items are organized in a circumplex structure. The dimensions include dominance, submission, hostility, warmth, aloofness, nurturance, manipulation, and social avoidance. |
Mehlum et al., (2016) Jørgensen et al., (2021) | C-GAS | Clinician or observer-rated based on knowledge of patient or interview | Social (interpersonal) and academic performance | The C-GAS is scored on a scale ranging from 1 to 100, with higher scores indicating better overall functioning. The C-GAS considers numerous domains, including academic performance, interactions with family and peers, and emotional well-being. |
Abbreviations: SOFAS = Social and Occupational Functioning Assessment Scale; SAS-SR = Social Adjustment Scale - Self-Report; IIP-C = Inventory of Interpersonal Problems - Circumplex version; C-GAS = Children's Global Assessment Scale |
Table 3 Risk of Bias Tool 2 (ROB2)
| | Domain 1 | Domain 2 | Domain 3 | Domain 4 | Domain 5 | Overall Risk of Bias |
| Study | Randomization process | Deviations from intended interventions | Missing outcome data | Measurement of the outcome | Selection of the reported result |
1 | Chanen et al., 2008 | low | low | some concerns | low | some concerns | some concerns |
2 | Gleeson et al., (2012) | low | some concerns | high | some concerns | some concerns | high |
3 | Pistorello et al., (2012) | some concerns | low | low | low | some concerns | some concerns |
4 | Mehlum et al., (2016) | some concerns | low | low | low | some concerns | some concerns |
5 | Asarnow et al., (2021) | low | low | low | low | low | low |
6 | Jørgensen et al., (2021) | low | low | some concerns | low | low | some concerns |
7 | Chanen et al., (2022) | low | low | low | low | low | low |
Seven studies were identified from the search. Figure 1 details the search, screening, and selection process.
Figure 1 to be inserted at this point in the manuscript
Study Characteristics
Seven RCTs were identified from the search. Studies were undertaken in Australia (n = 3), the USA (n = 2), Norway (n = 1), and Denmark (n = 1). Eligible studies reported data for 657 participants (86.71% female). The mean age ranged from 14.89 to 20.86 years. All samples were composed of adolescent or young adult outpatients receiving care in the community. Only three studies reported data on ethnicity. Of these, two reported ethnicity for the full sample: Pistorello et al. (2012) (69.8% ‘White’; 6.3% ‘Asian American’; 11.1% Hispanic; 31.7% ‘African American’; and 4.8% ‘Native American’) and Asarnow et al. (2021) (56.39% ‘White’; 5.85% ‘Asian American’; 27.49% Hispanic; 7.02% ‘African American’; and 0.58% ‘Native American’). One study (Mehlum et al., 2016) reported that 84.9% of its sample was of “Norwegian ethnicity”. Two studies identified functional outcomes as primary outcomes (Chanen et al., 2008, 2022).
Criteria for Assessing BPD Features and Inclusion
All studies in the sample refer to DSM criteria, with the DSM-IV being the most common (American Psychiatric Association, 1994). However, the thresholds at which participants were accepted into the studies varied considerably. Four studies required that participants met at least subthreshold criteria, that is, three or more symptoms present (Chanen et al., 2022; Gleeson et al., 2012; Jørgensen et al., 2021; Pistorello et al., 2012). Three papers required an additional risk factor such as self-harm behaviour, low socio-economic status, or a history of abuse or neglect (Chanen et al., 2008; Mehlum et al., 2016; Asarnow et al., 2021). Chanen et al. (2008) specified additional risk criteria such as low socio-economic status or experience of previous abuse or neglect. Mehlum et al. (2016) required at least three BPD features as well as a history of at least two episodes of self-harm, at least one of them within the 16 weeks prior to entry (see Table 1). Similarly, Asarnow et al. (2021) required two BPD features, at least one suicide attempt, three or more episodes of self-harm over the individual's lifetime, and a score of ≥ 24 on the Suicidal Ideation Questionnaire-Junior.
Measures of Functioning and Method of Assessment
Refer to Table 2 for specific measures of functioning and further details on their administration and scoring. Across the seven included studies, four measures of functioning were identified, with variation in functional measures between studies. Functioning was reported as a primary outcome in two studies (Chanen et al., 2008, 2022), with one of these (Chanen et al., 2022) examining multiple domains of functioning using two outcome measures (SAS-SR; IIP-C).
Among the identified measures, two studies utilized the Social and Occupational Functioning Assessment Scale (SOFAS) (Chanen et al., 2008; Gleeson et al., 2012), while two studies employed the Children’s Global Assessment Scale (C-GAS) (Jørgensen et al., 2021; Mehlum et al., 2016). All studies assessed both social functioning and either educational or vocational functioning. However, the approach to assessing these domains varied. Three studies used structured questionnaires in which the participant was prompted to provide feedback regarding their skills and the frequency of their behaviours or activities (Pistorello et al., 2012; Asarnow et al., 2021; Chanen et al., 2022). Four studies (Chanen et al., 2008; Gleeson et al., 2012; Jørgensen et al., 2021; Mehlum et al., 2016) employed functional measures that produce a single-score scaled summary of functioning derived from semi-structured interviews.
Intervention
A diverse range of therapies was observed among the selected studies. Three studies (Chanen et al., 2008; Gleeson et al., 2012; Chanen et al., 2022) utilized Cognitive Analytic Therapy (CAT) as the primary intervention, which was integrated with the Helping Young People Early (HYPE) model. The HYPE program is a specialized treatment program that adopts a multidisciplinary approach, incorporating relational clinical care, case management, general psychiatric care, and talking therapy (Chanen et al., 2008). Two studies (Mehlum et al., 2016; Asarnow et al., 2021) used DBT-A and employed similar intervention methods, including weekly individual therapy sessions, multifamily skills training, and telephone coaching with therapists outside of therapy sessions. Jørgensen et al. (2021) implemented Mentalization-Based Treatment (MBT) in a group format, comprising three introductory sessions, five individual case formulation sessions, 37 weekly group sessions, and six sessions with parents.
The duration of the interventions varied across the selected studies. Mehlum et al. (2016) conducted their intervention over 19 weeks, whilst Asarnow et al. (2021) delivered their intervention over a 6-month period. Two other studies, Pistorello et al. (2012) and Jørgensen et al. (2021), completed their treatments over one year. Chanen et al. (2008, 2022) and Gleeson et al. (2012) did not specify explicit time frames for their interventions; instead, they indicated the number of sessions, which were 16, 13, and 17 sessions, respectively.
Comparison Condition
All papers included an active treatment condition as a control, although the therapy delivered in the control condition varied. Three studies compared the experimental intervention with TAU or good clinical care (Chanen et al., 2008; Jørgensen et al., 2021; Mehlum et al., 2016), and descriptions of these conditions differed between studies. Two studies described non-specific conditions integrating either psychodynamic or cognitive-behavioural strategies (Pistorello et al., 2012; Mehlum et al., 2016).
Jørgensen et al. (2021) described TAU as a non-manualized approach that included psychoeducation, counselling, crisis management, and caregiver participation. However, sessions were conducted monthly. Chanen et al. (2022) compared the active treatment condition with two befriending conditions, one delivered within the Helping Young People Early (HYPE) model and the other within a Youth Mental Health Service (YMHS). Asarnow et al. (2021) compared the active treatment condition with a general “individual and group supportive therapy” focused on addressing “thwarted belongingness,” emphasizing acceptance, validation, and fostering a sense of connection and belonging, with ad hoc sessions involving parents.
Table 1 to be inserted at this point in the manuscript
Table 2 to be inserted at this point in the manuscript
Quality Appraisal
Quality appraisals of the included studies using the ROB2 are detailed in Table 3. Studies were assessed for randomization process, deviations from intended interventions, missing outcome data, measurement of outcomes, and selection of reported results. Reviewers BB and AM followed the guidelines provided by the Cochrane Collaboration to assign judgments of low, some concerns, or high risk of bias for each domain. Overall risk of bias was rated low (n = 2), some concerns (n = 4), and high (n = 1).
Studies were not excluded based on their quality rating; however, quality was considered in the narrative synthesis.
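The overall judgments in Table 3 are consistent with a simple "worst domain wins" rule. The sketch below applies that rule to the domain ratings transcribed from Table 3. The rule is a simplifying assumption made for illustration: ROB2 guidance also allows reviewers to assign an overall high rating when several domains raise some concerns, and that discretionary step is not modelled here. The names `DOMAIN_RATINGS` and `overall_risk` are hypothetical.

```python
from collections import Counter

# Domain 1-5 judgments per study, transcribed from Table 3.
DOMAIN_RATINGS = {
    "Chanen et al. (2008)": ["low", "low", "some concerns", "low", "some concerns"],
    "Gleeson et al. (2012)": ["low", "some concerns", "high", "some concerns", "some concerns"],
    "Pistorello et al. (2012)": ["some concerns", "low", "low", "low", "some concerns"],
    "Mehlum et al. (2016)": ["some concerns", "low", "low", "low", "some concerns"],
    "Asarnow et al. (2021)": ["low", "low", "low", "low", "low"],
    "Jørgensen et al. (2021)": ["low", "low", "some concerns", "low", "low"],
    "Chanen et al. (2022)": ["low", "low", "low", "low", "low"],
}


def overall_risk(domains):
    """Simplified rule (an assumption): the worst domain judgment is the overall judgment."""
    if "high" in domains:
        return "high"
    if "some concerns" in domains:
        return "some concerns"
    return "low"


overall = {study: overall_risk(ratings) for study, ratings in DOMAIN_RATINGS.items()}
counts = Counter(overall.values())

for study, judgment in overall.items():
    print(f"{study}: {judgment}")
print(dict(counts))
```

Under this rule the tally is two studies at low risk, four with some concerns, and one at high risk, which agrees with the summary counts reported above.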
To ensure consistent quality appraisal and establish inter-rater reliability, authors BB and AM independently assessed all seven papers. Initially, there was a weighted κ agreement of 0.818, which indicated substantial agreement. The use of a weighted kappa score was appropriate as the evaluated categories had an inherent order or hierarchy. Although there was initially a discrepancy in one paper, following discussion, complete agreement was reached.
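For readers unfamiliar with the statistic, the following is a minimal sketch of a linearly weighted Cohen's kappa for three ordered categories. The weighting scheme (linear rather than quadratic) and the example ratings are assumptions made for illustration; the reviewers' actual ratings and software are not reported here, so the sketch is not expected to reproduce the value of 0.818.

```python
CATEGORIES = ["low", "some concerns", "high"]


def weighted_kappa(rater_a, rater_b, categories=CATEGORIES):
    """Cohen's kappa with linear disagreement weights for ordered categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must supply the same non-zero number of ratings")
    k = len(categories)
    index = {name: i for i, name in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted disagreement: observed versus expected under independence.
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for low vs high
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]

    if disagree_exp == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1 - disagree_obs / disagree_exp


# Hypothetical ratings for seven papers; the raters differ on one paper only.
rater_1 = ["some concerns", "high", "some concerns", "some concerns",
           "low", "some concerns", "low"]
rater_2 = ["some concerns", "high", "some concerns", "some concerns",
           "low", "low", "low"]
kappa = weighted_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

Because disagreements between adjacent categories receive half the weight of a low versus high disagreement, a single adjacent-category discrepancy across seven papers still leaves kappa high; for these made-up ratings it is 0.80.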
Table 3 to be inserted at this point in the manuscript
Meta-Analysis of Functional Outcomes
Chanen et al. (2022) included three conditions. For the meta-analysis, the experimental condition (HYPE + CAT) was compared with the condition that most closely resembled an active TAU condition, namely YMHS + BEF. Chanen et al. (2022) utilized two measures of functioning, the IIP-C (interpersonal functioning) and the SAS-SR (social adjustment, leisure, educational, and vocational functioning). For the analysis, we chose the SAS-SR as the primary outcome measure because it offers a broader scope and was also used in two other included studies. This choice was intended to ensure comparability and enhance the validity of the findings.
Effect of Psychological Interventions on Functional Outcomes at Post-Treatment
Seven studies (N = 506) were included in a meta-analysis of pre- to post-treatment effect sizes (ES). Overall, specialised psychological interventions did not significantly improve functional outcome scores when compared to control groups (p = 0.3742). The meta-analysis yielded a small effect favouring the intervention group (overall ES g = 0.08, 95% CI = −0.10 to 0.25). ES for individual studies ranged from −0.18 to 1.23, and substantial, statistically significant heterogeneity was observed (τ² = 0.49, Q = 228.60, p < 0.001, I² = 89.55%). See Fig. 2 for the forest plot.
Figure 2 to be inserted at this point in the manuscript
Figure 2 Forest plots for post-treatment effect sizes
Effect of Psychological Interventions on Functional Outcomes at Final Follow-up
Seven studies (N = 508) were included in a meta-analysis of pre- to final follow-up effect sizes (ES). Again, specialised psychological interventions did not significantly improve functional outcomes compared to control groups (p = 0.276). The meta-analysis yielded a slightly higher ES than at post-treatment. The ES was still within the small range and favoured the intervention group (overall ES g = 0.16, 95% CI = −0.13 to 0.46). ES for individual studies ranged from −0.05 to 1.27, and substantial, statistically significant heterogeneity was observed (τ² = 0.29, Q = 14.59, p = 0.024, I² = 59.79%).
Figure 3 Forest plots for effect sizes at final follow-up
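To make the reported quantities (pooled g, confidence interval, Q, τ², and I²) concrete, the sketch below shows one common way of obtaining them, a DerSimonian-Laird random-effects model. The choice of estimator is an assumption, since the estimator and software used in this review are not stated in this section, and the input effect sizes and variances are hypothetical rather than the trial data. The function name `random_effects_pool` is also hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of study-level effect sizes."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching effects and variances for at least two studies")
    k = len(effects)

    # Fixed-effect (inverse-variance) estimate, needed to compute Cochran's Q.
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance (tau squared), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I squared: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau squared to each study's variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical Hedges' g values and sampling variances for seven studies.
example_g = [0.21, 1.23, 0.30, 0.05, 0.10, -0.18, 0.02]
example_var = [0.05, 0.28, 0.07, 0.05, 0.02, 0.04, 0.03]
result = random_effects_pool(example_g, example_var)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

A pooled confidence interval that spans zero, as reported at both time points above, corresponds to a non-significant overall effect under this model.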
Overall Effect on Functioning
All studies showed that participants improved in functioning from baseline to the end of the trial. Only two studies (Gleeson et al., 2012; Pistorello et al., 2012) found a significant positive effect of the experimental condition on functioning compared to the comparison condition. However, these studies were rated as having a high risk of bias and some concerns, respectively (Table 3). Only two studies were identified as low risk of bias.
At post-treatment, Chanen et al. (2008) reported that the experimental group had a higher level of functioning, albeit with a small effect size (SMD, 0.21). However, at the 24-month follow-up, participants in the GCC condition exhibited higher overall functioning levels (SMD, 0.32). Additionally, three studies (Pistorello et al., 2012; Mehlum et al., 2014, 2016; Asarnow et al., 2021) reported small effect sizes post-treatment between the DBT experimental group and the control group. The effect was maintained until final follow-up, although ES between the two conditions remained small. Jørgensen et al.'s (2021) results favoured the control TAU condition over the MBT experimental condition; ES were small at post-treatment and remained small at the final follow-up (SMD, -0.05).
Chanen et al. (2022) noted changes in functioning across all three conditions at post-treatment and final follow-up. On both the IIP-C and the SAS-SR, the HYPE with befriending condition was the most effective in improving functioning. Functional gains continued through to final follow-up on both measures of functioning in all conditions; however, the YMHS with befriending condition outperformed both the CAT with HYPE condition and the HYPE with befriending condition on the IIP-C.
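As a reminder of how a between-group standardised mean difference is obtained, the sketch below computes Hedges' g (the small-sample-corrected SMD) and its approximate sampling variance from group means, standard deviations, and sample sizes. The input values are hypothetical and are not taken from any included trial, and the function name `hedges_g` is an assumption. A positive value favours the intervention only on scales where higher scores indicate better functioning; on scales scored in the opposite direction the sign would need to be reversed.

```python
import math


def hedges_g(mean_tx, sd_tx, n_tx, mean_ctrl, sd_ctrl, n_ctrl):
    """Return (g, variance of g) for two independent groups."""
    df = n_tx + n_ctrl - 2
    if df <= 0:
        raise ValueError("need at least three participants across the two groups")

    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_tx - 1) * sd_tx ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df)
    d = (mean_tx - mean_ctrl) / pooled_sd

    # Small-sample correction factor applied to Cohen's d.
    j = 1 - 3 / (4 * df - 1)
    g = j * d

    # Approximate sampling variance of d, rescaled for g.
    var_d = (n_tx + n_ctrl) / (n_tx * n_ctrl) + d ** 2 / (2 * (n_tx + n_ctrl))
    return g, j ** 2 * var_d


# Hypothetical post-treatment functioning scores (higher = better functioning).
g, var_g = hedges_g(mean_tx=70.0, sd_tx=10.0, n_tx=40,
                    mean_ctrl=65.0, sd_ctrl=10.0, n_ctrl=40)
print(f"g = {g:.3f}, variance = {var_g:.4f}")
```

Study-level values of g and its variance obtained in this way are the usual inputs to a pooled meta-analytic estimate.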