Trial design
The present study was a two-arm, parallel-group randomized trial comparing ABFT with TAU. The study was approved by the regional committee for medical research ethics, South-Eastern Norway, and the protocol was preregistered (www.clinicaltrials.gov NCT01830088). Inclusion criteria were 1) a current major depressive episode as defined by the DSM-IV (39), 2) a score above 15 on the Grid Hamilton Depression Rating Scale (GRID-HAMD) (40), and 3) currently living with a primary caregiver. Exclusion criteria were a diagnosis of any psychotic disorder, eating disorder, bipolar disorder, intellectual disability or pervasive developmental disorder.
Participants and Procedures
Adolescents and their families were recruited from referrals to two child and adolescent mental health service (CAMHS) clinics in South-Eastern Norway. The clinics were publicly funded, and all treatment was provided free of charge through Norway's universal health insurance system. During the pre-specified recruitment period (September 2013 to January 2016), all referrals of adolescents aged 13–18 years were examined for mentions of depression or core depressive symptoms (depressed mood, anhedonia, or fatigue). In addition to adolescents identified as depressed in their referral letters (42), adolescents scoring >6 on the Affective Problems subscale of the Youth Self-Report (41), which the CAMHS administered routinely, were identified. Through these procedures, 276 patients were identified, contacted and checked for study eligibility. Altogether, 160 adolescents were screened with the Beck Depression Inventory-Second Edition (BDI-II), after which 100 adolescents and their parents underwent a full clinical assessment (see CONSORT diagram, Figure 1). They met with a study-affiliated clinical psychologist (either the first or second author) at the CAMHS, and written informed parental consent and adolescent assent were obtained. Adolescents and parents were then interviewed separately with a semi-structured diagnostic interview, the Schedule for Affective Disorders and Schizophrenia for School-Age Children, Present and Lifetime Version (K-SADS-PL) (43), which generates DSM-IV-TR diagnoses. All interviews were conducted by the study-affiliated clinical psychologists and were video-recorded. Both parents and adolescents completed self-report measures. If the adolescent met the inclusion criteria, the assessing clinician randomized the participant by opening a sealed, numbered envelope containing the treatment allocation.
Randomization was stratified by clinic, age (13–15 years and 16–18 years), gender, and severity of depression (GRID-HAMD score ≤24 or ≥25). In total, 61 participants were randomized to ABFT or TAU in a 1:1 ratio. Shortly after randomization, one patient withdrew consent, resulting in a randomized study sample of 60 participants, 30 in each treatment condition.
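Stratified 1:1 allocation of this kind is often implemented with permuted blocks generated in advance for each stratum. The sketch below is a minimal Python illustration under assumptions the text does not state: blocks of four, hypothetical clinic labels `A` and `B`, and a hypothetical `allocate` helper that stands in for opening the next numbered envelope in a stratum. It is not the trial's actual allocation procedure.

```python
import itertools
import random

# Hypothetical labels; the text names two clinics but not their identifiers.
CLINICS = ("A", "B")
AGE_BANDS = ("13-15", "16-18")
GENDERS = ("female", "male")
SEVERITY_BANDS = ("<=24", ">=25")  # baseline GRID-HAMD total


def stratum_of(clinic, age, gender, grid_hamd):
    """Map a participant to one of the 2 x 2 x 2 x 2 = 16 strata."""
    age_band = AGE_BANDS[0] if age < 16 else AGE_BANDS[1]
    severity = SEVERITY_BANDS[0] if grid_hamd <= 24 else SEVERITY_BANDS[1]
    return (clinic, age_band, gender, severity)


def permuted_blocks(n_blocks, block_size, rng):
    """Concatenate shuffled blocks, each with equal numbers of ABFT and TAU."""
    if block_size % 2:
        raise ValueError("block size must be even for 1:1 allocation")
    sequence = []
    for _ in range(n_blocks):
        block = ["ABFT"] * (block_size // 2) + ["TAU"] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


def prepare_envelopes(n_blocks=5, block_size=4, seed=2013):
    """Pre-generate one allocation sequence ("stack of envelopes") per stratum."""
    rng = random.Random(seed)
    strata = itertools.product(CLINICS, AGE_BANDS, GENDERS, SEVERITY_BANDS)
    return {s: permuted_blocks(n_blocks, block_size, rng) for s in strata}


def allocate(envelopes, opened, stratum):
    """Open the next envelope in this stratum (hypothetical helper)."""
    index = opened.get(stratum, 0)
    opened[stratum] = index + 1
    return envelopes[stratum][index]


if __name__ == "__main__":
    envelopes = prepare_envelopes()
    opened = {}
    print(allocate(envelopes, opened, stratum_of("A", 14, "female", 27)))
```

Generating a separate sequence for each stratum keeps the arms balanced within every stratum after each completed block. In a small trial this matters because it keeps the prognostic factors used for stratification evenly spread across arms.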
Parents and adolescents were given feedback on diagnosis and treatment allocation at the end of the assessment session. The assessing clinician answered questions from parents or the adolescent about the assessment and implemented standard safety procedures as deemed necessary. CAMHS staff were then informed of the treatment allocation, received a report of the assessment findings, and assigned the case to a study therapist.
Treatments
Participants received either ABFT or TAU. ABFT consisted of weekly therapy sessions delivered according to the treatment manual by therapists trained in ABFT for the trial. TAU was not manualized, and therapists were free to provide the treatment they considered most appropriate. Both treatments were provided for a minimum of 16 weeks but could be extended according to the therapist's assessment of the patient's needs. In both conditions, results from the baseline assessments of psychiatric diagnoses and symptoms were made available to the attending therapist before the first therapy session. If a patient's data indicated a high risk of self-harm or suicide, the study staff immediately notified the patient's therapist. Two participants were taking antidepressant medication during the trial; both had been on a stable dose for at least 12 weeks before enrollment.
Therapist characteristics and training
Over a period of 2 years, 25 therapists (88% female) delivered the treatments: 19 clinical psychologists, 2 medical doctors, 2 clinical pedagogues, 1 clinical social worker and 1 psychiatric nurse. Each therapist delivered only one of the two treatments (ABFT or TAU). Therapists varied in theoretical orientation, including eclectic (40%), cognitive (16%), psychodynamic (4%) and family-oriented (4%) approaches. Therapists had on average 7.2 years of clinical experience working with adolescents (SD = 5.73, range 0–18). Eight therapists were trained in ABFT for the study. Training consisted of a one-day introductory seminar followed by a three-day workshop, together with reading the treatment manual. Therapists were required to have completed one supervised ABFT case before treating study patients. All ABFT sessions were videotaped for supervision purposes, and therapists received ongoing supervision from an experienced ABFT therapist, who reviewed video recordings of therapy sessions. Supervision was planned as weekly sessions. The original PI of this study was not certified as an ABFT therapist or trainer but had substantial training in the approach, and served as the main clinical supervisor. When the supervisor was occupied or otherwise unavailable, many of the weekly supervision sessions had to be conducted as peer supervision, sometimes combined with discussion with the main supervisor by phone. To increase objectivity, the developers of ABFT provided some training but were not involved in supervision and had minimal involvement in the project.
TAU therapists were recruited from the regular CAMHS staff and treated trial patients as part of their regular caseload. TAU was not monitored, and access to supervision varied with therapists' clinical experience, but all therapists could discuss cases in multidisciplinary teams.
Assessments and measures
Throughout treatment, patients completed self-report measures electronically every other week on a secure online platform (CheckWare Assessment Systems) (44). Because of technical difficulties, treating clinicians occasionally administered some self-report measures as paper-and-pencil questionnaires. The post-treatment assessment at 16 weeks was conducted by independent raters (clinical psychologists trained for this purpose) who were blinded to treatment allocation. The main diagnosis and comorbid psychiatric diagnoses were determined from the K-SADS at baseline, combining information from the adolescent and parent interviews. Interrater reliability was determined by blind scoring of 28 randomly selected videotaped interviews; for depressive diagnoses based on the K-SADS, it was κ = .56. The primary outcome was severity of depressive symptoms, measured with the clinician-rated GRID-HAMD and the self-reported BDI-II. The BDI-II, a widely used 21-item self-report inventory, was administered every other week throughout the trial to assess the severity of depressive symptoms; its internal consistency was α = .94. The GRID-HAMD was administered at pre- and post-treatment and has been shown to have good psychometric properties as a measure of depression severity (40, 45). The average intraclass correlation coefficient (ICC) for GRID-HAMD scores in this study was .89, based on a two-way mixed-effects, consistency model. GRID-HAMD scores are classified as no depression (0–7), mild depression (8–16), moderate depression (17–23) and severe depression (≥24) (46). Clinical response was defined as a reduction in GRID-HAMD total score of ≥50% from baseline, and remission as a GRID-HAMD score <5. Suicidal ideation was measured with the Suicidal Ideation Questionnaire-Junior (SIQ-Jr) (47) and was used in this study as a predictor in the multiple imputation model.
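The severity bands and the response and remission definitions are simple threshold rules. The Python sketch below encodes them for illustration. The function names are hypothetical. The sketch assumes that scores of 24 and above count as severe, so the moderate (17–23) and severe bands leave no gap.

```python
def grid_hamd_severity(score):
    """Severity band for a GRID-HAMD total score."""
    if score < 0:
        raise ValueError("GRID-HAMD total cannot be negative")
    if score <= 7:
        return "no depression"
    if score <= 16:
        return "mild"
    if score <= 23:
        return "moderate"
    return "severe"  # assumption: 24 and above, so the bands are contiguous


def is_response(baseline, post):
    """Clinical response: total score reduced by at least 50% from baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    # (baseline - post) / baseline >= 0.5, rearranged to avoid float rounding
    return 2 * (baseline - post) >= baseline


def is_remission(post):
    """Remission: post-treatment GRID-HAMD total below 5."""
    return post < 5
```

For example, a participant moving from 26 at baseline to 12 at week 16 would count as a responder (a 54% reduction) but not as remitted. The week-16 score falls in the mild band.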
Changes to the protocol
A power analysis based on previous ABFT studies was conducted before the trial started. Assuming a 10% attrition rate from the acute phase, an intra-subject correlation of 0.5 on the longitudinal measures, an alpha level adjusted to 0.001 to account for multiple comparisons, and 80% power, a sample size of N = 100 would allow us to detect a small effect size (d = 0.27). Our final sample, however, consisted of 60 adolescents and their parents. Because of the small sample size and missing data, the more stringent alpha level was abandoned in subsequent analyses, and multiple group comparisons were not conducted. Further, only the most important variables were included in subsequent analyses. The protocol specified assessing the primary and secondary outcomes at weeks 12, 24 and 48. We originally intended a four-week waiting period from randomization to treatment start, but this proved infeasible because many patients had severe depressive symptoms. The treatment period was extended from 12 to 16 weeks, and the first outcome assessment was conducted at week 16 rather than at week 12 as specified in the protocol.
Statistical analysis
GRID-HAMD scores at 16 weeks post-randomization were missing for 22 of 60 participants (36.7%). In some cases, adolescents actively declined to provide data. In other cases, participants who did not attend scheduled assessment appointments were not offered new appointments, for practical reasons such as limited interviewer capacity. In both cases, we considered it likely that the probability of missing outcome data depended on patient characteristics, and hence that the data were not missing completely at random (48). To identify correlates of missingness in the baseline data, we calculated polychoric correlations between a binary missingness indicator for the week-16 GRID-HAMD and the sum scores of an extended set of baseline variables available in the dataset (49). Missingness was positively correlated with negative self-statements, insomnia and suicidal ideation, and negatively correlated with self-reported motivation for talking to a therapist. Because we found several theoretically plausible predictors of missingness, we assumed that missingness was sufficiently conditional on observed variables and used multiple imputation to handle the missing data (48), using the package 'mice' version 3.3.0 for R version 3.5.1 with the RStudio IDE (50–52). Multiple imputation yields several complete datasets in which the imputed values vary across datasets, preserving the uncertainty due to the data being unobserved. Analyses are repeated on each imputed dataset, and the results are then pooled for interpretation, which allows the influence of missing data on the parameter estimates to be quantified (for a highly accessible treatment of multiple imputation, see 53). Multiple imputation of variables composed of multi-item scales can be challenging; ideally, imputation should be conducted at the level of individual items (54).
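The missingness screen correlates a 0/1 indicator (outcome missing or not) with baseline sum scores. The study used polychoric correlations, which assume underlying normal variables and require iterative estimation. The Python sketch below substitutes the simpler point-biserial (Pearson) correlation. It illustrates how baseline variables are screened for association with missingness, but it does not reproduce the polychoric estimates. The function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def missingness_correlates(outcome, baseline):
    """Correlate a missingness indicator for `outcome` with each baseline score.

    outcome:  week-16 scores, None where missing
    baseline: dict mapping a (hypothetical) variable name to complete sum scores
    """
    missing = [1 if v is None else 0 for v in outcome]
    return {name: pearson(missing, scores) for name, scores in baseline.items()}
```

A positive coefficient means that participants with higher baseline scores were more likely to lack the outcome. In the study, insomnia and suicidal ideation showed this pattern.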
However, with several multi-item scales, the number of predictors in the imputation model will often exceed what is feasible with a limited sample size, as the number of predictors approaches the number of observations. All variables in the analysis model must be included as predictors in the imputation model. Including other variables as auxiliary predictors also makes more plausible the necessary assumption that missingness is conditional on variables in the imputation model (51). Following the recommendations of Eekhout and colleagues (55), we imputed individual items separately for the GRID-HAMD only, and imputed the total score passively, recalculating it iteratively each time the component items were imputed.
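In passive imputation, the total score is never imputed directly. It is always recomputed from the current item values. In mice, this is specified by giving the total a method string that starts with `~`, for example `~ I(item1 + item2)` with hypothetical item names. The Python sketch below shows only the bookkeeping. Its per-item imputer is a hypothetical placeholder that draws at random from the item's observed values, whereas mice would draw from a model conditioned on the other predictors.

```python
import random


def draw_from_observed(observed_values, rng):
    """Hypothetical placeholder imputer: a random draw from observed values.

    mice would instead draw from a model conditioned on the other predictors,
    so here the iterations only show where the passive update happens.
    """
    return rng.choice(observed_values)


def impute_items_passive_total(rows, n_iter=3, seed=0):
    """Impute missing item scores and recompute the total passively.

    rows: per-participant lists of item scores, None where missing.
    Returns the completed item matrix and the derived total scores.
    """
    rng = random.Random(seed)
    n_items = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_items)]
    missing = [(i, j) for i, r in enumerate(rows)
               for j in range(n_items) if r[j] is None]

    completed = [list(r) for r in rows]
    for i, j in missing:  # starting values, as mice fills gaps before iterating
        completed[i][j] = draw_from_observed(observed[j], rng)
    totals = [sum(r) for r in completed]

    for _ in range(n_iter):
        for j in range(n_items):
            for i, jj in missing:
                if jj == j:
                    completed[i][j] = draw_from_observed(observed[j], rng)
            # Passive step: recalculate the total from the items after
            # every item has been (re)imputed; the total is never imputed.
            totals = [sum(r) for r in completed]
    return completed, totals
```

Because the total is always a deterministic function of the items, it can never be inconsistent with them. Imputing the total as a separate variable can produce such inconsistencies.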
On theoretical grounds, we used baseline GRID-HAMD score, treatment condition, and BDI-II and SIQ-Jr scores at 16 weeks as predictors in the imputation model for GRID-HAMD scores (48). To select auxiliary predictors for each GRID-HAMD item, and for the BDI-II and SIQ-Jr scores at 16 weeks, we examined both individual items and scale scores from other measures, including measures completed by the parents (53). Predictors for imputing missing values in these predictor variables themselves were selected using the 'quickpred' function of the mice package (51). We used the 'midastouch' variant of predictive mean matching as the imputation method, as it performs better in small samples (56). We imputed 40 datasets, using 50 iterations of the algorithm. Convergence of the imputation algorithm was assessed by visual inspection of trace plots and was found to be satisfactory (48).
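Predictive mean matching fills each missing value with an observed value borrowed from a "donor" whose predicted mean is close to that of the incomplete case. Imputed values therefore stay within the range and granularity of the real data. The Python sketch below shows plain PMM with one predictor. It is not the 'midastouch' variant used in the study, which weights all observed cases as potential donors by distance. The function names are hypothetical. The sketch also simplifies by fitting the regression once instead of drawing its coefficients from their posterior distribution, as mice does.

```python
import random


def fit_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pmm_impute(x, y, k=5, seed=0):
    """Basic predictive mean matching for y (None = missing) given complete x.

    A donor is drawn uniformly from the k observed cases with the closest
    predicted means (mice's pmm uses 5 donors by default).
    """
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    intercept, slope = fit_ols([x[i] for i in obs], [y[i] for i in obs])
    yhat = [intercept + slope * xi for xi in x]

    completed = list(y)
    for i, v in enumerate(y):
        if v is None:
            # Observed cases ranked by distance between predicted means.
            donors = sorted(obs, key=lambda d: abs(yhat[d] - yhat[i]))[:k]
            completed[i] = y[rng.choice(donors)]  # borrow an observed value
    return completed
```

Because imputations are always drawn from observed scores, integer-valued items such as GRID-HAMD ratings keep plausible values without any rounding step.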
Data were analysed according to the intention-to-treat (ITT) principle. The primary hypothesis was tested with Fisher's exact test. Linear mixed models were fitted to test whether treatment condition was significantly related to change in scores over time on the primary and secondary outcome variables. For the primary outcome measured by the GRID-HAMD, which had only two time points, we fitted the model to each imputed dataset and then pooled the resulting estimates; pooled estimates according to Rubin's rules are reported (48). The BDI-II had 55% missing data across eight time points; primary outcomes with repeated observations were analysed in a mixed-modelling framework, handling missing observations with maximum likelihood estimation (49). Variables for the multilevel analyses were selected to minimize information criteria (IC). Since a group mean can conceal changes at the individual level, a Brinley plot (57) was used to visualize within-subject change from pre- to post-treatment scores (Figure 2). The Brinley plot is based on the multiply imputed data, and the uncertainty of the imputed scores is visualized in the plot.
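Rubin's rules combine a parameter estimated separately in each of the m imputed datasets. The pooled estimate is the mean of the per-dataset estimates. Its total variance adds the average within-imputation variance to the between-imputation variance, inflated by (1 + 1/m). The Python sketch below shows this arithmetic for a single parameter with the original Rubin degrees of freedom. The function name is hypothetical. mice's pooling function can additionally apply the Barnard-Rubin small-sample adjustment to the degrees of freedom.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates/variances from >= 2 imputations")
    q_bar = sum(estimates) / m                              # pooled estimate
    w = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b                                 # total variance
    # Rubin's (1987) degrees of freedom; infinite if the imputations agree exactly.
    df = math.inf if b == 0 else (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return {"estimate": q_bar, "se": math.sqrt(t), "df": df}
```

With 40 imputed datasets, as in this study, the inflation factor (1 + 1/m) is 1.025. The pooled standard error therefore reflects almost the full between-imputation spread.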
Analyses were conducted in R (version 3.5.1) with the lme4 package (version 1.1-17) (50, 58), using an alpha level of 0.05.
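Fisher's exact test, used for the primary hypothesis, conditions on the row and column totals of a 2×2 table. It sums the hypergeometric probabilities of all tables that are no more likely than the observed one. The text does not specify which table was tested. The Python sketch below therefore takes generic counts, for example remitted versus not remitted by treatment arm, which is an assumption. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Integer arithmetic keeps ties between equally likely tables exact.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)

    def weight(x):  # numerator of P(top-left cell == x) given the margins
        return comb(row1, x) * comb(n - row1, col1 - x)

    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)
```

The resulting p-value would then be compared with the alpha level of 0.05. The test stays exact with cell counts as small as those in a 60-participant trial.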