PARTICIPANTS
Data were collected from a consecutive sample recruited from a specialized outpatient service for adolescents presenting with risk-taking and self-harming behavior between April 2013 and November 2018. The service provides low-threshold initial contact, state-of-the-art diagnosis of BPD features, and evidence-based therapy for adolescents with emerging BPD. Inclusion criteria were age 12 to 17 years and any type of risk-taking or self-harming behavior (e.g., repeated non-suicidal self-injury [NSSI], suicide attempts, binge drinking, substance misuse, excessive gaming and internet use, risky sexual behavior, impulsive and delinquent behavior). The only exclusion criterion was insufficient knowledge of the German language.
PROCEDURES
The study protocol was approved by the ethics committee of the Medical Faculty at the University of Heidelberg, Germany (S-449/2013). Written informed consent was obtained from participants who were ≥ 16 years of age. If participants were younger than 16 years of age, they were asked for written informed assent and their parents or legal guardians for written informed consent. Participants underwent a comprehensive assessment at baseline (T0) and at one-year follow-up (T1), including demographic information (e.g., age, sex), semi-structured clinical interviews, and questionnaires. The assessments were conducted by specially trained clinical psychologists. Participants were reimbursed for participating in the follow-up assessment (20 Euro).
MEASURES
BPD symptoms and diagnosis were assessed using the Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II) (31). Note that the DSM-IV BPD criteria are the same as in the DSM-5 Section II. Each criterion is rated as 1 = “not met”, 2 = “partly met”, 3 = “completely met”. Additional variables used in the current study included conduct disorder (CD) and antisocial personality disorder (ASPD) diagnoses according to DSM-IV, assessed using the SCID-II; alcohol use disorder (AUD) and substance use disorder (SUD) according to DSM-IV and ICD-10, assessed using the structured Mini International Neuropsychiatric Interview for Children and Adolescents (MINI-KID) (32); internet gaming disorder (IGD) according to DSM-5 (11), assessed using a structured clinical interview (33); frequency of suicidal thoughts and attempts, and of NSSI over the past year, measured by the Self-Injurious Thoughts and Behaviors Interview (SITBI-G) (34); severity of depression, assessed by the Children’s Depression Inventory (CDI) (35); symptom burden, assessed by the Global Severity Index (GSI) of the Symptom Check-List-90-R (SCL-90-R) (36); illness severity, assessed by the Clinical Global Impression – Severity (CGI-S) scale (37); clinical improvement, measured by the Clinical Global Impression – Improvement (CGI-I) scale (37); psychosocial impairments, assessed by the DSM-IV Axis V: Global Assessment of Functioning (GAF) (38); quality of life, assessed by the KIDSCREEN-10 (39); adverse childhood experiences, measured by the respective subscales for antipathy, neglect, physical abuse, and sexual abuse of the Childhood Experiences of Care and Abuse Questionnaire (CECA.Q) (40); and personality traits, assessed by four higher-order personality dimensions – Emotional Dysregulation, Dissocial Behavior, Inhibition, and Compulsivity – of the Dimensional Assessment of Personality Pathology-Basic Questionnaire (DAPP-BQ) (41).
STATISTICAL ANALYSIS
First, we calculated the prevalence rates of the SCID-II BPD criteria as a marker of heterogeneity in the current sample. Second, we investigated the underlying latent structure by comparing LCA, FA, and FMM. Third, we conducted post-hoc analyses to characterize the best fitting model using the additional measures. Steps 1 and 3 were conducted using Stata/SE, version 16.0 (42). Step 2 was performed using Latent GOLD® software, version 5.1 (43).
In order to investigate the latent structure of BPD (step 2), we applied LCA, FA, and FMM to the dummy-coded SCID-II BPD criteria. Ratings of 3 (“completely met”) were coded as 1 = “present”, whereas ratings of 2 (“partly met”) and 1 (“not met”) were coded as 0 = “absent”. We closely followed the model building strategy proposed by Clark et al. (27). First, we fitted LCA models with increasing numbers of classes. Based on the literature, we estimated LCA models with one to four classes (19–24). Next, we modeled a single-factor confirmatory factor analysis (CFA) and the three-factor CFA reported by Sanislow et al. (25), which are the most replicated FA models in the BPD literature (10). Finally, we fitted FMM with one factor and two or three classes, respectively. As shown below, this was the endpoint combination of number of classes and factors determined by our best fitting LCA and CFA models (27). For each FMM, four variations with increasing measurement invariance were tested (27) (see SM Table 1 for further details on model specifications). Once the best fitting FMM was chosen, it was compared with the best fitting LCA and CFA models in order to determine the overall best fitting model (27).
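The dummy-coding rule and the criterion prevalence rates from step 1 can be illustrated with a minimal Python sketch. This is not the code used in the study (the analyses were run in Stata and Latent GOLD); the function names and the toy ratings are hypothetical and serve only to make the coding rule explicit.

```python
from typing import List


def dichotomize(rating: int) -> int:
    """Map a SCID-II rating to a dummy code.

    3 ("completely met") -> 1 (present); 1 or 2 -> 0 (absent).
    """
    if rating not in (1, 2, 3):
        raise ValueError(f"unexpected SCID-II rating: {rating!r}")
    return 1 if rating == 3 else 0


def dichotomize_profile(ratings: List[int]) -> List[int]:
    """Dummy-code all criterion ratings of one participant."""
    return [dichotomize(r) for r in ratings]


def criterion_prevalence(profiles: List[List[int]]) -> List[float]:
    """Proportion of participants with each criterion coded as present."""
    n = len(profiles)
    if n == 0:
        raise ValueError("no participants")
    return [sum(column) / n for column in zip(*profiles)]


if __name__ == "__main__":
    # Hypothetical ratings for three participants on three criteria.
    raw = [[3, 2, 1], [3, 3, 1], [1, 2, 3]]
    coded = [dichotomize_profile(p) for p in raw]
    print(coded)
    print(criterion_prevalence(coded))
```

Note that under this rule a "partly met" rating contributes nothing to the prevalence of a criterion, which mirrors the threshold described above.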

The comparison of latent models was guided by statistical criteria, such as goodness-of-fit indices and entropy, and conceptual considerations (18). To compare LCA models and FMM with different numbers of classes, the parametric bootstrapped likelihood ratio test (BLRT) (44) was used. Notably, the BLRT can only be used to compare FMM that share the same parameterization but differ in the number of classes. For comparison of FA models and among different model types (LCA, CFA, and FMM), the Bayesian Information Criterion (BIC) (45) and its sample size adjusted version (SABIC) (46) were used. The BIC is considered to be stricter than the SABIC (47,48). The BIC and SABIC are computed as a function of the log likelihood with a penalty for model complexity (18,27,49). A difference of more than 10 in BIC values between two models indicates support for the model with the lower value (50). In addition to these fit indices, we evaluated entropy, which measures the degree to which the latent classes are distinguishable and the precision with which individuals can be placed into classes. It ranges from 0 to 1, with higher values indicating clearer class separation. A value of ≥ .80 is recommended when participants are to be classified based on their “most likely class membership” resulting from LCA or FMM for further analysis (51).
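For illustration, the information criteria and entropy described above can be computed as in the following Python sketch. It assumes the commonly used definitions (BIC = -2 log L + k ln n; SABIC with n replaced by (n + 2)/24; relative entropy normalized by n ln K). The exact statistics reported by the software may be defined differently, and all function names here are hypothetical.

```python
import math
from typing import List


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: -2 log L + k * ln(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def sabic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Sample-size adjusted BIC: n is replaced by (n + 2) / 24."""
    return -2.0 * log_lik + n_params * math.log((n_obs + 2) / 24.0)


def relative_entropy(posteriors: List[List[float]]) -> float:
    """Relative entropy in [0, 1]; 1 means perfect class separation.

    `posteriors` holds one row per participant with the posterior
    probability of membership in each of the K >= 2 classes.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("entropy is undefined for a single class")
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n * math.log(k))


def bic_supports_lower(bic_a: float, bic_b: float) -> bool:
    """True if the BIC difference exceeds the threshold of 10."""
    return abs(bic_a - bic_b) > 10.0
```

Because ln((n + 2)/24) is smaller than ln(n) for any sample size, the SABIC penalizes additional parameters less heavily than the BIC, which is consistent with the BIC being the stricter criterion.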
Having identified the best fitting model, we examined the effects of sex and age as covariates (52), as these parameters may influence BPD symptom expression (53,54). In particular, we estimated the extent to which the between-class and within-class variation of the best fitting model (see below) was due to sex and age. This was done by regressing the class (corresponding to between-class variation) or the observed variables (corresponding to within-class variation) on sex and age (26) (further details on the covariate models are given in SM Figure 1). In doing so, we constrained the age effect to be the same for all BPD criteria.
Finally, post-hoc analyses (step 3) were conducted in order to characterize the classes identified by the best fitting model (see below). To this end, participants were grouped according to their most likely latent class membership and compared with regard to demographic (age, sex), predisposing (adverse childhood experiences, personality traits), and clinical variables (BPD diagnosis and number of symptoms, CD/ASPD, AUD, SUD, IGD, NSSI, suicidal behavior, depression, symptom burden, quality of life, functional impairments, illness severity, and clinical improvement) at baseline and at follow-up. Categorical variables were compared using chi-square tests, or Fisher’s exact tests if expected cell counts were less than five. For continuous variables, Mann-Whitney U tests were used when the assumption of normality was violated, as indicated by a significant Shapiro-Wilk test. Effect sizes (Cramér’s V and Pearson’s correlation coefficient r) and significance levels corrected according to the method described by Benjamini and Hochberg (55), in order to control for the inflation of the type I error due to multiple testing, were reported for all group comparisons. Differences in continuous variables by group over time were tested using mixed-effects linear regression analyses. Measurement time point (T0, T1), latent class membership (borderline group vs. impulsive group), and their interaction were entered as fixed effects, and the study ID as a random effect. In case of missing values, the analyses were conducted on the subsample with complete data.
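The Benjamini-Hochberg correction can be sketched in a few lines of Python. This is a generic implementation of the step-up procedure expressed as adjusted p-values, not the routine used in the study; the function name and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value),
    and monotonicity is enforced from the largest rank downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from four group comparisons.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw p = {p:.3f}, adjusted p = {q:.3f}")
```

A comparison is then considered significant if its adjusted p-value falls below the chosen false discovery rate (for example .05).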