Participants and procedures
We assessed 35 symptomatically remitted patients with BPD (BPD-REM) and compared them with the participants reported in Santangelo et al., i.e. 60 patients with acute BPD (BPD-ACU) and 60 healthy controls (HC). Thus, a total of 155 female participants between 18 and 64 years of age (mean age 31.15 ± 9.89 years) were analyzed in this study. All patients in the BPD-ACU group met the DSM-IV diagnostic criteria for BPD, whereas all participants in the BPD-REM group were in symptomatic remission, i.e. they met fewer than five DSM-IV BPD criteria within the past year. Patients with acute BPD were recruited from the waiting list for a residential dialectical behavior therapy treatment program at the Central Institute of Mental Health Mannheim in Germany. The HC were randomly selected from the national resident register of the City of Mannheim. Further details on the recruitment of the BPD-ACU and the HC group are reported in Santangelo et al. The 35 remitted patients with BPD constitute a subsample of the 58 participants enrolled in the study by Zeitler et al. Zeitler et al. contacted former patients with BPD 12 to 18 years after they had taken part in one of two dialectical behavior therapy treatment studies at Freiburg University in Germany [37, 38]. All these patients had initially been diagnosed as meeting BPD criteria using standardized diagnostic instruments. Further details are reported in Zeitler et al. Only participants with a symptomatic remission or loss of diagnosis (i.e. who met four or fewer DSM-IV BPD criteria within the past year) were enrolled in the current e-diary study. Of the 35 BPD-REM participants, n = 9 fulfilled none of the BPD criteria, n = 11 fulfilled one, n = 5 fulfilled two, n = 7 fulfilled three, and n = 3 fulfilled four diagnostic criteria for BPD at the time of enrollment. The most prevalent BPD criteria among the BPD-REM participants were “stress-related paranoia or dissociation” (n = 13, i.e. 38%), followed by “suicidal and self-harming behavior” and “affective instability” (each fulfilled by n = 12, i.e. 34%), and “identity disturbance, unstable self” (n = 6, i.e. 17%). Each of the remaining diagnostic criteria was fulfilled by only a few BPD-REM participants (n ≤ 3, i.e. ≤ 9%).
It is important to note that both groups were recruited in the context of specialized dialectical behavior therapy units run by a team led by Martin Bohus, who introduced dialectical behavior therapy in Germany: the BPD-ACU patients before treatment and the BPD-REM participants after treatment.
In all groups, axis I disorders were assessed using the German version of the Structured Clinical Interview for DSM–IV Axis I Disorders (SCID–I). In the BPD-ACU and the HC group, axis II disorders were assessed using the Structured Clinical Interview for DSM–IV Axis II Disorders (SCID–II). Participants in the BPD-REM group underwent a thorough diagnostic procedure to assess current DSM-IV BPD criteria using the International Personality Disorder Examination (IPDE). Postgraduate psychologists administered all three instruments, which are well validated and have very good psychometric properties (e.g. SCID–I kappa = .71, SCID–II kappa = .84, IPDE kappa = .80; [42, 43]). The exclusion criteria differed by group. In the BPD-ACU group, patients with a history of schizophrenia or bipolar disorder or with a current substance use disorder were not enrolled. For the HC group, individuals with any current or past axis I or axis II diagnosis were excluded. The exclusion criteria for the BPD-REM group were acute intoxication with alcohol or drugs and current psychotic, manic, or severe depressive episodes. Whereas the age of participants in the BPD-ACU and the HC group was restricted to 18 to 46 years, no age restrictions were applied in the BPD-REM group. Table 1 provides sample characteristics by group. A Kruskal–Wallis H test revealed significant age differences among the three groups, and subsequent Mann–Whitney U tests indicated that participants in the BPD-REM group were significantly older than both the BPD-ACU and the HC participants (Table 1; BPD-REM vs. BPD-ACU: Mann–Whitney U (n1 = 60, n2 = 35) = 166.5, p < .001; BPD-REM vs. HC: Mann–Whitney U (n1 = 60, n2 = 35) = 142.0, p < .001).
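The age comparisons rest on the Mann–Whitney U statistic. As a minimal illustration, and not the authors' analysis code (which is not shown here), the sketch below computes U from the pooled ranks of both samples, giving tied values their average rank. The function names are hypothetical. Reporting the smaller of the two U values is a common convention that we assume here, and p-values (exact or normal-approximation) are omitted.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Smaller of the two U statistics for independent samples x and y."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


# Two non-overlapping toy samples give the most extreme possible value, U = 0.
print(mann_whitney_u([21, 25, 30], [40, 44, 52, 38]))
```

Because U_x + U_y = n1 · n2, a small U relative to n1 · n2 (here 2100 for n1 = 60 and n2 = 35) signals strong separation between the groups.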
These age differences were expected. The BPD-REM participants were former patients with BPD who had been treated in the dialectical behavior therapy inpatient treatment program in Freiburg between 1995 and 2002 and were reassessed 12 to 18 years later, whereas the patients in the BPD-ACU group were recruited from the waiting list of the residential dialectical behavior therapy treatment program in Mannheim between 2008 and 2013. The BPD-REM and the BPD-ACU groups did not differ in the percentage of participants taking psychotropic medication. Comorbidities were common in both the BPD-ACU and the BPD-REM group. The most frequent co-occurring DSM-IV axis I diagnoses were mood disorders (BPD-ACU: n = 38, 63%; BPD-REM: n = 9, 25%) and anxiety disorders (BPD-ACU: n = 36, 60%; BPD-REM: n = 18, 51%), with posttraumatic stress disorder being the most prevalent anxiety disorder in both groups (see Table 1). Nevertheless, patients in the BPD-ACU group had significantly more co-occurring axis I disorders than participants in the BPD-REM group, Mann–Whitney U (n1 = 60, n2 = 35) = 662.0, p < .01.
E-diary assessment and measures
Data on affective instability and instability of self-esteem were collected during participants' daily lives. After completing the diagnostic assessments, participants were thoroughly instructed and trained in the use of the e-diary, which they carried for four consecutive days while pursuing their usual everyday activities. Participants in the BPD-ACU and the HC group received a palmtop computer (Tungsten-E, Palm Inc., USA) programmed with the IzyBuilder software (IzyData Ltd., Switzerland) to function as an e-diary, whereas participants in the BPD-REM group received a study smartphone programmed with the movisensXS app (movisens GmbH, Karlsruhe, Germany). We checked for basic differences between the assessment devices and found none.
On the following four days, the e-diary emitted a prompting signal according to a pseudorandomized time-sampling schedule at approximately hourly intervals (60 ± 10 minutes) from 10 am to 10 pm. Participants were prompted 12 times a day, resulting in a total of 48 prompts per participant over the four-day assessment period. Each response was automatically time-stamped by the e-diary. After completing the four assessment days, participants returned the e-diaries, were debriefed, and received financial compensation based on the number of completed data entries (40 to 50 Euros). Moreover, participants were asked about their experiences with the e-diary procedure using a post-monitoring questionnaire with six questions, each rated on a 5-point scale ranging from 1 = not at all to 5 = very much (the questionnaire was adapted from an earlier instrument). Overall, participants reported low to medium reactivity, with higher burdensomeness ratings in the BPD-ACU group (Table 1). Although the overall test was significant, the group differences were small, and the only significant post-hoc difference emerged between the BPD-ACU and the HC groups (Mann–Whitney U (n1 = 60, n2 = 60) = 8.1, p < .001). The study was approved by the institutional review board of the Medical Faculty Mannheim, Heidelberg University, and all participants provided written informed consent before participating in the study. Participants' adherence to the e-diary protocol (i.e. the number of answered e-diary prompts) was very good, with a mean compliance rate of approximately 90% (median = 93.75%), and did not differ between groups (Table 1). However, one BPD-REM participant encountered technical problems on the first day of the assessment, so no e-diary data were available for this person. Thus, the BPD-REM e-diary data set comprised data from 34 BPD-REM participants.
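With 12 prompts on each of four days, each participant could answer at most 48 prompts, so compliance is the share of those 48 prompts that were answered. For example, the reported median of 93.75% corresponds to 45 of 48 prompts. The helper below is a hypothetical sketch of this arithmetic. It assumes that compliance was computed per participant against all 48 scheduled prompts.

```python
PROMPTS_PER_DAY = 12
ASSESSMENT_DAYS = 4
SCHEDULED_PROMPTS = PROMPTS_PER_DAY * ASSESSMENT_DAYS  # 48 prompts per participant


def compliance_rate(answered_prompts: int, scheduled_prompts: int = SCHEDULED_PROMPTS) -> float:
    """Percentage of scheduled e-diary prompts a participant answered."""
    if scheduled_prompts <= 0:
        raise ValueError("scheduled_prompts must be positive")
    if not 0 <= answered_prompts <= scheduled_prompts:
        raise ValueError("answered_prompts must lie between 0 and scheduled_prompts")
    return 100.0 * answered_prompts / scheduled_prompts


print(compliance_rate(45))  # 93.75
```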
At each prompt, participants rated their current affect and self-esteem. To assess momentary affective states, we used a measure specifically designed and validated for repeated assessments of momentary affective states in e-diary studies. Momentary affective state was conceptualized as varying along two dimensions: valence (ranging from unpleasant to pleasant) and tense arousal (ranging from restless/under tension to calm/relaxed). Participants rated two bipolar items for each dimension. Specifically, the items were the German equivalents of “At this moment I feel: unwell–well” and “content–discontent” for valence and “At this moment I feel: agitated–calm” and “relaxed–tense” for tense arousal, with the second item of each scale reverse coded. Participants with a palmtop computer rated the four bipolar items on a 7-point rating scale ranging from 0 to 6, whereas those with a study smartphone rated each item on a visual analog scale ranging from 0 to 100. To yield comparable values, visual analog scale ratings (0 – 100) were converted to the 7-point rating scale (0 – 6) for all four items.
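The conversion rule from the 0 – 100 visual analog scale to the discrete rating scales is not reported. The sketch below therefore assumes a simple linear rescaling. It also offers an optional half-up rounding to the nearest scale point as a hypothetical variant. The same function would cover the 0 – 6 affect items and the 0 – 9 self-esteem items.

```python
import math


def vas_to_rating(vas_score: float, scale_max: int, vas_max: float = 100.0,
                  round_to_integer: bool = False) -> float:
    """Map a 0..vas_max visual analog rating onto a 0..scale_max rating scale.

    HYPOTHETICAL: the exact conversion rule is not reported; this assumes a
    linear rescaling, optionally rounded half-up to the nearest scale point.
    """
    if not 0 <= vas_score <= vas_max:
        raise ValueError("VAS score outside the 0..vas_max range")
    rescaled = vas_score * scale_max / vas_max
    if round_to_integer:
        return float(math.floor(rescaled + 0.5))
    return rescaled


# Affect items use a 0-6 scale, self-esteem items a 0-9 scale.
print(vas_to_rating(50, 6))                         # 3.0, the midpoint of the 7-point scale
print(vas_to_rating(50, 9, round_to_integer=True))  # 4.5 rounded half-up to 5.0
```

Whether rounding is appropriate depends on whether downstream steps expect values on the discrete scale points, so the flag defaults to the unrounded linear mapping.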
To assess current self-esteem, we used a four-item short form of the Rosenberg Self-Esteem Scale. Items 1, 2, 9, and 10 of the original scale were adapted to assess the participants' current status (i.e. the wording "on the whole, …" was replaced by "at the moment, …"). The items read “At the moment:” (1) “I am satisfied with myself”; (2) “I think I am no good at all”; (3) “I am inclined to feel that I am a failure”; (4) “I take a positive attitude toward myself”, with items 2 and 3 reverse coded. The original four-point rating scale was expanded to increase the potential variability of the ratings (see [13, 48]). Specifically, participants with a palmtop computer rated the four items on a 10-point rating scale ranging from 0 to 9, whereas those with a study smartphone rated each item on a visual analog scale ranging from 0 to 100. To yield comparable values, visual analog scale ratings (0 – 100) were converted to the 10-point rating scale (0 – 9). The items assessing momentary affective states and self-esteem have been used successfully in prior studies [18, 44]. In the present sample, we conducted variance component analyses to examine whether our measures of valence, tense arousal, and self-esteem reliably captured within-person change over time. The reliability of change (RC) was very good in our sample (valence RC = .76, tense arousal RC = .73, self-esteem RC = .84), in line with the high reliability of these e-diary scales reported in our prior studies (see [18, 44]).
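The exact RC formula is not stated here. A common generalizability-theory formulation of the reliability of within-person change divides the person-by-occasion variance by itself plus the residual variance averaged over the items of a scale. The sketch below assumes that formulation. It takes the variance components as given, since they would come from a separately fitted variance component model. The numbers used in the example are hypothetical and are not estimates from this study.

```python
def reliability_of_change(var_person_by_occasion: float, var_residual: float,
                          n_items: int) -> float:
    """Reliability of within-person change (RC) from variance components.

    ASSUMPTION: RC = var_PxO / (var_PxO + var_residual / n_items), a common
    generalizability-theory formula; the variance components would come from
    a person x occasion x item variance component model fitted elsewhere.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    if var_person_by_occasion < 0 or var_residual < 0:
        raise ValueError("variance components cannot be negative")
    denominator = var_person_by_occasion + var_residual / n_items
    if denominator == 0:
        raise ValueError("variance components are both zero")
    return var_person_by_occasion / denominator


# Hypothetical variance components, not values from this study:
print(round(reliability_of_change(1.0, 1.0, n_items=4), 2))  # 0.8
print(round(reliability_of_change(1.0, 1.0, n_items=2), 2))  # 0.67
```

Under this formulation, residual variance is divided by the number of items. With equal variance components, a four-item scale therefore reaches a higher RC than a two-item scale.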
Single point-in-time assessment of the general level of functioning and quality of life
To assess the general level of functioning, we used interviewer ratings on the DSM-IV Global Assessment of Functioning scale (GAF). The GAF scale captures how severe a person's mental illness is and how much the symptoms affect his or her everyday life. Interviewers subjectively rate the social, occupational, and psychological functioning of an individual, covering the range from positive mental health to severe psychopathology. The GAF yields a single overall score of the level of functioning ranging from 1 to 100, with higher scores indicating better functioning.
Participants in the BPD-REM group completed the World Health Organization Quality of Life questionnaire (WHOQOL-BREF). The WHOQOL-BREF comprises 26 items. Twenty-four of these measure four broad domains of quality of life, namely physical health, psychological health, social relationships, and environment, and two additional items measure overall quality of life and general health. Participants rate how much they have experienced the content of each item in the preceding two weeks on a 5-point rating scale ranging from 1 (not at all) to 5 (extremely/completely/always). The raw domain scores were transformed according to the scoring guidelines, resulting in mean domain scores between 4 and 20, with higher scores indicating greater quality of life. The instrument has good to excellent psychometric properties and is a reliable and valid measure of quality of life. In our sample, Cronbach's alpha for the WHOQOL-BREF was very good (α = .89), and Cronbach's alphas for the four subscales ranged from .54 to .82 (physical health α = .71, psychological health α = .82, social relationships α = .54, environment α = .76).
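The 4 – 20 domain metric can be obtained by multiplying the mean of a domain's 1 – 5 item responses by 4, after negatively phrased items have been recoded as 6 − x. Cronbach's alpha compares the sum of the item variances with the variance of the total score. The sketch below illustrates both calculations under these assumptions. The argument `reverse_positions` is a hypothetical convenience, and the mapping of WHOQOL-BREF items to domains is deliberately left to the caller.

```python
from statistics import mean, variance
from typing import Iterable, Sequence


def whoqol_domain_score(responses: Sequence[int], reverse_positions: Iterable[int] = ()) -> float:
    """Domain score on the 4-20 metric: 4 x the mean of the domain's 1-5 item responses.

    Positions listed in reverse_positions (hypothetical argument) are recoded
    as 6 - x before averaging, for negatively phrased items.
    """
    if not responses or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must be a non-empty list of 1-5 ratings")
    reverse = set(reverse_positions)
    recoded = [6 - r if i in reverse else r for i, r in enumerate(responses)]
    return float(4 * mean(recoded))


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants x items matrix without missing values."""
    n_items = len(rows[0]) if rows else 0
    if len(rows) < 2 or n_items < 2:
        raise ValueError("need at least two participants and two items")
    item_variances = [variance([row[j] for row in rows]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


print(whoqol_domain_score([4, 3, 5, 4]))             # mean 4 -> 16.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))      # perfectly consistent items -> 1.0
```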
Data preprocessing and statistical analyses
Data preprocessing. We created composite valence, tense arousal, and self-esteem scores by reverse scoring the negatively keyed items and then averaging the respective items at each administration of the scale. Possible values of the composite scores therefore ranged from 0 to 6 for valence and tense arousal and from 0 to 9 for self-esteem.
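On a 0..max scale, reverse scoring maps a rating r to max − r. The composite is then the mean of the scored items at one prompt. This minimal sketch uses the keying described for the scales: the second item of each affect scale and self-esteem items 2 and 3 are reverse coded. The function and constant names are illustrative only.

```python
from statistics import mean
from typing import Sequence

# Upper ends of the rating scales after VAS conversion.
SCALE_MAX = {"valence": 6, "tense_arousal": 6, "self_esteem": 9}


def composite_score(ratings: Sequence[float], reverse_keyed: Sequence[bool],
                    scale_max: int) -> float:
    """Mean of one prompt's item ratings after reverse-scoring negatively keyed items.

    On a 0..scale_max scale, reverse scoring maps a rating r to scale_max - r.
    """
    if len(ratings) != len(reverse_keyed) or not ratings:
        raise ValueError("provide one reverse-keying flag per rating")
    scored = [scale_max - r if rev else r for r, rev in zip(ratings, reverse_keyed)]
    return float(mean(scored))


# Self-esteem: items 2 and 3 are reverse coded, so [7, 2, 3, 8] -> [7, 7, 6, 8].
print(composite_score([7, 2, 3, 8], [False, True, True, False], SCALE_MAX["self_esteem"]))  # 7.0
# Valence: "unwell-well" as rated, "content-discontent" reverse coded.
print(composite_score([5, 1], [False, True], SCALE_MAX["valence"]))  # 5.0
```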
Analyses of instability. In the current study, we applied the same statistical procedures as in the original study. Thus, we calculated three instability indices that allow for examining group differences while taking into account the temporal structure of unstable processes: squared successive differences (SSD), probability of acute change (PAC), and aggregated point-by-point changes (APPC), i.e. decreases and increases relative to the preceding rating. We calculated the SSD by first determining the differences between two consecutive assessments and then squaring these differences, so that large differences between two assessments are weighted more heavily than small ones. We determined the PAC by defining acute changes as changes in the top 10% of the distribution of successive differences pooled over all persons. The cut points corresponding to the 90th percentiles were 2.75 for self-esteem, 2 for valence, and 2.5 for tense arousal. Hence, a successive difference was classified as an acute change when the difference between two consecutive assessments was equal to or greater than the respective cut point. To analyze group differences, we used two-level models: a gamma model with a log link for SSD and a logistic model with a logit link for PAC. To examine group differences in global instability (i.e. SSD) and in the likelihood of extreme changes (i.e. PAC), we analyzed a total of six models, i.e. one model each for the SSD and PAC of valence, tense arousal, and self-esteem.
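The sketch below shows the descriptive building blocks of SSD and PAC for one person's series of composite scores. It rests on several assumptions that the text does not settle. Acute changes are judged on absolute differences. The pooled 90th percentile uses the nearest-rank rule. A missed prompt (`None`) breaks the pair of successive ratings around it. How overnight gaps were handled is not addressed. The gamma and logistic multilevel models themselves are not reproduced here.

```python
import math
from typing import List, Optional, Sequence

Series = Sequence[Optional[float]]  # one person's ratings in prompt order; None = missed prompt


def successive_differences(series: Series) -> List[float]:
    """x[t] - x[t-1] for adjacent prompts; pairs touching a missed prompt are skipped."""
    return [b - a for a, b in zip(series, series[1:]) if a is not None and b is not None]


def squared_successive_differences(series: Series) -> List[float]:
    """SSD: squaring gives large jumps more weight than small ones."""
    return [d * d for d in successive_differences(series)]


def nearest_rank_percentile(values: Sequence[float], q: float) -> float:
    """q-quantile (0 < q <= 1) by the nearest-rank rule."""
    if not values or not 0 < q <= 1:
        raise ValueError("need values and 0 < q <= 1")
    ordered = sorted(values)
    return ordered[max(1, math.ceil(q * len(ordered))) - 1]


def acute_change_cut(all_series: Sequence[Series], q: float = 0.90) -> float:
    """Cut point pooled over all persons: q-quantile of absolute successive differences."""
    pooled = [abs(d) for s in all_series for d in successive_differences(s)]
    return nearest_rank_percentile(pooled, q)


def acute_changes(series: Series, cut: float) -> List[int]:
    """1 marks an acute change (|difference| >= cut); the mean gives a raw per-person PAC."""
    return [int(abs(d) >= cut) for d in successive_differences(series)]


# Usage with the reported valence cut point of 2:
print(squared_successive_differences([3, 5, None, 4, 6]))  # [4, 4]
print(acute_changes([0, 3, 3, 1], cut=2))                   # [1, 0, 1]
```

In practice, the reported cut points (2.75, 2, and 2.5) would be passed to `acute_changes` directly, and `acute_change_cut` shows how such a pooled threshold could be derived.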
We calculated the APPC by decomposing the self-esteem and valence time series into decreases and increases relative to the preceding rating (i.e. point-by-point changes). APPC thus describe whether instability is characterized by specific patterns of increases or decreases, i.e. whether changes (ups or downs) are related to specific states (e.g. occurring only during high self-esteem or highly positive valence). By decomposing the time series into point-by-point changes, we obtained multiple decreases and increases in self-esteem and valence for each participant. We aggregated these changes by their momentary starting state into five nearly equal self-esteem and valence bins, respectively. For self-esteem decreases, the five bins correspond to the following ratings: low = 0.25 – 2, mid-low = 2.25 – 3.75, mid = 4 – 5.5, mid-high = 5.75 – 7.25, and high self-esteem = 7.5 – 9. For self-esteem increases, the five bins correspond to the ratings: low = 0 – 1.75, mid-low = 2 – 3.5, mid = 3.75 – 5.25, mid-high = 5.5 – 7, and high self-esteem = 7.25 – 8.75. For valence decreases, the five bins correspond to the ratings: low = 0.5 – 1.5, mid-low = 2 – 3, mid = 3.5 – 4, mid-high = 4.5 – 5, and high valence = 5.5 – 6. For valence increases, the five bins correspond to the ratings: low = 0 – 0.5, mid-low = 1 – 1.5, mid = 2 – 2.5, mid-high = 3 – 4, and high valence = 4.5 – 5.5. We conducted multilevel analyses to compare the aggregated changes within these bins between groups (see [18, 20]). To counteract the problem of multiple comparisons, we used the Bonferroni–Holm correction.
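The sketch below illustrates the APPC decomposition for one person. Each change is assigned to a bin by its starting rating, using the upper bounds of the bins defined above. The Bonferroni–Holm step-down adjustment is also shown. Several choices here are assumptions rather than documented procedure: averaging the changes within each bin, skipping unchanged ratings, and skipping pairs around missed prompts (`None`). The between-group multilevel models are not reproduced.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

BIN_LABELS = ("low", "mid-low", "mid", "mid-high", "high")
# Upper bounds of the five starting-state bins, taken from the bin definitions.
SELF_ESTEEM_DECREASE_UPPER = (2.0, 3.75, 5.5, 7.25, 9.0)
SELF_ESTEEM_INCREASE_UPPER = (1.75, 3.5, 5.25, 7.0, 8.75)
VALENCE_DECREASE_UPPER = (1.5, 3.0, 4.0, 5.0, 6.0)
VALENCE_INCREASE_UPPER = (0.5, 1.5, 2.5, 4.0, 5.5)


def bin_label(start: float, upper_bounds: Sequence[float]) -> str:
    """Label of the first bin whose upper bound is >= the starting value."""
    idx = bisect_left(upper_bounds, start)
    if idx == len(upper_bounds):
        raise ValueError(f"starting value {start} lies above the highest bin")
    return BIN_LABELS[idx]


def appc(series: Sequence[Optional[float]], decrease_upper: Sequence[float],
         increase_upper: Sequence[float]) -> Dict[Tuple[str, str], float]:
    """Mean point-by-point change per (direction, starting-state bin) for one person."""
    changes: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for prev, curr in zip(series, series[1:]):
        if prev is None or curr is None or prev == curr:
            continue  # skip missed prompts; unchanged ratings are neither ups nor downs
        if curr < prev:
            key = ("decrease", bin_label(prev, decrease_upper))
        else:
            key = ("increase", bin_label(prev, increase_upper))
        changes[key].append(curr - prev)
    return {key: mean(values) for key, values in changes.items()}


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by m - k + 1;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


print(appc([5, 3, 3, 6, None, 8], SELF_ESTEEM_DECREASE_UPPER, SELF_ESTEEM_INCREASE_UPPER))
print(holm_adjust([0.01, 0.04, 0.03]))
```

Because decreases cannot start at the scale minimum and increases cannot start at the maximum, the decrease and increase bins are shifted against each other, which is why each direction gets its own set of bounds.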
We also examined the strength of the association between self-esteem instability and affective instability using two-level random-slope gamma models with a log link, in which the SSD of self-esteem was predicted at Level 1 by the SSD of valence or, in a separate model, by the SSD of tense arousal (slopes-as-outcomes approach). That is, we extracted one slope parameter per person, reflecting the association between changes in self-esteem and changes in valence (or tense arousal) for that person. We then used the extracted slopes in linear regression models to predict the single point-in-time assessments of (i) the general level of functioning, i.e. the GAF score, and (ii) overall quality of life and general health as well as the four WHOQOL-BREF domains of quality of life, i.e. physical health, psychological health, social relationships, and environment, in the BPD-REM group. Put simply, we examined whether a strong link between concurrent changes in self-esteem and affect is associated with greater impairment in functioning and quality of life.
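To make the two-stage logic concrete, the sketch below substitutes a deliberately simplified technique: ordinary least-squares slopes per person instead of random slopes from a gamma mixed model with a log link. Stage 1 yields one slope per person, describing how strongly that person's SSD of self-esteem tracks the SSD of affect. Stage 2 regresses a single point-in-time outcome such as the GAF score on these slopes. The data layout and all values are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of a simple least-squares regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def person_slopes(data: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> Dict[str, float]:
    """Stage 1 (OLS stand-in): one slope per person, SSD(self-esteem) on SSD(affect)."""
    return {pid: ols_fit(ssd_affect, ssd_self)[1] for pid, (ssd_affect, ssd_self) in data.items()}


def slopes_predict_outcome(slopes: Dict[str, float], outcome: Dict[str, float]) -> Tuple[float, float]:
    """Stage 2: regress a single point-in-time outcome (e.g. GAF) on the person slopes."""
    ids = sorted(slopes.keys() & outcome.keys())
    return ols_fit([slopes[i] for i in ids], [outcome[i] for i in ids])


# Hypothetical toy data: person -> (SSD of valence, SSD of self-esteem) per prompt pair.
toy = {"p1": ([0, 1, 2], [0, 2, 4]), "p2": ([0, 1, 2], [0, 1, 2]), "p3": ([0, 1, 2], [0, 3, 6])}
slopes = person_slopes(toy)
print(slopes)                                                          # slopes 2.0, 1.0, 3.0
print(slopes_predict_outcome(slopes, {"p1": 50, "p2": 60, "p3": 40}))  # (70.0, -10.0)
```

In this toy example, a negative stage-2 slope would mean that people whose self-esteem changes more tightly with affect have lower outcome scores. The mixed-model version additionally shrinks unreliable person slopes toward the group mean, which the OLS stand-in does not do.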
We used the R function for generalized linear mixed models "glmer" (package "lme4") to test our hypotheses. The specific models are described in more detail in Santangelo et al. We report only the comparisons of the BPD-REM group with the BPD-ACU and the HC group, since the comparisons between the latter two groups have been reported elsewhere.