The preliminary hypothesis of the study is that the more dysfunctional clinical variables are present, the greater the psychological vulnerability to suicidal behavior.
Participants
The data analyzed here were drawn from responses from a clinical sample of patients with mood and anxiety symptomatology. Sampling was consecutive and purposive based on the availability of participants. Participants were either ambulatory or hospitalized patients from three mental healthcare centers serving three socioeconomic strata (high, medium and low) in Greater Santiago, Chile.
Inclusion criteria
The study included female and male participants who were available to participate in the study, who were able to distinguish between fantasy and reality, and who were in an emotional and cognitive state that enabled them to answer the assessment questions. Patients consulting for addiction, eating disorders, psychotic disorders or cognitive disorders were not included to control for the diagnosis variable, but it was recognized that these pathologies can be strongly linked to suicide risk (Erlangsen, Zarit, & Conwell, 2008; Herzog et al., 2000; OCDE, 2015). In addition, data from patients who chose not to participate or who later withdrew from the study were not included.
Participants were undergoing treatment as usual (TAU), which in the case of hospitalized patients consisted of crisis intervention with psychological, psychiatric, and occupational therapy approaches. For outpatients, treatment consisted of psychiatric and psychological approaches. This study was a cross-sectional evaluation of specific moments in participants’ timelines.
Psychiatric diagnoses were made in collaboration with the treating teams according to the diagnostic criteria established in the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision (DSM-IV-TR) (American Psychiatric Association, 2013).
Prior findings
Previous results from the same group of patients were used for the development of Bayesian networks. SVM analysis provided variables that placed patients in either a risk or no risk condition. DT analyses showed possible configurations of combinations that could be located (depending on the route) in a state of risk or no risk. Such prior results were used to select relevant clinical and personality variables that either made individuals less likely to experience the psychological vulnerability associated with suicidal risk or placed them in such a state. These variables included psychological distress resulting in dysfunctionality, a dysfunctional experience, expression of aggression, factors that prevented suicidal behavior, destructive depressive experiences, and satisfaction with family functioning (Morales et al., 2014, 2015; Morales, Echávarri, Barros, et al., 2017; Morales et al., 2016; Taylor, Morales, Zuloaga, Echávarri, & Barros, 2012).
The characteristics of the analyses included in this study are detailed below:
- Support vector machine (SVM) model[1]. This technique was used to define whether a group was either at risk or not at risk by using supervised learning models linked to learning algorithms that analyzed and recognized patterns. The model generated 22 variables that, depending on the circumstances in which they occurred, defined whether a person belonged within a suicide risk configuration (Barros et al., 2017).
- Decision tree (DT) model[2]. This technique was used to process and analyze large quantities of explanatory variables. Based on the lowest Gini index (Bramer, 2007) and given an appropriate and sufficient number of questions, it was possible to identify four decision trees and a trajectory of psychological variables that created a state of vulnerability to suicidal behavior (Morales et al., 2017). The progression of these analyses is detailed in the Results section below, where we mention our prior work (Barros et al., 2017; Morales et al., 2017).
- Sociodemographic and clinical information. Several descriptive variables were assessed: demographic, social, clinical, diagnostic, reasons for seeking treatment, and a description of the participant’s behavior or suicidal ideation, when applicable.
- In the present study, we further identified conditional relationships among variables using the graphical model technique (Antonucci & Zaffalon, 2014), specifically the Bayesian network technique (Sucar & Tonantzintla, 2006).
A probabilistic graphical model is defined as a collection of graphs representing conditional probabilities between different variables. The Bayesian network is a type of probabilistic graphical model in which a defining graph fulfills certain specific properties (acyclic and directed).
Selected graph theory concepts are defined below:
- Graph: A collection of nodes or vertices as well as a collection of arcs (or edges) in which each arc connects nodes and is visually represented with lines that join nodes.
- Directed graph: A graph where all the arcs are directed; that is, they have a starting node and an ending node and are represented by arrows on the arcs.
- Acyclic graph: A graph is acyclic if it is directed, and there are no sequences of arcs that start at one node and end at the same node. In other words, there is no "route" that leaves from and arrives at the same node.
- Node parent: Node A is considered a parent of node B if there is a directed edge from A to B.
- Node children: Node C is considered a child of node B if there is a directed edge from B to C.
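The graph concepts above can be made concrete with a minimal sketch. It assumes a graph stored as a list of directed arcs; the function names and the three-node example (A, B, C) are illustrative and are not part of the study's software.

```python
def parents(arcs, node):
    """Nodes with a directed arc pointing into `node`."""
    return sorted(a for a, b in arcs if b == node)

def children(arcs, node):
    """Nodes reached by a directed arc leaving `node`."""
    return sorted(b for a, b in arcs if a == node)

def is_acyclic(arcs):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming arcs.
    The graph is acyclic exactly when every node can be removed this way."""
    nodes = {n for arc in arcs for n in arc}
    indegree = {n: 0 for n in nodes}
    for _, b in arcs:
        indegree[b] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while queue:
        n = queue.pop()
        removed += 1
        for a, b in arcs:
            if a == n:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return removed == len(nodes)

# Hypothetical example: A -> B, B -> C, A -> C
arcs = [("A", "B"), ("B", "C"), ("A", "C")]
print(parents(arcs, "C"), children(arcs, "A"), is_acyclic(arcs))
```

Adding the arc C -> A to this example would create a route that leaves from and returns to A, so `is_acyclic` would then return `False`.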
A Bayesian network consists of the following:
- A network structure represented by a directed acyclic graph (DAG) where there is a collection of nodes, in which each node represents a random variable, and each edge represents a dependency relationship or correlation between variables.
- A probability distribution whose parameters can be decomposed into local probability distributions based on the arcs found in the graph.
- An encoding of the conditional dependence relationships among the variables in its graph, so that the joint probability distribution is expressed as a factorization of local probabilities: the joint probability of all the variables can be calculated as the product of the probabilities of each variable given the values of its parents.
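The factorization described in the last item can be written compactly. The notation is introduced here for illustration only: X_1, ..., X_n are the random variables at the nodes and Pa(X_i) denotes the set of parents of X_i in the graph.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
% Example for a chain A -> B -> C (A has no parents):
P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid B)
```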
Instruments
We developed a psychological evaluation instrument (Barros et al., 2017) available both online and offline. The instrument includes 25 questions to be answered on a Likert scale, and the answers are analyzed based on an algorithm defined by the Bayesian network model. The results identify whether a patient is in a state with SB characteristics. Then, each respondent is placed on a continuum of discomfort/well-being and fragility. This tool also considers risk factors and protective factors. The results from Barros et al. (2017) identified areas of interest for particular psychotherapeutic interventions for each respondent: a) feelings of satisfaction/dissatisfaction with life; b) state of satisfaction/dissatisfaction with oneself and achievements; and c) reasons to live/stay alive if you are thinking about committing suicide. An example of the results is presented in the results section below.
Data collection
Potential participants were asked to sign an informed consent form and were then asked to respond to the questions included in the following instruments: Outcome Questionnaire (OQ-45.2), State-Trait Anger Expression Inventory (STAXI-2), Reasons for Living Scale (RFL), Depressive Experience Questionnaire (DEQ), Family APGAR, and sociodemographic and clinical questionnaires. Detailed descriptions of these questionnaires can be found in Barros et al. (2017) and Morales et al. (2017).
Participants were guided through the questionnaires and consent form by specially trained evaluators. If participants were minors, the consent form was given to their guardian or caregiver. The study and its protocol were approved by the institutional ethics committees of the School of Medicine at the Catholic University of Chile and the Sótero del Río Hospital.
The aims and methodology of the study were explained to participants, as well as the unpaid nature of their participation. Costs, risks of participating in the study, the voluntary nature of participation, the right to withdraw from the study, and confidentiality were also explained. Authorizations from treating physicians were also requested for patients’ participation in the study, and any potential deterioration in the patients’ mental states during the study was to be noted. No incidents were recorded during this study. Participants were also offered the opportunity to inquire further about the study by contacting the principal investigator. Health clinicians, researchers, and mathematical analysts collaborated in offering assistance to participants throughout the study.
Descriptive analysis of the data
Participants were categorized into the following two groups: 1) with suicidal behavior, as indicated by attendance of consultations regarding a suicide attempt or suicidal ideation within the preceding year (n=326); 2) without suicidal behavior, as indicated by attendance of mental health consultations with no suicide attempt or signs of suicidal ideation within the preceding year (n=324). The sample included 650 ambulatory mental health patients between the ages of 14 and 85 (adolescents, young adults, adults, and seniors) who were recruited between June 2010 and December 2014. Of this sample, 95.38% had been diagnosed with mood disorders (DSM-IV-TR). Among the total sample, the average age was 39.77 ± 15.03, with an age range between 14 and 83 years old. There were 517 women (79.54%) and 133 men (20.46%). Sociodemographic characteristics are detailed in Table 1.
The total sample was mainly characterized by patients diagnosed with affective disorders, most commonly major depressive disorder (43.38%; n=282). Of these patients, 91 had not exhibited SB (attempt or ideation) during the past year (28.09% of the group without SB), and 191 had exhibited SB (58.59% of the group with SB). Of the 191 patients with SB who had been diagnosed with major depressive disorder, 26.18% (n=50) made high-severity suicide attempts, 19.90% (n=38) made low-severity suicide attempts, and 53.93% (n=103) presented suicidal ideation. Low-severity suicide attempts were characterized by minimal intentions of dying, the low subjective or objective lethality of the attempt, and the deployment of efforts to be saved after the suicide attempt. On the other hand, high-severity suicide attempts were characterized by strong intentions to die as well as high subjective and objective lethality, with no efforts to be saved being made after the attempt. The psychiatric diagnoses are shown in Table 2.
Regarding the age distribution, the total sample included patients in the following age groups: 14-19 years old (n=76; 11.69%); 20-29 years old (n=119; 18.31%); 30-39 years old (n=123; 18.92%); 40-49 years old (n=125; 19.23%); 50-59 years old (n=146; 22.46%); 60 years and up (n=61; 9.38%). The age distributions are shown in Table 3.
Data analysis
Given the large number of variables available, it was necessary to perform feature selection to narrow down what was to be modeled. This approach follows the principle of parsimony: if two models show the same performance, the model with the smaller number of variables is preferred. Consequently, considering previous work, we based the analysis of this study on two primary explorations using SVM and DT, as explained above. The software used was the R Project for Statistical Computing (R Core Team, 2019).
Initial data processing
The results from the previous SVM analysis provided a model that selected 22 variables, which, depending on the circumstances, could define whether a person was in a suicide risk zone (accuracy = 0.78, sensitivity = 0.77, and specificity = 0.79). The assessment of all these variables allowed a determination of whether a patient was at risk of attempting suicide or was actively thinking of attempting suicide. Interrelationships between these variables were multiple and contributed to the particular ways in which variables were configured for each case. The metrics and analysis are presented in Barros et al., 2017.
The results from the DT analysis showed the flow of responses as a trajectory of psychological variables that constituted a current situation of suicide risk (or no risk). Four trees distinguishing the groups were established, and the elements of one tree were analyzed in greater detail since they included both clinical and personality variables. This tree consisted of six nodes without suicide risk and eight nodes with suicide risk. Decision tree 01 had a 0.674 accuracy value, a 0.652 precision value, a 0.678 recall value, a 0.670 specificity value, an F measure of 0.665, and a 73.35% receiver operating characteristic (ROC) area under the curve (AUC). Decision tree 02 had a 0.669 accuracy, a 0.642 precision, a 0.694 recall, a 0.647 specificity, a 0.667 F measure, and a 68.91% ROC AUC. Decision tree 03 yielded a 0.681 accuracy value, a 0.675 precision value, a 0.638 recall value, a 0.721 specificity value, a 0.656 F measure, and a 65.86% ROC AUC. Decision tree 04 showed a 0.714 accuracy value, a 0.734 precision value, a 0.628 recall value, a 0.792 specificity value, a 0.677 F measure, and a 58.85% ROC AUC. The metrics and analysis are described in Morales et al., 2017.
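As a reading aid for the indicators reported above, the following sketch shows how accuracy, precision, recall, specificity, and the F measure are conventionally derived from a confusion matrix. The confusion-matrix counts are hypothetical (the source reports only the resulting metrics); the precision, recall, and F values in `reported` are the ones quoted above for the four decision trees, and the loop recomputes each F measure from the quoted precision and recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity
        "specificity": tn / (tn + fp),
    }

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, for illustration only.
print(classification_metrics(tp=40, fp=10, tn=30, fn=20))

# (precision, recall, reported F measure) as quoted in the text.
reported = {
    "tree 01": (0.652, 0.678, 0.665),
    "tree 02": (0.642, 0.694, 0.667),
    "tree 03": (0.675, 0.638, 0.656),
    "tree 04": (0.734, 0.628, 0.677),
}
for name, (p, r, f) in reported.items():
    print(name, round(f_measure(p, r), 3), f)
```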
Taking the prior findings as inputs (i.e., the support vector machine and decision tree results mentioned above), we started with 25 variables. These variables were grouped and reprocessed into categories as follows. Demographics were categorized as discrete values for classification. The Reasons for Living (RFL) questions were grouped into two variables: question 25 was kept separate because it was shown to be a relevant variable on its own, while the remaining questions were grouped as a single variable (due to strong correlations among them, as seen in Figure 1) by using the averages of their values. Items from the Outcome Questionnaire (OQ) were likewise grouped into a single variable (due to strong correlations among them) by using the averages of their values, except for question 8, which was kept separate because of its relevance as a question on its own (Lambert et al., 1996; Von Bergen & de la Parra, 2002). Figure 2 presents a matrix of correlations between the selected questions from the Outcome Questionnaire.
The Depressive Experience Questionnaire (DEQ) underwent a different preprocessing procedure. Correlations among the variables were weak (Figure 3). Therefore, a principal component analysis (PCA) was performed with standardized variables (Shlens, 2014), and the first two main components, which explained 56.437% of the variance, were chosen (Table 4). The first main component, called “low self-esteem”, included being unable to accept personal plans and goals, having feelings of inner emptiness, becoming terrified when alone, having feelings of personal distress linked to success/failure, being concerned about what others can provide in relationships, and having feelings of dissatisfaction with oneself. This first component had higher coefficients associated with variables DEQ_16 and DEQ_19 and lower scores associated with variables DEQ_56 and DEQ_62, with negative effects on the latter two variables. The second main component, called “interpersonal sensitivity”, included accepting personal plans and goals, setting very high goals, becoming terrified when alone, fluctuating between feeling big and small, not feeling jealous in relationships, and needing things that only others can provide. Variables for the second component all had coefficients greater than 0, and the variables from items 3, 19, and 56 of the DEQ had the greatest impact on this component. The quadrants were configured with the following distribution:
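The grouping of correlated questionnaire items by averaging, while keeping one item separate, might look as follows. This is a minimal sketch: the item keys (`OQ_8`, `OQ_9`, ...), the response values, and the function name are hypothetical and are not taken from the study's data or code.

```python
from statistics import mean

def group_items(responses, prefix, keep_separate):
    """Average all items whose key starts with `prefix`, except the items in
    `keep_separate`, which are passed through unchanged."""
    grouped = [
        value for key, value in responses.items()
        if key.startswith(prefix) and key not in keep_separate
    ]
    result = {f"{prefix}mean": mean(grouped)}
    for key in keep_separate:
        result[key] = responses[key]
    return result

# Hypothetical Likert responses for one participant.
responses = {"OQ_8": 3, "OQ_9": 2, "OQ_10": 4, "OQ_15": 0}
print(group_items(responses, prefix="OQ_", keep_separate=["OQ_8"]))
```

The same helper would apply to the Reasons for Living items, with question 25 listed in `keep_separate`.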
- 00: Patients with low scores for main components 1 and 2 (high self-esteem and low interpersonal sensitivity);
- 01: Patients with a low score for main component 1 and a high score for main component 2 (high self-esteem and high interpersonal sensitivity);
- 10: Patients with a high score for main component 1 and a low score for main component 2 (low self-esteem and low interpersonal sensitivity);
- 11: Patients with high scores for main components 1 and 2 (low self-esteem and high interpersonal sensitivity) (Table 5).
The coefficients of the DEQ variables for each of the two components are shown in Table 4 and Figure 4. Feature transformation for the DEQ variables was necessary to narrow the scope of the problem. Regarding this questionnaire, the selected components rather than the original variables were used to calibrate the Bayesian network model.
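The quadrant coding of the two component scores can be sketched as below. The source does not state how "high" and "low" scores were separated, so the median split used here is an assumption, as are the function name and the example scores; the PCA step itself is not reproduced.

```python
from statistics import median

def quadrant_codes(pc1_scores, pc2_scores):
    """Label each patient '00', '01', '10', or '11'.
    First digit: 1 if the component 1 score is above the median (assumed cut).
    Second digit: 1 if the component 2 score is above the median (assumed cut)."""
    cut1, cut2 = median(pc1_scores), median(pc2_scores)
    return [
        f"{int(s1 > cut1)}{int(s2 > cut2)}"
        for s1, s2 in zip(pc1_scores, pc2_scores)
    ]

# Hypothetical component scores for four patients.
pc1 = [-1.2, 0.4, 1.5, -0.3]
pc2 = [0.8, -0.5, 1.1, -1.0]
print(quadrant_codes(pc1, pc2))
```

Under this coding, "10" corresponds to a high score on component 1 (low self-esteem) and a low score on component 2 (low interpersonal sensitivity), matching the quadrant definitions listed above.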
With the new feature obtained from the DEQ variables, in the total sample, 24.3% (n=158) of patients had high self-esteem and low interpersonal sensitivity, 24.3% (n=158) of patients had high self-esteem and high interpersonal sensitivity, 26.3% (n=171) of patients had low self-esteem and low interpersonal sensitivity, and 25.1% (n=163) of patients had low self-esteem and high interpersonal sensitivity.
Meanwhile, participants in the group with SB had the following characteristics: 15.3% (n=50) of patients had high self-esteem and low interpersonal sensitivity, 17.4% (n=57) of patients had high self-esteem and high interpersonal sensitivity, 38.2% (n=125) of patients had low self-esteem and low interpersonal sensitivity, and 29.1% (n=95) of patients had low self-esteem and high interpersonal sensitivity.
Model calibration
The calibration of the model was achieved using cross-validation. The process consists of two stages:
1) Learning the structure of the network
2) Learning the parameters of the network.
For the first stage, a mixed approach based on a) clinical expertise and knowledge and b) heuristics was used to learn the structure from the data. Based on clinical expertise and domain knowledge, an initial graph was defined containing the relationships that we wanted to keep in the structure because of their clinical relevance. Additionally, a set of 'blacklisted' arcs was defined to exclude relationships that we did not want to be part of the graph. This graph is shown in Figure 5. Then, the structure of the final graph was completed by using a search algorithm, and several methods were tested in the calibration process (Grow-Shrink, Incremental Association, Fast Incremental Association, Interleaved Incremental Association, Hill-Climbing, Tabu search, and Max-min Parents and Children).
New relationships were formed according to the existing correlations in the data, which generated the Bayesian network seen in Figure 6.
Subsequently, with the structure for each algorithm already in place, the parameters associated with the joint probability distributions were calibrated using the data and the Bayesian method for estimating parameters. Finally, the tabu search algorithm (Nowicki & Smutnicki, 1996) was selected based on its lower average classification error and lower classification error variance over the cross-validation process. The results of the search algorithm calibration are shown in Figure 4.b.
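One common form of Bayesian parameter estimation for a discrete node is the posterior mean under a uniform Dirichlet prior, sketched below for a node with a single parent. The source does not specify the prior, so the pseudo-count of 1 per cell is an assumption; the variable names (`deq`, `sb`) and the rows are hypothetical.

```python
from collections import Counter

def estimate_cpt(rows, child, parent, child_levels, prior=1.0):
    """Posterior-mean estimate of P(child | parent) with `prior` pseudo-counts
    added to every cell (uniform Dirichlet prior; assumed, not from the source)."""
    counts = Counter((row[parent], row[child]) for row in rows)
    parent_levels = sorted({row[parent] for row in rows})
    cpt = {}
    for p in parent_levels:
        total = sum(counts[(p, c)] for c in child_levels) + prior * len(child_levels)
        cpt[p] = {c: (counts[(p, c)] + prior) / total for c in child_levels}
    return cpt

# Hypothetical data: DEQ quadrant as parent, suicidal behavior (0/1) as child.
rows = [
    {"deq": "10", "sb": 1}, {"deq": "10", "sb": 1},
    {"deq": "10", "sb": 1}, {"deq": "10", "sb": 0},
    {"deq": "00", "sb": 0}, {"deq": "00", "sb": 0}, {"deq": "00", "sb": 1},
]
cpt = estimate_cpt(rows, child="sb", parent="deq", child_levels=[0, 1])
print(cpt)
```

The pseudo-counts keep every conditional probability strictly between 0 and 1 even when a combination of parent and child values is rare or absent in the sample.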
Evaluating model fit
As indicated above, the cross-validation technique was used to calibrate the structure and parameters. To determine the model fitness of the calibration process, the same technique was used, and the leave-one-out cross-validation method was also used to evaluate the final model. In both cases, we calculated precision and other relevant performance measures of the resulting model, and the details of each method are presented as follows.
1) The leave-one-out cross-validation (LOOCV) method was used, in which the model was trained with N-1 cases and its prediction was checked against the remaining case (the one not used for training). This procedure was repeated for every case, and the average success rate served as the accuracy estimator. With the LOOCV method, the Bayesian network model fit was 0.7046. The indicators are shown in Table 7.
2) Repeated 10-fold cross-validation was applied to calculate the model fit by repeating the process 100 times, which gave an average accuracy value of 0.701 (Figure 7).
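The two evaluation schemes can be sketched as follows. To keep the example self-contained, a placeholder majority-class predictor stands in for the Bayesian network; the labels, function names, and random seed are hypothetical, and the sketch assumes at least as many cases as folds.

```python
import random
from collections import Counter

def majority_label(train_labels):
    """Placeholder model: the most common label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]

def loocv_accuracy(labels):
    """Train on N-1 cases, test on the held-out case, repeat for every case."""
    hits = 0
    for i, held_out in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        hits += majority_label(train) == held_out
    return hits / len(labels)

def repeated_kfold_accuracy(labels, k=10, repeats=100, seed=0):
    """Average fold accuracy over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        idx = list(range(len(labels)))
        rng.shuffle(idx)
        for fold in (idx[f::k] for f in range(k)):
            held = set(fold)
            train = [labels[i] for i in idx if i not in held]
            predicted = majority_label(train)
            scores.append(sum(predicted == labels[i] for i in fold) / len(fold))
    return sum(scores) / len(scores)

# Hypothetical labels: 1 = with SB, 0 = without SB.
labels = [1] * 7 + [0] * 3
print(loocv_accuracy(labels), repeated_kfold_accuracy(labels, k=10, repeats=5))
```

With ten cases and ten folds, each fold holds out a single case, so the two schemes coincide in this toy example; with the study's 650 cases they would generally give slightly different estimates, as the reported 0.7046 and 0.701 illustrate.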
The results were used to develop a psychological evaluation questionnaire. This instrument has 25 items that are answered on a Likert scale. These questions can be asked by professionals in contact with individuals who are potentially at risk of attempting suicide. The person administering the questions need not be an expert but should be trained to ask the questions. Details of these questions can be seen in the Psychological Vulnerability Questionnaire shown in Table 6.
The results identified whether the participant being evaluated was in a fragile state that made him or her vulnerable to actively thinking about suicide or attempting to commit suicide. As mentioned above, this assessment tool shows protective and risk factors for each patient, which might guide evaluators and clinicians toward indicating aspects of interest for psychotherapeutic intervention. A patient profile description was elaborated in terms of the following:
- a) Feelings of satisfaction/dissatisfaction with life
- b) State of satisfaction/dissatisfaction with oneself and achievements
- c) Reasons to live/to stay alive if one is thinking about attempting suicide.
Captions of the figures presented in this manuscript are shown in Table 8.
[1] Support vector machine models: Supervised learning models associated with learning algorithms that analyze data for regression and classification analysis. Starting with a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, which makes it a non-probabilistic binary linear classifier (Boser, Guyon, & Vapnik, 1992).
[2] Decision tree technique: A model of computation in which an algorithm is considered to be a sequence of branching operations based on comparisons of some quantities, with each comparison assigned a unit computational cost. The branching operations are called “queries” or “tests”. The algorithm may be considered a computation of a Boolean function where the input is a series of queries and the output is a final decision, in which every query is dependent on previous queries or tests (Bramer, 2007).