Predicting suicidal thoughts and behaviors among college students: a machine learning approach

Suicidal thoughts and behaviors are prevalent among college students. Yet little is known about screening tools to identify students at higher risk. We aimed to develop a risk algorithm to identify the main predictors of suicidal thoughts and behaviors among college students within one year of baseline assessment. We used data collected in 2013–2019 from the French i-Share cohort, a longitudinal population-based study including 5066 volunteer students. To predict suicidal thoughts and behaviors at follow-up, we used random forests models with 70 potential predictors measured at baseline, including sociodemographic and familial characteristics, mental health and substance use. Model performance was measured using the area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value. At follow-up, 17.4% of girls and 16.8% of boys reported suicidal thoughts and behaviors. The models achieved good predictive performance: AUC, 0.8; sensitivity, 79% for girls, 81% for boys; and positive predictive value, 40% for girls and 36% for boys. Among the 70 potential predictors, four showed the highest predictive power: 12-month suicidal thoughts, trait anxiety, depression symptoms, and self-esteem. We identified a parsimonious set of mental health indicators that accurately predicted one-year suicidal thoughts and behaviors in a community sample of college students.


Introduction
College students are vulnerable to mental health problems and suicidal thoughts and behaviors (STB). 1,2 In a large study in eight countries the 12-month prevalence rates were 17.2% for suicidal ideation, 8.8% for suicidal planning, and 1.0% for suicide attempt. 3 Factors that may contribute to the increased risk of STB in this population include the transition from high school to university, increasing workload, increased psychosocial stress and academic pressures, and adaptation to a new environment. 4 Avoiding the onset or aggravation of STB requires early detection of students at risk, to help them access mental health services or to engage them in coaching strategies. 5,6 However, identifying students with STB is challenging due to limited resources on campus, 7 and because college students may be reluctant to share information about their mental health. Effective screening would require (1) the identification of characteristics that predict STB; and (2) minimally intrusive questions integrated into short assessments that are easier to administer to large populations. Most previous studies of STB prediction in students have been based on logistic regression models that account for a limited number of predictors, and have provided association measures. 8,9 However, the identification of factors associated with STB does not necessarily imply that they could help predict future STB; 10,11 for that, models specifically designed for prediction are needed. Moreover, some variables used in previous studies (e.g., psychiatric assessment) 12 are impractical to assess in a large population of students as they require the expertise of a trained clinician.
As pointed out in a recent paper summarizing 50 years of research on STB, further research should shift from identification of risk factors associated with STB to focus on developing predictive algorithms using machine learning methods. 13 Such methods enable the inclusion of several risk and protective factors, while accounting for their potential interactions, 14,15 which is consistent with the shared concept that STB result from complex interactions between social, psychiatric, psychological, and environmental factors. 16 In this study we applied a machine learning method to develop an algorithm to predict STB in the next 12 months after baseline assessment using a large longitudinal cohort of French university students. All analyses were stratified by gender as recommended. [16][17][18]

Results
Description of the sample
The final study population comprised 5066 students, including 4005 (79.1%) girls and 1061 (20.9%) boys. Of the 5066 participants, 874 (17.3%) students reported experiencing STB in the past 12 months (17.1% reported suicidal ideation and 0.7% suicide attempts). The STB prevalence did not significantly differ between girls (n = 696; 17.4%) and boys (n = 178; 16.8%). Among the 874 students who reported STB, 61.3% (n = 536) had reported 12-month suicidal thoughts (with or without history of lifetime suicide attempts), and 14.6% (n = 128) had reported a lifetime suicide attempt at baseline.
The main baseline characteristics did not significantly differ according to gender (Table 1). The mean participant age was 20.7 years (SD 2.6). Over one-third of the sample (n = 1932; 38.1%) was in their first year of university education. The majority of the students lived alone in an apartment (n = 1544; 30.5%) or at their parents' home (n = 1495; 29.5%), and 17.5% (n = 884) described their current economic situation as difficult or very difficult. The most prevalent indicators of childhood adversity were maternal depression history (n = 1536; 30.3%) and parental divorce or separation (n = 1484; 29.3%). At baseline, one in five students reported 12-month suicidal thoughts (n = 1072; 21.2%) and 5.4% (n = 275) reported a lifetime suicide attempt. All data presented as n (%) unless otherwise noted.

Prediction of suicidal thoughts and behaviors
Among girls, the predictive model had an out-of-bag error of 24.6%, suggesting the overall misclassification of a quarter of the female participants. Among boys, the out-of-bag error was 28.1%.
The model showed an AUC of 0.84 (95% CI 0.83-0.86) for girls, indicating a discrimination 68% better than chance, and 0.82 (95% CI 0.79-0.86) for boys (Fig. 1). The sensitivity was 0.79 for girls and 0.81 for boys, indicating that the model correctly predicted 79-81% of the actual cases (Table 2). The positive predictive values were 0.40 and 0.36 for girls and boys, respectively, meaning that 40% and 36% of predicted cases were actually cases. Analysis of the variables' importance for the prediction, as measured by the mean decrease in accuracy, revealed that the following four variables were the most predictive in both girls and boys: 12-month suicidal thoughts at baseline, self-esteem, trait anxiety, and depression symptoms (Fig. 2).
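As a reading aid for the metrics reported above, the following minimal sketch shows how sensitivity and positive predictive value follow from confusion counts, and how an AUC of 0.84 corresponds to "68% better than chance" once the AUC is rescaled so that 0.5 maps to 0 and 1 maps to 1. The counts and function names are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
# Illustrative only: the counts below are hypothetical, not the study's data.

def sensitivity(tp: int, fn: int) -> float:
    """Share of actual cases that the model flags (true positive rate)."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of flagged students who turn out to be actual cases."""
    return tp / (tp + fp)


def auc_gain_over_chance(auc: float) -> float:
    """Rescale AUC so that 0.5 (chance) maps to 0 and 1.0 (perfect) maps to 1."""
    return (auc - 0.5) / 0.5


# Hypothetical confusion counts for 1000 students, 170 of whom are cases.
tp, fn, fp = 136, 34, 204

print(round(sensitivity(tp, fn), 2))                # sensitivity for the toy counts
print(round(positive_predictive_value(tp, fp), 2))  # PPV for the toy counts
print(round(auc_gain_over_chance(0.84), 2))         # rescaled AUC reported for girls
```

With an outcome affecting fewer than one in five students, a positive predictive value well below the sensitivity is expected, because even a modest false positive rate applies to the much larger group of non-cases.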

Secondary analyses
We repeated these analyses excluding baseline STB from the potential predictors, and found that the predictive performances were only slightly lower than in the main analyses. For girls and boys, respectively, the AUC was 0.79 and 0.76, and the sensitivity was 0.71 and 0.65. Variable importance for the prediction was the same between girls and boys, with the following four main predictive variables: trait anxiety, self-esteem, depression symptoms, and perceived stress (Fig. 3). We then fitted our random forests models among the 1497 girls and 414 boys who answered the childhood adversity questionnaire. The predictive performances were similar for girls (AUC 0.79; sensitivity of 79%) and boys (AUC 0.75; sensitivity of 76%). In girls, the four main predictive variables were baseline suicidal thoughts, depression symptoms, self-esteem, and trait anxiety. In boys, the four top predictors were 12-month suicidal thoughts, perceived stress, trait anxiety and self-esteem (Supplementary Fig. 1). Thus, in both genders, childhood adversity variables did not contribute to STB prediction.

Discussion
Using random forests models in this large sample of college students, we found that four main baseline variables predicted STB at 12 months: suicidal thoughts at baseline, trait anxiety, depression symptoms, and self-esteem. The model including these variables showed good predictive performance (AUC = 0.8) estimated using cross-validation. In sensitivity analyses excluding baseline STB, the main predicting variables were trait anxiety, depression symptoms, self-esteem, and perceived stress. These predictors did not differ according to gender, and all models showed similarly good predictive performances. Finally, childhood adversity variables did not contribute to STB prediction.
To our knowledge, only two prior studies have developed STB predictive models in students and reported predictive performances comparable to ours. One study used the random forests method to predict suicide attempts among medical students, using a cross-sectional design. 19 The other study used a logistic regression model to develop a risk-screening algorithm for persistence of suicidal behaviors during college. 20 STB prediction was not influenced by childhood trauma or perceived parental support, which are usually strongly associated with STB in young adults. 21,22 These results are in line with previous studies. 20,23 This finding highlights that association does not necessarily mean prediction, 11 and that proximal risk factors of STB may be better than distal ones for predicting one-year STB.
We identified a small number of major predictors that ensured high accuracy in STB prediction. These predictors, derived from short and commonly used questionnaires, may help develop a large-scale screening tool for university students. For example, they could be integrated into a short online screening administered upon college entrance. An online questionnaire may prove acceptable to students, and would provide an alternative to mental health assessment by a physician for students who are often reluctant to disclose sensitive personal information in face-to-face interviews. 24,25 The quantitatively most important predictor was suicidal thoughts at baseline. 20,26 Likewise, anxiety and depression were often comorbid with STB in students. 27 After excluding baseline STB, we found the same three main variables as in the main models, but with an increased prediction importance. Interestingly, self-esteem emerged as one of the main predictors of STB. Low self-esteem is known to be a part of social anxiety, and to overlap with depression, both of which are associated with STB. 28 Self-esteem, which is an important marker of psychological vulnerability in young adults, [29][30][31] has also been found to be associated with suicidality. 32 Our study showed that self-esteem is an independent and prominent predictive marker of STB and should therefore be used in a screening tool.
Overall, our results suggested that baseline suicidal ideation and four validated psychological scales (Rosenberg scale for self-esteem, STAI-YB Spielberger scale for trait anxiety, PHQ-9 for depression, and perceived stress scale PSS-4) are informative enough to identify students who will present STB at the one-year assessment.
Key strengths of this study are the large sample of students and the longitudinal design. Since there are many different paths to STB, accurate STB prediction requires the consideration of a complex combination of a large number of factors. 13 The i-Share baseline questionnaire includes a large number of variables, which enabled analyses with a large number of potential STB predictors (70 in the main analyses and 87 for the secondary analyses). Our analyses were conducted following the current recommendations and best methods for prediction analysis, especially the use of different samples for creating the predictors and then for calculating the predictive performance, which prevents the performance measures from being overfitted. 10,11 The variables identified as main predictors of STB were consistent across main and secondary analyses, suggesting robust and consistent findings.
Some limitations should nevertheless be acknowledged when interpreting the results. First, the follow-up response rate (33.5%) was moderate, as is common in longitudinal studies with students, 33 and differences were observed between respondents and non-respondents in the follow-up. These differences were not major (proportions were similar) and should have a limited impact when identifying STB predictors. Nevertheless, caution is needed regarding the external validity of our results and the possibility of generalizing conclusions to all students and to all settings. Second, girls were over-represented in our sample, and our sample might not be representative of the whole student population. Third, the self-reported questionnaires could lead to information and recall bias, particularly if participants underreported their frequency of STB due to concerns about social desirability. However, such under-reporting is likely to be reduced by the use of an online questionnaire. Additionally, and more importantly, relying on other data (e.g., clinical assessment) would defeat our aim of finding easily assessable predictors of STB in large university student samples. Finally, we could not strictly separate analyses between suicidal ideation and suicide attempts due to the small number of one-year suicide attempts in our sample.
In conclusion, we identified a parsimonious number of predictors that can be used to accurately identify students who will present STB within one year of the predictor assessment. Pending replication of these results in other studies, these predictors may be used to develop a screening tool to be routinely used among university students. For example, a web-based screening tool could represent a promising approach for identifying students at suicide risk and referring them to counselling and mental health services.

Study design and participants
Our study sample comprised participants in the ongoing internet-based Students' Health Research Enterprise (i-Share) project, a prospective population-based study on students' health which was launched in some French universities in 2013. Students were informed about the purpose and aims of the study through flyers, communications in classes or social media. To be eligible, students must be registered at a university or higher education institute, be at least 18 years of age, and be able to read and understand French. Volunteers provided online informed consent. The enrolment procedure has been previously described. 34 The i-Share protocol was approved by the "Commission nationale de l'informatique et des libertés" (CNIL; National Commission of Informatics and Liberties) (number: DR-2013-019), which ensures that data collection does not violate freedom, rights, or human privacy. At enrolment (i.e., baseline assessment), self-administered online questionnaires collected sociodemographic characteristics, physical and mental health parameters, personal and familial history, living conditions, lifestyle habits, and substance use.
One year later, students were invited by email to complete a follow-up questionnaire. Three reminder emails were sent at 14, 28, and 33 days following the invitation. For the present longitudinal study, we used data from a sample of students who were included in the i-Share cohort study between February 2013 and September 2019, who participated in the follow-up, and for whom data on STB were available.
Baseline information was available for 15,667 students. These students were invited to participate in the follow-up, and 5255 agreed to participate (33.5% response rate). At baseline, compared to the students who participated in the follow-up, the non-participants reported slightly more 12-month suicidal ideation (n = 2285, 22.0% vs. n = 1151, 21.9%; p = 0.0004) and more lifetime suicide attempts (n = 682, 6.6% vs. n = 298, 5.7%; p = 0.0004). Additionally, the non-respondents were more likely to be boys (n = 2963, 25.9% vs. n = 1099, 20.9%; p < 0.0001). We did not observe differences between participants and non-participants for the year of study or parental depression history (Supplementary Table 1). Among the respondents, 189 (3.6%) were excluded because they did not answer the STB-related questions.

Measures
The one-year follow-up questionnaire included questions about suicidal thoughts and suicide attempts during the last 12 months. Participants who reported having occasional or frequent suicidal thoughts and/or suicide attempts were coded as positive for STB.
We considered baseline assessments of 70 potential predictors (Supplementary Table 2). These variables included socio-demographic characteristics (e.g., age, year of study, scholarship, and accommodation type), lifestyle habits (e.g., time spent on screens and sleep quality), familial characteristics (e.g., perceived parental support, parental divorce, and parental history of depression), physical health (e.g., handicap and perceived health), and substance use (e.g., tobacco and alcohol use). Baseline characteristics also included history of diagnosed psychiatric disorders, lifetime suicide attempts and suicidal thoughts during the 12 months preceding inclusion (hereafter called baseline STB). We measured several mental health parameters with validated scales: depression symptoms using the 9-item Patient Health Questionnaire (PHQ-9); 35 trait anxiety using the Spielberger State-Trait Anxiety Inventory (STAI-YB); 36 self-esteem using the Rosenberg scale; 37 perceived stress using the Perceived Stress Scale (PSS-4); 38 and impulsivity using the Barratt Impulsivity Scale (BIS-11). 39
Childhood adversities were not investigated in the baseline questionnaire. To take these important potential predictors into account in our models, a subsample of 1911 participants was administered a supplementary questionnaire adapted from the Childhood Trauma Questionnaire. 40 This questionnaire included 17 variables assessing experiences of sexual abuse, physical or psychological maltreatment, or neglect (Supplementary Table 2).

Statistical analyses
We first described the study sample overall and by gender. Continuous variables are expressed as mean ± standard error. Categorical variables are described as proportions.
Prediction of one-year STB. To predict STB we used a random forests model, which is a non-parametric ensemble machine learning method applicable for both classification and regression prediction. 41 This technique is broadly used due to its high performance and robustness, and because it enables the use of variables independently of type and distribution. 42 Random forests are based on the aggregation of a set of decision trees created through recursive bootstraps of the initial sample. 43 In each bootstrap sample, a decision tree is created using two-thirds of the observations. The remaining one-third, termed the out-of-bag sample, is used to obtain an unbiased performance measure of the created algorithm. This evaluation of prediction performance yields a measure termed the out-of-bag error, which represents the overall error of the algorithm in terms of outcome prediction. The out-of-bag sample is also used to calculate the relative importance of each variable for the prediction. To this end, the value of a given variable is randomly shifted in the out-of-bag sample, and any resulting change of the out-of-bag error reflects the variable's importance in the prediction. Finally, all individual decision trees are aggregated to create the final predictor algorithm. To carry out these analyses we used the randomForest and caret packages in SAS and R. Missing data on the predictors (2%) were handled using the R missForest algorithm, 44 specifically designed to deal with missing data in random forest models.
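The out-of-bag logic described above can be illustrated with a minimal standard-library sketch. It is not the randomForest implementation used in the study: the data are simulated, the function names are hypothetical, and a fixed threshold rule stands in for a fitted tree. The sketch shows two things: drawing a bootstrap sample with replacement leaves roughly one-third of observations out of bag (the expected share tends to 1/e, about 37%), and shuffling one variable in the out-of-bag rows raises the error only if the predictor actually relies on that variable.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; indices never drawn form the out-of-bag set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


def error_rate(predict, rows, labels):
    """Share of rows whose predicted label differs from the observed label."""
    wrong = sum(predict(r) != y for r, y in zip(rows, labels))
    return wrong / len(rows)


def permutation_importance(predict, rows, labels, column, rng):
    """Increase in error after shuffling the values of one column across rows."""
    baseline = error_rate(predict, rows, labels)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return error_rate(predict, permuted, labels) - baseline


rng = random.Random(0)

# Simulated data (assumption): column 0 determines the label, column 1 is pure noise.
rows = [[rng.random(), rng.random()] for _ in range(2000)]
labels = [int(r[0] > 0.5) for r in rows]


def predict(row):
    """Hypothetical stand-in for a fitted tree: a fixed threshold on column 0."""
    return int(row[0] > 0.5)


in_bag, oob = bootstrap_split(len(rows), rng)
oob_rows = [rows[i] for i in oob]
oob_labels = [labels[i] for i in oob]

oob_fraction = len(oob) / len(rows)
imp_signal = permutation_importance(predict, oob_rows, oob_labels, 0, rng)
imp_noise = permutation_importance(predict, oob_rows, oob_labels, 1, rng)

print(oob_fraction)  # close to 1/e, i.e. roughly one-third of the sample
print(imp_signal)    # large: the predictor depends on column 0
print(imp_noise)     # zero: the predictor ignores column 1
```

In a real forest the same comparison is made tree by tree on each tree's own out-of-bag rows and then averaged, which is what the mean decrease in accuracy summarizes.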
Predictors. For the main analyses, the 70 potential predictors were included in the model. We then performed two secondary analyses. First, we re-estimated our models after removing baseline STB as they were potentially strong predictors of one-year STB. 45 Second, we re-estimated our models in the subsample including data on childhood adversity.
Evaluation of model performance. We evaluated the prediction quality of our models in the testing sample using the following performance metrics: (1) out-of-bag error, obtained using the out-of-bag sample of the training set, which represents the overall error in the prediction (ranges from 0%, indicating that all individuals are correctly classified, to 100%, indicating that no individual is correctly classified); (2) area under the curve (AUC), 46 which measures the accuracy of discrimination performance represented by the predicted true positive rate against the false positive rate (ranges from 0.5, indicating prediction by chance, to 1, indicating perfect prediction); (3) sensitivity, representing the rate of actual cases (i.e., students reporting STB) identified by the algorithm; and (4) the positive predictive value, describing the proportion of algorithm-predicted cases that are also actual cases. To prevent these performance measures from being overfitted and to increase the generalizability of the prediction model, we estimated these indices through cross-validation. We therefore randomly split the initial dataset into 10 folds, created the model using 9 of the 10 folds, and tested it on the remaining fold. We repeated this process until all the folds had been used as test sets. The prediction metrics were then obtained from every test set.
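The cross-validation scheme described above can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions rather than the caret workflow used in the study: the data are simulated, `fit_one_feature` is a hypothetical one-variable scorer standing in for a random forest, and the AUC is computed as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case.

```python
import random


def auc(scores, labels):
    """Probability that a random case outscores a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, labels, fit, k, rng):
    """Fit on k-1 folds, score the held-out fold, and return one AUC per fold."""
    results = []
    for test_idx in k_fold_indices(len(rows), k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        score = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        results.append(auc([score(rows[i]) for i in test_idx],
                           [labels[i] for i in test_idx]))
    return results


def fit_one_feature(train_rows, train_labels):
    """Hypothetical toy model: score by the single feature, oriented toward cases."""
    pos = [r[0] for r, y in zip(train_rows, train_labels) if y == 1]
    neg = [r[0] for r, y in zip(train_rows, train_labels) if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda row: sign * row[0]


rng = random.Random(42)

# Simulated data (assumption): 20% cases, whose feature is shifted upward.
labels = [1 if i % 5 == 0 else 0 for i in range(500)]
rows = [[rng.gauss(1.5 if y else 0.0, 1.0)] for y in labels]

fold_aucs = cross_validate(rows, labels, fit_one_feature, 10, rng)
mean_auc = sum(fold_aucs) / len(fold_aucs)
print(mean_auc)  # average discrimination across the 10 held-out folds
```

Because every observation is scored exactly once by a model that never saw it during fitting, the averaged metric is less optimistic than one computed on the training data.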
All methods were carried out in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement for prediction model development. 47

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.