__Traditional Analyses__

Four-year follow-up data were collected through chart review of electronic medical records, and analyses were performed using simple statistical tools.

The overall CFI-S score was predictive of any future suicidality (ideation, planning, attempts, hospitalizations) with a ROC AUC of 0.798 and a *p*-value of 2.39 E-21 (Figure 1a).
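As an illustration of what the ROC AUC measures, the following minimal sketch computes it as the probability that a randomly chosen patient with future suicidality has a higher score than a randomly chosen patient without. The function name `roc_auc` and the toy values are hypothetical and are not taken from the study data; the *p*-value computation is omitted.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values for illustration only; these are NOT study data.
toy_scores = [62, 38, 55, 30, 41, 25, 50, 35]
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.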

The average CFI-S score for those with future suicidality was 54 vs. 31 for those without future suicidality, with a t-test *p*-value of 1.46 E-22 (Figure 1b).
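The group comparison can be sketched as follows. The text does not state which variant of the *t*-test was used, so Welch's unequal-variance form is an assumption here. The function name `welch_t` and the toy values are hypothetical. Converting the statistic to a *p*-value requires the *t* distribution, which is not in the Python standard library, so that step is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: unequal-variance t-test; the source only says "t-test".
    """
    va = variance(group_a) / len(group_a)  # squared standard error, group A
    vb = variance(group_b) / len(group_b)  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


# Toy values for illustration only; these are NOT study data.
toy_with = [50, 54, 58]
toy_without = [29, 31, 33]
toy_t, toy_df = welch_t(toy_with, toy_without)
```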

We also examined the correlation of the CFI-S score with suicidality severity, with suicidal ideation (SI) scored as 1, suicide plan (SP) as 2, suicide attempt (SA) as 3, and hospitalization for suicidality as 4. The Pearson correlation coefficient (R) was 0.44, with a *p*-value of 2.91 E-24 (Figure 1c).
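The severity coding and the correlation can be sketched as below. The codes 1 to 4 follow the text; assigning 0 to patients with no suicidality is an assumption, and the names `SEVERITY` and `pearson_r` as well as the toy values are hypothetical.

```python
from math import sqrt

# Codes 1-4 follow the text; 0 for "none" is an assumption.
SEVERITY = {"none": 0, "SI": 1, "SP": 2, "SA": 3, "hospitalization": 4}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy values for illustration only; these are NOT study data.
toy_scores = [20, 35, 50, 60, 75]
toy_outcomes = ["none", "SI", "SI", "SA", "hospitalization"]
toy_severity = [SEVERITY[o] for o in toy_outcomes]
toy_r = pearson_r(toy_scores, toy_severity)
```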

Additionally, a Cox regression was used to assess the imminence of suicidality, producing a hazard ratio of 1.33 with a *p*-value of 7.53 E-03 and a one-tailed *t*-test with a value of 3.76 E-03 (Figure 1c).

A *t*-test was also performed for each individual CFI-S item between those with suicidality and those without (Figure 1d). The top item (*p*-value 6.29 E-26, more than 11 orders of magnitude smaller than that of the second-best item) was perceived uselessness (not needed, and/or feeling like a burden to kin). The next top items, in order, were past suicidality (1.57 E-14), social isolation (2.40 E-14), hopelessness (6.17 E-13), and a history of mental health diagnoses (9.54 E-13).

__Machine Learning Analyses__

Machine learning can extract more information from data, and it has been used for various medical diagnostic tasks, such as tree-based models in PTSD assessment,18 naïve Bayes, random forest, and support vector machines in lung cancer prognosis,19 and XGBoost for kidney disease diagnosis.20 We developed a comprehensive machine learning framework for predicting the occurrence, severity, and imminence of future suicidality.

The future suicidality prediction is formulated as a binary classification problem. We developed a deep neural network (DNN) framework and compared it with other classical machine learning classifiers: naïve Bayes (NB), XGBoost (XGB), random forest (RF), and support vector machines (SVM).

The receiver operating characteristic (ROC) curves, together with the accuracy, precision, recall, F1 score, and area under the ROC curve (AUROC) results in Figure **2**(b-c), show that the RF and DNN classifiers outperform the other classical machine learning classifiers in the discovery and test cohorts, respectively. For the results shown in this figure, we trained the models and tuned their hyperparameters on the discovery cohort, and then obtained the test results by evaluating the models on an independent test cohort. Performance in the test cohort is therefore a better indicator of generalization than performance in the discovery cohort. The DNN achieves the highest results in the test cohort, which demonstrates its generalization ability. The proposed DNN model is a deep learning model that takes CFI-S information as input and learns to use it to achieve the best performance possible within the training time (see Methods section, Deep neural networks’ hyperparameters and training details, for model details).
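For reference, the threshold-based metrics named above can be computed from the confusion-matrix counts as in the following minimal sketch. The function name `binary_metrics` and the toy labels are hypothetical; in the study these metrics would be computed separately for the discovery and test cohorts.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only; these are NOT study data.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```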

In addition to the classical machine learning and deep neural network classifiers, we constructed patient similarity networks (see Methods section, Network representation, for details). With the graph visualization shown in Figure 3, we can locate and visualize a new patient in the graph based on the collected CFI-S record, which is useful for potential early-stage screening. Consider a case in which a patient takes 15 minutes to complete the CFI-S: we can then compute the similarity between this record and all the other records in the system, and visualize the location of this patient in the graph. The graph has roughly two parts: a smaller “high-risk” area in the lower left corner and a larger “low-risk” area in the upper right corner. With patients located in the graph, we provide fast early-stage screening through a graph neural network (GNN), a neural network model that works well on data that can be represented as a graph or network. We formulate our GNN on this similarity network and use it to provide an SI prediction. The results in Figure 3(c) show that the similarity-network-based GNN not only operates as a classification model but also provides explainability through visualization.
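The step of comparing a new CFI-S record with all stored records can be sketched as below. This is only an illustration under stated assumptions: cosine similarity over binary item vectors is assumed (the actual similarity measure is defined in the Methods section), and a simple nearest-neighbor vote stands in for the GNN, which is not reproduced here. All names and toy records are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_neighbors(new_record, records, k=3):
    """(index, similarity) of the k stored records most similar to new_record."""
    sims = [(i, cosine_similarity(new_record, r)) for i, r in enumerate(records)]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:k]


# Toy binary item vectors and outcomes for illustration only; NOT study data.
toy_records = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
toy_outcomes = [1, 1, 0, 0, 0]  # 1 = future suicidality recorded
new_patient = [1, 1, 1, 1, 0]

neighbors = nearest_neighbors(new_patient, toy_records, k=3)
# Fraction of the nearest neighbors with a positive outcome (stand-in for the GNN).
neighbor_risk = sum(toy_outcomes[i] for i, _ in neighbors) / len(neighbors)
```

Linking each patient to its most similar records in this way also yields the edges of a similarity graph, in which a new patient falls near the records it most resembles.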

Unlike the prediction of future suicidality occurrence, the severity and imminence predictions are formulated as regression problems. Severity is the weighted score reflecting the severity of a patient’s suicidality over the 4-year follow-up. Imminence is the time (in months) elapsed between the CFI-S assessment and the patient’s first instance of suicidality. We used our DNN framework to investigate these two regression problems. Figure **2**(d-e) summarizes the prediction results of the proposed DNN model and the other classical machine learning models (see also Supplementary Materials section X for details on the experimental setup and additional results). The accuracy in Figure **2**d ranges from 85% to 100% for the severity prediction and from 90% to 96% for the imminence prediction as the prediction interval increases. The results for the test cohort are slightly lower than those for the discovery cohort. This suggests that the proposed deep learning framework can prove instrumental in suicide investigation and may generalize well to external and future cohorts.
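One way to read "accuracy for increasing prediction intervals" in a regression setting is the fraction of predictions that fall within a given tolerance of the true value. This reading is an assumption, since the exact definition is given in the Supplementary Materials rather than here. The function name `accuracy_within` and the toy values are hypothetical.

```python
def accuracy_within(y_true, y_pred, tolerance):
    """Fraction of predictions within +/- tolerance of the true value.

    Assumption: this is one plausible definition of interval-based accuracy.
    """
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Toy imminence values in months, for illustration only; NOT study data.
toy_true_months = [3, 12, 24, 36, 6]
toy_pred_months = [4, 10, 30, 35, 6]
toy_curve = {
    tol: accuracy_within(toy_true_months, toy_pred_months, tol)
    for tol in (0, 1, 2, 6)
}
```

Under this definition the accuracy can only stay the same or rise as the tolerance widens.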