A systematic review of brief screeners for suicidal behaviour in primary care


 The authors have requested that this preprint be removed from Research Square.

gold standard was found, as the studies identified used different reference standards.
Although there are other promising short questionnaires, most have not yet been evaluated in the primary care setting or the general population with regard to their diagnostic accuracy. A final assessment should always be based on the clinical judgement of the attending physician.

Registration
The study protocol was registered at PROSPERO (ID: CRD42019122173).

Background
Suicidal behaviour is described as a continuous process. It begins with occasional suicidal thoughts and can lead to completed suicide (1). A hierarchical arrangement of different degrees of severity also forms the basis of the progression from suicidal thoughts to suicide attempts (2). We defined suicidal behaviour in accordance with O'Connor et al. (2014): suicidal behaviour refers to "thoughts and behaviours related to an individual intentionally taking their own life. These thoughts include the more specific outcomes of suicide ideation, which refers to an individual having thoughts about intentionally taking their own life; suicide plan, which refers to the formulation of a specific plot by an individual to end their own life; and suicide attempt, which refers to engagement in a potentially self-injurious behaviour in which there is at least some intention of dying as a result of the behaviour." (3).
In the general population, the lifetime prevalence of suicidal ideation ranges between 7.8% and 9% (4, 5); short-term or point prevalence is usually lower (6-8). The prevalence of suicide plan and suicide attempt was approximately 3% (9). About 30% of individuals with suicidal ideation attempt suicide, two thirds of them within the first year after the onset of suicidal ideation (9). Studies of suicidal behaviour in primary care showed that around 1% to 10% of patients had suicidal thoughts (10), but only a few of them talked about these thoughts to their general practitioner (GP) (11-13). Suicidal ideation and previous suicide attempts are considered strong predictors of future suicide (14-17). Therefore, actively screening for suicidal behaviour should be an integral part of primary care services: it could reach large numbers of individuals, especially high-risk patients, who otherwise would not be identified (18).
The literature shows that up to 80% of suicide victims visited their GP in the year prior to their suicide, and almost half did so in the month before (19, 20). However, GPs often did not recognize the symptoms of a potential suicide risk in their patients' behaviour (21, 22).
Suicide prevention strategies in primary care aim at training GPs and increasing their awareness. Programs include information on the treatment of mental disorders (23) and on recognizing patients with suicidal behaviour through screening (18). In 2013, the U.S. Preventive Services Task Force published a review on screening for suicide risk in primary care (24). The review showed limited data indicating that there are screening instruments with acceptable performance characteristics for adults (24). The present review also focusses on screening instruments for suicidal patients in primary care. In comparison to the Task Force review, several additional criteria were taken into account to broaden the scope of insight on this subject: studies published after 2013 were added to our search strategy; to our knowledge, this is the first review on this subject that also includes German-language articles; furthermore, the included screening instruments were limited to a maximum of twelve items (25). Because preliminary literature search results suggested that only a small number of primary care studies could be expected, we also included studies from the general population, which has a similarly low prevalence of suicidal behaviour (9, 10). The aim of this study was to obtain an initial overview of the available screening instruments and to compare their capability of correctly identifying patients with suicidal behaviour who are in need of further mental health care.

Methods
The authors followed the PRISMA extensions for Diagnostic Test Accuracy Studies (26).

Protocol and registration
The study protocol was registered at the PROSPERO international prospective register of systematic reviews (ID: CRD42019122173).

Eligibility criteria
We defined the following eligibility criteria for studies to be included in the present review. Participants had to be primary care patients or participants from the general  (29), and PsycINFO (EBSCO) (29). The search was performed in January 2019; no time limits were used. The full search strategy is shown in supplementary file 1. Further articles were identified through an additional manual search, which included screening the references of included studies, identified reviews, meta-analyses, and books.

Study selection
For deduplication, all search results were exported to EndNote. Afterwards, two reviewers, Karoline Lukaschek (KL) and Milena Frank (MF), independently screened titles and abstracts using the web application Rayyan (30). Studies for the full-text analysis were selected and checked for eligibility by two reviewers (KL, MF). Studies fulfilling the eligibility criteria were included in the review. Any disagreements were resolved by consensus. If this did not lead to an agreement, a third reviewer, Jochen Gensichen (JG), was consulted.

Data collection process
Both reviewers (KL, MF) performed data collection for the key data points. All relevant data were extracted and added to a Microsoft Access sheet, which was prepared for this review.
Any disagreements between the two reviewers during data collection were resolved by discussion among the researchers. If this did not lead to an agreement, a third reviewer (JG) was again consulted.
For data extraction, part of the STARD 2015 checklist was used (31). Key characteristics of the studies, such as participant characteristics, clinical setting, study design, target condition, index test, reference standard, sample size, and funding, were extracted.
Corresponding authors were contacted if necessary.

Risk of bias and applicability
Risk of bias and applicability were assessed by KL and MF using the Quality Assessment of Diagnostic Accuracy Studies, version 2 (QUADAS-2) (32). To evaluate the risk of bias for each study, four categories were assessed: patient selection, index test, reference standard, and flow and timing. To estimate applicability concerns, patient selection, index test and reference standard were judged. Each category was classified as low, high or unclear. The quality assessment was conducted by two independent reviewers, and any disagreements were resolved by consensus or by consultation of JG.

Diagnostic accuracy measures
The different screeners were assessed and compared with regard to their measures of diagnostic accuracy. These included sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). The measures were extracted from the original studies if reported, or calculated with the ThresholdROC package version 2.7 in R version 3.6.0 by Kathrin Halfter (KH) and MF. If not available, 95% confidence intervals were calculated using the Wilson score interval (KH and MF). Sensitivity and specificity were visualized in a forest plot (see Figure 2).
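
As a rough illustration of the measures described above, the following Python sketch derives sensitivity, specificity, PPV and NPV from a 2x2 table and attaches Wilson score intervals. It is not the ThresholdROC workflow in R that was used for this review; the function names and the example counts are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt
from typing import Dict, Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes: int, total: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion successes/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Point estimate and Wilson interval for each measure of a 2x2 table."""
    pairs = {
        "sensitivity": (tp, tp + fn),  # true positives among all with the condition
        "specificity": (tn, tn + fp),  # true negatives among all without the condition
        "ppv": (tp, tp + fp),          # true positives among all positive screens
        "npv": (tn, tn + fn),          # true negatives among all negative screens
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in pairs.items()}


# Hypothetical counts, not taken from any included study.
results = accuracy_measures(tp=20, fp=30, fn=10, tn=940)
for name, (estimate, (low, high)) in results.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The Wilson interval is used here because, unlike the simple normal approximation, it stays within 0 and 1 and behaves reasonably for the small case numbers that are typical at low prevalence.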

Certainty of evidence
The certainty of evidence of the different index tests was assessed using GRADEpro. The certainty of evidence was classified as high, moderate, low or very low. Factors that may decrease the certainty of evidence were evaluated: risk of bias, indirectness, inconsistency, imprecision, and other considerations (33). If there was only a single study for a screening instrument, we followed the recommendation of Cochrane Training and did not downgrade for inconsistency (34).

Synthesis of results
Due to the large heterogeneity of the included studies, a narrative synthesis combined with figures and tables was performed. Sensitivity and specificity were visualized in a forest plot.

Results
Study selection
Figure 1: PRISMA Flow Diagram (35)
A total of 9 969 results were identified through database searching and additional search.
In these studies, over 17 different screening instruments for suicidal behaviour were Only n = 6 of the identified articles measured diagnostic test accuracy or provided data for calculating it (36-41) (see Table 1). These were included in the narrative synthesis. Table 2 gives an overview of the remaining n = 9 studies not further assessed, due to missing diagnostic accuracy measurements (42-49).

Study characteristics
The six included studies all evaluated different index tests and used various reference standards. Four studies were conducted in the United States (37, 38, 40, 41) and two in Australia (36, 39). No German-language study was identified. The oldest study dates back to 1994 (40) and the most recent one was published in 2015 (37). All studies had a cross-sectional design. Three articles evaluated suicidal behaviour in primary care (38, 40, 41) and three in the general population (36, 37, 39). Suicidal ideation was the most widely used target condition (four times), followed by suicide attempt (two times) and suicide plan (two times). The general term suicidal behaviour was selected only once as target condition. Each of the described index tests was assessed in only a single study. Table 1 shows key characteristics of the included studies.

Risk of bias and applicability
Regarding patient selection, 33% of the articles were assessed as having low risk of bias, 33% as high risk of bias, and 33% as unclear risk of bias. The index test was assessed as low risk of bias in 83% and as unclear risk of bias in 17% of the articles. The reference standard was assessed as low risk of bias in 33% and as unclear risk of bias in 67%. Flow and timing were judged as low risk of bias in 50%, as unclear risk of bias in 33% and as high risk of bias in 17%. Applicability concerns were rated as low, except for patient selection, where they were unclear in 33% of the articles.

Results of individual studies and synthesis of results
The prevalence of suicidal behaviour in the general population ranged from 5% (39) to 75% (37). In the primary care setting without a pre-test, the prevalence ranged from 1% to 3% (38, 40). In the primary care setting with a pre-test for depression, the prevalence of suicidal behaviour was 35% (41).
The certainty of the evidence was high in two studies (38, 39), moderate in three studies (37, 40, 41) and low in one study (36). Results of the individual studies are presented in Table 4. Sensitivity and specificity are visualized in a forest plot in Figure 2.

Target condition suicidal ideation
Chamberlain et al. (2009) evaluated the K10 as a screening instrument for suicidal ideation. We calculated sensitivity and specificity with 95% confidence intervals for the cut-off values "high" (22) and "very high" (30). The calculated sensitivity was 26% and the specificity 99% for the very high cut-off, compared with 58% sensitivity and 93% specificity for the high cut-off. The certainty of evidence was rated as high.

Target condition suicidal behaviour
Uebelacker et al. (2011) examined the ability of item 9 of the PHQ-9 to detect suicidal behaviour compared to the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) mood module. Reported sensitivity was 69% and specificity 84%. For the present review, the certainty of evidence was classified as moderate.
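
The K10 results above show the usual trade-off between cut-off values: a higher cut-off lowers sensitivity and raises specificity. The following Python sketch illustrates this with made-up data. The scores, the case labels, the function name and the rule that a score at or above the cut-off counts as a positive screen are all assumptions for illustration; they are not data from Chamberlain et al.

```python
from typing import Sequence, Tuple


def sens_spec_at_cutoff(
    scores: Sequence[int], has_condition: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= cutoff' is a positive screen."""
    tp = sum(1 for s, c in zip(scores, has_condition) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, has_condition) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, has_condition) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, has_condition) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical questionnaire scores: five participants with the target
# condition followed by ten participants without it.
scores = [18, 24, 27, 33, 41, 10, 12, 13, 15, 16, 19, 21, 23, 26, 31]
has_condition = [True] * 5 + [False] * 10

for cutoff in (22, 30):
    sensitivity, specificity = sens_spec_at_cutoff(scores, has_condition, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```

In this toy data set, moving the cut-off from 22 to 30 misses more participants with the condition but produces fewer false positives, which mirrors the direction of the K10 figures reported above.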

Discussion
We identified a total of 14 studies that assessed suicidal behaviour in the primary care setting or the general population. Of those, only six measured the required diagnostic accuracy. In addition to the study by Olfson et al. (1996) (38), which was already described in the review of the U.S. Preventive Services Task Force (24), this review identified two further studies in the adult primary care population (40, 41) and three studies in the general population (36, 37, 39).

All of the screeners except the SIDAS received a high or moderate GRADEpro rating. The SIDAS was assessed as low certainty of evidence.
Six of the nine screening instruments consisted of a single item. Single-item instruments are controversially discussed in the literature (37, 51-53). In the study by Millner et al. (2015), both single-item index tests yielded acceptable sensitivities and specificities, but the authors advise against using these questionnaires, since single-item measures do not capture a large proportion of suicidal behaviour and their statistical simulations showed that these measures lead to more erroneous classifications than questionnaires with more items (37). In contrast, Simon et al. (2013) investigated item 9 of the PHQ-9 and reported that the response to item 9 could identify patients with an increased risk of suicide attempt and suicide (53).
Screening of patients at risk is recommended by several medical societies (18, 54, 55). In this context, it should also be considered that a sample of primary care patients at risk (e.g. patients with depression) and a sample of unselected primary care patients would differ in the prevalence of suicidal behaviour. The prevalence of a disease directly influences PPV and NPV (56). PPV and NPV are particularly interesting for physicians because they describe the probability that a patient with a positive (negative) test result is ill (not ill) (56). Primary care patients and the general population are usually populations with a low prevalence of suicidal behaviour, e.g. in the range from 0.01 (38) to 0.05 (39). The resulting NPV of around 95% underlines that suicidal behaviour is a very rare condition, with a pre-test probability of being healthy of 0.95 to 0.99. The resulting PPV of around 15-30% means at the same time that 70-85% of positive screening results are false positives. Therefore, as with any other symptom or disease in the low prevalence range, prediction is limited (57). Thus, it is necessary to identify potential patients at risk (e.g. patients with depression) before screening for suicidal behaviour. Solely the study  (18, 58). Concerns about inducing suicidal behaviour through screening could be clarified (59).
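
The dependence of PPV and NPV on prevalence follows from Bayes' theorem and can be sketched in a few lines of Python. The sensitivity of 0.80 and specificity of 0.90 used below are assumed values for a generic screener, not results of any included study; the three prevalences correspond to the unselected settings (0.01 and 0.05) and the depression pre-tested primary care sample (0.35) mentioned in this review. The function name is hypothetical.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed test characteristics (illustrative only).
SENSITIVITY = 0.80
SPECIFICITY = 0.90

for prevalence in (0.01, 0.05, 0.35):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With identical test characteristics, the PPV rises steeply as prevalence increases while the NPV falls only slightly, which is the arithmetic behind the argument for screening patients at risk rather than unselected patients.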
A further consideration is that hope and optimism contribute to mental health (3, 60, 61); protective factors should therefore be included in the medical history (18, 48). Chang et al. (2013) found that optimism is a protective factor for suicide risk in primary care patients (62). To our knowledge, there is only one short screener, the P4 (48), that assesses protective factors; however, the P4 was excluded from our analysis due to missing diagnostic accuracy measures.
A closer look at item 9 of the PHQ-9 is worthwhile. Although item 9 did not achieve the required sensitivity of 80% (cf. 69%), it has the advantage that it is already integrated into everyday practice as part of a depression screener (41). A prospective study in primary care and mental health care showed that the answer to item 9 had a strong predictive value for suicide attempts and suicide (53).
The choice of the reference standard plays a decisive role, as the results of the index test depend directly on the assessment of the reference standard. A diagnostic interview (e.g. 63) is considered the highest standard. In the literature, however, different screening instruments have been referred to as the gold standard, including short questionnaires such as the C-SSRS (47, 64, 65). An agreement on a gold standard would be an important step for future research, as it would improve the comparability of the results.

The NPV of the screening instruments was over 80%. If further studies could replicate these results and confirm them in a setting with patients at risk, the screening instruments could help the GP to rule out suicidal behaviour.

Limitations
The studies assessed different index tests. All index tests were examined in only one study, which limits the interpretability of the results. Moreover, the studies included reported different reference standards. As there is no gold standard in the assessment of suicidality, a comparison of the diagnostic accuracy values of the index tests is difficult.
In addition, overall risk of bias and applicability concerns were moderate. This could limit the interpretability of the results.
When evaluating with GRADE, it must be critically considered that each index test was only evaluated in a single study.
As only studies measuring diagnostic test accuracy, which describes criterion validity (66), were included in our review, articles examining other criteria of psychological tests (67) were not evaluated further (42-49). For promising short questionnaires such as the SBQ-R, the P4 or the C-SSRS, we did not identify studies in the primary care setting or the general population that measured their diagnostic accuracy.
In any systematic review there remains a risk that articles were not identified because of the selection of search terms or the erroneous exclusion in title and abstract screening.
The review was limited to articles in English and German. Self-reported suicidal behaviour may be limited by individuals' motivation to reveal their suicidal plans or intent; this affects the results of the studies (68).

Conclusion
Screening of unselected primary care patients for suicidal behaviour is of limited effectiveness. In contrast, screening of patients at risk is an important step in the diagnosis of suicidal behaviour (69). In Germany, GPs have competence in this field due to their specialized training (70, 71). Promising short questionnaires such as item 9 of the PHQ-9 or the P4 should be included in future research in order to assist GPs in their judgement of suicidal behaviour in patients at risk. However, the final assessment is always based on the clinical judgement of the attending physician (72). The agreement on a gold standard would be an

Consent for publication
Not applicable.

Availability of data and material
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
There was no funding of the review.

Authors' contributions
KL and MF wrote the manuscript, performed the title and abstract screening, the full text analysis, the data collection, the risk of bias assessment and rated the certainty of evidence. MF and KH performed the statistical analysis. JG and AS revised the manuscript critically for clinical content.
All authors have read and approved the manuscript.

Table 3 is only available as a download in the supplemental files section.
CI = confidence interval, GHQ-28 = General Health Questionnaire-28