Using Machine Learning To Understand Suicide: A New Approach To Classifying Australian Coroner’s Court Decisions

We aimed to demonstrate how a large collection of publicly accessible Australian Coroner’s Court case files (n = 4459) (2009-2019) can be automatically classified for determination of death by suicide, presence of mental health disorder and sex of deceased via Natural Language Processing (NLP) methods: supervised machine learning, and unsupervised dictionary-based and string search based approaches. We achieved superior levels of accuracy in the machine learning classification (Gradient Boosting vs. Random Forest baseline) of deaths by suicide of 83.3% (sensitivity = 85.1%, specificity = 79.1%) and an accuracy of 98.3% for the dictionary-based classification of mental health disorder, as defined by the ICD-10 (sensitivity = 99.0%, specificity = 97.9%). Our machine learning approach automatically classified 24.2% (1078/4459) of the case files as referring to deaths by suicide, while 63.7% (2940/4459) were classified as exhibiting a mental health disorder 1 . We employed a two-stage machine learning approach involving feature engineering in the first stage, followed by predictive modelling in the second. Feature engineering involved several steps including removal of low value text, parts of speech analysis, term document weighting and topic clustering. Predictive classification involved extensive hyperparameter tuning to yield the most accurate model. We validated our models against a manually pre-coded subsample of case files, and also via binary logistic regression to test the contribution of each classified mental health disorder against determinations of deaths by suicide according to extant literature. This validation step confirmed elevated odds of suicide attributed to diagnoses of Depression, Schizophrenia and Obsessive Compulsive Disorder. Finally, we offer a short case study to demonstrate the efficacy of our approach in investigating a subset of case findings referring to suicides resulting from family violence.
We offer a proof of concept model that demonstrates an objective and scalable approach to the analysis of legal texts. The use of NLP methods in analysing Coroner’s Court case findings has important implications for the ongoing development of a real-time surveillance of suicide system in Australia.


Introduction
Australia has committed to establishing a real time surveillance of suicide system. Such a system would incorporate data from police, hospital emergency departments, the Coroner's Court and other stakeholders 2 . Although in its inception, the real time surveillance of suicide system seeks to make public a range of data and analyses that will inform preventative public policy initiatives.
However, there are considerable methodological challenges involved in predicting suicide events, driven by their often sudden and unpredictable nature, low case numbers, and stigma 3 . A recent meta-analysis covering over 50 years of research revealed both a narrow methodological approach to investigating the risk factors of suicide and a low predictive accuracy of suicide using these identified risk factors, which has not improved over time 4 . Researchers have called for an algorithm-based approach to predicting suicide that utilises machine learning 4 . It is anticipated that this will facilitate a more accurate and efficient detection of suicide.
To facilitate the shift to machine learning methods such as Natural Language Processing (NLP) for predicting suicide, there is a need to leverage complex and comprehensive datasets to utilise the linguistic context within which suicide is discussed. In Australia, evidence collected through the Coronial process is the richest source of existing collected data relevant to suicide 5 . Coroner's Court data provides a flexible opportunity for retrospective investigation that is suitable for both quantitative and qualitative methods, by reporting in detail the background factors contributing to a reportable fatality (e.g. Milner 6,7 ).
The National Coronial Information System (NCIS) was an initiative of the Australian Coroners' Society to instrument the quantification of risk factors for suicide as part of the Court's role in safeguarding public health and safety 8 . Data are routinely suggest that other more complete databases could be leveraged to remedy these discrepancies.
The work of Fernandes et al. 19 and Zheng et al. 20 has highlighted a number of challenges in applying NLP to the identification of suicide or suicidal behaviours in bodies of text. These include the rarity of such events, especially given that many suicides occur following hospital admission and are therefore not recorded in EMR documentation. The rarity of such events raises an associated issue of imbalanced data, which can pose significant problems for automatic classification algorithms. Finally, EMRs are designed to yield objective clinical utility during time of admission and often do not contain information relevant to when the suicide actually occurs, often following admission.
Given the rare occurrence of suicide, there are suggestions that large publicly available datasets, such as the national death index, could be leveraged to provide a richness of input to better optimise machine learning algorithms (29). We propose that the investigative Coronial process into suspected suicide by Coroner's Courts of Australia can provide such richness of data input. Coroner's Court case files confer benefits over other data repositories (e.g. EMRs) by describing, as a matter of procedure, the psychosocial background, as well as the psychological and emotional states of the deceased in close proximity to time of death. As such, Coroner's Court data are likely to reflect suicide within the context of the complex interplay of risk and protective factors, including a person's biology, psychology, social environment and life experience, which can greatly assist in targeted prevention initiatives 4,28 .
In this study, we seek to extend previous work 19,20 by applying a comprehensive suite of machine learning methods to the analysis of Coroner's Court case findings. We aim to describe NLP tools used to classify Coroner's Court findings by a determination of death by suicide (using a Gradient Boosting Algorithm (GBA) and a Random Forest classification Algorithm (RFA)), according to mental health diagnosis (dictionary-based approach) and sex of deceased (search string based approach). We further aim to evaluate the performance of our approach against manual coding. As additional validation, we employ Binary Logistic Regressions to evaluate the strength of the relationship between deaths by suicide and a number of known psychiatric risk factors (as classified). Finally, we present a case study to illustrate our approach. We classify case findings according to incidence of family violence. We use topic clustering to generate a number of insights into possible predictors of family violence and then classify case findings by additional variables including alcohol and drug use and service utilisation. We then use Binary Logistic Regression to evaluate the strength of relationship between case findings classified by family violence and several risk factors including alcohol and drug use, mental health diagnosis, and service utilisation.

Document characteristics and topic clustering
The NSW pre-coded case findings were initially separated into training and validation datasets, using a 70:30 split (training dataset n=398; validation dataset n=171). The number of words in each case finding ranged between 484 and 258,979 (µ=6,406, sd=8,859) (c.f. table 1). Nine topics resulted in the most parsimonious configuration. Table 3 lists the top fifteen terms occurring in each topic cluster. Each topic suggests a clear and distinct thematic cluster, indicating good face validity. For example, Topic 2 refers to investigations of vehicular accidents, while others refer to suicide (Topic 5), maritime deaths (Topic 7) and deaths while incarcerated (Topic 8). Only Topics 5 and 8 were used by the GBA to classify the corpus documents as determinations of deaths by suicide. These topics were not evident in cluster models with fewer topic configurations. Table 2 describes differences in the numbers of suicides between Australian jurisdictions over a 10 year period. New South Wales recorded the highest number of suicides, followed by Victoria and then Queensland.
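As a minimal sketch of the 70:30 partition described above, the following Python shuffles a list of pre-coded document identifiers and splits it into a larger subset for model fitting and a smaller one for validation. The function name, the fixed seed and the use of simple unstratified shuffling are assumptions made for illustration; the tooling actually used in the study is not specified in this section.

```python
import random

def split_70_30(doc_ids, seed=42):
    """Shuffle document ids and split them 70:30 (hypothetical helper)."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)   # seeded so the split is reproducible
    cut = round(len(ids) * 0.7)        # size of the larger (model fitting) subset
    return ids[:cut], ids[cut:]

# 569 pre-coded findings, represented here by placeholder integer ids
train_ids, valid_ids = split_70_30(range(569))
print(len(train_ids), len(valid_ids))
```

With 569 pre-coded findings this yields subsets of 398 and 171 documents, the sizes reported above.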

GBA
The GBA achieved an accuracy of classification of 83.3% in the validation dataset. This reflected a good balance between the true positive rate (Sensitivity =85.1%) and the true negative rate (Specificity = 79.1%). The probability that case findings identified by the algorithm as deaths by suicide were actual determinations of suicide was 91.0% in the validation dataset (Positive Predictive Value (PPV)) (c.f. table 5). These metrics of accuracy were stable between datasets (e.g. training accuracy = 81.4%; validation accuracy = 83.3%) suggesting a good fit to the data. Classification of the full corpus with this algorithm identified 24.2% (1078/4459) of publicly available Coroner's findings as determinations of death by suicide.
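The four performance measures quoted above all derive from the counts of a two-by-two confusion matrix. The sketch below shows the standard definitions; the counts passed in are hypothetical and are not the study's confusion matrix, which is not reported in this section.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and PPV from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
    }

# Hypothetical counts for illustration only, NOT the study's results.
metrics = classification_metrics(tp=80, fp=10, tn=40, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV depends on the number of false positives relative to true positives, it can sit well above sensitivity and specificity when positives dominate the flagged cases, as in the figures reported here.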

RFA
The RFA achieved an accuracy of 75.4% in the validation dataset. This corresponded to values of Sensitivity of 73.0% and Specificity of 81.6%. A PPV of 91.3% in the validation dataset was also achieved, comparable with the GBA approach (c.f. table 6). The sharp decline in sensitivity values (97.0% to 73.0%) between training and validation datasets, however, suggested overfitting, despite precautions taken via cross-validation and early stopping. Classification of the full corpus with the RFA identified 14.15% (631/4459) of publicly available Coroner's findings as determinations of death by suicide, somewhat lower than our GBA.

Unsupervised dictionary based algorithm performance
The dictionary based approach achieved an accuracy of 98.3% in the classification of mental health diagnoses. This corresponded to values of sensitivity (99%) and specificity (97.7%). Using this approach, mental health diagnoses were attached to all 4459 Coroner's Court case files where relevant (63.7% = 2940/4459).
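A dictionary-based classifier of this general kind can be sketched in a few lines of Python. The term lists, labels and function name below are hypothetical placeholders; the dictionary actually compiled for the study is not reproduced in this section.

```python
import re

# Hypothetical mini-dictionary; the study's actual term lists are not shown here.
DIAGNOSIS_TERMS = {
    "depression": ["depression", "depressive disorder", "major depressive"],
    "schizophrenia": ["schizophrenia", "schizophrenic"],
    "ocd": ["obsessive compulsive disorder", "obsessive-compulsive disorder"],
}

def classify_diagnoses(text):
    """Return the sorted list of diagnosis labels whose terms occur in the text."""
    lowered = text.lower()
    found = set()
    for label, terms in DIAGNOSIS_TERMS.items():
        for term in terms:
            # Word boundaries avoid matching inside longer, unrelated words.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(label)
                break
    return sorted(found)

example = "The deceased had a history of depression and was treated for schizophrenia."
print(classify_diagnoses(example))
```

An approach like this needs no training data, but it relies on diagnoses being named in consistent terminology throughout the corpus.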

Search string based determination of sex of deceased
Our novel approach to classifying court findings in terms of the sex of the deceased yielded high levels of accuracy. Using this search string approach we obtained levels of accuracy of classification of 97.4% (sensitivity = 100.0%, specificity = 90.91%, PPV = 96.4%).
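One plausible way to implement a search string classifier for sex of deceased is to count gendered terms and take the majority. The sketch below assumes that design; the term lists, the tie rule and the function name are illustrative assumptions, since the exact search strings used in the study are not given in this section.

```python
import re

# Hypothetical search strings; the study's exact strings are not given here.
MALE_TERMS = r"\b(he|him|his|mr|male|man)\b"
FEMALE_TERMS = r"\b(she|her|hers|mrs|ms|miss|female|woman)\b"

def classify_sex(text):
    """Label a finding 'male', 'female' or 'undetermined' by majority of gendered terms."""
    lowered = text.lower()
    male = len(re.findall(MALE_TERMS, lowered))
    female = len(re.findall(FEMALE_TERMS, lowered))
    if male == female:
        return "undetermined"   # includes the case where no gendered term occurs
    return "male" if male > female else "female"

print(classify_sex("Mr Smith was found at his home. He had been unwell."))
```

A majority count of this kind would be expected to struggle where a single finding discusses several people of different sexes.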

Error Detection
As part of our auditing processes, it was important to investigate the nature of misclassified suicide case findings in our validation data set. Out of the nine misclassified suicide cases 2/9 (22%) involved sieges and lengthy police negotiation that resulted in suicide of the person of interest; a further 2/9 (22%) involved either mandatory psychiatric evaluation or imprisonment in remand, and were initially assessed by attending personnel as displaying low risk of self-harm; a further 2/9 (22%) involved murder suicides; while the remaining 3/9 (33%) involved death by misadventure: falling from a height, vehicular collision and location of the deceased after being missing. In each of these cases drug intoxication and/or significant mental health impairment were also implicated.
In contrast, where our algorithm incorrectly classified the case finding as constituting a death by suicide, 6/16 (38%) were either accidental (e.g. drowning of an infant; vehicular accident; drug overdose) or misadventure (e.g. combined drug toxicity, absconding from psychiatric detention). In three cases (19%) the cause of death was determined by the Coroner to be natural, but upon closer inspection perhaps warranted further investigation (e.g. "was left alone for 15 minutes. . . [deceased] later found in the shower without the water running"; "elevated blood levels of prescription drugs"). Three cases (19%) involved possible murders, where the exact cause of death could only be speculated upon, but also where suicide was discussed as a possible consideration. A single misclassified case involved a police shooting, in which the deceased had approached police in a psychotic state.
Only a small number of case findings were misclassified in terms of mental health disorder by our dictionary based algorithm, where several different diagnoses were discussed within the one file. Similarly, only a small number of case findings were misclassified on sex of deceased. In almost all cases, these discussed multiple fatalities.

Validation by Binary Logistic Regression
We used Latent Semantic Analysis to derive a number of themes from a subset of family violence related case findings. Thirty clusters produced a model with a number of insights. Table 9 summarises the top five words in each of a selected number of clusters. Binary logistic regression analyses were performed to further confirm the validity of our classification methods, by comparing odds ratios against extant literature. A regression model tested the amount of variance explained in classified findings of suicide by mental health diagnosis. Mental health diagnosis (as identified by the unsupervised dictionary-based classifier) was a significant predictor of determinations of death by suicide, χ² = 517.46, p<0.001, pseudo-R² = 0.16, AUC = 64.8%. The adjusted odds ratios of a determination of death by suicide were significantly elevated when keywords indicating a diagnosis of depression were also present (p<0.001, OR = 2.86, 95% Confidence Interval (C.I.) = 2.33-3.51) and also elevated for diagnoses of Generalised Anxiety Disorder (p = 0.006, OR = 1.39, 95% C.I. = 1.10-1.76), Schizophrenia (p<0.001, OR = 1.56, 95% C.I. = 1.25-1.94) and Obsessive Compulsive Disorder (p = 0.03, OR = 2.18, 95% C.I. = 1.10-4.32) (c.f. Table 4).
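In a logistic regression, an odds ratio is the exponential of a coefficient, and a conventional 95% interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below works backwards from the depression estimate reported above to the coefficient and standard error it implies, then rebuilds the interval. It assumes Wald-type intervals, which the text does not state; the function names are hypothetical.

```python
import math

def log_odds_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the coefficient and standard error implied by a reported OR and 95% CI."""
    beta = math.log(odds_ratio)   # logistic regression coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Depression estimate as reported above: OR = 2.86, 95% C.I. = 2.33-3.51
beta, se = log_odds_summary(2.86, 2.33, 3.51)
odds, low, high = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, se={se:.3f}, OR={odds:.2f}, CI=({low:.2f}, {high:.2f})")
```

An interval that is symmetric around the coefficient on the log scale, as this one is, is what a Wald-type calculation would produce.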

Case Study
Two hundred and twenty-nine case findings were identified as referring to family violence. Thirty clusters produced the most parsimonious topic solution. A number of candidate thematic clusters offered novel insights into underlying factors evident in family violence case findings. These are described in table 9. Binary logistic regression analysis was performed to investigate the odds of a number of factors in predicting family violence cases across the corpus as a whole. The combination of predictors including deaths by suicide, mental health diagnosis, sex of deceased, alcohol use, drug use and service utilisation was significantly predictive of family violence cases across the corpus as a whole, χ² = 256.31, p<0.001, pseudo-R² = 0.16. The adjusted odds ratios of a classification of family violence were significantly elevated where a death by suicide was also classified (p<0.001, OR = 2.51, 95% C.I. = 1.87-3.37) and were also elevated for alcohol use (p<0.001, OR = 3.91, 95% C.I. = 2.33-4.41), drug use (p<0.01, OR = 1.55, 95% C.I. = 1.09-2.17) and service utilisation (p<0.001, OR = 2.49, 95% C.I. = 1.73-3.69). The adjusted odds ratios of a classification of family violence were also significantly elevated where several mental health diagnoses were mentioned, including depression (p<0.05, OR = 1.46, 95% C.I. = 1.03-2.06), Obsessive Compulsive Disorder (p<0.001, OR = 9.12, 95% C.I. = 3.35-23.97) and Schizophrenia (p<0.001, OR = 25.0, 95% C.I. = 9.95-54.04). Table 7 describes the proportions of cases of family violence evident across the corpus by year of investigation.

Discussion
We aimed to demonstrate a proof of concept suite of NLP approaches to classify a large corpus (n=4459) of publicly available Australian Coroner's Court case findings in terms of a determination of death by suicide, presence of mental health disorder and sex of the deceased.
We found that a GBA resulted in high levels of accuracy (83.3%; sensitivity = 85.1% and specificity = 79.1%) when classifying the corpus of documents on determinations of death by suicide, when compared with a subsample of pre-coded case files. This approach proved superior to the baseline RFA (Accuracy = 75.4%; Sensitivity = 73.0%; Specificity = 81.6%). Furthermore, when a dictionary approach was employed, we achieved accuracies of 98.3% (sensitivity = 99%, specificity = 97.7%) in the classification of mental health disorder, when compared to the same pre-coded subsample. Finally, we obtained high rates of accuracy when classifying sex of deceased using a novel search string approach (accuracy = 97.4%, sensitivity = 100%, specificity = 90.91%). To the best of our knowledge, this is the first investigation to use NLP methods to classify Coroner's Court case findings in this way.
We demonstrated superior performance using the GBA rather than the RFA. This is likely due to key differences in how each algorithm classifies known cases to reduce classification error. RFAs randomly sample from the pool of topic clusters and build individual decision trees using this evidence. The developed trees are then averaged to reduce error variance. This strategy is less successful in managing bias due to imbalanced data, such as in the present study where deaths by suicide present in only a minority of cases (24%). In contrast, a GBA grows decision trees by re-weighting in favour of weaknesses in the model, thus adding successive trees until the error of classification is minimised. Where the data is imbalanced, GBA is more likely to achieve a more accurate classification, as we have demonstrated.
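The idea of adding successive trees that concentrate on the current model's errors can be illustrated with a deliberately small gradient boosting sketch. It uses one-split trees (stumps), a single numeric feature, squared-error residuals and toy labels in which positives are the minority. None of this reflects the study's features, loss function or hyperparameters, and all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # drop the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict(base, stumps, x, rate):
    """Sum the base score and the shrunken contribution of every stump."""
    score = base
    for threshold, left_mean, right_mean in stumps:
        score += rate * (left_mean if x <= threshold else right_mean)
    return score

def gradient_boost(xs, ys, rounds=10, rate=0.5):
    """Each round fits a new stump to what the current ensemble still gets wrong."""
    base = sum(ys) / len(ys)   # start from the overall positive rate
    stumps = []
    for _ in range(rounds):
        residuals = [y - predict(base, stumps, x, rate) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps

# Toy, imbalanced data: 3 positives out of 10 (illustrative only).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
base, stumps = gradient_boost(xs, ys)
print([round(predict(base, stumps, x, 0.5), 3) for x in xs])
```

A random forest, by contrast, would grow its trees independently on random subsamples and average them, so no tree is fitted specifically to the errors left by the others.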
The use of NLP in the detection of suicide is new, and as such little comparable data is available. Despite this, our findings compare favourably with the predictive accuracies demonstrated in work classifying EMRs in terms of deaths by suicide. Notably, our supervised machine learning approach using the GBA achieved better precision (positive predictive value) than did Fernandes 19 , who used a hybrid dictionary approach with rule-based post processing (91.0% vs. 82.8%). However, Fernandes achieved higher levels of sensitivity than our GBA (98.2% vs 85.1%). Our approach was therefore more reliable when it flagged a finding as a determination of death by suicide, but it missed a larger share of true suicide determinations. Therefore our approach is the more compelling when the aim is the accurate identification of case findings featuring death by suicide.
Our approach is also comparable to the use of other NLP methods in classifying determinations of outcome of appeal reached by international courts of law 25,29 . Virtucio and colleagues 25 classified the outcome of 27,492 appeals made to the Philippines Supreme Court. Virtucio employed a similar methodology to this study, topic clustering followed by RFA, but achieved a lower classification accuracy of 68%. This suggests advantages in using our GBA when classifying binary decisions. The reduction in accuracy can be attributed to the form of sampling used by Virtucio to constitute the reference dataset: one based on string searches for relevant keywords. The high levels of accuracy achieved by our approach suggest that the manual pre-coding of case findings, based on the jurisdiction with the highest proportion of determinations of death by suicide, confers advantages by optimising upon the presence of keywords of interest, and establishing a valid ground truth.
There was a degree of equivocation in the language used by each Coroner, with multiple possible causes of death often being considered. Where our algorithm incorrectly classified a case as a determination of death by suicide, there may be benefit in highlighting such cases for further investigation, as these often feature a complex nexus of events, where single causes can often prove difficult to establish prima facie.
Case findings differed in the nature and quality of evidence used to reach a determination of cause of death. Coroners seek to apply the Briginshaw standard, which demands a higher level of evidential proof where an allegation is either serious or unlikely, or where there is a gravity of consequences stemming from a particular finding, as is generally the case in determinations of death by suicide 30 . However, there are differences in how this principle is applied by each Coroner, with some demanding a very high standard of proof, and others arriving at a determination of suicide with relative ease 30 . Our classification algorithm differed at times from the Coroner, where they have relied upon their expert judgement to withhold a determination of suicide, despite textual references to the contrary.
The rates of accuracy for our dictionary based classification of mental health diagnoses are unsurprising, given the consistent way in which psychiatric diagnoses were referenced by Coroners throughout the corpus. It is common when carrying out an investigation into a fatality of unclear cause, that a medical professional is called upon to provide supplementary evidence.

These professionals utilise extant taxonomy (e.g. ICD-10; DSM-V) to reference particular psychiatric profiles, ensuring consistency of language across investigations. Dictionary based classification methods are therefore most appropriate where there is little ambiguity in the language used.
Our high rates of accuracy in the classification of a range of mental health disorders are comparable to other investigations. A recent study by Karystianis and colleagues 31 compiled a dictionary of terms describing a range of mental health disorders using ICD-10 taxonomy to analyse precipitating factors in 492,393 family violence related police reports in New South Wales across a 10 year time span. Karystianis obtained a similarly high degree of specificity (92.5% among victims and 84.6% among persons of interest) and precision (positive predictive value = 96.1% victim, 99.3% person of interest). Karystianis attributed the lower rates of specificity to the nature of the police reports, being written by non-experts in mental health diagnosis.
Our dictionary approach also exceeded the levels of precision (97.1% vs. 91.7%) and sensitivity (99.0% vs. 87.8%) obtained by Fernandes 19 , who also employed a dictionary informed approach. These differences can be explained by the different constructs investigated. It is likely that suicidal ideation may have been referred to inconsistently in the EMRs investigated by Fernandes, and may not have been identified by the dictionary as successfully (e.g. suicidal thoughts, planning etc.). It is likely that the accuracy achieved in our study takes advantage of expert testimony from medical personnel, as is common to evidence gathering processes in Coroner's Court inquests.
We explored the associations between mental health diagnosis and a determination of suicide within Coroner case findings using a Binary Logistic Regression. This showed well documented associations between psychiatric diagnoses and deaths by suicide. An international systematic review and meta-analysis demonstrated a comparable odds ratio for suicide (OR = 2.20) among patients with more severe depressive symptomatology, which can be expected to also characterise deaths by suicide in the present study 32 . Similarly, a recent systematic review of 50 longitudinal studies found a similar odds ratio of death by suicide attributed to a diagnosis of Schizophrenia of 1.40 32 . Our findings of an elevated odds ratio for suicide (OR = 2.18) with a diagnosis of Obsessive Compulsive Disorder were slightly lower than those revealed by a recent systematic review of 63 international studies 33 (odds ratios of between 3.02 and 9.08). It is notable that 36.3% of suicide determinations did not involve a mental health diagnosis. It is possible that in specific circumstances, such as for those living in isolation or rural settings, psychiatric history may not be available. This requires further investigation for which supervised NLP may be useful.
To demonstrate our approach, we present a short case study that analyses the odds of several variables in predicting case findings of family violence across the study corpus. Latent Semantic Analysis was used to derive a number of insights using a subset of family violence related case findings. We then used our suite of algorithms to classify cases of deaths by suicide (GBA), mental health diagnoses (dictionary based approach), alcohol and drug use, and service utilisation (string search). We found that the odds associated with several of these variables, including deaths by suicide, alcohol and drug use, diagnoses of depression, Obsessive Compulsive Disorder and Schizophrenia, and service utilisation were significantly elevated for cases of family violence, as classified. Importantly, case findings featuring this mix of variables have been evident within Coroner case findings investigated from at least 2009. Notably, the deceased in case findings classified as featuring family violence were 150% more likely to have died by suicide, 188% more likely to have been female and 150% more likely to have utilised services prior to death. This illustrates several important points. Firstly, topic clustering can inductively elucidate a range of novel themes evident within a subset of the data. Secondly, these themes can suggest a range of candidate predictors for further investigation. Thirdly, with the use of our suite of classification algorithms, candidate predictors can be unearthed with relative ease. Fourthly, via Binary Logistic Regression, we can deductively estimate the contribution of each identified variable to the prediction of a desired outcome.
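Percentages of this kind are commonly obtained from an odds ratio as (OR - 1) x 100. The short sketch below applies that conversion to two adjusted odds ratios reported in the case study, on the assumption that the quoted percentages were derived this way. Note that the result describes a change in odds, which is not the same as a change in probability; the function name is hypothetical.

```python
def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in odds relative to the reference group."""
    return (odds_ratio - 1.0) * 100.0

# Adjusted odds ratios reported in the case study results
for label, odds_ratio in [("death by suicide", 2.51), ("service utilisation", 2.49)]:
    print(f"{label}: {percent_change_in_odds(odds_ratio):.0f}% higher odds")
```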
Finally, by extracting the year in which the inquest was conducted, we have demonstrated that the Coroner's Court has been attempting to publicise the importance of family violence and the complex mix of psychosocial factors that often underpins these decisions, well before the worldwide #MeToo movement gained momentum in 2017. Furthermore, those who died in family violence related fatalities were more likely to have been involved in services prior to death, suggesting that more could possibly be done to alert services to this nexus of factors.
These findings are not without their limitations. As a proof of concept application of NLP methodology, we chose to analyse a large corpus of publicly available Coroner's Court case findings. The proportions of deaths by suicide as classified are likely to differ if these methods were applied to more representative data repositories (i.e. NCIS). However, witness testimony is not a feature of these repositories, possibly impacting upon the variation in context within which suicide is discussed. Nevertheless, the proportion of deaths by suicide arrived at in the present study should be regarded with caution, given that only a fraction of Coronial inquests are made available publicly. Stratified random sampling across all Australian jurisdictions and by sex of deceased would ensure the training and validation datasets are more representative of the entire corpus. This simple modification might improve classification accuracy in future studies above the 83% achieved.
Certain stages of our analysis workflow were time consuming. Parts of speech tagging took in excess of 30 hours to reach a final outcome on all words in the corpus. Should our approach be applied to larger datasets, this timeframe would likely increase substantially. It is unclear at this stage whether the time expense is warranted, and whether this was balanced by a subsequent reduction in time at later stages of analysis (e.g. topic clustering; supervised machine learning).
Similarly, the hyperparameters of both GBA and RFA took considerable time to refine. Perhaps the additional investment in time taken to manually optimise the algorithms via trial and error might best be reserved for real world applications where precision of classification is an imperative.
However, this study also had several strengths. We used the NSW case findings to form the pre-coded dataset to optimise upon references to suicide in text. As noted, the NSW Coroner's Court arrived at determinations of death by suicide in more cases than any other Australian jurisdiction. Our methodological approach was more likely to capture a higher probability of reference 34 to suicide than if we had randomly sampled from the corpus as a whole.
Also of note was our methodological decision to homogenise a range of synonyms under the single term 'suicide'. The reasons for the lack of specificity in the language of Coroners in referencing suicide are unclear, but have been speculated by other authors to relate to persistent community stigma around determinations of death by suicide 30,35 . A commonly agreed upon lexicon would aid in the robustness of any NLP based classification approach.
We also decided to retain many case findings which were a priori difficult to classify (e.g. multiple fatalities), thus reducing the levels of classification accuracy that could be obtained. Rather than overly sanitising the data, we deemed it of greater importance to demonstrate classification accuracies with real world case findings.
As a proof of concept, the suite of NLP approaches outlined in this study suggest future directions of enquiry. These approaches could be applied to problems of classification in larger, more comprehensive and representative national and state based suicide related databases. Our approaches could also be used to mine novel insights in extant data repositories such as case findings of the Coroner's Court to inform analyses reported in the real time surveillance of suicide system, as our case study suggests.
It would be useful to compare our use of GBA with other forms of machine learning classification (e.g. Neural Networks) to ascertain whether other approaches can yield higher levels of accuracy. NLP approaches could also be applied to the investigation of other psychosocial predictors of suicide, such as significant life events (e.g. bereavement or relationship breakup). Via our use of topic clustering, novel insights might also be achieved in these areas, helping us to better understand the possible interplay of risk and protective factors in underpinning the decision to suicide.
To the best of our knowledge, this study marks the first proof of concept application of supervised (Gradient Boosting) and unsupervised (Dictionary based; string search based) machine learning methods to Coroner's Court case findings. When compared to a pre-coded sample, our predictive algorithm correctly classified determinations of deaths by suicide in the majority (83%) of Coroner's Court case findings. Furthermore, our dictionary based approach correctly identified instances of mental health disorder in nearly all cases.
The use of text mining has the potential to drill down to the level of individual words, potentially offering a valuable supplement to nationally aggregated suicide data. However, NLP approaches take time to develop and train, with certain steps taking considerable time to ensure adequate precision of prediction. Once developed however, this suite of approaches could provide valuable resources that supplement human based coding of suicide related databases. Such approaches might also identify complex patterns in the data via an exploratory approach rather than through traditional linear hypothesis testing.
Such approaches also involve ethical challenges, especially where NLP approaches reach a determination of death by suicide in the absence of Coroner designation, potentially altering the reputation of the deceased and memories of the living, which may also occur when human coding is employed, such as during pre-coding of case findings.
Our supervised machine learning approach demonstrates that with the aid of a small pre-coded sample of case findings, less clearly defined concepts such as suicide can also be successfully identified from text data. Such an approach, in particular our use of topic clustering, would be well suited to the analysis of free text sections of other databases, with possible e-health applications.
Furthermore, unsupervised dictionary-based methods are appropriate where the use of language is clear and unambiguous. Such dictionaries could be defined and refined using pre-coded categories extant in national and state-based databases and, when applied to new cases, could be used to identify the presence or absence of such information.
To realise the full potential of a real-time suicide surveillance system, we need to leverage data sourced from a range of stakeholders. The Coroner's Court undertakes the most detailed and comprehensive investigations available into deaths of an unexpected nature. Using the rich text-based data available in Court case findings, we are well placed to illustrate the complexity of factors that underpin such events as deaths by suicide, family violence and others, once a suite of NLP approaches is leveraged. Furthermore, NLP approaches are scalable and can be applied to any length of document and any size of corpus. Once developed, such approaches provide economically inexpensive forms of analysis of court findings that can easily supplement human-based forms of coding and enquiry.

Methods

Data collection
A total of 4527 publicly accessible Inquest Findings and Findings without Inquest (hereafter: case findings) from the Coroner's Court of each Australian jurisdiction were accessed in September 2019 [36][37][38][39]. These covered a reference period from January 2009 to September 2019. A scraping tool was used to automatically access and download documents from Coroner's Court websites, adapting the methodology of Pina-Sanchez and colleagues 23. Optical Character Recognition (OCR) 40 was used to transcribe each .pdf file into .txt format. Twelve files (0.27%) could not be transcribed due to read/write protections and were discarded. A further 32 of the extracted case findings (0.71%) were removed from the analysis as they did not contain background information on the deceased relevant to the aims of the study. Table 1 shows the number of publicly accessible case findings extracted by Australian jurisdiction.

Data cleaning
Data cleaning involved removal of low value text, and was made feasible by an analysis of features common to the structure of each document. Most case findings followed a common pro forma structure: file identification information (title, subject, jurisdiction and identity of the deceased); table of contents; background information or introduction (a summary of the case); sequentially numbered items of evidence detailing the deceased's psychosocial background and the circumstances of their death; a determination of the manner and cause of death; recommendations; and concluding remarks (personally directed remarks from the Coroner, often acknowledging the grief and distress of the family members involved).

Text Content
Data cleaning of case findings removed pro forma sections unrelated to the study aims. This included removing the file identification information, table of contents, recommendations, and concluding remarks. This was performed using a Regex search for different keyword tokens (e.g. "Introduction"; "Background"; "1."), which a random selection of n =25 documents confirmed as corresponding to the beginning of the sequentially numbered summary of evidence.
Keyword tokens indicating the beginning of the Coroner's personal remarks could not be consistently identified across case findings, so a Regex search was not feasible for these sections of text. However, inspection of a random sample of documents (n = 25) confirmed that removing the final 250 characters of each file struck a balance between eliminating information unrelated to the study aims and retaining text referring to cause of death. The research team therefore adopted 250 characters as the cut-off.
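The trimming rule described above can be sketched as follows. This is a minimal illustration in Python (the study's workflow was developed in RStudio); the marker pattern is an assumption built from the example tokens quoted in the text, and the function name `trim_case_finding` is hypothetical.

```python
import re

# Assumed start markers, taken from the examples quoted in the text
# ("Introduction", "Background", "1."). The study's full token list is not given.
START_PATTERN = re.compile(
    r"^[ \t]*(?:Introduction|Background|1\.)",
    re.IGNORECASE | re.MULTILINE,
)
TAIL_CHARS = 250  # trailing characters dropped, as stated in the text


def trim_case_finding(text, tail_chars=TAIL_CHARS):
    """Keep text from the first start marker up to the final `tail_chars` characters."""
    match = START_PATTERN.search(text)
    # Assumption: if no marker is found, keep the document from its beginning.
    start = match.start() if match else 0
    end = max(start, len(text) - tail_chars)
    return text[start:end].strip()
```

A table of contents that itself lists "Introduction" or "1." at the start of a line would match first, so in practice the pattern would need checking against a sample of documents, as the authors did.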

Text Length
Due to the variation in document length across the sample, visual inspection of a subset of case findings was conducted to determine the depth of information contained within shorter documents. A cut-off of 4000 characters was set and 10 documents either side of this cut-off were examined by the research team. Clear qualitative differences were apparent above and below this cut-off, and a total of 24 files (0.52%) that did not contain a sufficient depth of information to meet the aims of the study (e.g. highly redacted cases involving the death of a minor) were excluded from analysis. A final sample of 4459 Coroner's Court case files was included in the final analysis.

The study design
RStudio 41 was used to develop the data analysis workflow. The full workflow is illustrated in Figure 1.
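As a check on the sample flow, the exclusions reported in this section can be tallied alongside a simple length filter. The sketch below is illustrative Python rather than the authors' R code. The function name `split_by_length` is hypothetical, and treating the 4000-character boundary as inclusive is an assumption; the text does not state whether every document under the cut-off was excluded, so short documents are only flagged for review here.

```python
MIN_CHARS = 4000  # cut-off stated in the text


def split_by_length(documents, min_chars=MIN_CHARS):
    """Return (kept, flagged) ids; flagged ids are candidates for manual review."""
    kept = [doc_id for doc_id, text in documents.items() if len(text) >= min_chars]
    flagged = [doc_id for doc_id, text in documents.items() if len(text) < min_chars]
    return kept, flagged


# Sample flow using the counts reported in the Methods.
accessed = 4527
not_transcribed = 12      # read/write protected files
no_background = 32        # no background information on the deceased
insufficient_depth = 24   # excluded after inspection around the length cut-off
final_sample = accessed - not_transcribed - no_background - insufficient_depth
```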

Sample precoding
Four hundred and seventy-two NSW case findings (10.6%) were pre-coded by RI. The pre-coded dataset was split 70:30 between training and validation. This split was performed 500 times, stratified by determination of suicide. Each split was inspected on the metrics of mean, sd, median, skewness and kurtosis of document length to ensure the training and validation datasets were comparable.
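The repeated stratified split can be sketched as below. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, each repetition is driven by a different random seed, and only mean, standard deviation and median are summarised because the standard library has no skewness or kurtosis functions. How the authors used these summaries to judge comparability is not described, so the sketch only collects them for inspection.

```python
import random
import statistics


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split document ids within each class so both parts keep the class balance."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels.values())):
        ids = sorted(doc_id for doc_id, y in labels.items() if y == cls)
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        valid.extend(ids[cut:])
    return train, valid


def length_summary(ids, lengths):
    """Summarise document length for one part of a split."""
    values = [lengths[i] for i in ids]
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
    }


def split_summaries(labels, lengths, n_splits=500):
    """Repeat the split n_splits times and collect length summaries for inspection."""
    results = []
    for seed in range(n_splits):
        train, valid = stratified_split(labels, seed=seed)
        results.append(
            (seed, length_summary(train, lengths), length_summary(valid, lengths))
        )
    return results
```

Here `labels` maps each document id to its pre-coded suicide determination (0 or 1) and `lengths` maps each id to its character count.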

Feature engineering
Nouns, verbs, adjectives, adverbs and pronouns were retained using parts of speech tagging. Eliminating other low value words did not reduce accuracy of prediction overall, but was important in reducing overall processing time. N-grams (n < 4) 42 were included using Rapid Automatic Keyword Extraction (RAKE) (udpipe r-package 42). The keyword 'suicide' was expanded to include other synonyms commonly used by different Coroners (i.e. 'suicidal', 'kill her/himself', 'deliberate', 'intentional'). Term document weighting using square root/residual inverse document frequency weighting was used to balance rare with common words 42 (Lingmatch r-package 43). Latent Semantic Analysis was then used to derive the desired number of topic clusters.

GBA and RFA
An XGBoost GBA was used to classify deaths by suicide and was compared against an RFA benchmark. Several model parameters, including eta, gamma, maximum tree depth, minimum child weight, subsample and column sample by tree, were optimised using an automated grid search with 15000 possible combinations. These parameters were then manually adjusted to further improve specificity and sensitivity in the validation dataset.
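The grid search over model parameters, and the sensitivity and specificity used to judge each candidate, can be illustrated as follows. This Python sketch does not train a model. The parameter names follow XGBoost's, but the candidate values are hypothetical (the study's 15000-combination grid is not listed), and the helper names `expand_grid` and `binary_metrics` are illustrative.

```python
from itertools import product

# Hypothetical candidate values; the study does not report its actual grid.
PARAM_GRID = {
    "eta": [0.01, 0.1, 0.3],
    "gamma": [0, 1],
    "max_depth": [3, 6],
    "min_child_weight": [1, 5],
    "subsample": [0.7, 1.0],
    "colsample_bytree": [0.7, 1.0],
}


def expand_grid(grid):
    """Enumerate every combination of candidate parameter values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Each dictionary returned by `expand_grid` would be passed to a model fit on the training split, with `binary_metrics` computed on the validation split to compare candidates.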

Dictionary based approach
Several ICD-10 psychiatric diagnoses were used to develop the dictionary search strings. Once case findings were classified using this approach, Keyword in Context was used on a random sample of classified case findings to inspect each dictionary term within its sentence context and ensure that the terms identified corresponded with the intended meaning.
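A dictionary-based classifier of this kind can be sketched in a few lines. The example below is illustrative Python: the category names echo diagnoses named in the abstract, but the search terms are hypothetical and are not the study's dictionary.

```python
import re

# Hypothetical dictionary: categories and terms are illustrative only.
DICTIONARY = {
    "depression": ["depression", "depressive"],
    "schizophrenia": ["schizophrenia", "schizophrenic"],
    "ocd": ["obsessive compulsive", "obsessive-compulsive"],
}


def compile_dictionary(dictionary):
    """Build one case-insensitive, whole-word pattern per diagnostic category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for category, terms in dictionary.items()
    }


def classify(text, compiled):
    """Flag each category as present (True) or absent (False) in one document."""
    return {category: bool(p.search(text)) for category, p in compiled.items()}
```

A phrase such as "no history of depression" would still be flagged by this matching, which is why a contextual check of the kind the authors describe is needed.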

Case study
We used a string search with the keyword 'domestic' to identify those case findings referring to family violence. Keyword in Context was used on a random sample of n = 10 case findings to ensure good face validity. Latent Semantic Analysis was used to derive a number of thematic clusters from this subset of case findings. Candidate clusters were inspected for face validity. Terms evident within several novel thematic clusters then informed the search string approach to variable development (e.g. drug use theme = string search 'drug'). Thus we employed a string search algorithm to search all case findings in the corpus for instances of the terms 'alcohol', 'drug', and 'service'. Keyword in Context was also used to ensure that each term corresponded with the constructs of alcohol and drug use and service utilisation respectively. A further string search was used to extract the date of findings.
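The string search and Keyword in Context steps of the case study can be sketched as below. This is illustrative Python with hypothetical function names and an assumed 40-character context window; the search is case-insensitive substring matching, so 'drug' also matches 'drugs'. Date extraction is omitted because the date format of the findings is not described.

```python
import re


def subset_by_keyword(corpus, keyword="domestic"):
    """Keep only the documents that mention the keyword."""
    return {doc_id: text for doc_id, text in corpus.items() if keyword in text.lower()}


def flag_terms(corpus, terms=("alcohol", "drug", "service")):
    """Create one presence/absence indicator per search term for every document."""
    return {
        doc_id: {term: term in text.lower() for term in terms}
        for doc_id, text in corpus.items()
    }


def keyword_in_context(text, keyword, window=40):
    """Return each occurrence of the keyword with `window` characters either side."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [
        text[max(0, m.start() - window): m.end() + window]
        for m in pattern.finditer(text)
    ]
```

Reading the windows returned by `keyword_in_context` for a random sample of documents corresponds to the face validity check described above.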