DeepBiomarker2: Prediction of alcohol and substance use disorder risk in post-traumatic stress disorder patients using electronic medical records and multiple social determinants of health

Introduction: Prediction of high-risk events amongst patients with mental disorders is critical for personalized interventions. In our previous study, we developed a deep learning-based model, DeepBiomarker by utilizing electronic medical records (EMR) to predict the outcomes of patients with suicide-related events in post-traumatic stress disorder (PTSD) patients. Methods We improved our deep learning model to develop DeepBiomarker2 through data integration of multimodal information: lab tests, medication use, diagnosis, and social determinants of health (SDoH) parameters (both individual and neighborhood level) from EMR data for outcome prediction. We further refined our contribution analysis for identifying key factors. We applied DeepBiomarker2 to analyze EMR data of 38,807 patients from University of Pittsburgh Medical Center diagnosed with PTSD to determine their risk of developing alcohol and substance use disorder (ASUD). Results DeepBiomarker2 predicted whether a PTSD patient will have a diagnosis of ASUD within the following 3 months with a c-statistic (receiver operating characteristic AUC) of 0·93. We used contribution analysis technology to identify key lab tests, medication use and diagnosis for ASUD prediction. These identified factors imply that the regulation of the energy metabolism, blood circulation, inflammation, and microbiome is involved in shaping the pathophysiological pathways promoting ASUD risks in PTSD patients. Our study found protective medications such as oxybutynin, magnesium oxide, clindamycin, cetirizine, montelukast and venlafaxine all have a potential to reduce risk of ASUDs. Discussion DeepBiomarker2 can predict ASUD risk with high accuracy and can further identify potential risk factors along with medications with beneficial effects. We believe that our approach will help in personalized interventions of PTSD for a variety of clinical scenarios.


Introduction
Posttraumatic stress disorder (PTSD) and alcohol and substance use disorder (ASUD) often co-occur, with an estimated prevalence of ASUD amongst individuals with PTSD of 46% in the United States alone 1 . Patients with ASUD experience higher rates of PTSD, with the highest rates reported in patients with both alcohol and drug use disorders 2 . Current literature has enlisted mechanisms that may explain this co-occurrence: (1) ASUD could exacerbate the risk of developing PTSD, as patients tend to lead a high risk lifestyle which increases the chances of being exposed to or experience traumatic event (e.g.: sexual assault under the in uence of substances) 3 ; (2) PTSD development can precede ASUD as patients use substances to self-medicate their PTSD symptoms 4 ; (3) Genetic in uences on the onset, maintenance, or etiology of both disorders 4,5 ; and (4) Multiple aspects of well-being such as psychosocial risk and protective factors extracted from social determinants of health (SDOH) 6 could be related to a shared underlying factor affecting the overall quality of life. While various studies suggest that both PTSD and ASUD share common dysfunctions pertaining to numerous biological systems, it is paramount to determine multiple risk and bene cial factors that are responsible for development of ASUD in PTSD, and this can be achieved via application of novel analytic technologies to data-mine EMR data from these patients. In addition, current treatment of PTSD and ASUD is limited 7 .
Social determinants of health (SDoH) are "conditions or environments in which people are born, grow, live, work, and age," 8 . Five key SDoH domains that have signi cant impacts on human health are: (1) economic stability, (2) education, (3) health and health care, (4) neighborhood and built environment, and (5) social and community context 9 . These complex, integrated, and overlapping social and economic systems in turn are responsible for most health inequalities and poor health outcomes existing today. Electronic medical records (EMRs) are an important component of clinical practice and documentation. However, a major limitation of EMRs is the lack of reliable SDoH information and documentation, which is strongly associated with mental health 10 . EMRs collect clinical information such as diagnosis, medication use, laboratory test results, vital signs, procedures, and other data in a systemic fashion. Other nonclinical determinants of health such as age, race, ethnicity is collected in a structured EMR format. Current studies have linked a variety of SDoH parameters such as neighborhood socioeconomic status (nSES) indicators with disease risk factors to improve the accuracy of risk prediction models 11,12 . By extrapolating information obtained from both conventional (e.g., EMR sources) and non-conventional sources (e.g. SDoH databases, clinical EMR notes, census data), one can use this "big data" for risk prediction and development of interventions to improve multiple clinical outcomes especially focusing on high risk patients such as patients with multiple comorbidities. However, to the best of our knowledge, few studies assess the importance of this wide range of multimodal information such as diagnosis, medication use, laboratory test results, individual level SDoH indicators (e.g., race, age, gender, etc.) and neighborhood level SDoH indicators (e.g., nSES index, etc.) in the prediction of outcomes of patients with mental disorders.
Deep learning/data mining algorithms translate "big data" into valuable information for hypothesis generation through deep hierarchical feature construction to capture long-range dependencies in EMR data 13 . Deep learning techniques learn certain features directly from the data itself, without any human guidance, thus allowing for automatic discovery of latent data relationships that might otherwise be hidden 14 . By analyzing the EMR of PTSD patients, we can nd risk factors or medications that can have the potential to reshape the trajectory of disease progression and we can then use these ndings to design better treatment for those patients. In this study, we improved our previous deep learning based Deepbiomarker model to develop our latest version, DeepBiomarker2 by re ning the relative contribution analysis for the identi cation of important features and integrating multiple SDOH parameters in our model. We then applied DeepBiomarker2 to PTSD outcome prediction, provided re ned results speci c to high-risk cohorts, proposed new interdisciplinary hypotheses and identi ed risk/protective medications for prevention of PTSD developing to ASUD. With the inclusion of such multimodal information, we anticipate that health professionals can improve the classi cation of patients based on their complexity and heterogenicity, develop targeted interventions to ful l various clinical needs, transform health care by providing integrated services, build strong community partnerships with other health care providers for better health outcomes and reduce existing health disparities at lower costs.

Data source
We included data from January 2004 to October 2020 from the Neptune system at the University of Pittsburgh Medical center (UPMC) which houses EMR from the UPMC health system for research purposes (rio.pitt.edu/services). The database includes multimodal information: demographic information, diagnoses, encounters, medication prescriptions, prescription ll history, and laboratory tests. We used the ASUDs after the diagnosis of PTSD. The PTSD and ASUD patients were identi ed byICD9/10 codes ('309·81', 'F43·10', 'F43·11' and 'F43·12') and by ICD9/10 codes (See Appendix A) respectively 15,16 The medication lls include medications that a patient had lled at commercial pharmacies, collected and reported by the clearing house SureScripts. The EMR data processing steps are rigorous and hence the chance of data missing is low.

Data preparation
The data preparation was done in a similar fashion to that described in our previous paper DeepBiomarker 17 : For a given PTSD patient without previous diagnosis of ASUD at an index date, our primary aim was to predict whether the patient will experience ASUDs within next 3 months. In our study, a case was de ned as a PTSD patient who had a record of ASUD within the next 3 months, while a control was de ned as a PTSD patient with no records of ASUD in the next 3 months after the index date. The index date can be any encounter date after the PTSD diagnosis but before the rst diagnosis of ASUD. If a patient had multiple encounters satisfying the inclusion criteria, we only considered the latest encounter to mimic the latest status of the patient. We excluded patients that had the diagnosis of PTSD and ASUD in the same day or experienced ASUD before PTSD diagnosis. The patient was also required to have no record of ASUD within the period of one year before the index date to negate the possibility of previous history of ASUD. We used multimodal information such as diagnosis, medications, and lab tests result 1 year preceding the index date and SDoH information at the index date as the input. We speci cally included lab test results that had high frequencies and were coded as abnormal by searching results that were labeled as "ABNORMAL", "HIGH" or "LOW". We did not consider lab test results with low frequency (less than 100 times). We clustered diagnosis codes to diagnosis groups based of the rst 3 letters of their ICD-10 codes. Medication names were converted to their respective unique DrugBank IDs. And lastly, for each encounter the associated multimodal information: diagnosis, medication and lab tests were packed into a sequence based on their respective disease categories, DrugBank IDs and lab test IDs.

Dataset splitting
We split our dataset with a ratio of 8:1:1, where 8 subsets were used as the training dataset, while one subset was used as the validation dataset to nd the optimal parameters and the last 1 subset was used as the test set to evaluate the generalization of our model.

SDoH data
For each PTSD patient, we included both the individual level SDoH data and neighborhood level SDoH data. Individual level SDoH features such as race, age and gender were extracted from the demographic information in EMR and coded similar as diagnosis codes to input in the models. We also used neighborhood level SDoH features (see appendix) such as racial segregation, neighborhood socio-economic status, percentage of non-citizens, person of color index, normalized difference vegetation index, aridity index, percentage of foreign born residents, percentage of male widowers, percentage of US citizens, percentage of households with limited English pro ciency, income segregation, percentage of population with same sex marriage, urban index, percentage of population who are separated from their partner, percentage of population who is a single parent, and percentage of households with transportation barriers were calculated using their respective formulas, extracted from the American community survey (ACS) (See supplementary information about SDoH parameters). Neighborhood level SDoH factors are geographically derived neighborhood level SDoH parameters that can be used for assessment of health care utilization 18 . The ACS is a rolling survey of the US population that gathers information, such as ancestry, educational level, income level, language pro ciency, migration status, disability status, employment status, and housing characteristics, across 1298 variables 19 . The ACS releases estimates at the regional, state, and county level every year, and data at the census-tract, block-group and zip code levels are available every 5 years. For our study, a patient's zip code-5 codes at the index date were used to identify their SDoH parameter. Previous studies have used all of the above indexes to represent a geographical area-based measure of the socioeconomic deprivation experienced according to their neighborhood 20 . Collectively, the data was mapped and later used as input in our model.

DeepBiomarker2
We adopted the Pytorch_EHR framework established by Zhi Group where Deep learning models based multiple recurrent neural networks 21 were used to analyze and predict clinical outcomes 22 . In conjunction with their algorithm and our previous DeepBiomarker model, we further modi ed the framework as DeepBiomarker2 by (a) integrating individual lab tests, SDoH parameters and medications along with the diagnosis groups as the input, so that we can assess the important clinical and nonclinical factors associated with ASUD risk; and (b) re ning contribution analysis 23 module by re ning the relative contribution analysis for the identi cation of key factors (see below for more details). Contribution analysis is a method used to calculate the relative contribution of key factors which are identi ed by determining the observed change in the model.
In our study, we followed the same parameters as our previous versions: embed dimension: 128; hidden size: 128; dropout rate: 0·2; number of layers: 8; input size: 30000; patience: 3. To estimate the standard deviations of the accuracy, we repeated our calculations ten times for each of the algorithms.

Statistical analysis
Assessment of importance of the clinical factors for predicting ASUD events In order to examine the importance of the clinical factors for prediction of ASUDs, we calculated the relative contribution (RC) of each feature on the ASUD 23 . The RC of a feature was calculated as the average contribution of the feature to events divided by the average contributions of this feature to no-events. The contributions were estimated by a perturbation-based approach 24 . The equation is shown as follows where FC represents the feature contribution: FC value was the total value of the feature within the same patient if the feature appeared more than once in that patient, where m and n are number of patients with and without an event, respectively. The natural logarithmic form variance for RC was calculated as: sd: standard deviation Thus, the 95% con dence interval (CI) of RC was given by: And the p-value was under the assumption of z distribution 25 . Bonferroni correction 26 was used to reduce the type I error caused by multiple comparisons. The False discovery rate (FDR) adjusted p value is the ratio of the number of false positive results to the number of total positive test results 27 .
We improved our assessment by normalizing the FC value and scaling the RC value for all our features. The improved FC formula is the ratio of the summary of the contribution of a feature and the summary of the contribution of all the features. Next, we performed scaling, where the RC for a PTSD diagnosis was scaled to 1 and the factor generated was multiplied to get the nal RC value for each of the other features. FC normalization accounts for heterogeneity in the numbers of encounters across patients, which may otherwise in ate the contribution of features observed in patients with higher healthcare utilization.

Assessment of model performance
The model performance was evaluated by the area under the ROC curve (AUROC), we used both deep learning and logistic regression to compare the performance of our model.

Results
The performance of DeepBiomarker2 on the ASUD prediction We identi ed 38,807 PTSD patients from UPMC EMR data. And we further identi ed 7,927 cases and 7,685 controls from patients with more than 1 year of EMRs before the diagnosis of PTSD ( Supplementary Fig. 1). Those samples were split to 8:1:1 ratio for training, validation, and test sets. The performance metrics of the DeepBiomarker2 can be found in Table 1. As shown in Table 1, Deep learning models: TLSTM and RETAIN algorithms implemented in DeepBiomarker2 all showed excellent performance on ASUD prediction, i.e., all yielded an AUC ≥ 0·90. The performance of deep learning (AUC above 0·93) was better than LR (0·85). The performance of models with SDoH are slightly better than those without SDoH.

Important indicators for the ASUD prediction
As we described previously, we followed a perturbation-based estimation approach to calculate the relative contribution of each feature on the prediction of ASUD (Supplementary table 1). Table 2, Table 3, Table 4 and Table 5 enlist the top important abnormal lab tests, medication use, diagnosis and SDoH parameters respectively. We can see that in Table 2, HGB with RC of 1·305, along with other abnormal lab tests with RC > 1 are risk factors for ASUD. In Table 3, pain medications such as acetaminophen and oxycodone both have an RC of RC > 1 which are categorized as risk factors for ASUD while medications such as oxybutynin and cetirizine have an RC < 1 and are categorized as protective factors for ASUD. In Table 4, diagnosis such as routine lab examinations are categorized as protective factor for ASUD (RC = 0·53) while emergency visit is categorized as risk factor for ASUD (RC = 1·459).     2). These biomarkers are strongly correlated to ASUD and PTSD along with its implications on adjoining diagnosis and medication use. These lab tests are indicators of underlying comorbidities and thus can be considered as measurements of disease burdens. It is not a surprise that the lab tests increase the risk of ASUDs (RC > 1, see Table 2).

Discussion
We improved our deep learning model DeepBiomarker2 to predict the risk of ASUDs in PTSD patients based on the abnormal results of regular lab tests in the last one year together with the diagnosis and medications used in the same period as well as SDoH parameters. The model yielded very good performance with AUC score above 0·93. The improvement might be from the fact that DeepBiomarker2 can also consider the sequential information of these multimodal features. To further re ne our understanding of speci c biomarkers pertaining to PTSD and ASUD, we have categorized our top biomarkers based on their types:

Biomarkers closely related to PTSD and ASUD
In ammatory-based biomarkers. We have identi ed two in ammatory-based biomarkers potentially useful for assessing the risks of ASUDs in PTSD patients. Current research has emphasized the importance of incorporating in ammatory biomarkers in risk prediction models to further catapult mental disorder research efforts. Several studies showed PTSD patients had elevated levels of WBC and neutrophil levels in their system on account of the activation of multiple in ammatory pathways. Abnormal WBCs and neutrophil levels express tissue function and release pro-in ammatory and pro-coagulant molecules promote thrombus formation, including platelet activation and adhesion which potentially increase the risk of cardiovascular disease in these patients 28 . Another study found low WBC and neutrophil levels in alcoholics but higher WBC levels in cannabis, inhalants, tobacco, opioid users, and no signi cant levels of WBCs amongst cocaine users. While low neutrophil levels were seen in cannabis users and high levels in inhalants and opioid users [29][30][31][32][33] .
Heme-based biomarkers. The PTSD patient population might have pathologies that are related to hematopoiesis, in ammation, endothelial function and coagulability depending upon abnormal levels seen in these patients 34 . Current epidemiological studies have shown RBCs may interact with the in ammatory system and platelets. Once, exposed to oxidative stress, they acquire a senescent phenotype promoting a pro-in ammatory and pro-atherogenic state 35 . We found elevated levels of hemoglobin, red blood cells, hematocrit, RDW, MCH and MCHC in depressed patients. We also found low hemoglobin levels in alcoholics, cannabis, and heroin users 36 . While other heme biomarkers are discussed in Supplementary Table 1. To the best of our knowledge, we would be the rst to propose these heme biomarkers as possible biomarkers for risk of ASUDs in PTSD patients.
Liver-based biomarkers. Albumin is a protein shown to possess free radical scavenging properties that act as a selective antioxidant. There are studies examining the role of serum albumin levels in patients with psychiatric diseases 37,38 . Current research hypothesizes that low serum albumin levels in depressed patients might be due to activation of in ammatory responses in these patients [39][40][41] . Another study found low serum albumin levels in drug addicts while higher serum albumin levels in alcoholics in an emergency department setting 42 . However, it is important to thoroughly examine the association between developing ASUD risk and albumin to demonstrate it as a prognostic biomarker amongst PTSD patients.

Effect of medication use on PTSD for ASUD prediction Risk Factors:
Tamsulosin: Tamsulosin is a selective α1A-adrenergic receptor antagonist which has greater cardiovascular tolerability as compared to other nonselective α-adrenergic receptor antagonists. However, recent population-based studies have shown that benign hyperplasia (BPH) patients under tamsulosin treatment have increased incidence of dementia and cognitive impairment than in the no-BPH-medication cohort 43 . The dose-response analysis done also revealed a higher risk of dementia at higher doses in the tamsulosin BPH cohort (HR, 1.12-1.49). Dementia observed in these cohorts can be attributed to two of tamsulosin properties: (1) detrimental impact on the prefrontal cortex promoting drug seeking behaviors 44 and (2) high blood brain barrier penetration 45 in SUD patients. Although tamsulosin has strong pharmacokinetic pro les that may suggest cognitive impairment, certain clinical studies have shown otherwise 46 . However, they may cause neuro-affective disorders such as depression and further investigation is necessary.
Metronidazole: Based on the medical notes, bacterial vaginosis, esophagitis encephalopathy, and endocarditis are the popular indications for Metronidazole in these PTSD patients. PTSD has been implicated in the etiology of various gastrointestinal (GI) disorders and urinary tract infections (UTI). While multiple possible mechanisms associating PTSD and GI disorders are proposed: changes in autonomic nervous system function impact the gut-brain axis which leads to hypothalamic-pituitaryadrenal axis dysregulation and followed by abnormal cortisol levels. This along with changes in stress levels coupled with behavioral risk factors such as smoking, alcohol use and medications further exacerbate its effects 47 . Current clinical research examining the association between PTSD and GI disorders have yielded mixed evidence due to differences in methodologies of the studies. Esophagitis is one of the complications of gastrophageal re ux disease, where there is esophageal mucosal injury caused by medications. Drugs such as metronidazole damage the esophageal wall by causing a direct toxic effect on the esophageal mucosa, which produces a caustic effect by creating a highly acidic or alkaline environment 48 . The pathogenesis of UTI begins with contamination of the periurethral space by uro-pathogens such as Escherichia coli (E.coli) residing in the gut 49 , After this so called "intestinal bloom of uro-pathogens", it is followed by colonization of the urethra and ascending migration to the bladder 50 . Studies have further described the gut microbiota-UTI axis where the increased abundance of E. coli in the gut was associated with future development of E. coli bacteriuria and E. coli-induced UTI. They found that the E. coli strains in the gut resembled to the E. coli strain in the urine, thus supporting the hypothesis that gut microbiota is a source of UTI 51,52 . An overgrowth of gardnerella vaginalis in the genitourinary tract of women with bacterial vaginosis is associated with higher risk of sexually transmitted infections, postsurgical complications, infertility, cervical infections and cancer 53 . A case study reported a patient with Gardnerella vaginalis bacteremia with severe encephalopathy 54 . Acute encephalopathy and liver disease is another common emergency occurring most frequently due to liver failure. Cerebellar signs such as acute reversible cerebellar ataxia, dysarthria, gait instability, confusion, seizures, vertigo, dizziness, and lateralizing signs due to prolonged metronidazole use has been reported as a cause of encephalopathy [55][56][57]

Pain medications
PTSD patients prescribed with pain medications such as acetaminophen, oxycodone, hydrocodone, and gabapentin 65 have signi cantly higher PTSD symptom severity scores, with opiate analgesics use associated with the highest scores 66 . The rationale was associated with use of both opiate and non-opiate analgesics leads to the hypothesis suggesting the dysregulation of the opioid system in both PTSD and ASUD. While physical injury at the time of PTSD accounts for ongoing pain symptoms seen in these patients, there is a possibility that emotional and social impact of these traumas may further exacerbate ASUD risks amongst these patients.

Protective factors:
Oxybutynin Hyperhidrosis is a somatic disorder that involves excessive perspiration due to hyperfunctioning of sweat glands via muscarinic receptor stimulation. It is often triggered by external stimuli such as emotional, thermal, stress and physical situations 67 . Studies showed that overproduction of sweat is due to a dysfunction of the autonomic nervous system bers caused due to clinical manifestations such as patient's age, family history, symptom onset, co-morbidities, and medication use. Systemic disorders such as PTSD, alcoholism, traumatic brain injury, Parkinson's disease, and diabetes can impair thermoregulatory homeostasis of the body 68 . A recent study found that patients with hyperhidrosis had experienced improvements in anxiety, depression and sweating when treated with antimuscarinic agents such as oxybutynin 69 . Another case study showed patients experiencing methadone-induced hyperhidrosis when treated with oxybutynin had improved quality of life 70 .

Clindamycin
Based on the medical notes, we found that endocarditis, pneumonia, and osteomyelitis are the popular indications for clindamycin in these PTSD patients. Postoperative and posttraumatic infections of bones and joints are one of the most common complications in the eld of medicine. Osteomyelitis is a condition in which patients experience in ammation in the bone and bone-marrow. This could be either due to tuberculosis, syphilis, bacterial, fungal or parasitic (toxoplasma gondii) in origin 71 . There is a growing interest in epidemiology, where infections are implicated as a novel risk factor for the development of multiple mental disorders. The infection caused by neurotropic parasite Toxoplasma gondii (T. gondii) is transmitted to a host (e.g. rodent or human), via ingestion of tissue cysts in undercooked meat or oocytes in cat feces or contaminated soil, where it progresses to form focal brain lesions as seen in patients with acquired immune de ciency syndrome (AIDS) 70 . This can further lead to seizures, mental confusion, neurological impairment, ataxia, visual abnormalities, cranial nerve palsy, alcohol related dementia and psychomotor or behavioral alterations among patients with multiple etiologies 72 . A study examined the association between T. gondii infection, anxiety, PTSD, and depression among individuals in a population-based study. They found that seropositive individuals had more than twice the odds of reporting anxiety compared to seronegative individuals, suggesting a relationship between the immune response to T. gondii and other multiple anxiety and mood disorders 73 . A case study found, clindamycin treatment provided clinical improvement within 48 hours of treatment and resolved irregular brain lesions in the right basal ganglia within 3 weeks of treatment 74 . Another case study, proposed clindamycin to be used to improve cognitive function of AIDS patients with cerebral toxoplasmosis and alcohol abuse [75][76][77] . Thus, the use of clindamycin can be extended to patients with PTSD and ASUD.
Recent studies have suggested no major improvement in community acquired streptococcus pneumoniae meningitis. The longterm neurological sequelae coupled with its high mortality impact overall quality of life. This can be attributed to: systemic in ammatory response of the host leading to leucocyte extravasation into the subarachnoid space, brain oedema, secondary ischemia and vasculitis, stimulation of resident microglia in the central nervous system by bacterial compounds and nally interaction with the bacterial hemolysins on neurons 78-80 . A study done in a rabbit model found that clindamycin was found to pass the blood brain barrier and provided neuroprotection as opposed to other drugs in question. The proposed mechanism of action is attributed to reduced hydroxyl radical formation and lower concentrations of glutamate and glycerol in the interstitial uid of the hippocampal formation which nally leads to decreased neuronal injury in the dentate gyrus 81 . As mentioned,, the role of dentate gyrus in bipolar disorder 82 , schizophrenia 83 and PTSD 83 reiterates the possibility of clindamycin's protective effects.

Magnesium
Magnesium is a common dietary supplement that is taken in either as an oxide, citrate, carbonate, and sulfate forms 84 . They can be used to reduce addiction to different substances. Their ability to reduce the intensity of addiction to different substances is essentially attributed to its ability to produce a moderate stimulation of the brain reward system and its capacity to reduce the activity of glutamatergic substances, that are directly involved in compulsive use disorders 85 . Studies have shown that the drug targets the N-methyl D-aspartate (NMDA) receptors and reduces the intensity of dependence on different substances 86 . This in turn arrests reinforcement activity which is essential in compulsive drug taking behavior seen in these patients 87 .

Cetirizine
Allergic rhinitis is a relatively common disorder in the general population, as is asthma and also seen in patients with bipolar disorder 88 . They impact the patient's overall social, occupational, and personal functioning. Histamine causes smooth muscle contraction, increased secretion of mucus, increased vascular permeability leading to mucosal edema, and parasympathetic nerve stimulation in these patients. In case of asthma, histamine levels are elevated due to increased peripheral basophil and mast cell degranulation, which further causes bronchospasm and hyperactivity 89 . Cetirizine is a H 1 -speci c receptor antagonist that inhibits allergen-induced eosinophil chemotaxis. While, it may not have effects on serotonin, dopamine, α 1 , and muscarinic receptor. There is a possibility that H1R antagonists can cross the blood-brain barrier (BBB) and improve the quality of sleep amongst patients with major depressive disorder (MDD) 90 . Since, no study of cetirizine on the PTSD and ASUD cohort, it would be interesting to see its impact of ASUD risk among PTSD patients.

Montelukast
Montelukast, an anti-asthmatic drug is a leukotriene receptor antagonist (LTA) that counteracts in ammation and other detrimental pathways that are linked with other neurodegenerative diseases and allergic rhinitis may be protective or risky in nature 91 .The evidence for adverse neuropsychiatric outcomes associated with LTAs is inconclusive. Current controlled clinical trials have reported mild to infrequent adverse effects, which is indicative of montelukast being a safe and well tolerated medication. Montelukast improved learning and memory functions, followed by reduced neuroin ammation, restored blood brain barrier integrity and enhanced neurogenesis 92 . However, some case studies have reported positive dechallenge-rechallenge association, where in the adverse symptoms have resolved after treatment termination and returned after restarting the regimen [93][94][95] . Furthermore, since current evidence is not based on PTSD and ASUD adult population, it is essential to understand its impact on the brain circuitry amongst this subset.
Venlafaxine: Certain PTSD components are attributed to stress-induced increases in the noradrenergic activity; however, the pathophysiology of PTSD pertaining to the brain's noradrenergic pathway is underrecognized 96 . Venlafaxine is a serotonin norepinephrine reuptake inhibitor, that has demonstrated e cacy in patients with depression, generalized anxiety disorder, social anxiety disorder, and panic disorder 97 . Our ndings are consistent with several other randomized clinical trials for PTSD pharmacotherapy which include: sertraline 98 , paroxetine 99 , uoxetine 100 and venlafaxine 100 . However, a recent clinical practice guideline study suggested that patients with depression and cannabis use, should not be treated with venlafaxine 101 . But the need for more de nitive larger, multisite, randomized clinical trials is essential.
We also found that "bundled screening" for early detection and treatment of mental disorders, ASUDs, history of childhood obesity, childhood trauma and other unspeci ed conditions could be used as a precautionary protective factor because it can help in overall healthcare cost reduction, disseminate complications from co-occurring disorders, and overcome lack of adequate behavioral health infrastructure to provide appropriate diagnostic follow up. Other protective factors such as screening for malignant neoplasms of the cervix amongst PTSD, ASUD patients will help identify high risk patients early on. Our results are in line with a study that applied the Health Belief Model and trauma-informed frameworks to guide their analysis. They found that discomfort with pap screening was common amongst women experiencing PTSD, ASUD, homelessness and who had a history of sexual trauma such as interpersonal violence, incarceration, discrimination, and neglect 102 . Providers suggested an aggressive application of a trauma-informed approach where educating, counseling, and privacy may help address complex barriers among women experiencing PTSD, ASUD and other discomforts.

Effect of SDoH on PTSD for ASUD prediction
Negatively correlated:

Racial segregation
We found that patients who had values closer to -1 have higher incidences of ASUD risks as opposed to patients belonging to 1.
Our results are in line with a study that showed patients belonging to heavily segregated low income areas experiencing placebased health disparities which often arise due to a results of historical segregationist policies tend to have higher levels of in ammation that were attributed to higher incidences of mental disorders 103 .
Other SDoH parameters such as low neighborhood social-economic index (nSES index), females, younger patients, black majority zip codes, low normalized difference vegetation, low aridity, zip codes with higher number of USA citizens, USA born patients, number of households with limited English speaking capacity, number of widowed partners who are males and patients belonging to zipcodes with higher income segregation all were found to have an association with PTSD and ASUD both in our study and in mental health literature [104][105][106][107][108][109][110][111][112][113][114][115][116][117][118] and thus should be taken into account when reforming the health care system to respond to the challenge of health disparities.
Positively correlated to ASUD risk:

Transportation barrier
Transportation barriers due to lack of access to transportation leads to rescheduling con icts longer wait times and missed or delayed care 119 . These in turn promote poorer management of chronic illness and mental health outcomes 120 . We found that patients living in zip codes with lack of available transportation tend to experience higher incidences of ASUD risks as opposed to patients living in zip codes with access to transportation. Our study is in line with a study which showed 5.8 million individuals in 2017 delayed medical care because they did not have access to transportation. The study emphasized that transportation barriers increased between 2003-2009 with people of color, those living below the poverty threshold, Medicaid recipients, and people with disabilities had greater odds of reporting a transportation barrier 121 . Another study found, unwillingness to be in treatment, nancial/insurance, and transportation barriers to be the most common barriers to after care treatment for alcohol and substance use 122 .
Other SDoH parameters such as patients belonging to zip codes with households with same sex marriages, higher urban population and higher number of patients who are white through EMR information were found to exhibit elevated incidences of PTSD, suicide related events and ASUD [123][124][125][126][127] .
We also noticed that by adding neighborhood level of SDoH parameters, the performance of Deepbiomarker2 has a slightly improvement, and this might imply that some impact of those SDoH parameters have been captured by EMR data, such as medication use, health service access, etc. As shown in Table 4, routine general medical examination at a health care facility, screening for malignant neoplasms of cervix and screening for unspeci ed condition are protective factors with reduced ASUD risk.

New hypothesis on ASUD in PTSD
Most prior PTSD and ASUD studies share a common shortcoming of being rooted in the de facto assumption that ASUDs may emerge majorly due to biological factors. However, we would like to propose some unique hypotheses that based of our results and validated by literature. This may serve as important novel interdisciplinary indicators of mental health diseases. There exists a possible indirect link between gut microbiota dysbiosis, UTIs and Toxoplasmosis gondii 128 .Microbial dysbiosis is a major perpetrator of intestinal in ammation which promotes subsequent permeability of the gut barrier ultimately leading to distal consequences of Toxoplasmosis Gondii infection to permeate into the blood brain barrier 129 . This compromise may promote cognition and AUD risk in patients with PTSD, depression, epilepsy, suicidal ideation, GAD, and schizophrenia 130 . This health disparity can be potentially corrected by improved water puri cation for all. Serum albumin (SA) is closely related to oxidative stress and antioxidant capacity. It may exist in relatively high concentrations, in patients with liver disease and other neurodegeneration disorders and signi cantly lower in patients with cancer, critically ill patients and patients with neuropsychiatric disorders such as: schizophrenia and depression 131 . It is possible that albumin may aggravate oxidative stress by increasing the percentage of free radicals and oxidative damage products by entering the blood brain barrier. This in turn may promote in ammation while simultaneously decreases omega-3 polyunsaturated fatty acids, magnesium, thyroid hormones levels causing depression [132][133][134][135] . Another unconventional hypothesis that could be extrapolated and used for future studies is correlating allergies to PTSD and ASUD. A study showed immunoglobulin-E and WBCs to be associated with worsening of depressive scores in bipolar patients during high pollen seasons. Also, PTSD patients with nicotine dependence and chronic obstructive pulmonary disorder (COPD) had high RBCs 136 . Based on our results and literature, we show that activation of in ammatory mediators due to asthma and allergy rhinitis may be a potential biomarker for predicting ASUDs in PTSD, but further investigations are necessary.

Biomarkers for personalized treatment for PTSD to reduce the risk of ASUD
Using EMR and non-EMR data we examined multiple biomarkers with respect to patient medication history, diagnosis, lab tests and SDoH (both neighborhood and patient level) to propose novel therapies for PTSD patients with ASUDs. Our ndings provided compelling evidence that suggests apart from targeting risk and protective factors, developing prevention and intervention strategies, one must incorporate SDoH to ameliorate sources of psychological and psychosocial risk. This in turn would help to better predict ASUD risks in PTSD patients. Although literature focuses on using conventional SDoH parameters such as income, race, unemployment to be the major predicting factors for future ASUDs, inclusion of future SDoH-ASUD factors such as peer substance use, easy drug access, low access to ASUD help centers and other psychosocial protections should be considered to obtain a comprehensive view about PTSD and SDoH. Our tool is intuitive and could build novel approaches that would provide easy interpretation of these complex biomarkers that could be applied to other future SDoH parameters. Our study is superior because of its multi-faceted data driven approach, sample size, sequential effect consideration, convenience, low cost, a suitable choice of routine testing, inclusion of SDoH and improved practical applications in any clinical setting. Unlike current biomarker studies, we considered all EMR (both clinically and non-clinically applicable) biomarkers and non EMR (non-medical and nonbehavioral precursors of health) biomarkers in our study and ranked the most applicable biomarkers based on their normalized scores. Our goal is to address screening barriers, eliminate existing health disparities, provide follow-up treatment options, educate, and create support essentials to keep high risk patients linked to care.

Limitation of our study
Our study also has a few limitations: First, there could be inconsistencies of biochemical test results between patients due to enrollment bias and some lab tests might have low representation in our database. As such, the analysis might have limited power to detect the effects. Second, we used EMR data from January 2004 to October 2020, and in this period, there is a possibility of changes in treatment and number of lab tests amongst these patients. Therefore, we cannot make causal interpretations based on this. However, these are limitations caused due to using observational EMR as a data source and can be resolved by investigations using a randomized clinical trial or prospective design. Third, we considered the effect of biomarkers along with diagnosis and medication use, however in our results comorbidities had a higher impact compared to biomarkers. This can be explained since diagnosis considers the past status of the patients while biomarkers take into consideration the recent status of PTSD and ASUDs. Fourth, there are SDoH data inconsistencies due to missing data. Fifth, due to data limitations, we only mapped neighborhood level SDoH parameters based on zip codes of patient information extracted from EMR data, we did not include individual level SDoH (e.g., income, home address, education level) due to patient protection and privacy issues. Sixth, we did not include formal diagnosis by screening data but used DSM-5 diagnosis codes and due to discrepancies associated with DSM-5 diagnostic coding, there is a chance of inappropriate use of coding as seen in real practice. Our future studies will seek to nd remedies to the above shortcomings by using multiple data sets, improving our algorithms, including more insightful biomarkers, and integrating other forms of metadata and other biological measures relevant to ASUD risk and performing causal pathway analyses.

Conclusion
Our improvised data driven deep learning approach aimed to identify and examine biomarkers for the assessing the risk of ASUDs among PTSD patients, which can be utilized for developing novel interdisciplinary hypothesis surrounding its etiology.
Extrapolating our results and current information, we found medications like oxybutynin, magnesium oxide, clindamycin, cetirizine and montelukast all to have a potential to reduce risk of ASUDs among PTSD patients. That, being said we also found multiple SDoH parameters both conventional and unconventional ones to have signi cant contributions in ASUD risk prediction. The high accuracy of DeepBiomarker2 showed ASUD risk to be particularly higher among a subset of patients who may possibly experience elevated psychosocial risk coupled with pain medication treatment. This might provide a novel insight to our understanding of PTSD with ASUD in a more holistic manner. While universal prevention programs may offer current bene ts, these ndings from DeepBiomarker2 offer potentially valuable and re ned information that can be used to design and develop personalized prevention and intervention programs that are designed to address the psychosocial needs and health disparities existing amongst these high-risk patients.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.