How Routinely Assessed Biomarkers Can Be Utilized To Identify Individuals With A High Disease Burden: A Bioinformatics Approach Towards Predictive, Preventive And Personalised (3P) Medicine


The prevalence of non-communicable diseases, such as depression and a range of somatic diseases, is continuously increasing, requiring simple and inexpensive ways to identify high-risk individuals to target with predictive and preventive approaches. Using k-means cluster analysis, in study 1, we identified biochemical clusters (based on C-reactive protein, interleukin-6, fibrinogen, cortisol, and creatinine) and examined their link to diseases. Analyses were conducted in a U.S. American sample (from the Midlife in the United States study, N = 1,234) and validated in a Japanese sample (from the Midlife in Japan study, N = 378). In study 2, we investigated the link of clusters to childhood maltreatment (CM). The three identified biochemical clusters included one cluster (with high inflammatory signaling and low cortisol and creatinine concentrations) indicating the highest disease burden. This high-risk cluster also reported the highest CM exposure. The current study demonstrates how biomarkers can be utilized to identify individuals with a high disease burden and thus may help to target these high-risk individuals with tailored prevention/intervention, towards personalized medicine. Furthermore, our findings raise the question of whether the identified biochemical clusters have predictive character, that is, whether they can serve as a tool to identify high-risk individuals and enable targeted prevention. The finding that CM was most prevalent in the high-risk cluster provides first hints that the clusters could indeed have predictive character and highlights CM as a central disease susceptibility factor and possibly as a leverage point for disease prevention/intervention.


Introduction
Prevalence and incidence of non-communicable diseases (NCD) are continuously increasing in numbers, causing a strong socio-economic as well as a medical burden to the healthcare systems. Economically speaking, the U.S. health care costs have steadily increased for four consecutive years, to reach 3.8 trillion U.S. dollars in 2019 [1,2]. NCD caused 90% of these costs as they result in massive long-term treatment costs and often present with comorbidities [1,2]. Thus, the prevention of NCD, and in this context the identification of at-risk individuals and sensitive biomarkers of disease risk, is more important than ever as it represents a leverage point to reduce the economic as well as the individual burden of diseases.
The two consecutive studies presented here demonstrate how routinely assessed biomarkers can be bioinformatically clustered and utilized to identify individuals with a high disease burden. Specifically, in study 1, we employed a clustering approach based on the levels of C-reactive protein (CRP), interleukin-6 (IL-6), fibrinogen, cortisol, and creatinine in a U.S. cohort and validated the identified clusters in a Japanese cohort (for a study overview see Figure 1). We then linked these biochemical clusters to documented diseases including depression, heart disease, hypertension, stroke, peptic ulcer disease (PUD), and cancer.
In study 2, we tested the association of childhood maltreatment (CM), a well-established early-life risk factor for developing mental and somatic disorders, with diseases as well as with the identified biochemical clusters from study 1.

Introduction
According to the Global Burden of Disease study (2017), between 1990 and 2017, disability-adjusted life years (DALYs) due to NCD increased from 1.2 to 1.6 billion. With that, NCD caused more than 60% of DALYs worldwide [4]. NCD not only cause individual suffering but also burden society as a whole through massive monetary and non-monetary costs [4,5]. Relying on interventions, no matter how effective they are, only after individuals are already ill is therefore a pivotal fallacy. Instead, current developments require simple and inexpensive ways to identify high-risk individuals to target with both preventive and interventive approaches. Furthermore, it is becoming increasingly clear that many well-established risk factors (such as Body Mass Index (BMI) outside the normal range [6], genetic risk factors [7,8], etc.) that are supposed to help identify individuals at high risk for certain diseases are not independent of the individual environment and do not behave the same way across different individuals, highlighting the importance of personalized, tailored approaches in the context of preventive medicine. The presence of one particular risk factor might not have much predictive character for negative outcomes without being considered systemically/holistically, that is, in the context of other physiological, environmental, psychological, and biochemical parameters and processes [e.g., 6-8]. Despite these intricacies, disease-predictive measures should be cost-efficient, making it possible to implement them in the health care system.
One particular concept that has become well-established in the literature is the concept of allostatic load (referring to the cumulative burden of chronic stress and adverse life events) with its suggested allostatic load index (ALI) [9]. ALI is a cumulative multi-system risk score based on physiological and biochemical measures [10]. For each system, risk indices are calculated as the proportion of biomarkers for which an individual falls into predefined high-risk quartiles.
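The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the system groupings and cut-off values below are hypothetical placeholders rather than the quartile boundaries used in the ALI literature, and the sketch assumes that higher values always indicate higher risk, which does not hold for every biomarker.

```python
# Hypothetical systems, markers, and high-risk cut-offs (illustrative only).
HIGH_RISK_CUTOFFS = {
    "inflammation": {"crp": 3.0, "il6": 3.0, "fibrinogen": 390.0},
    "hpa_axis": {"cortisol": 20.0},
}


def system_risk_index(values, cutoffs):
    """Proportion of a system's available markers at or above the cut-off.

    Returns None when none of the system's markers were measured.
    """
    available = [marker for marker in cutoffs if marker in values]
    if not available:
        return None
    flagged = sum(values[marker] >= cutoffs[marker] for marker in available)
    return flagged / len(available)


def allostatic_load_index(values, systems=HIGH_RISK_CUTOFFS):
    """Sum of the per-system risk indices, skipping unmeasured systems."""
    indices = {name: system_risk_index(values, cut) for name, cut in systems.items()}
    total = sum(index for index in indices.values() if index is not None)
    return total, indices


# Hypothetical participant: two of three inflammation markers are flagged.
person = {"crp": 4.0, "il6": 1.0, "fibrinogen": 400.0, "cortisol": 10.0}
total, per_system = allostatic_load_index(person)
```

Because each system index is computed separately and the indices are then added, a score of this form cannot express interactions between systems.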
As a systemic risk score, ALI is predictive of various outcomes, including all-cause mortality [11,12], but its conceptualization has some critical limitations. First, calculating a risk score as the sum of different system risk scores does not account for intersystemic interactions and the possible predictive effect of these interactions. This gap is unfortunate as ALI includes parameters that are not independent of each other, such as BMI and blood pressure [13]. Another concern refers to the practicability and implementation of ALI in the health care system. While ALI considers parameters that can be assessed relatively simply, it is still likely that, for most individuals, parameters are only partially available, limiting the predictive power of ALI. Together, ALI is a profound concept but artificially splits physiological processes that are woven into a holistic allostatic reaction, as acknowledged by the developers of ALI [14]. Furthermore, ALI lacks practicability, which is underlined by the fact that, to date, ALI has not been implemented in routine diagnostics.
Given the rising number of NCD, there is an urgent necessity to develop an approach that is practicable, cost-efficient, and, at best, based on biomarkers that are assessed in clinical routine, making it possible to identify high-risk individuals to target with specific preventive steps. The current study aimed to develop and validate an easily accessible measure that can realistically be implemented in routine diagnostics. Towards this aim and building on ALI, five biomarkers were chosen as they cover broad physiological functionality: CRP, fibrinogen, and IL-6 are pro-inflammatory markers (i.e., positively associated with inflammation); cortisol, as the end product of the hypothalamus-pituitary-adrenal axis, is an immunomodulatory mediator playing a crucial role in the stress response; and creatinine is important for cellular energy metabolism [15-19]. Contrary to ALI, a clustering approach based on these biomarkers makes it possible to account for linear and non-linear interactions among them and to link the resulting clusters to depression and a range of somatic diseases. To examine the association between biochemical clusters and diseases, we focused on depression, heart disease, hypertension, stroke, PUD, and cancer as these show the highest global prevalence, the fastest increase in numbers, and the most comorbidities [4]. We first clustered biochemical markers and related them to odds ratios (ORs) for diseases in a U.S. population sample and then repeated this process in a Japanese cohort. To ensure representativeness, both samples were recruited via random-digit dialing, which qualifies them for studies with results generalizable to the population. To ensure that the selected biomarkers and their clustering demonstrate robust applicability across different cultures and ethnicities [20], we chose one U.S. American and one Japanese sample to generate and validate the biochemical clusters.

Collection of Biosamples and the Assessment of Biochemical Markers
MIDUS. Blood samples were collected after overnight fasting for the assessment of CRP, IL-6, and fibrinogen, according to the manufacturer guidelines (Dade Behring Inc., Deerfield, IL for CRP and fibrinogen; R&D Systems, Minneapolis, Minnesota for IL-6) [20]. Plasma levels of CRP and fibrinogen were assayed using an immunonephelometric assay; IL-6 was quantitatively assessed using Enzyme-Linked Immunosorbent Assay (ELISA). The laboratory inter-assay coefficient of variation was 5.7% for CRP, 13% for IL-6, and 2.6% for fibrinogen, all below the acceptable limit of 20% [21].
To obtain a cumulative cortisol and creatinine measure, 12-hour overnight urine samples were also collected between 7 PM and 7 AM. Enzymatic Colorimetric Assays and Liquid Chromatography-Tandem Mass Spectrometry were performed at the Mayo Medical Laboratory in Rochester, Minnesota. Data were excluded if participants had renal failure or severe renal decline according to glomerular filtration rate [21].
MIDJA. CRP, IL-6, and fibrinogen were assessed analogously to MIDUS, while cortisol was assessed in saliva (three subsequent days, three times each day) and creatinine was assessed in blood. The 9 saliva measurements were averaged and used as a representative marker for cortisol concentrations [22].

Diseases
Depression, heart disease, hypertension, stroke/Transient Ischemic Attack (TIA), PUD, and cancer were assessed via self-report. Participants were asked whether they had been diagnosed with any of these diseases at the time of study participation.

Statistical Analyses
First, the potential collinearity of the biomarker levels was assessed by calculating Pearson correlations among CRP, fibrinogen, IL-6, creatinine, and cortisol. After randomizing the order of participants [23], we performed a k-means cluster analysis with these markers in the MIDUS sample using IBM SPSS Statistics 27. To ensure the stability of clusters, we repeated the clustering process in subsamples [23]. Specifically, we conducted a median split based on age and performed the clustering for each group separately to assess whether the clusters are age-dependent. For the same purpose, we repeated the clustering procedure after excluding participants with a BMI outside the healthy range (below 18 or above 35). The next step was to repeat the biochemical clustering that had been performed for the whole MIDUS sample in the MIDJA cohort. Finally, z-tests were used to compare ORs for diseases among clusters.
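The text does not state the exact form of the z-test. One common way to compare two odds ratios is a Wald-type z-test on the log scale, sketched below. The 2x2 table layout, the function names, and the example counts are assumptions for illustration. The test also assumes the two ORs come from independent groups, which is only approximately true when clusters share a reference group.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log OR and its standard error from a 2x2 table.

    a, b: diseased / non-diseased inside the cluster
    c, d: diseased / non-diseased in the comparison group
    All four cells must be non-zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def compare_odds_ratios(table1, table2):
    """Two-sided z-test for the difference between two log odds ratios."""
    log_or1, se1 = log_odds_ratio(*table1)
    log_or2, se2 = log_odds_ratio(*table2)
    z = (log_or1 - log_or2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical counts, not taken from MIDUS or MIDJA.
z, p = compare_odds_ratios((20, 10, 10, 20), (10, 20, 20, 10))
```

Working on the log scale is the usual choice because the sampling distribution of the log OR is closer to normal than that of the OR itself.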

Preliminary Analyses
In both MIDUS and MIDJA samples, biomarkers were positively correlated (see SI Tables 4 and 5).

K-Means Clustering
We used z-standardized biomarkers for k-means clustering and evaluated the clustering results from k = 2 to 6 clusters for MIDUS. When k = 2, the patterns of clusters were not distinct enough; when k = 4 or above, some clusters were very small in size (i.e., smallest cluster portion: 8%). Based on the principle of parsimony combined with the need for meaningful differences among clusters, k = 3 was selected for the subsequent analyses. Figure 2 illustrates the distributions of the three identified clusters with respect to the biochemical markers. We replicated all three clusters in the younger MIDUS cohort as well as clusters 1 and 2 in the older MIDUS cohort (SI Figures 7 and 8). We further replicated all three clusters in the BMI-restricted MIDUS cohort (SI Figure 9). Then, the 3-cluster solution from MIDUS was validated in the MIDJA sample; the results are shown in Figure 3.
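The model-selection step described above (z-standardize, cluster for several values of k, inspect the size of the smallest cluster) can be sketched as follows. This is a plain-Python illustration, not the SPSS procedure used in the study. The deterministic farthest-first initialization, the function names, and the synthetic data are assumptions, and SPSS may choose initial centers and handle convergence differently.

```python
import statistics


def z_standardize(rows):
    """Column-wise z-scores so that no marker dominates the distances."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # columns must not be constant
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centers(points, k):
    """Deterministic start: each new center is the point farthest from the others."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def k_means(points, k, n_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def smallest_cluster_share(labels, k):
    return min(labels.count(j) for j in range(k)) / len(labels)


# Synthetic data: three well-separated groups of 40 "participants", 5 markers each.
data = [[mu + 0.01 * ((i * (d + 1)) % 7) for d in range(5)]
        for mu in (0.0, 5.0, 10.0) for i in range(40)]
z_data = z_standardize(data)
shares = {k: smallest_cluster_share(k_means(z_data, k), k) for k in range(2, 7)}
```

Inspecting `shares` for each k mirrors the criterion in the text, where solutions with very small clusters were rejected in favour of the more parsimonious k = 3.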
As depicted in Figures 2 and 3, cluster 1 is characterized by average levels in all biochemical measures. Cluster 2 is characterized by high and above-average levels of CRP, IL-6, and fibrinogen. Cluster 3 is characterized by high and above-average levels of cortisol and creatinine but average levels of CRP, fibrinogen, and IL-6.

Associations Between Biochemical Clusters and Disease States
MIDUS. Cluster 2 had the highest ORs for all considered diseases compared to clusters 1 and 3 (Figure 4, SI 10).
MIDJA. Cluster 3 had the highest ORs for heart disease, hypertension, and PUD, cluster 2 had the highest ORs for stroke and cancer, and cluster 1 had the highest ORs for depression (Figure 5).
To compare this cluster-based approach to a well-established clinical biomarker that is associated with a broad range of NCD, the number of diagnoses among individuals in cluster 2 was compared to the number of diagnoses among individuals with CRP concentrations above the clinical cutoff (> 3 mg/L) [24]. The disease burden was higher in cluster 2, with 1.6 diagnoses (SD=1.16; 0.9 diagnoses for individuals not assigned to cluster 2), than among individuals above the CRP cutoff, with 1.2 diagnoses (SD=1.07; 0.9 diagnoses for individuals below the cutoff).
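The comparison above amounts to computing the mean number of diagnoses inside and outside two differently defined groups. A minimal sketch follows. The record layout, field names, and participant values are hypothetical, and only the 3 mg/L cutoff is taken from the text.

```python
from statistics import fmean

CRP_CUTOFF_MG_L = 3.0  # clinical cutoff cited in the text


def mean_diagnoses(records, in_group):
    """Mean diagnosis count inside and outside the group defined by `in_group`."""
    inside = [r["n_diagnoses"] for r in records if in_group(r)]
    outside = [r["n_diagnoses"] for r in records if not in_group(r)]
    return fmean(inside), fmean(outside)


# Hypothetical participants, not study data.
records = [
    {"cluster": 2, "crp": 6.1, "n_diagnoses": 3},
    {"cluster": 2, "crp": 4.0, "n_diagnoses": 1},
    {"cluster": 1, "crp": 3.5, "n_diagnoses": 1},
    {"cluster": 1, "crp": 0.8, "n_diagnoses": 0},
    {"cluster": 3, "crp": 1.2, "n_diagnoses": 1},
]

by_cluster = mean_diagnoses(records, lambda r: r["cluster"] == 2)
by_crp = mean_diagnoses(records, lambda r: r["crp"] > CRP_CUTOFF_MG_L)
```

The two groupings overlap but are not identical, which is why the pattern-based and the single-marker criteria can yield different mean disease burdens.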

Discussion
Findings reveal three distinct and interculturally stable biochemical clusters observable in the general population. Cluster 1 is characterized by average levels of all biomarkers, cluster 2 by high inflammation-related mediators coupled with low cortisol and creatinine, and cluster 3 by high levels of cortisol and creatinine. The stability of clusters is supported by their replication in the MIDJA sample as well as in the BMI-restricted, the younger (below age median), and the older MIDUS cohort (above age median; here only clusters 1 and 2 were replicated). One explanation for the failure to replicate cluster 3 in the older MIDUS cohort could be that, due to an age-related increase in systemic inflammation [25], older individuals were not assigned to cluster 3, which is characterized by low inflammation.
Relating clusters to diseases, in MIDUS, cluster 2 showed the highest ORs for depression, heart disease, hypertension, stroke, and cancer (Figure 4). These findings are supported by previous evidence suggesting that CRP, IL-6, and fibrinogen are associated with depression [26,27], coronary heart disease [28-31], blood pressure [32], stroke [33-35], and cancer [36,37]. However, contrary to these previous studies, the clustering approach used in this study allowed us to account for well-known collinearities between biomarkers and thus promotes a more holistic perspective. Specifically, the findings build on previous studies suggesting a link between inflammation and diseases [25] by demonstrating that it might not be one specific biomarker but a specific biochemical pattern (i.e., high CRP, IL-6, and fibrinogen coupled with low cortisol and creatinine) that is associated with diseases. This idea is supported by the observation that individuals in cluster 2, descriptively, show a higher disease burden than individuals above the clinically well-established CRP cutoff.
Interestingly, we found no differences in the ORs for PUD between clusters despite the role of inflammation in its pathology [38]. Future research may aim to further examine the role of inflammatory signaling in the pathology of PUD.
While the cluster with high levels of CRP, IL-6, and fibrinogen can be considered a high-risk cluster, cluster 3 with high levels of cortisol and creatinine but low inflammation may be considered a protective cluster in MIDUS. We found that ORs for most diseases were lower in cluster 3 compared to the high-risk cluster but also as compared to cluster 1 with average levels of all biomarkers. Concerning cancer, this difference became significant, potentially suggesting a protective character of this cluster. This would be in contrast to studies suggesting a link between hypercortisolism and disease outcomes [39,40]. However, the combination of low inflammation and high cortisol and creatinine as in cluster 3 might indicate the integrity of the glucocorticoid negative feedback system, protecting from negative health outcomes [41]. Longitudinal studies may examine the consequences of this specific biochemical pattern. Towards this aim, we will examine MIDUS follow-up data (10 years after biomarker assessments) with respect to mortality outcomes.
In MIDJA, cluster 2 only seems to be a high-risk cluster for stroke and cancer, while for the other considered diseases, cluster 1 or cluster 3 indicates the highest burden. One aspect to consider here is that the MIDJA sample (N=378) and especially cluster 2 (N=30) were very small in size. It is, therefore, possible that the present findings lack reliability. However, different biochemical patterns may be associated with different outcomes in the Japanese compared to the U.S. American population because moderating mechanisms such as BMI, nutrition, and medication differ between populations [41]. This idea is supported by the finding that although in both MIDUS and MIDJA, approximately 8% of participants were assigned to cluster 2, the disease burden in MIDJA was much lower compared to MIDUS. This highlights the importance of individual aspects in disease susceptibility mentioned above and the role of interactions among different cultural, lifestyle, and biochemical factors: while an assignment of a U.S. American individual to cluster 2 might be associated with a high disease burden, this might not be the case for a Japanese individual with a similar biochemical profile. Future studies should aim to examine the identified biochemical clusters in other cultural contexts, promoting a better understanding of their associative and predictive character in multiple populations. From a preventive perspective, this may also help to further refine targeted prevention, that is, to better understand which biochemical profile is associated with what disease susceptibility under what conditions.
Limitations. Our work has several strengths, such as the validation of the clusters in an independent Japanese sample and the representative character of the cohorts. Yet, the findings face limitations. First, the present study is cross-sectional and does not allow causal inferences. Second, the MIDJA sample size was relatively small. It is, therefore, possible that ORs lack reliability. Third, methodological inconsistencies (urine cortisol and creatinine levels in MIDUS, average saliva levels of cortisol and blood levels of creatinine in MIDJA) between the cohorts may have impacted the clustering process. Fourth, diseases were assessed via self-report, which bears the risk of a report bias.
Conclusion. While the interactions among biomarkers make the distinction of their outcomes challenging, the design of the current study helps to gain a better understanding of which biochemical patterns are present in the general population and how these patterns contribute to different physiological states on a systemic scale. We identified and replicated three distinct biochemical signatures in two mid-life populations, including one cluster with collinearly occurring elevated levels of CRP, fibrinogen, and IL-6 as well as low levels of cortisol and creatinine that indicated the highest prevalence of stroke and cancer.
Future longitudinal studies should aim to test the predictive character of the clusters found in this study, because, if clusters are indeed predictive in terms of risk evaluation, then they would represent a valuable clinical tool for both diagnostics and prevention of diseases. Specifically, if high-risk individuals can be identified by the clustering approach presented here, then these individuals could be provided with personalized treatment options including psychotherapy, e.g., in cases where CM is prevalent, anti-inflammatory drugs, or treatment supplements, e.g., nutrition and exercise plans.

Study 2 Introduction
Childhood maltreatment (CM) is an umbrella term that includes any act of emotional, physical, and sexual abuse as well as emotional and physical neglect experienced until the age of 18 [42]. CM can have a myriad of negative effects on survivors' mental and somatic health. The association between CM and inflammation is well established and may underlie the increased prevalence of somatic and mental disorders in CM-exposed individuals [16, 43-45]. Thus, CM, which is still an underestimated phenomenon in somatic/clinical settings, might be a disruptive factor in the context of both personalized medicine and targeted prevention, as it may amplify and interact with other disease susceptibility factors, resulting in a massive increase and expansion of an individual's disease risk and development. Therefore, in study 2, the association of CM with disease prevalence as well as with the assignment to the biochemical clusters was investigated.
We used the MIDUS sample for these analyses, as CM was not assessed in MIDJA. Based on previous literature, we expected to find higher exposure to CM in clusters with high inflammation as compared to clusters with low inflammation [16, 43-45].

Assessment of Childhood Maltreatment
CM was assessed using the Childhood Trauma Questionnaire (CTQ; Bernstein & Fink, 1998). As a retrospective self-report measure with 28 items, the CTQ assesses five types of CM: emotional, physical, and sexual abuse as well as emotional and physical neglect. It also assesses the tendency to minimize CM [46].

Statistical Analyses
Cut-off values for moderate CM exposure were used to create dichotomous variables for each CTQ subscale (emotional abuse ≥13; physical abuse ≥10; sexual abuse ≥8; emotional neglect ≥15; and physical neglect ≥10) [46]. A composite variable was then computed indicating exposure to at least one category of moderate to severe abuse or neglect (CM+) vs. no or low exposure (CM−) [46]. Using the moderate cutoff variable, the prevalence of CM was calculated for the whole sample. Next, we compared general disease burden as well as the prevalence of specific diseases in individuals without and with CM experiences using χ² tests and t-tests. Then, a continuous total score of the CTQ was calculated by summing up the scores across all items. This continuous score was used in a General Linear Model (GLM) comparing CM among clusters, with pairwise comparisons corrected for sex, age, BMI, physical activity, alcohol and smoking habits as well as for multiple testing (Bonferroni). To avoid issues resulting from heteroscedastic residual variances, we performed bootstrapping (10,000 samples). Bootstrapping, which allows finding robust parameter estimates (i.e., independently of the homoscedasticity assumption of residual variances), is considered the gold standard approach here since our clusters are stable and since none of the covariates included in the GLM is involved in the clustering process [47].
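The dichotomization and the composite CM+/CM− variable can be illustrated with a short Python sketch. The cutoff values are the ones stated above. The dictionary keys, function names, and example scores are hypothetical, and the sketch assumes that subscale sum scores are already available for each participant.

```python
# Moderate-exposure cutoffs per CTQ subscale, as listed in the text.
CTQ_MODERATE_CUTOFFS = {
    "emotional_abuse": 13,
    "physical_abuse": 10,
    "sexual_abuse": 8,
    "emotional_neglect": 15,
    "physical_neglect": 10,
}


def cm_exposure(subscale_scores):
    """Return (CM+ flag, per-subscale flags) for one participant."""
    flags = {name: subscale_scores[name] >= cutoff
             for name, cutoff in CTQ_MODERATE_CUTOFFS.items()}
    return any(flags.values()), flags


def cm_prevalence(sample):
    """Share of participants classified as CM+ in a list of score dicts."""
    return sum(cm_exposure(scores)[0] for scores in sample) / len(sample)


# Hypothetical participants, not study data.
low = dict.fromkeys(CTQ_MODERATE_CUTOFFS, 5)
exposed = {**low, "sexual_abuse": 8}
prevalence = cm_prevalence([low, low, low, exposed])
```

A participant counts as CM+ as soon as a single subscale reaches its cutoff, which matches the "at least one category" rule in the text.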

Results
About one-third of participants (36.1%) reported at least moderate CM on at least one CTQ subscale. Individuals exposed to CM had a higher overall disease burden with 1.12 (SD=1.03) diagnoses on average compared to .85 (SD=.93) diagnoses in individuals without CM history (t(1192)=-4.549, p<.001). This difference was mainly driven by the higher prevalence of depression in CM-exposed individuals (36.2%) compared to individuals without CM (16.9%, χ²(1)=61.72, p<.001).
CM exposure differed between biochemical clusters, with 45.1% of individuals in cluster 2 reporting at least moderate CM on at least one of the CTQ subscales (28.4% without CM), compared to 35.9% in cluster 1 (37.1% without CM) and 30.8% in cluster 3 (43.6% individuals without CM). GLMs using the continuous CM score indicated (SI Table 13) the highest CM exposure in cluster 2, followed by clusters 1 and 3 (all ps<.001).

Discussion
The CM prevalence found here is in line with meta-analytic findings [48], and the result that CM-exposed individuals have a higher disease burden compared to non-exposed individuals is supported by previous evidence [49-51]. Given the association of CM with inflammatory processes [16, 43-45], one mechanism possibly linking CM to diseases might be the biochemical clusters from study 1. As we found that cluster 2 had the highest CM exposure and also the highest disease prevalence, specific biochemical profiles may underlie the association between CM and disease burden. If that is the case, clusters may represent a future leverage point for targeted prevention, enabling CM-exposed individuals to overcome the abusive experience and its stress-related health consequences through, e.g., psychotherapy and support groups before severe disease develops and manifests. However, this idea faces the limitation that we could not statistically test the mediating role of the biochemical clusters in the link between CM and disease prevalence, as both the possible mediator (clusters) and the dependent variables (disease yes/no) were categorical. To gain deeper insight into this issue, our aim with the MIDUS follow-up data (10 years after biomarker assessments) is to examine whether CM-exposed individuals in cluster 2 indeed show more detrimental outcomes than CM-exposed individuals in the other two clusters.
Limitations. The present findings should be considered in light of the limitation that we used retrospective self-reported measures of CM. Therefore, report and memory biases are possible. Although the value of self-reported measures of CM when investigating its correlates and outcomes has been emphasized [52], future studies should also aim to assess CM via official reports and compare its relations to biochemical clusters and diseases. CM was not available in the Japanese cohort; therefore, the associative nature of CM with the identified clusters in the U.S. sample needs future replication in independent cohorts. As this study was cross-sectional, causal inferences cannot be drawn without subsequent research.
Conclusion. Findings complement existing literature indicating detrimental longer-term implications of CM on survivors' health. Results highlight the importance of identifying CM as early as possible before it manifests itself biologically and possibly increases disease vulnerability. We thus encourage professionals in preventive and medical care contexts to be attentive to reports of CM and to consider these in individual treatments; validated screening instruments are available in multiple languages (e.g., CTQ) [46].

Summary And Concluding Discussion
Our findings suggest three distinct biochemical signatures that are replicable and interculturally stable. One of them is a high-risk cluster indicated by its high disease burden. Due to the cross-sectional character of this study, it might also be that the biochemical clusters are consequences of diseases; however, study 2, which demonstrates a strong link between the high-risk cluster and CM, provides first hints that the clusters could indeed be pre-disease markers affecting the vulnerability to diseases. Future studies should aim to test the predictive character of clusters to evaluate their applicability as pre-disease markers. Further, integrating CM screenings in standard medical practice may be a promising way of identifying individuals at risk and of developing tailored prevention and intervention.

Expert Recommendations
The assessment of CRP, IL-6, fibrinogen, cortisol, and creatinine should be mandatory in all 3PM (i.e., personalized medicine, targeted prevention, and predictive diagnostics) disciplines to get a global insight into an individual's current health condition. High inflammatory signaling coupled with low compensation, that is, with low cortisol and creatinine, is a detrimental biochemical profile associated with a high disease burden and should be taken as a reason for further examination (especially with respect to artery condition/stroke and cancer) and personalized treatments involving anti-inflammatory drugs, nutrient substitutions, and treatment supplements, e.g., nutrition and exercise plans. Furthermore, individuals with this biochemical profile should be examined with a special focus on early life stress and especially CM. In cases where CM is prevalent, its role in the patient's individual condition pattern should be examined thoroughly and psychotherapy or other stress-reducing interventions should be offered/employed. Future research should examine the predictive character of the identified biochemical clusters with respect to long-term well-being, mental and physical health, as well as mortality. Ideally, these studies should examine different cultures, promoting a better understanding of the generalizability and limits of the predictive power of the identified biochemical clusters. Furthermore, this future research may suggest additional factors to be taken into account together with the biochemical clusters, helping to advance and refine disease prediction and, hence, to improve both targeted prevention and personalized interventions.
Disclosure: The authors report no financial relationships with commercial interests and nothing to disclose.
