Sex-stratified multimorbidity trajectories in UK Biobank cohort identify triage rules for the risk of mortality and hospitalisation in secondary care

doi:10.21203/rs.3.rs-3909196/v1


This work is licensed under a CC BY 4.0 License

Version 1


Clinical presentation of diseases is complicated by multimorbidity. There is a pressing need to understand the effects of multimorbidity and where interventions should be targeted. We performed a data-driven analysis of whole-cohort UK Biobank hospital inpatient data in women and men and assembled ICD10 disease sequence trajectories. Age-relative 1-year mortality and hospitalisation rates were calculated post-trajectory using Accelerated Failure Time models with a 1:3 case-control ratio. We show that prior disease trajectories can stratify 1-year post-diagnosis mortality and hospitalisation outcomes for 63 common diseases in secondary care and highlight the impact of prior disease trajectories on mortality outcomes for respiratory failure, renal failure, nerve disorders, hypotension, influenza/pneumonia, and sepsis. Mortality and hospitalisation rates varied from 1.05 to 17594.44 and 2.85 to 582.99 times faster than age-matched controls, respectively. From this, we create triage rules that identify the highest risk multimorbid patients and highlight where intervention can have the greatest impact.

Health sciences/Risk factors

Health sciences/Medical research/Outcomes research

Health sciences/Health care/Prognosis

Multimorbidity

comorbidity

multiple long-term conditions

MLTC

mortality

hospitalisation

Multimorbidity, defined as the simultaneous diagnosis of two or more long-term conditions, is common but not well understood. A recent meta-analysis estimated the global prevalence of multimorbidity at 37.2% (Chowdhury et al., 2023). Multimorbidity appears to be growing: a quadrupling in the prevalence of patients with five or more conditions from 2000 to 2014 has been reported (Tran et al., 2018). Incidence has been linked to socioeconomic deprivation, with patients in the most deprived decile in Scotland showing 4.6% more morbidities than patients in the least deprived decile, across all ages (Barnett et al., 2012).

The clinical impact of multimorbidity is significant, and it has been identified as an international health priority (Academy of Medical Sciences, 2018; WHO, 2016). Multimorbid patients account for 58% of primary care patients and 78% of primary care consultations (Salisbury et al., 2011). Prescriptions accumulate with increasing multimorbidity, making multimorbidity a driver of polypharmacy (Guthrie et al., 2012; Hughes et al., 2013). Increasing polypharmacy has been reported in a Scottish population, with the proportion of patients receiving 5 or more and 10 or more drugs growing from 11.4% and 1.7% in 1995 to 21.1% and 5.8% in 2010, respectively (Guthrie and Makubate, 2015). Reducing the burden of multimorbidity is an important step in achieving the Sustainable Development Goal of reducing premature mortality from non-communicable diseases by one-third by 2030 (Chowdhury et al., 2023).

The UK Biobank (UKBB) comprises data on 502,386 participants from England, Scotland and Wales aged between 40 and 69 at recruitment (Collins, 2012). Access to clinical and molecular data makes it well suited to studies of disease coincidence. Several studies have explored how diagnoses statistically cluster in patient records of the UKBB. Hierarchical clustering and association rule mining have been applied to 36 chronic conditions, identifying 3 clusters which highlight the importance of diabetes, hypertension and asthma in multimorbidity (Zemedikun et al., 2018). A clustering of over 400 diseases based on risk factors identified 24 clusters and found strong sex-dependent associations of disease risk with BMI (Webster et al., 2021). Another study identified 5 clusters in UKBB by applying Charlson’s broad disease classification and multiple correspondence analysis (Prasad et al., 2022). Pairwise comorbidity relationships have been investigated in the UKBB for patients with depression and have highlighted the strong relationship of anxiety and obesity with depression (Marx et al., 2017). However, little work has been done on how diagnoses dynamically accrue over time as patients age and multimorbidity increases, and few links have been made with how a patient’s risk of outcomes such as mortality or hospitalisation changes as diseases accrue. A study utilising the Danish National Patient Registry to evaluate the influence of disease history on sepsis mortality revealed that prior trajectories of multimorbidity, many of which start from alcohol abuse, diabetes and cardiovascular disease, increase risk of death (Beck et al., 2016). This approach illustrates the potential of temporal diagnosis trajectories to identify patient subgroups at high risk of mortality or hospitalisation. However, there has not been an attempt to compare mortality risk associated with multimorbidity patterns across the full spectrum of common diseases and identify diseases where the effect of multimorbidity on mortality is largest.

Here, we explore the dynamical diagnosis trajectories of participants from the UKBB and identify which trajectories are associated with severe risk of hospitalisation and death. We examine diagnosis order and timing and explore how the clustering of diagnoses develops over time. We look at individual diagnosis trajectories and identify those that are associated with greatest risk. In each case, we distinguish sexes and observe the similarities and differences. We arrive at a set of rules for participant diagnosis histories that triage for risk of hospitalisation and mortality. We hope that this work will yield insights that help clinical resources be delivered to patients with the greatest need, improving the efficiency and efficacy of clinical care within the growing population of multimorbid patients.

2.1 Characterisation of diagnosis trajectories across the UK Biobank

We obtained ICD10-coded diagnoses and dates of diagnoses from inpatient admissions records for UKBB participants and excluded ICD10 codes describing procedures/interactions with healthcare instead of disease conditions. The majority of patients in the UK Biobank show multimorbidity, with 61.6% of 273,388 female and 60.8% of 229,125 male patients diagnosed with 2 or more diseases (Fig. 1A). As diagnoses accrue, the order and timing can vary between patients. We calculated the duration from the patient’s first diagnosis to each subsequent diagnosis. From the first diagnosis, 50% of diagnoses occur after 3.9 years for males and 4.3 years for females (Fig. 1B). Diagnoses accrue linearly until approximately 17 years (Fig. 1B), suggesting that diagnoses are initially uniformly distributed over time in both females and males. We also calculated the position of each diagnosis in the order of diagnoses. Median and quartiles were calculated (see Supplementary Fig. 1) and for each ICD10 block we compared the median position in diagnosis order to median time to diagnosis from first diagnosis for females (Fig. 1C) and males (Fig. 1D).

The relationship between position in a disease trajectory and the time-to-diagnosis is broadly linear. Blocks that appear above the trend line are part of trajectories containing fewer diagnoses per unit time than would be expected and are slower developing. Blocks that appear below the trend line are part of trajectories containing more diagnoses per unit time than would be expected and faster developing. The trend line rises more steeply for females than males, suggesting that females accrue morbidities more slowly than males (Fig. 1B).

We identified ICD10 blocks that sit outside 1 and 2 standard deviations of the trend line (see Supplementary Tables 1 and 2) and therefore develop faster or slower than would be expected from the trend line. For females, the fastest morbidity accruing trajectories include Disorders of breast, Noninflammatory disorders of female genital tract, Inflammatory diseases of female pelvic organs and Other diseases of pleura. Blocks belonging to the slowest morbidity accruing trajectories include Mental and behavioural disorders due to psychoactive substance use. For males, the fastest morbidity accruing trajectories include Ischaemic heart diseases, Diseases of oesophagus, stomach and duodenum, Hernia, Urolithiasis, Infections of the skin and subcutaneous tissue, and Disorders of skin appendages. The slowest morbidity accruing trajectories include Intestinal infectious diseases, Obesity and other hyperalimentation, and Mental and behavioural disorders due to psychoactive substance use.

We defined 6 clusters from the ICD10 block diagnosis combinations of participants and observed how participants transition between clusters over time (Fig. 1E-F; Supplementary Fig. 2; Supplementary Tables 3–5). Based on the disease enrichment within clusters (Supplementary Tables 4–5), clusters were labelled as ‘Complex, high morbidity’, ‘Low morbidity’, ‘Digestive’, ‘Eye’, ‘Cardiometabolic’, and ‘Musculoskeletal’. Interestingly, 30.6% of males at first presentation (T = 0) fit a multimorbidity cluster pattern other than ‘Low morbidity’ (Supplementary Table 3). In contrast, the same figure in females was 16.5%, with the greatest difference observed in the frequency of the ‘Cardiometabolic’ cluster at presentation (8.9% of males versus 3.5% of females) (Fig. 1E-F; Supplementary Table 3). This indicates that multimorbidity, particularly cardiometabolic multimorbidity, is more advanced in males than females at first presentation in secondary care.

Mortality is lower for female participants than for male participants. A greater proportion of female participants stayed in the ‘Low morbidity’ cluster across their diagnosis trajectories or transitioned into the ‘Digestive’ or ‘Musculoskeletal’ clusters, whereas a greater proportion of male participants transitioned into the ‘Cardiometabolic’ cluster. ‘Complex, high morbidity’ was rarely found at first presentation and took the longest of all patterns to develop: 47.4% of female and 44.5% of male ‘Complex, high morbidity’ cluster cases developed this pattern between 11 and 21.5 years and between 10 and 21.5 years after first diagnosis, respectively (Fig. 1E-F; Supplementary Table 3).

Whole-cohort clustering provides a low dimensional representation of the dynamics of the whole dataset but conflates and obscures the details of individual trajectories and this limits the translational value of the analysis. Consequently, we sought to explore individual participant diagnosis trajectories.

2.2 Exploring coincident diagnoses

We considered which diagnoses were most likely to appear on the same trajectory, introducing the Jaccard coefficient as a metric of diagnosis pair frequency. We identified the enriched diagnosis pairs and grouped ICD10 blocks by chapter. The list of ICD10 codes is included (Fig. 2A). We can see that coincidence in ICD10 chapters is not determined by many consistently coincident blocks, but instead by a small number of highly coincident blocks (Figs. 2B-C).

The Jaccard coefficient is a metric normalised against sample size. However, we see a general trend that diagnoses belonging to strongly coincident block pairs (in red) are highly prevalent within the UK Biobank population (Figs. 2B-C). This pattern is not universal. We see in the comorbidities of neoplasms (C00-D49) and respiratory conditions (J00-J99) that the most strongly coincident block pairs are not the most common and that the most common block pairs are not the most strongly coincident.

Coincidence rates of diagnosis pairs are highly correlated (r = 0.9) between males and females, though the Jaccard coefficient is generally greater in males (see Fig. 2D). A notable exception to this trend is a higher coincidence in females of Disorders of thyroid gland (E00-E07) with a number of diseases including Hypertensive diseases (I10-I15), Metabolic disorders (E70-E90), Arthrosis (M15-M19), Diseases of oesophagus, stomach and duodenum (K20-K31), and Other diseases of intestines (K55-K64). Conversely, males had greater coincidence in Hypertensive diseases (I10-I15) and Ischaemic heart diseases (I20-I25), Malignant neoplasms of digestive organs (C15-C26) and Malignant neoplasms of ill-defined, secondary and unspecified sites (C76-C80), Metabolic disorders (E70-E90), Ischaemic heart diseases (I20-I25) and Other forms of heart disease (I30-I52).

2.3 Mortality and hospitalisation outcomes of diagnosis trajectories

With the significant trajectories identified in females and males (see Supplementary Tables 6 and 7 and Supplementary Figs. 3–7), we used Accelerated Failure Time (AFT) models (Wei, 1992) to identify the diagnosis trajectories associated with 1-year post-diagnosis mortality and hospitalisation. Cases were matched in a 1:3 ratio with random individuals of the same age as that at which the final diagnosis in the trajectory (the end point) occurred.
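The 1:3 age matching described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `match_controls`, the dictionary inputs, sampling without replacement, and dropping cases with too few eligible controls are all assumptions made for the example.

```python
import random
from collections import defaultdict

def match_controls(cases, pool, ratio=3, seed=0):
    """Match each case to `ratio` random controls of the same age.

    cases: {participant_id: age at trajectory end point} (hypothetical input)
    pool:  {participant_id: age} of candidate controls (hypothetical input)
    Assumption: controls are drawn without replacement.
    """
    rng = random.Random(seed)
    by_age = defaultdict(list)
    for pid, age in pool.items():
        if pid not in cases:              # a case cannot be its own control
            by_age[age].append(pid)
    for ids in by_age.values():
        rng.shuffle(ids)                  # random order within each age group
    matched = {}
    for case_id, age in cases.items():
        available = by_age[age]
        if len(available) < ratio:
            continue                      # assumption: unmatched cases are dropped
        matched[case_id] = [available.pop() for _ in range(ratio)]
    return matched
```

The matched case and control sets would then be passed to an AFT model; the model fit itself is not shown because the source does not specify the software or distribution used.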

We found 901 and 1328 trajectories associated with increased mortality for females and males, respectively (Supplementary Tables 8 and 9). Median mortality risk was 223- and 217.9-fold higher than for random age-matched participants, for females and males, respectively (Supplementary Fig. 8A-B).

We also found 1739 and 1727 trajectories associated with increased hospitalisation for females and males, respectively (Supplementary Tables 10 and 11). Median hospitalisation risk was 29.5- and 35.3-fold higher than for random age-matched participants, for females and males, respectively (Supplementary Fig. 8C-D).

To build triage rules based on diagnosis trajectories, we visualised the AFT estimates grouped by endpoint (Fig. 3A-D). We introduced 6 categories of risk for both mortality and hospitalisation 1-year outcomes, based on the quantile at which an estimate falls within the distribution of estimates (Supplementary Fig. 8A-D): ‘Low risk’ (quantile < 20%), ‘Low-intermediate risk’ (20% ≤ quantile < 40%), ‘Intermediate risk’ (40% ≤ quantile < 60%), ‘Intermediate-high’ (60% ≤ quantile < 80%), ‘High’ (80% ≤ quantile < 90%), and ‘Very high’ (quantile ≥ 90%).
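As an illustration of the six-category scheme, the sketch below assigns a category to each estimate from its empirical quantile. It is an assumption-laden example, not the authors' code: the function name `risk_categories` is hypothetical, and the quantile is taken here to be the share of estimates strictly below the value.

```python
from bisect import bisect_left, bisect_right

LABELS = ["Low", "Low-intermediate", "Intermediate",
          "Intermediate-high", "High", "Very high"]
CUTS = [0.20, 0.40, 0.60, 0.80, 0.90]   # quantile thresholds from the text

def risk_categories(estimates):
    """Label each AFT estimate by its empirical quantile within `estimates`."""
    ordered = sorted(estimates)
    n = len(ordered)
    labels = []
    for value in estimates:
        # Assumption: quantile = fraction of estimates strictly below `value`.
        quantile = bisect_left(ordered, value) / n
        # bisect_right places a quantile equal to a threshold in the upper category.
        labels.append(LABELS[bisect_right(CUTS, quantile)])
    return labels
```

Because the categories depend only on rank, they would be computed separately for each sex and each outcome, matching the separate distributions shown in the supplementary figure.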

Substantial variation in outcomes was observed across prior histories of diagnosis. Age-relative fold increases in mortality rate across prior histories varied from 1.1 to 9803.9 in females and from 2.1 to 17594.4 in males (Fig. 3A-B). Diagnosis histories associated with Other bacterial diseases (A30-A49), Other and unspecified disorders of the circulatory system (I95-I99), Influenza and pneumonia (J09-J18), Other diseases of the respiratory system (J95-J99) and Renal failure (N17-N19) exhibited the greatest spread in risk for both females and males demonstrating the significance of prior history in mortality risk. Other disease endpoints had only a smaller number of specific histories exhibiting very high risk including Other forms of heart disease (I30-I52) where females with a history of both Hypertensive diseases (I10-I15) and Diseases of veins, lymphatic vessels and lymph nodes, not elsewhere classified (I80-I89) had a very high risk of mortality (Fig. 3A + Supplementary Table 8) and Other diseases of the digestive system (K90-K93) where males with a history of Nerve, nerve root and plexus disorders (G50-G59) had a very high risk of mortality (Fig. 3B + Supplementary Table 9). This suggests there are instances where high risk of 1-year mortality is highly conditional on the presence of a prior history of disease.

Hospitalisation outcomes were more variable across endpoints than mortality outcomes, indicating that triage of patients for future hospitalisation based on disease histories may have utility across a wider number of diagnosis histories than triage for future mortality (Fig. 3C-D). Age-relative fold increases in hospitalisation rate across prior histories varied from 4.2 to 459.1 in females and from 2.9 to 583.0 in males. Diagnosis endpoints with variable mortality rates based on prior history also generally exhibited variable hospitalisation rates, though the converse generally was not true (Fig. 3A-D). For example, 1-year hospitalisation rates for Intestinal infectious diseases (A00-A09) were significantly more affected by disease history than mortality rates in both females and males.

We provide reference tables of risk to support the triage of multimorbid patients. These tables are stored as tab-separated files and are available for females in Supplementary Table 18 and for males in Supplementary Table 19. We also present tables that identify women and men with the highest age-relative risk for mortality and/or hospitalisation using single disease histories (risk category 6: “Very high”) (Fig. 4B-C). We believe these latter tables have high translational value as they are sufficiently concise to not necessarily require a software implementation.

We identified several patient groups with very high risk outcomes using single disease histories. Among females, the greatest short-term mortality risk was found in those presenting with Cerebrovascular diseases (I60-I69) with a history of Chronic lower respiratory diseases (J40-J47), for whom mortality occurred 7575.8 times faster than in individuals of the same age without this history (Fig. 4B), highlighting the importance of prior history in stroke mortality outcomes. Female hospitalisation rates were highest for those presenting with Renal tubulo-interstitial diseases (N10-N16) with a history of Inflammatory diseases of female pelvic organs (N70-N77), for whom hospitalisation occurred 244.2 times faster than in individuals of the same age without this history.

Mortality rate was highest amongst single disease histories for male patients presenting with Other diseases of the digestive system (K90-K93) along with a history of Nerve, nerve root and plexus disorders (G50-G59), being 5929.1 times faster than individuals of the same age without this history (Fig. 4C). Male hospitalisation rates were highest when presenting with Other disorders of blood and blood-forming organs (D70-D77) and a history of Mental and behavioural disorders due to psychoactive substance use (F10-F19) which was 118.4 times faster than without the trajectory (Fig. 4C). Given that for all but one of the aforementioned highest risk mortality and hospitalisation trajectories, the presenting and historical diagnoses were not within the same ICD10 chapter, these results highlight the necessity of cross-specialty care for reducing both mortality and readmission rates.

Although the presence of a history of a single disease in a presenting patient is the most easily identifiable and interpretable form of disease history, it is clear that there is additional stratification of risk that can be gained from the consideration of combinations of disease histories. Both mortality and hospitalisation rates of trajectories increased with increasing trajectory length (Supplementary Fig. 9A-D) which shows that additional mortality and hospitalisation risk is identified upon consideration of larger combinations of disease histories. For instance, the highest mortality risk identified within females across all trajectories was within females presenting with Renal failure (N17-N19) along with a history of both Diseases of oesophagus, stomach and duodenum (K20-K31) and Inflammatory polyarthropathies (M05-M14) for which mortality was 9804 times faster than females of the same age (Fig. 3A + Supplementary Table 8). For males, the highest mortality risk observed across all trajectories was within males presenting with Influenza and pneumonia (J09-J18) along with a history of both Mental and behavioural disorders due to psychoactive substance use (F10-F19) and Aplastic and other anaemias (D60-D64) for which mortality was 17594 times faster than males of the same age (Fig. 3B + Supplementary Table 9).

The greatest hospitalisation risk for females across all trajectories was observed within females presenting with Intestinal infectious diseases (A00-A09) along with a history of Noninflammatory disorders of the female genital tract (N80-N98), Noninfective enteritis and colitis (K50-K52), and Mood [affective] disorders (F30-F39) for which hospitalisation rates were 459 times faster than females of the same age (Fig. 3C + Supplementary Table 10). For males, the greatest hospitalisation risk was observed within males presenting with Influenza and pneumonia (J09-J18) along with a history of Diseases of male genital organs (N40-N51), Malignant neoplasms of male genital organs (C60-C63), and Malignant neoplasms of ill-defined, secondary and unspecified sites (C76-C80) (Fig. 3D + Supplementary Table 11). Altogether these results show that consideration of greater combinations of disease histories has the potential to enhance the stratification of patients for mortality and hospitalisation risk.

2.4 Identifying high risk diagnoses and their divergence between females and males

We sought to identify the greatest observed divergence in risk between females and males. We compared the estimated mortality and hospitalisation risk of single disease histories of disease endpoints between females and males by subtracting scaled male estimates from scaled female estimates. Positive values indicate greater risk in females while negative values indicate greater risk in males. We visualised those single disease histories which exhibited absolute sex differences larger than 2 and 1 standard deviations for mortality and hospitalisation respectively. For single disease histories unique to either females or males, we visualised those histories which were within the top 10% of risk estimates (‘Very high’).
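The comparison can be sketched as below. The source does not state how estimates were scaled, so this example assumes a z-score within each sex over the histories shared by both sexes, and applies the threshold in those units; the function names are hypothetical.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardise values to mean 0 and sample standard deviation 1."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

def sex_divergence(female, male, threshold):
    """Return histories whose scaled female-minus-male difference exceeds threshold.

    female, male: {history_id: AFT estimate} (hypothetical inputs).
    Assumption: 'scaled' means z-scored within sex over shared histories.
    Positive values indicate greater risk in females, negative in males.
    """
    shared = sorted(set(female) & set(male))
    z_f = zscores([female[h] for h in shared])
    z_m = zscores([male[h] for h in shared])
    diffs = {h: f - m for h, f, m in zip(shared, z_f, z_m)}
    return {h: d for h, d in diffs.items() if abs(d) > threshold}
```

Under this reading, `threshold=2` would correspond to the mortality comparison and `threshold=1` to the hospitalisation comparison.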

We found several strong interactions between multimorbid combinations and sex on mortality outcomes (Fig. 5A). The most female-skewed high mortality disease history was within females presenting with Cerebrovascular diseases (I60-I69) with a history of Other diseases of intestines (K55-K64), where the mortality rate was 5.52 standard deviations larger than in males (Fig. 5A and Supplementary Table 12). The most male-skewed high mortality disease history was within males presenting with Other diseases of the respiratory system (J95-J99) with a history of Noninfective enteritis and colitis (K50-K52), where the mortality rate was 6 standard deviations larger than in females.

Sex differences in hospitalisation rates of single disease histories of endpoints were smaller than those observed in mortality (Fig. 5B). The largest female-skewed high hospitalisation disease history was within females presenting with Other bacterial diseases (A30-A49) along with a history of Other soft tissue disorders (M70-M79) where the hospitalisation rate was 1.24 standard deviations larger than males (Fig. 5B and Supplementary Table 13). The largest male-skewed high hospitalisation disease history was within males presenting with Mood [affective] disorders (F30-F39) along with a history of Diseases of oral cavity, salivary glands and jaws (K00-K14) where the hospitalisation rate was 1.12 standard deviations larger than females (Fig. 5B and Supplementary Table 13).

We also observed that multiple disease histories unique to females or males were within the top 10% of mortality and hospitalisation estimates (Fig. 5C-F + Supplementary Tables 14–17) which highlights further divergence between females and males in risk.

As far as we are aware, this is the first hypothesis-free study of diagnosis trajectory outcomes to use data covering all common disease diagnoses within ICD10 as they present to clinicians. Few studies have investigated trajectories of multimorbidity and fewer still have focused on trajectories of time-ordered diagnosis sequences and their associated outcomes (Cezard et al, 2021). Whilst some studies have looked at differences in disease progression patterns in men and women, we are not aware of any which considered sex differences in outcomes and we observed several strong interaction effects between multimorbidity combinations and sex on mortality (Fig. 5A).

The predictive power of trajectories of time-ordered diagnoses sequences has been investigated for mortality of sepsis patients (Beck et al., 2016). Patients were identified in a Danish cohort using electronic healthcare records using the ICD10 code Other sepsis code (A41) and the authors found complex prior trajectories involving alcohol abuse, cardiovascular, diabetes, anaemias, and/or cancers to be associated with increased sepsis mortality. We corroborate this finding within the block Other bacterial diseases (A30-A49) and show that a greater range of prior diagnoses increase mortality and hospitalisation risk for females than males. We also extend this work beyond sepsis and show that prior diagnosis trajectories are also important for mortality outcomes of respiratory failure, renal failure, nerve disorders, hypotension, and influenza/pneumonia.

Triage of patients within secondary care currently relies on various scoring systems such as the Acute Physiology and Chronic Health Evaluation, the Simplified Acute Physiology Score, or the Mortality Probability Model (Vincent and Moreno, 2010). Although some of these scoring systems do use a select number of pre-existing conditions for assessing patients, the number utilised is small and does not make use of the wealth of disease history data that is presently available in patient Electronic Healthcare Records (EHRs) (Vincent and Moreno, 2010). Given the increasing digitalisation of healthcare, future scoring systems for triage in secondary care may benefit from integration with and consideration of whole EHR diagnosis data. Our tables of triage could provide the starting point for this integration.

It is interesting to observe that multimorbidity itself appears to be a driver of diagnosis, and this may imply bias in the reporting within the UKBB. Figures 2B-C show a trend where the most coincident diagnosis pairs appear to be the most commonly reported. This was unexpected as the Jaccard coefficient is a metric that is normalised against sample size. This may be because the compound diagnosis burden incentivises individuals to report to care-providers in a way that a single diagnosis does not. We also note that the most coincident diagnosis combinations often occur within the same chapter. This may be because blocks within the same chapter affect related physiological systems and so one diagnosis is likely to lead to another. It may also be because of the greater clinical ease of making multiple diagnoses from within the same chapter, drawing upon the same clinical expertise.

Relatively more trajectories achieved a high or very high risk classification for hospitalisation (Fig. 3C-D) than for mortality (Fig. 3A-B). This likely reflects the general increase in short-term healthcare utilisation that accompanies any diagnosis, especially when this is not the first diagnosis of a patient.

The effect of multimorbidity on renal failure outcomes has previously been investigated, though primarily through the effect of the number of co-occurring conditions within CKD patients (Sullivan et al, 2020). Recently, association between clusters of multimorbidity patterns within groups of CKD patients and adverse events has been investigated, finding adverse outcomes to be associated with patterns of multimorbidity including known patterns of diabetes and cardiovascular disease, and relatively unknown patterns of pain and depression (Sullivan et al, 2022). Our results support this, showing that there are multimorbidity patterns that should be identified at diagnosis of renal failure in order to triage secondary care females and males.

We performed a sex-stratified analysis of UK Biobank secondary care data and built disease sequence trajectories in a data-driven manner, identifying 1784 and 1762 significant disease sequence trajectories for women and men, respectively. We identified significant associations between trajectories and 1-year post-trajectory age-relative rates of mortality and hospitalisation. We used these associations to create prior disease history rules for triaging multimorbid patients presenting with any common disease in UK secondary care. We also identified differences between females and males in the presentation of high-risk trajectories and found multiple strong interactions between sex and multimorbid combinations on mortality outcomes. We believe that these identified associations could be used to improve current triage scoring systems and to inform further research on high-risk multimorbid interactions, both for the prevention of these high-risk trajectories and for the allocation of resources in clinical care.

Acknowledgments

This research has been conducted using the UK Biobank Resource under Application Number 48433. It was jointly funded by the Department for the Economy, Northern Ireland and Novo Nordisk.

3.1 Calculating disease trajectories

We extracted the data field ‘41270: Diagnoses – ICD10’ for participants in the UK Biobank, containing diagnoses from inpatient admissions records coded in ICD10, and converted the ICD10 codes to groupings of ICD10 blocks and ICD10 chapters. We extracted the data field ‘41280: Date of first in-patient diagnosis - ICD10’ from the inpatient admissions records, containing the date of each ICD10 diagnosis. Each participant’s diagnostic history was described as a vector of ICD10 blocks ordered by date of diagnosis. Only the first recorded instance of each block was used, and duplicates were removed. We filtered the data using the field ‘31: Sex’ and retained ICD10 blocks with more than 1% prevalence within each sex. Blocks describing procedures/interactions with healthcare rather than diseases were also removed, corresponding to ICD10 blocks beginning with letters O, P, Q, R, S, T, U, V, W, X, Y, Z. On each trajectory, the order of diagnosis was determined and the period to each subsequent diagnosis was calculated from first diagnosis in days.
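The per-participant steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the input is taken to be already mapped to ICD10 block labels such as `"I10-I15"`, the function name `build_trajectory` is hypothetical, and the 1% prevalence filter is omitted because it requires the whole cohort.

```python
from datetime import date

EXCLUDED_LETTERS = set("OPQRSTUVWXYZ")   # non-disease blocks, per the text

def build_trajectory(records):
    """Order one participant's diagnoses and time them from first diagnosis.

    records: iterable of (icd10_block, diagnosis_date) tuples (hypothetical input).
    Returns a list of (block, position_in_order, days_from_first_diagnosis).
    """
    first_seen = {}
    for block, when in records:
        if block[0] in EXCLUDED_LETTERS:
            continue                               # drop procedure/interaction blocks
        if block not in first_seen or when < first_seen[block]:
            first_seen[block] = when               # keep first recorded instance only
    ordered = sorted(first_seen.items(), key=lambda item: item[1])
    if not ordered:
        return []
    start = ordered[0][1]
    return [(block, position, (when - start).days)
            for position, (block, when) in enumerate(ordered, start=1)]
```

Blocks diagnosed on the same date keep their input order here; how ties in date were ranked in the original analysis is not stated in the source.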

3.2 Identifying diagnoses belonging to rapidly accruing or slowly accruing disease trajectories

To identify diagnoses that were late-in-order/early-in-time or early-in-order/late-in-time, we calculated the median order of each diagnosis and the median time to each diagnosis in days. After plotting median order against median time, we calculated the line of best fit and the orthonormal basis vectors parallel and perpendicular to it. Projecting the data points onto the basis vectors yielded coordinates along these directions. We then calculated the standard deviation of these coordinates and identified the blocks lying outside one and two standard deviations of the mean as the late-in-order/early-in-time or early-in-order/late-in-time diagnoses.
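A minimal sketch of this projection is given below. It assumes median order on the x-axis and median time on the y-axis, an ordinary least-squares line of best fit, and thresholding on the perpendicular coordinate only; the function names are hypothetical and none of these choices is confirmed by the source.

```python
from math import sqrt
from statistics import mean, stdev

def perpendicular_offsets(points):
    """Signed distance of each (median_order, median_days) point from the fitted line.

    Assumption: the line of best fit is an ordinary least-squares fit of y on x.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    norm = sqrt(1 + slope ** 2)
    # Orthonormal basis: parallel (1, slope)/norm, perpendicular (-slope, 1)/norm.
    # The dot product with the perpendicular vector gives the signed offset.
    return [(-(x - x_bar) * slope + (y - y_bar)) / norm for x, y in points]

def flag_blocks(points, k=1.0):
    """Indices of points lying more than k standard deviations from the line."""
    offsets = perpendicular_offsets(points)
    sd = stdev(offsets)
    return [i for i, offset in enumerate(offsets) if abs(offset) > k * sd]
```

With these axes, a positive offset marks a block diagnosed later in time than its position in the order would predict, and a negative offset marks one diagnosed earlier.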

3.3 Clustering multimorbidities

We clustered all participants with more than one ICD10 block diagnosis to explore the transitions between clusters over time. The period of record availability was divided into three periods in which diagnoses accumulated equally using the 33% and 66% quantile thresholds of the distribution of time of diagnosis in days from first diagnosis. Morbidity combinations were calculated at the start of each period and at the end of the last period.
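The division into periods and the four snapshots of morbidity combinations can be illustrated as follows. This is a sketch under assumptions: `statistics.quantiles` with `n=3` stands in for the 33% and 66% thresholds, a diagnosis is counted in a snapshot if it occurred on or before that time, and the function names are hypothetical.

```python
from statistics import quantiles

def period_cuts(days_from_first):
    """Two thresholds splitting diagnosis times (in days) into three equal-count periods."""
    return quantiles(days_from_first, n=3)

def snapshots(trajectory, cuts):
    """Morbidity combination at the start of each period and at the end of the last.

    trajectory: list of (icd10_block, days_from_first_diagnosis) for one participant.
    Returns four sets of blocks, one per time point.
    """
    time_points = [0, *cuts, float("inf")]
    return [frozenset(block for block, days in trajectory if days <= t)
            for t in time_points]
```

Each participant then contributes one row per time point to the diagnosis matrix used for clustering.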

Clustering was subsequently performed on the data obtained by merging female and male disease combinations and merging across all time points. The tabulated combined data was represented as a diagnosis matrix with rows representing individuals and columns representing diagnoses. Dimensionality reduction was first performed using Multiple Correspondence Analysis (MCA) from the ‘FactoMineR’ R package. The number of components to retain was determined with an elbow plot (Supplementary Fig. 2A). The resulting space of coordinates was used in k-means clustering using base R. Optimal cluster parameters were selected using elbow and Calinski-Harabasz (Caliński and Harabasz, 1974) plots (Supplementary Fig. 2B-C). Calinski-Harabasz scores were calculated using the ‘fpc’ R package. The movement of individuals between clusters across the four timepoints was visualised as a Sankey diagram using SankeyMATIC¹.

3.4 Identifying diagnoses sharing trajectories

Heatmaps describing the coincidence of chapters and blocks in disease trajectories were based on the Jaccard index, which expresses the proportion of overlap between two groups. If \({N}_{A\cap B}\) denotes the number of participants with trajectories containing both diagnoses A and B and \({N}_{A\cup B}\) denotes the number of participants with trajectories containing either diagnosis A or diagnosis B, the Jaccard index is defined as \({N}_{A\cap B}/{N}_{A\cup B}\). Jaccard values were calculated using the ‘jaccard’ package in R. ICD10 chapters were clustered using complete hierarchical clustering on the mean Jaccard values across blocks, and blocks were further hierarchically clustered within chapters. Clustering and associated heatmaps were generated with the ‘Heatmap()’ function of the ‘ComplexHeatmap’ package in R.
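The definition translates directly into set operations. The sketch below is illustrative only (the analysis used the ‘jaccard’ R package); the function name and the convention of returning 0 for an empty union are assumptions.

```python
def jaccard_index(participants_with_a, participants_with_b):
    """N_{A∩B} / N_{A∪B} for two sets of participant identifiers."""
    union = participants_with_a | participants_with_b
    if not union:
        return 0.0  # assumed convention when neither diagnosis occurs
    return len(participants_with_a & participants_with_b) / len(union)
```

For example, if diagnosis A occurs in participants {1, 2, 3} and diagnosis B in {2, 3, 4}, two participants have both and four have either, giving an index of 0.5.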

3.5 Building diagnosis trajectories

To build diagnosis trajectories, we adapted an approach developed elsewhere (Jensen et al., 2014). We identified pairs of diagnoses that were significantly coincident using a bootstrapping test from the ‘jaccard’ R package (Chung et al., 2019). For each diagnosis pair, A and B, we calculated the frequency of occurrences where the time to diagnosis of A is less than B, greater than B and equal to B. We subsequently performed double one-tailed binomial tests of whether A or B more frequently occurs earliest, including the cases where A and B are equal in the comparison group. From the disease pairs with both a significant Jaccard coefficient and a significant ordering, we assembled longer trajectories incrementally. Binomial tests were repeated for each longer trajectory, identifying new diagnoses that significantly occurred before or after the longer trajectory in the population of participants with the longer trajectory. Trajectories that applied to fewer than 20 individuals were removed. This process was repeated until no further trajectories were identified. P-values for all statistical tests were Bonferroni-Holm adjusted (Holm, 1979).
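The ordering test and the multiple-testing correction can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, and the null probability of 0.5 is an assumption that the text does not state.

```python
from math import comb

def binom_upper_tail(successes, trials, p=0.5):
    """One-tailed exact binomial p-value: P(X >= successes) for X ~ Bin(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

def ordering_pvalues(n_a_first, n_b_first, n_same_time):
    """Two one-tailed tests; ties stay in the comparison group (the trial count)."""
    trials = n_a_first + n_b_first + n_same_time
    return binom_upper_tail(n_a_first, trials), binom_upper_tail(n_b_first, trials)

def holm_adjust(pvalues):
    """Holm step-down adjustment, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running
    return adjusted
```

Keeping ties in the trial count makes the test conservative: a pair in which A and B are often recorded on the same day needs a larger excess of "A first" cases to reach significance.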

3.6 Calculating 1-year post-trajectory age-relative rates of mortality and hospitalisation of trajectories

We identified the subpopulations with each significant trajectory, correctly ordered and with at least 14 days between consecutive diagnoses. The age at which participants received their final diagnosis within trajectories was approximated from their age at assessment (data field ‘21003: Age when attended assessment centre’) adjusted by the number of days between their assessment centre visit (obtained from data field ‘53: Date of attending assessment centre’) and the date at which the final disease was diagnosed plus 188 days (midpoint of a year).
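One possible reading of this age approximation is sketched below. The function name and the 365.25-day year are assumptions; the 188-day offset is taken verbatim from the text.

```python
from datetime import date

def estimated_age_at_final_diagnosis(
    age_at_assessment, assessment_date, final_diagnosis_date, offset_days=188
):
    """Age at assessment (whole years) shifted by the days elapsed between the
    assessment visit and the final diagnosis, plus the stated offset."""
    elapsed = (final_diagnosis_date - assessment_date).days + offset_days
    return age_at_assessment + elapsed / 365.25  # 365.25 is an assumed year length
```

A final diagnosis recorded before the assessment visit yields a negative elapsed interval, so the same function covers diagnoses on either side of the visit.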

For each member of the trajectory subpopulation, we identified three random controls from the whole UK Biobank with at least one diagnosis, but without any of the diseases involved in the trajectory, and set the starting point for comparison as the estimated age of the case at which the final diagnosis in the trajectory occurred. With this 1:3 case-control ratio, we used Accelerated Failure Time (AFT) models (Wei, 1992) to calculate the fold increase in the mortality and hospitalisation rate of subpopulation members in the year following the final diagnosis of the trajectory compared to controls of the same age. Mortality models were censored for available follow-up in the UK Biobank, while hospitalisation models were censored for both available follow-up and death. Hospitalisation events were derived from unfiltered hospital diagnosis data, including ICD10 codes below 1% prevalence and codes not describing diseases.
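The control selection can be sketched as below. This is a minimal illustration under stated assumptions: the function name and the dictionary layout (participant identifier mapped to a set of diagnosis codes) are hypothetical, and whether a control could be reused across cases is not stated in the text.

```python
import random

def sample_controls(case_id, trajectory, diagnoses_by_participant, rng, ratio=3):
    """Draw `ratio` random controls that have at least one diagnosis and none
    of the diagnoses in the case's trajectory."""
    excluded = set(trajectory)
    eligible = [
        pid
        for pid, diagnoses in diagnoses_by_participant.items()
        if pid != case_id and diagnoses and not (diagnoses & excluded)
    ]
    return rng.sample(eligible, ratio)  # raises ValueError if too few are eligible
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps the draw reproducible. Each selected control would then enter the AFT model with follow-up starting at the case's estimated age at final diagnosis.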

3.7 Deriving triage rules

To identify trajectories associated with high 1-year post-diagnosis age-relative mortality and/or hospitalisation, we grouped trajectories by endpoint. From the total range of trajectory mortality/hospitalisation risk values, we calculated quantile thresholds to establish which trajectories of each endpoint belonged in which risk categories.
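A quantile-based categorisation of this kind might look like the sketch below. Only the "Very high" band (top 10% of estimates) is stated in the text; the other band labels and cut-offs, along with the function name, are hypothetical placeholders.

```python
def risk_category(
    estimate,
    all_estimates,
    bands=((0.90, "Very high"), (0.75, "High"), (0.50, "Moderate")),
):
    """Assign a band from the fraction of all trajectory estimates lying
    strictly below this one. Bands other than 'Very high' are assumed."""
    below = sum(1 for e in all_estimates if e < estimate)
    fraction_below = below / len(all_estimates)
    for cutoff, label in bands:
        if fraction_below >= cutoff:
            return label
    return "Lower"
```

Because the thresholds come from the full range of trajectory estimates rather than from each endpoint separately, an endpoint can have all of its trajectories in one band.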

Triage tables (Fig. 4 and Supplementary Tables 18 and 19) allow us to interpret increased risks in a clinically useful manner. The endpoint of a trajectory represents a presenting diagnosis, and the prior diagnoses rule in or out whether a patient is on a high-risk trajectory. Hence, the triage tables are grouped by endpoints and subgrouped by prior diagnoses. In Supplementary Tables 18 and 19, risk can be filtered for single prior diagnoses or combinations of prior diagnoses. For each endpoint and prior set of diagnoses, we state the risk category within which the estimated age-relative risk fell.

3.8 Comparing female and male outcomes of trajectories

To identify disease histories which exhibited significant interactions with sex on future risk, we compared the mortality and hospitalisation rate estimates of trajectories of length 2 which were identified in both females and males. We scaled the estimates without centring within sexes using the scale() function in base R. Estimates were thus expressed in standard deviations of the total distribution of estimates from trajectories of length 2 for each sex, which allowed for direct comparison. To identify the largest divergence between sexes of these scaled estimates, we subtracted each male scaled estimate from its corresponding female scaled estimate and visualised the differences grouped by disease endpoint. For mortality, we visualised differences which were greater than 2 standard deviations in absolute terms. Hospitalisation differences were generally smaller and thus we used the lower threshold of 1 standard deviation for visualisation of differences.

To identify high risk disease histories that were unique to each sex, we visualised the estimates of single disease histories that were only identified in either females or males and which were within the top 10% of estimates (Very high risk).

3.9 Code/scripts

Scripts reproducing analysis are available upon request.

The Academy of Medical Sciences (2018). 'Multimorbidity: a priority for global health research'. The Academy of Medical Sciences.
Barnett K, et al. (2012). ‘Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study’. The Lancet. 2012 Jul 7;380(9836):37-43. doi: 10.1016/S0140-6736(12)60240-2.
Beck, M., Jensen, A., Nielsen, A. et al. (2016). 'Diagnosis trajectories of prior multi-morbidity predict sepsis mortality'. Scientific Reports 6, 36624 (2016). https://doi.org/10.1038/srep36624
Caliński, T. and Harabasz, J. (1974) ‘A dendrite method for cluster analysis’, Communications in Statistics. Taylor & Francis, 3(1), pp. 1–27. doi: 10.1080/03610927408827101.
Cezard G, McHale CT, Sullivan F, Bowles JKF, Keenan K. (2021). 'Studying trajectories of multimorbidity: a systematic scoping review of longitudinal approaches and evidence'. BMJ Open. 2021 Nov 22;11(11):e048485. doi: 10.1136/bmjopen-2020-048485.
Chowdhury SR, Chandra Das D, Sunna TC, Beyene J, Hossain A. (2023). 'Global and regional prevalence of multimorbidity in the adult population in community settings: a systematic review and meta-analysis'. EClinicalMedicine. 2023 Feb 16;57:101860. doi: 10.1016/j.eclinm.2023.101860.
Chung, N., Miasojedow, B., Startek, M. et al. (2019). 'Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data'. BMC Bioinformatics 20 (Suppl 15), 644 (2019). doi: 10.1186/s12859-019-3118-5
Collins, R. (2012) ‘What makes UK Biobank special?’, The Lancet. Elsevier Ltd, 379(9822), pp. 1173–1174. doi: 10.1016/S0140-6736(12)60404-8.
Guthrie, B et al. (2012). 'Adapting clinical guidelines to take account of multimorbidity'. BMJ. 2012 Oct 4;345:e6341. doi: 10.1136/bmj.e6341.
Guthrie B, Makubate B, Hernandez-Santiago V, Dreischulte, T. (2015). 'The rising tide of polypharmacy and drug-drug interactions: population database analysis 1995-2010'. BMC Medicine. 2015 Apr 7;13:74. doi: 10.1186/s12916-015-0322-7.
Holm, S. (1979). 'A Simple Sequentially Rejective Multiple Test Procedure'. Scandinavian Journal of Statistics, 6(2), 65–70. http://www.jstor.org/stable/4615733
Hughes, LD et al. (2013). 'Guidelines for people not for diseases: the challenges of applying UK clinical guidelines to people with multimorbidity'. Age and Ageing. 2013 Jan;42(1):62-9. doi: 10.1093/ageing/afs100.
Jensen, A., Moseley, P., Oprea, T. et al. (2014). 'Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients'. Nature Communications 5, 4022 (2014). doi: 10.1038/ncomms5022
Marx P, et al. (2017). 'Comorbidities in the diseasome are more apparent than real: What Bayesian filtering reveals about the comorbidities of depression'. PLoS Computational Biology. 2017 Jun 23;13(6):e1005487. doi: 10.1371/journal.pcbi.1005487.
Prasad B, Bjourson AJ, Shukla P. (2022). 'Data-driven patient stratification of UK Biobank cohort suggests five endotypes of multimorbidity'. Briefings in Bioinformatics. 2022;23(6):bbac410. doi:10.1093/bib/bbac410
Salisbury C et al. (2011). 'Epidemiology and impact of multimorbidity in primary care: a retrospective cohort study'. British Journal of General Practice. 2011 Jan;61(582):e12-21. doi: 10.3399/bjgp11X548929.
Sullivan, M.K., Carrero, JJ., Jani, B.D. et al (2022). 'The presence and impact of multimorbidity clusters on adverse outcomes across the spectrum of kidney function'. BMC Medicine 20, 420. https://doi.org/10.1186/s12916-022-02628-2
Sullivan MK, Rankin AJ, Jani BD, Mair FS, Mark PB (2020). 'Associations between multimorbidity and adverse clinical outcomes in patients with chronic kidney disease: a systematic review and meta-analysis'. BMJ Open. Jun 30;10(6):e038401. doi: 10.1136/bmjopen-2020-038401.
Tran J, et al. (2018). 'Patterns and temporal trends of comorbidity among adult patients with incident cardiovascular disease in the UK between 2000 and 2014: A population-based cohort study'. PLOS Medicine 15(3): e1002513. https://doi.org/10.1371/journal.pmed.1002513
Vincent, JL., Moreno, R. (2010). 'Clinical review: Scoring systems in the critically ill'. Critical Care 14, 207. https://doi.org/10.1186/cc8204
Webster, A. J. et al. (2021) ‘Characterisation, identification, clustering, and classification of disease’, Scientific reports. Nature Publishing Group UK, 11(1), p. 5405. doi: 10.1038/s41598-021-84860-z.
Wei, L.J. (1992), 'The accelerated failure time model: A useful alternative to the cox regression model in survival analysis'. Statistics in Medicine, 11: 1871-1879. https://doi.org/10.1002/sim.4780111409
Westergaard, D., Moseley, P., Sørup, F.K.H. et al. (2019). 'Population-wide analysis of differences in disease progression patterns in men and women'. Nature Communications 10, 666 (2019). https://doi.org/10.1038/s41467-019-08475-9
WHO. (2016). 'Multimorbidity: Technical Series on Safer Primary Care'. Geneva: World Health Organization; 2016. Licence: CC BY-NC-SA 3.0 IGO.
Zemedikun, D. T. et al. (2018) ‘Patterns of Multimorbidity in Middle-Aged and Older Adults: An Analysis of the UK Biobank Data’, Mayo Clinic Proceedings. Mayo Foundation for Medical Education and Research, 93(7), pp. 857–866. doi: 10.1016/j.mayocp.2018.02.012.

¹ https://www.sankeymatic.com/

Yes, there is a potential competing interest. Dr Joanna Sharman and Dr Ramneek Gupta are both employed by Novo Nordisk. This work was part funded by Novo Nordisk.

ManuscriptSupplementaryTables.xlsx
Supplementary tables
ManuscriptSupplementaryFigures.pptx
Supplementary figures
