Understanding of Comorbidities Using Modeling Techniques on EHR

Comorbidities refer to the presence of multiple, co-occurring diseases in a patient. Because the diseases co-occur, the course of one is typically highly dependent on the course of the other, and treatments can have substantial spill-over effects. Despite the high prevalence of comorbidities among patients, there is no comprehensive statistical framework for modelling the longitudinal dynamics of comorbidities. In this paper, we propose a probabilistic approach for studying comorbidity dynamics in patients over time. The approach is a coupled hidden Markov model with a non-homogeneous transition mechanism, which we call the Coupled-HMM. Clinical research informed the design of our Coupled-HMM: (1) It accounts for different disease states (acute, stable) in disease progression by providing clinically meaningful latent states. (2) It models a coupling between the trajectories of comorbidities in order to capture their co-evolution. (3) The transition mechanism takes into account between-patient heterogeneity (e.g., risk factors, treatments). We evaluate the proposed Coupled-HMM on 675 health trajectories, in which we investigate the joint progression of diabetes mellitus and chronic liver disease. Compared to competing models without coupling, our Coupled-HMM provides a superior fit. We also quantify the spill-over effect, i.e., the extent to which diabetes treatments are associated with a change of chronic liver disease from an acute to a stable state. Our approach thus has immediate applications in treatment planning and clinical research on comorbidities.


Findings:
We evaluate our Coupled-HMM on a large longitudinal dataset. Specifically, we use electronic health records from 675 patients collected over a 10-year period. The electronic health records document the joint progression of (A) diabetes and (B) chronic liver disease. We chose patients at risk of developing diabetes mellitus type 2 because it is one of the most common diseases in the world and is frequently accompanied by chronic liver disease as a comorbidity [34,37,67]. We arrive at three main findings. First, modelling disease interactions by means of a coupling yields a better model fit. Second, an acute state in one of the two diseases is associated with a higher likelihood of transitioning to an acute state in the other. Third, a diabetes treatment has significant spill-over effects: it is associated with a lower risk of an acute disease state in both diabetes and chronic liver disease. Overall, the findings demonstrate empirically the need to model the co-evolution of comorbidities.
Contributions: Our work advances existing research on modelling longitudinal disease dynamics in the following ways: (1) Our approach is the first statistical model designed specifically to capture the longitudinal dynamics of comorbidities.
(2) We build a coupled hidden Markov model with a non-homogeneous transition mechanism for longitudinal data and, by deriving the likelihood, provide a Bayesian estimation approach.
(3) We formalize and estimate the spill-over effect, which measures the indirect effect of treatments through disease interactions. It quantifies empirically how treating disease A affects the course (disease state) of disease B and can be interpreted directly. This generates new knowledge for clinical practice (e.g., treatment planning) and research.

RELATED WORK
2.1 Comorbidities
The term "comorbidity" is used in medicine to describe diseases that occur together (see [70]). Comorbidities are sometimes divided into main and secondary disorders in the literature, but this classification is not consistently applicable [23,69]. We therefore do not distinguish between the two and simply refer to them as disease A and disease B. Both diseases could co-occur by chance, but they could also have a causal relationship or be caused by correlated or shared risk factors [69]. This implies that diseases interact with one another. In particular, treating one condition frequently improves the prognosis of co-occurring diseases. In clinical practice, comorbidities are common.
Patients in hospitals, in particular, frequently have several comorbidities (e.g., every second hospitalization contains eleven or more separate disease codes [71]). Comorbidities are also of direct relevance for clinical decision-making due to their prevalence. Because the existence of comorbidities has been shown to affect health outcomes, clinical practice stresses that their treatment be given specific consideration [30].
There are ample examples of comorbidities in clinical practice. For example, depression and chronic pain frequently co-occur, and their severities are frequently interdependent [4,27]. Similarly, up to 40% of cancer patients experience psychological distress, such as anxiety or depression [81]. Furthermore, diabetes patients are commonly diagnosed with comorbidities [41], such as hypertension, cardiovascular disease, or chronic liver diseases (e.g., nonalcoholic fatty liver disease) [34,37,67], as explored in this research.

Statistical Inferences using Comorbidity Data
Despite the importance of comorbidities in medicine, only a few studies have taken their structure into account in their models. We provide a summary of key references below. While these works also use comorbidity data, their objective differs from that of this paper.
Statistical models have been employed in the literature to detect co-occurrence patterns among diseases [e.g., 32,40,73]. This allows for the identification of diseases that co-occur and should thus be considered comorbidities. These studies, however, address the question of which comorbidities frequently occur together but do not model their progression. Various models can be used for this purpose. Comorbidities, for instance, can be represented as networks with nodes representing symptoms and edges representing potentially causal relationships [16]. The onset of comorbidities is then indicated by shared or overlapping symptoms of different conditions, as determined by the network. This hypothesis has also been tested using dynamic structural equation models [9,31] or deep diffusion processes [56]. These approaches answer the question of whether diseases co-occur, but not how they co-occur, i.e., they do not elucidate longitudinal disease interactions. Comorbidities and their temporal connections have also been studied using Bayesian networks [22,46,47]. However, these studies provide only a limited understanding of disease progression, for example of the underlying disease states.
Other studies employ data on comorbidities as input when making inferences about patient outcomes [e.g., 11,27,59,66,83], including readmission risk [71]. This allows researchers to account for the fact that the co-occurrence of diseases frequently leads to nonlinear relationships with patient outcomes. For example, when a patient has a number of co-occurring diseases, patient survival may be reduced. Different indices, such as the Charlson comorbidity index [13], reflect this in clinical practice. However, in the aforementioned studies, comorbidities are treated as independent variables rather than dependent variables.

Hidden Markov Models
Hidden Markov models (HMMs) are a flexible statistical tool for modelling time series that can be used in a variety of settings. HMMs assume a set of latent states that follow a Markov chain and generate the observations [29]. HMMs are widely applied to model human behavior, for example hiring decisions [45].
HMMs have previously been used to analyze data on patient health [e.g., 2,42,43,54,60,65]. In these works, HMMs are adapted to reflect patient trajectories via the following specifications. First, different disease states (i.e., phases) along the trajectory are represented by latent states [see 14]. Second, in order to account for patient variability and other sources of unobserved heterogeneity, the transition mechanism incorporates covariates as well as random effects [2]. The aforementioned HMMs, however, describe the evolution of single-disease outcomes but not the progression of comorbidities in multi-disease contexts.
HMMs come in a variety of forms [e.g., 64, pp. 76-77]. Coupled HMMs, for example, are a type of HMM that can be used to model the evolution of multiple interacting processes. In coupled HMMs, the latent states of distinct but presumably linked processes can be conditionally dependent [8]. A Cartesian product can be used to link the latent states of distinct sequences, forming the basis of a so-called Cartesian product coupled HMM. In a simulation study, a Cartesian product coupled HMM outperformed conventional and multivariate HMMs when the underlying processes are dependent [55]. Speech recognition [53], epileptic seizure detection [15], and sleep staging [58] are only a few of the applications of coupled HMMs. They have also been used to model the spread of infectious diseases [19,68]. The Cartesian product coupled HMM was developed in [55] for modelling the joint progression of lab measurements from intensive care units in a single-disease setting. In ecology, a coupled HMM was used to study voles from different woodland zones [61]; in particular, it was investigated whether the presence of one disease makes the spread of another disease more likely. A multi-chain Markov switching model, similar to a coupled HMM, was also used to examine whether there is volatility spillover between financial markets [26].
Research gap: Although comorbidities are widely recognized as important in clinical practice, statistical frameworks for modelling their co-evolution over time have received little attention. Understanding the underlying disease dynamics, however, could be extremely useful for treatment planning. To this end, we develop a Coupled-HMM that is specifically designed to capture comorbidity dynamics. On the basis of longitudinal data, we model disease interactions among comorbidities in patients.

Problem Statement
The goal of this study is to model the co-evolution of comorbidities. We do this by analyzing a longitudinal dataset containing patient health data. We consider two diseases, referred to as disease A and disease B. The evolution of both is explicitly assumed to be characterized by disease interactions. This motivates us to learn more about how the two diseases interact. In particular, we want to know whether there are spill-over effects when treating disease A: how does treating disease A alter the state of disease B? For instance, how does Metformin as a diabetes treatment affect the chance of a co-occurring liver disease becoming acute? Formally, we want to infer how a treatment for one disease affects the state of the other. To allow for such statistical inferences, we design a model that ensures interpretability.
In order to be clinically relevant, our model specification must accommodate three properties: (1) disease states, (2) coupling, and (3) between-patient heterogeneity.
(1) Disease states. Many diseases (especially chronic or otherwise long-term disorders) progress through several phases over time, with the condition being classified as "acute" or "stable" [14]. These phases are commonly referred to as "disease states" in medicine. The states cannot be observed directly; instead, they have to be inferred from data [14]. To do so, we follow previous research [e.g., 6,50,54,60,62,65] and model disease states using latent states. To highlight their clinical significance, we later term the latent states "acute" and "stable". Consistent with this literature, we build on an HMM-based framework [57] for modelling the trajectories of diseases A and B. Intuitively, each disease is represented by a separate HMM.
(2) Coupling. We expect disease interactions among comorbidities and, as a result, we explicitly assume that the states of disease A and B are interdependent. When disease A transitions from a stable to an acute condition, for example, it should raise the likelihood of disease B moving from a stable to an acute state as well. As a result, we introduce a coupling between the HMMs from disease A and B later on.
(3) Between-patient heterogeneity. The dynamics of disease are known to differ greatly between patients [e.g., 52,75]. In medicine, this is attributed to risk factors (e.g., age, gender) and the presence of treatments [e.g., 36]. Our approach therefore accounts for patient heterogeneity as follows. On the one hand, we include covariates at the patient level that represent risk factors and treatments. These are incorporated into the HMMs' transition mechanism, so that risk factors (as well as treatments) can influence disease dynamics, i.e., the likelihood of acute vs. stable disease states (see footnote 1). On the other hand, unobserved heterogeneity is taken into account. We follow previous research [7,17,20,33,38,76] and account for within effects in the treatment variable (see Sec. 4.2 for details).
In the following section, we develop a model called Coupled-HMM. Our Coupled-HMM accommodates the aforementioned properties (1)-(3). Based on the Coupled-HMM, we define a spill-over effect, which measures the impact of a treatment for disease A on the state of disease B. Here, we follow clinical standards that prioritize disease states over measurements or symptoms in treatment planning [14,48,63]. As a result, the spill-over effect is defined in terms of disease states rather than measurements or symptoms. This enables us to draw statistical inferences about how a treatment for disease A affects the chance of disease B being acute vs. stable.

Model specification
Our Coupled-HMM has five components: (i) latent states, which represent disease states; (ii) observations, which represent measurements or symptoms; (iii) an emission component, which connects states and observations; (iv) a transition mechanism; and (v) a coupling. Together, they yield an HMM-based framework with coupling that describes the joint progression of two diseases, A and B. The components (i)-(v) are specified as follows.
(i) Latent states: Let $s_{it}^{(A)} \in S^{(A)}$ and $s_{it}^{(B)} \in S^{(B)}$ denote the latent states of patient $i$ at time $t$, where the latent states relate to disease A and B, respectively. As a result, each disease has its own latent state and might be in a different phase. We assume two different states "1" and "2" for each disease, i.e., $S^{(A)} = S^{(B)} = \{1, 2\}$. State 1 is referred to as "stable" whereas state 2 is referred to as "acute." This allows for a total of $|S^{(A)} \times S^{(B)}| = 4$ distinct combinations. Based on the Cartesian product of the two separate latent states, we now define a global latent state. Formally, it is defined as $s_{it} = (s_{it}^{(A)}, s_{it}^{(B)}) \in S^{(A)} \times S^{(B)}$. This will be required later when we detail the coupling as part of component (v). The underlying mapping is listed in Table 1. For instance, a global state $s_{it} = (2, 2)$ means that both diseases are in an acute state. As a short form, we index the global states by $1, \ldots, 4$ (see Table 1).
(ii)-(iii) Observations and emission component: The observations $y_{it}^{(A)}$ and $y_{it}^{(B)}$ are the measurements or symptoms of disease A and B. Conditional on the latent states, they follow normal distributions, $y_{it}^{(A)} \mid s_{it}^{(A)} = a \sim \mathcal{N}(\mu_a^{(A)}, (\sigma^{(A)})^2)$ and $y_{it}^{(B)} \mid s_{it}^{(B)} = b \sim \mathcal{N}(\mu_b^{(B)}, (\sigma^{(B)})^2)$, with mean $\mu_a^{(A)}$, $\mu_b^{(B)}$ and variance $(\sigma^{(A)})^2$, $(\sigma^{(B)})^2$. The data is thus modelled using an individual mean for each state and disease. As a result, an acute state of disease A may have, on average, higher observable measurements or symptoms. The variance is disease-specific, but not state-specific. We use a normal distribution in the above specification since the measurements (e.g., blood glucose levels) in our study follow a normal distribution.
Footnote 1: Our choice to include risk factors (and treatments) in the transition mechanism rather than the emission component is informed by clinical research. Variables in the emission component only affect observable measurements (e.g., pain resistance) because they moderate how acute vs. stable disease states relate to measurements; they have no effect on the course of disease progression. Our numerical experiments later support this choice.
(iv) Transition mechanism: The transition mechanism defines the probability of transitioning from one latent disease state to another. We assume that the latent states follow a Markov process, as is characteristic of HMMs. However, we take into account additional covariates (such as risk factors and treatments), resulting in a non-homogeneous transition mechanism that is tailored to individual patient profiles. Formally, we use a multinomial logit link function to model the transition probabilities between global states [2,49]. Let $p_{it}(j \to k)$ denote the probability of patient $i$ transitioning from global state $j$ at time $t$ to global state $k$ at time $t + 1$. We specify the transition probability via
$$p_{it}(j \to k) = \frac{\exp(\alpha_{jk} + x_{it}^\top \beta_{jk})}{\sum_{l} \exp(\alpha_{jl} + x_{it}^\top \beta_{jl})}, \qquad (3)$$
where the sum runs over all global states $l$. The logit contains the following variables. The variable $\alpha_{jk}$ in Eq. 3 corresponds to the intercept of the transition from global state $j$ to $k$. All else equal, it captures how likely a certain disease state transition is. The covariates $x_{it}$, with coefficients $\beta_{jk}$, describe the between-patient heterogeneity (e.g., risk factors, treatment). They capture, for example, an increased likelihood of acute states (over stable states) as a result of risk factors. Treatment covariates are further processed to account for within effects in the treatment variable (see Sec. 4.2). To ensure identifiability, the coefficients for $j = k$ are set to zero, i.e., $\alpha_{jj} = 0$ and $\beta_{jj} = 0$. As a result, recurring transitions (transitions in which the same latent disease state is maintained) serve as the reference category; see Table 6.
The initial probability of the latent states is given by $\pi$ (i.e., the probability of the latent states in time step $t = 1$). The initial probabilities are estimated jointly for all patients and do not depend on risk factors. This approach follows previous works [77].
(v) Coupling: We model the interaction between the two diseases by linking their latent states. In particular, we build on the concept of a Cartesian product coupled HMM [55]. For this, we use the Cartesian product of the underlying latent states from above. Recall that $s_{it} = (s_{it}^{(A)}, s_{it}^{(B)}) \in S^{(A)} \times S^{(B)}$.

Figure 1: Illustrative scheme of coupling inside the HMM model on comorbidities
By introducing the global states, the coupling is captured mathematically as part of the transition mechanism. The reason is that the transition mechanism now depends on the global state, which means that the probability of $s_{it}^{(A)}$ depends not only on the previous latent state $s_{i,t-1}^{(A)}$ of the same disease but also on that of the other disease, $s_{i,t-1}^{(B)}$. The same applies analogously when disease A and B are exchanged. As a result, the transition mechanism accommodates potential co-movements of both disease states.
An example of the coupling among the latent states is given in the following. Recall that the transition mechanism is determined by an intercept as well as other covariates (i.e., risk factors). We now look at how the transitions of one disease differ depending on the state of the other disease. For example, given that disease A is in an acute state, the likelihood of a change from a stable to an acute state for disease B may be increased. This would be the case if the intercept α34 [for transition (2=acute,1=stable) → (2=acute,2=acute)] is considerably larger than the intercept α12 [for transition (1=stable,1=stable) → (1=stable,2=acute)]. This indicates that when disease A is in an acute state, there is a higher chance that disease B will move to an acute state as well (rather than remain in a stable state). As a result, comparing the estimated intercepts of such transitions can reveal information about the underlying disease interactions. Hence, we later report the estimates for αjk, because they reflect the strength of the underlying coupling, i.e., the underlying disease interaction.
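The global-state construction and the normal emission component described above can be illustrated with a minimal Python sketch. The ordering of the global states (row-major over the Cartesian product) is an assumption chosen to agree with the α12 and α34 example; the function names, the conditional independence of the two measurements given the states, and the standard deviations are illustrative assumptions and not the authors' implementation. The state means are the log-scale values reported in the estimation results.

```python
import itertools
import math

# Local states per disease: 1 = stable, 2 = acute (as in the text).
STATES_A = (1, 2)
STATES_B = (1, 2)

# Global states as the Cartesian product, indexed 1..4.
# ASSUMPTION: row-major ordering, so that 1=(1,1), 2=(1,2), 3=(2,1), 4=(2,2),
# which matches the alpha_12 / alpha_34 example in the text.
GLOBAL_STATES = {
    idx: pair
    for idx, pair in enumerate(itertools.product(STATES_A, STATES_B), start=1)
}

# State-dependent means on the log scale (values from the estimation results).
MU_A = {1: 4.55, 2: 4.70}              # log blood glucose
MU_B = {1: 2.86, 2: math.log(30.88)}   # log alanine aminotransferase
# HYPOTHETICAL standard deviations (disease-specific, not state-specific).
SD_A = 0.10
SD_B = 0.40


def normal_logpdf(y, mean, sd):
    """Log-density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((y - mean) / sd) ** 2


def emission_logpdf(global_idx, y_a, y_b, mu_a=MU_A, mu_b=MU_B, sd_a=SD_A, sd_b=SD_B):
    """Joint emission log-density of both measurements given a global state.

    ASSUMPTION: the two measurements are conditionally independent given
    the disease-specific latent states.
    """
    a, b = GLOBAL_STATES[global_idx]
    return normal_logpdf(y_a, mu_a[a], sd_a) + normal_logpdf(y_b, mu_b[b], sd_b)
```

Under these placeholder values, a pair of measurements located at the two stable means receives a higher emission log-density under global state 1 (stable, stable) than under global state 4 (acute, acute).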

Spill-Over Effect
Medication and other treatments can also be included as covariates. If such a treatment is targeted specifically at one comorbidity, it will have a direct impact on the state transitions of that disease. A medication developed to treat disease A, for example, could prevent the transition from state (1, 1) to state (2, 1) (i.e., prevent A from becoming acute). However, due to the coupling, the same medication may also have an influence on the state transitions of disease B. We refer to the former as direct treatment effects and to the latter as spill-over effects.
The following is an example of a spill-over effect. Consider the state transitions (2,2) → (1,2) → (1,1). They depict a situation in which a treatment causes disease A to change from an acute to a stable state (i.e., direct effect). In the ensuing time step, disease B also changes to a stable state (i.e., an additional indirect effect).
The above spill-over effect is formalized as follows. Let $z$ be a disease-specific treatment that affects the transition probabilities via the covariates $x_{it}$, as shown in Equation 3. Then, under treatment $z$, the probability of the aforementioned state changes of patient $i$ is given by
$$P_i(z) = \Pr\big(s_{i,t+1} = (1,2),\; s_{i,t+2} = (1,1) \mid s_{it} = (2,2),\; z\big).$$
Analogously, the probability under no treatment, denoted by $z'$, is given by
$$P_i(z') = \Pr\big(s_{i,t+1} = (1,2),\; s_{i,t+2} = (1,1) \mid s_{it} = (2,2),\; z'\big).$$
Thus, if the treatment $z$ increases the probability $P_i(z)$, then the treatment, although specifically designed for disease A, also has a favorable effect on disease B. Hence, we refer to the difference $P_i(z) - P_i(z')$ as the spill-over effect for (2,2) → (1,2) → (1,1). If $P_i(z) - P_i(z') > 0$, there exists a positive spill-over effect. Without loss of generality, spill-over effects could also arise in other state transitions such as (1,1) → (2,1) → (2,2); the above formulation can be calculated for them analogously.
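To make the spill-over effect concrete, the following Python sketch computes the probability of the path (2,2) → (1,2) → (1,1) with and without treatment and takes the difference. It is an illustration under stated assumptions, not the authors' code: it uses a single treatment covariate `z`, holds all other covariates fixed (absorbed into the intercepts), applies the same treatment value in both time steps, and factorizes the path probability into two one-step transition probabilities via the Markov property. All function names and the zero-based state indices are hypothetical.

```python
import numpy as np

# Zero-based indices of the global states (disease A, disease B);
# 1 = stable, 2 = acute. ASSUMPTION: row-major ordering.
S11, S12, S21, S22 = 0, 1, 2, 3


def transition_matrix(alpha, beta_z, z):
    """Multinomial-logit transition matrix for one treatment covariate z.

    alpha, beta_z: (4, 4) arrays of intercepts and treatment coefficients.
    The diagonal (recurring transitions) is the reference category.
    """
    eta = np.asarray(alpha, dtype=float) + np.asarray(beta_z, dtype=float) * z
    np.fill_diagonal(eta, 0.0)                  # alpha_jj = beta_jj = 0
    eta = eta - eta.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(eta)
    return p / p.sum(axis=1, keepdims=True)


def path_probability(alpha, beta_z, z, path=(S22, S12, S11)):
    """Probability of a two-step path, assuming z is constant over both steps."""
    P = transition_matrix(alpha, beta_z, z)
    first, second, third = path
    return P[first, second] * P[second, third]


def spill_over(alpha, beta_z, z_treated, z_untreated):
    """Difference in path probability between treatment and no treatment."""
    return (path_probability(alpha, beta_z, z_treated)
            - path_probability(alpha, beta_z, z_untreated))
```

A positive return value of `spill_over` corresponds to a positive spill-over effect in the sense defined above; other paths, such as (1,1) → (2,1) → (2,2), can be passed to `path_probability` via the `path` argument.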

Model Performance
Model fit: We assess the fit of our Coupled-HMM using the expected log pointwise predictive density (elpd), which is a standard performance metric for Bayesian modelling [28]. The elpd can be computed efficiently using Bayesian leave-one-out cross-validation with Pareto smoothed importance sampling [72]. In our setting, we assess model performance at the patient level for the hold-out sample (rather than at the observation level). For this, we used the loo package for the statistical software R. In addition, we report the widely applicable information criterion (WAIC).
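As an illustration of how such a criterion is obtained from posterior draws, the following Python sketch computes the WAIC-based estimate of the elpd from a matrix of pointwise log-likelihoods. It covers only the WAIC and not the Pareto smoothed importance sampling used by the loo package; the function name and the layout of the input (rows are posterior draws, columns are patients) are assumptions made for this example.

```python
import numpy as np


def elpd_waic(log_lik):
    """WAIC-based elpd estimate from pointwise log-likelihoods.

    log_lik: array of shape (S, N) with S posterior draws and N units
    (here assumed to be patients). Returns (elpd_waic, p_waic, waic).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density: log of the posterior-mean likelihood,
    # computed with the log-sum-exp trick for numerical stability.
    m = log_lik.max(axis=0)
    lppd_i = m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(n_draws)
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic_i = log_lik.var(axis=0, ddof=1)
    elpd = lppd_i.sum() - p_waic_i.sum()
    return elpd, p_waic_i.sum(), -2.0 * elpd
```

A higher `elpd_waic` (equivalently, a lower `waic`) indicates a better expected out-of-sample fit.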

Estimation Procedure
Sampling: All model parameters of the Coupled-HMM are estimated in a fully Bayesian manner [28], specifically using Markov chain Monte Carlo (MCMC) sampling. We used the probabilistic programming language Stan [12] for this, with Hamiltonian Monte Carlo and the No-U-Turn sampler [39]. To obtain posterior estimates, we ran four chains with 1500 iterations each (1500 further iterations were discarded as part of a warm-up). To do so, we derived the likelihood $\mathcal{L}$ of the Coupled-HMM via
$$\mathcal{L} = \prod_{i} \sum_{s_{i1}, \ldots, s_{iT_i}} \pi_{s_{i1}} \prod_{t=2}^{T_i} p_{i,t-1}(s_{i,t-1} \to s_{it}) \prod_{t=1}^{T_i} f\big(y_{it}^{(A)} \mid \mu^{(A)}_{s^{(A)}(s_{it})}, \sigma^{(A)}\big)\, f\big(y_{it}^{(B)} \mid \mu^{(B)}_{s^{(B)}(s_{it})}, \sigma^{(B)}\big), \qquad (7)$$
where $s^{(A)}(s_{it})$ and $s^{(B)}(s_{it})$ map the global state to the corresponding individual states of each disease (see Table 1), $\pi$ denotes the initial state probabilities, $p_{i,t-1}(\cdot \to \cdot)$ the transition probabilities from Eq. 3, $f$ the normal density of the emission component, and $T_i$ the number of time steps of patient $i$. In order to obtain an efficient computation scheme, we used the forward algorithm [84, pp. 36-39]. Furthermore, to avoid label switching and improve identifiability, the state-dependent means were ordered, i.e., $\mu_1^{(A)} < \mu_2^{(A)}$ and $\mu_1^{(B)} < \mu_2^{(B)}$. The intercepts were given slightly narrow priors in order to stabilise the model fit. Priors were set on the transformed parameters $\tilde{\beta}$, which result from a thin QR decomposition of the centred covariates $x_{it}$, rather than on the coefficients $\beta$.
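The forward algorithm mentioned above avoids the exponential sum over all state sequences by propagating a vector of forward probabilities through time. The following Python sketch shows the recursion in log space for a single patient with time-varying transition matrices. It is a generic textbook version under our own naming (`forward_loglik`, `log_sum_exp`), not the authors' Stan code; the inputs are assumed to be precomputed log initial probabilities, log transition matrices, and log emission densities.

```python
import numpy as np


def log_sum_exp(v, axis=0):
    """Numerically stable log(sum(exp(v))) along the given axis."""
    m = np.max(v, axis=axis)
    return m + np.log(np.sum(np.exp(v - np.expand_dims(m, axis)), axis=axis))


def forward_loglik(log_pi, log_trans, log_emis):
    """Log-likelihood of one patient's observation sequence.

    log_pi:    (K,)        log initial probabilities of the K global states
    log_trans: (T-1, K, K) log transition matrices; log_trans[t-1][j, k] is the
               log-probability of moving from state j at step t-1 to k at step t
    log_emis:  (T, K)      log emission density of the observations per state
    """
    n_steps = log_emis.shape[0]
    log_alpha = log_pi + log_emis[0]
    for t in range(1, n_steps):
        # Sum over the previous state j for every current state k.
        log_alpha = log_sum_exp(log_alpha[:, None] + log_trans[t - 1], axis=0)
        log_alpha = log_alpha + log_emis[t]
    return log_sum_exp(log_alpha, axis=0)
```

Summing `forward_loglik` over all patients gives the log of the likelihood; the cost is linear in the number of time steps and quadratic in the number of global states.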
Diagnostics: We examine the convergence of the Markov chains following common guidelines for Bayesian modelling [28]. We also run posterior predictive checks to see whether we can simulate plausible new observations using the estimated model parameters [25]. Both are reported in Sec. B as part of the model diagnostics. We also tested whether our Coupled-HMM could recover the true parameter values from fully simulated data. Unless otherwise noted, we later report the posterior mean as well as the 50% and 90% credible intervals (CrI).

EMPIRICAL SETUP
4.1 Data
We use two co-occurring diseases to evaluate our Coupled-HMM: (A) diabetes mellitus type 2 and (B) chronic liver disease. Both are frequently observed as comorbidities [34,37,67]. Diabetes is one of the most common chronic diseases, affecting millions of individuals throughout the world. Diabetes also entails substantial healthcare costs [79]. Chronic liver disease, in turn, is a medical disorder that frequently occurs in conjunction with diabetes [34,37,67]. In our study, we used blood glucose levels as lab measurements for diabetes and the level of alanine aminotransferase, an enzyme found primarily in the liver [24,74], as lab measurements for chronic liver disease. We compute spill-over effects for the treatment of elevated blood glucose levels with Metformin. Metformin is a first-line therapy for type 2 diabetes that decreases blood glucose levels overall. In the literature, Metformin is also thought to be beneficial for patients with liver disease [3,10,34]. This raises the question of how large the spill-over effect on the disease state of chronic liver disease is.
We used a longitudinal dataset of patients with prediabetes in our study. The dataset is made up of annual electronic health records [1,82] that contain lab measurements, therapies, and risk factors, among other things (see next section). There are 675 patients in total in the dataset. Each patient has a time series of recordings spanning four to ten years. There are 3253 observations in total.

Model Variables
The Coupled-HMM is estimated using the following variables (Table 2).

The observations $y_{it}^{(A)}$ and $y_{it}^{(B)}$ represent the glucose level and the alanine aminotransferase level, respectively, as measurements of diabetes and chronic liver disease. Due to their right-skewed distributions, we log-transform both variables. Between-patient heterogeneity is captured by the covariates $x_{it}$. These comprise risk factors and treatment information. As risk factors, we use the body mass index (BMI), sex, age (at time step $t = 1$), and time since prediabetes. We explicitly split temporal information into age and time since prediabetes in order to separate age as a risk factor from possible time trends. Treatment information is given by Metformin medication. This is a binary variable that indicates whether Metformin (Glucomin or Glucophage) was prescribed. To determine the so-called within effect of Metformin, the following approach was used. Metformin is generally expected to reduce or stabilise blood glucose levels in patients [3].
Patients who use Metformin, on the other hand, may have higher glucose levels than those who do not. As a result, the so-called within and between effects may have opposite signs. Therefore, in order to draw inferences about the within effect, the Metformin variable is centred with the respective patient-specific mean. This strategy is in line with earlier research [7,17,20,33,38,76]. A lagged version of the Metformin variable was included in addition to the centred version. Table 7 contains summary statistics for the model variables. As is to be expected in this population of people with (pre-)diabetes, blood glucose levels are higher than recommended by the World Health Organization. Similarly, alanine aminotransferase readings in this population reach values as high as 143.40 u/l, indicating the presence of liver disease [e.g., 34]. Overall, patients have a relatively high BMI, indicating a considerable prevalence of obesity. This is to be expected, as a high BMI is a known risk factor for diabetes. Metformin was only given to a small percentage of the patients. Both measurements, namely blood glucose and alanine aminotransferase levels, show a statistically significant association: Pearson's correlation coefficient is 0.11 (p < 0.001). Hence, a higher blood glucose level coincides with a higher level of alanine aminotransferase. This suggests that the two diseases may interact, but it does not allow for statistical inferences about the underlying longitudinal dynamics.
Table 3 contains the results of the performance comparison. The models are evaluated on the basis of the expected log pointwise predictive density (elpd). In a Bayesian context, the elpd measures predictive accuracy and, as a result, the overall fit of a model [28,72]. A higher elpd indicates a better fit. The proposed Coupled-HMM for comorbidities has the highest elpd (2,148.78) among all models that converged. As a result, it attains the best overall model fit.
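The within-patient centring of the Metformin variable described above can be sketched in a few lines of Python. The record layout and the function name are hypothetical, and the sketch assumes that the lagged variable is the centred value from the previous time step of the same patient (undefined at the first time step); the text does not specify this detail.

```python
from collections import defaultdict


def center_within_patient(records):
    """Centre a binary treatment indicator by the patient-specific mean.

    records: iterable of (patient_id, time_step, metformin) tuples.
    Returns a list of dicts with the centred value and its one-step lag.
    ASSUMPTION: the lag refers to the centred value of the previous time step.
    """
    by_patient = defaultdict(list)
    for patient_id, time_step, metformin in records:
        by_patient[patient_id].append((time_step, metformin))

    rows = []
    for patient_id, series in by_patient.items():
        series.sort()  # order each patient's observations by time step
        patient_mean = sum(value for _, value in series) / len(series)
        previous = None  # no lagged value at the first time step
        for time_step, value in series:
            centred = value - patient_mean
            rows.append({
                "patient": patient_id,
                "t": time_step,
                "metformin_centred": centred,
                "metformin_lagged": previous,
            })
            previous = centred
    return rows
```

After centring, a patient who never (or always) receives Metformin contributes only zeros, so the coefficient is identified from changes in treatment within a patient rather than from differences between patients.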

Model Performance
We employ baseline models that can be used to model multiple time series at the same time. In particular, we use the baselines from [55]. Alternative models are usually developed for single time series or do not have a mechanism to account for disease interactions (see Sec. 2). As a result, they are not suitable for the purpose of our paper.
What should the number of latent states be? We compare our Coupled-HMM to baselines with a simplified latent state space, that is, a coupled HMM in which the latent states of disease A or B are removed. Here, simplified A refers to a variant in which diabetes has a single latent state, while simplified B refers to a variant in which chronic liver disease has a single latent state. The respective disease is thus assumed to have only one state (even though this contradicts the Corbin-Strauss trajectory model [14,63]). The proposed Coupled-HMM outperforms both the simplified A (elpd: 1,620.96) and simplified B (elpd: 1,450.83) by a considerable margin.
Taken together, this emphasizes the importance of accounting for different trajectory phases by integrating latent states (i.e., disease states). As a result, we report estimation results from the proposed Coupled-HMM with 2 × 2 states in the following sections, as this model is favored in the model comparison.
How should patient heterogeneity be represented? We run further comparisons to answer this question. First, we use a naive coupled HMM [8]. This model does not employ any covariates for modelling between-patient heterogeneity (i.e., neither in the emission component nor in the transition mechanism). Its fit is clearly inferior. Second, we employ the emission coupled HMM of Pohle et al. [55]. This model incorporates covariates in the emission component (whereas our model includes covariates in the transition mechanism). However, during MCMC sampling, the chains did not converge, which prevents an interpretation of the model coefficients. Moreover, the estimate of the elpd may be unreliable. Third, as a remedy, we modified the emission coupled HMM of Pohle et al. [55]. In order to avoid label switching across latent states, we introduce an ordering of the estimated means in addition to ordered intercepts, set initial values for intercepts based on a k-means procedure [18], and follow common practice in Bayesian modelling by subjecting the covariates to a QR decomposition. The latter should remove posterior correlations among covariates, allowing the MCMC chains to converge more quickly [28]. To validate our implementation, we fitted the improved emission coupled HMM to simulated data, where MCMC sampling converged and the model coefficients were correctly recovered. On the real data, the model converged for 1 × 2 states (but not for the other state combinations). Compared to our proposed model, it exhibits a lower overall fit. Overall, this suggests that covariates should be included in the transition mechanism (rather than the emission component).
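The QR reparameterization mentioned above can be sketched with NumPy as follows. The scaling by the square root of n - 1 follows the convention recommended in the Stan documentation and is an assumption here, as are the function names; the text does not state which scaling was used.

```python
import numpy as np


def qr_reparameterize(X):
    """Thin QR decomposition of the centred covariate matrix.

    X: (n, p) covariate matrix. Returns (Q_star, R_star) such that the
    centred matrix equals Q_star @ R_star.
    ASSUMPTION: scaling by sqrt(n - 1) as in the Stan documentation.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each covariate
    n = Xc.shape[0]
    Q, R = np.linalg.qr(Xc, mode="reduced")  # thin QR: Q is (n, p), R is (p, p)
    return Q * np.sqrt(n - 1), R / np.sqrt(n - 1)


def recover_beta(theta, R_star):
    """Map coefficients on the Q scale back to the original covariate scale."""
    return np.linalg.solve(R_star, theta)
```

Sampling then operates on the transformed coefficients `theta`, whose design matrix `Q_star` has orthogonal columns, and `recover_beta` maps each posterior draw back to coefficients for the original (centred) covariates.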
What is the value of modelling disease interactions? By definition, coupled HMMs in which one disease has only a single latent state cannot account for disease interactions. We therefore now evaluate the associated benefit. For the naive coupled HMM, we find that the model with 2 × 2 states works best, implying that capturing disease interactions is beneficial. (Because the chains did not converge, we are unable to comment on the emission coupled HMM.) We also find that the model with 2 × 2 states is the best for the proposed Coupled-HMM. As a result, diabetes and chronic liver disease must be expected to have longitudinal disease interactions.

Estimation Results
Emission component: Table 4 reports the estimated parameters of the emission component. The stable state of diabetes is characterized by an average blood glucose level of only exp(4.55) = 94.63 mg/dl. The acute state, as expected, has a larger value, with an average of exp(4.70) = 109.95 mg/dl. The credible intervals of the two states do not overlap, indicating that the difference between them is statistically significant.
The mean of the stable state in chronic liver disease is exp(2.86) = 17.46 u/l. This is lower than the acute state's average of 30.88 u/l. Because the credible intervals do not overlap, the difference between the posterior means of the stable and acute states is statistically significant.
Transition mechanism: We now turn to the parameter estimates for the state transition probabilities. Because of their large number, we refrain from presenting all coefficients and instead focus on how Metformin treatment affects the transition probabilities. This allows us to quantify the direct treatment effect of Metformin on the disease state. Based on clinical research, we expect the following. (1) Metformin is a diabetes medication. Hence, it should directly affect only diabetes and not chronic liver disease. This is confirmed (none of the related CrIs contains zero). (2) Because Metformin is only effective for a brief period of time, we expect to detect treatment effects in the non-lagged variable. Here, too, the model yields confirmatory evidence. (3) We expect a positive treatment effect for the transition (acute, acute) → (stable, acute). In this case, a considerable fraction of the probability mass is above zero, indicating that Metformin has a beneficial treatment effect. The posterior mean corresponds to a 3.29 increase in log-odds, that is, an odds ratio of exp(3.29) = 26.84 for a one-unit increase in the centered Metformin variable, which makes the aforementioned transition more likely than remaining in the state (acute, acute).
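The back-transformations quoted in the estimation results can be reproduced with a few lines of arithmetic. The sketch below only exponentiates the reported log-scale estimates; the helper names are illustrative and not part of the original analysis.

```python
import math

def log_mean_to_level(log_mean):
    """Convert a posterior mean on the log scale to the original measurement scale."""
    return math.exp(log_mean)

def odds_ratio(log_odds_coef, delta=1.0):
    """Odds ratio implied by a log-odds coefficient for a change of `delta` in the covariate."""
    return math.exp(log_odds_coef * delta)

glucose_stable = log_mean_to_level(4.55)   # mg/dl, stable diabetes state
glucose_acute = log_mean_to_level(4.70)    # mg/dl, acute diabetes state
liver_stable = log_mean_to_level(2.86)     # u/l, stable liver disease state
metformin_or = odds_ratio(3.29)            # one-unit rise in the centered Metformin variable

print(round(glucose_stable, 2), round(glucose_acute, 2))  # 94.63 109.95
print(round(liver_stable, 2), round(metformin_or, 2))     # 17.46 26.84
```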
Coupling: We now examine the relationship between the two comorbidities. For this, we present the baseline probability of the transition mechanism (i.e., the intercept). It gives the transition probability conditional on the latent states of both disease A and disease B. To estimate it, all covariates were set to their respective sample means. In Eq. 3, this corresponds to setting the covariate vector equal to its sample mean.
The coupling for diabetes is shown in Figure 2. The plot depicts the probability of disease A (diabetes) transitioning from a stable to an acute latent state, conditional on the latent states of both diseases. If both diseases are unrelated, the credible intervals for the conditional transition probability should encompass zero. Higher values of the conditional transition probability indicate a stronger relationship. We can see from Fig. 2 that an acute state of chronic liver disease is linked to a higher risk of switching from stable to acute diabetes; see (1, 2) → (2, 2). Simply put, if chronic liver disease is acute, diabetes is likely to go from stable to acute as well. Overall, there is considerable evidence that diabetes and chronic liver disease are linked.
When exchanging A and B, the interpretation is analogous. Similar effects can be seen in Fig. 3. Simply put, if diabetes is already acute, chronic liver disease is likely to become acute as well. However, the effect is less pronounced, and the 50% and 90% credible intervals overlap to a large extent. Nonetheless, these findings show that the two comorbidities do interact.
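To make the notion of a baseline, coupled transition probability concrete, the following sketch evaluates the probability that disease A becomes acute given the current latent states of A and B. It assumes a logistic link with one intercept per pair of current states and centered covariates, so that at the sample mean only the intercept remains. Since Eq. 3 is not reproduced in this section, the functional form, the intercept values, and all names here are hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def prob_a_acute_next(intercepts, coefs, x_centered, state_a, state_b):
    """P(disease A is acute at t+1 | states of A and B at t, centered covariates)."""
    eta = intercepts[state_a - 1, state_b - 1] + coefs[state_a - 1, state_b - 1] @ x_centered
    return sigmoid(eta)

# Hypothetical parameters: rows index the state of A, columns the state of B
# (1 = stable, 2 = acute).
intercepts = np.array([[-3.0, -1.5],
                       [0.5, 1.0]])
coefs = np.zeros((2, 2, 3))      # three hypothetical covariates
x_at_mean = np.zeros(3)          # centered covariates evaluated at the sample mean

# Stable -> acute for A, given B stable vs. B acute.
p_given_b_stable = prob_a_acute_next(intercepts, coefs, x_at_mean, 1, 1)
p_given_b_acute = prob_a_acute_next(intercepts, coefs, x_at_mean, 1, 2)
```

In this toy setting, a larger intercept for the pair (A stable, B acute) than for (A stable, B stable) is what a coupling of the kind described in the text would look like.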

Estimated Spill-Over Effect
Lastly, we discuss spill-over effects. Specifically, we analyze the influence of Metformin on liver disease through the co-evolution of the two comorbidities. For this, we consider the state transitions (2, 2) → (1, 2) → (1, 1), which reflect the transition of diabetes from acute to stable, followed by the transition of liver disease from acute to stable. We report the following: (1) The probability of these transitions when Metformin has been prescribed. In this case, the centered Metformin variable is set to 0.5, and the remaining covariates are set to their sample means. (2) The probability when no Metformin has been prescribed. In this case, the centered Metformin variable is set to 0, and the remaining covariates are set to their sample means as well. (3) The spill-over effect, which is the absolute difference between the two. (4) The quotient of the two, which indicates how many times more frequent the spill-over effect is with Metformin than without.
Note: Shown are the transition probabilities that diabetes moves from a stable to an acute latent state, i.e., (1,□) → (2,□). Hence, this compares the transition probability given stable chronic liver disease (top) and acute chronic liver disease (bottom). The estimates show the posterior means (thick line) and the 50% credible interval (shaded area). All covariates are set to their respective sample means during estimation.
Table 5 reports the corresponding quantiles of the transition probabilities of these two scenarios, as well as their difference.
In general, we find the following. If no Metformin is prescribed, the state of liver disease seldom changes through the underlying coupling. All estimated quantiles are close to 0 in this case. When Metformin is prescribed, there is a non-zero probability that the state of chronic liver disease improves as well. Comparing the different quantiles shows that the distribution for the scenario in which Metformin is prescribed has a very long tail. For many patients in the left tail, the spill-over effect is minor or absent (e.g., due to non-adherence or simply because the treatment is ineffective). Based on our findings, it cannot be ruled out that, at the 5% quantile, Metformin treatment has unfavorable consequences for a co-occurring liver condition.
However, there is a considerable estimated spill-over effect for patients in the right tail. At the 50% quantile, prescribing Metformin (as opposed to not prescribing it) increases the likelihood of a better liver state by a factor of 2.898. At the 75% quantile, the relative quotient even rises to 13.695. As illustrated, our methodology allows us to estimate the effect that Metformin, a diabetes treatment, has on liver disease. In conclusion, we find evidence that Metformin stabilizes not only diabetes but, through the coupling, also a co-occurring chronic liver disease.
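The spill-over quantities described above (path probability with and without treatment, their difference, and their quotient) can be computed from posterior draws along the following lines. This is a sketch under stated assumptions: the joint states are indexed as a 4 × 4 transition matrix per draw, the draws are simulated from Dirichlet distributions purely for illustration, and the quotient is taken between matching quantiles, which may differ from how Table 5 was actually produced.

```python
import numpy as np

# Joint latent states (state of A, state of B); 1 = stable, 2 = acute.
STATE_INDEX = {(1, 1): 0, (1, 2): 1, (2, 1): 2, (2, 2): 3}
PATH = [(2, 2), (1, 2), (1, 1)]
QUANTILES = (0.05, 0.25, 0.50, 0.75, 0.95)

def path_probability(P, path):
    """Probability of following `path`, per draw. P has shape (draws, 4, 4)."""
    idx = [STATE_INDEX[s] for s in path]
    prob = np.ones(P.shape[0])
    for origin, dest in zip(idx[:-1], idx[1:]):
        prob = prob * P[:, origin, dest]
    return prob

def spillover_summary(P_treated, P_untreated, path, qs=QUANTILES):
    p1 = path_probability(P_treated, path)
    p0 = path_probability(P_untreated, path)
    q1, q0 = np.quantile(p1, qs), np.quantile(p0, qs)
    return {
        "treated": q1,
        "untreated": q0,
        "difference": np.quantile(p1 - p0, qs),
        "quotient": q1 / q0,   # assumption: ratio of matching quantiles
    }

# Hypothetical posterior draws; each row of each matrix sums to one.
rng = np.random.default_rng(42)
P_untreated = rng.dirichlet([1.0, 1.0, 1.0, 1.0], size=(1000, 4))
P_treated = rng.dirichlet([2.0, 2.0, 1.0, 1.0], size=(1000, 4))

summary = spillover_summary(P_treated, P_untreated, PATH)
```

Working per draw keeps the full posterior uncertainty, so long-tailed behaviour of the kind reported in the text would show up directly in the upper quantiles.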
Note: Shown are the transition probabilities that chronic liver disease moves from a stable to an acute latent state, i.e., (□,1) → (□,2). Hence, this compares the transition probability given stable diabetes (top) and acute diabetes (bottom). The estimates show the posterior means (thick line) and the 50% credible interval (shaded area). All covariates are set to their respective sample means during estimation.
(1) According to our findings, our Coupled-HMM outperforms other model specifications. In particular, it is superior to variants in which no disease interaction is modelled. We obtain considerable gains in model performance as assessed by Bayesian predictive accuracy (i.e., the expected log pointwise predictive density, elpd).
(2) We present evidence that diabetes and chronic liver disease are interconnected. This is revealed by the fact that the state transitions of one disease depend significantly on the state of the other disease. Thereby, we confirm the importance of coupling in our model formulation.
(3) Our model returns empirical evidence characterizing the spill-over effect of a diabetes treatment on the state of chronic liver disease.
Note: The quantiles of the probabilities of the transition (2,2) → (1,2) → (1,1) are reported. "Metformin prescribed" assumes that the centered Metformin variable is equal to 0.5 and the remaining covariates are equal to the sample mean. In contrast, "no Metformin prescribed" corresponds to a centered Metformin variable equal to 0 and the remaining covariates also set to the sample mean. Additionally, the quantiles of the difference of these transition probabilities are given.
The role of Metformin in patients with chronic liver disease has been controversial in the past [3,10,34], especially because a direct effect is not always supported by the molecular mechanism of action. We present empirical evidence that Metformin has a direct influence on the state of diabetes, but no evidence that it has a direct effect on the state of chronic liver disease. Instead, we find an indirect effect through the coupling.
Interpretability: To warrant interpretability, our model is based on a parsimonious specification. We draw on a central concept in medicine: diseases form a so-called "trajectory" that goes through several phases, including acute and stable disease states [14,63]. These disease states are crucial in clinical decision-making because different treatment approaches should be used depending on whether a patient is in an acute or a stable state. In particular, this suggests that clinical decision-making should adjust therapy to the underlying disease state (rather than to measurements or symptoms, which are just noisy observations of the state) [48]. Accordingly, previous research has used latent states to model health trajectories [e.g., 6,50,54,60,62,65]. In keeping with this, we introduce latent disease states and use them to link the states of different diseases, and we characterize our spill-over effect in terms of latent states (rather than measurements or symptoms).
Limitations: Our research, like any medical research, has limitations, and we offer recommendations for future research. First, we have a one-of-a-kind, large-scale collection of electronic health records from a Western health provider. Our approach could be replicated in future studies with different cohorts. Second, we used a large number of risk factors based on earlier research. However, other variables could also act as risk factors, and future studies could examine how the model's performance changes when they are included. Third, our research focuses on a specific scenario, namely the interaction between diabetes and chronic liver disease under Metformin treatment. Future studies could validate our findings by examining other comorbidities. By enlarging the underlying latent state space, the model can be extended to comorbidities involving three (or more) diseases if desired.
Implications: Our model offers insights that can be used in clinical treatment and research. Spill-over effects can be used to assess to what extent treatments for disease A also change the course (disease state) of disease B. Thus, when attempting to stabilize the course of disease B, clinical practitioners can choose a treatment that has a direct effect on B or, alternatively, a treatment plan for A that has an indirect effect on B, thereby stabilizing disease B. Our approach provides statistical insights in this area, allowing for evidence-based treatment planning decisions.
Diabetes and chronic liver disease were used to test our approach. However, to ensure broad applicability, our model is formulated in a general manner. Our clinical coauthor envisions a variety of applications: our approach could be used to investigate comorbidities that are poorly understood or for which the clinical literature reports contradictory findings. Given the vast number of comorbidities, this offers rich opportunities for follow-up research.

CONCLUSION
Patients with comorbidities, which are defined as the presence of numerous co-occurring diseases, are common, but full statistical frameworks for modelling the longitudinal dynamics of comorbidities are uncommon. Such models would provide new insights into how comorbidities co-evolve throughout time. This, in turn, would enable us to answer the question: how does treating disease A affect the status of disease B?
To address this, we developed a probabilistic model for longitudinal data analysis: a coupled hidden Markov model with individualized, non-homogeneous transitions (coupled-HMM).
To the best of our knowledge, our coupled-HMM is the first statistical model that is specifically tailored to capture longitudinal dynamics among comorbidities. Our methodology further supports interpretability by providing clinically relevant insights. We presented a spill-over effect, which quantifies a treatment's indirect effect on disease states via the co-evolution of comorbidities. A longitudinal dataset of 675 patients was used to test our model.

DECLARATIONS
Ethics approval and consent to participate: Not applicable, as the data is openly accessible in the form of MIMIC-III.
Consent for publication: Not applicable.
Availability of data and material: All data generated or analysed during this study are included in this manuscript. The data consists of electronic health records from the Medical Information Mart for Intensive Care (MIMIC-III) database, which is openly accessible [84].
Competing interests: The authors declare that they have no competing interests.

B MODEL DIAGNOSTICS
The following model diagnostics are performed in accordance with standard practice in Bayesian modelling [28]. This is done to ensure the convergence of the MCMC algorithm and, as a result, reliable estimates. First, we examined the effective sample size $n_{\mathrm{eff}}$, which showed that the number of MCMC samples was adequate. Second, for each model parameter, we calculated the Gelman-Rubin convergence diagnostic $\hat{R}$. The $\hat{R}$ is below the critical threshold of 1.1, indicating that the MCMC chains have converged. Finally, we obtained trace plots (Fig. 4). The trace plots illustrate the MCMC draws of the emission means for the different latent states and diseases (i.e., $\mu_s^{(A)}$ and $\mu_s^{(B)}$, $s \in \{1, 2\}$). The trace plots indicate that the chains are well mixed. Taken together, these diagnostics show that the MCMC algorithm has converged.
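For readers unfamiliar with the Gelman-Rubin diagnostic, the sketch below computes the classical (non-split) statistic for a single parameter from several chains. It is an illustration only: modern samplers typically report a split, rank-normalized variant, and the simulated chains and the number of chains here are hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Classical potential scale reduction factor for draws of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * chain_means.var(ddof=1)          # between-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled estimate of the posterior variance
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(1)
well_mixed = rng.normal(size=(4, 1500))              # chains exploring the same distribution
not_mixed = well_mixed + np.arange(4)[:, None]       # each chain stuck at a different level

rhat_well_mixed = gelman_rubin(well_mixed)
rhat_not_mixed = gelman_rubin(not_mixed)
```

Values close to 1 indicate that between-chain and within-chain variability agree, which is why a threshold such as 1.1 is used as a convergence check.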
Note: Trace plot for 1500 iterations of each chain (warm-up of the MCMC algorithm omitted).
We further assessed our model by performing posterior predictive checks in accordance with Bayesian modelling recommendations [25,28]. New observations were simulated based on the posterior draws of the model parameters and the most likely latent states, which were obtained using the Viterbi algorithm. Figure 5 shows the results. We use it to compare the credible intervals of the simulated observations with the real observations. Based on the results, our model appears to fit the data well. We also tested whether the Coupled-HMM could recover known parameters from simulated data. The results were confirmatory, indicating that the underlying dynamics are captured successfully.
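The decoding step used in the posterior predictive check can be illustrated with a small Viterbi implementation in log space. The sketch assumes a single chain with two latent states and Gaussian emissions on the log scale; the emission means are the reported log-scale means for diabetes, whereas the standard deviation, transition matrix, initial distribution, and observations are hypothetical.

```python
import numpy as np

def gaussian_logpdf(y, mean, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2

def viterbi(log_init, log_trans, log_emis):
    """Most likely state sequence. log_emis has shape (T, K); log_trans[i, j] is i -> j."""
    T, K = log_emis.shape
    delta = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = log_init + log_emis[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emis[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

means = np.array([4.55, 4.70])     # log-scale emission means (stable, acute)
sd = 0.03                          # hypothetical emission standard deviation
log_init = np.log([0.5, 0.5])      # hypothetical initial distribution
log_trans = np.log([[0.9, 0.1],
                    [0.2, 0.8]])   # hypothetical transition matrix

y = np.array([4.54, 4.56, 4.55, 4.71, 4.69, 4.70])   # hypothetical log glucose values
log_emis = gaussian_logpdf(y[:, None], means[None, :], sd)
path = viterbi(log_init, log_trans, log_emis)         # 0 = stable, 1 = acute

# Replicated observations for a posterior predictive check, given the decoded states.
rng = np.random.default_rng(7)
y_rep = rng.normal(means[path], sd)
```

In a full check, this would be repeated for each posterior draw of the parameters, and the intervals of the replicated observations would then be compared with the observed data.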