Discover overlooked complications after preeclampsia using electronic health records

Background Preeclampsia (PE) is a severe pregnancy complication characterized by hypertension and end-organ damage such as proteinuria. PE poses a significant threat to women’s long-term health, including an increased risk of cardiovascular and renal diseases. Most previous studies have been hypothesis-based, potentially overlooking certain significant complications. This study conducts a comprehensive, non-hypothesis-based analysis of PE-complicated diagnoses after pregnancies using multiple large-scale electronic health records (EHR) datasets.
Method From the University of Michigan (UM) Healthcare System, we collected 4,348 PE patients for the cases and 27,377 patients with pregnancies not complicated by PE or related conditions for the controls. We first conducted a non-hypothesis-based analysis to identify any long-term adverse health conditions associated with PE using logistic regression with adjustment for demographics, social history, and medical history. We confirmed the identified complications with UK Biobank data, which contain 443 PE cases and 14,870 non-PE controls. We then conducted a survival analysis on complications that exhibited significance in more than 5 consecutive years post-PE. We further examined the potential racial disparities of identified complications between Caucasian and African American patients.
Findings Uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity exhibited significantly increased risks, whereas hypothyroidism showed decreased risks, in 5 consecutive years after PE in the UM discovery data. UK Biobank data confirmed the increased risks of uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity. Further survival analysis using UM data indicated significantly increased risks of uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity, and significantly decreased risks of hypothyroidism. There are racial differences in the risks of developing hypertension and hypothyroidism after PE. PE was associated with a reduced risk of hypothyroidism in African American postpartum women but not in Caucasian women; it also increased the risk of uncomplicated hypertension, but less severely in African American postpartum women than in Caucasian women.
Interpretation This study addresses the lack of a comprehensive examination of PE’s long-term effects utilizing large-scale EHR and advanced statistical methods. Our findings underscore the need for long-term monitoring and interventions for women with a history of PE, emphasizing the importance of personalized postpartum care. Notably, the racial disparities observed in the impact of PE on hypertension and hypothyroidism highlight the necessity of tailored aftercare based on race.


Introduction
Preeclampsia (PE) is a severe pregnancy complication that emerges after the 20th week of gestation and is characterized by hypertension and end-organ damage [1]. Its prevalence in the United States is estimated at approximately 4%, with a global prevalence of 2-8% [3,4]. PE profoundly impacts maternal and neonatal health, primarily through a hypertensive state that increases vascular resistance, potentially impairing blood flow to essential organs [2]. This condition leads to endothelial dysfunction, which can cause extensive damage to the mother's kidneys, liver, and central nervous system [2]. Additionally, PE adversely impacts placental perfusion, potentially resulting in fetal growth restrictions and preterm births [2]. While the resolution of pregnancy often mitigates the acute symptoms of PE, its long-term effects on maternal health, manifesting as various chronic complications such as hypertension, cerebrovascular disease, diabetes, cardiovascular disease, and renal disease, can persist for years postpartum [8].
Despite the well-documented immediate consequences, there is a notable gap in the comprehensive examination of PE's long-term impacts. Previous studies, including cohort studies and meta-analyses, have indicated increased risks of chronic hypertension and of cardiovascular, renal, and metabolic disorders post-PE [5-9, 11-13, 21-26]. However, these studies adopted a hypothesis-based approach, focusing on confirming known associations and potentially overlooking other important ones. Moreover, the majority of these studies utilized basic statistical models (e.g., t-tests, chi-square tests, linear regression) [5-8, 11-13, 21-26], although more robust analytical methods are needed to capture crucial time-to-event information and competing factors (e.g., death, loss to follow-up), especially when dealing with longitudinal data over a long period. In addition, most of these studies did not adjust for significant confounders, especially pre-existing conditions, or adjusted only for basic confounders such as age, race, and gestational age [5-7, 11, 13, 21-24]. This left the causal relationship between PE and the studied complications unclear. Finally, few studies have paid attention to racial disparities in PE's long-term effects. As statistical methods evolve, a more robust and comprehensive approach utilizing modern analytical methods and large datasets is imperative to explore the long-term effects of PE and offer clinical insights.
With the widespread use of electronic medical systems, the Electronic Health Record (EHR) has become an important source of patient data for medical research. EHR provides comprehensive and timestamped patient information in large volumes, including demographics, diagnosis records, social history, laboratory results, etc. These features make it compatible with modeling quantitative outcomes, such as patient survival. Survival analysis enables the examination of time until the occurrence of specific events [10]. Compared with basic statistical models such as t-tests and linear regression, survival analysis offers advantages in terms of handling censoring, accommodating non-normally distributed data, and providing a more detailed understanding of disease trajectories. Incorporating survival models into the analysis offers a better understanding of PE's long-term consequences.
This study adopts a non-hypothesis-based exploration, leveraging EHR data to investigate connections between PE and subsequent health trajectories. Our goal is to first identify any long-term complications significantly associated with PE and then comprehensively study their trajectories, as well as potential racial disparities. Through such analysis, we aim to enhance the clinical understanding of PE's long-term consequences and ultimately help care providers adopt preventive aftercare strategies.

Data Source
We obtained original EHR data for two patient cohorts: the discovery cohort and the confirmation cohort. The discovery data are from the University of Michigan (UM) Medicine Healthcare System, an academic health system in the state of Michigan. The University of Michigan Medical School's Institutional Review Board (IRB) granted data utilization approval under HUM#00168171. The EHR from the UM Medicine Healthcare System provides comprehensive features, including diagnoses, encounter information, demographics, medications, etc. The complete records for these features are available starting in 2006 [38]. We obtained confirmation data from the UK Biobank, a long-term study with approximately 500,000 volunteers, providing similar features [31]. All UK Biobank data for the confirmation cohort pertained to project 86494. These records are available starting in 2006. Our study initially utilized features including diseases diagnosed, age at diagnosis, medical history, pre-existing complications, race, and social history.
The discovery data comprise case patients having at least one PE diagnosis between 2003 and 2023, and controls having at least one pregnancy during the same period but no diagnosis of PE-related diseases (PE, eclampsia, pre-existing hypertension complicating pregnancy, gestational edema, and maternal hypertension). Inclusion and exclusion criteria were based on the International Classification of Diseases (ICD)-9 and ICD-10 codes [14,15]. We followed the same selection criteria for the confirmation data from the UK Biobank. Detailed case-control selection criteria based on ICD-9 and ICD-10 codes are listed in Supplementary Table 1.

EHR feature engineering
The diagnoses in EHR were initially recorded using a combination of ICD-9 and ICD-10 codes. We first standardized all diagnoses to ICD-10 using the "touch" package [28] to ensure consistency. Subsequently, we classified diagnoses into 31 medical complications, such as uncomplicated hypertension, complicated diabetes, obesity, etc., according to the Elixhauser Comorbidity Index [16]. The Elixhauser Comorbidity Index is a widely used method in healthcare to comprehensively assess and quantify the severity of multiple health conditions a patient might have [17]. A complete list of matches between ICD-10 codes and Elixhauser comorbidities is shown in Supplementary Table 2.
To investigate the change in disease risk after PE, we calculated the cumulative risk of each disease at intervals of one year, two years, and up to ten years after PE or normal pregnancy. We measured the follow-up length of each patient, defined as the time interval between their most recent record and their first PE diagnosis (for cases) or normal pregnancy (for controls). Then, for each disease, we calculated its risk in N years after PE by comparing the occurrence of the disease within N years among patients with at least N years of follow-up (N = 1, 2, …, 10).
Pre-existing complications of PE are confounders to be adjusted for in this study. For each patient, pre-existing complications are defined as those present at or before their first PE (for cases) or normal pregnancy (for controls) diagnosis. We encoded the pre-existing complications as binary entries (1 for presence and 0 for absence) following the Elixhauser Comorbidity Index, together with additional features including PE history, gestational diabetes history, and substance use (alcohol and smoking status). To enhance statistical power and reduce model complexity, we removed features with more than 20% missing values or a p-value > 0.10 in the initial univariable test, as advised [41]. We applied the feature selection process independently to each group. The complete list of final features in each group is shown in Supplementary Table 3.
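The N-year risk calculation described above can be sketched as follows. This is a minimal illustration rather than the study's code: the `Patient` record, its field names, and the toy cohort are hypothetical, and the sketch assumes each patient's follow-up length and time to first diagnosis have already been derived from the EHR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical record: both durations are measured in years from the
    # index event (first PE diagnosis for cases, normal pregnancy for controls).
    followup_years: float
    years_to_disease: Optional[float]  # None if the disease was never recorded


def risk_within(patients: List[Patient], n_years: int) -> Optional[float]:
    """Share of patients diagnosed within n_years, among those followed >= n_years."""
    eligible = [p for p in patients if p.followup_years >= n_years]
    if not eligible:
        return None  # nobody has enough follow-up for this horizon
    events = sum(
        1
        for p in eligible
        if p.years_to_disease is not None and p.years_to_disease <= n_years
    )
    return events / len(eligible)


# Toy cohort, invented purely for illustration.
cohort = [
    Patient(10.0, 3.0),
    Patient(6.0, None),
    Patient(2.5, 1.0),
    Patient(8.0, 7.0),
]
risks = {n: risk_within(cohort, n) for n in range(1, 11)}
```

Restricting the denominator to patients with at least N years of follow-up means each horizon is evaluated on a different, progressively smaller group, which is why the study reports population counts per group.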
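The two filtering rules above (more than 20% missing values, or a univariable p-value above 0.10) can be sketched as a simple screen. The function name, the row layout, and the example values are assumptions made for illustration; the univariable p-values are taken as given inputs rather than computed here.

```python
from typing import Dict, List, Optional


def select_features(
    rows: List[Dict[str, Optional[int]]],
    p_values: Dict[str, float],
    max_missing: float = 0.20,
    max_p: float = 0.10,
) -> List[str]:
    """Keep features with at most max_missing missingness and p <= max_p."""
    if not rows:
        return []
    kept = []
    for name, p in p_values.items():
        # A value of None (or an absent key) counts as missing.
        missing = sum(1 for r in rows if r.get(name) is None) / len(rows)
        if missing <= max_missing and p <= max_p:
            kept.append(name)
    return kept


# Toy binary-encoded records (1 = present, 0 = absent, None = missing).
rows = [
    {"obesity": 1, "smoking": None, "alcohol": 0},
    {"obesity": 0, "smoking": 1, "alcohol": 1},
    {"obesity": None, "smoking": None, "alcohol": 0},
    {"obesity": 1, "smoking": 0, "alcohol": 0},
    {"obesity": 0, "smoking": 1, "alcohol": 1},
]
p_values = {"obesity": 0.03, "smoking": 0.01, "alcohol": 0.40}  # hypothetical
selected = select_features(rows, p_values)
```

In this toy input, "smoking" is dropped for missingness (2 of 5 values missing) and "alcohol" for its p-value, leaving only "obesity".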

Non-hypothesis-based PE-induced complications discovery
In each group (N = 1, 2, …, 10), we applied logistic regression to the complications that reached the effective sample size recommended for clinical logistic regression [18,39]. We then introduced a binary variable to indicate the presence of complications after the first PE (for cases) or normal pregnancy (for controls) diagnosis of each patient, as the response variable in logistic regression. A value of 1 means the patient had the complication after the first PE (for cases) or normal pregnancy (for controls) diagnosis, and 0 otherwise.
Using the R package "glmnet" [18], we fitted a logistic regression model for each complication in each group, and recorded the regression coefficient, standard error, and p-value for the predictor "PE". We calculated the odds ratios (OR) and corresponding 95% confidence intervals (CI) based on the results. OR is a measure of the association between the presence of a particular condition and a specific outcome, expressing the odds of the event occurring in the cases relative to those in the controls [36]. We considered the complications that show significance in OR (p < 0.05) in more than 5 consecutive years to be significantly associated with PE. We further calculated Variance Inflation Factors (VIF) using the package "qacReg" [35] to check for potential collinearity among independent variables.
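As an illustration of the two steps above, the sketch below converts a logistic regression coefficient and its standard error into an OR with a Wald-type 95% CI, and measures the longest run of consecutive significant years. It is a sketch under stated assumptions: the Wald interval (normal quantile 1.96) is a standard choice that the text does not spell out, and the function names and example p-values are hypothetical.

```python
import math
from typing import Dict, Tuple


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def longest_significant_run(p_by_year: Dict[int, float], alpha: float = 0.05) -> int:
    """Length of the longest streak of consecutive years with p < alpha.

    Years absent from the dictionary (for example, excluded for small
    sample size) break a streak.
    """
    best = run = 0
    prev = None
    for year in sorted(p_by_year):
        contiguous = prev is not None and year == prev + 1
        if p_by_year[year] < alpha:
            run = run + 1 if (contiguous and run > 0) else 1
        else:
            run = 0
        best = max(best, run)
        prev = year
    return best


# Hypothetical coefficient for "PE": log(2) corresponds to an OR of 2.
or_, lower, upper = odds_ratio_ci(math.log(2.0), 0.1)

# Hypothetical p-values for one complication across the ten yearly groups.
p_by_year = {1: 0.01, 2: 0.02, 3: 0.20, 4: 0.01, 5: 0.01,
             6: 0.01, 7: 0.04, 8: 0.03, 9: 0.001, 10: 0.30}
run_length = longest_significant_run(p_by_year)
associated = run_length > 5  # "more than 5 consecutive years"
```

The streak length is returned rather than a yes/no answer so that the threshold can be applied by the caller.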

Confirmation of complications after PE using UK Biobank data
To confirm the significant complications identified in the UM database, we constructed new logistic regressions on these complications in UK Biobank data following a similar process. A binary variable indicating the presence of the complication after the patient's first PE (for cases) or normal pregnancy (for controls) diagnosis serves as the response variable, while PE history, pre-existing complications, social history (smoking and alcohol-use status), and race serve as the regressors. We checked the OR and CI of the feature "PE" in each complication's model for confirmation.

Inspection of identified PE-induced complications and their racial disparities
We conducted survival analyses on the complications that are significantly associated with PE to further explore the disease trajectories. Patients who were lost to follow-up before the occurrence of events of interest were marked as censored. We built a Cox proportional-hazards (Cox-PH) survival model [19] for each significant complication. To further account for the effect of unrelated deaths on patient statistics, we adjusted the Cox-PH models for competing risks caused by unrelated deaths. The competing risks model was built using the "tidycmprsk" package [20]. Features used in the models can be found in Supplementary Table 4. We calculated the hazard ratios (HR) of the factor "PE" in each model. HR is a measure used in survival analysis to compare the event rates at any given time point between two populations [37], which indicates the complication risks caused by PE in our study.
To examine potential racial disparities in PE's effects, we further stratified patients according to their races (Caucasian, African American, Asian, American Indian or Alaska Native, Native Hawaiian and Other Pacific Islanders, Other Races, Unknown). We conducted comparative analysis on Caucasians (71.3%) and African Americans (13.8%), and omitted other races due to their very low representation in the population (less than 15% combined). We applied a similar survival analysis on complication risks and calculated the HR of PE on each complication for Caucasians and African Americans.
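To make the role of censoring concrete, the sketch below implements a plain Kaplan-Meier estimator of the kind that underlies survival curves. It is an illustration only and not the study's model: it does not perform the Cox-PH regression or the competing-risks adjustment described above, and the function name and toy data are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimates at each distinct event time.

    times  : follow-up time of each patient
    events : True if the complication was observed at that time,
             False if the patient was censored (e.g., lost to follow-up)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # complications observed exactly at time t
        leaving = 0   # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without lowering the curve.
        at_risk -= leaving
    return curve


# Toy data: follow-up in years; False marks a censored patient.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, False, True, True, False])
```

Censored patients still contribute to the number at risk up to the time they leave, which is the information a simple proportion-based comparison would discard.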

Data Sharing Agreement
The EHR used in this study contains sensitive patient information and cannot be made public. Researchers meeting the criteria for using sensitive data may contact the Research Scientific Facilitators at the University of Michigan Precision Health by emailing PHDataHelp@umich.edu or visiting https://research.medicine.umich.edu/our-units/data-office-clinical-translational-research/data-access for more information about requesting data access.

Study overview and patient characteristics
Using two large EHR datasets, we are the first to discover and validate the overlooked post-PE complications following a non-hypothesis-based approach and rigorous confounder adjustment. Utilizing EHR from the UM Medicine Healthcare System, we incorporated logistic regression and survival analysis to comprehensively investigate the long-term effects of PE, identifying persistent complications and racial disparities. We subsequently validated the results using EHR data from the UK Biobank. The overall workflow of the study is shown in Fig. 1.
The discovery EHR dataset from UM Medicine Healthcare System includes 4,348 patients for the cases and 27,377 patients for the controls. The confirmation data from UK Biobank include 443 patients for the cases and 14,954 patients for the controls. The overall patient characteristics are summarized in Table 1. In both discovery and confirmation data, we observed significant differences between the cases and controls in demographic and clinical factors. To account for the differences, we included these factors in the regression models as confounders.
If the population of a particular year fell below a reasonable effective size, we excluded that year from the plot to avoid bias (see Method). We then calculated the ORs of PE for each Elixhauser category and their CIs from these logistic regression models (Fig. 2). Some complications exhibit statistical significance in more than 5 consecutive years after the first PE (for cases) or normal pregnancy (for controls) diagnosis, during the 10-year period. These include uncomplicated hypertension (median OR = 8.85), renal failure (median OR = 1.84), complicated diabetes (median OR = 1.57), congestive heart failure (median OR = 1.47), obesity (median OR = 1.40), and hypothyroidism (median OR = 0.84). Notably, the ORs of PE for hypothyroidism are significantly lower than 1 in the first 5 years after patients' first PE (for cases) or normal pregnancy (for controls) diagnosis, but with an overall rising trend.
We then confirmed the significance of the six significant complications above, using an external dataset (UK Biobank) to ensure the generalizability of our discoveries. We refitted a logistic regression model over the same six complications on the UK Biobank data (Fig. 3C). Renal failure (OR = 5.

Survival analysis reveals the trajectories of identified complications after PE
We next conducted survival analyses of all six significant complications identified from the UM Medicine Healthcare System, to gain holistic views (Fig. 3A). All targeted complications show significant differences between the survival curves of the cases and controls (Log-Rank test, p < 0.05). For each of the six complications, we regressed the time until its first diagnosis on PE and confounders, such as age and pre-existing conditions, using the Cox proportional-hazards (Cox-PH) model. Deaths occurring before the complication of interest developed were treated as competing risk events (see Methods). A complete list of the feature coefficients and significance levels is shown in Supplementary Table 4. We obtained the HRs and CIs specific to PE from the Cox-PH models and plotted them in Fig. 3B. Compared to women without a PE-related history, the HR of PE was 6.

The racial disparities related to complications after PE
Further, we stratified the patients according to their race and compared Caucasians and African Americans, who account for 71.3% and 13.8% of the UM cohort, respectively. We applied the same procedures as in the earlier analyses, stratified by these two groups. The race-specific survival curves for each identified complication are shown in Fig. 4A. Hypothyroidism shows significant differences between the survival curves of African Americans and Caucasians within the cases (Log-Rank test, p < 0.05), where African Americans show better survival than Caucasians. We also applied Cox-PH models to each complication individually, to assess their associations with PE, while adjusting for confounders and competing risks. The race-specific HRs of PE are shown in Fig. 4B. After adjustment, we only observe significant racial disparities in hypothyroidism and uncomplicated hypertension. For hypothyroidism, the association of PE with decreased hypothyroidism risk occurs only in African American postpartum women (HR = 0.63, 95% CI: 0.43-0.91), and is not significant in Caucasian postpartum women (HR = 1.00, 95% CI: 0.88-1.13). For uncomplicated hypertension, Caucasian postpartum women (HR = 6.62, 95% CI: 6.00-7.30) have a significantly higher HR of PE (p < 0.05) than African American postpartum women (HR = 4.12, 95% CI: 3.50-4.85). Other complications also exhibit racial differences between African American and Caucasian postpartum women; however, the disparities are not significant.
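One way to check whether two race-specific HRs differ is a z-test on the difference of the log HRs, recovering each standard error from its reported 95% CI. The text does not state which test produced the reported p < 0.05, so the sketch below is an assumption, not the study's procedure; it also assumes the two strata are independent and that the CIs are Wald intervals on the log scale. The function names are hypothetical; the HRs and CIs are those reported above.

```python
import math
from typing import Tuple


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(HR) recovered from a 95% CI on the HR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def compare_hazard_ratios(
    hr_a: float, ci_a: Tuple[float, float],
    hr_b: float, ci_b: Tuple[float, float],
) -> Tuple[float, float]:
    """Return (z statistic, two-sided p-value) for log(hr_a) - log(hr_b)."""
    diff = math.log(hr_a) - math.log(hr_b)
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Uncomplicated hypertension: Caucasian vs African American women.
z_htn, p_htn = compare_hazard_ratios(6.62, (6.00, 7.30), 4.12, (3.50, 4.85))

# Hypothyroidism: African American vs Caucasian women.
z_thy, p_thy = compare_hazard_ratios(0.63, (0.43, 0.91), 1.00, (0.88, 1.13))
```

Under these assumptions both comparisons fall below the 0.05 threshold, in line with the disparities described in the text.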

Discussion
This study employed a non-hypothesis-based approach and rigorous statistical analysis to comprehensively investigate and validate the overlooked long-term effects of PE for the first time. Using EHR from the UM Medicine Healthcare System and UK Biobank database, we discovered overlooked complications of PE and their racial disparities, in addition to confirming previously known conditions. These findings can encourage better management and interventions that benefit post-PE patient care.
The logistic regression models identified six complications from the UM Medicine Healthcare database (uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, obesity, and hypothyroidism) that exhibited sustained significance over 5 or more consecutive years following PE, as indicated by their ORs. The increased risks of uncomplicated hypertension, obesity, complicated diabetes, renal failure, and congestive heart failure due to PE were confirmed using UK Biobank data. We then conducted survival analyses on all six complications, further taking time-to-event factors into account and confirming the enduring impact of PE. In the longitudinal analysis, uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity showed significantly increased risks after PE, whereas hypothyroidism showed a significantly decreased risk. The findings underscore the need for long-term monitoring and interventions for patients who have a PE history.
To our knowledge, this study on PE's long-term complications is the first non-hypothesis-driven, comprehensive statistical analysis using multiple large-scale EHR datasets, with systematic adjustment for multiple confounding factors (demographics, social history, medical histories, etc.). While previous studies have recognized the increased risks of hypertension [5,8,11,12,13], diabetes [8,24,25], congestive heart failure [5,6,8,13,23], renal failure [8,21,22], and obesity [8,24,25] after PE, most of them focused on a specific complication. They did not necessarily use statistical modeling that handles confounders, leading to simplistic or even potentially biased results. One novel finding of our study is the apparent protective effect of PE against subsequent hypothyroidism and the racial disparities in this effect. We observe an overall lower risk of hypothyroidism after PE. Interestingly, after stratifying patients into Caucasians and African Americans, we only observed PE's protective effect on hypothyroidism among African American women but not Caucasian women. One previous review suggested that Caucasian patients were generally referred for consultation for hypothyroidism at a younger age than African American patients and that non-Caucasian patients have a higher probability of being underdiagnosed [27]. Therefore, it remains to be confirmed whether the racial disparities in hypothyroidism after PE are primarily related to genetic differences or to bias in the timing of diagnosis between the two races.
Another novel finding of our study is the discovery of racial disparities in PE's effect on subsequent hypertension. While the racial disparities in general hypertension have been widely recognized [46], there is a lack of research on disparities in PE's inducing effect on subsequent hypertension. Among postpartum women without a PE history, we observe that African Americans have a significantly higher risk of hypertension than Caucasians, which is consistent with previous findings [46]. However, when taking into account PE's effects and adjusting for confounders, we observe that Caucasians have a significantly higher HR of PE on hypertension than African Americans. This potentially indicates that Caucasians are more sensitive to PE's inducing effect on subsequent hypertension than African Americans.
Our study has several notable strengths compared to previous studies on the long-term effects of PE. Methodologically, we utilized a non-hypothesis-based approach to identify any association between complications and PE, which allows the discovery of associations that hypothesis-driven approaches overlook. We adjusted the analysis for pre-PE medical histories, minimizing the bias due to pre-existing conditions, as many of them are indeed associated with PE. This addressed the pitfall in traditional case-control matching studies with only t-tests as statistical evidence [7,9,11-13]. In each step of our analysis, we conducted rigorous inference that ensured sufficient sample size and statistical power. We adopted a discovery-confirmation research strategy across two different datasets, strengthening our findings' generalizability and reliability. There are few previous studies on the racial disparities of PE's long-term adverse effects. Our study fills this research gap, revealing the association between race and PE's long-term effects. Overall, we extended PE research to a novel dimension, encouraging future work from a racial perspective and promoting precision aftercare of PE, which is important given the known impact of systemic racism on maternal health outcomes [40].
While our study has provided valuable insights into the long-term effects of PE, it is important to acknowledge its limitations. One aspect pertains to the reliance on ICD codes for case-control identification. ICD coding practices can vary between institutions, which may introduce biases and inaccuracies to the original data. Furthermore, the complexity of maternal health, encompassing various genetic, environmental, and lifestyle factors, makes it challenging to delineate direct causal relationships between PE and long-term complications. Future studies incorporating causal inference or randomized controlled trials may offer more insights into the causal pathways between PE and subsequent health outcomes. Given the increasingly recognized subtypes of preeclampsia, which exhibit different pathological processes, diagnosis times, symptoms, times to delivery, and outcomes, it may be necessary to stratify PE by subtype in future studies if the patient population is sufficiently large [42-44]. Lastly, our findings are purely based on EHR data. Other types of data that capture the molecular and pathological processes, such as genetics and genomics data [45], would be beneficial to deepen the mechanistic understanding of PE's long-term effects.

Figures

Figure 1 Project

Figure 2 OR

Table 1
Patient Characteristics for UM Data and UK Biobank Data.
We first investigated the change in all disease risks each year for 10 years after PE. We adopted the Elixhauser Comorbidity Index and categorized all ICD diagnosis codes into 31 Elixhauser disease categories (see Method). For each Elixhauser category, we applied logistic regression with PE as a predictor and whether or not the patient was diagnosed with the disease during N = 1 to 10 years as the response, adjusting for other factors such as age, pre-existing conditions, and social history. All features used in this study, such as complications and social history, are listed in Supplementary Table 3. The population counts in each group (N = 1 to 10) are shown in Supplementary Table 5.