Multiomics Longitudinal Modeling of Preeclamptic Pregnancies

Recommended Citation Maric, I., Contrepois, K., Moufarrej, M., Stelzer, I. A., Feyaerts, D., Han, X., Tang, A., Stanley, N., Wong, R. J., Traber, G. M., Ellenberger, M., Chang, A., Fallahzadeh, R., Nassar, H., Becker, M., Xenochristou, M., Espinosa, C., De Francesco, D., Ghaemi, M. S., Costello, E., Culos, A., Ling, X. B., Sylvester, K. G., Darmstadt, G. L., Winn, V. D., Shaw, G. M., Relman, D. A., Quake, S. R., Angst, M. S., Snyder, M. P., Stevenson, D. K., Gaudilliere, B., & Aghaeepour, N. (2021). Multiomics Longitudinal Modeling of Preeclamptic Pregnancies. , https://scholarlycommons.pacific.edu/dugoni-facarticles/729


INTRODUCTION
The World Health Organization estimates that more than 800 women worldwide die from pregnancy-related causes every day, with the highest rates of maternal mortality and morbidity in low-income countries 1 . One of the main causes is the hypertensive disorder of pregnancypreeclampsia -for which the only treatment is to deliver, often too early. Preeclampsia affects 3-5% of pregnancies in the United States and up to 8% of all pregnancies globally 1 , and accounts for 10-15% of maternal deaths 2 and 15-20% of preterm births 3 .
The pathophysiology of preeclampsia is complex and is thought to be caused in part by abnormal placentation as well as a women's predisposition through genetic and immunological factors 4 . It is believed that the abnormal placentation leads to a maternal inflammatory response 4 . Placental ischemia, oxidative stress and the presence of a maternal angiogenic imbalance are all characteristics of preeclampsia 5,6 , leading to endothelial and end-organ damage, and in some cases to stroke and even death.
Specific biological processes involved in the development of preeclampsia have remained understudied. Early prediction of preeclampsia has remained a clinical challenge, owing to incompletely understood causes, various risk factors and likely multiple pathogenic phenotypes of preeclampsia 7,8 . The recent availability of high-throughput omics (including the genome, transcriptome, proteome and metabolome) assays, where each can be performed on small sample volumes, has enabled joint analyses of the high-dimensional multidomain or multiomics data measured from the same biological sample 4,9,10 . The integrated analysis may capture complex dynamics involved in the pathogenesis of preeclampsia that could ultimately lead to novel therapeutic interventions. Furthermore, applying machine learning methods capable of extracting the most predictive features from high-dimensional multiomics data, could lead to more accurate predictive models and better early detection of women at risk to develop preeclampsia.
In this unique study, we performed a multiomics analysis of the transcriptome, proteome, metabolome, lipidome, and microbiome from a coordinated set of biospecimen collected longitudinally from pregnant women; we then integrated immune system mass spectrometry features that were available for a subset of the patients; and we combined the multiomics data with the available clinical/demographics data and performed joint analysis. Our goals were to: 1) build an integrated multiomics predictive model of preeclampsia; 2) compare prediction capabilities of different omics sets; 3) develop a simple and interpretable predictive model based on a small number of biomarkers that can be used for a diagnostic test; 4) identify a specific signature of preeclampsia; and 5) gain insights into pathways involved in the pathogenesis of preeclampsia.

Multiomics Characterization of Normal and Preeclamptic Pregnancy Over Gestation
Thirty-three and sixteen women were included in the discovery and validation cohorts, respectively (Fig. 1A). Maternal characteristics, demographics, and gestational ages at delivery are shown in Table S1. In the discovery cohort, 17 women developed preeclampsia and 16 had a pregnancy unaffected with preeclampsia. Among the preeclampsia patients, severe vs. mild preeclampsia was observed in 10 and 7 women, respectively; early-vs. late-onset preeclampsia out of which 12 had preeclampsia, was used to validate the metabolomics results. Blood samples were collected longitudinally at two or three time points during pregnancy (early, mid, and late -see Fig. 1). Plasma, urine, and vaginal swabs from each woman were used for measurements of cfRNA (plasma transcriptome), proteome (plasma), metabolome (plasma and urine), lipidome (plasma), and microbiome (vaginal swab). The number of measurements differed markedly among omics datasets, with transcriptome containing the highest number of measurements (Fig. S1A). In contrast, the number of principal components explaining 90% of variance, that quantifies the internal correlation of a dataset, exhibited smaller difference among datasets (Fig. S1B). Thus, although the amount of data varied several orders of magnitude among the dataset, their variability and thus the amount of information content was much more similar.

Machine Learning Modeling of Preeclampsia Over Gestation
Multivariate models of preeclampsia were built for each dataset using the Elastic Net (EN) algorithm (see Methods). Predictions from separate models were then integrated in a final model using stacked regression. The performance of all models was evaluated using the leave-A.
In order to explore multiomics interactions of analytes associated with preeclampsia, correlation network for controls and preeclampsia patients were generated ( Control PE clusters in the correlation network for the two groups of patients, potentially reflecting differential mechanisms in preeclampsia.

Prediction of Preeclampsia in Early Pregnancy
From a clinical perspective, early prediction of preeclampsia (i.e., within the first 16 weeks of gestation) is of critical importance as it would allow for early treatment of high-risk women (e.g., with low-dose aspirin 12 ). It would also enable closer monitoring of high-risk pregnancies and allow for the enrichment of preemptive interventional studies in women at risk for developing preeclampsia. 13 S4B) and genes (Fig. S4C) selected by EN, measured early in pregnancy, markedly differed between controls and preeclamptic women. Fig. 7 summarizes the most predictive plasma protein and urine metabolites both in early pregnancy and during entire pregnancy. Plasma proteins that were predictive in both models included LEP, SELL, CCL23, HIPK3, APCS, GPNMB, and IL-24, while some were mostly predictive in early pregnancy (FGF19, TIMP2), and others over entire gestation (VEGFA, SELE, SPARCL1, APOB, ROR1, and IL-22) (Fig. 7A). Urine metabolites that were predictive over entire gestation mostly differed from urine metabolites predictive in early pregnancy (Fig 7B) with the exception of adenine and nonadienoylcarnitine.

Single Cell Characterization of the Immune System
Preeclampsia is strongly associated with inflammation and aberrant maternal immune system adaptations during pregnancy 14 . To assess immunity --which is complementary to pathways covered by proteins and metabolites -and connect differential abundances of plasma proteins and urine metabolites in preeclamptic pregnancies to biological changes, immune-system wide mass cytometry measurements of single cells obtained in a subset of the same patient cohort were integrated with our plasma proteome and urine metabolome prediction models, as these two models had the best accuracy. Immune cell dynamics between 1 st and 2 nd trimester blood samples obtained from high-dimensional mass cytometry were previously used to develop a prediction model of preeclampsia 15 . We found that seven of the immune features reported by

Relationship Between Clinical Data and Omics Measurements
Clinical and demographics data contains maternal characteristics known to be associated with the risk of preeclampsia, e.g., preexisting hypertension, race, BMI, height, gravida. We combined ten variables that were available in this data set (Table S1)

Over the Course of Pregnancy
Changes over gestation of 1215 metabolic features among 8718 were significantly associated with preeclampsia outcome (FDR < 0.05 LME Model with Benjamini-Hochberg procedure).
Pathways enrichment analysis using these urine metabolites identified the following pathways (p<0.05) (Fig. 10A interleukin-1 receptor accessory protein (IL1RAP) and SELL -both known to play a role in the immune response 21 (Fig. S5). Enriched pathways grouped into ten biological processes, the most prevalent being positive regulation of cellular process (including biological, cellular, protein metabolic, immune system, and apoptotic processes among others) (Fig. S7). In the cfRNA set, 306 features were significantly associated with preeclampsia outcome over gestation (FDR < 0.05 LME Model with Benjamini-Hochberg procedure) resulting in several enriched pathways (Fig. S8). Top features included YOD1, BIRC2, CEP63, and LCP1. Top proteome, transcriptome and urine and plasma metabolome features formed 17 distinct communities (Fig.   S9).

Early Pregnancy
In early pregnancy, 497 out of 8718 urine metabolic features had changes significantly associated with preeclampsia when compared to controls (FDR<0.05, Wilcoxon signed-rank test with Benjamini-Hochberg procedure). Pathways enrichment analysis on these urine metabolites

Outlier analysis
We observed that a few patients in our cohort were consistently misclassified by our prediction algorithm (Fig. S10). A few control patients resembled preeclamptic patients on a molecular level in some of the top predictive features, across omics sets. And vice versa, there were some preeclamptic patients whose top molecular features more closely resembled those of controls.
Reexamination of the clinical charts revealed that one of the preeclampsia patients, while

DISCUSSION
Recent omics studies of preeclampsia typically included up to two omics datasets 10,23,24 . Ours is the first study to present the integrated analysis of six high-throughput omics datasets, containing more than 50,000 measurements per sample. This multiomics analysis enabled uniform comparison of omics sets, and revealed improved predictive ability for preeclampsia status relative to individual biological modalities, and indications of biological processes associated with the disease across multiple modalities.
One of the main strengths of our study is that, in our cohort, biological samples were not only collected longitudinally from each woman; but also, each individual sample was simultaneously measured for proteome, transcriptome, metabolome, lipidome, and vaginal swab for microbiome, thereby providing a unique opportunity to systematically study changes due to preeclampsia over gestation, and compare the capability of each of these omics sets to predict and characterize preeclampsia. These analyses involved more than 50,000 measurements, which were used in the prediction algorithm to agnostically identify the best biomarkers of preeclampsia.
Among our six datasets, plasma proteomic and urine metabolomic datasets had the highest prediction accuracies, both over gestation and early in pregnancy. A prediction model using only ten urine metabolites provided high accuracy over gestation (AUC=0.88, cross-validated and AUC = 0.87 validated on an independent cohort) and early in pregnancy (AUC=0.875, crossvalidated).
The EN prediction model with ten plasma proteins achieved AUC of 0.83 over gestation and of 0.88 in early pregnancy. Vascular endothelial growth factor A (VEGF-A) was among the most predictive proteins. Reduced levels of VEGF-A have previously been described in preeclamptic pregnancies due to increased levels of placental soluble fms-like tyrosine kinase-1 (sFLT-1) which validate our study [25][26][27] . Observed changes in several other proteins such as LEP, SELL, SELE, and ROR-1, were in agreement with existing literature 21,[28][29][30] . Other biomarkers including IL-24, IL-22, CCL23, and HIPK3 were also identified as highly predictive. In early pregnancy, FGF19 and TIMP2 were the most predictive. Univariate analysis also identified IL1RAP and IL6features known to play a role in immune response. Some of the other known biomarkers of preeclampsia -sFLT-1, PAPP-A, PIGF and ENG -were not significantly different between the controls and preeclamptic women neither over gestation, nor early in pregnancy (Fig. S11) and were, consequentially, not identified by our prediction model. This is possibly due to a small size of our cohort. We did not have PP13 (Galectin13) measurements, another known biomarker of preeclampsia.
Preeclampsia is accompanied by a dysregulated maternal immune adaptation to pregnancy, which is already detectable in early pregnancy 14,15 . This aberrant signature was previously identified in women who developed preeclampsia later on 15  was only observed in early pregnancy may explain why clinical studies note that low-dose aspirin initiation prior to 16 weeks is needed for significant prevention of preeclampsia 45 .
Tryptophan pathway was identified as highly associated with preeclampsia over gestation ( Fig   10A). Indoleamine-2,3-dioxygenase (IDO) is the first and rate limiting enzyme in this pathway producing kynurenine which then is converted into a number of bioactive metabolites. IDO is an intracellular enzyme produced by many cell types and while not secreted, impacts neighboring cells by tryptophan depletion and production of bioactive metabolites. The role of IDO in both normal and abnormal pregnancies, including preeclampsia, has been recently reviewed 46 . IDO expression increases with pregnancy and tryptophan depletion in the placenta inhibits T-cell-mediated rejection of semiallogeneic fetal tissues 47 . Kynurenine is an endogenous ligand that activates the aryl hydrocarbon receptor (AhR) 48 . This activation skews the differentiation of T cells to immunosuppressive T regulatory cells rather than proinflammatory Th17 cells after exposure to TGF-β 49,50 . Notably, kynurenic acid and xanthurenic acid, two metabolites of kynurenine, can also activate AhR signaling and may participate in immune regulation 51,52 . Therefore, deficiency of IDO impacts Treg development.
Notably, IDO KO mice, when pregnant, develop a preeclampsia-like phenotype 53 . The metabolic signal related to tryptophan metabolism in the model over gestation may be related to the immune signature of preeclampsia, highlighting the importance of the immune alteration occurring in the later stages of preeclampsia. Caffeine metabolism was also identified as highly associated with preeclampsia over gestation. This pathway has previously been associated to pregnancy progression 19 .
Models to predict preeclampsia early in pregnancy were previously based on maternal characteristics (demographics and medical history), followed by addition of uterine artery Doppler measurements and specific biomarkers [54][55][56][57][58][59] . Levels of angiogenic and/or antiangiogenic proteins (PlGF, sFlt-1, and endoglin), or their ratios, have been established as biomarkers with high prediction accuracy later in pregnancy 25,26,60 . More recently, analysis of omics datasets have been successfully applied to identify various biomarkers related to preeclampsia 10,23,61 . Most of these studies were based on measurements from one or at most two omics datasets, and often from samples taken only at one time point during pregnancy.
Here we show that clinical and demographics characteristics (i.e., weight, height, race) were complementary to omics measurements and improved prediction models Our study is limited by a small sample size and consideration of a cohort from a single hospital.
Inherently to machine learning approach, developing a prediction model depends on the underlying sample distribution of the data which is used. Distribution shift, caused by differences among various cohorts, can impact the performance of a machine learning algorithm 62 . In this study, the mass cytometry data was not included in the multiomics prediction model because this data was not available for 14 out of 33 patients. However, integrative analysis on the restricted set of common samples revealed important connections between our model and key immune features.
While encouraging, our results need to be validated on a larger, more diverse set of patients. If the results prove generalizable, our findings demonstrating high predictive power from a small number of urine metabolites and proteins could lead to a simple prediction test based on a small number of urine metabolites suitable for use both in developed and developing parts of the world.

Study Design
We performed a longitudinal, prospective study of a cohort of pregnant women receiving routine ante-and post-partum care at the Lucile Packard Children's Hospital at Stanford University, California, as previously described 15,63 . Women were eligible for the study if they were at least 18 years of age and were in their first trimester of pregnancy. The study was approved by the Institutional Review Board of Stanford University (#21956), and all participants signed an informed consent.
Peripheral blood samples (for mass cytometry analysis), plasma samples (for proteomic, cellfree transcriptomic (cfRNA), metabolomic, and lipidomic analyses), urine samples (for metabolomics analysis), and vaginal swabs (for microbiome analysis) were collected from each woman at two or three time points during pregnancy. Sample collection and their analyses were previously described 9 and are presented in the Supplemental Materials. The validation cohort included 16 women from the same hospital, for which longitudinal samples with only metabolomic analyses were available. Metabolomic analyses were performed following the same methodology as for the discovery cohort.

Definition of Preeclampsia
Preeclampsia was defined using the American College of Obstetrics and Gynecology classification 3 as follows: hypertension that develops after 20 weeks of gestation (systolic or diastolic blood pressure 140 mm Hg and/or 90 mm Hg, respectively, measured on at least two occasions, 4 hours to 1 week apart) and proteinuria (300 mg in a 24-hour urine collection, a protein/creatinine ratio of at least 0.3 (each measured as mg/dL) or if these were not readily available a random urine specimens containing 1+ protein by dipstick). In the absence of proteinuria preeclampsia was diagnosed if the presence of thrombocytopenia (platelet count less than 100,000/microliter), impaired liver function (elevated blood levels of liver transaminases to twice the normal concentration), the new development of renal insufficiency (elevated serum creatinine greater than 1.1 mg/dL), pulmonary edema, or new-onset cerebral or visual disturbances. Early-onset and late-onset preeclampsia were distinguished based on whether diagnosis was before or after 34 weeks of gestation.

Machine learning analyses
A two-level cross-validation approach was used to build predictive models to estimate the risk of preeclampsia. At the first level, prediction models were developed for each omics set using an elastic net (EN) model 64   At the second level, predictions of EN models were integrated using stacked regression [65][66][67] .
Specifically, in order to use EN models in the two-level approach, for each modality , = 1, … and data = ( 1 , … ) a leave-one-out EN model, denoted − ( ) was repeatedly fitted and evaluated at patient . At the second level, stacked regression with nonnegative coefficients 11 was used, so that the regression coefficients of the final model ( 1 , … , γ K ) were determined by: Note that the leave-one-out approach used in stacked regression has a purpose to form an unbiased linear combination of EN models 66 . In contrast to the original stacking approach in which different prediction models fit on the same data are stacked, here, we use the same model (EN) but fit to different omics to obtain different estimators which are then stacked. A stacked regression model can be regarded as a special case of a two-layer neural network; its special construction provides for an easier interpretation.
One of our main goals was to identify a small subset of specific biomarkers that can predict preeclampsia with high accuracy and could thereby be used as a simple diagnostic test. For these reasons, performance of the refitted EN model for each omics set was evaluated by treating the EN model as a model-selection procedure and performing a refitting step on the selected support, in the same cross-validation step 68 . It is known that 1 -penalization used in EN performs excessive shrinkage of the large coefficients of the prediction model 69 . Refitting can resolve this problem and obtain a model with a smaller number of features. Finally, to investigate a possible gain from integration of available clinical and demographics characteristics, a prediction model that takes omics (from a specific multiomics set), and clinical and demographics variables as an input to an EN model was fit and evaluated.
Performance was estimated using a leave-one-out cross-validation procedure, such that in each cross-validation step all measurements of one patient are left out from training set and are used for testing. In addition, urine metabolome prediction models, with and without clinical/demographics variables were validated on a separate validation cohort. The prediction accuracy of the model in terms of the area under receiver operating characteristics curve was evaluated. For the network visualization, a k-nearest neighbor graph (with = 2), was constructed between features. The network layout was computed with the LargeVis algorithm 70 . The analysis was performed using R software (version 3.6.1).

Pathway Enrichment Analysis
Univariate analysis was performed to identify features with significant associations between each feature and the pregnancy outcome, both in early pregnancy (Wilcoxon signed-rank test) and over gestation (Linear Mixed-effects Model). The Benjamini-Hochberg procedure was used to control the false discovery rate (FDR) 71 . Metabolome pathway enrichment analysis on identified metabolites was performed using MetaboAnalyst 72 . The hypergeometric test was used for overrepresentation analysis in MetaboAnalyst. Proteome pathway enrichment analysis was performed using GeneOntology 73,74 . Circular Gene Ontology (CirGO) software for visualizing two-level hierarchically structured gene ontology terms 75 , was used to visualize proteome and transcriptome pathway enrichment.

ACKNOWLEDGEMENTS:
This study was supported by the March of Dimes Prematurity Research Center at Stanford University School of Medicine, Stanford Maternal & Child Health Research Institute, the