Subject recruitment and BMI evaluation
The ELGAN cohort was established to identify factors contributing to the risk of adverse neurocognitive outcomes among infants born extremely preterm (pregnancy ending with a live birth prior to completing 28 weeks of gestation). Study procedures were approved by the Institutional Review Board at each of the participating institutions, and methods of subject recruitment and sample collection have been described elsewhere (38–40). In short, from 2002 to 2004, women giving birth at one of the participating sites prior to completing 28 weeks of gestation were asked to enroll in the ELGAN study. Informed consent was obtained from all mothers, and all methods were performed in accordance with the Declaration of Helsinki. For mothers younger than 18, informed consent was obtained from a parent and/or legal guardian. In total, 1506 infants and 1249 mothers enrolled in the ELGAN study, and a subsample of N=426 provided sufficient placental tissue for multiple -omics analyses. Maternal demographic and clinical characteristics were measured at or prior to the time of delivery. Each mother self-reported her height and pre-pregnancy weight prior to or after delivery when she was interviewed, and these data were used to calculate pre-pregnancy BMI (40).
Placental tissue collection
Delivered placentas were placed in a sterile examination basin and transported to a sampling room, with 82% percent of samples obtained within 1 hour of delivery (39). Further details on histologic analyses have been provided elsewhere (41,42). Briefly, the area sampled was at the midpoint of the longest distance between the cord insertion and the edge of the placental disk. Amniotic tissue was pulled away from the underlying chorion using a sterile technique in order to expose the chorion. A tissue sample was cut out of the base of the chorion, the fetal component of the placenta, after applying traction to the chorion and the underlying trophoblastic tissue. Tissue samples were subsequently placed into a sterile 2 mL cryo-vial that was immediately submerged in liquid nitrogen. Samples were shipped to the University of North Carolina at Chapel Hill and stored at −80 °C until further processing (39). For the present study, a total of 390 placentas were available for subsequent RNA analyses based on RNA quality and tissue abundance.
Placental RNA extraction and sequencing analyses
In processing placental samples, cryotubes were first placed on dry ice. A subsection of placental tissue (0.025 g) was cut from each frozen tissue sample using a sterile dermal curette and washed in 1x PBS (Fisher Scientific, Waltham, MA) to remove any residual blood. To preserve sample integrity, washed samples were immediately snap frozen in homogenization tubes and placed on dry ice. Tissue segments were homogenized using a sterile stainless-steel bead (Qiagen, Germantown, MD) in RLT + lysis buffer (Qiagen) with the TissueLyserII instrument (Qiagen). Samples were spun to collect the bead and any cellular debris, with homogenate samples stored at -80°C until nucleic acid extraction. RNA molecules 18 nucleotides and greater were extracted using the AllPrep DNA/RNA/miRNA Universal kit (Qiagen). RNA quantity was measured using the NanoDropTM 1000 Spectrophotometer (Thermo Scientific, Waltham, MA) and tested for quality based on RNA integrity scores produced by the QIAxcel system (Qiagen).
Using isolated RNA samples, genome-wide miRNA expression profiles were measured using the HTG EdgeSeq miRNA Whole Transcriptome Assay (HTG Molecular Diagnostics, Tucson, AZ). This assay uses next-generation sequencing technologies to analyze expression levels of n=2,083 human miRNA transcripts. The counts of sequencing reads per miRNA were aligned to miRBase v20 and organized using Parser (HTG Molecular Diagnostics). The isolated RNA samples were also used to measure genome-wide mRNA expression profiles using the QuantSeq 3′ mRNA-Seq Library Prep Kit (Illumina). Libraries were pooled and sequenced (single-end 50 bp) on one lane of the Illumina Hiseq 2500 and the counts of sequencing reads per mRNA were aligned to the GENCODE database v3 (43) and organized using Salmon (44). This process yielded measures of n=37,268 unique human RNA transcripts, including protein-coding and non-coding RNAs. The resulting summarized count data were then used in data processing and statistical analyses.
Statistical analyses of differential miRNA and mRNA expression
In total, 360 placentas were included in the mRNA and miRNA analyses. Subjects were removed from the mRNA and/or miRNA analyses if they met the following exclusion criteria: (i) low expression values (i.e., non-detection across all genes) (n = 2 for the mRNA analysis); (ii) identified as sample outliers through PCA (n = 2 for the mRNA analysis, n = 2 for the miRNA analysis); (iii) demographic data were missing for the included covariates (n = 26). miRNA and mRNA sequencing data were processed separately using packages within R (v3.6.2). Count data were first filtered to exclude universally lowly expressed transcripts, requiring that > 25% of the samples be expressed at signals above the overall median signal intensity. This resulted in the final inclusion of 10,412 mRNAs and 1,131 miRNAs for the analyses. Potential sample outliers were identified via principal component analyses (PCA) and hierarchical clustering, using the prcomp and hclust functions. Count data were then normalized by median signal intensity using algorithms within DESeq2 (v1.24.0) to produce variance-stabilized expression counts (45). Potential sources of sample heterogeneity, including batch effects, were accounted for using surrogate variable analysis (SVA) within the SVA R package (v3.32.1). This method uses default parameters to estimate control probes (45). Three significant surrogate variables were calculated and included as covariates in the final model.
Demographic covariates with plausible associations with pre-pregnancy BMI and placental gene expression were included in the model: maternal race (White/non-White), smoking status (none/first- or secondhand), maternal age (continuous), and a summative proxy for socioeconomic status indicating (1) reliance on public assistance, (2) less than 12 years of education, and/or (3) publicly or uninsured status (0-3). These covariates were selected if they were significantly associated (p < 0.05) with the exposure and/or based on their a priori status as confounders using a directed acyclic graph (DAG) approach. Pre-pregnancy BMI was categorized into four bins, according to standards provided by the Centers for Disease Control and Prevention (CDC): underweight: < 18.5 kg/m2, normal: 18.5 to < 25.0 kg/m2, overweight: 25.0 to < 30.0 kg/m2, and obese: ≥ 30.0 kg/m2 (46). Normal weight was the referent to which each group (underweight, overweight, and obese) was compared.
Statistical methods incorporating negative binomial generalized linear models within the DESeq2 package were implemented to identify mRNAs and miRNAs with expression levels associated with pre-pregnancy BMI status. This method uses a Wald test and generates z-statistics based on the calculation of shrunken logarithmic fold changes in expression for each BMI category (e.g. overweight vs. normal) divided by their standard errors. Resulting z-statistics are compared against standard normal distribution curves, generating Wald test p-values. These p-values are then adjusted for multiple testing using the Benjamini and Hochberg (BH) procedure (47). Separate models were run for mRNAs and miRNAs. Differentially expressed mRNAs and miRNAs were defined as those with false discovery rate (FDR) p-value < 0.10, based on a BH-adjusted p-value. Both mRNA- and miRNA-seq data were first analyzed for statistical relationships with pre-pregnancy BMI using placental data from all newborns. Next, a sex-stratified approach was used, analyzing placenta data derived from male newborns separate from female newborns to evaluate potential sexually dimorphic patterns.
Identification of gene targets of miRNAs and correlation between miRNA-mRNA expression
An in silico approach was utilized to identify known miRNA-mRNA interactions among miRNAs and mRNAs that were both associated with pre-pregnancy BMI. The Ingenuity Knowledge Database (Ingenuity Systems®, Redwood City, CA), which couples experimentally observed miRNA-mRNA interactions curated from literature and computational predictions, was queried. First, known mRNA targets of the miRNAs identified with expression levels significantly associated with pre-pregnancy BMI were identified. To detail further, experimental observations were gathered from TarBase, which contains approximately 670,000 unique miRNA-mRNA interactions shown through published literature (48). Computationally predicted interactions were derived using algorithms generated through TargetScan Human v7.2, which identifies miRNA-mRNA interactions based on potential base pairing homologies between the 3’ untranslated mRNA regions and miRNA seed sequences (49). Second, resulting potential miRNA-mRNA interactions were filtered to include those that have been experimentally observed of those with high predicted confidence, measured as cumulative weighted context scores £-0.4. These scores represent an aggregation of factors influencing the likelihood of miRNA-mRNA interactions, including binding site type and location, local adenine and uracil content, target site abundance, seed-pairing stability, and supplementary pairing (50). Third, predicted gene targets were filtered to include only those that were observed to be significantly associated with pre-pregnancy BMI. These steps resulted in a list of miRNA-mRNA predicted interactions. Thus, a miRNA-mRNA predicted interaction is defined hereafter as a miRNA and a mRNA in which 1) the mRNA is a predicted gene target of the miRNA and 2) both miRNA and mRNA were found to be significantly associated with BMI. Correlations between variance-stabilized miRNA and mRNA expression counts were quantified using Pearson correlations.
Pathway enrichment analyses
Canonical pathway analyses were performed on mRNAs associated with underweight status among males only given that few (N=2) genes were differentially expressed in female placentas. Using the Ingenuity Knowledge Database (Ingenuity Systems®, Redwood City, CA), we sought to identify the systems-level response to pre-pregnancy BMI status and uncover biological functions and diseases linked to mRNAs dysregulated in relation to underweight status. Over-represented canonical pathways were defined as those containing more pre-pregnancy BMI-associated mRNAs than expected by random chance, as based on a BH-corrected p-value calculated from a right-tailed Fisher’s Exact Text (47,50). Expression analyses first considered molecules and/or relationships where “(confidence = Experimentally Observed)”. Next, molecules and/or relationships where “(confidence = Experimentally Observed) AND (tissues = Placenta)” were considered. Pathways with enrichment BH-corrected p-values <0.05 were considered significant.