Study populations
Placental tissue samples were collected from the InTraUterine sampling in early pregnancy (ITU) study, the Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) study [18], and the Betamethasone (BET) study [19].
ITU and PREDO are Finnish cohort studies consisting of women and their children who were followed throughout pregnancy and beyond. In ITU, women were recruited through the national voluntary prenatal screening program for trisomy 21. If this screening indicated an increased risk of fetal chromosomal abnormalities based on routine serum, ultrasound screening, age and patient history, women were offered further testing including chorionic villus sampling (CVS) at the Helsinki and Uusimaa Hospital District Feto-maternal Medical Center (FMC). During this visit, women were informed about the ITU study. If the chromosomal test indicated no fetal chromosomal abnormality, those who had expressed interest in participating were contacted for final recruitment. Another set of women were informed about ITU when attending the routine screening at maternity clinics. If interest in participating was expressed, they were contacted for final recruitment into the ITU study. In PREDO, the recruitment took place when women attended their first routine ultrasound screening. Some of the women were recruited based on having clinical risk factors for preeclampsia and intrauterine growth restriction, others were recruited independently of these factors [18]. The aim of the BET study was to investigate the effect of antenatal betamethasone on the transplacental cortisol barrier and fetal growth [19]. Pregnant women with preterm labor and cervical shortening were treated with a single course of antenatal BET (Celestan®, MSD GmbH, Haar, Germany) for fetal maturation between 23 + 5 and 34 + 0 weeks of gestation and were recruited prospectively before birth. A gestational-age-matched control group consisted of pregnant women who received no antenatal BET.
Placental tissue samples
In the ITU study, first-trimester placental biopsies were obtained from leftover CVS, following indications of elevated risk for chromosomal abnormalities between 10 and 15 weeks of gestation. Placenta samples were also collected at birth, whereby midwives/trained staff took nine-site biopsies (within maximum 120 minutes after delivery) from the fetal side of the placenta, at 2–3 cm from umbilical cord insertion. In the PREDO study, placenta nine-site biopsies (within maximum 90 minutes after delivery) were taken from the decidual side of the placenta. In the BET study, full-thickness placental biopsies were taken by a uniform random sampling protocol [20, 21] from both peripheral and central areas. All samples were stored at − 80°C.
Throughout the manuscript, we refer to all placental samples collected at birth as “term placenta”, and to all placental CVS samples collected during early pregnancy as “CVS”.
DNA methylation (DNAm)
From the collected samples, DNA was extracted according to standard procedures and DNAm was assessed using the Illumina Infinium MethylationEPIC array (Illumina, San Diego, USA). In total, DNA methylation levels were assessed in 1,055 samples: n = 277 CVS samples (ITU), and n = 500 placental samples (ITU), n = 140 placental samples (PREDO), and n = 138 placental samples (BET) taken at birth. All DNAm data were pre-processed in the same way, using an adapted pipeline from Maksímovíc et al. [22] and the R package minfi [23]. Beta values were normalized using stratified quantile normalization [24], followed by BMIQ [25]. Batch-effects were removed using Combat [26].
The final data sets comprised 264 CVS samples from ITU (n = 716,331 probes) and 486 placental samples (n = 665,190 probes) from ITU, 139 placenta samples (n = 755,154 probes) from PREDO and 137 placenta samples (n = 708,222 probes) from the BET study. Of these, 652,341 probes overlapped across all four data sets.
Gestational age, child sex and ethnicity variables
Gestational age (GA) at sampling was based on fetal ultrasound. Child sex was extracted from the Finnish Medical Birth Register (MBR) in ITU and PREDO and obtained from postnatal assessment in the BET study. To retrieve information about genetic background, we performed multi-dimensional scaling (MDS) analysis on the identity-by-state (IBS) matrix of quality-controlled genotypes [27] and extracted the first two components for ITU and PREDO and the first four components for the BET study. This was available for n = 200 individuals with CVS tissue in ITU, and n = 439 individuals with term placental tissue in ITU, in n = 118 individuals with term placental tissue in PREDO and n = 136 individuals with term placental tissue in BET. Genotyping was performed on Illumina Infinium Global Screening arrays for BET and ITU and on Illumina Human Omni Express Arrays for PREDO. DNA for genotyping was extracted from cord blood in ITU and PREDO, if available, otherwise placental tissue was used in ITU. DNA was extracted from placental tissue in the BET study. Further details about genotypic assessment and quality control in the ITU and PREDO cohorts has been published elsewhere [28]. For the BET study the genotype quality control pipeline was set up in PLINK 1.9 [29] and included only markers with a callrate of at least 98%, a minor allele frequency of at least 1% and a p-value for deviation from Hardy-Weinberg-Equilibrium > 1.0 × 10− 06. Furthermore, individuals with a call rate below 98% were removed. All individuals were unrelated (pi-hat, i.e., estimated IBD fraction, < 0.125 with any other sample).
An overview of study sample characteristics is given in Table 1.
Table 1 Study sample characteristics (Mean (SD) or N (%) for each variable)
|
ITU
|
PREDO
|
|
BET
|
|
CVS
|
Placenta
|
Placenta
|
|
Placenta
|
Sample size
|
264
|
470
|
139
|
|
137
|
Phenotypes
|
|
|
|
|
|
Gestational age
|
12.79 (0.82)
|
39.99 (1.55)
|
39.89 (1.43)
|
|
38.16 (1.95)
|
Child sex (male)
|
140 (53%)
|
238 (51%)
|
67 (48%)
|
|
70 (51%)
|
Reference-based cell types
|
|
|
|
|
|
Trophoblasts
|
0.26 (0.06)
|
0.01 (0.03)
|
0.04 (0.05)
|
|
0.13 (0.06)
|
Stromal
|
0.17 (0.06)
|
0.01 (0.02)
|
0.04 (0.03)
|
|
0.11 (0.02)
|
Hofbauer
|
0.00 (0.01)
|
0.00 (0.01)
|
0.00 (0.00)
|
|
0.00 (0.00)
|
Endothelial
|
0.00 (0.01)
|
0.01 (0.02)
|
0.08 (0.03)
|
|
0.11 (0.02)
|
nRBC
|
0.00 (0.01)
|
0.04 (0.03)
|
0.00 (0.01)
|
|
0.00 (0.00)
|
Syncytiotrophoblasts
|
0.57 (0.04)
|
0.93 (0.06)
|
0.83 (0.08)
|
|
0.66 (0.08)
|
Reference-free cell types
|
|
|
|
|
|
C1
|
0.26 (0.14)
|
0.11 (0.09)
|
0.43 (0.19)
|
|
0.35 (0.2)
|
C2
|
0.30 (0.15)
|
0.07 (0.07)
|
0.51 (0.20)
|
|
0.46 (0.2)
|
C3
|
0.14 (0.07)
|
0.23 (0.13)
|
-
|
|
0.14 (0.1)
|
C4
|
0.10 (0.07)
|
0.13 (0.09)
|
-
|
|
-
|
C5
|
0.14 (0.10)
|
0.13 (0.09)
|
-
|
|
-
|
C6
|
-
|
0.11 (0.08)
|
-
|
|
-
|
C7
|
-
|
0.09 (0.07)
|
-
|
|
-
|
C8
|
-
|
0.08 (0.07)
|
-
|
|
-
|
Cell type composition estimation
Reference-based cell type composition into six cell types (nucleated red blood cells, trophoblasts, syncytiotrophoblasts, stromal, Hofbauer, endothelial) was estimated using a reference recently published by Yuan et al. [17] and implemented within the R package planet, by applying the robust partial correlation algorithm [30]. Reference-free cell types were estimated following the protocol suggested in the R package RefFreeEWAS [31], which led to five estimated cell types in CVS (ITU), and eight estimated cell types (ITU), two estimated cell types (PREDO) and three estimated cell types (BET) in term placenta.
Statistical Analyses
All statistical analyses were performed in R, version 4.0.5 [32].
Filtering of invariable probes in DNAm
To assess the influence of cell types on DNAm, we first filtered for variable CpGs by excluding placenta-specific non-variable CpGs. We applied a procedure described by Edgar et al. [33] to the overlapping CpGs (n = 652341) of all four placental methylation data sets from the EPIC array, to identify sites with < 5% range between 10th and 90th percentile in DNAm beta values using our data sets. This resulted in 120,548 CpGs (listed in Supplementary Table S1) that we identified as non-variable for placental EPIC methylation data and excluded from further analyses. Identifying these CpGs is useful to reduce dimensionality, and becomes especially relevant for future studies, e.g., epigenome-wide association studies (EWAS), aiming to use our resources. Furthermore, the 1,050 CpGs used to predict cell type composition in the model by Yuan et al. [17] were excluded from the following analyses to prevent circular conclusions.
Capturing DNAm variance through principal components and filtering of individuals
To capture the major variance in DNAm, we performed singular value decomposition on methylation beta values, and extracted the first principal component (PC1) explaining most of the variance for every data set (Supplementary Fig. S1). For term placenta from ITU we identified n = 16 outliers representing values greater than three times inter-quartile-range in PC1 (see Supplementary Fig. S2a). The same samples showed lower sample-sample correlations in DNAm beta values with the other placenta samples (Supplementary Fig. S2b) and presented different cell type proportions (Supplementary Fig. S2c). Thus, we excluded these samples from the ITU placenta data set, resulting in n = 470 term placenta samples from the ITU cohort. We re-calculated the principal components (PC) without these outliers in the ITU term placenta data set. For CVS from ITU and term placenta data sets from PREDO and BET no such outliers were identified.
Correlation between reference-based and reference-free estimated cell types
Spearman correlations were calculated between reference-based and reference-free cell types in every tissue.
Models to predict DNAm by cell type proportions (reference-based versus reference-free)
To compare the impact of reference-based versus reference-free estimated cell types on the main variance in DNAm, PC1 of DNAm beta values was regressed linearly on different predictors in six models for every data set:
1. PC methylation ~ 1
2. PC methylation ~ GA at sampling + child sex + PCs ethnicity
3. PC methylation ~ reference-based cell type estimation
4. PC methylation ~ reference-based cell type estimation + GA at sampling + child sex + PCs ethnicity
5. PC methylation ~ reference-free cell type estimation
6. PC methylation ~ reference-free cell type estimation + GA at sampling + child sex + PCs ethnicity
Using cross-validation with 10 folds, 500 repeats and RMSE as loss function, implemented in the R package xvalglms [34], enabled us to evaluate which model best explains placental DNAm variability. This is defined by the number of times a particular model wins in the repeated cross-validation procedure, i.e., the number of times that the model has a smaller prediction error (RMSE, in our case) than all other models considered. RMSE is on the same scale as the outcome variable.
We further tested how much of DNAm variability in all single CpGs could be explained by either reference-based or reference-free estimated cell types. Linear models were fitted for every CpG by predicting DNAm (beta values) with either reference-based or reference-free cell types.
For every CpG, the adjusted R2 was extracted (see Supplementary Fig. S3 for a histogram of R2 values). Afterwards, CpGs with adjusted R2 > 0.30 in all four data sets were extracted and considered as CpGs at which variability of DNAm (beta values) was relatively strongly influenced by cell type proportions. For the following enrichment analyses, the genes (20,038) mapping to all CpGs (534,510) overlapping between the data sets were used as background.
Enrichment analyses
All CpGs were mapped to the closest gene using the R package bumphunter functions annotateTranscripts and matchGenes [35]. Afterwards, the genes corresponding to the extracted CpGs were used as input for the TissueEnrich package [36], while the genes corresponding to all CpGs overlapping between the data sets (without any filtering for R2) were considered as background genes (n = 20,038). The same input and background genes were further used for the PlacentaCellEnrich Tool [37]. Human placental single-cell RNA-Sequencing data [38] were used to retrieve enrichments for placenta cell-specific expression patterns. For both enrichment analyses we used an adjusted p-value of .01 as threshold for enrichment, as recommended by the authors of the PlacentaCellEnrich Tool [37].
Cell type composition analyses
Differences in reference-based cell type proportions between the three term placenta data sets were analyzed using nonparametric global multivariate analysis of variance [39] implemented in the R package npmv [40]. To test for significant differences between the study groups, we applied the global test using the R function nonpartest with default settings, which provides F-distribution approximations, performs multivariate permutation and calculates nonparametric relative effects. The global test was supplemented with a more detailed comparison (R function ssnonpartest) of study groups and cell types using the F approximation of Wilks’ lambda, to identify which variables/factor levels contribute to the significant differences, while controlling for the familywise error rate (α = .01).
Differences in reference-based cell type proportions between CVS and term placenta from the same individuals (n = 85, ITU) were calculated using paired Wilcoxon signed-rank tests. All p-values were corrected for multiple testing (n = 6 cell types) using Bonferroni correction and compared to α = .01.
Spearman correlations and Wilcoxon signed-rank tests were performed to test for relationships between reference-based cell type proportions and GA and child sex (for every cell type separately and corrected for multiple testing among the n = 6 cell types using Bonferroni correction and α = .01).