Early Pregnancy Peripheral Blood DNA Methylation and Risk of Gestational Diabetes Mellitus: A Nested Case-Control Study

Genome-wide DNA methylation profiling has been used to identify CpG sites relevant to gestational diabetes mellitus (GDM). However, these sites have not been verified in larger samples. Here, our aim was to evaluate the changes in target CpG sites in the peripheral blood of pregnant women with GDM in their first trimester. This nested case-control study examined a large cohort of women with GDM in early pregnancy (10–15 weeks; n = 80). Target CpG sites were extracted from related published literature and bioinformatics analysis. The DNA methylation levels at 337 CpG sites located in 27 target genes were determined using MethylTarget™ sequencing. The best cut-off levels for methylation of CpG sites were determined using receiver operating characteristic (ROC) curves. The independent effect of CpG site methylation status on GDM was analyzed using conditional logistic regression.


Introduction
Gestational diabetes mellitus (GDM) is defined as any degree of glucose intolerance with onset or first recognition during pregnancy. 1 Alongside the increasing prevalence of obesity and type 2 diabetes, the incidence of GDM has risen annually. 2 Specifically, in China, this incidence has been estimated at 14.8% (12.8-16.7%). 3 GDM in pregnant women can result in severe adverse pregnancy outcomes, including macrosomia, premature birth, and fetal malformations. 4 GDM also has serious consequences for the offspring, such as childhood obesity, insulin resistance, 5 and impaired neurodevelopment. 6 In addition, GDM can recur and may lead to an increased risk of postpartum type 2 diabetes. 7 The mechanisms underlying GDM include genetic background, inflammatory factors, and oxidative stress. Wu et al. have shown that GDM has a genetic component, and the differences in GDM among races may be due to the interaction between genes and the environment. 8
Epigenetics bridges the gap between genes and the environment at the molecular level. 9 10 In particular, DNA methylation is the most common and the earliest form of epigenetic modification. This process is catalyzed by DNA methyltransferases and uses S-adenosylmethionine as the methyl donor to convert the cytosine of a CpG dinucleotide to 5-methylcytosine. The CpG site density and methylation degree of the upstream promoter region of a gene directly affect gene activity and expression. 10 DNA methylation therefore plays a key role in regulating genome transcription. An increasing number of studies have shown that DNA methylation levels might affect the occurrence and development of GDM; most such studies investigated the relationship between GDM and DNA methylation levels using cord blood and placental tissues of pregnant women. However, these samples are not representative of the DNA methylation levels during early pregnancy because they can only be collected in the third trimester. Using peripheral blood samples from pregnant women, Chim et al. showed that changes in DNA methylation occur during pregnancy. 11 Kang et al. conducted genome-wide DNA methylation chip analysis of the peripheral blood samples of eight patients with GDM and eight healthy women in late pregnancy and revealed varying methylation levels in 200 CpG sites across 151 genes. 12 13 Moreover, Wu et al. evaluated DNA methylation levels in the peripheral blood of pregnant women in the first trimester using genome-wide DNA methylation analysis. They observed varying DNA methylation levels at 100 CpG sites corresponding to 66 genes in the GDM group (n = 11) and further suggested that the DNA methylation status of five CpG sites, in the COPS8, PIK3R5, HAAO, CCDC124, and C5orf34 genes, could be used as clinical biomarkers of GDM. 14 Similarly, also using genome-wide DNA methylation analysis, Enquobahrie et al. found 17 hypomethylated and 10 hypermethylated CpG sites in the GDM group. 15 However, because these studies employed whole-genome DNA methylation analysis, their sample sizes were necessarily small. The significance of CpG site methylation has not yet been verified in a large study cohort.
Here, we evaluated the DNA methylation levels in the peripheral blood of women in early pregnancy using specific target gene DNA methylation detection in order to verify the relationship between the methylation status of the targeted CpG sites and GDM.

Study design and population
This was a nested case-control study based on an early pregnancy follow-up cohort. The cohort was established in Hunan Province Maternal and Child Health Hospital (ChiCTR1900020652) between March 2017 and December 2018, and a total of 890 pregnant women were enrolled. All the eligible participants agreed to participate in this study and provided written informed consent. The study protocol was approved by the Medical Ethical Committee of the Hunan Provincial Maternal and Child Health Hospital in South China (approval number: EC201624 on January 11, 2017) and all the methods were carried out in compliance with relevant guidelines. Pregnant women were recruited in their first trimester (10-14 weeks) and followed up until 42 days post-partum. The inclusion criteria were: (i) singleton pregnancy and natural conception; (ii) diabetes-free at recruitment; (iii) had not received any antibiotic treatment throughout the current pregnancy; (iv) no acute infection in the 2 weeks before sample collection; (v) planned to attend all obstetric examinations and to deliver at the above hospital. We collected questionnaire data and venous blood samples; additional patient information and data concerning their clinical examinations were collected through the hospital's electronic records system. The venous blood samples (5 mL) were collected during early pregnancy by certified nurses in the morning following a 10-h overnight fast. Serum and blood cells were separated by centrifugation at 3500 rpm for 15 min and stored at −80°C until further use.

Diagnostic criteria for GDM and selection of controls
All subjects underwent a 2-h standard 75 g oral glucose tolerance test (OGTT) 16 in the hospital outpatient department at 24-28 weeks of gestation. The oxidase method was used to estimate blood glucose levels, with measurement completed using an automatic biochemical analyzer (Hitachi 7600) at the hospital. GDM was defined according to the International Association of Diabetes and Pregnancy Study Groups (IADPSG) standard. That is, GDM was considered to be present when at least one of the following blood glucose concentrations was obtained: ≥ 5.1 mmol/L (fasting), ≥ 10.0 mmol/L (after 1 h), or ≥ 8.5 mmol/L (after 2 h). 16 The controls were selected from among women in the same cohort who had normal blood glucose levels throughout the pregnancy. A 1:1 pair match for each GDM patient was identified, based on age (± 3 years) and gestational age (± 1 week), resulting in a final study population of 80 eligible GDM patients and 80 healthy controls.
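The diagnostic rule above can be expressed as a short check. The following is a minimal sketch, assuming glucose values are supplied in mmol/L; the function name `meets_iadpsg_gdm` and the constant names are hypothetical and are not part of the study's workflow.

```python
# IADPSG thresholds quoted in the text (all in mmol/L).
FASTING_THRESHOLD = 5.1
ONE_HOUR_THRESHOLD = 10.0
TWO_HOUR_THRESHOLD = 8.5


def meets_iadpsg_gdm(fasting: float, one_hour: float, two_hour: float) -> bool:
    """Return True when at least one 75 g OGTT value reaches its threshold."""
    return (
        fasting >= FASTING_THRESHOLD
        or one_hour >= ONE_HOUR_THRESHOLD
        or two_hour >= TWO_HOUR_THRESHOLD
    )
```

A single elevated value is sufficient for diagnosis, which is why the three conditions are combined with `or` rather than `and`.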

Selection of candidate CpG sites
Suitable target CpG sites were identified from the literature and from bioinformatics analysis aimed at finding CpG sites related to GDM. Published studies have used whole-genome methylation sequencing methods to identify CpG sites associated with GDM. From the literature, we selected the 21 target CpG sites that were most likely to be related to the occurrence of GDM. 12-15 17 We also found two target CpG sites related to GDM through bioinformatics methods. Moreover, the literature 14 18 19 provided four additional genes closely related to the occurrence of GDM, namely RDH12, GCK, PPARG, and IL6. Since these reports did not indicate any selectable CpG sites, we tested all CpG islands in the promoter regions of these four genes, generating a total of 6 CpG islands, with one or two per promoter region. Overall, this procedure identified a total of 29 CpG islands, containing 337 CpG sites. We obtained the sequences for each of the 29 CpG island target fragments. The primers used and their sources are shown in Supplementary Tables S1, S2, and S3.

DNA extraction
Genomic DNA was extracted from frozen samples using Genomic Tip-500 columns (Qiagen, Valencia, CA, USA) and bisulfite-converted using the EZ DNA Methylation™-GOLD Kit (Zymo Research, CA, USA) in accordance with the manufacturer's instructions. Genomic DNA integrity was assessed using agarose gel electrophoresis, and quality control was performed using a NanoDrop 2000 (NanoDrop Technologies, Wilmington, DE, USA), with the requirements that the DNA concentration be ≥ 20 ng/µL and the total amount of DNA be ≥ 1 µg.

DNA methylation analysis
The DNA methylation level of the target CpG site is defined as the number of methylated reads at that site (i.e., the number of reads with base C detected) divided by the total number of reads at that site, and was obtained by MethylTarget™ sequencing (Genesky Biotechnologies Inc., Shanghai, China), a method for multiple targeted CpG methylation analysis based on next-generation sequencing. Primer design and validation were performed using bisulfite-converted DNA samples with the Methylation Primer software. The primer sets were designed to flank each target CpG site by 100-300 nucleotides and are summarized in Supplementary Table S1. After PCR amplification (HotStarTaq polymerase kit, TAKARA, Tokyo, Japan) and library construction, paired-end sequencing was performed (Illumina Hiseq Benchtop Sequencer, CA, USA) in accordance with the manufacturer's protocol. To ensure the accuracy of sequencing results, we added a quality control step to the DNA methylation sequencing method; for details, see Supplementary Text S1.
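The definition of the methylation level given above amounts to a simple ratio of read counts. A minimal sketch follows, assuming per-site read counts are already available; the function name `methylation_level` is hypothetical and does not reflect the MethylTarget™ pipeline itself.

```python
def methylation_level(methylated_reads: int, total_reads: int) -> float:
    """Fraction of reads at a CpG site in which base C was detected.

    After bisulfite conversion, unmethylated cytosines are read as T,
    so reads that still show C at the site are counted as methylated.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= methylated_reads <= total_reads:
        raise ValueError("methylated_reads must lie between 0 and total_reads")
    return methylated_reads / total_reads
```

For example, a site covered by 120 reads, 30 of which retain a C, would have a methylation level of 0.25.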

Covariates
In this study, we collected information on maternal demographics, lifestyle, and pregnancy history through structured questionnaires at each follow-up. This covered factors with the potential to confound the exposure-outcome relationship, including pre-pregnancy body mass index (BMI) (continuous), history of drinking (yes/no), history of smoking (yes/no), parity (continuous), gravidity (continuous), polycystic ovary syndrome (PCOS) (yes/no), and waist circumference at enrolment (continuous). Many study participants were primiparas, so history of GDM was not considered in the analysis.

Statistical analysis
Continuous data and categorical data were presented as the mean ± standard deviation (SD) and frequency (percentage), respectively. Paired-samples t-tests were used to compare normally distributed continuous data, whereas Wilcoxon signed-rank tests were used to analyze non-normally distributed continuous data. Dichotomous variables were analyzed using the McNemar χ² test. P < 0.05 was considered statistically significant, and all statistical tests were two-sided. ROC curve analysis was used to assess the possible predictive value of the methylation level of each individual CpG site for the occurrence of GDM.
CpG sites with a positive correlation between methylation and GDM were considered risk sites, and those with an inverse correlation were considered protective sites. For the protective CpG sites, the control group was used to provide the state variable value in the ROC curve analysis. Through the ROC curve, the methylation status (high or low) of the target CpG site was classified based on the best cut-off value, defined as the DNA methylation level with the highest Youden index. Conditional logistic regression analysis was used to determine the independent influence of target CpG site methylation status on GDM.
Covariates with significant differences were included in the model for correction. The variable entry criterion was α = 0.05 and the variable removal criterion was α = 0.10; the forward Wald method was used to establish a conditional logistic regression model to screen CpG sites with independent effects. All the statistical analyses were performed using SPSS software v25.0 (IBM Corporation, Armonk, NY, USA).
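The cut-off selection described above can be illustrated with a small sketch. It assumes that a sample is called positive when its methylation level is at or above the candidate cut-off, and it scans every observed level as a candidate; the function name `youden_cutoff` and the example values are hypothetical and are not study data. For a protective site, the control indicator could be passed as the positive state, mirroring the use of the control group as the state variable.

```python
from typing import Sequence, Tuple


def youden_cutoff(
    levels: Sequence[float], is_positive: Sequence[bool]
) -> Tuple[float, float]:
    """Return (cut-off, Youden index) maximizing sensitivity + specificity - 1."""
    if len(levels) != len(is_positive):
        raise ValueError("levels and is_positive must have the same length")
    n_pos = sum(1 for flag in is_positive if flag)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both positive and negative samples are required")

    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(levels)):
        # A sample is predicted positive when its level is >= cutoff.
        true_pos = sum(1 for x, y in zip(levels, is_positive) if y and x >= cutoff)
        false_pos = sum(1 for x, y in zip(levels, is_positive) if not y and x >= cutoff)
        sensitivity = true_pos / n_pos
        specificity = 1 - false_pos / n_neg
        j = sensitivity + specificity - 1
        if j > best_j:  # ties keep the lowest cut-off
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical methylation levels: five GDM cases and four controls.
example_levels = [0.5, 0.7, 0.8, 0.9, 0.95, 0.1, 0.2, 0.3, 0.6]
example_is_gdm = [True, True, True, True, True, False, False, False, False]
example_cutoff, example_j = youden_cutoff(example_levels, example_is_gdm)
```

Samples at or above the returned cut-off would then be labelled hypermethylated and the remainder hypomethylated, which yields the binary status used in the subsequent analyses.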

Patient characteristics
The participant characteristics are summarized in Table 1. The age of the GDM patients ranged between 23 and 43 years (mean: 31.6 years), whereas that of the healthy controls ranged between 24 and 45 years (mean: 32.0 years). No significant difference (P > 0.05) was observed between the two groups in terms of gravidity, parity, PCOS, smoking history, alcohol intake history, age, or gestational age. The GDM group had higher fasting glucose, 1-h post-OGTT glucose, 2-h post-OGTT glucose, pre-pregnancy BMI, and waist circumference than the control group.
Figure S1). Brief explanations of the functions of the genes in which these sites are located can be found in Supplementary Table S5. Missing data were replaced by the mean (Supplementary Table S4). The methylation levels at 6 CpG sites within the ARHGAP40, STAT1, C5orf34, RDH12, and YAP1 genes were higher in the GDM group than in the control group, whereas those at 7 CpG sites within the HAPLN3, IFNGR2, NAGA, YAP1, NFATC4, and DNAJB6 genes were lower in the GDM group than in the control group.

Predictive value of the methylation status of the CpG site
For the 13 CpG sites with significantly different DNA methylation levels, we further estimated the possible predictive value of the methylation level of each individual CpG site for the occurrence of GDM using the ROC curve. The ROC curve parameters and the cut-off values are summarized in Table 3. The largest area under the curve (AUC) reached 0.650.

Comparison of the DNA methylation status of target CpG sites
To clearly show the effect of DNA methylation at the target CpG sites on GDM occurrence, we classified the DNA methylation levels into hypermethylation and hypomethylation statuses based on the best cut-off value (Table 3). Table 4 presents the differences in the DNA methylation statuses of the CpG sites between the GDM and control groups. Significant differences were observed in 8 CpG sites based on the McNemar χ² test (P < 0.05).
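For matched pairs, the McNemar test depends only on the discordant pairs. The sketch below computes the uncorrected statistic and its two-sided P value; it is an illustration under stated assumptions rather than a reproduction of the SPSS output, since SPSS may apply a continuity correction or an exact binomial test. The function name `mcnemar_chi2` and its arguments are hypothetical.

```python
import math
from typing import Tuple


def mcnemar_chi2(case_only: int, control_only: int) -> Tuple[float, float]:
    """Uncorrected McNemar chi-square statistic and P value (1 degree of freedom).

    case_only:    pairs in which only the GDM case is hypermethylated
    control_only: pairs in which only the matched control is hypermethylated
    """
    discordant = case_only + control_only
    if discordant == 0:
        raise ValueError("at least one discordant pair is required")
    chi2 = (case_only - control_only) ** 2 / discordant
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Concordant pairs (both hypermethylated or both hypomethylated) do not enter the statistic, which is why only the two discordant counts are needed.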

Conditional logistic regression analysis for DNA methylation status and GDM
Conditional logistic regression analysis was used to analyze the independent effect of the methylation status of each individual site on GDM occurrence. The independent variables included the methylation status of the eight significantly different CpG sites listed in Table 4 (0 = "hypomethylation"; 1 = "hypermethylation"). The confounding variables included waist circumference and pre-pregnancy BMI.
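As a simplified illustration of the matched analysis, consider the special case of 1:1 matched pairs with a single binary exposure and no covariates. In that case the conditional logistic regression estimate of the odds ratio reduces to the ratio of discordant pairs. The sketch below covers only this unadjusted case with a Wald-type interval; it does not reproduce the adjusted multivariable model fitted in SPSS, and the function name `matched_pair_odds_ratio` is hypothetical.

```python
import math
from typing import Tuple


def matched_pair_odds_ratio(
    case_only: int, control_only: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Unadjusted matched-pair odds ratio with an approximate 95% CI.

    case_only:    pairs in which only the GDM case is hypermethylated
    control_only: pairs in which only the matched control is hypermethylated
    """
    if case_only <= 0 or control_only <= 0:
        raise ValueError("both discordant counts must be positive")
    odds_ratio = case_only / control_only
    # Standard error of log(OR) for matched pairs.
    se_log_or = math.sqrt(1 / case_only + 1 / control_only)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An odds ratio above 1 would mark hypermethylation as a risk status and one below 1 as a protective status; adjusting for waist circumference and pre-pregnancy BMI requires fitting the full conditional model.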

Discussion
An increasing number of studies have explored the pathogenesis of GDM from the perspective of epigenetics. However, most of these were small (< 30 GDM cases), and they mainly observed the associations between GDM occurrence and the DNA methylation level of cord blood or placental tissue. 20 21 In this study, we evaluated the DNA methylation status of GDM-related CpG sites in the peripheral blood of women in early pregnancy using MethylTarget™ sequencing. In addition, we verified the associations between target CpG sites and GDM using a larger sample size (n = 80). Overall, we identified 13 CpG sites with significant differences in DNA methylation levels between the GDM and control groups based on quantitative analysis. The AUCs of the ROC curves for the methylation levels of the significant CpG sites ranged from 0.593 to 0.650, suggesting some predictive utility in relation to GDM. The methylation statuses of eight individual CpG sites were identified as differing significantly between the GDM and control groups by qualitative analysis, and these were located in the promoter regions of RDH12, HAPLN3, NFATC4, YAP1, and DNAJB6, and the intron region of C5orf34. Importantly, using conditional logistic regression analysis, we found that the methylation statuses of four CpG sites were significantly associated with GDM occurrence, namely cg89438648 (HAPLN3), cg68167324 (RDH12), cg157130156 (DNAJB6), and cg24837915 (NFATC4).
In this study, hypermethylation of the CpG site cg89438648, located in the promoter region of HAPLN3, was found to suggest a lower risk of GDM (OR = 0.206; 95% CI: 0.065-0.655). HAPLN3 codes for hyaluronan and proteoglycan link protein 3 (HAPLN3), a member of the hyaluronan and proteoglycan link protein (HAPLN) family, which plays roles in the aggregation of proteoglycans and hyaluronic acid, and in cell adhesion. 22 HAPLN3 is involved in the organization and stability of the hyaluronic acid (HA)-dependent extracellular matrix (ECM) in many tissues. HA is one component of the ECM within the islet tissue of humans and mice. 23 It can cause islet amyloid deposition, which is associated with decreased β-cell area and an increase in β-cell apoptosis. 24 Hull et al. suggested that islet amyloid deposition could reduce the number of β-cells. 24 25 Hypermethylation of the CpG site cg89438648, located in the HAPLN3 promoter region, could reduce the level of HAPLN3, in turn reducing the stability of the HA-ECM, and consequently reducing the impact of amyloid deposition on β-cells.
We found that the hypermethylation status of the CpG site cg68167324, located in RDH12, can increase the risk of GDM (OR = 3.168; 95% CI: 1.038-9.666). RDH12 encodes retinol dehydrogenase 12 (RDH12), a member of the short-chain dehydrogenases/reductases (SDRs) family, 26 which participates in steroid and retinol metabolism. 27 RDH12, an NADPH-dependent all-trans retinol dehydrogenase, is the key enzyme in the metabolism of retinoids. 28 Two oxidation products of retinoids, 9-cis-retinoic acid and all-trans retinoic acid, function to stimulate insulin secretion. 29 In adipocytes, retinoic acid induces the expression of the insulin signaling gene PDK-1 and that of the glucose transporter GLUT4. Activating retinoic acid induces the expression of genes involved in lipid and glucose metabolism, thereby improving insulin action. 30 Thus, hypermethylation of the CpG site cg68167324, located upstream of the promoter region of RDH12, may inhibit its transcriptional activity and reduce RDH12 levels in peripheral blood. Subsequently, the retinoic acid metabolic pathway would be inhibited, affecting insulin secretion and reducing its effectiveness.
The DNAJB6 (DnaJ homolog, subfamily B, member 6) protein is a member of the heat shock protein 40 (HSP40) family 31 and acts as a molecular chaperone for various cellular processes. While observing insulin-resistant and diabetic patients, Kurucz et al. found that HSP expression was significantly changed without diabetes, and that the mRNA level of HSP72-inducible subtypes was significantly reduced in patients with type 2 diabetes. 32 Additionally, the expression of HSP70 in the skeletal muscle of patients with type 2 diabetes is reduced and has been shown to correlate with the degree of insulin resistance. 33 These HSP molecular chaperones are related to diabetes. 34 However, the exact association between DNAJB6 and type 2 diabetes needs further study. In this study, hypermethylation of the CpG site cg157130156, located in the promoter region of DNAJB6, was observed in the GDM group. This might result in increased DNAJB6 levels via the up-regulation of DNAJB6 transcription, thereby reducing the risk of GDM (OR = 0.361; 95% CI: 0.135-0.966).
NFATC4 codes for the nuclear factor of activated T cells 4 (NFATC4), which is a member of the transcription factor family under the control of calcineurin (a Ca2+-dependent phosphatase). 35 In adipose tissue, NFATC4 has been shown to promote the secretion of inflammatory factors, 36 and to act as a transcriptional repressor in regulating adiponectin gene expression, suggesting that adiponectin expression is down-regulated in obesity and type 2 diabetes. 37 In this study, hypermethylation of the CpG site cg24837915, located in the promoter region of NFATC4, was associated with the presence of GDM (OR = 5.232; 95% CI: 1.659-16.506).
During pregnancy, early anabolism increases and mild insulin resistance occurs. 38 When insulin secretion fails to balance insulin resistance, impaired glucose tolerance develops, which might subsequently lead to GDM. 39 Therefore, impaired secretion by β cells is also a key factor in GDM pathogenesis. Here, we explored the pathogenesis of GDM from an epigenetic perspective and identified 13 CpG sites that had methylation levels showing associations with GDM pathogenesis. Furthermore, conditional logistic regression analysis showed that the methylation status of four CpG sites located in the promoter regions of four genes was associated with GDM pathogenesis. These CpG sites are located in genes that could contribute to the development of GDM. Of these four CpG sites, hypermethylation of cg24837915 and cg68167324 was shown to be associated with GDM, whereas that of cg89438648 and cg157130156 could indicate reduced risk of GDM. Thus, the methylation status of these genes may function as predictors of GDM. No publications reporting on the relationship between methylation of these four CpG sites and GDM have been found, so our suggestion of such a relationship is based on the known modes of action of the genes concerned.
However, the study also had some limitations. First, our sample size was limited, and further verification of our results using a larger sample is needed. Second, the selection of our target CpG sites was based on published literature, and we did not screen for differential CpG sites in the same population in this study, so there may be other CpG sites related to the pathogenesis of GDM that have not been verified.

Conclusions
In summary, by determining the DNA methylation of target CpG sites in the peripheral blood of women in early pregnancy, we found that the methylation levels of 13 CpG sites were related to the occurrence of GDM by quantitative analysis. After adjusting for possible confounding factors by conditional logistic regression, four CpG sites showed independent effects on the occurrence of GDM. These findings indicate that the methylation status of these CpG sites in the peripheral blood of pregnant women during the first trimester was related to the pathogenesis of GDM, and may be a potential predictor of GDM.

Declarations
Data availability statement: Raw data on patient characteristics and the target fragment DNA methylation sequencing data generated during the current study are available from the corresponding author upon reasonable request.