DNA methylation as a potential mediator of the association between indoor air pollution and neurodevelopmental delay in a South African birth cohort

Abstract


Introduction
The detrimental effects of air pollution on pregnancy outcomes such as low birth weight and respiratory disease in infants are well-known and have been con rmed by many studies over the last several decades [1,2].However, there is limited literature on the impact of prenatal air pollution exposure on neurodevelopmental outcomes, and even less work on the biological mechanisms underpinning these associations.A handful of studies have reported signi cant associations between prenatal air pollution exposure and neurological conditions such as autism spectrum disorder (ASD), attention de cit hyperactivity disorder (ADHD), and more general neurodevelopmental delays [3][4][5][6][7] .
However, the bulk of these studies have been conducted in high income country (HIC) contexts and have focused on the effects of outdoor air pollution; therefore, ndings may not be wholly generalizable to other settings [8,9].It is equally important to address the impact of indoor air pollution, particularly in low-and middle-income country (LMIC) settings where burning fuels such as coal, para n, or wood for cooking or heating indoors is common.In such settings, fuel burning can greatly increase indoor air pollution concentration and its impact on neurodevelopment and other health outcomes [5,10,11] .
Epigenetic modi cation has long been considered as a key missing link to understanding how geneenvironment interactions affect neurodevelopment [12,13].As such, careful dissection of the relationship between air pollution, epigenetic modi cation, and neurological outcomes may allow us to better understand the complex mechanisms underlying the impact of environmental risk factors on neuropsychiatric disorders and neurodevelopment.With the rise of high-throughput genomics, the eld of epigenetics has undergone rapid development.Epigenetic modi cation, speci cally DNA methylation (DNAm), has been linked to a number of neuropsychiatric outcomes such as severe neurodevelopmental delay, schizophrenia, ASD, and ADHD [14][15][16][17][18][19][20].DNAm is known to play a key role in embryonic development and has been hypothesized to impact neural stem cell differentiation and maintenance [21], thereby affecting neuropsychiatric outcomes throughout the life course.DNAm is potentially reversible and identi cation of differentially methylated CpG sites may be useful in multiple contexts, including clinical therapy design and biomarker identi cation [22] .
DNAm levels are altered by a number of environmental exposures such as drugs, nutrition, stress, and air pollution [8,9,23,24].Data analyzed as part of the Pregnancy and Child Epigenetics (PACE) consortium showed the effects of prenatal exposure to nitrogen dioxide (NO 2 ), airborne particulate matter with a diameter of 10 microns or less (PM 10 ), and airborne particulate matter with a diameter of 2.5 microns or less (PM 2.5 ) on newborn and childhood DNAm [8,9].Prenatal exposure to each of these pollutants has been associated with differential DNAm in neonates, so highlighting the need for additional research to understand how environment-driven epigenetic changes impact fetal development and downstream health outcomes [8,9].While there is evidence of an association between air pollution and DNAm as well as between DNAm and neurodevelopment, few studies have examined the interconnections between them.
To the best of our knowledge, only one study has examined the relationship between prenatal indoor air pollution exposure, DNAm, and neurodevelopment in a mediation analysis [5].The study was conducted in the Drakenstein Child Health study (DCHS) and investigated deviations of epigenetic gestational age from chronological gestational age (ΔGA) as a potential mediator of the association between indoor air pollution and adverse neurodevelopment.However, this previous study did not nd evidence of mediation by ΔGA, leading us to take a more granular approach to understand the role of DNAm as a potential mediator of the association between prenatal indoor air pollution and neurodevelopment [5].
In this study, we aimed to identify any differentially methylated CpG sites and gene regions that mediate the association between prenatal exposure to indoor PM 10 and neurodevelopment measured at two years of age in the DCHS using a combination of high dimensional mediation analysis methods (HIMA, DACT, and gHMA) and traditional causal mediation analysis [25][26][27].

Study Population
The DCHS is a South African, population-based birth cohort that enrolled pregnant women from two primary health care clinics in peri-urban communities: Mbekweni and TC Newman.T These clinics serve two demographically distinct populations, speci cally a majority Black African ancestry community and a majority mixed ancestry community [28].The DCHS has followed infants from birth until at least 5 years of age [28].The current study population is composed of 142 mother-child pairs with measures available for cord blood DNA methylation, genotype data, and Bayley Scores of Infant and Toddler Development, 3 rd edition (BSID-III) in at least one of the following domains: cognitive development, general adaptive behavior, language, and motor outcomes.Inclusion was also limited to mother-child pairs with measures available for relevant covariates which included principal components (PCs) from genome-wide genotype data, maternal age, maternal smoking, maternal alcohol use, birth weight, sex, and socioeconomic status (SES)(Table 1).Smoking status was determined by maternal urine cotinine levels collected prenatally, while alcohol use was measured via the Alcohol, Smoking and Substance Involvement Screening Test (ASSIST), a tool introduced by the World Health Organization (WHO) and which has shown good validity in LMIC settings [29].Socioeconomic status was captured as a validated score comprising four socioeconomic indicators: maternal educational attainment, employment status, household income and assets.[29] .
The DCHS staff obtain written consent from mothers on an annual basis and the study was approved by the Ethics Committee of the Faculty of Health Sciences, University of Cape Town, by Stellenbosch University and the Western Cape Provincial Research committee [28].

DNA methylation measurements
As described previously [14], DNA was measured from cord blood collected at time of delivery [30].DNA methylation measures were obtained with both the Illumina In nium HumanMethylation450 BeadChips (n=156) and the MethylationEPIC BeadChips (n=160).Pre-processing and statistics were done using R 3.5.1 and raw iDat les were imported into Rstudio where intensity values were converted into beta values.The 450K and EPIC datasets were merged using the min R package [31].Background subtraction, color correction and normalization were performed using the preprocessFunnorm function [32] .Following sample and probe ltering, 273 samples and 409,033 probes remained for downstream analysis.Of these samples, 142 had genotype data, at least one BSID-III score measured at 2 years of age, and data available for all relevant covariates (Table 1).Batch effects were removed using ComBat from the R package sva [33] .Cord blood cell type composition was predicted using the most recent cord blood reference data set [34] .

Neurodevelopment measurements
The BSID-III is a widely used tool for assessing neurodevelopment in children up to 42 months of age.We included scores from across four distinct domains: cognitive development, language skills, motor function, and adaptive behavior [29].The BSID-III has been validated in LMIC settings and previous research reports its use in the DCHS speci cally [29] .As described previously [29], the DCHS assessed neurodevelopment using BSID-III at two years of age.The DCHS BSID-III assessment was conducted by a trained professional and incorporates direct observation of the child as well as caregiver input.
Composite scores for cognitive, motor, language, and general adaptive behavior domains were scaled to have a mean of 100 and standard deviation of 15 as per standard use of the tool.

Assessment of Indoor Air Pollution Exposure
As described previously [35][36][37], PM 10 was measured using a personal air sampling pump (AirChek 52; SKC, Eighty Four, PA, USA), connected to a styrene lter cassette (37 mm cassette blank; SKC) with a gravimetrically pre-weighted lter (PVC lter 37 mmx5 mm with support pad; SKC) left in the home for 24 hours during the 2 nd trimester of pregnancy [35,36].Filters were weighed after sampling and the National Institute for Occupational Safety and Health method 0600 was used to calculate an average PM 10 concentration over 24 hours [37,38].These 24-hour average PM 10 measurements were used for our analyses.

Statistical Analysis
We used different high-dimensional mediation analysis approaches to investigate whether differential DNAm mediates the association between prenatal indoor air pollution exposure and neurodevelopmental delay.Previous research using this subsample of the DCHS has found a signi cant total effect of PM 10 on neurodevelopment in the cognitive domain, but not in the BSID-III general adaptive behavior, language, or motor function domains [5].Therefore, we treated the BSID-III cognitive domain as our primary outcome and the other BSID-III neurodevelopment domains were used as secondary outcomes to evaluate consistency of results across domains.
Confounders were selected based on existing literature [40,41] and a minimal su cient adjustment set was identi ed for each pathway.Exposure-mediator models were adjusted for SES score, genetic ancestry, and maternal smoking and mediator-outcome models were adjusted for maternal alcohol use, maternal age, SES score, child sex, genetic ancestry, and maternal smoking.We adjusted for genetic ancestry by including the rst ve genotype PCs to account for population strati cation [14].Models were also adjusted for the rst three cell type principal components (PCs), which explained >90% of cell type heterogeneity [14,42,43].Birth weight as a proxy for gestational age was recognized as another possible mediator of the exposure-mediator association and is a possible mechanism through which prenatal indoor air pollution exposure could impact DNAm, therefore we did not control for birth weight in our analyses.
To assess the role of DNAm as a potential mediator of the association between prenatal exposure to PM 10 and neurodevelopment at age two years, we employed three well-documented methods for highdimensional mediation analysis: HIMA (high-dimensional mediation analysis), DACT (divide-aggregate composite-null), and gHMA (gene-based high-dimensional mediation analysis) [25][26][27].As these methods are novel, there is no consensus as to which is the gold standard [44].Therefore, we have incorporated an analysis pipeline based on a combination of these methods with traditional causal mediation analysis in order to assess the robustness of results (Figure S1).It should be noted that all models, used both for high-dimensional mediation analysis and causal mediation analysis, were adjusted for the appropriate confounders as de ned above.
HIMA, a high-dimensional mediation analysis method introduced by Zhang et al. (2016) [25], employs a dimension reduction technique followed by minimax concave penalty (MCP) -penalized estimation of mediation effects and joint signi cance testing for mediation effects in order to identify signi cant mediators.Dimension reduction is performed using the sure independence screening (SIS) method which is built on a correlation learning framework that lters out features that are weakly correlated with the response variable [45].The HIMA joint signi cance test rejects the null hypothesis of no mediation only when both the exposure-mediator () and mediator-outcome effects () are signi cant [25] .
DACT leverages epigenome-wide multiple testing to estimate the proportions of the composite null hypothesis to improve power [26,44].A preliminary step for the DACT method is to create two linear models pertaining to each CpG site: the exposure-mediator association () and the mediator-outcome association ().While HIMA uses a screening technique to reduce dimensionality, DACT does not involve a screening step by default.We performed a pre-screen of CpG sites based on their association with the exposure (PM 10 ) and the outcome (BSID-III scores of neurodevelopment); only CpG sites with p<0.05 for both associations were included in the downstream analyses.(Figure S1).Given previous ndings indicating a negative association between prenatal indoor air pollution exposure and neurodevelopment in the DCHS cohort [5], we chose to additionally lter our sites by only allowing a negative natural indirect effect (NIE) de ned by (acting in the same direction as the association between indoor air pollution exposure and neurodevelopment).As we chose to pre-screen our CpG sites, we used the Efron correction feature of the DACT package to estimate the proportions of the composite null (Figure S2) as opposed to Jin and Cai correction which is recommended if performing epigenome-wide mediation effect testing with DACT [26].Of note, neither of these methods account for correlation between mediators.As such, we calculated Pearson correlation coe cients for all sets of identi ed CpG sites to better understand if and how these CpG sites were correlated with one another.
The gHMA method was developed by Fang et al. (2020) [27] and focuses on genes as functional units as opposed to individual CpG sites.gHMA is primarily composed of three components: 1) linear mediation analysis, 2) nonlinear mediation analysis and 3) an omnibus test of mediation effects.Signi cance testing results for both the linear and nonlinear mediation analysis steps are combined using the gHMA omnibus (gHMA-O) test.As the true relationship between mediators and outcomes are often not well understood in practice, gHMA-O transforms and combines p-values from the linear and nonlinear analyses in order to construct the gHMA-O test statistic, which is used to assess mediation effects at the gene level [27] .CpG sites were annotated by closest gene using the Bioconductor package hiAnnotator (https://bioconductor.org/packages/release/bioc/html/hiAnnotator.html) and the Ensembl gene predictions (ensGene, version of Apr-06-2014; http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ensGene.txt.gz) as previously described elsewhere [46].P-values for CpG sites which passed the screening step and were tested using DACT and HIMA were corrected for multiple testing using the Benjamini-Hochberg false discovery rate correction (BH FDR) [47].
Due to the fact that gHMA assesses differentially methylated gene regions as opposed to individual CpG sites, gHMA p-values were FDR corrected for the total number of gene regions tested, rather than the number of distinct CpG sites.HIMA and DACT CpG sites that remained signi cant at a false discovery rate of 0.05 were then validated via traditional causal mediation analysis using the function mediate from the R package mediation to obtain estimates of natural indirect effect (NIE), direct effect (DE), total effect (TE), and proportion mediated (PM) [48].
A recent DCHS study identi ed three CpG sites (cg26971411 [SPTBN4], cg00490349 [intergenic], cg15660740 [intergenic]) associated with neurodevelopment measured by BSID-III [14].We also examined these CpG sites in our causal mediation analysis step to investigate whether they mediate the association between prenatal PM 10 exposure and neurodevelopment.
Maternal HIV has been previously linked to neurodevelopment; however, the association between maternal HIV and DNAm is not well understood [49,50].Therefore, we also conducted a sensitivity analysis to determine the effect of including maternal HIV as a potential confounder of the mediatoroutcome association.

Description of Study Participants
The study sample consisted of 142 mother-child pairs with data available for genotype, cord blood methylation, PM 10 concentration, scores for one or more BSID-III domains, and all relevant covariates (Table 1).In total, 48.6% of infants were of self-reported Black African ancestry and 51.4% were of selfreported mixed-ancestry; 40.1% of infants were female.The mean PM 10 concentration was 64.5 mg/m 3 (SD=96.8mg/m 3 ).Mean composite BSID-III scores were 85.14 (SD=8.65)for the cognitive domain, 84.31 for the language domain (SD=12.22),94.04 for the motor function domain (SD=13.74),and 83.72 for the general adaptive behavior domain (SD=13.29).The prevalence of maternal alcohol use was 19.7% and the prevalence of maternal smoking was high, with 40.1% of mothers classed as passive smokers and 33.1% classed as active smokers based on urine cotinine levels.The prevalence of maternal HIV infection was also high with21.1% of mothers testing with a con rmed HIV diagnosis.

CpG-based High-Dimensional Mediation Analysis
After BH FDR adjustment for multiple testing, DACT identi ed a total of 123 distinct CpG sites across the cognitive (35 CpG sites, primary outcome), language (45 CpG sites), motor function (13 CpG sites), and general adaptive behavior (39 CpG sites) domains as signi cant mediators of the association between PM 10 and neurodevelopment (Tables S1-S4).A total of 9 CpG sites were shared between at least two domains (Table S5) and one CpG site (cg26858414 [CDSN]) was shared across the language, general adaptive behavior, and motor function domains.These 123 CpG sites were further examined via causal mediation analysis.Results for our primary outcome (cognitive development) are presented here (Table 2; Figure 2) and results for our secondary outcomes, for which we did not nd a total effect of PM 10 are presented in Supplementary Tables S6-S9.
Of the 35 CpG sites identi ed with DACT for the cognitive domain, 29 demonstrated signi cant natural indirect effects (NIE), signi cant estimates for proportion mediated, and signi cant estimates for total effect (TE) and two CpG sites (cg13690126 [CNKSR1] and cg03234186 [ZNF154]) were also identi ed via DACT for the language domain (Table 2; Figure 2).All effect estimates were multiplied by the interquartile range (IQR) of PM 10 observed in this cohort (58.78 mg/m 3 ) and therefore represent estimated effects per one IQR increase in PM 10 .Estimated proportion mediated (95%-con dence interval) ranged from 0.29 After correction for multiple testing, HIMA did not identify any CpG sites that signi cantly mediated the effects of prenatal PM 10 exposure on neurodevelopment in any domain (Table S10-S13).However, prior to multiple testing correction, one CpG site was identi ed as a signi cant mediator (cg05796992 [LRRK1]); this site was also identi ed with DACT and demonstrated a signi cant NIE estimate (95%con dence interval) of -0.438 (-0.942, -0.0426) per one IQR increase in PM 10 and an estimated proportion mediated (95%-con dence interval) of 0.484 (0.0468, 1.58) in our causal mediation analysis (Figure 2, Table S6, Table S14-S15).
We also conducted a sensitivity analysis to examine the effects of adjusting for maternal HIV status.In this sensitivity analysis, 27 of our 29 CpG sites remained signi cant at a threshold of 0.05 after adjusting for maternal HIV status with only cg16975959 [intergenic] and cg15074838 [HLA-DRA] losing signi cance after adjustment (Table S16-S19).
Additionally, we tested the mediation effects of three CpG sites that were identi ed in a recent DCHS study in association with severe neurodevelopmental delay in the cognitive (cg26971411 [SPTBN4], cg00490349 [intergenic], and cg15660740 [intergenic]), language (cg26971411 [SPTBN4] & cg00490349 [intergenic]), and motor (cg26971411 [SPTBN4] & cg00490349 [intergenic]) domains using causal mediation analysis [14].We did not observe signi cant evidence of mediation in any corresponding BSID-III domain for these CpG sites (Table S25).This discrepancy may be explained by a lack of association between these three CpG sites and prenatal exposure to PM 10 .

Gene-based High-Dimensional Mediation Analysis
Differential methylation in four gene regions (GOPC, RP11-74K11.1, RNMT, and DYRK1A) was identi ed as a signi cant mediator of the association between PM 10 and cognitive development (Table 3).Each of these differentially methylated gene regions were also identi ed for the other domains (Table 3).No CpG sites mapping to any of these four genes were identi ed for the cognitive domain with either of the CpG site-based methods (DACT or HIMA) (Tables S20-S23).
Additionally, we identi ed ve differentially methylated genes (DCAF13, TNN, TAL1, AC011648.1,SPINK2) as signi cant mediators for the secondary outcome "motor domain", however, none of these were found to be signi cant for the other domains (Table 3).
Lastly, we conducted a sensitivity analysis to examine the effects of adjusting for maternal HIV status on our results.After adjusting for maternal HIV status in our gHMA models we did not identify any differentially methylated genes as signi cant mediators using a BH FDR threshold of 0.05 (Table S24).

Discussion
This study of 142 mother-child pairs from a low SES population in South Africa found a total of 29 distinct, differentially methylated DNAm probes to signi cantly mediate the effect of prenatal exposure to PM 10 on neurodevelopment at age two years measured by BSID-III scores.Additionally, we found four differentially methylated gene regions which signi cantly mediate the effect of prenatal PM 10 exposure on neurodevelopment using a gene-based high-dimensional mediation analysis technique.To our knowledge, this study is the rst to examine differential DNAm at individual probes as potential mediators of the association between prenatal PM 10 exposure and neurodevelopment.
A number of prior studies have examined the association between prenatal PM 10 exposure and DNAm as well as the association between DNAm and neurodevelopment [8,9,14-20].While many epigenome-wide association studies (EWAS) have reported differentially methylated CpG sites associated with prenatal air pollution exposure [8,9,51-53], we did not identify any overlap between our ndings and existing ndings.It should be noted that our ndings are not entirely comparable as, per the underlying assumptions of mediation analysis, probes must be associated with both PM 10 exposure and neurodevelopment.
Replication is a common problem in EWAS which often lack robust associations at single CpG sites across cohorts [51].Further research is needed to validate our ndings from high-dimensional mediation analysis.
A recent meta-analysis examining epigenome-wide associations between DNAm at birth and childhood cognitive skills synthesizing data from eight pregnancy cohorts within the Pregnancy and Childhood Epigenetics (PACE) consortium (N=3300) did not nd substantial evidence that differential cord blood DNAm at individual CpG sites is associated with cognitive skills [19].We compared our ndings to those from several EWAS investigating DNAm and cognitive development examined in the PACE study, However, no overlap was identi ed between our ndings and those of previous studies [19,54] .

CpG-based High-Dimensional Mediation Analysis
We identi ed 29 differentially methylated CpG sites as signi cantl mediators of the association between prenatal PM 10 exposure and neurodevelopment in the cognitive domain.Of the 29 CpG sites, differential DNAm at 21 of these CpG sites has been associated with age in EWAS examining DNAm trajectories occurring over the course of childhood [55,56].Differential methylation at three CpG sites (cg23560546 ) has been associated with fetal brain development [57] .Differential methylation at one CpG site (cg16975959 [intergenic]) has been previously identi ed as a mediator of the association between maternal smoking and birth weight [58] (Table S61).
Proportion-mediated estimates for these 29 CpG sites vary, ranging from cg00694520 [KCNE4] with 0.29 (0.014,0.87) to cg05023582 [PRRG2] with 0.54 (0.11,1.56).However, such high PM estimates for each CpG site should be interpreted with caution due to the associated wide con dence intervals, our small sample size, and the fact that causal mediation analysis does not account for correlation between mediators, which we found to be present among these CpG sites (Figure S3).Several of these CpG sites are located within or adjacent to genes known to in uence fetal development and/or neurological outcomes.Herein we discuss CpG sites that map to genes that have been previously linked to neuropsychiatric outcomes.
Cg13690126 is located in CNKSR1, a protein-coding gene with low tissue expression speci city.Defects in CNKSR1 have been linked to syndromic autosomal recessive intellectual disability (ID) [59,60].Najmabadi et al. (2011) [61] speculate its function as a scaffold protein mediating communication between Ras and Rho GTPase signaling pathways which have in turn been shown to play a role in neurodevelopmental disorders [61,62] .Cg07070893 is located in a promotor region for Importin 13 (IPO13), a gene showing tissue enhanced speci city in the brain and skeletal muscle [59,63].IPO13 is associated with agenesis of the corpus collosum and has been implicated in embryonic stem cell survival.IPO13 has been proposed as integral to brain development, particularly for the purposes of neural cell-speci c cargo tra cking [59,64,65] .
Cg23560546 is found in an enhancer region of Death Associated Protein Like 1 (DAPL1), a protein coding gene thought to be involved in early stages of epithelial differentiation and/or apoptosis [59].DAPL1 has been identi ed as a signi cantly differentially methylated region (DMR) in a 2021 study comparing DNAm in peripheral blood cells of toddlers with Down syndrome to neurotypical toddlers [66] .
Cg15007548 is located in the gene body of Tetratricopeptide Repeat, Ankyrin Repeat and Coiled-Coil Domain-Containing 1 (TANC1) [59].TANC1 is a protein-coding gene with low tissue speci city thought to regulate dendritic spines and excitatory synapses [59,63,67] .Dendritic spines are integral to synaptic function and loss of function in dendritic spines has been associated with a number of neurological disorders [68] .Cg26668632 is located in a promoter region for IFNGR1 which belongs to the type II cytokine receptor family and encodes a ligand-binding chain of the gamma interferon receptor (IFN-) [59].Several studies have found that IFN-signaling targets play a role in neuronal development and synaptic activity and a recent study [69] suggests that IFN-signaling is involved in neurodevelopmental disorder etiology [69][70][71].Cg23989635 is located in the rst exon of Cadherin 1 (CDH1), a protein coding gene that has been implicated in neuronal differentiation and synaptic development in the central nervous system [59,63,[72][73][74] .CDH1 downregulation has been proposed to play a role in congenital neurodevelopmental disorders [75].
Cg0060655 occupies the same position as SNP rs147829886 and is found within an intron on the gene FAM207A/SLX9 ribosome biogenesis factor.FAM207A hypermethylation in umbilical cord tissue has been linked to pre-term birth, which in turn is associated with delayed neurodevelopment [59,[76][77][78].
Cg23054321 is located in a promotor-associated region proximal to Leucine Rich Repeat and Ig Domain Containing 3 (LINGO3).LINGO3 is protein coding gene with tissue enriched in the brain [59,63].The LINGO gene family has been found to show increased expression as an embryo develops whereas only low levels of these genes are found in adult brains with only LINGO1 and LINGO3 being detectable [79].Epigenetic changes in LINGO3 have been correlated with depression and a paralog to LINGO3, LINGO1, acts as a negative regulator of a number of processes key to cognitive function [80].The genes associated with the remaining CpG sites do not appear to be as well-represented in neuropathological and neurodevelopmental literature.Additional research is needed to elucidate the roles of each of these differentially methylated sites on neurodevelopment.

Gene-based High-Dimensional Mediation Analysis
gHMA identi ed four differentially methylated gene regions associated with BSID-III neurodevelopmental scores (GOPC, RP11-74K11.1, DYRK1A, RNMT; Table 3).Golgi-Associated PDZ and Coiled-Coil Motif-Containing Protein (GOPC) is a protein coding gene that has been linked to regulation of GRID2 gene expression which has been shown to impact neurodegeneration [59,81].RP11-74K11.1 is a pseudogene on chromosome 12 which is most highly expressed in the brain, particularly the cerebellum.F studies have investigated RP11-74K11.1, [59,63,82]; and therefore additional research is needed to understand its role on neurodevelopment.Dual Speci city Tyrosine Phosphorylation Regulated Kinase 1A (DYRK1A) is located on chromosome 21 and encodes a protein kinase.DYRK1A has been strongly linked to brain development and function across the life course [63,83].Decreased expression of DYRK1A has been found in patients with autism spectrum disorder (ASD), while elevated expression has been linked to Down syndrome (DS) [83].Lastly, RNA Guanine-7 Methyltransferase (RNMT) is a protein coding gene found on chromosome 18; it is known to play a role in RNA-binding and mRNA-methyltransferase activity [59].The role of RNMT in neurodevelopment is not well-documented and further research is needed to better understand if such a link exists.
Although no individual CpG sites located in these gene regions were identi ed using DACT or HIMA, it is possible that differential methylation on the gene region scale plays a crucial role in neurodevelopment.
Probes contained within or proximal to gene regions identi ed as signi cant mediators using the gHMA method were eliminated from the DACT analysis pipeline due to lack of signi cant exposure-mediator and/or mediator-outcome associations (Table S16-S19).The discrepancy between differentially methylated gene regions identi ed with gHMA and gene regions associated with CpG sites identi ed with DACT may be attributable to interaction effects between proximal, differentially methylated DNAm probes that were not captured via DNAm probe-speci c mediation analysis.As both DACT and HIMA account for correlated CpG sites, HIMA through its use of single multiple mediation models and DACT through integration of Efron's null empirical null inference framework, these methods may be unable to detect such interacting CpG sites on an individual scale [25,26].
Our sensitivity analysis, which evaluated the effect of including maternal HIV status as a potential confounder of the mediator-outcome association, showed major differences between gHMA ndings with and without adjustment for maternal HIV status.Following adjustment for maternal HIV status, we were unable to identify any differentially methylated gene regions as signi cant mediators of the association between prenatal PM 10 exposure and neurodevelopment in any domain (Table S24).However, in our CpGbased analyses, adjustment for HIV did not greatly alter our ndings as both the magnitude and direction of estimated IDE remained consistent (Table S24).Although maternal HIV status has been linked to neurodevelopmental delay in several studies [49,50], existing literature is sparse regarding the role of maternal HIV status in DNAm pathways and additional research is needed to better understand if and how maternal HIV status impacts the association between DNAm and neurodevelopment.

Strengths and limitations
This study has many strengths.It adds to the limited literature dedicated to investigating epigenetic modi cation and associated outcomes in LMIC settings, particularly related to air pollution exposure.
Additionally, the mothers and infants involved in the DCHS are of majority Black African or mixed ancestry, two populations underrepresented in epigenetic and genetic literature at large.To account for population strati cation, which may play a role in DNAm variation, our study incorporated genome-wide genotype data which is the preferred approach to account for genetic ancestry.To our knowledge, this is the rst study to investigate DNAm as a mediator of the association between prenatal indoor air pollution exposure and neurodevelopment.A unique feature of this study is our use of three different methods of high-dimensional mediation analysis techniques.As no single high-dimensional mediation analysis method has been deemed the gold standard, we employed several methods of high-dimensional mediation analysis in order to compare.Although we did not see good concordance between the methods (Table 2; Table S14-S15), we recognize that the probe-speci c methods (HIMA and DACT) are differentially powered.A key advantage of the DACT method is that it is better powered than the joint signi cance test used in HIMA, which tends to be overly conservative [26].However, additional research is needed to validate these methods and to better understand why ndings from these three mediation methods did not demonstrate substantial overlap.
There are several limitations.Our analyses were constrained by a small sample size (N=142), which may have limited the statistical power to detect mediation effects in neurodevelopment across domains.Our sample size may have also limited statistical power to detect underlying associations (total effects) between prenatal PM 10 exposure and neurodevelopment.Secondly, as DNAm signatures are tissue and cell-type speci c, our ndings are limited in that we investigated DNAm in cord blood and not brain tissue.Although brain tissue DNAm data would have been more appropriate for a study of neurodevelopment, cord blood is far more feasible to collect from living study participants.

Conclusion
Differential DNAm was found to signi cantly mediate the association between prenatal exposure to PM 10 and neurodevelopment as measured by BSID-III at two years of age in the DCHS.Twenty-nine differentially methylated CpG sites as well as four differentially methylated gene regions were identi ed as signi cant mediators of this association in the DCHS cohort.Due to our small sample size and the general lack of consensus on a gold standard high dimensional mediation analysis tool in the scienti c community, this study should be regarded as a preliminary investigation.Nevertheless, these ndings are novel and encourage further research to replicate and expand these results so as to better understand how DNAm and other biological pathways help explain the impact of air pollution exposure on neurodevelopment.

Declarations
Ethics approval and consent to participate   [1] Missingness in this table is resultant of our DACT filtering process which is described in more detail in the methods section.35 CpG sites were considered for multiple testing correction in the cognition domain, 45 in the language domain, 39 in the general adaptive behavior domain, and 13 in the motor function domain.
ii No CpG sites with both a significant indirect effect and significant total effect were identified in the language, motor, or general adaptive behavior domains.Two significant CpG sites identified in the cognitive domain overlap with two in the language domain and one in the general adaptive domain and therefore, information two CpG sites identified with DACT are depicted for the language domain and one for the general adaptive domain alongside the cognition domain.

Table 1 .
Study Characteristics of the DCHS participants.

Table High
-dimensional mediation analysis for the association between PM 10 (exposure), DNAm (mediator) and (outcome) using DACT for CpG sites identified via formal causal analysis with a significant indirect effect and total effect.table includes raw and p-values from the DACT method for CpG identified across domains restricted to those for which we found with nominally significant indirect effects and total effects (domain-dependent) via causal mediation analysis following highdimensional mediation analysis (BH FDR <= 0.05).

Table 3 .
High-dimensional mediation analysis for the association between PM 10 (exposure), DNAm and neurodevelopment (outcome) using gHMA.Presented associations were for at least one neurodevelopmental domain (BH FDR <0.05).FDR values with bold values indicate significance at a threshold of 0.05.