Novel Epigenetic Clock for Fetal Brain Development Predicts Fetal Epigenetic Age for iPSCs and iPSC-Derived Neurons.

Induced pluripotent stem cells (iPSCs) and their differentiated neurons (iPSC-neurons) are a widely used cellular model in research on the central nervous system. However, it is unknown how well they capture age-associated processes, particularly given that pluripotent cells are only present during the early stages of mammalian development. Epigenetic clocks utilize coordinated age-associated changes in DNA methylation to make predictions that correlate strongly with chronological age, and it has been shown that the induction of pluripotency rejuvenates predicted epigenetic age. As existing clocks are not optimized for the study of brain development, we establish here the fetal brain clock (FBC), a bespoke epigenetic clock trained in prenatal neurodevelopmental samples, to investigate more precisely the epigenetic age of iPSCs and iPSC-neurons. Our data show that the FBC outperforms other established epigenetic clocks in predicting the age of fetal brain samples. We then applied the FBC to DNA methylation data from cellular datasets that have profiled iPSCs and iPSC-derived neuronal precursor cells and neurons, and found that these cell types are characterized by a fetal epigenetic age. Furthermore, while differentiation from iPSCs to neurons significantly increases the epigenetic age, iPSC-neurons are still predicted as having a fetal epigenetic age. Together, our findings reiterate the need for a better understanding of the limitations of existing epigenetic clocks for answering biological research questions and highlight a potential limitation of iPSC-neurons as a cellular model for research on age-related diseases, as they might not fully recapitulate an aged phenotype.


Introduction

Induced pluripotent stem cells (iPSCs) offer a unique cellular system to investigate disease in human-derived cells. iPSCs are obtained by taking tissue samples (e.g. skin or blood) from patients and treating the cells with a set of core pluripotency transcription factors that reprogram the cells to a pluripotent state [1]. Established iPSC lines have the capacity to be further differentiated into any cell type, including neurons, when treated with the appropriate factors [2][3][4]. This is of particular interest for neuroscience, as the only alternative cellular model for human neurons is immortalized cell lines. However, as immortalized cell lines retain some physiological properties of the cancerous cells they were derived from (for example, undergoing an infinite number of cell divisions) [5], they do not purely represent the neuronal phenotype. iPSC-derived neurons (iPSC-neurons), on the other hand, express appropriate morphological and neurophysiological properties of neurons and, subject to different protocols, can be differentiated into a wide range of specific neuronal subtypes [6]. iPSCs and their neuronal derivatives have been widely used to research disorders of the central nervous system, including developmental disorders such as autism and schizophrenia and age-related diseases such as Alzheimer's disease (AD) or Parkinson's disease. However, it is unknown how well iPSCs and especially iPSC-neurons capture age-associated processes, which are fundamental to the study of age-related diseases. Of specific relevance is the fact that pluripotent cells only occur during the early stages of mammalian development, and the effect of differentiation from iPSCs towards neurons on the developmental or aging trajectory of the cellular model [7] has yet to be adequately profiled.

Epigenetic mechanisms, such as DNA methylation (DNAm), are chemical processes that stably regulate gene expression, and while they are sensitive to environmental stimuli, they also underpin key developmental processes [8,9]. As a consequence, they are not only dynamic over the life course, but are dynamic in a consistent manner across individuals [10]. There has been much interest and success in capitalizing on these patterns of epigenetic variation to predict the age of an individual from a biological sample. Chronological age predictors based on DNAm profiles, known as "epigenetic clocks" or "DNAm clocks", have become commonplace in the epigenetic literature, as they can predict the "epigenetic age" of a sample. Epigenetic age, defined here as age predicted by an epigenetic clock, correlates strongly with chronological age, albeit not perfectly, and there is interest in whether the deviations from this prediction, referred to as age acceleration, are meaningful in the context of disease [11,12]. The most well-known epigenetic clock is the Horvath multi-tissue clock (MTC), which was developed using a large number of samples (n > 8000) from 51 different tissues and cell types [13]. While in general the MTC generates reliable predictions of chronological age for most samples, there is increasing recognition that its performance is dependent on the characteristics of the training dataset (e.g. tissue/cell type and age range) and that greater accuracy can be achieved with clocks trained on more refined sample sets, such as those from a single tissue [14,15]. To this end, a number of new DNAm clocks have been established based on specific tissue types, such as whole blood [16] or brain tissue [14], which demonstrate more accurate predictions within the specified tissue. A less established refinement of epigenetic clocks is their application to specific developmental stages, with embryonic or fetal samples, in particular, either omitted from or underrepresented in most existing clocks. Exceptions are algorithms that predict the gestational age (GA) of newborns, developed using pre- and perinatal DNAm data from blood samples [17] or placental samples [18]. While existing epigenetic clocks have been shown to accurately predict age in either postnatal brain samples (predominantly adult and older age) or non-brain prenatal samples, these tools have not been thoroughly tested on prenatal brain samples, and it is unknown whether they are able to delineate the earliest stages of brain development.

A previous analysis applying the MTC to DNAm data generated from iPSCs and their corresponding primary cells from adult donors found that, although the predicted age of the primary cells accurately matched the donor's chronological age, the induction of pluripotency reversed the aging process, with the matched iPSCs predicted as postnatal but close to zero [13]. As human pluripotent cells only occur during prenatal development, we hypothesize that existing clocks are not sensitive enough to accurately predict iPSCs at prenatal developmental stages. The inability to accurately estimate age during this crucial stage of neurodevelopment limits our ability to profile changes in epigenetic age induced by the differentiation of iPSCs towards neurons using already established DNAm clocks. Here we present a novel DNAm clock developed using prenatal brain samples that accurately predicts fetal age, outperforming other DNAm clocks in neurodevelopmental samples. We then apply our clock to iPSCs, iPSC-derived neuronal progenitor cells (NPCs) and neurons to characterize the epigenetic age of these cellular models before and throughout differentiation.

Data pre-processing and quality control.

All statistical analyses were performed using R version 3.5.2 (https://www.r-project.org/). All datasets for which raw data were available were pre-processed by our group following a standard quality control (QC) and normalization pipeline as described before [14], using either the R package wateRmelon [19] or bigmelon [20]. Briefly, samples with low signal intensities or incomplete bisulphite conversion were excluded prior to applying the pfilter() function from the wateRmelon package, which excluded samples with >1% of probes with a detection P value > 0.05 and probes with >1% of samples with a detection P value > 0.05. This was followed by the exclusion of probes known to be affected by SNPs or known to cross-

To develop and profile the performance of our fetal brain clock (FBC), we collated a dataset of 258 fetal brain samples (see Table S1), of which 194 were processed by our lab and 64 are a subset (age < dataset where DNAm was quantified using the Illumina EPIC DNA methylation array. To harmonize the age variable across datasets, age was converted into days post-conception (dpc), as it represents the most precise unit of age available in the datasets. Age provided as weeks post-conception was transformed to days post-conception by multiplying by 7, and age reported in (negative) years was transformed to days post-conception by multiplying by 365 and adding 280.
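The age harmonization can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' R code: the function name `to_dpc` and the unit labels (`"dpc"`, `"wpc"`, `"years"`) are hypothetical, while the constants (7 days per week, 365 days per year, 280 days from conception to birth) follow the conversion described in the text.

```python
DAYS_PER_WEEK = 7
DAYS_PER_YEAR = 365
DAYS_CONCEPTION_TO_BIRTH = 280


def to_dpc(age, unit):
    """Convert an age to days post-conception (dpc).

    unit is a hypothetical label: "dpc" (already in days post-conception),
    "wpc" (weeks post-conception) or "years" (years relative to birth,
    negative for prenatal samples).
    """
    if unit == "dpc":
        return float(age)
    if unit == "wpc":
        return float(age) * DAYS_PER_WEEK
    if unit == "years":
        return float(age) * DAYS_PER_YEAR + DAYS_CONCEPTION_TO_BIRTH
    raise ValueError(f"unknown age unit: {unit!r}")


# Example inputs (invented): 12 weeks post-conception and half a year before birth.
twelve_weeks = to_dpc(12, "wpc")
half_year_before_birth = to_dpc(-0.5, "years")
```

On this scale birth corresponds to 280 dpc, so any prediction below 280 dpc is prenatal.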

Training and testing datasets.

To create two separate datasets for training and testing the FBC, 75% of the samples from each dataset were randomly assigned to a training dataset (n = 193, age range = 37-184 dpc, age median = 99 dpc), while the remaining 25% were collated into the testing dataset (n = 65, age range = 23-153 dpc, age median = 99 dpc) (Figure S1), such that there was no overlap between the training and testing datasets. Of note, a few samples (15 out of 258) would actually be defined as embryonic (GA < 63 dpc) rather than fetal. To simplify the FBC development, only probes available in all cohorts and with complete data after QC were taken forward (n = 385,069 probes).
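A per-dataset 75/25 split of this kind could look roughly like the following Python sketch. It is not the authors' code: the function name `split_by_dataset`, the fixed seed and the rounding rule for datasets whose size is not divisible by four are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_dataset(samples, train_fraction=0.75, seed=0):
    """Randomly assign samples to training/testing sets within each dataset.

    samples is a list of (sample_id, dataset_name) pairs. The split is done
    separately per dataset so that each source contributes to both sets.
    """
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    by_dataset = defaultdict(list)
    for sample_id, dataset in samples:
        by_dataset[dataset].append(sample_id)

    train, test = [], []
    for dataset in sorted(by_dataset):
        ids = list(by_dataset[dataset])
        rng.shuffle(ids)
        # Rounding rule is an assumption; the text only states 75% / 25%.
        n_train = round(len(ids) * train_fraction)
        train.extend(ids[:n_train])
        test.extend(ids[n_train:])
    return train, test


# Toy example (invented sample IDs): 8 samples from dataset "A", 4 from "B".
toy = [(f"A{i}", "A") for i in range(8)] + [(f"B{i}", "B") for i in range(4)]
train_ids, test_ids = split_by_dataset(toy)
```

Because every sample ID ends up in exactly one of the two lists, the training and testing sets cannot overlap.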

To evaluate the performance of the DNAm clocks in adult brain samples, we utilized the Brains for Dementia Research (BDR) cohort DNAm data [14]. Briefly, these data consist of 1,221 samples from 632 donors (age range 41-104 years, median = 84 years), with DNA extracted from the prefrontal cortex (n = 610) and occipital cortex (n = 611). DNA methylation was quantified using the Illumina EPIC DNA methylation array, and the data were pre-processed using our group's standard QC pipeline as described in [14].

Five different DNAm datasets profiling iPSCs, iPSC-derived NPCs and iPSC-derived neurons were used to characterize the epigenetic age of the neuronal cell model, details of which can be found in Table S1. Three of these datasets (Imm, Price, Bhinge) were generated by our lab, where DNA methylation was quantified using the Illumina EPIC DNA methylation array. These were supplemented by two publicly available datasets, downloaded from GEO (Sultanov, GSE105093, and Fernández-Santiago, GSE51921), consisting of Illumina 450K DNA methylation array data [3,4]. References describing the origin of cell lines and the different methods used for cell culture and differentiation are listed in Table S1. Pre-processing and QC for the Fernández-Santiago dataset were not performed in our lab, as no raw data were available.

Fetal brain clock development.

To develop the fetal brain clock, we applied an elastic net (EN) regression model, using the approach described by Horvath [13], regressing chronological age against the DNAm level of all available probes.

The EN algorithm selects a subset of DNA methylation probes that together produce the optimal with reported chronological age and root mean squared error (RMSE). To investigate potential effects of sex on the predicted epigenetic age, a linear model was fitted in the testing dataset with FBC-predicted epigenetic age as the dependent variable, chronological age and sex as main effects, and an interaction between chronological age and sex.
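To make the probe-selection behaviour of elastic net regression concrete, the following is a minimal pure-Python sketch that fits the penalized model by cyclic coordinate descent. It is not the authors' pipeline: the function names, the toy data and the penalty settings (`lam`, `alpha`) are placeholders chosen for illustration, and in practice such settings are usually tuned, for example by cross-validation.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, lam, alpha=0.5, n_iter=100):
    """Minimise (1/2n)*||y - b0 - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).

    X is a list of samples, each a list of DNAm values (one per probe);
    y holds the corresponding ages (e.g. in dpc). Assumes alpha < 1 or
    non-constant probes so that the denominator below is non-zero.
    """
    n, p = len(X), len(X[0])
    # Centre probes and ages so the intercept can be recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]
    resid = [v - y_mean for v in y]
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            # Covariance of probe j with the residual that ignores probe j.
            rho = sum(c * (r + coef[j] * c) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom
            delta = new - coef[j]
            if delta != 0.0:
                resid = [r - delta * c for c, r in zip(col, resid)]
                coef[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(coef, x_mean))
    return intercept, coef


def predict(intercept, coef, sample):
    """Predicted age for one sample: intercept plus weighted probe values."""
    return intercept + sum(b * v for b, v in zip(coef, sample))


# Toy data (invented): probe 0 tracks age, probe 1 is unrelated to it.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [50.0, 50.0, 150.0, 150.0]
intercept, coef = fit_elastic_net(X, y, lam=0.1, alpha=0.5)
selected = [j for j, b in enumerate(coef) if b != 0.0]  # probes kept by the model
```

The L1 part of the penalty sets the coefficient of the uninformative probe to exactly zero, which is the mechanism by which a clock ends up using only a small subset of the available probes, while the L2 part shrinks the remaining coefficients.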

Comparison of cellular states.

The FBC was applied to DNAm data for all cellular samples available. To test for differences in predicted epigenetic age between cell stages within each dataset, either two-sample t-tests or ANOVA followed by Tukey HSD multiple comparison (when three cell stages were available) were used. To combine results across all datasets, a mixed effects linear model was fitted with predicted epigenetic age as the dependent variable, a fixed effect for cell stage represented as two dummy variables contrasting NPCs vs iPSCs and iPSC-neurons vs iPSCs, and a random effect for dataset.
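The dummy-variable coding of cell stage can be illustrated with a short Python sketch. The names used here (`STAGES`, `stage_dummies`, `fixed_effect_prediction`) are hypothetical, and the sketch covers only the fixed-effect part of the model; the random effect for dataset and the model fitting itself are not shown.

```python
STAGES = ("iPSC", "NPC", "iPSC-neuron")  # iPSC is the reference level


def stage_dummies(stage):
    """Return (is_npc, is_neuron); the reference level iPSC maps to (0, 0)."""
    if stage not in STAGES:
        raise ValueError(f"unknown cell stage: {stage!r}")
    return (int(stage == "NPC"), int(stage == "iPSC-neuron"))


def fixed_effect_prediction(stage, b0, b_npc, b_neuron):
    """Fixed-effect part of the model: b0 + b_npc*is_npc + b_neuron*is_neuron.

    b0 is the expected age for iPSCs, b_npc the NPC vs iPSC contrast and
    b_neuron the iPSC-neuron vs iPSC contrast (all in dpc).
    """
    is_npc, is_neuron = stage_dummies(stage)
    return b0 + b_npc * is_npc + b_neuron * is_neuron


# Design rows (intercept, NPC vs iPSC, iPSC-neuron vs iPSC) for each stage.
design = {stage: (1,) + stage_dummies(stage) for stage in STAGES}
```

With this coding, each dummy coefficient is directly interpretable as the difference in predicted epigenetic age between that cell stage and iPSCs.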

Fetal brain clock outperforms existing DNAm clocks at predicting age of prenatal brain samples.

We applied EN regression to genome-wide DNAm data from a set of prenatal brain samples (n = 193; Table S1 and Figure S1) to develop the fetal brain clock (FBC). A subset of 107 DNAm probes was assigned non-zero coefficients and was therefore selected as the basis of the FBC (Table S2). We found no overlap between the DNAm sites selected for the FBC and the DNAm sites used in the other clocks we considered. Testing the FBC in an independent test dataset of fetal brain samples (Table S1 and Figure S1) to evaluate its performance, we found a strong linear relationship between chronological and predicted prenatal age (r = 0.80; Figure 1A)

These clocks were selected as they represent either the most well-established algorithm with the broadest applicability (MTC) or were specifically developed to predict pre- and perinatal gestational ages, albeit in non-brain tissue (GAC and CPC). Of note, the MTC only predicted 27 fetal brain samples (41.2%) as prenatal (dpc < 280), with a very weak correlation between chronological and predicted age (r_MTC = 0.06). This correlation is much weaker than those reported in the original manuscript when Horvath tested the clock in adult samples [13], highlighting the challenges with extrapolating clocks to samples which were not well represented in the model development. By comparison, the GAC and CPC perform better than the MTC, although they have smaller correlation coefficients (r_GAC = 0.52 and r_CPC = 0.76) and are associated with a larger error (RMSE_GAC = 21.32 and RMSE_CPC = 60.08) than the FBC.

Interestingly, while the predictions from the GAC are more precise, it is not as effective at ranking the samples by age as the CPC. Taken together, these results demonstrate that our novel FBC outperforms existing clocks at predicting age in fetal brain samples, and is therefore the optimal tool available to profile the epigenetic age in models of neuronal development. When applying the clocks to the training data, the three established clocks produce similar correlations and RMSEs as in the testing data. As expected, the predictions of the FBC in the training data are more accurate than the predictions in the testing data, reflecting overfitting of the model (Figure S2). Given our previous finding of divergent, sex-specific age trajectories at multiple DNAm sites during prenatal development [23], we tested whether the FBC performed differently between males and females in our testing dataset. Although this analysis initially indicated a significant difference in the correlation with age between males and females (P_Sex = 0.0007), on closer inspection we suspected that this was likely driven by outliers.

Indeed, a sensitivity analysis excluding the two samples with the youngest and oldest predicted ages produced a non-significant result (P_Sex = 0.081).
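The two statistics used to compare the clocks capture different things: Pearson's r measures how well a clock ranks samples along a line, while the RMSE measures how far the predictions lie from the true ages. The Python sketch below, with invented toy ages and hypothetical function names, shows how a clock can rank samples perfectly and still carry a large error.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ages."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Toy ages in dpc (invented): a clock that is always 50 days too old.
chronological = [60.0, 80.0, 100.0, 120.0]
biased_prediction = [age + 50.0 for age in chronological]

r_biased = pearson_r(chronological, biased_prediction)
rmse_biased = rmse(chronological, biased_prediction)
```

Here the correlation is perfect although every prediction is off by 50 days, which is why both statistics are needed to judge a clock.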

Fetal and gestational age clocks are not able to predict adult ages in adult brain tissue.

All four clocks were additionally compared using a dataset where DNAm was profiled from adult brain samples (Figure S3). As expected, the FBC performs poorly in this sample set, with all samples predicted as prenatal, while the MTC performs the best (r_MTC = 0.65, RMSE_MTC = 20.11 years), as it is the only clock we considered that was developed using adult samples. As with the FBC, the GAC and CPC fail to produce predictions of postnatal age, again reflecting the fact that they were also constructed using data from pre- or perinatal samples.

Having demonstrated that our novel FBC is the optimal clock to profile age in prenatal brain samples, we applied it to DNAm data from five cellular studies to determine the neuronal development trajectory of iPSCs differentiating towards neurons. All samples were estimated to have a fetal epigenetic age, regardless of cell stage, cell line origin or differentiation protocol. Performing statistical comparisons of the estimated ages between iPSCs and neurons, we observed significant differences for all datasets (Figure 2). For the Imm dataset, which also included proliferative NPCs in comparison to postmitotic neurons, we additionally found a significant difference between NPCs and iPSC-neurons (Δmean = 20.0 dpc, P = 0.00039), but not between iPSCs and NPCs (Δmean = 10 dpc, P = 0.24). Combining the data across the five studies, the observed means of the three cell stages were mean_iPSC = 72.14 dpc, mean_NPC = 68.5 dpc and mean_iPSC-neurons = 84.28 dpc (Figure 3). Using a mixed effects model to meta-analyze across studies, we found that iPSC-neurons were predicted to have a significantly advanced epigenetic age compared to iPSCs (Δmean = 12.14 dpc, P = 6.94 × 10^-9), but no significant difference was observed between iPSCs and NPCs (Δmean = -3.64 dpc, P = 0.39).

In this study we established a novel epigenetic clock, the fetal brain clock (FBC), to specifically profile the earliest stages of human neurodevelopment and applied it to determine the epigenetic age of iPSCs and iPSC-derived NPCs and neurons. Given the lack of fetal brain samples in the development of existing DNAm clocks, prior to this study there was no tool optimized for estimating the age of fetal brain samples from DNAm data, limiting the ability to characterize neuronal models or indeed any model of neuronal development. We showed that, in an independent test dataset, the FBC generates predictions that correlate strongly with chronological age in prenatal brain samples. Furthermore, it outperforms both a pan-tissue epigenetic clock (Horvath's MTC) and epigenetic clocks focused on the same developmental stage but based on DNAm profiled in different tissues (Knight's GAC and Lee's CPC) [13,17,18]. The FBC outperforms these clocks on both correlation and error (RMSE) statistics, indicating that it is not only better at ranking the samples but also generates more precise estimates. In contrast, the FBC performed poorly in an adult brain dataset. Altogether, this reinforces the findings of previous studies that have also demonstrated that the applicability of DNAm clocks is dependent on the characteristics of the dataset(s) they were trained on, with the tissue and age range of the training samples of particular relevance [14,15]. More specifically, we note that while the accuracy of a DNAm clock is typically decreased in tissues not represented in its training data, clocks are limited to predicting ages represented in the training data. If the testing sample lies outside of the age range of the training data, the clock is unable to provide an appropriate prediction, suggesting that, in general, age range is more important than tissue when training a clock.

Previous use of epigenetic clocks has shown that the predicted epigenetic age of iPSCs is significantly younger than both the somatic tissue the cells originate from and the chronological age of the donor at sample donation [13]. This highlights that the induction of pluripotency reprograms the epigenome, including at the loci used in the clock algorithm, ultimately leading to a younger predicted epigenetic age. However, in these analyses the predicted ages remained postnatal, which is unexpected as human pluripotent cells only occur during the early stages of human development. We therefore hypothesized that, with an adequately calibrated clock, iPSCs would be estimated as early prenatal. Applying the FBC to five datasets of iPSCs and iPSC-derived NPCs and neurons, we found this to be the case: iPSCs were estimated as having a mean age of 72.1 dpc, fitting our hypothesis that they reflect at least first trimester developmental stages. These results align with studies that have reported rejuvenation effects on the transcriptome, telomeres and mitochondria of iPSCs following reprogramming [26][27][28]. In addition, we profiled the effect of differentiating iPSCs towards neurons on predicted epigenetic age, finding that differentiation increases the mean predicted age to 84.3 dpc. This age coincides with fetal neurogenesis and suggests that while differentiation does induce an aging process, it does not accelerate iPSC-neurons to a postnatal state.

Of note, Mertens and colleagues found that, while iPSCs lose age-related transcriptomic signatures, induced neurons (iNs; neurons directly reprogrammed from fibroblasts) keep their specific aging signatures [27]. Therefore, it would be interesting to apply our FBC to iNs, iPSCs, iPSC-neurons and their corresponding somatic tissues to verify whether age-associated methylation differences are also preserved in iNs. Altogether, our results indicate that iPSC-neurons may be limited when researching age-related diseases, like Alzheimer's disease or other dementias, as many molecular processes related to an aging phenotype may not be recapitulated.

While a strength of our study is the development of a bespoke clock to optimally profile prenatal age in human samples obtained during fetal development, the FBC has, in all datasets tested, only generated prenatal estimates. Despite this, we are confident that the FBC correctly predicts fetal epigenetic ages for the cellular data, as the predictions for both iPSCs and iPSC-neurons are comfortably contained within the training data age range. This indicates that the predictions are not induced purely by saturation of the coefficients. In contrast, estimates for the adult brain dataset did lie at the extreme ends, casting doubt on the meaningfulness of the predictions, an observation that is confirmed by the lack of correlation with chronological age.

In summary, we demonstrate that established DNAm clocks struggle to capture changes in epigenetic age during neurodevelopment and that a bespoke clock is required for precise predictions. To this end, we developed the FBC, a robust predictor of prenatal age in human fetal brain samples. Using this clock to assess the epigenetic age of iPSCs and differentiated neurons, we found that iPSCs and derived NPCs and neurons reflect prenatal developmental stages. Our findings question the suitability of iPSC-neurons for the study of aging-associated processes.

Figure 1
Fetal brain clock outperforms other DNAm clocks when applied to neurodevelopmental samples. Shown are scatterplots comparing chronological age (x-axis; days post-conception) against predicted epigenetic age (y-axis; days post-conception) calculated using A) the Fetal Brain Clock (FBC); B) Horvath's multi-tissue clock (MTC); C) Knight's Gestational Age Clock (GAC); D) Lee's Control Placental Clock (CPC) in an independent fetal brain sample (n = 65). The black line indicates the identity line of chronological and predicted epigenetic age and represents a perfect prediction. Two statistics were calculated to evaluate the precision of each DNAm clock: Pearson's correlation coefficient (r) and the root mean squared error (RMSE).

Figure 2
Comparisons of predicted epigenetic age from the fetal brain clock between iPSC differentiation states. Boxplots comparing the distribution of predicted epigenetic age (days post-conception) separated by cellular stage, where each panel represents a different dataset. P values of Tukey HSD corrected ANOVA for the Imm dataset and two-sample t-tests for the Price, Fernández-Santiago and Sultanov datasets are given. F.-S. = Fernández-Santiago.

Figure 3
iPSCs are significantly younger than iPSC-induced neurons using age estimated by the FBC. Boxplots of predicted epigenetic age calculated using the FBC, where samples are grouped by cell stage (n = 82; 30 iPSCs, 4 NPCs, 48 iPSC-neurons) and colored by dataset. P values from the mixed effects model are given for differences between iPSCs and NPCs (non-significant) and between iPSCs and neurons. F.-S. = Fernández-Santiago.

Supplementary Files
This is a list of supplementary files associated with this preprint.
AdditionalFile1TableS1.pdf
AdditionalFile2TableS2.pdf
AdditionalFile3FigureS1.pdf
AdditionalFile4FigureS2.pdf
AdditionalFile5FigureS3.pdf