LINE-1 and Alu methylation signatures in autism spectrum disorder and their function in the regulation of autism-related genes


 BackgroundLong interspersed nucleotide element-1 (LINE-1) and Alu elements are retrotransposons whose abilities cause abnormal gene expression and genomic instability. Several studies have focused on DNA methylation profiling of gene regions, but the locus-specific methylation of LINE-1 and Alu elements has not been identified in autism spectrum disorder (ASD). MethodsDNA methylation age was predicted using Horvath’s method. We interrogated locus- and family-specific methylation profiles of LINE-1 and Alu elements (22,352 loci) in ASD blood using publicly-available Illumina Infinium 450K methylation datasets from heterogeneous ASD (n = 52), ASD with 16p11.2 del (n = 7), and ASD with Chromodomain Helicase DNA-binding 8 (CHD8) variants (n = 15). The differentially methylated positions of LINE-1 and Alu elements corresponding to genes were combined with transcriptome data from multiple ASD studies.Results We identified the epigenetic age acceleration significantly decelerated in ASD children over the age of 11 years. We further interrogated locus- and family-specific methylation profiles of LINE-1 and Alu elements (22,352 loci) in ASD blood using publicly-available Illumina Infinium 450K methylation datasets from heterogeneous ASD (n = 52), ASD with 16p11.2 del (n = 7), and ASD with Chromodomain Helicase DNA-binding 8 (CHD8) variants (n = 15). DNA methylation profiling revealed LINE-1 and Alu methylation signatures in each ASD risk loci by which global methylation were notably hypomethylated exclusively in ASD with CHD8 variants. When LINE-1 and Alu elements were clustered into subfamilies, we found methylation changes in a family-specific manner in L1P, L1H, HAL, AluJ, and AluS families in the heterogeneous ASD and ASD with CHD8 variants. Interesting, our results showed that LINE-1 and Alu methylation within target genes is inversely related to the expression level in each ASD variant. Finally, we demonstrate the potential for LINE-1 and Alu methylation signatures to predict ASD individuals from non-ASD. ConclusionsThe DNA methylation signatures of the LINE-1 and Alu elements in ASD, as well as their functional impact on ASD-related genes, have been identified. If confirmed in future larger studies, these finding may contribute to the identification of epigenomic biomarkers of ASD in those with high risk of ASD.

subsequently identi ed these genes that were reproducibly differentially expressed in transcriptome data obtained from multiple ASD cohorts. Finally, we used logistic regression to assess the accuracy of a unique LINE-1 and Alu methylation signatures in discriminating ASD with a genetic variant from controls.

Data collection
Differentially methylated retrotransposon loci were identi ed in publicly available Illumina In nium 450K datasets through GEO DataSets: http://www.ncbi.nlm.nih.gov/gds (69): GSE113967 (63). In this dataset, ethical approval was granted by the Research Ethics Boards of the respective institutions (University of Michigan SickKids, Holland Bloorview Kids Rehabilitation Hospital, Western University, McMaster University) (63). Data were collected from the heterogeneous ASD (n = 52), ASD with con rmed typical 600 Kbp deletion in 16p11.2 del (n = 7), ASD with con rmed de novo CHD8 sequence variants (n = 15), and age-matched controls (non-ASD) (n = 48) ( Table 1). Validation was performed in a cohort of genome-wide DNAm pro ling of post-mortem brain tissue in the subventricular zone of the lateral ventricles from ve individuals with ASD and ve without (GSE131706) (33) ( Table 1).
Genes with differently methylated loci were analyzed in publicly available gene expression datasets from publicly available datasets accessed via GEO using the following inclusion criteria: (1) the study must include ASD cases and controls; (2) the study must use microarray/RNA-seq technology; and (3) the study must use blood or post-mortem brain tissues. Finally, we obtained seven ASD studies, four of which used blood and three of which used post-mortem brain tissues (see Additional le 1). Table 1 Characteristics of the In nium450K datasets in the present study including ASD and non-ASD. The numbers of patients for each group are given for each gender. The ages for each group are given as means ± standard deviation.

Groups
Gender Age

Differential methylation of retrotransposon subfamilies
Methylation datasets were normalized using the single-sample normalized (ssNoob) method in min package (72). CpG sites within retrotransposons were identi ed using Illumina In nium 450K annotation (23). To identify the variant-associated differential methylation of REs, probes with single nucleotide polymorphisms (SNPs) located at or within 10 base pairs of the target CpG site were included in the analysis. The CpG sites were mapped to LINE-1, Alu, half-L1 (HAL1), fossil Alu monomer (FAM), free right Alu monomer (FRAM), and free left Alu monomer (FLAM). Due to the evolution age of the REs, LINE-1 elements were clustered into oldest (L1M, mammalian-wide), intermediate (L1P, primate-speci c), and youngest (L1HS, human-speci c and L1PA, primate-ampli ed). Concomitantly, Alu elements were categorized into AluJ (oldest), AluS (intermediate) and AluY (youngest).
Mean β value across all loci of REs was calculated as global DNAm in non-ASD, ASD with 16p11.2 del, and ASD with CHD8 variants. Differential methylation of LINE-1 and Alu subfamilies between 1) non-ASD vs ASD, 2) non-ASD vs ASD with 16p11.2 del, and 3) non-ASD vs ASD with CHD8 variants, were identi ed. Differentially methylated positions (DMPs) to ASD were examined in the validation dataset. DMPs were identi ed in ASD, ASD with 16.p11.2 del, ASD with CHD8 variants and non-ASD with 16.p11.2 del, by two-tailed t-test with correction for false discovery rate (FDR) using the Benjamini-Hochberg (BH) method (73) and signi cance de ned as P FDR ≤ 0.05. To nd the unique DMPs of each data set, the signi cant loci from 1) to 3) comparisons were computed to create Venn diagrams (https://bioinfogp.cnb.csic.es/tools/venny/).

Differential gene expression analysis
The expression data of ASD studies were obtained from the GEO DataSets. The data from each study were analyzed separately using the Multiple Experiment Viewer (MeV) program (microarray software suite) (74). Firstly, the data were ltered using a 70% cut-off lter to remove probes that were missing in > 30% of samples. The available data were then used for the identi cation of differentially expressed genes (DEGs) in ASD vs non-ASD cohort by using the Signi cance Analysis of Microarrays (SAM). The FDR and q-value less than 5% were considered as signi cantly DEGs.
RNA-sequencing (RNA-seq) data were obtained from the Sequence Read Archive database and re-analyzed using the Galaxy platform (https://usegalaxy.org/) (75). The quality control of RNA-seq data was assessed by fastp tool (76). The cleaned reads were then mapped to the human reference genome (GRCh38/hg38) using HISAT2 (77) and quanti ed using the Subread package FeatureCounts (78). Differential expression analysis was performed using the DESeq2 package (79). The read counts were normalized using the median ratio method of the DESeq2 and the remove unwanted variation (RUV) tool (77). The genes with a p-value (p) with Benjamini-Hochberg correction of less than 0.05 were considered signi cant.

Statistical analyses
Differentially methylated loci were identi ed by two-tailed t-tests with signi cance de ned as P FDR ≤ 0.05 by the BH method or p ≤ 0.05 for validation data due to low power in the analysis. Two-tailed t-tests were used to identify signi cant changes in age acceleration residuals among non-ASD and ASD. Pearson's correlation coe cient was used to compare DNAm Age and chronological age. Fisher's exact test was used to identify enrichment by genomic location of REs. DEGs were identi ed using SAM analysis with signi cance de ned as FDR ≤ 0.05 by the BH method. Gene function and pathway analysis were performed in IPA using Fisher's exact test with BH correction for multiple testing (P FDR ≤ 0.05 was considered to be signi cant). All statistical analyses were performed in R (version 4.0.5) and RStudio (version 1.4.1103) using the ggplot2, plotROC, pheatmap, and GraphPad Prism (version 7.0b); data are presented as mean ± SD, and p ≤ 0.05 were considered to be signi cant.

Results
Epigenetic age acceleration (EAA) changes in ASD blood DNAm Age was predicted using Horvath's method, which employs 353 CpG sites (28). EAA was then estimated as the residual resulting from a multivariate model regressing the DNAm Age estimate on chronological age. Firstly, we correlated chronological age and DNAm Age in non-ASD (n = 59, combining non-ASD and non-ASD with 16.p11.2 del) and ASD (n = 72, combining ASD, ASD with 16p11.2 del, and ASD with CHD8 variants). The results show that DNAm Age was signi cantly correlated with chronological age in non-ASD (r = 0.72, p < 0.0001), but the correlation was lower in the ASD cohort (r = 0.49, p < 0.0001) ( Figure 1A and 1B). The expected correlation found in the non-ASD group indicated that DNAm Age in our analyses can be determined EAA. The chronological age difference between the two cohorts was not signi cant (ASD: 9.94, non-ASD: 9.68, p = 0.723) ( Figure 1C), while the DNAm Age of the ASD cohort was considerably decreased (ASD: -0.84, non-ASD: 0.84, p = 0.00078) ( Figure 1D). Next, we classi ed non-ASD and ASD cohorts by ≤ 10 years old and ≥ 11 years old. As for previous comparisons, chronological age did not differ between ASD and non-ASD in both groups ( Figure 1E). In the ASD cohort older than 11 years old, EAA shows a signi cant deceleration compared with an age-matched non-ASD cohort but this deceleration was not found in ASD younger than 11 years old ( Figure 1F). These ndings suggest that people with ASD were epigenetically younger than their chronological age which may be occurring in response to various environmental risk factors.
DNAm pro le of LINE-1 and Alu elements in the heterogeneous ASD A total of 22,352 probes mapping to LINE-1 and Alu elements were identi ed on the In nium 450K platform for differential DNAm analysis. The analyses were performed for heterogeneous ASD, ASD with 16p11.2del, and ASD with CHD8 variants, versus non-ASD. Firstly, we measured the global methylation by combining all positions mapping to LINE-1 and Alu elements as the total of CpGs. In the comparison of heterogeneous ASD against non-ASD (Figure 2A), there was no signi cant difference in global methylation between these cohorts (Δβ = 0.003, p = 0.098). However, when we performed the methylation pro le of REs by which LINE-1 and Alu positions were analyzed separately, we found that 2,802 (LINE-1) and 4,363 (Alu) DMPs were signi cantly differentially methylated (P FDR < 0.05) in the heterogeneous ASD compared to non-ASD ( Figure 2B and 2C, Figure 3). All these loci included 2,471 hypomethylated loci (LINE elements were changed in a subfamily-speci c manner. We discovered that LINE-1 was considerably hypermethylated in young and intermediate age families, including L1H (∆β = 0.013, p = 0.00001) and L1P (∆β = 0.005, p = 0.027), but HAL1 was hypomethylated (∆β = -0.003, p = 0.03) ( Figure 2D). Methylation of Alu elements was signi cantly hypermethylated in the old and intermediate age families: AluJ (∆β = 0.006, p = 0.004) and AluS (β = 0.005, p = 0.016), respectively ( Figure 2E). These ndings indicated that methylation of LINE-1 and Alu elements in the heterogeneous ASD was altered in family-and locus-speci c manner rather than globally.
LINE-1 and Alu methylation signatures in the homogeneous ASD (16p11.2del and CHD8 variants) Due to the heterogeneity in the ASD population, we also intended to investigate the methylation pro le of LINE-1 and Alu elements in genetically homogeneous ASD, as identi ed in the original article of GSE113967, including ASD individuals with 16p11.2del (n = 7) and CHD8 variant (n = 15). As for the results of ASD with 16p11.2del compared with non-ASD, we found no signi cant changes in the global methylation compared with non-ASD (∆β = -0.002, p = 0.771, Figure 4A). However, the analyses identi ed 70 signi cantly locus-speci c DMPs of REs in ASD with 16p11.2del including 27 DMPs at LINE-1 (5 hypomethylated loci, 22 hypermethylated loci, Figure 4B) and 43 DMPs at Alu elements (23 hypomethylated loci, 20 hypermethylated loci, Figure 4C). When LINE-1 and Alu positions were categorized into families, there was no signi cant difference in methylation of LINE-1 and Alu elements by family ( Figure 4D and 4E).
Genomic distribution of LINE-1 and Alu methylation in heterogenous and homogenous ASD.
To determine the differential DNAm of LINE-1 and Alu elements by genomic features, we performed enrichment analysis using Fisher's exact test. CpG positions at LINE-1 and Alu elements were categorized to 1500 and 200 within the transcriptional start site (TSS1500 and TSS200, respectively), the 5' untranslated region (5'UTR), the rst exon (1st exon), gene body (Body), and 3' untranslated region (3'UTR). In the heterogeneous ASD signatures, CpG sites at LINE-1 were signi cantly enriched in TSS1500 (p = 0.0005) and Body (p < 0.0001) (see Additional le 2A). Whereas Alu elements were signi cantly enriched in TSS1500 (p < 0.0001), 5'UTR (p < 0.0001), Body (p < 0.0001), and 3'UTR (p = 0.0086) (see Additional le 3A). However, DNAm across all retrotransposons by genomic location did not signi cantly differ between non-ASD and heterogenous ASD (see Additional le 2B and 3B). DNAm signatures of the ASD with 16p11.2del and CHD8 variants were signi cantly enriched in Body (p = 0.04) and TSS1500 (p < 0.0001) respectively (see Additional le 4: Figure S5B, S5C). This result shows that the changes of probes mapping to TSS1500 and gene bodies are more likely to have a functional impact on gene expression in ASD in both heterogenous and homogenous ASD.
Biological functions and pathways of LINE-1 and Alu methylation signatures in ASD and ASD variants.
To determine the biological signi cance of LINE-1 and Alu methylation signatures identi ed in each ASD cohort, we predicted the biological function and pathway of genes located nearby DMPs of LINE-1 and Alu elements using IPA software. We found that neurological diseases were signi cantly enriched among genes associated with LINE-1 and Alu methylation signatures in the heterogeneous ASD (p range: 0.00495 -3.33E-26, 2274 genes) and ASD with CHD8 variants (p range: 0.0258 -0.000117, 302 genes) as shown in Additional le 5 and 6. The categories ASD and intellectual disability were exclusively associated with LINE-1 and Alu methylation signatures in the heterogeneous ASD (p = 2.56E-06, 253 genes). Whereas Huntington's disease, familial encephalopathy, and brain lesion were commonly associated with both ASD signatures. For ASD with 16p11.2del variant, LINE-1 and Alu methylation signatures in this cohort were signi cantly associated with developmental disorders (p range: 0.0393 -0.00222, 9 genes) (see Additional le 7). However, only one gene was associated with the disease, possibly caused by a small number of genes associated with LINE-1 and Alu methylation of this ASD variant. Additionally, we discovered that several canonical pathways linked to ASD were associated with genes located nearby LINE-1 and Alu methylation signatures in each ASD cohort. More precisely, we found that the α-adrenergic signaling pathway was signi cantly associated in the heterogeneous ASD (p = 0.00269, 28 genes) and ASD with CHD8 variants (p = 0.00646, 7 genes). Axonal guidance signaling pathway involved in nervous system development was signi cantly associated with LINE-1 and Alu methylation signatures of ASD with 16p11.2del and CHD8 variants. These results indicate that genes associated with LINE-1 and Alu methylation signatures in ASD were involved with neurological diseases and ASD-comorbid disorders as well as canonical pathways known to be implicated in ASD. The list of all signi cant biological functions and pathways in each ASD variant is shown in Additional le 5-7.
Interactome networks or gene regulatory networks revealed the interaction of genes located nearby LINE-1 and Alu methylation signatures of each ASD variant. The functions and pathways implicated in ASD were highlighted in the networks. The interactome of the heterogeneous ASD was associated with ASD and mental retardation, as well as canonical pathways implicated in ASD such as retinoic acid receptor (RAR) and AMP-activated protein kinase (AMPK) signaling ( Figure 6). In ASD with 16p11,2del, we found that the interactome related to axonal guidance and sirtuin signaling pathway (see Additional le 8: Figure S8A). The interactome of ASD with CHD8 was related to familial encephalopathy and movement disorder which conditions found in ASD individuals (29,30). The interactomes were also associated with neuronal function including axonal guidance and synaptogenesis signaling pathways (see Additional le 8: Figure S8B and S8C).

Identi cation of unique target loci located nearby LINE-1 and Alu signatures in heterogeneous ASD
To investigate the functional impact of locus-speci c LINE-1 and Alu methylation to target gene or neighboring gene expression in the ASD, we identi ed DEGs from multiple ASD studies obtained from the GEO DataSets. This approach re ected the heterogeneity of the ASD population because these studies were compiled from a different ASD cohort. There were 12,419 DEGs identi ed from seven datasets including four studies (one study used peripheral blood samples and three studies used post-mortem brain tissues from ASD individuals) (see Additional le 1). We subsequently overlapped the list of DEGs with differentially methylated genes (DMGs: genes located nearby LINE-1 and Alu signatures). The overlapping revealed 1,847 DMGs in the heterogeneous ASD that were differentially expressed in several ASD studies, with 155 of them being autism-related genes in the SFARI database. We identi ed 43 top DMGs, |Δβ| ≥ 5%, inversely related to gene expression, and differentially expressed in at least two studies (Table 2). Interestingly, two of the top DMGs, potassium voltage-gated channel subfamily Q member 3 (KCNQ3) and ubiquitin conjugating enzyme E2 H (UBE2H), were genes in the SFARI database and were enriched in the gene regulatory network related to ASD and mental retardation ( Figure 6).
The genomic regions of LINE-1 and Alu methylation signatures within the DMGs are shown in Figure 7. We identi ed DMRs by mapping all probes located nearby LINE-1 and Alu signatures using the UCSC genome browser. The ndings revealed that AluSg7 (cg16926147), which is located on the gene body of the KCNQ3 gene ( Figure 7A), was hypermethylated and KCNQ3 expression level was signi cantly reduced in blood and post-mortem brain tissues. Interestingly, we discovered that several probes in this region, including those in the promoter region were not changed. This result suggests that LINE-1 and Alu methylation at DMRs may facilitate gene expression indicated by the inverse relationship between LINE-1/Alu methylation and gene expression. As well as AluY (cg08998414) within UBE2H gene ( Figure 7B) and L1PA3 (cg24094412) within hyperpolarization activated cyclic nucleotide gated potassium channel 1 (HCN1) ( Figure 7C), we also observed that AluY and L1PA3 methylation were inversely related to the gene expression levels in both blood and brain tissues of ASD cohort. Moreover, we found several DMGs that were not reported in the SFARI database but the expression of these DMGs in the blood and post-mortem brain tissues was inversely related to LINE-1 and Alu methylation such as N-deacetylase and Nsulfotransferase 1 (NDST1) (cg12611243: L1MC1), ubiquitin speci c peptidase 6 (USP6) (cg23416909: L1M5), and formin binding protein 1 (FNBP1) (cg13916261: AluSg). These results suggest that DMPs at LINE-1 and Alu elements may affect the expression of genes located nearby these DMPs in the heterogeneous ASD cohort. Identi cation of unique target loci located nearby LINE-1 and Alu signatures in ASD variants.
To investigate the effects of unique LINE-1 and Alu methylation signatures to target gene or neighboring gene expression in the genetically homogeneous ASD, we obtained 39 and 101 DMPs that were found exclusively in the ASD with 16.p11.2 del and CHD8 variants, respectively (see Additional le 4: Figure S4A). We re-analyzed them for ASD variant versus the heterogenous ASD. Next, we conducted the same strategy used for the heterogeneous ASD to select the candidate DMPs by overlapping with the transcriptome data. The overlapping of unique DMPs with transcriptome data revealed 11 and 31 unique DMGs in the ASD with 16.p11.2 del and CHD8 variants, respectively (Table 3 and 4). Among the unique DMGs, we found several genes linked to neurodevelopmental disorder and ASD, including XK related 6 (XKR6) ( Figure 8A), zinc nger protein 107 (ZNF107) ( Figure 8B), and myeloma-overexpressed gene 2 protein (MYEOV2) ( Figure 8C) in the ASD with 16.p11.2 del. The signi cant DMPs at AluY (cg21300361) within XKR6 was hypermethylated, while as AluSq (cg01772945) within ZNF107 and L1MB3 (cg13749477) within MYEOV2 were hypomethylated. Interestingly, these genes were differentially expressed in the blood transcriptome of multiple ASD cohorts, and their expression was inversely relative to LINE-1 and Alu methylation levels.
For ASD with CHD8 variants, we found that all LINE-1 and Alu elements located on candidate genes were markedly hypomethylated, as expected from global and family-speci c methylation levels. These DMPs consist of L1MC5 (cg22706070) within Euchromatic Histone Lysine Methyltransferase 2 (EHMT2) ( Figure 9A), AluJo (cg06421197) within caspase 1 (CASP1) (Figure 9B), and AluSx (cg18699242, cg01963623, cg02169692) within ubiquitin-speci c peptidase 18 (USP18) ( Figure 9C). EHMT2 was signi cantly increased in the blood of ASD, while CASP1 was increased in both the blood and brain of multiple ASD cohorts (one probe was decreased). These changes were inversely relative to LINE-1 and Alu methylation levels within that gene. We found that the expression of USP18 was not inversely relative to AluSx methylation located on the gene. Additionally, the DMRs of XKR6, ZNF107, MYEOV2, EHMT2, and CASP1 genes revealed LINE-1 and Alu probes as well as non-LINE-1/Alu probes located in the same DMRs ( Figure 8A-C and Figure 9A-B). Sensitivity and speci city of unique LINE-1 and Alu signatures in ASD variants.
To predict diagnosis of the genetically homogenous ASD by using LINE-1 and Alu methylation signatures, we subsequently conducted ROC curves analysis of these loci and other probes within unique DMRs to distinguish each homogenous ASD variant from non-ASD and ASD with non-speci c variants. For ASD with 16.p11.2 del, AluY within XKR6 (cg21300361) exhibited high sensitivity and speci city (AUC = 0.905, 95%CI = 0.83-0.98) to distinguish ASD with 16.p11.2 del from non-ASD and ASD with CHD8 variants as shown in the ROC curves ( Figure 8D). In addition, the ROC curves of AluSq within ZNF107 (cg01772945) and L1MB3 within MYEOV2 (cg13749477) also exhibited high AUC value (AluSq: AUC = 0.900, 95%CI = 0.83-0.97 and L1MB3: AUC = 0.841, 95%CI = 0.74-0.95) ( Figure 8E and 8F). In the ASD with CHD8 variants, LINE-1 and Alu methylation signatures within candidate DMGs showed moderate sensitivity and speci city as demonstrated by AUC values (AUC range: 0.712-0.819) compared with the speci city of unique loci in ASD with 16.p11.2 del, including L1MC5 (cg22706070) within EHMT2 ( Figure 9D), AluJo (cg06421197) within CASP1 ( Figure 9E), and AluSx (cg18699242, cg01963623, cg02169692) within USP18 ( Figure 9F). Our ndings suggest that these novel DMPs at the LINE-1 and Alu elements could be used for clinical purposes. However, an independent cohort is required for validation, as we were limited by the percentage of ASD individuals affected by these genetic variants.

Discussion
Epigenetic modi cation is an important mechanism linking environmental and genetic factors, especially during the developmental process. Epigenetic age is a biomarker for age-related epigenetic changes as well as disease-speci c (31), but it is unknown whether it contributes to ASD pathogenesis. DNAm age acceleration rates were signi cantly increased in the blood of mid-childhood ASD (5-11 years) using the method established by Wu et al (32). In this study, DNAm Age was calculated using Horvath's epigenetic clock algorithm, which shows a high correlation between the chronological age and DNAm Age. Interestingly, we discovered DNAm Age in ASD older than 11 years old was signi cantly decelerated but not in younger patients aged <11 years in our study. This observation is assumed to be related to the external exposures accumulated during childhood leading to developmental delay. Concomitantly, the mean of age acceleration difference in adolescence ASD (12-18 years old) was lower than their unaffected siblings (not statistically signi cant) (32). Additionally, we looked into the LINE-1 and Alu methylation pro les for ASD with these groups. Although DNAm Age was reduced in ASD individuals over the age of 11, the DNAm pro le of the LINE-1 and Alu elements was signi cantly altered only in ASD under the age of 11 (see Additional le 9). These ndings suggest that the DNAm signature of LINE-1 and Alu elements may change early which corresponds to early developmental and behavioral delays or deviancies. The epigenetic delays were also observed in post-mortem brain tissues of children with ASD (aged 5-8 years) (33).
In the heterogeneous ASD, we found no difference in global methylation (LINE-1 and Alu) (Figure 2A), which is consistent with our previous studies using lymphoblastoid cell lines that found no difference when all ASD were combined. In addition to Shpyleva's study, total methylation of LINE-1 in the ASD brain was also not signi cantly altered (25). The possibility is that the aberration of the global methylation levels (LINE-1 and Alu) may rely on family-speci c REs or restrict to speci c locations. Reducing the heterogeneity of ASD by classifying ASD based on clinical phenotype may be bene cial, as demonstrated by previous ndings from our investigators (21,26,(34)(35)(36). Subcategorizing ASD allowed us to observe the hypomethylation of global methylation in ASD with CHD8 variants.
The aberration of LINE-1 or Alu elements during development may cause double-strand DNA breaks and DNA damage leading to the process of neurodegeneration (37,38). Furthermore, identi cation of LINE-1 and Alu subfamilies has led to a better understanding of the association between the REs and human diseases because some subfamilies of LINE-1 and Alu elements remain active (20). To the best of our knowledge, our study is the rst to identify the locus-speci c methylation at LINE-1 and Alu elements in a subfamily-speci c manner of the ASD blood samples. In this study, 7,165 DMPs at LINE-1 and Alu elements were identi ed in the heterogeneous ASD compared with non-ASD and the most of the DMPs were notably hypermethylated. We observed these hypermethylated loci mapped to L1P, L1H, AluJ, and AluS elements, which are It is important to note that LINE-1 and Alu elements play important roles in human brain development and brain somatic mosaicism. LINE-1 and Alu elements can regulate nearby genes during brain development (39)(40)(41)(42). LINE-1 and Alu retrotransposition occurred more frequently in the brain than germline cells (43). Furthermore, Coufal's study, which compared LINE-1 activity in fetal neural progenitor cells (NPCs) to other somatic cells, revealed that NPCs have high retrotransposition of LINE-1 (44). They also discovered low DNAm at the LINE-1 promoter as well as a high copy number of LINE-1 in brain tissues when compared to other somatic cells (44,45). Thus, we also performed an analysis in the validation cohort using data from post-mortem brain tissues (GSE131706), we identi ed signi cant 482 DMPs (p < 0.05) at LINE-1 and Alu elements. Interestingly, we found that all DMPs from the brain were overlapped with DMPs identi ed in the heterogenous ASD blood (see Additional le 10). These ndings suggest that epigenetic dysregulation of LINE-1 and Alu elements in ASD may alter the function of autism-related genes regulated by these elements. To address this, we predicted the biological functions and networks of genes located nearby DMPs of LINE-1 and Alu elements. Neurological diseases and canonical pathways implicated in ASD were signi cantly associated with these genes (see Additional le 5). Moreover, interactome networks associated with ASD revealed several autism-related genes in the SFARI database ( Figure 6). Especially, AluSg7 (cg16926147) within KCNQ3 gene and L1PA3 (cg24094412) within HCN1 gene that were hypermethylated and inversely related to aberrant gene expression in the blood and post-mortem brain tissues of several ASD cohort studies. Hypomethylated DMPs were also discovered in the most active Alu family, AluY (cg08998414), which is located on the UBE2H gene and has an inverse relationship with gene expression. KCNQ3 encodes a protein involved in neuronal excitability; people with a de novo variant of this gene experience ASD features, and some were diagnosed with ASD (46). HCN1 encodes a hyperpolarization-activated cation channel that is widely expressed in the brain regions (47). HCN1 mutation causes epileptic encephalopathy and this mutation is associated with intellectual disability and autistic traits (48). UBE2H encodes an E2 ubiquitin-conjugating enzyme family protein that is involved in the protein ubiquitination mechanism. Genetic association and screening studies have shown that this gene is present in ASD individuals. (49,50). Another interesting result is hypomethylation in the HAL1 family which was found exclusively in the heterogeneous ASD. HAL1 or half-L1 encodes only ORF1p which enhances the e ciency of their transposition, but the origin, biological properties, and subfamilies have not been well studied (51). HAL1 subfamilies were also not well annotated in our data. However, this result warrants further research of their biological activity in the ASD.
Here, we discovered LINE-1 and Alu methylation signatures in these genetically homogeneous ASD (both 16p11.2 del and CHD8 variants). In the ASD with 16p11.2del, only locus-speci c changes at LINE-1 and Alu elements were observed ( Figure 4). We identi ed unique DMPs which target genes differentially expressed in the several ASD cohort studies including AluY within XKR6 (cg21300361), AluSq within ZNF107 (cg01772945), and L1MB3 within MYEOV2 (cg13749477). These genes were genetic risk variants for ASD identi ed in genome-wide association study (GWAS), single nucleotide polymorphisms (SNPs), and copy number variation (CNV) studies (52)(53)(54). In the case of ASD with the CHD8 variants, we observed a widespread reduction of LINE-1 and Alu methylation levels in global methylation and the active LINE-1 and Alu families (L1P, L1H, and AluS). This change has far-reaching implications for even the oldest and fossil family (AluJ and FAM), as well as FRAM family. Furthermore, the unique LINE-1 and Alu methylation signatures of ASD with CHD8 variants, such as L1MC5 (cg22706070) within EHMT2, AluJo (cg06421197) within CASP1, and AluSx (cg18699242, cg01963623, cg02169692) within USP18, were also hypomethylated. However, we found that these alterations are inconsistent with the heterogeneous ASD pro le, in which most DMPs were hypermethylated. These results may be caused by disease-speci c genetic variants of CHD8 that is a huge difference from the ASD without any genetic variants or with unde ned ones. CHD8 is a chromatin remodeling/modi er factor that plays a role in the transcription process required for brain development (10). LINE-1 and Alu elements have an activation and a repressive chromatin mark that is bound by several chromatin remodeling/modi er factors (39,42,55,56). Aberrant CHD8 function may be leading to changes in genome-wide epigenetic marks which can affect a variety of gene regulatory mechanisms. The inverse relationship between LINE-1/Alu methylation and gene expression were also observed in the ASD with CHD8 variants. EHMT2, located nearby L1MC5, is a histone lysine methyltransferase involved with gene activation or repression. Gene and protein expression levels of EHMT2 were signi cantly increased in the post-mortem brain tissues of ASD (57,58). CASP1 encodes cysteine-aspartic acid protease (caspase) enzyme involving apoptosis, monocyte cell fate, and immune response (59). CASP1, located nearby AluJo, was signi cantly elevated in the peripheral blood mononuclear cells of ASD (60), as well as overexpressed in two ASD studies including blood and post-mortem brain tissues. USP18 is a protein in the ubiquitin pathway which is essential for cell cycle, cell differentiation, and proliferation (61) and its CNV has been reported in ASD individuals (62). In transcriptome data obtained from several ASD studies, USP18 was signi cantly decreased, but not inversely related to hypomethylated positions of AluSx located in the upstream region of USP18 gene. However, three probes (cg18699242, cg01963623, and cg27281093) at the same regions have been reported to be hypomethylated and they are the CDH8 signature in the previous study (63). Our ndings showed that DNAm of LINE-1 and Alu elements, located in the target genes, are connected with ASD-related genes. Moreover, biological functions and interactome of the genes located nearby LINE-1 and Alu methylation signatures in the ASD cohorts were associated with neurological diseases and developmental disorders, as well as canonical pathways implicated in ASD.
Unlike genetic changes, epigenetic alterations are not recorded in the genome and cannot be identi ed by genome sequencing. DNAm signatures are identi ed by comparing the methylation patterns of affected individuals to those typically developing control individuals. Several DNAm signatures have been established, and their effectiveness demonstrated as epigenetic markers for identifying variations of uncertain signi cance as pathogenic or benign (64). Although ASD pathogenesis occurs in the brain tissue, other systems such as the immune (65), metabolic (66), and gastrointestinal systems (67) are also affected in ASD individuals. DNAm in the blood is highly correlated to brain tissue samples and re ects environmental exposure (68). The discovery of distinct LINE-1 and Alu methylation signatures in ASD blood outlines their clinical potential to be used as non-invasive biomarkers. We conducted ROC curves analysis to predict a sensitivity and speci city of diagnosis with ASD using unique DMPs at LINE-1 and Alu elements identi ed in the blood of ASD individuals. Our ndings show that LINE-1 and Alu methylation can be used to identify ASD with speci c variants from unaffected individuals and classify them. However, additional research is required to determine its sensitivity and speci city in large and independent ASD cohorts.
In summary, we interrogated DNAm signatures of LINE-1 and Alu elements in the blood of ASD. DNAm signatures re ect environmental exposure linked to interaction with genetic factors through epigenetic processes. In the ASD cohort, DNAm signatures revealed changes in age acceleration residuals and LINE-1/Alu elements positions. The DNAm age was decelerated, meaning that an individual with ASD is epigenetically younger than their chronological age. Alteration of LINE-1 and Alu methylation induced a change in expression of genes located nearby these elements. Moreover, these genes have important functions related to neurodevelopment and cellular signaling. Disruption of these functions may lead to ASD features ( Figure 10).
Because of the limitation of available post-mortem brain tissues for each ASD variant in publicly available datasets, our analyses were carried out using methylation data from ASD blood. Further research with a large number of post-mortem brain tissues is required. We did not perform multiple test corrections in the analysis of the validation cohort using post-mortem brain tissues, due to the small sample size which affects the statistical analysis power. Moreover, methylome and transcriptome datasets used in our study were obtained from different cohorts. However, transcriptome data from several ASD studies may re ect the heterogeneity of ASD, and one of the ASD cohorts in our analyses is a heterogeneous group. It is important to note that changes in LINE-1 and Alu methylation may occur as a result of a genetic factor in the genetically homogeneous ASD.

Conclusions
Locus-speci c DNAm of LINE-1 and Alu elements in ASD, as well as its functional impact on gene regulation, were rstly reported in our study. Our analyses revealed LINE-1 and Alu methylation changes in a locus-and family-speci c manner which were different according to each ASD cohort. By integrating methylome and transcriptome data, the target genes of LINE-1 and Alu elements were identi ed. These target genes were differentially expressed in multiple ASD cohorts, and their functions were related to neurological diseases and developmental disorders such as ASD. Our research also demonstrated that the LINE-1 and Alu signatures can be applied to diagnose and classify people with ASD. Finally, our nding will provide a better understanding of the impact of LINE-1 and Alu elements in ASD, at least in the blood. Our study provides evidence supporting future studies on the role of LINE-1 and Alu related to ASD neuropathology using human post-mortem brain tissues. However, further functional studies will be necessary to investigate the subsequent impact upon the target genes and fully elucidate the role of REs in ASD biology. Availability of data and materials The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests Authors' contributions T.Sae. performed transcriptome analysis, biological function, and interactome prediction, and drafted the manuscript under the supervision of T.S., T.T. and C.S. C.S., N.I., D.G., and T.P. performed DNAm analysis for LINE-1 and Alu elements. T.S. supervised transcriptome analysis and interpretation. T.S. supervised the biological function and interactome prediction using IPA. C.S. had primary responsibility for the nal content. All authors read and approved the nal manuscript. Figure 1 Age acceleration residual changes in ASD. Correlation of DNAm Age and chronological age in non-ASD (n = 59) (A) and ASD (n = 72) (B). Differences of the chronological age (C) and age acceleration residual (AgeAccelResidual) (D) between non-ASD and ASD. Differences of the chronological age (E) and AgeAccelResidual (F) between non-ASD and ASD by ≤ 10 years old and ≥ 11 years old. Mean ± SD. *p < 0.05.     Methylation of repetitive elements (LINE-1 and Alu) in non-ASD (n = 48) and ASD patients who carry 16p11.2 deletions (n = 7). Global DNA methylation (A). Volcano plots of mean change in methylation (Δβ) of LINE-1 (B) and Alu (C) against -log10 FDR-adjusted p-value (P FDR ) of ASD compared with non-ASD; the green line represents P FDR = 0.05, the red line represents 10% of methylation changes, green dots represent hypomethylation loci, and orange dots represent hypermethylation loci. Changes in DNA methylation (Δβ) of ASD with 16p11.2 deletion compared with non-ASD by a subfamily of LINE-1 (D) and Alu (E). Mean ± SD.

Figure 5
Methylation of repetitive elements (LINE-1 and Alu) in non-ASD (n = 48) and ASD patients who carry CHD8 variants (n = 15). Global DNA methylation (A). Volcano plots of mean change in methylation (Δβ) of LINE-1 (B) and Alu (C) against -log10 FDRadjusted p-value (P FDR ) of ASD compared with non-ASD; the green line represents P FDR = 0.05, the red line represents 10% of methylation changes, green dots represent hypomethylation loci, and orange dots represent hypermethylation loci. Changes in DNA methylation (Δβ) of ASD with CHD8 compared with non-ASD by subfamily of LINE-1 (D) and Alu (E). Mean ± SD. *p < 0.05.

Figure 6
The regulatory network of differentially methylated genes (DMGs) in heterogeneous ASD that is related to neurological diseases. The gene regulatory network was predicted by ingenuity pathway analysis software using the list of DMGs (colored; red = hypermethylation; green = hypomethylation)