Identification of methylation markers for diagnosis of autism spectrum disorder

Autism spectrum disorder (ASD) is a hereditary heterogeneous neurodevelopmental disorder characterized by social and speech dysplasia. We collected the expression profiles of ASD in GSE26415, GSE42133 and GSE123302 from the gene expression omnibus (GEO) database, as well as methylation data of GSE109905. Differentially expressed genes (DEGs) between ASD and controls were obtained by differential expression analysis. Enrichment analysis identified the biological functions and signaling pathways involved by common genes in three groups of DEGs. Protein-protein interaction (PPI) networks were used to identify genes with the highest connectivity as key genes. In addition, we identified methylation markers by associating differentially methylated positions. Key methylation markers were identified using the least absolute shrink and selection operator (LASSO) model. Receiver operating characteristic curves and nomograms were used to identify the diagnostic role of key methylation markers for ASD. A total of 57 common genes were identified in the three groups of DEGs. These genes were mainly enriched in Sphingolipid metabolism and PPAR signaling pathway. In the PPI network, we identified seven key genes with higher connectivity, and used qRT-PCR experiments to verify the expressions. In addition, we identified 31 methylation markers and screened 3 key methylation markers (RUNX2, IMMP2L and MDM2) by LASSO model. Their methylation levels were closely related to the diagnostic effects of ASD. Our analysis identified RUNX2, IMMP2L and MDM2 as possible diagnostic markers for ASD. Identifying different biomarkers and risk genes will contribute to the diagnosis of ASD and the development of new clinical and drug treatments.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by a wide range of barriers from childhood, involving mutual social communication and limited, repetitive interests and behaviors . Approximately 1 in 59 children is diagnosed with ASD, with a male-to-female ratio of approximately 3:1 (Guo et al. 2019). It is increasingly recognized that ASD has potentially harmful effects on function, both in children and adults (Courchesne et al. 2011). Adults with ASD have higher suicide rates, increased mortality and reduced life expectancy (Arnold et al. 2019).
In studies combining brain tissue from adolescents and adults with ASD, transcriptional expression patterns suggested that abnormalities in cortical patterns, synapses, apoptosis, immunity, and inflammation may be associated with autism (Chow et al. 2012). Although many aspects of ASD are still not fully understood, significant progress has been made in emphasizing genes (Elsabbagh and Johnson 2010;Velmeshev et al. 2019). The involvement of mouse double minute 2 (MDM2) in the process of synapse elimination in ASD had shown in study (Guang et al. 2018). Runt-related transcription factor 2 (RUNX2) expression is significantly reduced in patients with neurodevelopmental disorders (Amini et al. 2020). Studies have found inner mitochondrial Bei Zhang and Xiaoyuan Hu contributed equally to this work. membrane peptidase subunit 2 (IMMP2L) as a potential risk gene for ASD (Kreilaus et al. 2019;Zhang et al. 2018).
Previous studies had shown that there may be potential genetic and/or environmental perturbations involving multiple systems in ASD, thereby affecting the emergence of ASD and physical health problems (Pan et al. 2020). However, recent studies had confirmed that DNA methylation in ASD was an important epigenetic regulator (Yoon et al. 2020). The expression levels of epigenetic related genes in ASD patients were significantly different, but not in normal subjects, suggesting that epigenetic modification plays a key role in the ASD phenotype (Ben-David and Shifman 2013; Waye and Cheng 2018). Wong et al. found variant DNA methylation patterns in ASD-inconsistently diagnosed identical twins, which was the first epigenetic analysis of ASD patients (Wong et al. 2014). The development of biomarkers for ASD, although a focus of important research, has yielded limited results to date.
Because of extensive gene expression and genetic studies, there is currently some consensus on the etiology of ASD. This study attempts to provide a means to identify high priority candidate genes for ASD by combining methylation profiles with gene expression profiles.

Data collection
The datasets related to autism of GSE26415, GSE42133, GSE123302 and GSE109905 were collected from the gene expression omnibus (GEO) database. GSE26415 included gene expression profiling in peripheral blood from 21 young adults with ASD and healthy mothers having children with ASD, between whom there was no blood relationship. Raw data output was imported into Genespring GX10 (Agilent) and normalized each probe to the median expression of matched control intensity for that gene across all samples. GSE42133 included gene expression profiling in leukocyte samples from 91 pediatric ASD subjects and 56 control toddlers. Raw data was processed with Illumina GenomeStudio® software and Lumi package in R, and Log2 transformation and quantile normalization. GSE123302 included gene expression profiling in cord blood samples from 59 ASD and 91 control. The raw fluorescence data (in Affymetrix CEL file format) with one perfect match and one mismatched probe in each set were analyzed using oligo package in R. Signal distribution was first assessed in perfect-match probe intensity and Robust m u l t i -C h i p A v e r a g e ( R M A ) n o r m a l i z e d d a t a . GSE109905 included genome-wide DNA methylation profiling of blood samples in 38 ASD adults and 21 control adults.

Analysis of differentially expressed genes
The differentially expressed genes (DEGs) between ASD group and control group were identified by limma R software package (Ritchie et al. 2015). Genes with P < 0.05 were considered as statistically different.

Enrichment analysis
The R package of clusterProfiler was used to perform the functional enrichment analysis of the DEGs, including gene ontology (GO) functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The cellular component (CC), biological process (BP) and molecular function (MF) terms were included in the GO analysis. P value of <0.05 was considered as a statistically significant difference.

Construction of protein-protein interaction (PPI) network
The important DEGs were analyzed by inputting them into Search Tool for the Retrieval of Interacting Genes (STRING) (https://string-db.org). The PPI network was displayed through Gephi. The key genes were chosen based on their associations with a higher number of other DEGs in the PPI network.

DNA methylation analysis
The raw methylated and unmethylated read counts of each CpG site in GSE109905, modeled with a beta-binomial distribution. Data quality control and analysis were performed using the ChAMP methylation analysis package (version 2.6.4) in R (Tian et al. 2017). After quality control, the CpG probes were retained for further analysis. Differentially methylated positions (DMPs) were identified using the limma R package.

Clinical sample collection
The peripheral blood samples were collected from five patients of Asperger syndrome (AS) adult ASD and five adult controls. Demographics and main physical characteristics of the patients are shown in Table 1. All patients signed informed consent. This study was approved by the ethics review board of the Fourth People's Hospital of Urumqi (No: 20200306).

Quantitative real time polymerase chain reaction (qRT-PCR) analysis
The key genes were chosen for qRT-PCR analysis to examine the expressed levels of mRNA transcript as predicted by microarray analysis. Total RNA of peripheral blood was isolated using TRIzol reagent (Invitrogen, CA, USA) according to the manufacturer's protocol. Then RNA was reverse transcribed into cDNA using the PrimeScript™ RT Master Mix (Takara, Dalian, China), and qRT-PCR was accomplished using Platinum SYBR Green qPCR SuperMix (Invitrogen, Beijing, China). The results were normalized to the expression of GAPDH. Quantitation was carried out using the 2 −ΔΔCT method. P values <0.05 were considered statistically significant. Sequences of primers used for qRT-PCR are given in Table 2. Experiments with qRT-PCR were repeated for three times.

Methylation analysis
DNA was extracted from blood samples by using the DNA extraction Kit (Qiagen, Germany) according to the manufacturer's instruction. The DNA was treated with sodium bisulfite using the EZ DNA Methylation-Gold Kit (Zymo Research, Irvine, CA) according to the manufacturer's protocol. The samples were eluted in 40 μL of M-elution buffer, and 2 μL were used for each PCR reaction. The biotinylated primers listed in Table 1.

Statistical analyses
Statistical analyses were carried out with SPSS Statistics 21.0. In all cases, a P value <0.05 (two-sided) was considered significant. The differences between two groups were compared by Student t test.

Results
Differentially expressed genes in autism spectrum disorder To identify differentially expressed genes associated with ASD, we performed a differential analysis of gene expression between ASD patients and controls in GSE26415, GSE42133, and GSE123302 (Fig. 1a). We identified 57 common genes in three groups of DEGs (Fig. 1b).

Function and pathway enrichment
Enrichment analysis for 57 common genes revealed that genes were enriched to 162 biological processes (BP), 26 cellular components (CC) and 48 molecular functions (MF). It mainly included SRP-dependent co-translational protein targeting to membrane, protein targeting to ER and protein autoubiquitination (Fig. 2a, b, c). In addition, the genes were also enriched in 11 KEGG signaling pathways. It mainly included Ribosome, Sphingolipid metabolism and PPAR signaling pathway (Fig. 2d).

Protein-protein interaction network
By constructing PPI networks of common genes, we identified interaction networks of 57 genes (Fig. 3a). Among them, MDM2, RPL4, TBP, RPL5, RPS13, RPS14 and FAU had the highest degree of connectivity and were considered as key genes. These key genes were all up-regulated in ASD compared with controls in three datasets (Fig. 3b). Receiver operating characteristic (ROC) curves showed that the AUC values of t h es e g en es w e r e g r e a t e r t ha n 0 .6 ( Fi g. 3 c ). Importantly, we validated that MDM2, RPL4, RPL5, RPS13, RPS14 and FAU were significantly differentially up-regulated in AS patients compared with controls detected by qRT-PCR (P < 0.05), whereas TBP was down-regulated (Fig. 3d).

Methylation markers in autism spectrum disorder
By analyzing the methylation positions between ASD and controls in GSE109905, we obtained differentially methylated positions (DMPs) (Fig. 4a). Methylation probes had larger feature proportions in body, especially hyperProbe (Fig. 4b).
Of the 57 common DEGs, we identified 31 methylation markers (Fig. 4c, Table S1). A least absolute shrink and selection operator (LASSO) model was constructed in the 31 methylation markers. After the LASSO regression analysis, three genes were chosen to predict ASD most accurately based on the minimum value of the crossvalidation error mean (Fig. 4d, e). Compared with normal controls, methylation levels of cg11115867 (RUNX2) and cg16056465 (IMMP2L) were Venny map of three groups of differentially expressed genes. The intersection part is common genes increased in autism spectrum disorder, while methylation levels of cg24549639 (MDM2) were decreased (Fig. 5a). This was further confirmed in methylation assays on blood from AS patients (Fig. 5b). RUNX2 and IMMP2L were up-regulated in ASD patients detected with methylated primers, while MDM2 was downregulated. The ROC curve showed that the AUC values of all three genes were greater than 0.7 (Fig. S1). Binary logistic regression analysis found that the ROC curves of the three methylation markers revealed an AUC value of 0.91, which was highly sensitive and specific for the differential diagnosis of ASD and controls (Fig. 5c, d). A nomogram was constructed by logistic regression analysis, and we used methylation levels of key methylation markers as risk factors for prediction of ASD (Fig. 5e). The results showed that the methylation level of key methylation markers may closely relate to the diagnosis of ASD.

Discussion
We envision that the identification of new biomarkers, risk genes and related molecular pathways may contribute to the early diagnosis of ASD and to the improvement of clinical and pharmacological treatment of this disease. The results of this study validated that six DEGs were differentially expressed in autistic patients. For methylation markers, it may have more clinical diagnostic significance, especially for MDM2. This study further supported the possible role of protein ubiquitination in neuropathology as a pathological mechanism associated with autism (Fatemi et al. 2012). Signal recognition particle (SRP) is a ribonucleic acid (Rnp) complex that mediates the targeting of nascent signal sequence-carrying polypeptides, including membrane and secreted proteins, to the ER surface (Keenan et al. 2001). ER stress and apoptosis may play an important role in the pathogenesis of autism . The Sphingolipid metabolism pathway is altered in a biologically similar maternal immune activation (MIA) mouse model to ASD (Naviaux et al. 2014). In addition, the PPARγ agonist pioglitazone has been used in clinical trials, suggesting that PPAR may be a target for ASD drug therapy (Ghaleiha et al. 2015). Selective activation of PPARα and PPARγ plays a neuroprotective role by reducing oxidative stress and neuroinflammation, which is an important link in the pathophysiological process of ASD (Barone et al. 2019).
Among the key genes identified in the PPI network, the differently expressed MDM2 had been validated by other ASD study (Chow et al. 2012). In the ASD model, dysregulated MDM2 prevented myocyte enhancer factor 2-induced PSD-95 ubiquitination and synaptic elimination (Tsai et al. 2012). The AUC value of MDM2 was 0.68, indicating that its expression abnormality has a potential diagnostic role for ASD. The results of our analysis again suggested MDM2 as a potential marker for ASD. TATA-binding protein (TBP) is a ubiquitinated substrate by a special E3 ligase in vivo and in vitro, which helps to activate gene expression (Li et al. 2015;Ravarani et al. 2020). In the neuroprotective proteins of ASD, the expression of TBP is relatively reduced, which is consistent with our experimental results (Grigg et al. 2020). Ribosome protein L4 (RPL4) is involved in the regulation of splicing and transcription networks in human neuronal development (Fogel et al. 2012), may influence the occurrence of Fig. 2 The enrichment results of common genes. a. The biological process of common genes enrichment. b. The cellular component of common genes enrichment. c. The molecular function of common genes enrichment. d. The KEGG pathway of common genes enrichment autism. Unfortunately, no studies had been found on the relationship between RPL4, RPL5, RPS13, RPS14, or FAU and ASD. However, ribosomal proteins may play an important role in maintaining cognitive function and have been associated with ASD (Wang et al. 2015). Importantly, we verified the abnormal expression of these key genes in AS patients. This implies that these genes may serve as biomarkers for AS. Overall, abnormalities in the marker genes we identified appear to contribute to organelle dysfunction rendering the cell incapable of protein processing. This may be an important molecular mechanism in the genesis of ASD.
Our comparison of DNA methylation modification differences between ASD and controls enabled us to identify specific methylation markers associated with ASD. Interestingly, the only overlapping gene between seven key genes from PPI network and three key methylation markers is MDM2. Differential methylation of MDM2 has been found to be associated with mitochondrial dysfunction in autism spectrum disorder Fig. 3 Identify key genes of autism spectrum disorder. a. PPI network of common genes. b. Expression pattern of key genes in ASD of GSE26415, GSE42133 and GSE123302 compared to controls. The larger the node, the bigger the differential change. c. The ROC curve of key genes for autism spectrum disorder and control people. ROC, receiver operating characteristic curve; AUC, area under the receiver operating characteristic curve. d. The mRNA levels of key genes were detected by qRT-PCR method in ASD and controls. ASD, autism spectrum disorder, Ctrl, control. * P < 0.05, ** P < 0.01 (Stathopoulos et al. 2020). MDM2 may serve as a new and promising molecule for research and targeted therapy of ASD. RUNX2, as a candidate gene for ASD, is involved in abnormal brain rhythm and language deficits . Differential methylation of RUNX2 is associated with neurodevelopmental disorders (Chater-Diehl et al. 2019). IMMP2L is associated with susceptibility to autism in Caucasian populations (Maestrini et al. 2010). IMMP2L is not only associated with autism, but also with diagnosable autism-related diseases (Muhle et al. 2004). Logistic regression analysis showed Fig. 4 Identification of methylation markers. a. Differentially methylated positions between ASD and control. b. The feature proportion of methylation probes. c. Identification of methylation markers between ASD and control. d. The validation for tuning parameter (λ) selection in the LASSO model. The log(λ) value of −1.743583 is used for further analysis. e. LASSO coefficient profiles of the most useful risk genes that MDM2, RUNX2 and IMMP2L had a better role in the differential diagnosis of ASD. These results suggested that MDM2, RUNX2 and IMMP2L had great application prospects in the diagnosis of ASD and deserve further study.
Our study also had some limitations. First, our analytical data were derived from public databases, and there was not much sample data for validation experiments, and larger clinical samples needed to validate important analytical results in the follow-up. Second, we had not explored the mechanism of methylation markers and need further in-depth study. Finally, we selected only one ASD phenotype for validation of gene expression and methylation modification, it needed more different phenotypes to explore.
In summary, we report the association of methylation modifications and gene expression. Our results will need to be followed up to determine the diagnostic role of methylation markers. These results may provide new insights into the biological Fig. 5 The effect of methylation marker on the prognosis of autism. a. Methylation levels of key methylation markers in ASD and control. ASD, autism spectrum disorder. b. DNA methylation levels of key methylation makers detected by sodium bisulfite with both methylated and unmethylated primers. Methylated, methylated primers; unmethylated, unmethylated primers; M, marker; C, controls; ASD, autism spectrum disorder. The AUC value (c) and risk differentiation (d) of multi factor analysis for methylation markers. e. Nomogram for predicting risk factors of ASD mechanisms and diagnostic markers of ASD, which involved in the development of ASD by affecting organelle dysfunction.
Acknowledgements The authors thank all the participants included in this study.
Author contributions Bei Zhang and Xiaoyuan Hu designed and managed the study. Bei Zhang and Yuefei Li contributed significantly to the data analyses. Xiaoyuan Hu and Yongkang Ni performed methylation analysis. Bei Zhang and Lin Xue wrote the paper. Lin Xue reviewed and edited. All authors contributed to and approved the final manuscript.
Funding This work was not supported by any funds.
Data availability The data used in this study can be found in GEO database, including GSE26415, GSE42133, GSE123302 and GSE109905.

Declarations
Ethics statement This study was performed at the Fourth People's Hospital of Urumqi and was approved by the ethics review board of the Fourth People's Hospital of Urumqi (No: 20200306). Patients gave their approval to be enrolled in this study in an informed written consent. The patient studies were conducted in accordance with the ethical guidelines of the Declaration of Helsinki.
Competing interests The authors declare no competing interests.