Identification of CpG and non-CpG methylation associated with lymphocyte-specific helicase

Background: CpG methylation is crucial for normal cell development and differentiation in mammalian genomes, whereas non-CpG methylation is present in stem cells and appears to be enriched in pluripotent cell types. Lymphocyte-specific helicase (LSH) is critical for normal cell development as a regulator of CpG methylation, but whether LSH is linked to non-CpG methylation remains unclear. Results: In this study, using reduced representation bisulfite sequencing (RRBS), we found that LSH affected DNA methylation patterns in lung adenocarcinoma (LUAD) cells. We identified 12,239 differentially methylated regions (DMRs) for CpG methylation and 1605 DMRs for non-CpG methylation. LSH did not change CpG methylation at the global level but partially changed the methylation of CpG islands (CGIs). Interestingly, LSH reduced CHH methylation at CGIs by approximately 30% and consistently decreased CHG and CHH methylation. However, LSH increased the CHH methylation level in satellite and simple-repeat regions. KEGG analysis showed that LSH was linked to marked enrichment in focal adhesion and the PI3K-Akt pathway. Further analysis indicated that LSH might epigenetically regulate the binding of transcription factors (TFs), including SP2. Conclusions: These findings suggest that LSH has no remarkable influence on CpG methylation but partially alters non-CpG methylation and methylation at repeat sequences in LUAD. LSH inhibits non-CpG methylation in cancer and may function as a regulator that promotes tumorigenesis through this inhibition.

the LUAD cells and found that LSH inhibited non-CpG methylation and consistently decreased DNA methylation of CHG and CHH.

Efficient detection of global methylation patterns by reduced representation bisulfite sequencing (RRBS)
Our previous work established stable expression of LSH in H358 cells [32], and we confirmed LSH expression in these cells by western blotting (supplementary Figure S1A). To identify regions that were differentially methylated between the two lung adenocarcinoma cell populations, we performed reduced representation bisulfite sequencing on genomic DNA from H358-vector and H358-O/E LSH cells using Illumina sequencing-by-synthesis technology at 10-fold coverage [36][37][38]. We first compared the average DNA methylation ratios of the vector and LSH groups and found no obvious differences (supplementary Figure S1B). The bisulfite conversion (BC) ratio was also similar in the two groups (supplementary Figure S1C). We then investigated whether CpG or non-CpG methylation levels changed on a genome-wide scale upon LSH expression. For an in-depth comparison, we merged all CpG or non-CpG sites from the vector and LSH groups. The global CpG methylation level remained the same for sites methylated in < 10% of reads (pink column) and differed for sites methylated in 90-100% of reads (purple column), whereas the CHG and CHH methylation levels in the < 10% category (pink column) were decreased in the LSH group (Fig. 1A). Thus, the percentages of methylated CHG and CHH sites decreased upon LSH expression. Next, we plotted the average methylation levels in nonoverlapping 1-Mb windows along each chromosome. Non-CpG methylation was visibly decreased in multiple windows in LSH-overexpressing cells compared with vector cells, for example on chromosomes 8, 16 and 20 (Fig. 1B). These results indicate that global CpG methylation levels were unchanged, whereas non-CpG methylation was consistently lower in the LSH group than in the vector group.
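The 1-Mb window comparison can be sketched as follows. This is a minimal illustration, not the pipeline used for the figure: it assumes per-site methylation levels (methylated reads divided by total reads) have already been extracted for one sequence context, and the function names `window_means` and `window_difference` are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 1_000_000  # nonoverlapping 1-Mb windows, as in the text


def window_means(sites, window_size=WINDOW_SIZE):
    """Average per-site methylation levels in nonoverlapping windows.

    sites: iterable of (chromosome, position, level) tuples, with 1-based
    positions and levels in [0, 1].
    Returns {(chromosome, window_start): mean_level}, where window_start
    is the 1-based first coordinate of the window.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chrom, pos, level in sites:
        # Window k covers positions k*size + 1 .. (k+1)*size.
        start = ((pos - 1) // window_size) * window_size + 1
        key = (chrom, start)
        sums[key] += level
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def window_difference(vector, lsh):
    """LSH minus vector mean level, for windows present in both groups."""
    return {key: lsh[key] - vector[key] for key in vector.keys() & lsh.keys()}
```

Running `window_means` separately on vector and LSH sites and then `window_difference` gives a per-window change that could be drawn as one ring of a circular genome plot.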

LSH affects DNA methylation levels of different functional regions
We divided all methylated cytosines (mC) into specific gene features: upstream 2 kb (the 2000 bp upstream of the transcription start site), 5'UTR (untranslated region), exons, introns, 3'UTR, downstream 2 kb, CGIs (CpG islands), CGI shores and repeats (repeat sequences), and evaluated methylation levels in these functional regions. In the violin plots, dots represent methylation levels at each gene element. CG methylation levels showed no marked differences across regions, whereas non-CpG methylation levels in the LSH group were lower than in the vector group in some regions, such as introns, upstream 2 kb and CGIs. CHH methylation at CGIs was reduced by approximately 30% in the LSH group (Fig. 2A). Our previous study showed that LSH promotes genome stability by silencing satellite expression through effects on 5-hmC levels in pericentromeric satellite repeats [26]. However, CpG and non-CpG methylation levels at repeat regions showed no evident differences between the vector and LSH groups. We then subdivided the repeat regions into specific classes: short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), long terminal repeats (LTRs), satellites, DNA transposons, simple repeats and low-complexity repeats. CHH methylation was increased at satellites and simple repeat regions (Fig. 2B). We also measured the methylation levels of DMR-related genes (DMGs). In the specified regions, CpG methylation levels were similar in the vector and LSH groups, and CG methylation levels were higher than CHG and CHH methylation levels. In addition, high levels of non-CpG methylation were detected in the upstream, exon and intron regions upon LSH expression. Interestingly, CHG and CHH methylation of DMGs in the LSH group was lower than in the vector group and stable across functional elements (Fig. 2C).
Overall, LSH did not affect CpG methylation but reduced non-CpG methylation across the human genome, except at satellite and simple repeat elements, where CHH methylation increased. Thus, LSH may play a unique role in the distribution of non-CpG methylation during lung adenocarcinoma development.
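Summarizing methylation by functional region amounts to assigning each cytosine to the annotated intervals that contain it and averaging. A minimal sketch, assuming 1-based inclusive feature coordinates (for example from a gene or repeat annotation) and per-site levels for one context; `feature_means` is a hypothetical helper, and a cytosine lying in overlapping features is counted in each of them.

```python
import bisect
from collections import defaultdict
from statistics import mean


def feature_means(sites, features):
    """Mean methylation level of the cytosines falling in each feature type.

    sites: list of (chrom, pos, level) tuples.
    features: list of (chrom, start, end, feature_type), 1-based inclusive.
    Returns {feature_type: mean_level} for feature types covering >= 1 site.
    """
    # Group sites by chromosome and sort by position for binary search.
    per_chrom = defaultdict(list)
    for chrom, pos, level in sites:
        per_chrom[chrom].append((pos, level))
    index = {}
    for chrom, items in per_chrom.items():
        items.sort()
        index[chrom] = ([p for p, _ in items], [lv for _, lv in items])

    collected = defaultdict(list)
    for chrom, start, end, ftype in features:
        if chrom not in index:
            continue
        positions, levels = index[chrom]
        # Sites with start <= pos <= end occupy the slice [lo, hi).
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        collected[ftype].extend(levels[lo:hi])
    return {ftype: mean(vals) for ftype, vals in collected.items() if vals}
```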

Characterization of DMRs and GO/KEGG enrichment analysis
DMRs were detected and annotated to gene functional elements for each methylation type. In total, 12,239 CG DMRs, 342 CHG DMRs and 1263 CHH DMRs were identified, most of which were in distal intergenic regions, with only 1255 (CG), 28 (CHG) and 86 (CHH) DMRs in promoter regions (Fig. 3A). The lengths of the DMRs for each methylation type were also measured (Fig. 3B). We further analyzed the relationship between LSH and hypo- and hyper-DMRs for the different DNA methylation types. LSH inhibited non-CpG methylation at hyper-DMRs and had the opposite effect on hypo-DMRs, whereas the CpG methylation pattern showed no apparent change (Fig. 3C). DMR cluster analysis showed that LSH mainly inhibited non-CpG (CHG and CHH) methylation, with few changes in CpG methylation (Fig. 3D). For all methylation types, introns contained the highest proportion of DMRs after distal intergenic regions (Fig. 4A).
DNA methylation has been shown to modify the DNA-binding specificity of transcription factors (TFs) [39]. To examine TF binding to DNA, we generated a heat map by cluster analysis of the DMRs (TF binding sites per 100 DMRs) in promoters and gene bodies with high or low DNA methylation in the LSH group. Promoter regions showed more TF binding sites than gene bodies.
Furthermore, in the LSH group, hypomethylation appeared to enhance TF binding to DNA in both promoter and gene-body regions (Fig. 4B), for example for SP2 and SP3. These results suggest that LSH may epigenetically regulate the ability of these TFs to bind DNA by affecting DNA methylation at gene promoters.
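The "TF binding sites per 100 DMRs" unit normalizes motif hits by the number of DMRs in each group, so that groups of different size (for example hypomethylated promoter DMRs versus hypermethylated gene-body DMRs) can share one heat map. The sketch below shows only that normalization step; it assumes motif hit counts were already obtained with some motif-scanning tool, which the text does not name, and the group labels and function name are hypothetical.

```python
def tf_sites_per_100_dmrs(groups):
    """Normalize TF binding-site counts to hits per 100 DMRs.

    groups: {group_name: (tf_hit_counts, n_dmrs)}, where tf_hit_counts maps
    a TF name to the number of binding sites found in that group's DMRs.
    Returns {group_name: {tf: hits_per_100_dmrs}}, ready for clustering.
    """
    matrix = {}
    for name, (counts, n_dmrs) in groups.items():
        if n_dmrs <= 0:
            raise ValueError(f"group {name!r} has no DMRs")
        matrix[name] = {tf: 100.0 * hits / n_dmrs for tf, hits in counts.items()}
    return matrix
```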
To probe changes in the functions of methylation-affected genes under proliferative conditions, we used the GO and KEGG pathway databases to characterize the DMR-related genes (DMGs) for each methylation type. GO analysis showed that CG and CHG DMGs were markedly enriched in the categories protein binding and plasma membrane, whereas the cellular component enriched for CHH DMGs was the cytoplasm. The top term for CG DMGs was transcription, DNA-templated, while non-CpG DMGs (CHG, CHH) clustered in homophilic cell adhesion (Fig. 4C). KEGG analysis showed that CG DMGs were enriched in metabolic pathways, and CHG DMGs in metabolic pathways and the cGMP-PKG signaling pathway. The most enriched pathways for CHH DMGs were focal adhesion and the PI3K-Akt pathway (Fig. 4D). Importantly, some DMGs were also involved in regulation of the actin cytoskeleton. Taken together, these results suggest that genes affected by CG methylation changes mainly function in cancer metabolism, whereas genes whose non-CpG methylation is influenced by LSH are mostly associated with cancer development and the cytoskeleton, which may in turn affect cancer cell proliferation.

Verification of sequencing results
To validate the sequencing results, we performed Venn analysis to identify the overlap between the RRBS data and our previous transcriptome data for these two groups [27]. A total of 12 genes were identified as upregulated and hypomethylated upon LSH expression (Fig. 5A). Next, we systematically examined the relationship between gene expression and methylation type. Transcription start site (TSS) regions had lower DNA methylation and higher gene expression than other regions, irrespective of methylation type (CG, CHG or CHH) or LSH expression (Fig. 5B). DNA methylation levels in the upstream 2 kb and downstream 2 kb regions were moderate. When the transcriptome and methylation data were integrated, non-CpG methylation was relatively stable across gene bodies, whereas CpG methylation was irregular. This may be related to the fact that non-CpG methylation occurs mostly in gene bodies. However, this integrated analysis showed no difference between the two groups in our system. We therefore investigated whether differences were reflected at the level of individual genes. Four genes with DMRs located in different regions were randomly selected from the Venn analysis results. The CG methylation levels of transient receptor potential melastatin 5 (TRPM5), growth arrest-specific 2-like 1 (GAS2L1), intercellular adhesion molecule 5 (ICAM5) and solute carrier family 10 member 7 (SLC10A7) were not clearly changed compared with the vector group. However, their non-CpG methylation levels were apparently decreased upon LSH expression (Fig. 6). To compare the DNA methylation levels of these genes between normal and cancer samples, we used MethHC, a DNA methylation and gene expression database for human cancer (http://methhc.mbc.nctu.edu.tw/php/index.php).
GAS2L1 showed decreased DNA methylation in cancer across all regions examined (promoter, CpG islands and gene body) (supplementary Figure S2A). DNA methylation of the CpG islands and gene body of TRPM5 was significantly reduced in cancer, whereas promoter methylation did not change significantly (supplementary Figure S2B). The CpG island and 5'UTR regions of ICAM5 displayed increased DNA methylation in cancer, with no significant change in the promoter region (supplementary Figure S2C). SLC10A7 also showed significantly decreased methylation in CpG islands and the 3'UTR (supplementary Figure S2D). TCGA data showed that these genes were significantly upregulated in cancer, except GAS2L1, which was downregulated (supplementary Figure S3A). We then performed a correlation analysis between LSH and these genes. TRPM5 was positively correlated with LSH expression, whereas the other genes were negatively correlated (supplementary Figure S3B).
We further used the KM-Plotter database (http://kmplot.com/analysis/index.php?p=background) to assess the prognostic value of these genes in LUAD. High expression of TRPM5 and GAS2L1 indicated a poor prognosis in LUAD, SLC10A7 expression was associated with LUAD prognosis, and ICAM5 showed no significant correlation with prognosis (supplementary Figure S3C). Taken together, these results indicate that LSH affects non-CpG methylation in certain genes and may epigenetically regulate their expression in LUAD.

Discussion
DNA methylation is a major epigenetic regulatory mechanism that plays an important role in the regulation of gene expression. Previous studies have demonstrated a role for LSH in supporting CpG methylation at genes involved in normal cell development [23,40], but there are few reports on non-CpG methylation in cancer. RRBS, which enables genome-scale DNA methylation profiling at single-base resolution, has allowed us to investigate proliferation-related DNA methylation in unprecedented detail [38]. In this study, we used RRBS to profile DNA methylation in H358 cells and examine the relationship between different methylation types and LSH. We found that LSH inhibits non-CpG methylation in cancer cells, except at certain repeat sequences, but does not clearly affect CG methylation. Further correlation analyses indicated that several DMR-related genes may be involved in cancer development and cell proliferation.
For the final analyses, we merged samples from the two groups; this merging not only substantially increased genomic coverage but also minimized the impact of genome complexity. However, individual characteristics, such as cell growth status, and differences in stress responses arising from stimulated cell proliferation could be confounding factors. Nevertheless, our finding that LSH affects non-CpG methylation in cancer cells is consistent with an earlier study of LSH and genome stability, which found that LSH promotes genome stability by silencing satellite expression [26].
Although altered DNA methylation is considered a classical hallmark of cancer [41,42], the role of non-CpG methylation remains controversial, and its relevance in normal and cancer processes is poorly understood. The higher non-CpG methylation levels in human pluripotent stem cells and brain cells are consistent with previous reports linking non-CpG methylation to a high potential for maturation or further differentiation [13,20,43]. Non-CpG methylation has been reported in other However, the establishment of non-CpG methylation in cancer is poorly studied. LSH functions as an oncogene by promoting metastasis and genome instability, is involved in lipid metabolism in cancer cells and epigenetically regulates stem cell fate [27,50,51]. Our study found that LSH inhibits non-CpG methylation, except for CHH methylation in repeat regions. This may be connected with its ability to promote genome stability. In contrast, CpG methylation showed no evident change when LSH was overexpressed. The KEGG results showed that CpG methylation DMGs are associated with metabolic pathways, whereas non-CpG methylation DMGs are mostly involved in processes related to cancer survival, migration and development, such as focal adhesion, regulation of the actin cytoskeleton and PI3K-Akt signaling.
TRPM5, a sodium-selective TRP channel, functions as a regulator of calcium uptake and of nonselective cation channels and has been found to be upregulated in Wilms' tumor and rhabdomyosarcoma, which may be associated with tumorigenesis [52,53]. GAS2L1 is a GAS2-like family protein that interacts with actin filaments and microtubules. It has been shown to mediate centrosome disjunction and movement [54] and to be epigenetically silenced in acute myeloid leukemia [55]. ICAM5, a member of the intercellular adhesion molecule (ICAM) family, has been linked to breast and prostate cancer susceptibility loci and promotes tumorigenesis and perineural invasion in head and neck cancer [56,57]. SLC10A7 is involved in glycosaminoglycan synthesis, and its splicing is associated with colorectal cancer prognosis [58,59]. These genes are associated with tumor development, and our results indicate that LSH regulates their non-CpG methylation but not their CpG methylation. To exclude the possibility that the inhibition of non-CpG methylation was induced by exogenous overexpression of LSH, we performed western blotting in lung cancer cell lines (A549 and H358). LSH appears to act as a driver of malignant transformation; in short, it may function as a regulator that promotes tumorigenesis by inhibiting non-CpG methylation.
Transcription factors can serve as readers of DNA methylation or induce changes in DNA methylation states upon binding to target sequences [60]. We found that LSH-induced hypermethylation in promoters and gene bodies reduced TF binding to DNA. These results indicate that LSH may epigenetically regulate the ability of TFs to bind DNA. We also propose that LSH affects non-CpG methylation throughout the gene regions of certain genes, in both coding (exon) and noncoding (intron, 5'UTR, 3'UTR) regions. However, additional experiments and mechanistic studies are needed to clarify this.

Conclusion
In conclusion, our study used RRBS to investigate changes in different types of DNA methylation upon LSH expression. Our findings serve as a basis for further exploring the relationship between cancer and non-CpG methylation. The results support the hypothesis that LSH inhibits non-CpG methylation in cancer and that this reduction in non-CpG methylation may affect TFs and is reflected in certain genes. The biological role of non-CpG methylation in cancer development and the mechanism by which LSH epigenetically regulates TFs require further investigation.

Plasmids and lentivirus transfection
The LSH lentiviral construct was generated by inserting the LSH cDNA into the plvx-EF1a-puro vector, and an empty vector was used as a negative control (Clontech, Mountain View, CA). All plasmid vectors were verified by sequencing. Plasmid transfection was performed using LipoMax (Sudgen Biotech, Bellevue, WA, USA) in accordance with the manufacturer's protocol. Cell colonies were selected with puromycin (1 μg/ml), and LSH overexpression was confirmed by western blotting.

DNA preparation and bisulfite treatment
The cells were collected, and genomic DNA was extracted using a genomic DNA kit (Sangon Biotech, Cat.#B518201-0100); bisulfite conversion was then performed using the EZ DNA Methylation Direct kit (Zymo Research Corporation, Cat#D5020). DNA concentration and quality were measured using a NanoDrop instrument (NanoDrop Technologies, Wilmington, DE, USA). All procedures followed the manufacturers' instructions.

RRBS library preparation and data analysis
The RRBS libraries were prepared according to previously published protocols [61]. Briefly, genomic DNA was digested with the MspI enzyme, followed by end-repair and ligation of sequencing adaptors.
The fragments were then size-selected (40-220 bp) and bisulfite-converted before PCR amplification. Library quality was checked using a Bioanalyzer, and the two libraries were sequenced on an Illumina HiSeq X Ten instrument (100 bp, single-end). The raw signals produced by the HiSeq were converted into base sequences (raw reads) by base calling. To ensure the quality of downstream analysis, raw reads were filtered by removing reads containing adapters, reads with more than 10% N bases and reads in which more than 50% of bases were of low quality. The remaining reads were regarded as clean reads.
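The digestion and size-selection steps can be mimicked in silico to see which genomic fragments an RRBS library can capture. MspI recognizes CCGG and cuts between the two cytosines (C^CGG), so every fragment bounded by MspI cuts on both sides starts with CGG and ends with C on the top strand. The sketch below is illustrative only: it ignores adapters, end repair and the reverse strand, and the helper name `mspi_fragments` is hypothetical.

```python
import re


def mspi_fragments(seq, min_len=40, max_len=220):
    """In-silico MspI digest followed by size selection.

    Returns (start, end, fragment) tuples with 0-based, end-exclusive
    coordinates, keeping only fragments bounded by MspI cuts on both sides
    whose length lies within [min_len, max_len].
    """
    seq = seq.upper()
    # MspI cuts C^CGG: the cut falls one base after the start of each site.
    # CCGG cannot overlap itself, so a plain finditer finds every site.
    cuts = [m.start() + 1 for m in re.finditer("CCGG", seq)]
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        if min_len <= end - start <= max_len:
            fragments.append((start, end, seq[start:end]))
    return fragments
```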

Mapping reads to known genome
Sequencing reads must be aligned with a reference genome before conducting methylation analysis.
Bisulfite-treated reads were aligned to the reference genome using Bismark with default parameters. Reads that aligned to the same genomic region were regarded as duplicates, and the number of duplicates was used to summarize sequencing depth and coverage. The bisulfite conversion rate was calculated with Bismark from the clean reads aligned to the lambda genome: the percentage of methylated cytosine calls among all cytosine calls in the lambda genome gives the non-conversion rate, and the conversion rate is its complement. During bisulfite treatment and PCR amplification, unmethylated cytosines are converted to T, whereas methylated cytosines remain unchanged. By comparing the clean reads with the reference genome, Bismark extracted the coverage of each genomic cytosine and the numbers of methylated reads in each sequence context (CG, i.e. CpG; CHG; and CHH). Because Bismark does not itself classify individual cytosines as methylated, we applied a binomial test to each C site with coverage ≥4× and called sites with a false discovery rate (FDR) < 0.05 as methylated.
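The per-site call can be sketched as a one-sided binomial test: under the null hypothesis that a cytosine is unmethylated, methylated reads arise only from incomplete conversion, so the methylated read count is modeled as Binomial(coverage, non-conversion rate), and p values are then corrected for multiple testing. Using the lambda-derived non-conversion rate as the binomial probability follows common practice but is an assumption here, as is the Benjamini-Hochberg procedure for the FDR; all helper names are hypothetical and only the Python standard library is used.

```python
from math import comb


def conversion_rate(methylated_calls, total_calls):
    """Bisulfite conversion rate from cytosine calls on the lambda genome."""
    return 1.0 - methylated_calls / total_calls


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_methylated_sites(sites, nonconversion_rate, min_cov=4, fdr=0.05):
    """sites: list of (site_id, methylated_reads, total_reads).

    Sites below min_cov are not tested; returns ids of sites called
    methylated at the given FDR.
    """
    tested = [(sid, m, n) for sid, m, n in sites if n >= min_cov]
    pvals = [binomial_sf(m, n, nonconversion_rate) for _, m, n in tested]
    qvals = benjamini_hochberg(pvals)
    return [sid for (sid, _, _), q in zip(tested, qvals) if q < fdr]
```

For example, with a non-conversion rate of 0.005, a site with 10 of 10 reads methylated is called methylated, whereas one with 1 of 20 is not, because a single methylated read is plausible from incomplete conversion alone.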

Estimating methylation levels and the identification of DMRs
All cytosine sites with read coverage >10× were used for DMR analysis with MOABS [62]. First, to detect methylated C sites in a region, we denote by $m_i$ the number of methylated reads at C site $i$, by $u_i$ the number of unmethylated reads at that site, by $i$ the position of the C, and by $n_i = m_i + u_i$ the total number of reads covering it. A binomial test was used to determine whether each C site was methylated.
Subsequently, DMRs were defined as regions containing at least three methylation sites in which the difference in methylation level was greater than 0.2 (0.3 for the CG context) and the p value from Fisher's exact test was less than 0.05. The DMRs are listed in Supplementary Tables 1-3.
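The DMR criteria above can be expressed as a region-level filter. This sketch pools read counts over the cytosines in a candidate region and applies the stated thresholds (difference > 0.3 for CG and > 0.2 for CHG/CHH, Fisher's exact p < 0.05). It is not MOABS's own statistic, reading "at least three methylation sites" as at least three covered cytosines per group is an assumption, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more likely than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def is_dmr(vector_sites, lsh_sites, context, min_sites=3, alpha=0.05):
    """vector_sites / lsh_sites: (methylated, unmethylated) reads per C site."""
    if len(vector_sites) < min_sites or len(lsh_sites) < min_sites:
        return False
    mv = sum(m for m, _ in vector_sites)
    uv = sum(u for _, u in vector_sites)
    ml = sum(m for m, _ in lsh_sites)
    ul = sum(u for _, u in lsh_sites)
    if mv + uv == 0 or ml + ul == 0:
        return False
    diff = abs(ml / (ml + ul) - mv / (mv + uv))
    threshold = 0.3 if context == "CG" else 0.2
    return diff > threshold and fisher_exact_two_sided(mv, uv, ml, ul) < alpha
```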
The methylation level of a region was calculated as follows [63].
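A commonly used definition, which we assume corresponds to the one cited [63], is the weighted methylation level: reads are pooled over all cytosines of one sequence context in the region, so well-covered sites carry more weight than sparsely covered ones. With $m_i$ and $u_i$ the numbers of methylated and unmethylated reads at the $i$-th of $N$ cytosines in the region:

$$
\mathrm{ML}_{\text{region}} = \frac{\sum_{i=1}^{N} m_i}{\sum_{i=1}^{N} \left( m_i + u_i \right)}
$$

For example, two cytosines with 9 of 10 and 1 of 2 reads methylated give $(9 + 1)/(10 + 2) \approx 0.83$, whereas averaging the per-site levels would give $(0.9 + 0.5)/2 = 0.70$.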

Bioinformatics analysis of DMGs
The DMGs were annotated against functional databases, including GO, COG (Cluster of Orthologous Groups of proteins) and KEGG (Kyoto Encyclopedia of Genes and Genomes), using BLAST to analyze gene function. GO enrichment analysis was performed with the GOseq R package, which is based on the Wallenius noncentral hypergeometric distribution [64]. KOBAS software was used to assess the statistical enrichment of differentially expressed genes in KEGG pathways.
Figure 1B. Circles from outside to the center: chromosome ideogram, CHH methylation level (green, vector; red, LSH), CHG methylation level, CpG methylation level. Each peak represents a 1-Mb bin.
Figure 2. A. Overall distribution of methylation levels for different methylation types (CpGs, CHGs, CHHs) in the regions for each group. B. Violin plot of the distribution of methylation levels at specific repeat classes. C. Methylation levels of CpG, CHG and CHH in different functional elements; blue, vector; red, LSH. Each line shows the mean and 95% confidence interval. CGI (CpG island): a region of at least 200 bp with a GC percentage greater than 50% and an observed-to-expected CpG ratio greater than 60%, as defined in the UCSC genome browser. CGI shore (CpG island shore): the 2-kb regions flanking a CpG island.
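The CGI criteria above can be checked on a candidate sequence as follows. This sketch assumes the observed-to-expected CpG ratio uses the Gardiner-Garden and Frommer formula, (number of CpG × length) / (number of C × number of G), which is the formula UCSC describes for its CpG island track; it evaluates a single sequence and does not reproduce UCSC's genome-wide island search. `is_cpg_island` is a hypothetical name.

```python
def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Check one sequence against the CGI criteria used in the text."""
    seq = seq.upper()
    length = len(seq)
    if length < min_len:
        return False
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return False
    # "CG" cannot overlap itself, so str.count gives the CpG dinucleotide count.
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length
    obs_exp = cpg_count * length / (c_count * g_count)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```

A GC-rich sequence can still fail the test: 100 C followed by 100 G is 100% GC but contains a single CpG, giving an observed-to-expected ratio of 0.02.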