In silico analysis of differentially expressed-aberrantly methylated genes in breast cancer for prognostic and therapeutic targets

Breast cancer (BC) is the leading cause of death among women across the globe. Abnormal gene expression plays a crucial role in tumour progression, carcinogenesis and metastasis of BC. The alteration of gene expression may be through aberrant gene methylation. In the present study, differentially expressed genes which may be regulated by DNA methylation and their pathways associated with BC have been identified. Expression microarray datasets GSE10780, GSE10797, GSE21422, GSE42568, GSE61304, GSE61724 and one DNA methylation profile dataset GSE20713 were downloaded from Gene Expression Omnibus database (GEO). Differentially expressed-aberrantly methylated genes were identified using online Venn diagram tool. Based on fold change expression of differentially expressed-aberrantly methylated genes were chosen through heat map. Protein–protein interaction (PPI) network of the hub genes was constructed by Search Tool for the Retrieval of Interacting Genes (STRING). Gene expression and DNA methylation level of the hub genes were validated through UALCAN. Overall survival analysis of the hub genes was analysed through Kaplan-Meier plotter database for BC. A total of 72 upregulated-hypomethylated genes and 92 downregulated-hypermethylated genes were obtained from GSE10780, GSE10797, GSE21422, GSE42568, GSE61304, GSE61724, and GSE20713 datasets by GEO2R and Venn diagram tool. PPI network of the upregulated-hypomethylated hub genes (MRGBP, MANF, ARF3, HIST1H3D, GSK3B, HJURP, GPSM2, MATN3, KDELR2, CEP55, GSPT1, COL11A1, and COL1A1) and downregulated-hypermethylated hub genes were constructed (APOD, DMD, RBPMS, NR3C2, HOXA9, AMKY2, KCTD9, and EDN1). All the differentially expressed hub genes expression was validated in UALCAN database. 4 in 13 upregulated-hypomethylated and 5 in 8 downregulated-hypermethylated hub genes to be significantly hypomethylated or hypermethylated in BC were confirmed using UALCAN database (p < 0.05). MANF, HIST1H3D, HJURP, GSK3B, GPSM2, MATN3, KDELR2, CEP55, COL1A1, APOD, RBPMS, NR3C2, HOXA9, ANKMY2, and EDN1 were significantly (p < 0.05) associated with poor overall survival (OS). The identified aberrantly methylated-differentially expressed genes and their related pathways and function in BC can serve as novel diagnostic and prognostic biomarkers and therapeutic targets.Please confirm if the author names are presented accurately and in the correct sequence (given name, middle name/initial, family name). Author 4 Given name: [Jeewan Ram] Last name [Vishnoi]. Also, kindly confirm the details in the metadata are correct.It is correct


Introduction
Worldwide, the most common cancer among women is BC following lung cancer [1].It has been seen that therapeutic strategies have considerably improved the recovery rate for BC.A number of treatments are available for BC such as chemotherapy, radiotherapy, hormone therapy, targeted therapy and surgery [2].To understand functions of tumourrelated genes and roles of tumour cell signalling pathway, genetic studies exploring functional genomics are performed in cancer.
Systemic treatment of BC includes chemotherapy that might either neoadjuvant or adjuvant.Presently, treatment includes a simultaneous administration of schemes 2-3 of the following drugs-carboplatin, 5-fluorouracil/capecitabine, cyclophosphamide, taxanes (paclitaxel, docetaxel), and anthracyclines (doxorubicin, epirubicin).The choice of the proper drug depends on the molecular subtypes of BC [3].Chemotherapy has several side effects like hair loss, diarrhoea, fatigue, increased susceptibility to infections, Table 1 Screening of upregulated-hypomethylated genes and downregulatedhypermethylated genes in seven GSE datasets, six genes expression microarray datasets (GSE10780, GSE10797, GSE21422, GSE42568, GSE61304, GSE61724) and one genes methylation microarray dataset (GSE20713) using online Venn diagram tool

Datasets
Total Elements GSE10780 (upregulated) GSE10797 (upregulated) GSE21422(upregulated) GSE42568 (upregulated) GSE61304 (upregulated) GSE61724 (upregulated) GSE20713 (hypomethylated) bone marrow suppression, and anaemia.Radiotherapy is a local treatment, usually provided after surgery and/or chemotherapy.It ensures that all the tumorigenic cells remain destroyed to minimize the BC recurrence.Further, radiation therapy is favourable in the case of metastatic or unresectable BC [4].Endocrinal therapy also might be used either as a neoadjuvant or adjuvant therapy which aims to decrease the estrogen levels or prevents BC cells to be stimulated by estrogen.Drugs that block estrogen receptors include selective estrogen receptor modulators (SERMs) (tamoxifen, toremifene) and selective estrogen receptor degraders (SERDs) (fulvestrant) while treatments that lower the estrogen levels include aromatase inhibitors (AIs) (letrozole, anastrazole, exemestane) [5].Endocrinal therapy in combination with chemotherapy is associated with the decreased rate of mortality in BC patients.Immunotherapy is common in HER2-positive BC patients, drugs include trastuzumab, pertuzumab, trastuzumab deruxtecan, lapatinib, and neratinib [6].Nowadays, antibody drug conjugates (ADCs) as a targeted therapeutics hybrid molecule have shown possibility to make an ideal shift in the field of cancer therapy through the antibody-antigen interaction [7,8].Most promising among these strategies are PD-L1 or PD-1 monoclonal antibodies (mAb) for the treatment of patients with advanced BC with triple-negative.And it has also been studied that immune checkpoint inhibitor (ICI) monotherapy shows effective response in a small cohort of patients with metastatic triple-negative BC and several biomarkers are also being evaluated for this therapeutic strategy [9,10].
Another study showed that there is no significant association between negative PD-L1 level with early death (ED) and high expression of PD-L1 showed a higher probability of ED with ICI-only as compared to non-ICI therapies.Furthermore, significant reduction in the risk of ED when ICI combined with other agents compared to ICI-only in cancers with positive and high PD-L1 expression.Altogether these data suggest that population with positive and high PD-L1 expression of ED occurs and that negative PD-L1 expression cannot predict ED [11].Alteration of gene expression plays an important role in carcinogenesis, progression and metastasis of BC and it is now considered that it is not only caused by genetic defects (such as TP53, BRCA1/BRCA2 inactivation, PIK3CA mutation, Cyclin D1 amplification) but epigenetic modifications also plays an important role [12].Interestingly, many evidences have demonstrated that epigenetic mechanisms play a crucial role in BC and may be useful in the treatment, prevention, prognosis, and follow-up of the disease.DNA methylation, histone post-translational modifications, and miRNA are three most important epigenetic modifications that interfere with the expression of genes, and can therefore show involvement in cancer development [12].DNA methylation on "CpG" islands is the most commonly studied epigenetic modification [13].DNA methylation is the addition of the methyl group (-CH 3 ) on the 5-carbon of cytosine mostly located on cytosine-phosphate-guanine (CpGs) dinucleotides [14].Aberrations in DNA methylation that occur in cancer can be of different types, such as hypermethylation of gene-locus which leads to the silencing of tumour suppressor genes, or unique genes and repetitive sequences hypomethylation leading to the expression of oncogenes [12].Therefore, aberrant DNA hypo-and hypermethylation patterns have been identified as critical factor in carcinogenesis which is promoting the expression of oncogenes or silencing of tumour suppressor genes.It has been seen that in early diagnosis of multiple cancer including BC DNA methylation has always been observed in cancer cells, with stable inheritance from parent to daughter cells.Therefore, more and more biomarkers which are very specific to DNA methylation have been developed that can help for early diagnosis of cancer [15,16].It has been reported that with 4 CpGs DNA methylation can differentiate the healthy people and BRCA patients, with a sensitivity of over 97% and a specificity of nearly 91% [17].
In our study, we broadly investigated the genome-wide DNA methylation landscape of BC and their adjacent normal tissues from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA).We explored the gene expression level and methylation status between BC tissue samples and normal breast tissue samples to identify critical genes whose expression levels were altered due to DNA methylation.Our in silico analysis results will help to get better disease insights, diagnosis, prognosis and new therapeutic targets of the disease for further research in BC.

Data collection
From GEO datasets of NCBI (https:// www.ncbi.nlm.nih.gov/ gds/), BC expression microarray and methylation microarray datasets were taken in which BC and normal breast samples were utilized and were sorted by sample number (from high to low).Six gene expression datasets GSE10780, GSE10797, GSE21422, GSE42568, GSE61304, GSE61724 and one DNA methylation profile dataset GSE20713 were downloaded.42 BC and 143 normal breast tissues were selected in GSE10780.56 BC and 10 normal breast tissues were enrolled in GSE10797.14 BC and 5 normal breast tissues were selected in GSE21422.104 BC and 17 normal breast tissues were selected in GSE42568.58 BC and 4 normal breast tissues were selected in GSE61304.64 BC and 4 normal breast tissues were selected in GSE61724.For methylation dataset, GSE20713 88 BC and 2 normal breast tissues were selected (Platform: GPL570, Illumina Human-Methylation27 Bead Chip).

Screening for upregulated-hypomethylated and downregulated-hypermethylated genes
GEO2R, an online tool for analysing GEO Datasets, was utilized to examine differentially expressed genes (DEG) in expression microarray datasets and differential methylation status of genes in methylation microarray datasets between BC tissues and normal tissues.p value < 0.05 and Fold Change (FC) > 1.25 were used as cutoff criteria to identify upregulated-hypomethylated genes and P value < 0.05 and Fold Change (FC) < 0.75 were used for downregulatedhypermethylated genes in BC.The overlapping DEGs among all the seven datasets of BC and normal breast tissue (GSE10780, GSE10797, GSE21422, GSE42568, GSE61304, GSE61724, and GSE20713) were identified using the online Venn diagram tool (bioinformatics.psb.ugent.be/webtools/Venn/).Through heat maps and violin plots using the R limma (linear models for microarray data) package and Orange Data Mining software the fold change expression distribution was visualized [18].

Functional enrichment analysis
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and Gene Ontology (GO) pathway was analysed of upregulated-hypomethylated and downregulatedhypermethylated genes by the Database for Annotation, Visualization and Integrated Discovery (DAVID) (https:// david.ncifc rf.gov/) and PPI (protein-protein interaction) network was constructed by Search Tool for the Retrieval of Interacting Genes (STRING, https:// string-db.org/) database in BC.For upregulated-hypomethylated genes, the median confidence for the interaction is 0.4.22 nodes and 38 edges were taken and PPI enrichment p value is 0.00.For downregulated-hypermethylated genes, the median confidence is 0.4 and 18 nodes and 29 edges were taken and PPI enrichment p value is 0.00.PPI interaction network was visualized by Cytoscape software (v.3.8.2), which is an open-source visualization software.

Validation of gene expression and gene methylation in UALCAN database
UALCAN (http:// ualcan.path.uab.edu/), an online cancer transcriptome database, is designed to give easy access to cancer transcriptome data (TCGA and MET500 transcriptome sequencing) which is publicly available [19].For comparing expression level and gene methylation status of hub genes between BC and normal breast tissue, this database was used.

Survival analysis of genes in Kaplan-Meier plotter database
Kaplan-Meier plotter database was used to perform clinical outcome analysis of hub genes in BC [20].This database has gene expression data and survival information of 1,809 patients from GEO (Affymetrix HGU133A and HGU133 + 2 microarrays).Overall survival (OS) analysis was also performed of the hub genes separately in its default settings with 300 months follow-up.

GO functional and KEGG pathway enrichment
Functional enrichment analysis of 72 upregulated-hypomethylation genes and 90 downregulated-hypermethylation genes was performed by GO analysis.The results are listed in Table 1.The most significant (p < 0.05) with lowest false discovery rate (FDR) biological process, cellular component and molecular function were considered.Upregulatedhypomethylation genes are enriched in cell proliferation, nucleus, protein kinase activity, microtubule binding.Downregulated-hypermethylation genes are enriched in negative regulation of epithelial cell proliferation, Wnt signalling pathway, adherence junction, protein binding (Fig. 1, A-C,  E-F).
From upregulated-hypomethylation genes, the most significant pathways were p53 signalling pathway, pathways in cancer, cell cycle.For downregulated-hypermethylation genes, the most significant pathways were BC, PI3K-Akt signalling pathway, Ras signalling pathway which play important roles in life process were also enriched in KEGG pathway analysis (Fig. 1D, H).

PPI network construction and Hub genes selection
Hub genes were selected through fold change expression of common genes from six non-methylated datasets and one methylated dataset and visualized by heat maps and violin plots using the R limma package and Orange Data Mining software (Figs. 2, 3, 4, 5).For upregulated-hypomethylated genes, heat map and violin plot is shown in Fig. 2A, B, D.

Survival analysis of hub genes through Kaplan-Meier plotter database
To analyse prognostic value of 20 differentially expressedmethylated hub genes of BC, Kaplan-Meier plotter database restricted to BC was searched.Upregulation-hypomethylation of MANF, HIST1H3D, HJURP, GSK3B, GPSM2, MATN3, KDELR2, CEP55, COL1A1 (Fig. 10) Fig. 2 Gene expression comparison of common genes from seven datasets and selection of hub genes.A Heat map shows the fold change expression of upregulated-hypomethylated common genes obtained from venn diagram tool.B Heat map of selected hub genes from seven datasets.C Bar graph for comparation of FCE of 13 hub genes in all seven datasets.D Violin plot between FCE and hub genes in all seven datasets ◂ and downregulation-hypermethylation of APOD, RBPMS, NR3C2, HOXA9, ANKMY2, EDN1 (Fig. 11) genes were significantly associated with poor OS.

Discussion
BC is a very common cancer among all the cancers and it causes high rate of mortality worldwide.Its early diagnosis in developing countries has always been a challenge.
Copy number variation and DNA mutation studies have been found less helpful in BC as compared with different cancers (e.g.lung and colon), may be due to the greater diversity of BC subtypes [21,22].Aberrant DNA methylation is prevalent in BC; therefore, it has been used as early detection biomarkers for cancer diagnosis.Many studies have focused on abnormal methylation of CpG islands in the promoter region of genes.Therefore, identifying DNA methylation-induced genes and studying their association with treatment outcomes could help to predict the risk of BC patients, leading to a more specialized clinical treatment [23].The advantages of using DNA methylation as biomarker is, firstly, prevalence of CpG islands aberrant methylation is higher than those of mutations and methylation that are measured by genome-wide screening.Secondly, DNA methylation can serve as biomarkers as even subtle changes in CpG islands methylation can contribute to tumorigenesis and their prediction and validation paves way for future research in this direction.Thirdly, techniques involved in the detection of methylation patterns are relatively simple [24].
MATN3 from KEGG pathway enrichment analysis by DAVID is involved in extracellular matrix remodelling in BC, also has higher gene expression in BC patients as compared to normal tissue and is related to poor prognosis [25][26][27].MATN3 is further involved in EMT-promoting function in basal BC [28].
The G1 to S phase transition 1 (GSPT1) gene encodes the peptide chain release factor GTP-binding subunit ERF3A which is essential peptide for eukaryotes.Besides, the role in terminating protein translation, this multifunctional GTPase is highly involved in mitosis and apoptosis [29].GSPT1 gene plays an important role in biological process and is highly upregulated in BC [30].It has been shown that GSTP1 interacts with GAPDH to activate its activity and inactivation of GSTP1 damages glycolytic metabolism after the GAPDH step to impair ATP generation, fatty acid and nucleotide metabolism, and oncogenic signalling pathways [31].
The pro-alpha 1 chains of type I collagen are encoded by gene collagen type I alpha 1 (COL1A1), have triple helix which comprises of two alpha 1 chains and one alpha 2 chain.It have been shown that Col1A1 was upregulated in BC tissues compared to normal breast tissues [32].In in vitro experiments, the migration and invasion of BC cells could be suppressed by the knockdown of COL1A1, suggesting that cellular expression of COL1A1 promotes the metastasis of BC and it can be a potential therapeutic target.However, the EMT-related markers were not regulated by the knockdown of COL1A1, indicating that cellular COL1A1 may promote the progression and metastasis of BC through other signal pathways [33].Interestingly, in BC cells COL1A1 is a positively regulated target gene of the Wnt/ beta-catenin pathway [34].Furthermore, ER/PR status was shown to be associated with COL1A1 expression, which can be potentially a combined biomarker for BC patients.A deleterious factor like radiation could upregulate COL1A1 to enhance cancerous tropism and can even induce secondary BC.COL1A1 contributes bone metastasis and is associated with poor prognosis and can be a therapeutic target for the control of bone metastasis [35].
COL11A1, a member of small fibre collagen, is encoded by COL11A1 gene.Many studies have shown that COL11A1 is associated with the tumorigenesis and development and plays an important role in biological processes like cell proliferation and invasion [36].It has been reported that COL11A1 is significantly upregulated in infiltrating tumour lesions compared to their adjacent stroma [37].Differential expression of COL11A1 is shown in primary BC that metastasize and their corresponding lymph node sites compared to its normal tissue and is correlated with cancer aggression, progression, and lymph node metastasis [38,39].The finding of these quantitative changes in the expression of COL11A1 could lead to novel approaches regarding predictive tools and/or prognostic marker for BC.
Human apolipoprotein D (ApoD), also known as gross cystic disease fluid protein-24, is a multifunctional 24-kDa glycoprotein expressed in most tissues, with high expression being found in the spleen, pancreas, kidneys, placenta, ovaries, testes, adrenal glands, and brain [40].ApoD employ various intracellular mechanistic roles, particularly ligand binding.Some assumed ligands reported are arachidonic acid, tamoxifen, and progesterone [41].The structure and function of ApoD studies have shown that it is involved in normal human cell physiology and have also suggested its role in cancer cell progression [41].ApoD expression in BC has been shown to be downregulated via oestrogen receptor-α (ERα) signalling [42].On the other hand, ApoD expression is induced by androgens in BC cell lines and coincides with epithelial cell proliferation inhibition of breast, most likely via androgen receptor (AR) signalling, and in the presence of the AR antagonist flutamide the effect is reversed [42].
RNA-binding proteins family contain RNA-binding protein for multiple splicing (RBPMS) which have N-terminal RNA-recognition motif (RRM) sequence.It binds to nascent RNA transcripts and their processing is regulated, including the splicing of pre mRNA and its transport, localization, and the stability of RNA molecules [43,44].RBPMS is known to counteract cell cycle progression by inhibiting some of the oncogenes for example MYC protein and Bcl-2 anti-apoptotic protein or the activation of cell cycle inhibitors including p15INK4b, p21CIP1/WAF1, and p57KIP2 leading to cell cycle arrest or apoptosis [45,46].In a study, it was shown that RBPMS overexpression significantly represses the activator protein 1 signalling activity, and therefore regulate the BC proliferation and migration [47].However, a study report suggested that RBPMS antisense RNA-1, RBPMS-AS1 expression was significantly low in tumour tissue when compared to normal tissue in the BC patients above the age 50 [48].
Nuclear receptor subfamily 3 group C member 2 (NR3C2) is an adrenal cortex hormone receptor [49].It is a transcription-dependent factor, and can bind to response elements of mineralocorticoid and interfere with the effect of aldosterone on the water and salt balance of restricted target cells.Aberrant expression of NR3C2 can lead to hypertension in pregnancy, type I pseudo-hyperaldosteronism, and chronic central serous chorioretinopathy [50].Recent studies have shown that NR3C2 was downregulated in BC cell lines and overexpressing suppressed the proliferation, migration, and invasion of BC cells.Downregulation of NR3C2 leads to the unfavourable prognosis of BC patients, and can serve as a marker for the prognosis of BC patients [51].
The HOXA gene cluster is a member of homeotic genes family which code for transcriptional regulators that take part in the development and differentiation of many multicellular organisms [52].It has been reported that HOXA1 to HOXA11 are expressed in normal breast epithelium.Therefore, HOXA9 acts as tumour suppressor gene in BC [53].It is downregulated in human BC and tumour cell lines and reduced expression of HOXA9 transcript levels associated with proliferation, metastasis, and patient prognosis [54].Methylation of HOXA9 has been reported and due to this its expression is reduced in BC [55].Therefore, HOXA9 can serve as a marker for prognosis in BC.
DNA methylation biomarkers may be used as a prognostic and predicting therapeutic outcomes.During tumorigenesis, DNA methylation changes occur early and have important role in regulating gene expression levels in tumorigenic cells.Therefore, epigenetic changes alone, or in combination with conventional chemotherapeutic agents may provide a way to resensitize drug-resistant tumours in BC [56].

Conclusion
Our study identified aberrantly methylated-differentially expressed genes and their related pathways and function in BC by using integrated bioinformatics analysis.We validated the gene expression level and methylation status of the hub genes through UALCAN, 4 upregulated genes were hypomethylated and 5 downregulated genes were Fig. 6 Protein-protein interaction.A Shows PPI of upregulated-hypomethylated hub genes with one another or through with other proteins in STRING software.B Shows PPI of downregulated-hypermeth-ylated hub genes with one another or through with other proteins.Cytoscape was used to visualize the PPI hypermethylated.From our hub genes, 15 genes showed significant poor OS in BC which can serve as prognostic markers.Our findings may deepen the understanding of the methylation-mediated regulatory mechanisms of carcinogenesis of BC and can serve as novel diagnostic and prognostic biomarkers and therapeutic targets.Furthermore, experimental and clinical studies are required to validate the findings of the present study and determine the potential clinical value of these potential biomarkers.

Limitations
This study is based on in silico analysis, so further validation is required of identified aberrantly methylated-differentially expressed genes and their pathways in patient's tumour tissue as well as healthy surrounding tissue.In vitro and in vivo emphasis is also required to validate this study in BC.However, methylation status of some genes was varied among different platforms and databases.

Fig. 4 Fig. 5
Fig. 4 Gene expression comparison of common genes from seven datasets and selection of hub genes.Heat map (A) shows the fold change expression of downregulated-hypermethylated common genes obtained from venn diagram tool.B Heat map of selected hub genes from seven datasets.C Bar graph for comparation of FCE of 13 hub genes in all seven datasets.Violin plot (D) between FCE and hub genes in all seven datasets ◂