Genetic and epigenetic modifications of HPDL and SOX17 associated with breast cancer prognosis: a study based on The Cancer Genome Atlas

Background: The high heterogeneity of breast cancer (BRCA) makes it challenging to interpret the genetic variation mechanisms involved in BRCA pathogenesis and prognosis. Areas with high DNA methylation (such as CpG islands) are accompanied by copy number variation (CNV), and these genomic variations affect the level of DNA methylation. Methods: In this study, we characterized inter-tumor heterogeneity and analyzed the effects of CNV on DNA methylation and gene expression. In addition, we performed gene set enrichment analysis (GSEA) to identify key pathways that differ between patients with low and high expression of the genes. Results: Our analysis found that the CNV of HPDL and SOX17 is related not only to patient prognosis but also to gene methylation and expression levels, which in turn affect patient survival time. Conclusion: This study provides a bioinformatics basis for further exploration of the molecular mechanisms of BRCA and for the assessment of patient prognosis, but the development of biomarkers for diagnosis and treatment still requires validation with further clinical data.


Background
In the post-genome era, rapidly evolving high-throughput sequencing technologies have made it possible to acquire vast amounts of multi-omics data more efficiently [1]. Variation in the expression of some genes shifts the regulatory trajectory of the cell and thereby alters its gene expression program. Therefore, most disease-causing genomic variants are likely to act by altering gene regulation, such as transcription factor binding and DNA methylation, rather than by directly affecting protein function [2,3]. The high heterogeneity of breast cancer (BRCA) makes it challenging to interpret the genetic variation mechanisms involved in BRCA pathogenesis and prognosis [4].
In human cancer, genomic instability leads to extensive cell copy number variation (CNV) [5].
Genome-wide association studies (GWASs) have been conducted for common malignancies and have identified more than 450 genetic variants associated with increased disease risk [6]. Budczies J et al. found that PD-L1 CNVs were significantly associated with changes in PD-L1 mRNA expression in 22 cancer types; tumors with increased PD-L1 expression exhibited significantly higher mutation loads, so PD-L1 CNVs could be used in immunotherapeutic regimens and as predictive biomarkers [7]. In BRCA, CNV is associated with the expression of ~40% of genes and thus participates in the onset, treatment, and recurrence of BRCA and affects the prognosis of BRCA patients [8]. CNVs in BRCA1, BLM, OR4C11, OR4P4, CDH1, HEPACAM, and LOXHD1 increase the incidence of BRCA [9,10]. MCL1, MYC and JAK2, and PTEN deletions or mutations play a role in de novo or acquired chemotherapeutic resistance in triple-negative BRCA [11]. CNVs of FGFR1 and ZNF703 increase the risk of death in patients with BRCA [12,13]. Higher intratumoral heterogeneity of EGFR/CEP7 and CCND1/CEP11 CNVs predicts metastasis and is associated with significantly worse metastasis-free survival in triple-negative BRCA patients [14].
Disorders in the epigenetic state are closely related to human diseases, particularly cancer. DNA methylation is a well-characterized epigenetic modification that is closely related to many cellular processes. In recent studies, DNA methylation sites associated with tumor recurrence and overall survival (OS) in BRCA and its subtypes have been identified using genome-wide DNA methylation analysis [15][16][17]. The methylation of the oncogenes ESR1 and ERBB2 and of the tumor suppressor genes FBLN2, CEBPA, and FAT4 contributes to the early diagnosis of BRCA [18].
Methylation of HER2, Ki67, and GSTP1 is associated with BRCA TNM staging and tumor size, and these markers can be combined for early diagnosis and prognosis [19]. FECR1 circular RNA expression is coordinated by regulation of DNA methylation and demethylation. Upstream regulators control BRCA tumor growth [20]. However, MLH1/MSH2, a methylation product of the DNMT gene, may be important for the chemotherapeutic tolerance of BRCA [21,22].
CNV represents a major source of genomic variation and is an important genetic factor leading to various cancers. DNA methylation, a major means of epigenetic modification, is considered an inhibitory epigenetic marker. Several studies have found that areas with high DNA methylation (such as CpG islands) are accompanied by copy number variation, and these genomic variations affect the level of DNA methylation [23]. For example, in lung adenocarcinoma, DNA methylation heterogeneity demonstrates branched clonal evolution of tumor regions driven by genomic instability and subclonal copy number variation [24]. Here, we investigated the association between genomic variation (such as CNV) in regulatory regions of BRCA and corresponding changes in DNA methylation.
In addition, we performed gene set enrichment analysis (GSEA) to identify key pathways that differ between patients with low and high expression of the genes. Thus, an in-depth study of the genomic pathogenesis of BRCA was conducted to identify prognostic biomarkers and their clinical efficiency.
Our analysis found that HPDL and SOX17, as tumor suppressor genes, can cause malignant transformation of cells and tumorigenesis when they are mutated, deleted, or inactivated. In addition, the results indicate that the CNV of HPDL and SOX17 is related not only to patient prognosis but also to gene methylation and expression levels, which in turn affect patient survival time.

Materials And Methods
Data processing and analysis
The TCGA Data Portal has been retired, and all TCGA data have been transferred to the Genomic Data Commons (GDC). Therefore, BRCA-related methylation, CNV, gene expression, and clinical data from The Cancer Genome Atlas (TCGA) were downloaded from the GDC (https://gdc.cancer.gov/). The chi-square test and the limma and edgeR software packages were used to collate and analyze the downloaded data, and results were screened according to P and logFC values. BRCA tissue samples and normal tissue samples were compared to identify genes with CNV differences, abnormal methylation, and dysregulated expression. The data in the TCGA database are public; therefore, no approval from the local ethics committee was required.
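As an illustration of the chi-square screening step, the following Python sketch compares the counts of copy-number loss, neutral, and gain calls for a single gene between tumor and normal samples. It is a minimal sketch, not the code used in this study: the function name and the counts are hypothetical, and the three-category coding of CNV is an assumption. For a 2 x 3 table the test has two degrees of freedom, in which case the p-value has the closed form exp(-chi2/2).

```python
import math

def chi_square_2x3(tumor_counts, normal_counts):
    """Pearson chi-square test for a 2 x 3 table (loss, neutral, gain).

    Returns (statistic, p_value). With 2 degrees of freedom the
    chi-square survival function is exactly exp(-x / 2).
    Assumes every CNV category contains at least one sample overall.
    """
    table = [list(tumor_counts), list(normal_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, math.exp(-statistic / 2.0)

# Hypothetical counts for one gene: (loss, neutral, gain)
stat, p = chi_square_2x3((300, 600, 198), (50, 1000, 53))
print(f"chi2 = {stat:.1f}, significant at 0.05: {p < 0.05}")
```

A gene would pass the screen when the returned p-value falls below the chosen cut-off.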

Multi-layer correlation analysis method predicts the pattern of gene CNV in BRCA
DNA methylation has been shown to regulate gene expression in a variety of ways, such as by changing chromosome structure and DNA stability. In addition, CNV is widely distributed in the human genome and has important biological implications. To further explore how CNV and methylation relate to gene expression, the possible patterns of CNV in BRCA need to be elucidated. This study focuses on three pairwise correlations: abnormal methylation and gene expression, CNV and aberrant methylation, and CNV and gene expression. Screening was based on the Pearson correlation coefficient and p-value. Key genes with simultaneous methylation abnormalities, CNV, and abnormal expression were obtained, and further prognostic analysis was performed on these genes.
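The correlation screen can be sketched in Python as follows. This is an illustrative sketch only: the gene names and values are invented, the function names are hypothetical, and screening on the absolute value of the coefficient is an assumption (methylation and expression are often negatively correlated). The 0.4 threshold is the one reported in this study; the accompanying p-value calculation is omitted for brevity.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def screen_genes(methylation, expression, threshold=0.4):
    """Keep genes whose |r| between methylation and expression exceeds threshold."""
    kept = {}
    for gene in methylation:
        if gene in expression:
            r = pearson_r(methylation[gene], expression[gene])
            if abs(r) > threshold:  # assumption: screen on absolute correlation
                kept[gene] = r
    return kept

# Hypothetical per-sample values for two made-up genes
methylation = {"GENE_A": [0.1, 0.2, 0.3, 0.4], "GENE_B": [0.1, 0.2, 0.3, 0.4]}
expression = {"GENE_A": [8.0, 6.0, 5.0, 2.0], "GENE_B": [5.0, 3.0, 3.0, 5.0]}
kept = screen_genes(methylation, expression)
print(sorted(kept))
```

The same function can be reused for the CNV versus methylation and CNV versus expression comparisons by passing the corresponding per-sample values.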

Mapping of Kaplan-Meier survival curve of genes and screening of prognostic key genes
To further identify key genes related to the prognosis of BRCA patients among the genes obtained above, survival analysis was performed with the survival software package, and survival curves were plotted to show the effect of abnormal methylation, alone and combined with abnormal gene expression, on patient survival. In addition, to further explore the methylation sites of the prognostic aberrantly methylated genes, the factors affecting patient prognosis and gene expression were mapped to specific methylation sites.
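For readers unfamiliar with the estimator behind these curves, a minimal pure-Python version of the Kaplan-Meier calculation is sketched below. It stands in for the survival package used in this study and is not the authors' code; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability), one entry per
    distinct time at which at least one death occurred.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    # Walk through the distinct follow-up times in increasing order.
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up (months); the second patient is censored
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

Computing one such curve per patient group (for example, high versus low methylation) gives the curves that are compared in a survival plot.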

Effect Of Key Gene CNV On Patient Prognosis
Data analysis showed that abnormal methylation of the key genes is closely related to the prognosis of BRCA patients; the key genes harbored methylation abnormalities, CNV, and abnormal expression, and these were significantly correlated with one another. The effect of CNV on patient prognosis can be seen by studying CNV together with the survival time of BRCA patients, which further indicates the biological significance of gene CNV in the progression of BRCA. In addition, we performed gene set enrichment analysis (GSEA) between the high-expression and low-expression groups of the key genes to determine the key pathways that are altered in patients with abnormal gene expression [25].
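The core quantity in GSEA is a running-sum enrichment score computed over a ranked gene list. The sketch below shows that calculation in Python under stated assumptions: the gene names, scores, and function name are hypothetical, the weighting exponent of 1 follows the commonly used default, and the permutation step that GSEA uses to assess significance is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score in the style of GSEA.

    ranked_genes : genes ordered from most up- to most down-regulated
    scores       : ranking metric for each gene, in the same order
    gene_set     : collection of genes belonging to the pathway
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some but not all ranked genes")
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking score")

    running = 0.0
    best = 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** weight / hit_norm  # step up on a pathway gene
        else:
            running -= 1.0 / n_miss                 # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranked list: positive scores are higher in the high-expression group
genes = ["A", "B", "C", "D"]
scores = [2.0, 1.0, -1.0, -2.0]
es_top = enrichment_score(genes, scores, {"A", "B"})
es_bottom = enrichment_score(genes, scores, {"C", "D"})
print(es_top, es_bottom)
```

A positive score indicates that the pathway genes cluster at the top of the ranking, and a negative score indicates that they cluster at the bottom.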

Data processing and analysis
In this study, the BRCA-related methylation data downloaded from the TCGA database included 883 samples, comprising 96 normal tissue samples and 787 BRCA tissue samples. Differential analysis identified a total of 122 protein-coding genes with P < 0.05 and |logFC| > 1 as the cut-off condition (Fig. 1A). The CNV data included 2201 samples, comprising 1103 normal tissue samples and 1098 BRCA tissue samples. A total of 19178 genes with CNV were found based on the chi-square test results (P < 0.05) (Supplementary material 1). Differential analysis of gene expression data between 112 normal tissue samples and 1096 cancer tissue samples identified 2138 dysregulated genes, including 1375 upregulated genes and 763 downregulated genes (Fig. 1B), with P < 0.01 and |logFC| > 2 as the cut-off condition.
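The cut-off conditions above amount to a simple filter on each gene's P value and log fold change. A minimal Python sketch is given below; the function name and the records are hypothetical and do not reproduce the study's data, and the defaults correspond to the methylation cut-off (P < 0.05, |logFC| > 1).

```python
def filter_differential(results, p_cutoff=0.05, logfc_cutoff=1.0):
    """Split genes passing the cut-offs into up- and down-regulated lists.

    results : iterable of (gene, log_fold_change, p_value) tuples
    """
    up, down = [], []
    for gene, logfc, p_value in results:
        if p_value < p_cutoff and abs(logfc) > logfc_cutoff:
            (up if logfc > 0 else down).append(gene)
    return up, down

# Hypothetical differential-analysis output
records = [
    ("G1", 2.5, 0.001),   # passes, upregulated
    ("G2", -1.8, 0.01),   # passes, downregulated
    ("G3", 0.4, 0.0001),  # fold change too small
    ("G4", 3.0, 0.2),     # not significant
]
up, down = filter_differential(records)
print(up, down)
```

For the expression data, the stricter thresholds would be passed explicitly, for example `filter_differential(records, p_cutoff=0.01, logfc_cutoff=2.0)`.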

Multi-layer Correlation Analysis Method To Screen Key Genes
To reduce the number of pairwise correlation calculations, we restricted the correlation analysis to genes with abnormal methylation. First, we found that 105 of the 122 genes with aberrant methylation were also abnormally expressed. Correlation analysis showed that the aberrant methylation of 25 genes was closely related to their expression, with the Pearson correlation coefficient Cor > 0.4 as the screening criterion (Table 1). Interestingly, these 25 genes also harbored CNV (Fig. 1C). To explore how CNV acts in disease progression, we performed a correlation analysis of CNV with methylation and with abnormal gene expression for the 25 genes. With P < 0.01 as the cut-off criterion, the CNV of 12 genes was associated with the level of methylation, and the CNV of 16 genes was related to the abnormal expression level. Six genes were common to both sets (Fig. 2, Supplementary material 2A, 2B).
We used these six genes as key genes for prognostic survival analysis.

Joint survival analysis and site-related prognostic assessment to identify biomarkers
Through joint survival analysis, it was found that the combination of methylation and abnormal expression of HPDL and SOX17 was significantly associated with the prognosis of BRCA patients.
Furthermore, the results showed that patients with high methylation and low expression of HPDL and SOX17 had a poor prognosis (Fig. 3A). In addition, using the R survival package, we analyzed the effects of the methylation sites of these two genes on patient survival. P < 0.05 was used as the screening criterion for predicting prognosis, and specific methylation sites associated with prognosis were found for these genes. Among them, two methylation sites of HPDL and eight methylation sites of SOX17 can affect the survival time of patients (Fig. 3B).
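One way to form the joint methylation and expression groups compared in such an analysis is sketched below in Python. The median split is an assumption made for illustration, since the grouping rule is not stated here; the patient identifiers, values, and function name are likewise hypothetical.

```python
from statistics import median

def joint_groups(methylation, expression):
    """Label each patient by methylation and expression of one gene.

    Both arguments map patient ID -> value. Patients are split at the
    cohort median of each measure (an assumed rule, not the study's).
    """
    shared = sorted(set(methylation) & set(expression))
    meth_cut = median(methylation[p] for p in shared)
    expr_cut = median(expression[p] for p in shared)
    groups = {}
    for p in shared:
        meth = "high_meth" if methylation[p] > meth_cut else "low_meth"
        expr = "high_expr" if expression[p] > expr_cut else "low_expr"
        groups[p] = meth + "/" + expr
    return groups

# Hypothetical values for one gene in four patients
meth = {"P1": 0.9, "P2": 0.8, "P3": 0.2, "P4": 0.1}
expr = {"P1": 1.0, "P2": 2.0, "P3": 8.0, "P4": 9.0}
groups = joint_groups(meth, expr)
print(groups)
```

Survival curves can then be estimated separately for each label, for example for the high-methylation, low-expression group against the remaining patients.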

Kaplan-Meier survival curve analysis of the effect of gene CNV on patient prognosis
The genes HPDL and SOX17 showed not only methylation abnormalities and abnormal expression, but also CNV. Further analysis showed that CNV in HPDL and SOX17 was associated with overall patient survival: the gain of two copies of SOX17 was associated with a lower survival rate, while a decrease in the copy number of HPDL also suggested a poor prognosis (Fig. 3C). In addition, as the CNV of HPDL and SOX17 is related to methylation and abnormal expression levels, our research indicated that the CNV of HPDL and SOX17 can directly affect patient prognosis and can also indirectly affect patient survival time by affecting the methylation and expression levels of the corresponding genes.

GSEA analysis of patients with low and high expression of HPDL and SOX17
To identify the molecular pathways underlying the biological functions and effects of HPDL and SOX17 in BRCA progression, we used GSEA to identify key pathways that differ between patients with low and high expression of the genes. With a p value < 0.05 as the screening standard, the results indicated that the pathways mainly affected by HPDL included the MAPK signaling pathway, the p53 signaling pathway, etc. In addition, SOX17 mainly affects the JAK-STAT signaling pathway, the Wnt signaling pathway, and so on (Table 2, Fig. 4).
Table 2 The key pathways that differ between patients with low and high expression, based on GSEA
gene promoter methylation can be used as a tumor suppressor and dysregulated oncogene (via aberrant DNA methylation) in many tumors, such as lung adenocarcinoma [26], cholangiocarcinoma [27], esophageal squamous cell carcinoma [28], colorectal cancer [29], non-small cell lung cancer [30], endometrial cancer [31], and so on. In BRCA, Fu Deyuan et al. used methylation-specific polymerase chain reaction to assess the relationship between methylation of the SOX17 gene promoter and the onset and prognosis of BRCA. Abnormal SOX17 methylation in cancer tissues and plasma DNA was found to be significantly associated with tumor lymph node metastasis and with poor disease-free survival (DFS) (P < 0.005) and overall survival (P < 0.005). In addition, SOX17 methylation in plasma DNA is an independent prognostic factor for DFS in BRCA [32]. Chimonidou Maria et al. found that the SOX17 promoter is highly methylated in primary breast tumors, in CTCs isolated from patients with BRCA, and in corresponding cfDNA samples, which provides new approaches for predicting recurrence and prognosis in patients with operable BRCA and in metastatic patients [33,34]. HPDL may have dioxygenase activity. Previous studies have found that HPDL exhibits differential expression in central nervous system (CNS) lymphoma compared with non-primary CNS lymphoma [35]. However, the role of HPDL in BRCA requires further research and interpretation, which offers a direction for in-depth study of the molecular mechanism of BRCA.
Intracellular signaling pathways regulate various cellular activities. We performed GSEA on SOX17 and HPDL to further explore the small molecule regulation mechanism of BRCA and found signaling pathways whose enrichment differs significantly between patients with low expression and high expression. It is well known that the JAK-STAT signaling pathway, a signal transduction pathway stimulated by cytokines, is involved in biological processes such as cell proliferation, differentiation, apoptosis, and immune regulation, and is associated with the pathogenesis of many tumors, such as liver cancer, ovarian cancer, and BRCA [36][37][38]. The major cellular processes during BRCA development rely on JAK/STAT signaling to coordinate growth factor function. Previous studies have found that activation of the JAK/STAT pathway is common in triple-negative BRCA, where it can affect the expression of genes controlling immune signals. Dysregulated JAK/STAT signaling has been implicated in BRCA metastasis and is associated with a high risk of recurrence [39][40][41]. The Wnt signaling pathway plays a crucial role in early embryonic development, organ formation, tissue regeneration, and other physiological processes, often involving stem cell control, and may induce cancer if a key protein is mutated [42]. The Wnt signaling pathway is involved in the onset and treatment of colorectal cancer, pancreatic cancer, gastric cancer, and other tumors [43][44][45]. Yang Feibiao et al. confirmed that SOX17 is a target gene of miR-194-5p. In mouse studies, knockdown of miR-194-5p in BRCA cells may increase SOX17 expression and regulate the Wnt/β-catenin signaling pathway [46].
Therefore, increased expression of SOX17 is associated with activation of the Wnt signaling pathway and is thus involved in the pathogenesis of BRCA. In addition, the enrichment results of SOX17 include pathways related to cell growth, division, and proliferation, such as oocyte meiosis, ABC transporters, and neuroactive ligand-receptor interaction. HPDL upregulation is related to the cell cycle and the p53 signaling pathway. HPDL downregulation is related to the MAPK signaling pathway and the TGF-β signaling pathway.
Both the cell cycle and p53 signaling pathways are involved in cell division and proliferation. The p53 gene is called the "guardian of the genome", but when p53 is deregulated, it participates in the development and proliferation of various tumor cells [47]. Both the MAPK and TGF-β signaling pathways are involved in cell growth, differentiation, and apoptosis. In recent studies, abnormal activation of the MAPK signaling pathway has been found to favor the abnormal proliferation of malignant cells [48]. TGF-β signaling acts as a suppressor and an inducer of tumor progression during the early and late stages of cancer, respectively, and can trigger a cascade of reactions that mobilize cancer cells [49,50].
Recent studies have demonstrated the consequences of genetic variation in regulating the overall risk associated with BRCA patients. In this study, we explored the effects of CNV and DNA methylation on gene expression levels and OS of BRCA patients and found that CNV can affect DNA methylation levels. CNV and methylation of SOX17 and HPDL are related to their expression and regulation.
The CNV of SOX17 and HPDL was also correlated with methylation levels, and we found methylation sites of SOX17 and HPDL associated with BRCA prognosis. DNA methylation is an effective regulator of gene expression. If a CpG island is located in the promoter region of a gene, methylation of the CpG island will significantly reduce or even completely silence transcription of the gene and thereby affect protein expression. In this study, due to data and conditional restrictions, we did not distinguish whether each site was located in the promoter or elsewhere in the DNA when screening prognosis-related methylation sites; we will explore this in the next study. Finally, by enriching the pathways associated with low and high expression of SOX17 and HPDL, pathways related to BRCA progression were identified, including the JAK-STAT, Wnt, p53, and MAPK signaling pathways.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analysed during this study are included in this published article.