Epigenetics and Prognostic Survival Analysis of Muc Gene Family in Breast Cancer Based on Bioinformatics Analysis

Background: To explore the epigenetic and biological functions of muc members in breast cancer. Results: Mutation analysis showed that most of the muc members had missense mutations and the breast cancer patients with muc13 had a poor prognosis. Methylation analysis showed that the methylation expression of muc1, muc4, muc5AC and muc13 was related closely. The differentially expressed muc1, muc13 and muc15 had a poor prognosis and were related to immune inltrating lymphocytes in breast cancer microenvironment. Conclusions: The epigenetic basis and biological functions of muc members are closely related to the occurrence and development of breast cancer.


Background
Breast cancer(BC) is the most common cancer in women, accounting for 30% of all cancers [1]. In recent years, more convincing evidence shows that epigenetic modi cations such as chromatin remodeling and DNA methylation play key roles in the early stages of breast cancer [2]. Identi cation of reliable molecular markers for BC prognosis could facilitate individualized treatment.
Mucins(MUC) are a diverse family of high molecular weight glycoproteins widely expressed in epithelial tissues [3].To date, there are 21 members in the mucin family(https://www.genenames.org/).The presence of carbohydrate antigens can often be observed on the surface of tumor cells and is related to the poor prognosis of cancer patients [4]. CA15-3, also known as MUC1, has been identi ed as a biomarker for breast cancer diagnosis for its elevated serum level [5]. In addition to its protective effect in normal physiology, MUC16 (CA125) contributes to the disease progression and metastasis of certain malignant tumors. It is promising to be regarded as a prognostic target for diagnosis and treatment in cancers, such as gastric cancer, lung cancer, ovarian cancer and cervical cancer [6][7][8][9][10]. Studies have shown that MUC members are widely expressed in gastrointestinal tumors and can serve as potential targets for their prognosis [11][12][13][14][15][16]. However, the epigenetic and biological functions of muc members in breast cancer have not been systematically studied.
In general, the role of muc members in different cancers has received widespread attention. Here, we use the online public database The Cancer Genome Atlas(TCGA) and online analysis software, combining bioinformatics knowledge, to comprehensively explore the relevant mechanisms of muc members in breast cancer.

Mutation Analysis of Muc Gene Family in Breast Cancer
In breast cancer, a total of 16 muc family members have mutations. Among them, muc16 has the highest number of mutations, followed by muc4. In the analysis of the mutation types, it is found that the muc16, muc4, muc17, and muc5B of the top 5 muc genes had the most common missense mutations, and the mutation of muc2 was mainly the 3'UTR region. The overall analysis showed that non-synonymous mutations per megabase are larger than synonymous mutations ( gure1A).
We also analyzed the chromosomal loci of the mutant gene. In the analysis of breast cancer muc mutant genes, we found that chromosomes 1, 3, 4, 6, 7, 11, 12 and 19 have mutations. Among them, the mutation frequency of Chromosomes 11 and 19 were higher than others (shown in red) ( gure1B).
For genes with mutations greater than 10, we identi ed them as highly mutated genes. In breast cancer samples, there were six highly mutated genes in the muc family includig muc1, muc12, muc13, muc15, muc16, and muc17. We divided the samples into mutants and wild-types, and analyzed the survival prognosis of these highly mutated genes. The results showed that the muc13 mutation had a poor prognosis which was statistically signi cant (p=0.027). Also, we found that the survival period of mutant genes was signi cantly shorter than that of genes without mutation ( gure2).

Expression Status of MUC Members in Breast Cancer Tissues
Based on the TCGA database, we found that fourteen muc members were expressed in breast cancer tissues. We used the limma package to analyze differentially expressed genes ( gure3A) and the pheatmap package for visualization ( gure3C). Compared with normal tissues, we discovered that breast cancer tissues with |logFC|>1 and FDR<0.05 had signi cant differences. Among them, muc1, muc5AC, muc5B, muc6, muc13, muc21 were up-regulated, muc7 and muc15 were down-regulated, and others showed no signi cant difference.
Furthermore, we performed a correlation analysis of muc members expressed in breast cancer, which showed that muc members were highly related to each other ( gure3B).

Correlation Analysis of Muc Methylation and Expression in Breast Cancer
Gene promoter region methylation is one of the most common mechanisms that affect gene expression in the development of human cancer. We found that eight muc genes were methylated in breast cancer, of which six genes (muc1, muc5AC, muc5B, muc13, muc15, muc17) were signi cantly down-regulated (p<0.05). At the same time, we conducted a correlation analysis of the methylated muc gene and its expression. The correlation analysis showed that muc1 methylation was negatively correlated with the expression level. Methylation of muc4, muc5AC, and muc13 was positively correlated with expression (all p values <0.05) ( gure4).

Prognosis and Clinical Analysis of MUC Members in Breast Cancer
Based on Kaplan-Meier Plot (http://www.kmplot.com), an online analysis website dedicated to prognostic analysis of survival, we selected muc members with differential expression in breast cancer for prognostic survival analysis, and found the high expression of muc1, muc7 and muc13 and the low expression of muc15 represented a poor survival period ( gure5).
Also, we made relevant subgroup analysis based on the clinically different types of breast cancer patients. For lymph-positive patients the higher expression of muc15 and muc1 indicated a poor prognosis (Table 1). From the perspective of clinical stage we discovered the higher expression of muc1 gene in patients with clinical stage III represented a poor prognosis( Table 2). We also explored the di ent breast cancer classi catin.Among them, the higher expression of muc1 gene in triple-negative breast cancer patients indicated a poor prognosis, and the hjgher expressions of muc3A, muc7, muc1 in patients with luminalA type indicated poor survival of patients. The expression of muc20, muc13 and muc1 in patients with luminalB type represented a poor prognosis. In addition, the expression of muc3A and muc5AC in patients with her2 overexpression type predicted the poor prognosis (Table 3).

Correlations Between TIICs and MUC Members in Breast cancer
In recent years, more and more people believe that tumor immune in ltrating cells play an important role in the occurrence and development of tumors. And they are closely related to the progression and prognosis of cancer. Here we used the Sangerbox tool to analyze statistically differentially expressed muc members and six types of immune in ltrating cells based on the Timer database. There was a signi cant negative correlation between muc1 gene expression and immune in ltrating lymphocytes. On the contrary, muc13 and muc15 had a positive correlation with tumor immune in ltrating cells( gure6).

GSEA Multi-channel Enrichment Analysis
In order to explore the potential mechanism of muc members with statistically differential expression in the occurrence and development of breast cancer, we conducted a GSEA multi-pathway enrichment analysis. Here, we focused on the muc1, muc13 and muc15 genes with poor prognosis above. GSEA multi-pathway enrichment analysis indicated that the high expression of muc1 was negatively correlated with the pathways of "cell cycle", "cysteine and methionine metabolism" and "DNA replication", while was positively correlated with the pathways of "peroxisome", "SNARE interactions in vesicular transport" and "sphingolipid metabolism". The high expression of muc13 was positively correlated with the pathways of "antigen processing and presentation", "cell adhesion molecules cams", and "cytokine cytokine receptor interaction". The high expression of muc15 was positively correlated to the pathway of "cell adhesion molecules cams" and "chemokine signaling pathway", "cytokine cytokine receptor interaction" and "FC gamma r mediated phagocytosis"( gure7).

Discussion
MUC gene family encoded a total of 21 mucins, which were divided into secreted and membrane-related genre [3]. The former type of mucin protein included MUC2, MUC5, MUC6, MUC7, MUC8, MUC9, and MUC19. While MUC1, MUC3, MUC4, MUC11, MUC12, MUC13, MUC15, MUC16, MUC17, MUC20, and MUC21 were bound with cell membranes [17].Those transmembrane MUC protein contained a hydrophobic domain [3]. Whether the secreted or membrane-related mucins, they all perform signi cant role across surface of lumen in human as physical defense. Detailly, different mucins still own special and selective function, which was dependent to highly O-glycosylated amino acids rich in Ser/Thr/Pro [18]. Such sugar modi ed structure endows plentiful possibilities to serve biological behaviors [19].Despite of various O-glycosylated modi cation of mucins, MUC genes share similar characteristics, a repeated sequence of de ned nucleotides, which make them a set of gene family on the level of DNA [20,21].
Increasing clinical evidence have showed that aberrant manifestation of MUC gene family was connected with human cancers. There have been studies found that MUC2 and MUC5AC expression were related with the prognosis for colorectal carcinoma (CRC) [22]. MUC16 has been known as important diagnostic marker of ovarian cancer [23]. And for pancreatic cancer, MUC4 seemed to be hopeful to distinguish adenocarcinoma and chronic pancreatitis [24]. With more and more bioinformation platforms or tools published, we were provided with a bulk of data to trace full expression pro les of MUC gene family in breast cancer and dig out their practical values in clinical. Our study was to demonstrate expression pattern, mutation, methylation and prognosis of MUC gene members, as well as their relationship with tumor immune in ltration and possible biological processes involved in.
According to our results, MUC1 have exhibited difference from other members in breast cancer patients. Consistent with previous studies, MUC1 is overexpressed in breast cancer cells compared to normal breast epithelia, especially in TNBC [25][26][27]. Nearly 40% breast neoplasm tissues were detected with ampli cation of MUC1 gene, as well as the upregulation of MUC1 mRNA and protein level [28]. Some studies showed that miR-125b and miR-145 could bind to 3'UTR of MUC1 gene to suppress its expression [29,30]. There were reports claimed MUC1 was interacted with EGFR on the membrane of cells. Two of these molecules formed complexes through galectin-3 in extracellular region [31]. Then MUC1 cytoplasmic domain would be phosphorylated by EGFR or other tyrosine kinases. In turn, EGFR was activated to enter the nuclear dependent on PI3k promoted by means of its SH2 domain binding with MUC1 cytoplasmic sites [32]. As we mentioned before, MUC1 was negatively relevant with immune response in breast cancer. It has been found that anti-MUC1 antibodies in patients' serum might improve immune reaction of our body by strengthening cytotoxicity, killing those circulating tumor cells carrying MUC1 marker [33]. Detailly, MUC1 could downregulate T cells and NK cells functions directly [34,35].
Additionally, MUC1 induced PD-L1 through MYC and NF-κBp65 pathway was reported to be hopeful method as supplement for immune treatment. Furthermore, increased PD-L1 not only help cancer cells evade immune system, but also contribute to EMT process in TNBC [36,37].
MUC4, also known as sialomucin complex in rats, keep in low level at nonpregnant mammary gland. When it comes to pregnancy, it will boost up 100 times [38]. Thus, it was presumed MUC4 might be involved in immunity for babies. So, it was strictly regulated in a spatiotemporal pattern. While a higher abundance of MUC4 was also observed in breast cancer metastasized into ascites [39]. Mechanically, MUC4 would combine with ErbB2 following ErbB3 ligand with NRG. Then four of them formed a heteromeric complex to activate PI3K-Akt and MAPK pathways, inhibiting cancer cells apoptosis [40].
In regard to MUC13 and MUC15, even though our data showed they might be correlated with breast cancer from diverse aspect, like mutation, methylation, overall survival and relationship with tumor immune in ltrating cells. However, little was known about how MUC13 and MUC15 affect breast cancer.
And we believed it was worth investigating in depth. Brie y, MUC13 was supposed to be involved in gastric cancer [42], colon cancer [43] and ovarian cancer [44]. Aberrant expression was thought to be connected with inhibiting cancer cells apoptosis signal pathway [45]

MRNA Mutation Data of Muc Family Members in The Cancer Genome Atlas (TCGA) Database
TCGA is currently the largest cancer gene information database. We use the GDC tool ( https://portal.gdc.cancer.gov/) to download breast cancer gene mutation data from the TCGA database, and download 986 breast cancer-related gene mutation samples. We use perl (version 5.28.1) to extract the relevant expression matrix, and use R software (version 4.0.2) to analyze the mutation gene matrix for the mutation frequency, mutation type and chromosome related mutation sites of the MUC gene family. We performed prognostic correlation analysis on muc family genes with mutations greater than ten.

Comparison of the mRNA Expression of the MUC Family in Breast Cancer and Normal Tissues
We downloaded the mRNA expression pro le data of 1178 breast cancer-related samples from the TCGA database, including 1066 breast cancer patients and 112 adjacent tissues. We used the edgeR and gplots packages to analyze the differential expression of breast cancer and adjacent samples of muc family members. At the same time, we used the online analysis tool Sangerbox (http://www.sangerbox.com/tool) to conduct a correlation analysis of muc family members.

Correlation Between Methylation and mRNA Expression of the MUC Family in Breast Cancer
We downloaded the Illumina Human Methylation 27K breast cancer-related gene methylation data from the TCGA database, and used the limma package to analyze the differential expression of methylated genes. Also, we analyzed the correlation between the muc methylation level and the gene expression level.

Survival Analysis of MUC Members in Breast Cancer
The prognostic survival analysis of muc family members was divided into two aspects. On the one hand, we used online tool analysis Kaplan-Meier Plot (http://www.kmplot.com) to analyze the overall prognosis of muc family members. On the other hand, in order to better study the survival of muc members in breast cancer patients with different factors (including ER, Her2, classi cation, staging, lymph nodes, etc.), we conducted a single-factor survival prognosis analysis.

Associations Between Tumor Immune In ltrating Cells (TIICs) and the MUC Family
In order to study the relationship between muc gene family and tumor immune in ltrating cells, we used the Sanger Box tool to perform correlation analysis based on the TIMER platform, including (B cells, T Cells, neutrophils, macrophages, dendritic cells).

Gene Set Enrichment Analysis (GSEA)
In order to evaluate the potential mechanism of the role of MUC family members in breast cancer, we conducted a GSEA (GSEA_4.0.3 version) analysis to determine the potential pathways for the differentially expressed muc gene family of breast cancer in the TCGA database. Annotated gene set le c2.cp.kegg.v7.2.symbols.gmt (from Msig database) was a reference. The number of random combinations of muc gene differential expression was 1000 permutations and the false discovery rate (FDR) was <0.05 to determine the way of signi cant enrichment. Methods:

MRNA Mutation Data of Muc Family Members in The Cancer Genome Atlas (TCGA) Database
TCGA is currently the largest cancer gene information database. We use the GDC tool ( https://portal.gdc.cancer.gov/) to download breast cancer gene mutation data from the TCGA database, and download 986 breast cancer-related gene mutation samples. We use perl (version 5.28.1) to extract the relevant expression matrix, and use R software (version 4.0.2) to analyze the mutation gene matrix for the mutation frequency, mutation type and chromosome related mutation sites of the MUC gene family. We performed prognostic correlation analysis on muc family genes with mutations greater than ten.

Comparison of the mRNA Expression of the MUC Family in Breast Cancer and Normal Tissues
We downloaded the mRNA expression pro le data of 1178 breast cancer-related samples from the TCGA database, including 1066 breast cancer patients and 112 adjacent tissues. We used the edgeR and gplots packages to analyze the differential expression of breast cancer and adjacent samples of muc family members. At the same time, we used the online analysis tool Sangerbox (http://www.sangerbox.com/tool) to conduct a correlation analysis of muc family members.

Correlation Between Methylation and mRNA Expression of the MUC Family in Breast Cancer
We downloaded the Illumina Human Methylation 27K breast cancer-related gene methylation data from the TCGA database, and used the limma package to analyze the differential expression of methylated genes. Also, we analyzed the correlation between the muc methylation level and the gene expression level.

Survival Analysis of MUC Members in Breast Cancer
The prognostic survival analysis of muc family members was divided into two aspects. On the one hand, we used online tool analysis Kaplan-Meier Plot (http://www.kmplot.com) to analyze the overall prognosis of muc family members. On the other hand, in order to better study the survival of muc members in breast cancer patients with different factors (including ER, Her2, classi cation, staging, lymph nodes, etc.), we conducted a single-factor survival prognosis analysis.

Associations Between Tumor Immune In ltrating Cells (TIICs) and the MUC Family
In order to study the relationship between muc gene family and tumor immune in ltrating cells, we used the Sanger Box tool to perform correlation analysis based on the TIMER platform, including (B cells, T Cells, neutrophils, macrophages, dendritic cells).

Gene Set Enrichment Analysis (GSEA)
In order to evaluate the potential mechanism of the role of MUC family members in breast cancer, we conducted a GSEA (GSEA_4.0.3 version) analysis to determine the potential pathways for the differentially expressed muc gene family of breast cancer in the TCGA database. Annotated gene set le c2.cp.kegg.v7.2.symbols.gmt (from Msig database) was a reference. The number of random combinations of muc gene differential expression was 1000 permutations and the false discovery rate (FDR) was <0.05 to determine the way of signi cant enrichment.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
Publicly available datasets were analyzed in this study, these can be found in The Cancer Genome Atlas (TCGA) program at https://portal.gdc.cancer.gov/.

Competing interests
The authors declare that they have no competing interests.