Integrative analysis of AGR2 based on data mining reveals new insights in breast cancer

Background: Belonging to the protein disulde isomerase (PDI) family, anterior gradient 2 (AGR2) is overexpressed in mucus-secreting cells and endocrine organs such as breast, lung, and ovarian, but its exact function and regulation remain unclear. We aimed to perform integrative analysis of AGR2 in breast cancer. Methods: In our study, a serious of bioinformatic online databases were used to analyze the expression, regulation, prognostic value, and function of AGR2 in breast cancer. Results: The results suggested that AGR2 mRNA was overexpressed in non-basal-like or non-triple-negative breast cancer, but underexpressed in basal-like/ triple-negative breast cancer. AGR2 mRNA expression was correlated with its protein expression. Moreover, the expression of AGR2 was lower in more malignant breast cancer. Furthermore, we also found that DNA methylation, rather than copy number variation, led to AGR2 mRNA overexpression. Prognosis analysis showed no signicant correlation between AGR2 level and survival, but in subtype investigation, patients with higher AGR2 level had a signicantly better outcome in patients with basal-like / triple-negative breast cancer. Enrichment analysis for co-expression genes of AGR2 revealed that gene sets enriched for chromosome segregation, DNA conformation change, chromosomal region, and GABA receptor activity may play important roles in breast cancer, and that the most signicant pathway was “ribosome biogenesis in eukaryotes”, in which negative co-expression genes of AGR2 were enriched. Conclusions: Overexpression of AGR2 was found in less malignant breast cancer, for the rst time, our results propose that DNA methylation rather than copy number variation leads to AGR2 overexpression, which can predict favorable prognosis in basal-like/ triple-negative breast cancer, and AGR2 could be a possible regulator of ribosome biogenesis in patients with breast cancer.

as pancreatic, colorectal cancer) 3 . It has been widely reported that AGR2 is associated with the development of estrogen receptor (ER) positive breast cancer 4 . However, the actual functions and regulation of AGR2 are poorly comprehended, and the prognostic value of AGR2 remains inconsistent.
With the development of bioinformation technology, diverse public databases and analytical strategies, data mining techniques has been widely accepted in medical researches. To develop a better understanding about the role of AGR2 in BRCA, we performed integrative analysis based on data mining by using public web tools to explore the expression, regulation, function, and prognostic value of AGR2 in BRCA.

AGR2 Expression Analysis
The expression of AGR2 mRNA was examined with two web-based databases, including GEPIA2 and Breast Cancer Gene Expression Miner v4.4 (bc-GenExMiner v4.4). First, we evaluated the AGR2 mRNA expression in pan-cancer and breast cancer by using GEPIA2, which is a web tool for Wide-ranged expression pro ling and interactive analysis with data from The Cancer Genome Atlas (TCGA) database and the Genotype-Tissue Expression (GTEx) database (http://gepia.cancer-pku.cn/) 5 . We also explored the AGR2 expression according to different clinicopathological characteristics of breast cancer, including hormonal receptors status, Her-2, age, nodal status, histological subtypes, PAM50 subtypes, Nottingham prognostic index (NPI), and Scarff-Bloom-Richardson(SBR) grade 6,7 , by using bc-GenExMiner v4.4, which is a user-friendly online mining platform with published annotated data of breast cancer (http://bcgenex.centregauducheau.fr) 7 . The expression of AGR2 protein was studied with The Human Protein Atlas (HPA) database and UALCAN database. IHC results of AGR2 in breast cancer and normal breast tissue were obtained from HPA, which is an online program containing human transcriptome and proteome data generated from RNAsequencing analysis and immunohistochemistry analysis 8 . We explored AGR2 protein expression in normal breast tissues and malignant counterparts by using UALCAN, which is a web portal for performing analyses of gene expression data from TCGA (http://ualcan.path.uab.edu) 9 . Correlation between mRNA and protein expression of AGR2 was determined by using The cBioPortal database, which is a portal for integrative analysis of cancer genomics and clinical pro les (http://cbioportal.org) 10 . P < 0.05 was considered statistically signi cant.

AGR2 Gene Alteration Analysis
DNA copy number of AGR2 in breast cancer was analyzed within the Oncomine 4.5, which is an outstanding database for integrated data-mining with cancer microarray data (www.oncomine.org) 11 . The cBioPortal database was used to analyze AGR2 gene alteration with data from the Breast Invasive Carcinoma (TCGA, Firehose Legacy) samples. An overview of genetic alterations per sample in AGR2 is displayed in the tab OncoPrint. In the tab Survival, the Kaplan-Meier analysis was used to evaluate the effect of AGR2 alterations on the patient's overall survival (OS) and disease-free survival (DFS). The tab Comparison shows correlation between AGR2 mRNA expression and CNV, DNA methylation or protein expression. We also analyzed correlation between AGR2 CNV and DNA methylation status. P < 0.05 was considered statistically signi cant.

AGR2 DNA Methylation Analysis
This part was performed by using three databases, including UALCAN, Wanderer and Methsurv. We rst explored AGR2 DNA methylation status in normal breast tissues and malignant counterparts by using UALCAN, then obtained data of ve probes from Methsurv (https://biit.cs.ut.ee/methsurv), which is an online database for multivariable prognostic analysis using DNA methylation data 12 . We investigated AGR2 DNA methylation status in BRCA and normal tissues and relevance between AGR2 DNA methylation status and mRNA expression at all the ve probes respectively by using Wanderer database, which is an interactive online tool to investigate data of DNA methylation and gene expression in human cancer from TCGA (http://maplab.cat/wanderer) 13 . Within Methsurv database, We explored prognosis values of AGR2 DNA methylation in BRCA. P 0.05 was considered statistically signi cant.
2.4 Survival Analysis of AGR2 using GEPIA2, TCGAportal and bc-GenExMiner v4.4 Within GEPIA2 database, we performed survival analysis of AGR2 in breast cancer including OS and DFS.
Then the prognostic value of AGR2 was evaluated in patients with different subtypes of breast cancer by using TCGAportal, which is a newly released web tool that allows for analysis of survival curve with data from TCGA (http://www.tcgaportal.org) 14 . Afterword, we veri ed the results in bc-GenExMiner v4.4. P 0.05 was considered statistically signi cant.

Co-expression Genes of AGR2 and Functional Enrichment Analysis
In this part, we performed analysis by using LinkedOmics database, which is an online database that integrates data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and contains multiomics data and clinical data from TCGA (http://www.linkedomics.org/ login.php) 15 . The module of LinkFinder was used to analyze associated genes of AGR2 in the TCGA BRCA cohort (n = 1093). The module of Link-Interpreter module was used to conduct functional analysis of associated genes of AGR2 in the Web-based Gene SeT AnaLysis Toolkit (WebGestalt) 16 . GSEA was employed to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. GO functional analysis contained three categories, including biological process (BP), cellular component (CC) and molecular function (MF). P 0.05 and FDR < 0.05 were considered statistically signi cant.

Protein-Protein Interaction (PPI) Network
PPI network for genes enriched in the pathway of ribosome biogenesis in eukaryotes was established by using GeneMANIA database, which is a web interface for fast gene network construction and function prediction (http://www.genemania.org) 17 . The result contained the source of the edge of the network and a number of bioinformatics methods, including physical interaction, gene co-location, gene co-expression, gene enrichment analysis and website prediction.

Protein Expression of AGR2 in BRCA
The UALCAN database was used to assess the different levels of AGR2 in BRCA and normal breast tissues, and showed that AGR2 protein level was higher in BRCA (P = 8.59E-11, P) ( Fig. 3C). To validate the IHC results, the HPA database was used to obtain AGR2 expression in BRCA and normal tissue, and showed that AGR2 level (antibody HPA007912) was upregulated in BRCA compared to normal tissue ( Fig. 3A-B). Furthermore, correlation was found between AGR2 protein expression (mass spectrometry by CPTAC) and mRNA level (microarray) in cBioPortal database (Spearman: 0.81, P = 3.43E-18; Pearson: 0.79, P = 4.64E-17) (Fig. 3D).

Gene alteration of AGR2 in BRCA
The Oncomine database revealed that DNA copy number of AGR2 was signi cantly higher in invasive BRCA than corresponding normal tissues (P = 0.027) (Fig. 4C). Afterword, we used the cBioPortalto database to determine the type and frequency of AGR2 alteration in BRCA patients based on data from Breast Invasive Carcinoma (TCGA, Firehose Legacy), then performed correlation analysis between AGR2 copy number variation (CNV) and mRNA expression, and prognostic analysis of AGR2 CNV. AGR2 was altered in 76 of 1093 (7%) BRCA patients, and these alterations were mRNA ampli cation in 13 cases (1.19%), mRNA low in 61 cases (5.58%), and multiple alterations in 2 cases (0.18%) (Fig. 4A-B). Hence, the most common type of AGR2 CNV is ampli cation in BRCA. Correlation analysis showed that AGR2 mRNA expression was not associated with CNV (Spearman: 0.08, p = 9.166e-3; Pearson 0.07, p = 0.0313) (Fig. 4D). Survival analysis suggested that AGR2 CNV was not associated with overall survive (OS) (p = 0.776) or disease free survive (DFS) (p = 0.0974) in BRCA (Fig. 4E-F).

DNA methylation of AGR2 in BRCA
In this part, we rst compared different status of AGR2 DNA methylation between BRCA and normal breast tissues by using UALCAN. It showed that AGR2 DNA methylation status was signi cantly lower in BRCA than normal breast tissues (P < 1E-12) (Fig. 5A). Then in cBioPortal database, we found that AGR2 mRNA expression (mircoarray) was negatively associated with its DNA methylation status (HM450) (Spearman: -0.76, P = 4.21E-42; Pearson: -0.65, P = 1.15E-27) (Fig. 5B). We also analyzed correlation between AGR2 DNA methylation status and CNV, but the result was negative (Fig. 5C). Furthermore, details of ve probes were available by using MethSurv database (Supplementary Table 1). The methylation status of AGR2 at the ve probes were all lower in BRCA than corresponding normal tissue according to Wanderer database (Fig. 5D). Survival analysis showed signi cant result only at probe cg 25343204 (p = 0.036) by using Methsurv database, lower AGR2 DNA methylation status indicating better prognosis (p = 0.036) (Fig. 6F). Interestingly, among all the ve probes, AGR2 DNA methylation status had the strongest negative correlation to its mRNA expression at probe cg25343204 (Pearson r=-0.646) (Fig. 6A).

Prognosis analysis of AGR2 in BRCA
We investigated prognostic value for AGR2 in BRCA by using GEPIA2, TCGAPotal and bc GenExMiner v4.4. According to GEPIA2, no correlation was found between AGR2 mRNA and OS (p > 0.05) or DFS (p > 0.05) (Fig. 7A-B). According to TCGAPotal, subtype analysis was performed and AGR2 showed no signi cant correlation to survival in Luminal A, Luminal B or Her-2 subtype respectively, but in basal-like subgroup, increased AGR2 was associated with better prognosis (p = 0.00061) (Fig. 7C-F). Furthermore, we veri ed it in bc GenExMiner v4.4 database, it also showed signi cant result only in basal-like subgroup (p = 0.0130), higher AGR2 expression indicating better prognosis (Fig. 7G-J).

Co-expression Genes of AGR2 and enrichment analyses in BRCA
In the LinkedOmics database, we used the Function module to analyze mRNA sequencing data 1093 BRCA patients from TCGA. Volcano plots showed the signi cant positive (red dots) and negative (green dots) co-expression genes of AGR2 (false discovery rate [FDR] < 0.05) (Fig. 8A). Heat maps showed the top 50 positive and the top 50 negative associated genes respectively (Fig. 8B-C). Then gene set enrichment analysis (GSEA) was used to perform GO and KEGG pathway analyses in the LinkInterpreter module.The Web-based Gene SeT AnaLysis Toolkit (WebGestalt) was applied. The signi cant results of GO term analysis (p < 0.05 and FDR < 0.05) were shown in Supplementary Fig. 1A-C. According to volcano plots shown in Fig. 8D-F, we speculated that gene sets enriched for chromosome segregation (BP), DNA conformation change (BP), chromosomal region (CC), and GABA receptor activity (MF) may play important roles in BRCA. Details of these gene sets are shown in Supplementary Table 2-5. The signi cant result of KEGG pathway analysis (p < 0.05 and FDR < 0.05) was shown in Supplementary   Fig. 1D, and the volcano plot shown in Fig. 8G suggested that the pathway of ribosome biogenesis in eukaryotes ( Supplementary Fig. 2) may play an important role in BRCA. Details of genes enriched in this pathway are shown in Supplementary Table 6. Finally, the protein-protein interaction network (PPI) containing genes enriched in ribosome biogenesis was constructed by GeneMANIA. It showed that these genes were mainly responsible for ribosome biogenesis, small nucleolar ribonucleoprotein and ribonucleoprotein complex biogenesis (Fig. 9).

Dicussion
In the study, we rstly evaluated AGR2 expression in BRCA. Generally, AGR2 upregulates in non-basal-like BRCA compared to normal tissues, which is consistent with previous studies 18,19 . We also found that AGR2 level was the lowest in basal-like BRCA and was even lower than in normal tissues. Consistently, Jilong Guo et al. performed immunohistochemistry and revealed that AGR2 was signi cantly upregulated in non-TNBC tissues than TNBC tissues, and was also signi cantly downregulated in TNBC than normal tissues 20 . Basal-like breast cancer, also known as triple-negative breast cancer (TNBC), lacks estrogen receptor (ER) and progesterone receptor (PR) expression as well as human epidermal growth factor receptor 2 (HER2) ampli cation. Compared with other subtypes, the TNBC subtype has poorer prognosis, higher mortality and the highest probability of recurrence after treatment. Its poor outcome is due to the lack of available targeted therapies, and its intrinsically aggressive nature 21 . We also found that AGR2 expression was higher in mucinous breast cancer (MBC) than invasive ductal cancer (IDC) and invasive lobular cancer (ILC). As a rare histological cancer type, MBC exhibits lower malignancy and a comparatively better prognosis. A study showed that patients with MBC had a signi cantly better outcome than patients with IDC 22 . Plus, lower AGR2 level was found in SBR3 group and NPI3 group. It appears that AGR2 expressions are lower in more invasive tumors, though Wang Z suggested that increased AGR2 contributes to tumor growth and invasion and metastasis formation 23 .
To understand the reasons behind AGR2 overexpression in BRCA, we explored the relationship between AGR2 mRNA expression and CNV or DNA methylation. Because of changes in gene sequence location, CNV can affect gene expression, increase disease susceptibility and aggravate disease progression 24 . Changes in DNA methylation have an impact on regulating gene expression during cancer progression and metastasis 25 . Hence, we planned to explore what may be the potential reason behind changes of AGR2 mRNA expression in BRCA at the DNA level. Our study found that AGR2 copy number was signi cantly increased in BRCA, and the main type of AGR2 alteration was ampli cation. However, AGR2 CNV was not associated with mRNA expression or survive. But we found AGR2 mRNA expression was negatively associated with DNA methylation status, lower methylation status indicating higher mRNA level. HYE YOUN SUNG et al analyzed AGR2 mRNA expression and DNA methylation data in metastasized ovarian tumor in the mice and showed mRNA overexpression and hypomethylation at CpG sites in its promoter region 26 . Furthermore, they revealed that treatment with 5-aza-2'-deoxycytidine, a kind of the DNA methyltransferase inhibitor, could reduce the level of CpG methylation in the AGR2 promoter and increase the level of AGR2 expression. We carried out further exploration and found AGR2 methylation status were lower in BRCA than normal tissues, and it showed strong negative correlation between AGR2 methylation and mRNA expression at probe cg25343204, where also showed signi cant association with survival, lower methylation status showing better outcome. Thus, for the rst time, we propose that upregulation of AGR2 mRNA in BRCA may be caused by DNA methylation rather than CNV.
The prognostic value of AGR2 remains largely inconclusive. A study demonstrated that higher level of AGR2 was related to less aggressive tumors and better prognosis in epithelial ovarian carcinoma 27 , which is in line with Ames and colleagues who found absence of AGR2 was associated with increased malignancy and tumor progression, and relapse in endometrioid and muinous subtypes 28 . AGR2 overexpression was found to contributed to the poor OS in lung cancer 29 , whereas another study showed that AGR2 overexpression did not show any prongnostic value in non-small-cell lung cancer 30 . Fritzsche FR et.al found that for patients with AGR2-expressing breast carcinomas, signi cantly longer OS became apparent than those without AGR2-expression 19 . However, other reports suggested that high AGR2 expression was related to poor outcome in BRCA [31][32][33] . In our study, no signi cant prognostic values of AGR2 were found according to GEPIA2 database. Interestingly, in subtype analysis, for patients with basal-like BRCA, higher AGR2 level showed better outcome according to TCGAPortal and bc-GenExMiner v4.4 databases. Combining data from our expression analysis of AGR2, it seems like that AGR2 plays a protective role in BRCA due to its higher level in less aggressive tumors such as Luminal subtype and MBC BRCA.
Many ndings are in favor of the hypothesis that AGR2 is an oncogene and can promote tumor cell survival, migration, and invasion; drug resistance; angiogenesis; and metastasis 2,4,23,34−36 . However, Shao-bo Tian and colleagues performed a systematic review and meta-analysis about the prognostic value of AGR2 expression in solid tumors, and speculated that AGR2 might act as a tumor suppressor in some tumor types, which needed further veri cations 37 . Hence, we performed functional analysis of AGR2 for better understanding. GO and KEGG pathway enrichment analyses of AGR2 associated genes in BRCA were performed by using GSEA. Go term analysis indicated that these genes were mainly enriched in chromosome segregation (BP), DNA conformation change (BP), chromosomal region (CC), and GABA receptor activity (MF). KEGG pathway analysis demonstrated that "ribosome biogenesis in eukaryotes" was the most signi cant signaling pathway, and genes enriched in this pathway were negatively associated with AGR2. Ribosomes play important roles in translating information contained in mRNAs into functional proteins, the nal step in the genetic programme 38 . Ribosome biogenesis can increase the size of nucleolar organizing regions (NORs) which have long been used as a marker of tumor cell proliferation that negatively correlates with patient survival 39 . Richard Y. et al. indicated that the increasing of ribosomal content in epithelial cell fates may enhance their metastatic potential in breast cancer 40 . Studies also demonstrated that drugs inhibiting ribosome biogenesis might provide a feasible therapeutic method for cancer treatment 41 . Moreover, Varsha Prakash et al. demonstrated that upregulation of ribosome biogenesis could fuel the execution of Epithelial-to-Mesenchymal Transition (EMT), which is a migratory cellular program connected with development and tumor metastasis, and inhibition of rRNA synthesis in vivo turns primary tumors into a benign, Estrogen Receptor-alpha (ERα) positive, Rictor-negative (a fundamental constituent of the EMT-promoting mammalian target of rapamycin complex 2 ) phenotype and reduced metastasis 42 . Lucia Sommerova et al. described that loss of intracellular AGR2 in cell lines caused a variation from epithelial to mesenchymal phenotype and establishment of intracellular AGR2 expression launched EMT 43 . However, other studies suggested that extracellular AGR2 could interrupt adherens junctions, disrupt basal laminin and activate broblastassociated cancer invasion, promoting epithelial morphogenesis and tumorigenesis 44 . To date, no researches about correlation between AGR2 and ribosome biogenesis have been reported. For the rst time, we speculate that AGR2 may be a potential regulator of ribosome biogenesis in BRCA. In our opinion, the function of AGR2 should be explored according to different ER status and its different locations: intracellular or extracellular.
There were still limitations to the current study. Firstly, all the results were validated by using web tools, meaning a lack of in vitro and in vivo experiments. Secondly, all the researches should be performed in different subtypes respectively, due to results may be affected by ER, PR, or Her2 status.

Conclusion
By using bioinformatics analysis, we found that higher AGR2 expression was associated with less aggressive breast cancer. Our results was the rst to demonstrated that DNA methylation rather than copy number variation led to AGR2 overexpression, which could predict favorable prognosis in TNBC, and AGR2 could be a possible regulator of ribosome biogenesis in patients with breast cancer.

DATA AVAILABILITY STATEMENT
The data that support the ndings of this study are openly available in all the online databases we used, and the websites are included within this article.

Ethics approval and consent to participate
There was no enthics associated with this work because the study didin't involve animal experiments or human specimens.          Protein-protein interaction network of the gene set that was enriched in the pathway of ribosome biogenesis in eukaryotes (GeneMANIA). Different colors of the net-work edge indicate the bioinformatics methods applied: co-expression, website prediction, pathway, physical interactions and co-localization.
The different colors for the network nodes indicate the biological functions of the set of enrichment genes.