FBLN5 Is an Underlying Common Tumor Suppressor in Breast Cancer and Thyroid Cancer

Background: Breast cancer (BC) patients have a greater risk of developing thyroid cancer (TC) than the general population. Similarly, TC patients are more likely to develop BC, suggesting an underlying common etiology. In this study, we sought to identify the potential cross-talking pathway and related molecular mechanisms conferring to the sequential development of BC and TC. Methods: We rst used Multiple Primary-Standardized Incidence Ratios (MP-SIR) Program of SEER*Stat to calculate SIR to conrm the relationship between BC and TC. Then the RNA-seq was downloaded from The Cancer Genome Atlas (TCGA). And we built a co-expression network via Weighted Gene Co-expression Network Analysis (WGCNA) and obtained the most signicant modules. The key genes were obtained by differential gene expression (DGE) analysis and WGCNA analysis. Furthermore, String database and Cytoscape software were used to construct protein-protein interactions (PPI), and dened the maximum Maximal Clique Centrality (MCC) value as hub gene.Then we performed prognosis analysis on the hub genes and obtained the prognostic genes of BC and TC. Finally, gene set enrichment analysis (GSEA) was used to investigate the molecular pathways associated with prognostic gene expressed both in BC and TC. Results: From the SEER database, we found that the risk of developing BC in TC patients was SIR 1.12, 95% CI [1.07, 1.18], and the risk of developing BC in TC patients was SIR 1.29, 95% CI [1.23, 1.26]. Fifty-nine key genes obtained by differential expression analysis and WGCNA identify that PI3K/AKT was the most enriched pathway in BC and TC. In addition, the Recombinant Fibulin 5 (FBLN5) was shown to be of signicant prognostic value for both BC and TC and was down-regulated in BC and TC tissues. GSEA demonstrated that FBLN5 enrichment pathways associated with BC and TC mainly included: B cell receptor signaling pathway, steroid hormone biosynthesis, and pathways in cancer. Conclusions: The PI3K/AKT signaling is most co-enriched pathway in BC and TC. FBLN5 is the most relevant prognostic gene and an underlying common tumor suppressor in both BC and TC, with down-stream pathways involving immunity, hormone biosynthesis and carcinogenesis.


Introduction
According to newly released data, BC and TC were among the most prevalent malignancies in women. There were about 2.1 million new cases of BC in women worldwide, accounting for almost a quarter of all cancer cases in women in 2018. In addition, around 567,000 cases were diagnosed with TC and ranked ninth in cancer incidence worldwide, with a global female incidence rate of 10.2 per 100,000 people, three times that of male [1]. A number of studies have reported that the risk of TC increases after BC, similarly, the risk of BC increases after TC [2][3][4][5]. Therefore, BC and TC might share certain etiologies in common, and studies have shown that obesity, genetic susceptibility, hormone factors and certain environmental pollutants were all risk factors for BC and TC [6]. However, their underlying common mechanisms of action remain unknown. A few epidemiological studies have suggested a link between BC and TC, and we used the SEER database to verify that BC and TC have a link, but until recently, no study have proven it at the molecular level. Therefore, we aim to investigate whether there is a genetic link between BC and the TC.
SEER database is an authoritative neoplasm statistics database of the United States. It records information on the incidence, mortality and morbidity of millions of malignancies in several counties in the United States. The cancer information in the database is uni ed and standardized by SEER*Stat software. We used the SEER*Stat software program and selected patients between 1973 and 2015 to obtain the SIR of TC in BC patients and SIR of BC in TC patients, and to verify the relationship between BC and TC.
With the development of bioinformatics and genomics technology, gene expression pro ling can be used to explore the molecular mechanism of disease etiology [7]. DGE analysis of transcriptomics provided an effective method to study the molecular mechanisms of genome regulation [8]. WGCNA is a kind of bioinformatics analysis method for constructing gene co-expression networks, aiming at obtaining coexpression gene modules. It can cluster genes with similar expression patterns, explore associations between gene networks and phenotypes of interest, build a bridge between sample characteristics and changes in gene expression, and ultimately identify key genes in the disease pathway [9] , [10,11]. The joint of WGCNA and DGE analysis can obtain highly correlated genes.Therefore, In this study, we combined DGE with WGCNA analysis to obtain highly correlated genes.
In this paper, we used the SEER database to determine the incidence of secondary malignancies with BC as TC or TC as BC. Then, the RNA-seq data of BC and TC were obtained from TCGA data set, and the key genes were obtained by DGE analysis and WGCNA analysis. Functional enrichment analysis was used to explore the commonality of BC and TC. We constructed PPI to select the hub genes. Subsequently, hub genes related to the prognosis of BC and TC were screened. Finally, GSEA of prognostic genes was performed to explore the molecular pathways related to the expression of prognostic genes in BC and TC.
This study might provide a new perspective to a possible shared molecular mechanism of the development of BC and TC.

Materials And Methods
The design and process of this article were shown in Figure 1.

Data collection, processing and differentially expressed genes (DEGs) screening
The RNA-Seq data of BC and TC were obtained from TCGA database, which is the maximum cancer gene information database at present and the most abundant and authoritative database in cancer research.
Bioconductor package in R includes a limma package that can be used to analyze biochip technology. Limma can read the raw data, control the quality of the data, preprocess the data and analyze the differences [12]. We used limma package to acquire DEGs of BC and TC. The cut-off standard were | log FC | ≥ 1.0 and adj. P < 0.05 (adjust p value), and the DEGs of BC and TC were visualized as volcano plot adopting ggplot2 package in R.

Data extraction from the SEER database
The MP-SIR Session of SEER* Stat software can be used to calculate the risk of developing a second primary cancer in people who have already been diagnosed with cancer, compared with the general population. We obtained the information of BC and TC from the SEER database. Patients from 1973 to 2015 were selected, and then the MP-SIR Session of SEER* Stat software was performed to obtain the SIR of TC patients with the second cancer and SIR of BC patients with the second cancer. The SIR is equal to the observed count divided by the expected count, and the 95% con dence interval (CI) can be obtained by calculating the assumed Poisson distribution.

Con rmation of the highest co-expression module via WGCNA
WGCNA is a biological analysis method for calculating the functional sets of various weighted association analyses based on correlation coe cients. In this study, WGCNA was used to obtain the genes in the highest co-expression module. First of all, we used WGCNA to analyze the gene expression information of BC and TC derived from TCGA database. We retained genes with CPM (count per million) ≥1, and processed the missing value. Then the picksoft threshold function was used to lter the soft threshold, so that the network constructed was more accordant with the characteristics of scale-free network. Next the module was segmented by dynamic tree cutting algorithm. According to the correlation coe cient between genes, the hierarchical clustering tree was constructed, and the clustering results were segmented on the basis of the set criteria to obtain different gene modules. Different branches of the cluster tree represented different gene modules with different colors. The correlation between modules and sample traits was analyzed, and the module-trait associations data were visualized using heat map. Finally, the DEGS of BC and TC and the genes in the highest co-expression module obtained by WGCNA analysis of BC and TC were intersected, and the key genes obtained visualized by Venn diagram.

Functional enrichment analysis
GO annotation can be applied to discover biometric attributes and KEGG pathway enrichment analysis to discover functional attributes [13,14]. GO Term analysis comprises biological process, cellular component and molecular function. To conduct GO annotation and KEGG pathway enrichment analysis of key genes, we applied cluster pro ler package in R to enrich key genes and P < 0.05 was taken as the cut-off value.

PPI Network Construction and acquisition of Hub genes
The STRING database is an online tool for studying PPI that can be used to excavate for hub genes. For the 59 key genes obtained above, we used the String database line tool to search for the interactions between known proteins and predicted proteins. Then, we used cytoscape (version: 3.8.0) to construct a network model and screened the top 10 genes with the largest MCC value. Moreover, the top 10 genes were de ned as core genes.

Survival analysis and veri cation of expression patterns of prognostic Genes
The clinical information of BC and TC were gained from TCGA, and patients who completed the follow-up period were chosen for survival analysis with the survival package in R.
Kaplan-Meier (KM) method was performed to estimate the survival rate, and the difference of overall survival (OS) between BC and non-BC, TC and non-TC was compared by Log Rank test. We rst performed a prognostic analysis of HUB gene validation, and then the expression patterns of prognostic genes produced in different pathologic tumors and normal tissues were then examined, and P < 0.05 was taken as the cut-off value.

Single-gene GSEA enrichment analysis
Single-gene GSEA was applied to study the in uence of a gene expression level on diseases. According to the median expression of the gene to be studied, the tumor samples were grouped into high expression and low expression, and then identify the signi cantly enriched pathway [15]. For the sake of further probing the biological pathway of prognostic gene pathway involved in the pathogenesis of BC and TC, we performed GSEA. We downloaded c5.all.v7.1.symbols from the GSEA database (http://www.gseamsigdb.org/gsea/index.jsp), grouped them with prognostic genes, and enriched BC and TC tumor samples using the GSEA package in the clusterPro ler package ,and with the cut-off standard of |normalized enrichment score |≥1.0, p≤ 0.05, p.adj ≤ 0.25.

Statistical Analysis
The gene expression data were all downloaded from the BC and TC databases of TCGA, and all the analyses were performed in R language (Version: 4.0.2). P <0.05 was statistically signi cant.

SIR for the Second Primary Cancer
As compared to the general population, the SIR of TC in BC patients was 1.

DEGs and Co-expression Modules
We obtained the gene expression pro les and clinical information of BC and TC from TCGA database. There were 1222 BC samples, including 1109 tumor tissues and 113 normal tissues. The total number of TC samples was 568, tumor tissue was 510, and normal tissue was 58. The RNA-Seq data for both BC and TC were 56,753 genes.
According to the standard: | log FC | ≥ 1.0 and adj. P <0.05, we use the limma packet for differential expression analysis, and 3736 DEGs ( We used RNA-seq data from TCGA BC database and TC database to construct gene co-expression network, and selected the highest co-expression module. Assign a color to each module, and BC ( Figure  2C) produces 11 modules and TC ( Figure 2D) produces 6 modules. A heat plot of the module-trait relationships was then used to estimate the correlation between each module and clinical characteristics (cancer and normal). As shown in Figures 2E and 2F, it was the result of the module trait relationship. The turquoise module of BC contained 4149 genes, and the red module of TC contained 162 genes. (turquoise module: r = 0.75, p = 2e−220; red module: r = 0.68, p = 1e−79).

Functional Analysis of the Key Genes
There were 4,149 co-expressed genes in the turquoise module of the BC dataset, and 162 co-expressed genes in the red module of the TC data set. There were 3736 DEGs in the BC data set, and TC has 2159 DEGs. 59 key genes in all were obtained from the intersection of the above four sets, and visualized by Venn plot ( Figure 3A).
We used GO annotation ( Figure 3B) and KEGG pathway enrichment analysis ( Figure 3C) to analyze 59 key genes. In terms of biological process, the analysis result was extracellular matrix organization and extracellular structure organization. For cellular component, the analysis result was collagen-containing extracellular matrix. Finally, in terms of molecular function, they were mostly enriched in the extracellular matrix structural constituent. We also used KEGG pathway enrichment analysis to determine the regulatory pathways. The analysis results shown that the regulated pathways include PI3K-Akt signaling pathway, focal adhesion, and other pathways related to BC and TC.

Obtaining Prognostic Genes from Hub Genes
We used STRING online data and Cytoscape software to construct 59 key genes Constructed PPI Network and nally obtained the results in ( Figure 4A). Then, the 10 largest hub genes of MCC were screened out, which were: DPT, OGN, LAMA2, FBLN5, TNXB, MYH11, EFEMP1, MFAP4, ADH1B, and MYOC ( Figure 4B).
We analyzed the survival of 10 hub genes in BC and TC using the R survival package (Supplementary Figure 1, Supplementary Figure 2) and got FBLN5 which were related to OS of BC ( Figure 4C) and TC ( Figure 4D). Next, Wilcoxon rank sum test ( Figure 4E, Figure 4F) and pairing ( Figure 4G, Figure 4H) were used to validated the expression level of FBLN5 in BC and TC samples.The results showed that compared with normal tissue, FBLN5 was signi cantly down-regulated in BC and TC .

GSEA Results
To determine the exact pathway that FBLN5 regulates in BC and TC, we performed GSEA using RNA-seq data of BC and TC. In the prede ned "KEGG pathway "genomes, the main enrichment pathways of BC were CYTOKINE CYTOKINE RECEPTOR INTERACTION, PRIMARY IMMUNODEFICIENCY etc ( Figure 5A).
Among the phenotypes with high FBLN5 expression, a total of 41 signi cant gene sets were found, of which 36 were up-regulated. The main pathways of TC enrichment were: PRIMARY IMMUNODEFICIENCY, STEROID HORMONE BIOSYNTHESIS, COMPLEMENT AND COAGULATION CASCADES etc ( Figure 5B). Among the phenotypes with high FBLN5 expression, a total of 47 signi cant gene sets were found, of which 42 were up-regulated. The common enrichment pathways of the BC and TC mainly included:

Discussion
Both the BC and TC have very high morbidity and mortality among the global population, which cause great harm to women and impose a heavy burden on society. The molecular mechanisms underlying the pathogenesis of BC and TC remain unclear. Therefore, it is imperative to identify the potential common genes that contribute to the sequential development of BC and TC.
From the bioinformatics analysis of BC and TC from TCGA, we found 59 key genes with the identical expression trend. These key genes were mostly enriched in the PI3K/Akt signaling pathway. Lots of studies have shown that PI3K/Akt signaling pathways were associated with cell proliferation, differentiation, and apoptosis, and that their component activation is pertinent to tumor occurrence, invasion, and metastasis [16]. PTEN is a lipid phosphatase that targets the PI3K pathway and is frequently mutated and deleted in a variety of cancers, including BC and TC [17]. It was shown that simultaneous inhibition of MAP kinase, PI3K/Akt pathways and HDAC induced certain non-thyroid cancer cells to strongly express sodium iodide (NIS) and ingest radioactive iodine [18]. The mammary gland, like the thyroid gland, has NIS and so may collect iodine as thyroid cells do [19]. Furthermore, NIS symbiosis plays a signi cant part in the process of iodine from extracellular space to thyroid cells to synthesize thyroid hormone. Fifty-nine key genes were also enriched in these pathways, such as focal adhesion and ECM-receptor interactions. Adhesion kinase is a central regulator of adhesion foci, which affects cell proliferation and migration, and studies addressed overexpression of focal adhesion kinase might be a part of the mechanism of TC invasion and metastasis [20]. ECM protein can stimulate invasive tumor cell programming and thus in uence the survival of tumor cells [21][22][23]. Focal adhesions are mechanically linked to cellular communication centers and ECM cells and are associated with tumor invasion. A variety of studies addressed that the early migration of tumor cells and invasion of adjacent tissues were mediated by focal adhesion signals [24].
In addition, we used String database and Cytoscape software to construct PPI, and de ned the 10 largest MCC genes as hub genes. Then we performed prognosis analysis on the 10 hub genes and obtained FBLN5, a gene related to the prognosis of both BC and TC.In the analysis, we found that FBLN5 was down-regulated in both TC and BC. The FBLN5 gene encodes a protein called Fibulin-5, which is a secreted extracellular matrix protein. FBLN-5 can affect vascular development and transformation [25], with close relation to the development and spread of tumors. Studies suggested that FBLN-5 can prevent tumor angiogenesis and thus inhibit tumor formation [26]. Hiromi et al [27] suggested that Fibulin-5 could affect tumor growth, which might be related to the balance between the antiangiogenic properties of bulin-5 and its direct effect on tumor cell proliferation and migration. YONG et al [28] suggested that FBLN5 enhanced TGF-β(transforming growth factor-β) and induced EMT (epithelial-to-mesenchymal transition), which are key processes conferring to the occurrence and progression of BC. A series of studies have suggested that integrin-binding domain of FBLN5 mutations can reduce the growth of pancreatic ductal adenocarcinoma [28]. In addition, the down-regulation of Fibulin-5 can affect the prognosis of ovarian cancer, liver cancer and other cancers [29,30].
Finally, we grouped FBLN5 into GSEA in all tumor samples from BC and TC, respectively. And we found an array of pathways enriched in BC and TC were involved in immune, hormonal, and carcinogenesis.
Second primary neoplasm may be caused by physical weakness of the person suffering from cancer, long-term exposure to cancer risk factors, destruction of immune tissues, disruption of body balance and some genetic mutations. There may also be immune de ciency, where long-term exposure to carcinogenic environments makes the body vulnerable to developing cancer due to a lack of immunity to various types of cancer [31].
Therefore, we believe that FBLN5 act a pivotal part in the occurrence and development of BC and TC, but the speci c mechanism still needs to be veri ed with future experimental and clinical evidences.

Conclusion
To our knowledge, this is the rst co-expression network being constructed using the WGCNA approach to investigate breast-and thyroid-cancer related modules and enrichment pathways.  Figure 1 The research work ow of this study.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. SupplementaryFile.doc