CILP Inhibits Brain Metastasis by Meditating Mast Cells via the MAPK Signaling Pathway in Breast Cancer

Background: Approximately 15–30% of patients with breast cancer (BRCA) eventually develop brain metastases (BMs) with high morbidity and mortality. Herein, we aimed to identify genes specic to breast cancer brain metastases (BCBM) from an immune inltration perspective. Methods: GSE100534 and GSE125989 were obtained from the NCBI Gene Expression Omnibus (GEO), then performed normalization using Rstudio and perl 5. We constructed a Weighted Gene Co-Expression Network Analysis (WGCNA) and obtained differentially expressed genes (DEGs) in BMs sample compared with primary BRCA tissue. Then we performed GO and KEGG pathway analysis. The LinkedOmics and UALCAN analysis showed the expression of gene in BRCA. The Kaplan-Meier plotter database was used to evaluate the prognosis. The composition of signicant tumor-inltrating immune cells was assessed using the CIBERSORT algorithm. Spearman’s correlation analysis revealed the correlation between CILP gene and immune cells in TCGA cohort and Timer database. Using GSEA analysis, we conducted to identify the potential pathways in BCBM. Results: The cartilage intermediate layer protein (CILP) was a late event in BRCA (stage III to IV) with poor prognosis (P< 0.05). LinkedOmics showed that the mRNA expression of CILP was down-regulated in advanced cancer (P< 0.05). Besides, UALCAN analysis showed that CILP expression was downregulated in HER2-positive and triple-negative breast cancer which were more prone to BMs (P< 0.05). CILP was the hub gene which was signicantly associated with BCBM identied by WGCNA (R2=−0.6, P=3e-06). We found that the resting inltration of mast cells in the BCBM group was signicantly lower than that in the primary BRCA group (P= 0.01). In addition, Spearman’s correlation analysis revealed that the expression of CILP positively correlated with that of mast cells (P< 0.05). Finally, the FCERI-mediated MAPK activation (NES=2.1847, P=0, FDR=0.0031),


Background
Of all the solid tumors, breast cancer (BRCA) is the second most common cause of brain metastases (BMs) (1)(2)(3). Approximately 15-30% of patients with BRCA eventually develop BMs with high morbidity and mortality (3,4). BMs occur in 30-55% of HER2-positive (HER2+) metastatic BRCA patients and up to half die from intracranial progression (5), whereas the median survival rate is only 6 months in triplenegative BRCA (TNBC) with BMs (6,7). Unfortunately, effective treatment is not available since the central nervous system (CNS) is traditionally considered to be a privileged site due to blood-brain barrier (2,8). As a result, identi cation of genetic and epigenetic alterations is essential to the development of BMs targeted therapies (9).
Pursuing of the mechanism involved in BMs has never stopped although it still remains largely unclear. Bos et al. showed that ST6GALNAC5 acted as a speci c mediator of BMs through enhancing its adhesion to brain endothelial cells and facilitating crossing the blood-brain barrier by BRCA cells (10).
Highly sialylated N-glycans which were up-regulated in the brain-seeking cell line 231BR likely play a crucial part in breast cancer brain metastasis (BCBM) evaluated by integrated transcriptomics, glycomics, and proteomics (11). GABAA receptor alpha3 (Gabra3), normally exclusively expressed in the adult brain, was inversely correlated with BRCA survival and promoted BRCA cell invasion and BMs by activating the AKT pathway (12). Recently, it was also reported that YTHDF3 could enhance the translation of m6Aenriched transcripts for ST6GALNAC5, EGFR, and GJA1 thus prompting BCBM (13).
The tumor microenvironment, composed of diverse immune cells, tumor cells, and cytokines, has both an adverse and bene cial impact on tumorigenesis (14). The is growing evidence indicating that an ineffective immune response in uences the behavior of BRCA, suggesting that it is an immunogenic cancer type as well (15). Berghoff et al. demonstrated that the immune reaction to BMs was mostly characterized by activation of microglia/macrophages and upregulation of biomarkers involved in phagocytosis (16). However, an understanding of the microenvironment and its underlying molecular mechanisms contributing to the BCBM are limited (17).
Herein, especially from an immune in ltration perspective and using large TCGA and the GEO datasets of BRCA samples, we used a series of bioinformatics tools, such as weighted gene co-expression network analysis (WGCNA), GSEA analysis, and CIBERSORT estimation to identify genes speci c to BCBM.

Data acquisition and processing
The invasive carcinoma (BRCA) datasets GSE100534 and GSE125989 were obtained from the NCBI Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/gds). GSE125989 consists of 32 BRCA samples with matched primary BRCAs and BM from 16 patients analyzed using the GPL571 ([HG-U133A_2 Affymetrix] Human Genome U133A 2.0 Array) platform. As for the GSE100534, we obtained 19 BRCA samples including 3 BCBM samples and 16 BRCA samples analyzed using the GPL6244 ([HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version]) platform. The R packages of "GEOquery" and "oligo" were used to access and process and normalize data, using the (Robust Multichip Average (RMA) algorithm) and normalization of the raw data (.CEL les) and matrix construction. The R packages of "hugene10sttranscriptcluster" (GPL6244) and "hgu133a2" (GPL571) were respectively used to match the probe to their gene symbol, and the probes matching several genes were removed. For genes matched by multiple probes, we selected the probe with the highest expression average in the samples. We combined the GSE100534 and GSE125989 datasets by the perl script, then used the "sva" package (under the R environment, version 3.6.3) to preprocess and remove the batch effect and eventually obtained a merged dataset. The Fragments Per Kilobase Million (FPKM) data of TCGA RNA-Seq were downloaded from TCGA (https://portal.gdc.cancer.gov/) and contained 1208 samples (normal: 112; tumor: 1096). The clinical follow-up information contained 1097 samples. For our study, samples with no clinical information were excluded.

Analysis of differentially expressed genes
We performed a differential analysis comparing primary BRCA tissues and BCBM tissues using the merging GEO dataset. Differentially expressed genes (DEGs) were displayed in a volcano plot and heatmap, screened, and then matrixes were constructed using the "limma" package in RStudio using log2FoldChange (FC)≥ 1 and adjusted P-value<0.05 as cut-off values.

Construction of weighted gene co-expression network analysis
The data used were from the merging dataset of GSE100534 and GSE125989 as described above from the GEO database. We chose all 11,786 genes for the weighted gene co-expression network analysis (WGCNA), then the R package WGCNA (18) was conducted, and the power parameter was pre-calculated using the pickSoftThreshold function. By calculating scale-free topology t exponentials for several powers, it is possible to provide appropriate soft threshold power for network construction. After choosing appropriate soft-thresholding power, the adjacency was transformed into topological overlap matrix (TOM), which measured the network connectivity of a gene (19). In order to classify genes with similar expression pro les into gene modules, an average linkage hierarchical clustering was carried out according to the TOM-based dissimilarity measure with a minimum module size of 30 for the gene dendrogram (20). To further analyze the module, we merged highly similar modules with the dissimilarity of<0.25 by implementing clustering of module eigengenes. Then we calculated the correlation between module eigengenes (MEs) and clinical features of BRCA. As a representative of all the genes in each module, MEs were de ned as the rst principal component of each gene module. Gene signi cance (GS) was de ned as the log 10 transformation of the P-value (GS=lgP) in the linear regression between gene expression of the module and clinical features. In addition, module signi cance was de ned as the mean GS for all genes in the module. The visualization in gene network of eigengenes was also carried out and displayed by Cytoscape 3.8.0.

Biological function and pathway enrichment analysis
In gene networks that conform to scale-free distributions, genes with similar expression patterns could be synergistically regulated, pathway shared, or functionally related. We selected the module of the most relevant to clinical characteristics, and then we performed gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis via the "clusterpro ler" package in RStudio software to acquire the enriched biological process and KEGG pathways for further analysis (21).

Kaplan-Meier Plotter Database analysis
The correlation between CILP expression and survival in BRCA was analyzed by Kaplan-Meier plotter (http://kmplot.com/analysis/). The hazard ratio (HR) with 95% con dence intervals (CI) and log-rank P-values were computed.

Analysis of CILP alterations in breast cancer samples
The LinkedOmics database (http://www.linkedomics.org/login.php), which is applied to analyze 32 TCGA cancer-related multi-dimensional datasets, is a Web-based platform. The data types include microRNA, SNP, methylation status, gene mutations, and clinical data (22). We performed a non-parametric analysis of the clinical data in TCGA BRCA cohort, the transcriptional variation in the levels of CILP expression in different pathologic stages (Kruskal-Wallis Test) and pathology M stage (Wilcox Test) was evaluated using this platform. Next, we applied UALCAN (http://ualcan.path.uab.edu/) (23) to analyze the transcriptional levels of CILP in breast invasive carcinoma and their association with intrinsic BRCA subclasses.

GSEA analysis
Samples from the TCGA were divided into two groups based on the expression of CILP by the surv_cutpoint function (res.cut value) and gene set enrichment analysis (GSEA) software (http://software.broadinstitute.org/gsea/index.jsp) was applied in the two groups to identify the pathways associated with change in CILP. The cut-off standard for the GSEA were P-values <0.05 and false discovery rate (FDR)<0.25.

TIMER Database Analysis
TIMER is a comprehensive resource for systematic analysis of immune in ltrates across multiple cancer types (http://timer.comp-genomics.org/) (24). Statistical analyses that correlated the expression of CILP and the presence of mast cells using Spearman's statistical analysis. The results an indication of the purity-adjusted partial Spearman's rho value as a degree of their correlation.

CIBERSORT Estimation
We assessed 22 types of immune cell utilizing the CIBERSORT algorithm. Only samples with a CIBERSORT output of P<0.05 were considered worthy of further analysis. Signi cant differences in the proportion of immune in ltrating cells between the primary BRCA tissue and BM tissue were determined using the Wilcoxon rank-sum test. In ltration of immune cells from adjacent normal tissue and BRCA tissues were also analyzed. The level of immune cell in ltration in each sample was combined with the pathological stage using EXCEL, and samples lacking clinical information were deleted; thus, 800 BRCA samples were ultimately obtained for further analysis.

Statistical analysis
All statistical analyses were carried out using Rstudio 3.6.3 and perl 5 (v5.30.0). We performed survival analysis using the "survminer" and "survival" packages in RStudio software. We then used surv_cutpoint function to get the best cutoff dividing genes into high and low expression groups for clinical analysis.

Results
CILP acted as a late event in BRCA with poor prognosis The Function module of LinkedOmics was used to analyze mRNA sequencing data from 1093 BRCA patients in TCGA. Generally, CILP mRNA levels were reduced with higher grade of clinical stage ( Figure   1a). Thus, it was lower in the M1 stage than in the M0 stage ( Figure 1b). As shown in Figure 1c, the mRNA expression of CILP different across the normal and Luminal, Her2 positive, and TN intrinsic subclasses of BRCA. The expression of CILP was signi cantly lower in HER2 positive and TNBC than luminal subclass. To further examine the prognostic potential of CILP in BRCAs, Kaplan-Meier plotter database was used to evaluate the CILP prognostic value. Interestingly, the poor prognosis in BRCA was shown to correlate with lower CILP expression ( Figure 1d). However, CILP expression had no effects on survival in early BRCA (stage I to II), and showed a worse overall survival (OS) in advanced BRCA (stage III to IV) of RNA-seq data (Figure 1e).
From the above ndings, CILP may serve as a late event occurring in BRCA with poor prognosis.
Identi cation of markedly different expressed genes between breast cancer tissue and brain metastases tissue from breast cancer To identify DEGs between BRCA tissue and BM tissue from BRCA, data from 51 samples from two independent mRNA expression arrays of GSE100534 and GSE125989 datasets (25,26) were downloaded from the GEO database and normalized and merged using the "oligo" and "sva" packages in RStudio ( Figure 2a). According to the thresholds set (adjusted P-value<0.05 and log2FC ≥ 1), 90 DEGs (62 downregulated and 28 upregulated) were identi ed ( Figure 2b). Subsequently, we visualized the 90 DEGs in the form of heatmap including all samples (Figure 2c).
CILP was the hub gene signi cantly associated with BCBM identi ed by construction of weighted coexpression network (WGCNA) We used the merged data to construct the gene co-expression networks, which included a total of 51 samples complete with clinical features data for the WGCNA. Raw data were preprocessed identically using RStudio for background correction and normalization. Finally, we obtained a total 11,786 genes, and all were chosen for WGCNA. WGCNA is a systematic biological approach used to analyze the expression patterns of multiple genes in different samples, which forms clusters or modules containing genes with the same expression pattern (27). If certain genes are located in the same module, they are likely to possess the same biological functions and the association between modules and sample characteristics such as clinical traits can be studied (28,29). The selection of the soft-thresholding power is an important step when constructing a WGCNA. We performed the network topology analysis for thresholding powers from 1 to 20 and identi ed the relatively balanced scale independence and mean connectivity of the WGCNA. In this study, we selected the power of β=8 (scale free R^2=0.83) as the softthresholding to achieve a scale-free network (Figures 3a, 3b). As a result, we set the cut height as 0.25 to merge similar modules (Figure 3c). Twenty gene-co-expression modules were identi ed using merged dynamic tree cut (Figure 3d). The yellow module showed the highest correlation with BM phenotype of invasive BRCA (Figure 3e). There were 399 genes in the yellow module, data for yellow module were selected for further analysis. Subsequently, we identi ed the module containing CILP, that is., the yellow module, as a key module accounting for the highest correlation with BRCA brain metastasis phenotype (R2=−0.6, P=3e-06) (Figure 3e). A total of 24 genes, including the CILP gene, were identi ed as candidate hub genes for the yellow module (Figure 3f). It is well-known that CILP is an inhibitor of transforming growth factor (TGF)-beta signaling. Interestingly, TGF-beta can enable cells to acquire a migratory and in ltrating phenotype, thus enabling the spread of cancer cells.
We further performed enrichment analysis to explore the biological processes and pathways in which the yellow module was involved. GO biological processes and the KEGG functional enrichment analyses were conducted using the R package "clusterPro ler" (21). GO terms such as extracellular matrix organization, collagen bril organization, cell-substrate adhesion, epithelial cell proliferation, transmembrane receptor protein serine/threonine kinase signaling pathway, response to TGF-beta, collagen metabolic process, were mainly enriched in the "biological process" category ( Figure 3g). The results of the KEGG enrichment analysis revealed that focal adhesion, complement and coagulation cascades, proteoglycans in cancer, ECM-receptor interaction, relaxin signaling pathway, TNF signaling pathway, apoptosis, TGF-beta signaling pathway, and PI3K-AKT signaling pathway, and many other related KEGG biological pathways were both important for the development of carcinoma (Figure 3h). Moreover, we constructed a network of protein-protein interaction (PPI) for all genes in the yellow module by Cytoscape, which consisted of 56 nodes and 366 edges according to the weight of the edge (≥0.07) (Figure 3i).
In summary, we may conclude that the CILP gene is signi cantly associated with the BM phenotype of BRCA. .001), suggesting that CILP was associated with the immune microenvironment of BRCA. Of these, resting mast cells involved in brain metastasis had a signi cantly weak positive correlation with the expression of CILP. In the Timer database, we also found a weak positive association between CILP and mast cells (Figure 4i). Essentially, resting mast cells showed a lower expression of stage IV of BRCA, and the expression of CILP was also found lower in stage IV of BRCA, which is consistent with the positive association between CILP and resting mast cells.
Thus, it is reasonable to speculate that CILP may in uence the progression of BRCA favoring BMs through the in ltration of resting mast cells.

GSEA analysis identi ed the potential pathways involved in BCBM
To identify the potential function of CILP genes in BRCA samples from TCGA database, GSEA was used to identify the potential pathways involved in brain metastasis processes. Nine enriched pathways were detected, including the B cell receptor signaling pathway; Hedgehog signaling pathway; lysosome, ubiquitin mediated proteolysis; DNA double-strand break response; antigen activation of the B cell receptor (BCR) leading to generation of second messengers; CD22-mediated BCR regulation; creation of C4 and C2 activators; and FCERI-mediated MAPK activation; all of which were closely related to tumor occurrence, development, and metastasis (P-value<0.05, FDR<0.25) ( Figure 5). Discussion CILP, namely cartilage intermediate layer protein 1, plays a crucial role in Lumbar disc disease (LDD) through negative regulation of TGF-beta signaling (30). A recent study reported that CILP seemed to be a promising candidate as a biomarker for cardiac fibrosis (31). However, to the best of our knowledge, there have been no documented reports of a possible relationship between CILP expression and BCBM.
In our study, using WGCNA we found CILP was the hub gene signi cantly associated with BCBM (Fig3 e, f). Further analysis showed it might be a late event in BRCA and had obvious clinical signi cance (Fig1 ab, d-e). Moreover, it was differentially expressed in TNBC and HER2+ BRCAs compared to normal and luminal type BRCA (Figure 1c), which was consist with the fact that these two subtypes were more prone to the brain metastasis of BRCA. Hence, it is reasonable to assume that it may represent a new biomarker indicative of BCBM.
Previous studies have suggested that mast cells, termed tumor-associated mast cells, in ltrate a variety of hematological and solid tumors, such as stomach, thyroid, melanoma, pancreas, prostate, and BRCA (32)(33)(34)(35)(36). Mast cells have been reported as having both antitumorigenic and pro-tumorigenic effects in cancers (32). To date, most studies have supported signs of mast cell CELL in ltration as a marker for favorable prognosis in BRCA, suggesting that mast cells might contribute to anti-tumoral functions of BRCA (37,38). Rovere et al. documented that there is evidence of mast cell accumulation around tumors in highly hormone-receptive BRCAs, which seemed to oppose tumor activity via their cytolytic activities (39). It has also been demonstrated that both activated and resting mast cells and basophils were able to increase the proliferation and survival of naive and activated B cells, and promoted their differentiation into antibody-producing cells (40). Likewise, our study suggested that resting mast cells had lower expression levels in BCBM than in primary BRCA, and were still lower than levels expressed in BRCA than in adjacent normal tissues (Fig4 d-e). Subsequently, activated mast cells were signi cantly upregulated in BRCA tissues compared with adjacent normal tissues (Fig4 e), whereas there was no signi cant difference between primary BRCA and BCBMs (Fig4 d). In the Timer database, we found CILP expression was positively correlated with mast cells (Fig4 i). In our study, we found a positive correlation between CILP and mast cells resting (Fig4 h). Consequently, we reasonably suspect that with the development of BRCA, CILP expression becomes lower, impairing the anti-tumor function of mast cells, and thus prompting tumor invasion and metastases.
In our study, nine potential pathways were enriched in the BCBM (Fig5). Among these, the activation of B cell-related signaling pathways including the B cell receptor signaling pathway, antigen-activated B cell receptor leading to generation of second messengers, and CD22 mediated BCR regulation may play an important role in the occurrence and development of B cell-mediated tumors. The Hedgehog signaling pathway is vital for BRCA progression and metastasis (41). The pathways involving lysosome-and ubiquitin-mediated proteolysis can mediate the degradation of target proteins. The FCERI-mediated MAPK activation cascade can regulate mast cell activity (42). In combination with the study above, we hypothesis that CILP may exert an anti-tumoral role by mediating the activity of mast cells through the MAPK signaling pathway.
Our study has several limitations. First, only two datasets were included in the study which may lead to research bias for lacking of adequate data; Second, because of the limitation of online databases, we did not carry out additional omics analysis such as proteomics to strengthen our results; Third, no external datasets were used to verify the conclusion, and if there was any that would be more supportive and practical. Therefore, further research is needed to explore whether CILP can be exploited as a biomarker speci c to BCBM and its potential detailed mechanism.
Conclusions our study showed that CILP may act as an anti-tumor mediator by interacting with mast cells through the MAPK signaling pathway in BCBM and thus, has great clinical transformational signi cance.

Competing interests
The authors declare that the research was conducted in the absence of any commercial or nancial relationships that could be construed as a potential con ict of interest.   downregulated genes) in merged datasets. Green, black, and red respectively represent the differences in lower, medium, and high expression levels for different genes. Type: Blue and purple respectively represents BRCA brain metastases tissues (BCBM) and BC tissues.   Gene set enrichment analysis using TCGA cohort Listed are only the nine most common functional pathways of KEGG and REACTOME pathway gene sets enriched in BRCA samples with low expression of the CILP gene. CILP gene expression was divided into two groups of high and low expression through the surv_cutpoint function of "survminer" package. False discovery rate (FDR), nominal P-value, and the