Prognostic Prediction, Immune Microenvironment, and Drug Resistance Value of Collagen Type I Alpha 1 Chain (COL1A1) in Pan-Cancer Analysis

Objective The protein encoded by collagen type I alpha 1 chain(COL1A1)is a component of the extracellular matrix, and critical for tumor microenvironments. However, it remains unclear that the expression level, prognostic prediction, and immunological value of COL1A1 in multiple cancers. Methods 4 proles and 6 hub genes were found by bioinformatics analysis. We further analyzed one of the hub genes, the COL1A1. And, we comprehensively analyzed gene expression and genetic alteration patterns among 33 types of malignancies from The Cancer Genome Atlas(TCGA). Besides, we explored the correlation of COL1A1 with cancer prognosis, immune inltrates, tumor mutational burden (TMB)/ microsatellite instability status (MSI), pathway and drug sensitivity analysis of co-expressed genes. results demonstrated that COL1A1 can serve as a prognostic and immunological biomarker in multiple cancers.


Introduction
Esophageal, gastric, and colorectal cancer are the most common types of gastrointestinal cancer. And, global cancer statistics in 2018 showed that colon, gastric, rectal, and esophageal cancer ranked the fourth, sixth, eighth, and ninth respectively in the incidence of human malignant tumors, and their mortality rates were 5.8%, 8.2%, and 3.2%, 5.3%, respectively [8] . However, it is found that gastrointestinal cancer patients might experience multiple primary tumors in the digesting tract. Yoshida et al. reported that 38% of esophageal squamous cell carcinoma patients experienced multiple primary cancers, of which 15% underwent gastrointestinal cancer, mainly in the stomach and colorectal [53] . Moreover, Ste et al. found that approximately one in 20 patients with primary esophageal squamous cell carcinoma had a second primary tumor in the upper digestive tract or stomach [50] . Furthermore, it was reported that several cases related to the synchronous or metachronous primary gastrointestinal tract malignancies [3; 7; 23; 54] .
Hence, we identi ed some hub genes including Collagen Type I Alpha 1 Chain COL1A1 of multiple primary tumors in the gastrointestinal tract based on bioinformatics technology. COL1A1 is the gene which encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha 1 chains and one alpha 2 chain [39] . The protein encoded by which is an important component of the extracellular matrix(ECM). COL1A1 is known to be overexpressed in several cancers other than gastrointestinal cancer [44; 52; 57] , such as thyroid cancer [18] , lung cancer [13] , breast cancer [31] , renal cancer [5] , etc. However, there is no pan-cancer analysis to comprehensively elucidate the potential role of COL1A1 in various tumor types. Thus we expand our research from gastrointestinal cancer to pan-cancer analysis of COL1A1.

Identi cation of the DEGs in Pro le
GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r) is a tool for analyzing differentially expressed genes in the GEO database, which can compare the expression of genes in tumor and normal samples. Adj.p < 0.05 and |logFC| > 1 were set as the cut-off criterion to select DEGs for these microarray, respectively.
Then the overlapping DEGs among the four datasets were identi ed by the online Venn diagram tool(http://bioinformatics.psb.ugent.be/webtools/Venn/).

Gene ontology (GO) and pathway enrichment analysis of DEGs
Database for Annotation, Visualization and Integrated Discovery (DAVIDv6.8 https://david.ncifcrf.gov/) provides a comprehensive set of functional annotation tools to understand the biological meaning behind a large number of genes [19] . DAVID was employed to carry out gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of DEGs. The GO and KEGG pathways were plotted by http://www.bioinformatics.com.cn, an online platform for data analysis and visualization.

Hub genes screening from the PPI network
The protein-protein interaction (PPI) network of differentially expressed genes was constructed based on the online website STRING (STRING; http://string-db.org) (version 11.0) [47] , and was further illustrated by the Cytoscape software [43] . The MCODE plug-in in Cytoscape was utilized to identify key modules and hub genes. The preferred cut-off values were determined as degree cut-off values = 2, max. depth = 100, node score = 0.2 and the k-score = 2.

Gene expression analysis in pan-cancer
Oncomine(https://www.oncomine.org) database is currently the world's largest cancer gene-chip database and integrated data-mining platform, which can analyze differential gene expressions in normal and tumor tissues [41] . Firstly, we used Oncomine to analyze the differential expression of COL1A1 between tumor tissues and normal tissues. Next, the "Gene_DE" module of TIMER2(tumor immune estimation resource, version 2) (http://timer.cistrome.org/) was employed to analyze the differential expression of COL1A1 in different tumors and normal tissues [28] . For those tumors lack of normal or with highly limited normal tissues, the "Expression analysis-Box Plots" module of the GEPIA2 (Gene Expression Pro ling Interactive Analysis, version 2) (http://gepia2.cancer-pku.cn/#analysis) [48] was used to obtain the expression difference between these tumor tissues and the corresponding normal tissues. In addition, we explored the COL1A1 expression difference in different stages by the "Pathological Stage Plot" module of GEPIA2.

Genetic alteration analysis
CBioportal(https://www.cbioportal.org) is an online database which provides visualization, analysis, and download of large-scale cancer genomics data [9] . Herein, cBioportal was employed to obtain alteration frequency, mutation type of COL1A1 across all TCGA tumors. Then we explored the overall, diseasespeci c, progression-free, and disease-free survival differences with or without SND1 genetic alteration of the most alteration frequency type of tumor.

Methylation Pro le of COL1A1
UALCAN is a comprehensive and interactive web resource for analyzing cancer OMICS data [10] . We investigated COL1A1 promoter DNA methylation level in gastrointestinal cancer and some certain types of cancer by UALCAN. MEXPRESS is a web tool which can visualize DNA methylation, expression and clinical data [25] . MEXPRESS was employed to determine the association between COL1A1 DNA methylation and clinical data.

Survival prognosis analysis
We used the "Survival Map" module of GEPIA2 to analyze the overall (OS) and disease-free survival (DFS) of COL1A1 among all tumors. Furthermore, the "Survival Analysis" module was used to further analyze the survival outcome of the speci c type of tumor.

Immune in ltration analysis
The occurrence and development of tumors are closely related to the tumor immune microenvironment.
We used the "Immune-Gene" module of the TIMER2 to analyze the relationship between COL1A1 expression and cancer-associated broblasts, CD8+ T cells, and macrophages. TISIDB(http://cis.hku.hk/TISIDB/index.php) is an integrated repository portal for tumor and immune system interaction, which integrates multiple heterogeneous data types [42] . We used TISIDB to analyze COL1A1 expression and tumor-in ltrating lymphocytes(TILs), immunoinhibitor, immunostimulator, and MHC molecule.  [49] ; MSI is derived from the Landscape of Microsatellite Instability Across 39 Cancer Types article published by Russell Bonneville et al. in 2017 [6] . R software v4.0.3 was utilized for statistical analysis. If not otherwise stated, the rank-sum test detects two sets of data, and a P-value of <0.05 is considered statistically signi cant.
2.11 Genes co-expressed with COL1A1 in pan-cancer: pathway and drug sensitivity analysis We used the STRING database (https://string-db.org/) [47] to obtain the top 50 proteins most relevant to COL1A1 protein expression. The parameters are set as follows: minimum required interaction score ["Low con dence (0.150)"], meaning of network edges ("evidence"), max number of interactors to show ("no more than 50 interactors" in 1st shell), and active interaction sources ("Experiments"). Moreover, we used the GEPIA2 database to obtain the top 100 genes most correlated to COL1A1. Then we drew the Venn diagram to get the overlapping genes by the online Venn diagram tool(http://bioinformatics.psb.ugent.be/webtools/Venn/). Next, we used GSCALite (http://bioinfo.life.hust.edu.cn/web/GSCALite/) [30] to conduct pathway and drug sensitivity analysis on COL1A1 and the obtained genes.

Identi cation of DEGs in gastrointestinal cancer
Based on the cut-off criterion, four mRNA expression pro les of esophageal squamous cell carcinoma, esophageal adenocarcinoma, gastric adenocarcinoma, and colorectal adenocarcinoma were extracted from the GEO database, including GSE20347, GSE92396, GSE103236, and GSE110224, respectively. After analyzing by the GEO2R tool and Venn diagram, a total of 21 DEGs were identi ed. And, 19 genes were up-regulated, and 2 genes were down-regulated. The Venn diagram was shown in the Figure.1a and Figure.1b.

GO and KEGG Analysis for the DEGs in gastrointestinal cancer
For all DEGs, extracellular regions, cell surfaces, proteinaceous extracellular matrix, and collagen trimer were mostly enriched in cellular components(CC). The biological process(BP) mainly included the collagen catabolic process, extracellular matrix organization and disassembly, cell adhesion, cartilage development, and skeletal system development. As for molecular function(MF), the DEGs were related to extracellular matrix structural constituent, serine-type endopeptidase activity, metalloendopeptidase activity, protein binding, and extracellular matrix binding. ( Figure.1c) Furthermore, the KEGG pathway includes focal adhesion, ECM-receptor interaction, PI3K-Akt signaling pathway, protein digestion and absorption, and transcriptional misregulation in cancer( Figure. 1d).

PPI network construction and modules analysis
The PPI network of the DEGs was constructed through the STRING online website (Figure.1e). Thus the data was imported into Cytoscape software for visualization, and the MCODE plug-in was used to further screen the hub gene. Finally, a total of 6 hub genes were identi ed, namely SPP1, BGN, THBS2, MMP3, COL1A1, and TIMP1( Figure. 1f). All the hub genes were up-regulated in tumor tissues.
The above results showed that the hub genes were mainly enriched in the extracellular matrix by GO and KEGG. And the protein of COL1A1 is an important component of the extracellular matrix. Moreover, we found that COL1A1 was highly-expressed in other cancers in addition to gastrointestinal cancer [5; 13; 18; 31] . Therefore, we aimed to explore COL1A1 in pan-cancer.

Gene expression analysis of COL1A1 in pan-cancer
Through the Oncomine database, we found that COL1A1 was highly expressed in a variety of tumors( Figure.  We further analyzed the expression of COL1A1 in different tumor stages. The results showed that COL1A1 expression is closely related to the late stage of ACC, BCLA, ESCA, KICH, KIRP, STAD, and THCA( Figure. 2d).

Genetic alteration analysis of COL1A1
We found COL1A1 mutation in various tumors by cBioportal. Among them, the highest mutation frequency was in melanoma(16.22%). Breast invasive carcinoma had the highest ampli cation frequency(6%) and sarcoma had the highest fusion frequency of COL1A1. It is noteworthy that all mesothelioma cases had copy number ampli cation(4.6%) and all uveal melanoma had deletion of COL1A1(1.25%) (Supplementary Figure.1a). Furthermore, the gure presented the types, sites, and case number of the COL1A1 genetic alteration. The most common alteration type was missense mutation(Supplementary Figure.1b). In addition, we analyzed the relationship between genetic changes and the prognosis of melanoma. The results showed that the alternations of COL1A1 are related to the better prognosis of melanoma with overall survival(p=0.0158), progression-free survival(p=0.0385), and disease-speci c survival(p=0.0361). Although these alternations showed a tendency of better diseasefree survival, there is not statistically signi cant(p=0.395) (Supplementary Figure. Figure.2b). The results showed that COL1A1 expression was signi cantly correlated with DNA methylation of CPG islands(p < 0.01).

The relationship between survival and gene expression of COL1A1 in pan-cancer
In order to analyze the correlation between COL1A1 gene expression and prognosis, we used GEPIA2 to analyze the OS and DFS of COL1A1 among different cancers. The results indicated that high expression of COL1A1 is related to poor overall survival of KIRP (HR=2.

Discussion
Recently, reports have been gradually increasing in multiple primary tumors of the digestive tract.
However, it is di cult to distinguish them from metastatic cancer, and the diagnosis is relatively di cult. In this study, we screened out 4 chips of esophageal, gastric, and colorectal cancer from the GEO database and identi ed 21 DEGs by using bioinformatics analysis. The proteins encoded by these genes were mainly enriched in the extracellular region, cell surface, proteinaceous extracellular matrix, and collagen trimer. Further PPI network analysis identi ed six hub genes, namely SPP1, BGN, THBS2, MMP3, COL1A1, and TIMP1. All of these genes were upregulated in gastrointestinal cancer. SPP1, also known as osteopontin, is a secreted arginine glycine aspartic acid-containing phosphorylated glycoprotein. It could upregulate the expression of interleukin-10 (IL-10), interleukin-12 (IL-12), interferon-γ (IFN-γ), and is essential in the pathway that leads to type I immunity [20; 24] . BGN is an important component of the extracellular matrix and plays a vital role in the regulation of epithelial cell morphology, growth, migration and differentiation. It has been con rmed that BGN is upregulated in gastrointestinal cancer and promotes resistance to chemotherapy [16; 29; 60] . THBS2 belongs to the thrombospondin family and is a versatile glycoprotein that can be secreted by stromal broblasts, endothelial cells, immune cells, etc [1] . It is related to angiogenesis and tumor metastasis [21] . The protein encoded by TIMP1 is a natural inhibitor of matrix metalloproteinases (MMPs) [27] . It can regulate tumorigenesis through ERK, JAK/STAT, and MAPK signaling pathways [45] . Previous researches have con rmed that these genes were closely related to tumor invasion, metastasis, and drug resistance in esophageal, gastric, and colorectal cancer [2; 16; 29; 45; 55] . Therefore, these hub genes could be viewed as the candidates as prognostic biomarkers for multiple primary tumors of the digestive tract.
COL1A1 is the main member of the type I collagen family, and the protein is a component of the extracellular matrix. Liu et al found that COL1A1 was closely related to Helicobacter pylori (+) gastric cancer through bioinformatics technology [33] . Moreover, it is associated with serous membrane in ltration, lymph node metastasis, and hematogenous metastasis in colorectal cancer regardless of KRAS mutation [57] . Through literature review, we found that it is not only highly expressed in gastrointestinal cancer, but also highly expressed in other tumors, and is involved in tumorigenesis, metastasis, and immune in ltration [12; 34; 56] . Besides, it is the downstream molecule of multiple microRNAs. MiR-133b, miR-29b-3p, miR-133a-3p, and miR-196b-5p can inhibit tumor invasion and metastasis, and reverse drug resistance through targeting COL1A1 [14; 22; 52; 59] . Therefore, we expanded our research scope to pan-cancer.
The tumor microenvironment(TME) is a dynamic, complex, and heterogenic system which is mainly composed of ECM, blood vessels, tumor-in ltrating immune cells, CAFs, and cytokines, etc [37; 40; 46] . It plays a critical role in tumorigenesis and metastasis and is responsible for cancer-target therapies. An important nding of this study was that the expression of COL1A1 was positively correlated with the abundance of CAFs in almost all tumors. Minna et al. found that CAFs coexisted with senescent thyroid cells at the tumor invasive front, accompanied by the deposition of COL1A1 [36] . This phenomenon needs further validation in other tumors. Besides, the expression of COL1A1 was found negatively correlated with CD8+ T cells in CESC, HNSC-HPV+, and SKCM, which may associated with deletion of CD8 + T cells induced by CAFs [26] . Furthermore, we also found that the expression of COL1A1 was positively correlated with macrophage in ltration in gastrointestinal cancer, BLCA, and HNSC. In the future, more experiments could focus on whether targeting COL1A1 can we inhibit CAFs and macrophages, and promote tumorspeci c CD8+ T cell differentiation, thus improving immunotherapy resistance. TMB and MSI are important predictors for the e cacy of cancer immunotherapy. By analyzing the information obtained from the TCGA database, we found that the high expression of COL1A1 was associated with high TMB and MSI in many tumors. After literature review, we found that there was no research regarding the relationship between COL1A1 expression and TMB and MSI. Therefore, this study provided a good basis for predicting the e cacy of immunotherapy, but more evidence is needed to support this result.
In addition, we found that COL1A1 was closely related to activation of EMT through pathway enrichment analysis. EMT is a dynamic process that inactive polarized epithelial cells transit to active mesenchymal cells [38] . Researches have found that knockdown of COL1A1 can inhibit the EMT process, thereby inhibiting the invasion and metastasis of liver cancer and bladder cancer [34; 58] . Furthermore, through literature review, we found that the up-regulation of COL1A1 was related to radioresistance, and experiments have con rmed that knockdown of COL1A1 can increase the radiosensitivity of nasopharyngeal and cervical cancer [15; 32] . Moreover, EMT was related to radioresistance [35] , which indicated that the radioresistance of COL1A1 may be related to EMT. More researches could be made to further verify whether the process of EMT can be down-regulated by inhibiting COL1A1, thus reducing tumor metastasis, and improving chemoresistance, radioresistance , and immunoresistance.
Nevertheless, although we employed multiple bioinformatics databases to analyze the role of COL1A1 across 33 tumors, this study still has some limitations. Firstly, the current study merely focused on COL1A1 at the genetic level, but lacked description of its protein transcriptome and phosphorylation level.
Secondly, the results of our study lack external validation in other public datasets. Finally, the databases used in this study were mainly based on the gene-chip and sequencing data of tumor tissue from TCGA, so the cell-level analysis of immune cell markers could have introduced systematic bias. Future studies should focus more on using skills like single-cell sequencing technology.

Conclusion
In conclusion, we screened 4 chips, and found 6 hub genes including COL1A1, and rstly conducted a pan-cancer analysis to illustrate the gene expression, prognosis, and immunological role of COL1A1. The results demonstrated that COL1A1 can serve as a prognostic and immunological biomarker in multiple cancers.

Declarations
Funding This study was supported by the Natural Science Foundation Project of Chongqing Science and Technology Commission (CSTC), China (grant no. cstc2018jcyjAX0012).
Con icts of Interest The authors declare that the research was conducted in the absence of any commercial or nancial relationships that could be construed as a potential con ict of interest.
Availability of data and material All data is available under reasonable request.
Code availability Not applicable.
Author Contributions YL wrote the initial manuscript. JX, MZ, ZW, and HE edited and approved the nal manuscript. All authors contributed to the article and approved the submitted version.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate Not applicable.
Consent for publication All authors consent to the publication of this study.