3.1 Identification of DEGs in gastrointestinal cancer
Based on the cut-off criterion, four mRNA expression profiles of esophageal squamous cell carcinoma, esophageal adenocarcinoma, gastric adenocarcinoma, and colorectal adenocarcinoma were extracted from the GEO database, including GSE20347, GSE92396, GSE103236, and GSE110224, respectively. After analyzing by the GEO2R tool and Venn diagram, a total of 21 DEGs were identified. And, 19 genes were up-regulated, and 2 genes were down-regulated. The Venn diagram was shown in the Figure.1a and Figure.1b.
3.2 GO and KEGG Analysis for the DEGs in gastrointestinal cancer
For all DEGs, extracellular regions, cell surfaces, proteinaceous extracellular matrix, and collagen trimer were mostly enriched in cellular components(CC). The biological process(BP) mainly included the collagen catabolic process, extracellular matrix organization and disassembly, cell adhesion, cartilage development, and skeletal system development. As for molecular function(MF), the DEGs were related to extracellular matrix structural constituent, serine-type endopeptidase activity, metalloendopeptidase activity, protein binding, and extracellular matrix binding. (Figure.1c) Furthermore, the KEGG pathway includes focal adhesion, ECM-receptor interaction, PI3K-Akt signaling pathway, protein digestion and absorption, and transcriptional misregulation in cancer(Figure. 1d).
3.3 PPI network construction and modules analysis
The PPI network of the DEGs was constructed through the STRING online website (Figure.1e). Thus the data was imported into Cytoscape software for visualization, and the MCODE plug-in was used to further screen the hub gene. Finally, a total of 6 hub genes were identified, namely SPP1, BGN, THBS2, MMP3, COL1A1, and TIMP1(Figure. 1f). All the hub genes were up-regulated in tumor tissues.
The above results showed that the hub genes were mainly enriched in the extracellular matrix by GO and KEGG. And the protein of COL1A1 is an important component of the extracellular matrix. Moreover, we found that COL1A1 was highly-expressed in other cancers in addition to gastrointestinal cancer[5; 13; 18; 31]. Therefore, we aimed to explore COL1A1 in pan-cancer.
3.4 Gene expression analysis of COL1A1 in pan-cancer
Through the Oncomine database, we found that COL1A1 was highly expressed in a variety of tumors(Figure. 2a). Moreover, Through the TIMER2 database and GEPIA2 database, we found that COL1A1 was over-expressed mainly in breast invasive carcinoma (BRCA), cholangiocarcinoma (CHOL), colon adenocarcinoma(COAD), esophageal adenocarcinoma (ESCA), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), renal clear cell carcinoma (KIRC), hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pheochromocytoma and paraganglioma (PCPG), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), stomach adenocarcinoma (STAD), thyroid adenocarcinoma (THCA), diffuse large B lymphoma (DLBC), testicular germ cell tumors (TGCT), and thymoma (THYM)( Figure. 2b-c).
We further analyzed the expression of COL1A1 in different tumor stages. The results showed that COL1A1 expression is closely related to the late stage of ACC, BCLA, ESCA, KICH, KIRP, STAD, and THCA(Figure. 2d).
3.5 Genetic alteration analysis of COL1A1
We found COL1A1 mutation in various tumors by cBioportal. Among them, the highest mutation frequency was in melanoma(16.22%). Breast invasive carcinoma had the highest amplification frequency(6%) and sarcoma had the highest fusion frequency of COL1A1. It is noteworthy that all mesothelioma cases had copy number amplification(4.6%) and all uveal melanoma had deletion of COL1A1(1.25%) (Supplementary Figure.1a). Furthermore, the figure presented the types, sites, and case number of the COL1A1 genetic alteration. The most common alteration type was missense mutation(Supplementary Figure.1b). In addition, we analyzed the relationship between genetic changes and the prognosis of melanoma. The results showed that the alternations of COL1A1 are related to the better prognosis of melanoma with overall survival(p=0.0158), progression-free survival(p=0.0385), and disease-specific survival(p=0.0361). Although these alternations showed a tendency of better disease-free survival, there is not statistically significant(p=0.395) (Supplementary Figure.1c).
3.6 Methylation Profile of COL1A1
COL1A1 promoter methylation levels were significantly lower in primary tumors than in normal tissues in gastrointestinal cancer(Supplementary Figure.2a). The promoter methylation levels of other tumors were shown in supplementary Figures.3. Moreover, we investigated the detailed information of the correlation between COL1A1 expression level and DNA methylation, copy number and clinical data in STAD(Supplementary Figure.2b). The results showed that COL1A1 expression was significantly correlated with DNA methylation of CPG islands(p < 0.01).
3.7 The relationship between survival and gene expression of COL1A1 in pan-cancer
In order to analyze the correlation between COL1A1 gene expression and prognosis, we used GEPIA2 to analyze the OS and DFS of COL1A1 among different cancers. The results indicated that high expression of COL1A1 is related to poor overall survival of KIRP (HR=2.2, p = 0.011), LGG (HR=2, p = 0.00028), MESO (HR=2.2, p = 0.0014), SKCM (HR=1.5, p = 0.0032), and STAD (HR=1.5, p = 0.013) (Figure. 3a). Moreover, high expression of COL1A1 is related to poor disease-free survival of CESC (HR=2, p = 0.018), COAD (HR=1.6, p = 0.041), ESCA (HR=1.6, p = 0.045), KIRP (HR=3.1, p <0.001), LGG ( HR=1.4, p=0.024), and PRAD (HR=2, p=0.0011) (Figure. 3b).
3.8 Immune infiltration analysis of COL1A1 in pan-cancer
In addition, we observed a strong positive correlation between COL1A1 expression and cancer-associated fibroblasts in most tumors. Moreover, BLCA, ESCA, HNSC, COAD, READ, STAD showed a statistically positive correlation of COL1A1 expression and macrophages. However, CESC, HNSC-HPV+, and SKCM showed a negative relationship between COL1A1 expression and CD8+ T cells(Figure. 4a). Furthermore, we presented the scatterplot with purity and infiltration level of COL1A1 in certain types of tumors(Figure. 4b). And the scatterplot showed consistent results with heatmap.
The results of TISIDB showed that COL1A1 expression is positively related to TILs in most tumors, especially THCA(Figure. 4c). Wheras the association of COL1A1 expression with MHC molecule(Figure. 4d), immunoinhibitor(Figure. 4e), and immunostimulator(Figure. 4f) are diverse. LGG and THCA showed positive relation in MHC molecule. THCA showed positive relation and TGCT showed negative relation in most kinds of immunoinhibitors. And TGCT showed negative relation in most kinds of immunostimulator.
3.9 The relationship of COL1A1 expression and PD-L1, TMB/MSI in pan-cancer
We found that COL1A1 expression was positive correlative to PD-L1 in COAD, DLBC, GBM, LAML, LGG, LIHC, LUAD, PAAD, PRAD, and THCA(Figure. 5a). Moreover, through analysis of COL1A1 expression and TMB/MSI, we found that COL1A1 is closely related to TMB and MSI in a variety of tumors, and can be used as a predictor of immunotherapy. Among them, COL1A1 expression was associated with high TMB in THYM, LAML, ACC, KICH, PRAD, and LGG(Figure. 5b). What’s more, high MSI was observed in TGCT, MESO, PRAD, COAD, SARC, and CESC(Figure. 5c).
3.10 Genes co-expressed with COL1A1 in pan-cancer: pathway and drug sensitivity analysis
We obtained 50 experimentally verified proteins related to COL1A1 based on STRING(Supplementary Figure 4a). Besides, we obtained 100 genes related to COL1A1 expression from GEPIA2. Then the Venn diagram showed the overlapping genes(Supplementary Figure 4b), of which 8 genes were closely related to the expression of COL1A1, namely MMP2, SPARC, COL5A1, DCN, BMP1, BGN, COL1A2, and HTRA1. Heatmaps(Supplementary Figure 4c) and scatter plots(Supplementary Figure 4d) confirmed that these gene expressions were closely related to COL1A1 among tumors. Then, we performed pathway enrichment and drug sensitivity analysis of COL1A1 and these 8 genes in 33 tumors by GSCALite. The results showed that the pathway was mainly activated in epithelial-mesenchymal transition (EMT) (Supplementary Figure 5a), and high expression of HTRA1 was resistant to multiple drugs(Supplementary Figure 5b).