COL5A3 is a prognostic biomarker and correlated with immune infiltrates in pancreatic cancer

Pancreatic cancer is a malignant tumor of the digestive system with a high fatality rate and a very poor prognosis. Type V collagen α3 (COL5A3) is highly expressed in a variety of tumor tissues, but its prognostic value and its relationship with immune infiltration in pancreatic cancer remain unclear. Therefore, we evaluated the prognostic role of COL5A3 in pancreatic cancer and its correlation with immune infiltration.


Background
Pancreatic cancer is a highly malignant tumor of the digestive system, and its incidence has been rising in recent years. In the United States, the number of deaths from pancreatic cancer is increasing. Its prognosis is poor, with a 5-year survival rate of about 10%, and it is increasingly becoming a common cause of cancer death [1]. According to the 2018 global cancer morbidity and mortality estimates compiled by the International Agency for Research on Cancer, there were 458,918 new cases of pancreatic cancer and 432,242 deaths worldwide, accounting for 2.5% of newly diagnosed cancers and 4.5% of all cancer deaths that year [2]. Because of the clinical characteristics of pancreatic cancer, such as hidden onset, difficult early diagnosis, rapid progression, low resection rate, and frequent recurrence and metastasis after surgery, clinical diagnosis and treatment are very challenging. With the introduction of new surgical techniques and medical methods, such as laparoscopy and neoadjuvant radiotherapy and chemotherapy, the treatment of pancreatic cancer is gradually developing. However, the improvement in therapeutic effect for patients with pancreatic cancer is very limited [3]. In recent years, immunotherapy has achieved significant efficacy in the treatment of a variety of malignant tumors [4][5][6][7] and is beginning to show promise in pancreatic cancer, but immunotherapy is not suitable for all cancer patients [8]. In addition, targeted anticancer therapy for patients with pancreatic cancer has also made significant progress and is expected to become the future trend of precision oncology [9]. Given the limitations of current treatments for pancreatic cancer, their unsatisfactory results, and how little is known about the pathogenesis of the disease, the detection of biomarkers for early pancreatic cancer is very important in pancreatic cancer research.
Collagens are the main component of the extracellular matrix, accounting for 30% of the total protein mass of mammals. Type V collagen alpha3 (COL5A3) is a member of the collagen family and plays a key role in regulating fiber formation in the extracellular matrix [10]. At present, research on COL5A3 is rarely reported in the literature. The expression of the collagen gene COL5A3 has been reported to play an important role in bone formation [11]. Some studies have shown that COL5A3 is involved in the occurrence and development of a variety of tumors, including breast cancer [12], uveal melanoma [13], prostate cancer [14] and renal cell carcinoma [15]. Although high-throughput sequencing has shown that COL5A3 is highly expressed in many tumors, the expression of COL5A3 in pancreatic cancer and its prognostic value have not been reported. Therefore, in this study, we used several online databases, such as TCGA (The Cancer Genome Atlas) and GEO (Gene Expression Omnibus), together with their clinical indicators and survival data, to evaluate the significance of COL5A3 expression in patients with pancreatic cancer. In addition, we studied the relationship between COL5A3 mRNA levels and tumor-infiltrating immune cells. In summary, our study links the overexpression of COL5A3 to the low survival rate of pancreatic cancer.

Results
Expression pattern of COL5A3 in pan-cancer perspective
To assess the mRNA expression of COL5A3 in different cancer types, we analyzed datasets from 33 cancer types. As shown in Fig. 1A, compared with normal tissues, COL5A3 was significantly upregulated in 28 of the 33 cancer types. These data indicate that COL5A3 mRNA is differentially expressed in multiple tumor types.

COL5A3 mRNA and protein expression was upregulated in patients with pancreatic cancer
To determine the mRNA and protein expression levels of COL5A3 in pancreatic cancer, we analyzed COL5A3 expression data from TCGA and HPA immunohistochemistry, and further verified the expression of COL5A3 using the GEO database. As shown in Fig. 1B, the expression level of COL5A3 in pancreatic cancer is significantly higher than that in normal tissues (P < 0.001). Consistent with this, Fig. 1C shows that the expression of COL5A3 in tumor tissues is up-regulated compared with normal tissues in the GSE16515 dataset (P < 0.05). Finally, as shown in Fig. 1D, immunohistochemical staining from HPA also showed that the expression of COL5A3 protein was up-regulated in pancreatic cancer tissues. Together, these results show that both the mRNA and protein expression levels of COL5A3 are up-regulated in pancreatic cancer tissues.
The relationship between COL5A3 expression and clinical pathological characteristics in pancreatic cancer patients
We studied the clinicopathological features of pancreatic cancer patients with differential expression of COL5A3. As shown in Table 1, compared with the low-expression group, the group with high expression of COL5A3 had a significantly worse initial therapeutic effect (P = 0.004) and more often had a history of alcohol consumption (P = 0.036). However, there was no statistical difference in other clinicopathological features, such as age, sex, TNM stage and pathological stage. In short, high expression of COL5A3 is closely related to some poor clinicopathological features, which also provides a new idea for the study of single-gene molecules related to tumor drug resistance. To evaluate the effect of COL5A3 on the clinical prognosis of pancreatic cancer, we used Kaplan-Meier (K-M) curve analysis to test whether COL5A3 predicts clinical outcome. As shown in Fig. 2A-2B, overall survival (OS) in the high-expression group was significantly lower than that in the low-expression group (p = 0.019); consistent with this, disease-specific survival (DSS) in the high-expression group was also significantly lower than that in the low-expression group (p = 0.026). Next, we studied the clinical benefit of COL5A3 and used ROC curves to demonstrate its value in the differential diagnosis of pancreatic cancer. As shown in Fig. 2C, the area under the curve (AUC) is 0.843, suggesting that COL5A3 has high sensitivity and specificity in the diagnosis of pancreatic cancer. These results show that up-regulation of COL5A3 indicates a worse prognosis and has high value for clinical diagnosis.
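An AUC such as the one reported for Fig. 2C can be read as the probability that a randomly chosen tumor sample has higher COL5A3 expression than a randomly chosen normal sample. The following is a minimal Python sketch of that calculation, not the authors' code (their analysis was run in R); the function names and the toy expression values are hypothetical and chosen only for illustration.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def sensitivity_specificity(positive_scores, negative_scores, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is called positive."""
    true_pos = sum(1 for s in positive_scores if s >= cutoff)
    true_neg = sum(1 for s in negative_scores if s < cutoff)
    return true_pos / len(positive_scores), true_neg / len(negative_scores)


if __name__ == "__main__":
    # Hypothetical log2 expression values, for illustration only.
    tumor = [3.0, 4.0, 5.0]
    normal = [1.0, 2.0, 3.5]
    print("AUC:", roc_auc(tumor, normal))
    print("sens/spec at cutoff 3:", sensitivity_specificity(tumor, normal, 3.0))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples; a rank-based formula gives the same value faster for larger data.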

COL5A3-associated PPI network and functional enrichment
To construct the COL5A3-associated PPI network and functional annotations, we used the STRING database together with GO and KEGG analysis. As shown in Fig. 3A, the PPI network shows that several genes are closely related to COL5A3, such as ADAMTS14, ADAMTS2, ADAMTS3, BMP1, COL11A1, COL1A2, LUM, P4HA3, PCOLCE and PCOLCE2. As shown in Fig. 3B, COL5A3-related genes were involved in many biological processes (BP), cellular components (CC) and molecular functions (MF). GO and KEGG enrichment analysis showed that COL5A3-related genes were mainly involved in extracellular structure organization, extracellular matrix organization, collagen fibril organization, and protein digestion and absorption. These results suggest that COL5A3 might promote tumor proliferation, migration and invasion by affecting the extracellular matrix. The correlations between COL5A3 expression and its co-expressed genes in pancreatic cancer patients, based on the TCGA database, are shown in Fig. 4A-J.

Analysis of the relationship between the expression of COL5A3 and the level of immune cell infiltration in pancreatic carcinoma
We used the TIMER database to analyze the correlation of COL5A3 expression with tumor purity and with six kinds of infiltrating immune cells (B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils and dendritic cells). As shown in Fig. 5A, the expression level of COL5A3 was positively correlated with the infiltration levels of CD4+ T cells (r = 0.213, P = 5.40e-03), macrophages (r = 0.414, P = 1.82e-08), neutrophils (r = 0.399, P = 6.52e-08) and dendritic cells (r = 0.375, P = 4.45e-07) in pancreatic cancer, but showed no association with tumor purity or CD8+ T cells. P < 0.05 was defined as statistically significant. Figure 5B shows the relationship between COL5A3 expression and the abundance of 28 kinds of tumor-infiltrating lymphocytes in human cancers. As shown in Fig

Discussion
Pancreatic cancer is still a malignant tumor with a very high fatality rate [16]. Therefore, the discovery of new biomarkers of pancreatic cancer plays an important role in its early diagnosis, treatment and prognosis.
As there are few studies on COL5A3 in cancer, we used bioinformatics analysis to explore its biological function in pancreatic cancer. In this study, we found that COL5A3 is highly expressed in a variety of tumor tissues, including pancreatic cancer. High expression of COL5A3 is associated with poor clinicopathological features, such as initial treatment efficacy and alcohol history. ROC curve analysis shows that COL5A3 has high predictive efficiency and can be used as a promising biomarker for early diagnosis of pancreatic cancer. According to the Kaplan-Meier curve analysis, we confirmed that high expression of COL5A3 was associated with poor OS and DSS. Therefore, COL5A3 can be used as a potential biomarker for poor prognosis of pancreatic cancer.
To further study the potential biological function of COL5A3, we carried out GO function and KEGG pathway enrichment analysis. COL5A3-related genes are mainly related to extracellular structure organization, extracellular matrix organization, collagen fibril organization, and protein digestion and absorption. Some studies have shown that type V collagen, including COL5A3, is highly expressed in the extracellular matrix of pancreatic cancer cells and promotes the proliferation, migration and metastasis of pancreatic cancer by binding to the α2β1 integrin receptor. Some studies have also shown that the up-regulation of COL5A3 in pancreatic cancer is a significant feature of fibrosis and malignant tumor stroma [18]. These results suggest that COL5A3 may play a role in shaping the tumor microenvironment and regulating cell proliferation, migration and metastasis.
The tumor microenvironment (TME) is a complex cellular environment that includes lymphatic endothelial cells, vascular endothelial cells, mesenchymal cells and immune cells, as well as extracellular matrix and inflammatory matrix [19,20]. The dynamic and symbiotic relationship between tumor cells and their microenvironment forms in the early stage of malignant tumor growth and affects the occurrence and development of the tumor [21,22]. Patients with different types of tumors can respond markedly to immunotherapy, but if all patients are given the same immunotherapy, the effect is often not ideal [23,24]. Because of the complexity of the tumor immune environment, it is very important to predict and guide immunotherapy by analyzing the specific immune microenvironment in the tumor. However, the relationship between the expression of COL5A3 and immune cell infiltration in pancreatic cancer has not been reported. Using the TIMER and TISIDB databases, we found that the expression of COL5A3 in the TME is positively correlated with several tumor-infiltrating immune cells (Treg, Tgd, Tcm CD4, NK, pDC and Tcm CD8). There is growing evidence that innate immune cells (macrophages, neutrophils, dendritic cells, innate lymphoid cells, myeloid suppressor cells and natural killer cells) and acquired immune cells (T cells and B cells) can promote tumorigenesis and tumor development in the TME. The interaction between cancer cells and proximal immune cells eventually leads to an environment that promotes tumor growth and metastasis [25][26][27]. These studies suggest that there is a potential correlation between COL5A3 and immune infiltration in pancreatic cancer.
This study has the following shortcomings: our statistical analysis is based on mining samples from public databases, and we lack our own clinical samples for further verification. In addition, the detailed mechanism by which COL5A3 affects pancreatic cancer remains to be studied through in vitro and in vivo experiments.

Conclusions
In summary, this study found for the first time that the expression of COL5A3 was up-regulated in pancreatic cancer and was related to initial therapeutic efficacy and alcohol history. In addition, our study found that COL5A3 has high predictive value for the diagnosis of pancreatic cancer and is related to the level of immune infiltration in pancreatic cancer. Therefore, COL5A3 may serve as a potential new prognostic biomarker and provide new ideas for further experiments and clinical trials.

Materials And Methods
TCGA datasets
TCGA (The Cancer Genome Atlas) is a landmark cancer genomics project that covers 33 cancer types and more than 20,000 collected samples. The transcriptional expression data and clinical information for COL5A3 were downloaded from the official TCGA website. In addition, we studied the expression level of the COL5A3 gene in cancer and paracancerous tissues using the integrated data of the TCGA and GTEx databases. The RNA-seq data in TPM (transcripts per million reads) format were analyzed and compared after log2 transformation. Finally, in the GEO dataset (GSE16515), we compared the expression of COL5A3 in normal and tumor tissues.
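The comparison of log2-transformed TPM values between tumor and normal tissue, which the Statistical analysis section assigns to the Mann-Whitney U test, can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' R code: the pseudocount of 1 in the log2 transform is an assumption (the text only says "log2 transformation"), the p-value uses the normal approximation with tie correction and no continuity correction, and all function names are hypothetical.

```python
import math


def log2_tpm(tpm_values, pseudocount=1.0):
    """log2-transform TPM values; the pseudocount (assumed) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in tpm_values]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of x, p-value). Assumes the pooled values are not
    all identical, otherwise the variance is zero.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical TPM values, for illustration only.
    tumor = log2_tpm([31.0, 63.0, 127.0, 255.0])
    normal = log2_tpm([0.0, 1.0, 3.0, 7.0])
    print(mann_whitney_u(tumor, normal))
```

For small samples an exact p-value is usually preferred over the normal approximation; the approximation is used here only to keep the sketch dependency-free.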

RNA-Sequencing data of COL5A3 in pancreatic cancer
We downloaded the RNA-Seq expression data of COL5A3 in pancreatic cancer from the TCGA website. Because the TCGA database contains few normal pancreatic tissue samples, we integrated the normal pancreatic tissue samples from the GTEx database, and finally retained data from 171 cases of adjacent normal tissue and 179 cases of tumor tissue. The selected clinical samples contain COL5A3 gene expression data and related clinical information, such as patient sex, age, smoking and drinking history, tumor TNM stage and initial treatment outcome.

The human protein atlas (THPA)
THPA (https://www.proteinatlas.org/) is the human protein map, which aims to provide information on the tissue and cell distribution of all 24,000 human proteins. In this study, we used immunohistochemical (IHC) images from THPA to observe the expression of COL5A3 in normal tissues and pancreatic cancer.

Analysis of prognostic indexes
To explore the clinical value of the COL5A3 gene in the prognosis of patients with pancreatic cancer, we analyzed survival indexes including OS (overall survival) and DSS (disease-specific survival). The RNA sequencing data and corresponding clinical information of pancreatic cancer patients were downloaded from the TCGA database and visualized with Kaplan-Meier curves. Patients were divided into a low-expression group and a high-expression group according to COL5A3 expression, and P values were obtained by the log-rank test and Cox regression analysis.
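The survival comparison described above can be sketched in a few lines. The following Python example is illustrative only and is not the authors' R workflow: it assumes a median cutoff for the high/low split (the text does not state the cutoff), implements the Kaplan-Meier estimator and a two-group log-rank test directly, and uses hypothetical function names and toy data.

```python
import math
import statistics


def split_by_median(expression):
    """Label samples 'high' if above the cohort median, else 'low' (assumed cutoff)."""
    median = statistics.median(expression)
    return ["high" if v > median else "low" for v in expression]


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events: 1 = event observed (death), 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical follow-up times (months) and event flags, for illustration only.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
    print(logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1]))
```

Cox regression, which the text also mentions, needs an iterative partial-likelihood fit and is left out of this sketch.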

Protein-Protein Interaction (PPI) networks and functional enrichment analysis
STRING (https://www.string-db.org/) is a commonly used database for searching known protein-protein interactions and predicting new ones. Studying protein-protein interaction networks helps to identify core regulatory genes. In this study, we obtained the top ten COL5A3-related genes from the STRING database and constructed a PPI network map. For GO and KEGG enrichment analysis, we used the clusterProfiler package (version 3.14.3) to analyze the selected genes and the ggplot2 package (version 3.3.3) for visualization.
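Over-representation analysis of the kind used for GO and KEGG terms is commonly based on a hypergeometric test: given a background of N genes of which K are annotated to a term, it asks how likely it is that a list of n genes contains at least k annotated genes by chance. The Python sketch below illustrates that calculation under these assumptions; it is not the clusterProfiler implementation, the function name is hypothetical, the numbers are toy values, and multiple-testing correction is omitted.

```python
from math import comb


def enrichment_p_value(hits, gene_list_size, term_size, background_size):
    """P(X >= hits) for X ~ Hypergeometric(background_size, term_size, gene_list_size).

    hits            : genes in the list annotated to the term (k)
    gene_list_size  : number of genes in the list (n)
    term_size       : genes in the background annotated to the term (K)
    background_size : total number of background genes (N)
    """
    upper = min(gene_list_size, term_size)
    total = comb(background_size, gene_list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, gene_list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, for illustration only: 3 of 3 listed genes fall in a
    # 4-gene term drawn from a 10-gene background.
    print(enrichment_p_value(3, 3, 4, 10))
```

In practice one p-value is computed per term and the resulting set is adjusted for multiple testing before terms are reported as enriched.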

Tumor immune estimation resource database (TIMER)
TIMER (https://cistrome.shinyapps.io/timer/) uses RNA-Seq expression profile data to estimate the infiltration of immune cells in tumor tissues. At present, it mainly provides infiltration estimates for six kinds of immune cells. In this study, we used the TIMER database to determine the correlation of COL5A3 expression with tumor purity and with six kinds of infiltrating immune cells (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells).
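The correlation coefficients reported for expression versus immune cell infiltration can be illustrated with a rank-based correlation. The Python sketch below assumes Spearman's rho, which is an assumption because the text reports only "r"; it also does not reproduce any tumor purity adjustment that the TIMER web tool may apply. Function names and input values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical expression and infiltration estimates, for illustration only.
    expression = [2.1, 3.4, 1.0, 4.8, 3.9]
    infiltration = [0.10, 0.22, 0.05, 0.31, 0.18]
    print(spearman(expression, infiltration))
```

Working on ranks makes the coefficient insensitive to the scale of the expression values, so the same result is obtained before and after a monotonic transform such as log2.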

Tumor immune system interaction database (TISIDB)
TISIDB (http://cis.hku.hk/TISIDB) is a website for studying the interaction between tumors and the immune system. It integrates a number of tumor-related databases. In this study, we used TISIDB to assess the relationship between the expression of COL5A3 and the abundance of 28 kinds of tumor-infiltrating lymphocytes (TILs) in different human cancers.

Statistical analysis
All statistical analysis and visualization were carried out in R (version 3.6.3); the main R package involved was ggplot2 (version 3.3.3), used for visualization. The Mann-Whitney U test was used to determine the difference between pancreatic cancer tissue and adjacent normal tissue. To evaluate the effect of COL5A3 expression on survival, we analyzed the survival data with the Kaplan-Meier method and log-rank tests, mainly using the survival package (version 3.

Fig. 4 Correlation analysis between the expression of COL5A3 and co-expressed genes in pancreatic cancer (A-J).