Comprehensive analysis of the expression and prognosis for COL4A family genes in kidney renal clear cell carcinoma

Background COL4A family genes are a group of extracellular matrix-related genes that have been shown to be associated with various cancers. However, their relationship with kidney renal clear cell carcinoma (KIRC) has not been reported. Methods We obtained differential expression and survival data for COL4A genes in KIRC from online open-access resources, including ONCOMINE, UALCAN, GEPIA, The Cancer Genome Atlas (TCGA) database, cBioPortal, Metascape, and STRING. Results We found significant overexpression of COL4A1 and COL4A2, whereas COL4A3, COL4A4, COL4A5, and COL4A6 were decreased in tumor tissues. Moreover, almost all COL4A family genes correlated with individual cancer stages of KIRC; higher expression of COL4A1, COL4A2, COL4A3, COL4A3BP, and COL4A4 accompanied better overall survival (OS), whereas higher expression of COL4A5 accompanied poorer OS in KIRC patients. We also found that the group with altered COL4A genes had shorter OS and disease-free survival (DFS) than the unaltered group.


Introduction
Cancer is the second leading cause of death in the world. More than 1,762,450 new cancer cases are reported each year, and renal cell tumors account for 2% of cases in America [1]. Seventy-five percent of renal tumors are kidney renal clear cell carcinoma (KIRC), which tends to metastasize even after early radiotherapy, chemotherapy, and surgical treatment [2,3]. Studies of KIRC pathogenesis remain limited. Risk prediction for individuals with KIRC based on TNM stage and Fuhrman grade sometimes yields variable outcomes [4,5]. Therefore, further research into the molecular mechanisms underlying the progression and prognosis of KIRC is urgently needed to provide better clinical treatment for patients.
The collagen IV (COL4A) genes encode a nonfibrillar protein, an essential component of the basement membrane expressed in all tissues [6,7], which maintains the structural integrity of organs and tissues. The COL4A gene family is composed of 7 members, COL4A1, COL4A2, COL4A3, COL4A3BP, COL4A4, COL4A5, and COL4A6, which encode the different α chains of type IV collagen [8].
Abnormal expression of collagen 4 (COL4) protein injures renal parenchymal cells. COL4A mutations have been identified as independent factors in Alport disease (AD), focal segmental glomerulosclerosis (FSGS) [9], and thin basement membrane nephropathy (TBMN) [10]. The destruction of the basement membrane provides the basis for renal inflammation and protein toxicity. Recent progress in the study of the pathogenesis of renal cell carcinoma has revealed that the overexpression and mislocalization of the nucleoprotein SPOP under hypoxic conditions are core factors in the recurrence and progression of renal cancer [11]. Continuous chronic inflammatory repair reactions and hypermetabolism caused by renal tubular hyperfiltration may further increase the oxygen demand of tubular cells.
However, almost all studies of COL4A focus on familial glomerular damage and hematuria, and the correlation between renal carcinoma and the COL4A family genes remains unknown. Therefore, examining KIRC from the perspective of the COL4A family genes could provide new insight for predicting the clinical outcomes of patients with KIRC. Furthermore, we analyzed the gene interaction network by obtaining 20 frequently altered similar genes to further explore the potential molecular pathways in KIRC.

ONCOMINE
ONCOMINE (https://www.oncomine.org) is a powerful, open-access online bioinformatics database that provides expression analysis of DNA or RNA sequences based on submitted tumor and normal samples [12].
In the current study, we obtained the expression data for the COL4A family genes in renal cell carcinoma from the "Expression Analysis" module of UALCAN. We set the p-value cutoff to 0.05 and the fold-change cutoff to 1.5.

UALCAN
UALCAN (http://ualcan.path.uab.edu/analysis.html) is a comprehensive, online accessible database that provides analysis of tumor gene expression and survival differences based on The Cancer Genome Atlas (TCGA) and clinical prognosis outcomes [13]. We extracted the expression data for KIRC and normal tissues and further analyzed the survival difference across tumor stages in patients with KIRC. In the UALCAN analysis, the p-value cutoff was set to 0.05 in the Student's t-test.

GEPIA
GEPIA (http://gepia.cancer-pku.cn/index.htm), developed by Peking University, is a recently developed online analysis tool for RNA-seq expression data from more than 9,736 tumors and 8,587 normal tissue samples [14]. In our study, we analyzed the differential expression of COL4A family genes between tumor and normal tissues, across survival statuses, and across pathological stages. In addition, we obtained the top 30 genes most similar to the COL4A family genes by using the similar gene detection module.
After removing duplicate genes, 178 genes were retained for further analysis. The p-value cutoff was set to 0.05 in the Student's t-test.
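The deduplication step described above can be sketched as follows. This is only an illustrative sketch, not the procedure the authors ran: the function name `merge_similar_genes` and the toy input are hypothetical, and the real input would be the similar-gene lists exported from GEPIA for each COL4A family member.

```python
def merge_similar_genes(per_gene_lists):
    """Merge per-query similar-gene lists into one list without duplicates.

    per_gene_lists maps a query gene to the list of genes reported as
    similar to it. Symbols are upper-cased and stripped so that trivial
    formatting differences do not create spurious duplicates.
    """
    merged = {}  # dicts keep insertion order, so first-seen order is preserved
    for query, similar in per_gene_lists.items():
        for gene in similar:
            symbol = gene.strip().upper()
            if symbol:
                # Remember which query first contributed this gene.
                merged.setdefault(symbol, query)
    return list(merged)


# Hypothetical toy input (assumption): two queries with overlapping results.
toy_lists = {
    "COL4A1": ["COL4A2", "NID1", "HSPG2"],
    "COL4A2": ["COL4A1", "nid1 ", "COL15A1"],
}
unique_genes = merge_similar_genes(toy_lists)
print(len(unique_genes), unique_genes)
```

With seven query genes and 30 similar genes each, the merged list can hold at most 210 entries; any overlap between the per-gene lists reduces that count.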

Cancer Genome Atlas (TCGA) database
The TCGA database (https://portal.gdc.cancer.gov/), a comprehensive tumor database containing genetic data and pathological findings for 10897 samples, is a useful online resource for exploring better diagnosis and treatment methods by analyzing the cancer genomic and clinical profiles of 34 cancer types [15].

cBioPortal
cBioPortal (http://www.cbioportal.org) is a comprehensive online database for exploring cancer genome data from multiple perspectives [16]. In the current study, we obtained genomic profiles containing genetic mutations and putative copy-number alterations from cBioPortal based on 510 KIRC samples from the TCGA database. We set ± 1.8 as the z-score threshold for mRNA expression (RNASeq V2 RSEM). Kaplan-Meier plots were also applied to explore the relationship between genetic alterations in COL4A family members and the overall survival (OS) and disease-free survival (DFS) of KIRC patients. A p-value below 0.05 was considered significant.
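The grouping by z-score threshold and the survival comparison described above can be illustrated with a minimal, self-contained sketch. It is not cBioPortal's implementation: the helper names and the toy cohort are hypothetical, and the p-value uses the standard chi-square approximation with one degree of freedom for a two-group log-rank test.

```python
import math


def is_altered(z_score, threshold=1.8):
    """Flag a sample as altered when its mRNA z-score lies beyond +/- threshold."""
    return abs(z_score) > threshold


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 df."""
    pooled = list(zip(times_a + times_b, events_a + events_b))
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy cohort (assumption): z-scores, follow-up months, death flags.
z_scores = [2.4, -0.3, 1.9, 0.5, -2.1, 0.1, 0.9, -1.0]
months = [5, 30, 8, 42, 12, 50, 36, 28]
died = [1, 0, 1, 1, 1, 0, 0, 1]

altered = [(t, e) for z, t, e in zip(z_scores, months, died) if is_altered(z)]
unaltered = [(t, e) for z, t, e in zip(z_scores, months, died) if not is_altered(z)]
chi2, p_value = logrank_test(
    [t for t, _ in altered], [e for _, e in altered],
    [t for t, _ in unaltered], [e for _, e in unaltered],
)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

In this toy cohort the altered group contains only early deaths, so the test rejects equality of the two survival curves at the 0.05 level; real data would of course need far more samples than this illustration uses.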

Metascape
Metascape (http://metascape.org) is an online tool that provides reliable gene list annotation and enrichment analysis [17]. The 178 genes similar to the COL4A family genes, obtained from the similar gene detection module of GEPIA, were analyzed with the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) modules in Metascape. The GO module analyzes the functional roles of genes related to COL4A family members in biological processes (BP), cellular components (CC), and molecular functions (MF), and the KEGG module describes the pathways of the COL4A family members.

STRING
STRING (http://string-db.org) is a protein-protein interaction (PPI) database designed for collecting, integrating, and scoring publicly available data to explore potential protein interaction networks [18]. COL4A family members and their similar genes were used to generate a PPI network with STRING, which was visualized with Cytoscape [19]. The Cytoscape plug-in Molecular Complex Detection (MCODE) was used to identify tightly connected modules. We set the MCODE parameters as follows: Max depth = 100, node score cut-off = 0.2, k-core = 2, and degree cut-off = 34. We defined the hub genes as the ten genes with the highest degree of connectivity. The higher a gene's degree of connectivity, the more important it is for maintaining the stability of the network structure.
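The hub-gene criterion above (the ten genes with the highest degree of connectivity) can be sketched in a few lines. This is a plain degree ranking over an edge list, not the MCODE clustering algorithm itself; the function name `top_hub_genes` and the toy edges are hypothetical, and real edges would come from a STRING export.

```python
from collections import Counter


def top_hub_genes(edges, top_n=10):
    """Rank genes by degree in an undirected PPI network.

    edges is an iterable of (gene_a, gene_b) pairs. Self-loops and
    duplicate pairs (in either orientation) are ignored, so each
    interaction contributes exactly one to each partner's degree.
    """
    degree = Counter()
    seen = set()
    for a, b in edges:
        if a == b:
            continue
        key = frozenset((a, b))
        if key in seen:
            continue
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, breaking ties alphabetically for stability.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical toy network (assumption), including one duplicated edge.
toy_edges = [
    ("COL4A1", "COL4A2"),
    ("COL4A1", "NID1"),
    ("COL4A1", "HSPG2"),
    ("COL4A2", "NID1"),
    ("NID1", "HSPG2"),
    ("COL4A2", "COL4A1"),
]
hubs = top_hub_genes(toy_edges, top_n=2)
print(hubs)
```

Breaking ties alphabetically is a design choice made here only so that the ranking is reproducible; the source does not say how ties were handled.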

Statistical analysis
Statistical analysis of the correlation between mRNA expression of COL4A family genes and the survival status of individuals with KIRC was conducted with SPSS (version 23.0). Clinicopathological parameters that correlated significantly with mRNA expression of COL4A genes (p < 0.05) were retained for further multivariate analysis. A p-value below 0.05 was considered significant.
Correlation of mRNA expression levels of COL4A family genes and the tumor progression related clinicopathological parameters in KIRC patients.
The patients' clinicopathological parameters, namely cancer stages and tumor grades, were explored with the GEPIA and UALCAN databases after overexpression of COL4A family members in KIRC tumors was observed. As shown in Fig. 4, there were remarkable correlations between the mRNA expression of COL4A3 (p < 0.001), COL4A3BP (p < 0.001), COL4A4 (p < 0.001), and COL4A6 (p = 0.0434) and patients' pathological stages in the GEPIA database. However, based on the UALCAN analysis (Fig. 5), we found a statistically significant correlation between all COL4A family genes and patients' stages (p < 0.01 for all) except for COL4A3BP, for which differential expression was observed only in stage 4.

Prognostic value of COL4A family genes in KIRC
Using the GEPIA database, we explored the prognostic value of COL4A family genes based on the difference in overall survival (OS) and disease-free survival (DFS) between the high expression group and the low expression group (Fig. 6). The OS of KIRC patients differed significantly for all COL4A family genes (p < 0.05 for all) except COL4A6. Moreover, most of the COL4A family genes showed significant correlations with the prognosis of individuals with kidney cancer, including COL4A3, COL4A3BP, COL4A4, and COL4A5 (p < 0.05 for all).

The correlations between genetic mutations in COL4A family members and OS, DFS of KIRC patients
The online database cBioPortal was used to analyze the genetic mutations of differentially expressed COL4A family members in KIRC patients. As shown in Fig. 7A, the mutation rates of the COL4A1, COL4A2, COL4A3, COL4A3BP, COL4A4, COL4A5, and COL4A6 genes were 8%, 9%, 9%, 11%, 7%, 9%, and 8%, respectively, in 512 samples. Moreover, the association between genetic mutations and the prognosis of KIRC patients was explored with the Kaplan-Meier module and log-rank test in cBioPortal. A statistically significant correlation was found between genetic mutations of COL4A family members and OS (p = 0.0405) and DFS (p = 0.0298) in KIRC patients.
Networks Analyses and Functional Enrichment Analysis of COL4A family genes and their Neighboring Genes in KIRC patients.
After confirming the correlation between genetic mutations in COL4A family members and prognosis, we used the neighbor genes of COL4A (118 in total) obtained from the STRING database to construct a PPI network and explore the interactions among them. The top ten interacting genes were selected and highlighted in red using the Cytoscape plug-in MCODE (Fig. 8A). As shown in Fig. 8A, the neighbor genes COL4A1, COL4A4, COL4A6, COL4A3, COL15A1, HSPG2, COL4A5, and NID1 were the most likely to be involved in the differential expression of COL4A family genes in KIRC patients. Based on the 118 neighbor genes, functional and pathway enrichment analyses were performed with the online tool Metascape to explore the biological classification of COL4A. The COL4A family members and their neighbor genes were significantly involved in collagen-activated tyrosine kinase receptor signaling pathway, collagen-activated signaling pathway, glomerular basement membrane development, retina vasculature development in camera-type eye, and glomerulus vasculature development in biological processes (BP); collagen type IV trimer, basement membrane collagen trimer, network-forming collagen trimer, collagen network, and complex of collagen trimers in cellular components (CC); extracellular matrix structural constituent conferring tensile strength, GABA receptor binding, myosin binding, extracellular matrix structural constituent, and growth factor binding in molecular functions (MF); and collagen-activated tyrosine kinase receptor signaling pathway, anchoring fibril formation, crosslinking of collagen fibrils, collagen-activated signaling pathway, and glomerular basement membrane development in the KEGG pathway analysis.

Discussion
Previous research on COL4A family genes mostly focused on changes in the renal parenchyma rather than on tumors. In the current study, however, a significant correlation between COL4A family genes and renal cell carcinoma was observed (Fig. 1), and COL4A genes play an important role in the occurrence and development of KIRC (Fig. 2-5). COL4A has been reported to increase angiogenesis in tumor tissue through the stimulator of interferon genes (STING) signaling pathway [27]. Moreover, COL4A genes carry multiple forms of mutations that are involved in immune cell infiltration, and their expression levels have a significant negative correlation with the survival time of patients with cervical cell carcinoma [28]. In addition, high methylation levels of COL4A family genes are found in colorectal cancer [29]. Since the alteration of COL4A has been confirmed in tumor tissues, more attention should be paid to COL4A and its related genes to further explain the underlying mechanism in KIRC.
To further explore the correlation between the expression of COL4A family genes and the progression and prognosis of KIRC, we analyzed the individual COL4A family members and their correlation with clinical parameters in KIRC patients. To date, no studies have reported the role of COL4A family genes in KIRC. Based on Fig. 2 and Fig. 3, we found that 6 out of 7 genes were differentially expressed in KIRC (upregulation of COL4A1 and COL4A2, downregulation of COL4A3, COL4A4, COL4A5, and COL4A6).
Moreover, the expression of COL4A1, COL4A2, COL4A3, COL4A4, COL4A5, and COL4A6 was found to increase as the tumor progressed (Fig. 5). A better prognosis (longer overall survival) was accompanied by low expression of COL4A5 and high expression of COL4A1, COL4A2, COL4A3, and COL4A4 (Fig. 6). As for disease-free survival (DFS), better results were observed with low expression of COL4A5 and high expression of COL4A3 and COL4A4 (Fig. 6). Based on those observations, COL4A3 and COL4A4 might act as protective factors in the prognosis of KIRC patients, while COL4A5 could be treated as a risk factor. Significant mutation of COL4A genes in KIRC is shown in Fig. 7. Mutation of COL4A genes has been confirmed as an independent indicator of poor prognosis in renal nonneoplastic diseases such as FSGS [9], Alport syndrome [30], and familial hematuria [31]. Moreover, gene alterations are involved in the tumorigenesis and progression of renal cell carcinoma [32]. Although laboratory information on mutations of COL4A family members in KIRC is limited, alterations of COL4A genes were associated with shorter OS and DFS in patients (Fig. 7B, 7C). We found different degrees of correlation among the differentially expressed COL4A genes, indicating that these altered genes may have an adverse role in the tumorigenesis of renal tumors.
The GO and KEGG pathway enrichment analyses show that COL4A family members and their 118 related similar genes were primarily involved in the collagen-activated tyrosine kinase receptor signaling pathway and in the extracellular matrix structural constituent (Fig. 8). Overexpression of collagen fiber deposited in the stroma plays an unfavorable role in the prognosis of ductal carcinoma in situ (DCIS) [33]. Accumulating evidence has confirmed a close connection between the collagen fiber component and myxoid stroma, which is differentially expressed in several tumors [34][35][36][37]. These reported data may point to a promising target for potential drug therapies that regulate collagen fiber expression in KIRC.
In the molecular interaction network (Fig. 8), we also found that the heparan sulfate proteoglycan 2 (HSPG2) gene is closely involved in the process of renal cell carcinoma. Notably, similar to COL4A genes, HSPG2 is widely expressed in all basement membranes, including those of epithelial and endothelial cells, to protect the integrity of the extracellular matrix [38][39][40]. NID1, another extracellular matrix protein, which regulates the activation of NK cells, has been found to be overexpressed in basal cell carcinoma (BCC) [41], breast cancer [42], and lung cancer [43]. Reported studies have already indicated that NID1 is an indicator of prognosis in several tumors [42]. Downregulation of NID1 could improve cancer patients' survival outcomes by deactivating MET, which may inhibit the migration and invasion of cancer cells [44]. As shown in Fig. 8, eight extracellular matrix-associated genes, COL4A1, COL4A4, COL4A6, COL4A3, COL15A1, HSPG2, COL4A5, and NID1, are together closely involved in KIRC. These genes may reveal a notable correlation between the extracellular matrix and renal cell carcinoma. The proteins encoded by these genes may indicate a potential mechanism in KIRC and provide new insight for further research.

Limitations
Some limitations of the current study need to be recognized. First, our data were obtained from online public databases, and their analysis modules may have had a decisive impact on our analysis. Our results should therefore be confirmed by further studies. Second, we did not confirm the potential diagnostic and therapeutic value of COL4A family genes in KIRC patients because no laboratory results were available.

Conclusion
The COL4A family genes play an important role in KIRC. Low expression of COL4A5 and high expression of COL4A3 and COL4A4 may be positive indicators for patients with KIRC. A high mutation rate of COL4A genes foreshadows a poor prognosis. The relationship between extracellular matrix-related genes and KIRC is worthy of further study.

Declarations
Ethics approval and consent to participate: Not applicable.

Consent to publish:
Not applicable.

Availability of data and materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Competing interests
Author(s) declare(s) that there is no conflict of interest.

Author contribution
JQ conceived the study idea. BZ collected the data. BZ, YY, and LF contributed to the analysis of the data and wrote the initial draft, with all authors providing critical feedback and edits to subsequent revisions. All authors approved the final draft of the manuscript. All authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. JQ is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Genetic mutations in COL4A family genes and their association with OS and DFS of KIRC patients (cBioPortal). A high mutation rate (32%) was observed in KIRC patients. Genetic alterations were observed in all family members, and their mutation rates were 8%, 9%, 9%, 11%, 7%, 9%, and 8%, respectively (A). Genetic alterations in COL4A genes were associated with shorter OS (B) and DFS (C) of KIRC patients.