3.1 Identification of expression level of BM genes in ccRCC tissues
BM genes with differential expression levels between normal and ccRCC tissues were screened. The volcano plot showed a significant difference between the two groups. Among them, 151 (68%) BM genes showed statistically differences with a threshold of p < 0.05 and |logFC|>1.5 (Fig. 1A).
3.2 The mechanism and function of BM genes
GO and KEGG analyses were performed to explore the function and signaling pathway of BM genes. GO enrichment analysis revealed that BM genes were involved in extracellular matrix organization, extracellular structure organization, external encapsulating structure organization, cell-substrate adhesion, cell-matrix adhesion, integrin-mediated signaling pathway, gastrulation, formation of primary germ layer, endodermal cell differentiation, and endoderm formation (Fig. 1B). KEGG analysis showed that BM genes participated in ECM-receptor interaction, PI3K-Akt signaling pathway, focal adhesion, human papillomavirus infection, small cell lung cancer, protein digestion and absorption, hypertrophic cardiomyopathy, dilated cardiomyopathy, arrhythmogenic right ventricular cardiomyopathy, and amoebiasis, et al (Fig. 1C). Moreover, we investigated the network of BM genes in STRING database and revealed the relationship between BM genes from the protein-protein interaction (PPI) network (Fig. 1D).
3.3 Construction of BM genes-related prognostic signature
Univariate Cox regression analysis of the training cohort showed that 103 BM genes were associated with the prognosis of ccRCC. Subsequently, LASSO and multivariate Cox regression analysis were performed to identify four BM genes (ADAMTS14, COL7A1, HSPG2, and TIMP3) (Fig. 2A-C). A BM genes-based signature was established according to the following risk score formula:
$$Riskscore =0.38\times ADAMTS14+0.33\times COL7A1-0.28\times HSPG2-0.30\times TIMP3$$
3.4 Assessment of BM genes-related prognostic signature
ADAMTS14 and COL7A1 were risk factors for ccRCC, while HSPG2 and TIMP3 were protective factors. The formula was applied to calculate their risk scores, which showed a median risk score of 0.854. Accordingly, the training cohort was divided into high-risk group (n = 133) and low-risk group (n = 133) according to the median risk score (Fig. 3A). With the increase of risk score, the survival time was gradually shortened, and the mortality rate was gradually increased (Fig. 3B). Finally, the expression levels of OS-related BM genes were presented in the form of heat map (Fig. 3C). Log-rank test showed that there was a significant difference between the two groups (Fig. 3D). The time-dependent ROC curve showed that OS-related BM genes had a predictive effect on prognosis, and the area under the curve (AUC) was 0.79 (Fig. 3E). Subsequently, we validated the prognostic BM genes signature in the validation cohort and the entire cohort. The results showed significant differences between high- and low-risk groups in both the validation cohort and the entire cohort. Moreover, the AUC of the validation cohort and the entire cohort were 0.68 and 0.74, respectively, indicating that BM genes’ prognostic signature were robust (Fig. 3F-O).
3.5 Clinical values of BM genes related prognostic signature
Univariate Cox regression showed that the risk score was a risk factor affecting the prognosis of ccRCC patients (HR:1.128 ~ 1.216, p < 0.001), and a high-risk score predicted a shorter OS time. Furthermore, age (HR:1.018 ~ 1.045, p < 0.001), stage (HR:1.635 ~ 2.129, p < 0.001), and grade (HR:1.877 ~ 2.829, p < 0.001) were also closely related to the prognosis of ccRCC patients (Fig. 4A). Although clinical characteristics were controlled, risk score remained an independent risk factor (HR:1.061 ~ 1.155, p < 0.001) (Fig. 4B). Moreover, ROC analysis showed that the prognostic value of risk score (0.739) was higher than that of age (0.654), gender (0.505), and grade (0.720), but slightly lower than that of stage (0.801) (Fig. 4C). In addition, the time-dependent ROC curve showed that AUC of the entire cohort at 1-, 3-, and 5-year were 0.739, 0.713, and 0.728, respectively, showing high robustness (Fig. 4D). Similarly, when the entire cohort was subdivided according to age, gender, stage, and grade, all the high-risk groups had lower OS than the low-risk groups (Fig. 4E-L). In all, these data illustrated that BM genes-related prognostic signature was strongly associated with the prognosis of ccRCC patients.
3.6 Construction and assessment of nomogram, as well as enrichment analysis of the model-related different expression genes (DEGs)
Nomogram is a common prognostic visualization tool in oncology that allows quantification of patient survival based on an inclusive index score. According to multivariate Cox regression, age, stage, and risk score were used as indicators for the construction of nomogram. The total scores were obtained by age, stage, risk score, and individual indicator score that predicted 1-, 3-, and 5-year survival probabilities (Fig. 5A). In addition, the calibration curves showed consistent predicted survival times at 1-, 3-, and 5-year compared to the reference line (Fig. 5B).
We further screened DEGs between the high- and low-risk groups. The screening threshold was |log FC|>1 and p < 0.05. As a result, a gene set consisting of 615 genes was obtained. Based on this gene set, GO and KEGG analyses on DEGs were applied to reveal the functions and signaling pathways. The results of GO analysis showed that the model-related DEGs were involved in humoral immune response, complement activation (classical pathway), regulation of complement activation, complement activation, humoral immune response mediated by circulating immunoglobulin, regulation of humoral immune response, enzyme inhibitor activity, immunoglobulin complex, blood microparticle, collagen-containing extracellular matrix, external side of plasma membrane, apical plasma membrane, apical part of cell, antigen binding, glycosaminoglycan binding, endopeptidase inhibitor activity, and protein-lipid complex binding (Fig. 5C). KEGG analysis demonstrated that DEGs were involved in signaling pathways including complement and coagulation cascades, cholesterol metabolism, fat digestion and absorption, mineral absorption, rheumatoid arthritis, and PPAR signaling pathway (Fig. 5D).
3.7 Immune cell infiltration in ccRCC correlates with risk model of BM genes
From the above enrichment results, we noted that BM gene model-related DEGs were involved in multiple immune processes. So we further analyzed the immune characteristics of high- and low-risk groups. There were significant differences between high- and low-risk groups in the expression level of B cells naïve, B cells memory, plasma cells, T cells CD4 memory resting, T cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs), T cells gamma delta, NK cells resting, monocytes, macrophages M0, macrophages M1, dendritic cells resting, and mast cells resting, which were completely consistent with our expected (Fig. 6A). Correlation analysis showed that risk score was significantly correlated with the level of the stromal score, macrophages M1, NK cells resting, T cells CD4 memory resting, B cells naïve, neutrophils, macrophages M0, T cells regulatory, T cells CD8, mast cells resting, monocytes, T cells follicular helper, plasma cells, macrophages M2, NK cells activated, T cells CD4 memory activated, and B cells memory (Fig. 6B).
3.8 Immune function associated with the risk model of BM genes
According to the expression characteristics of the low- and high-risk group, there were significant differences in type II IFN response, T cell co-inhibition, T cell co-stimulation, and inflammation-promoting (Fig. 6C).
3.9 Immune escape and immunotherapy prediction
Jiang et al. [21] designed a computational architecture to calculate the possibility of tumor immune escape, called tumor immune dysfunction and exclusion (TIDE). Researchers can calculate TIDE scores on the website: http://tide.dfci.harvard.edu/ with cancer expression feature data. The higher the scores are, the greater the likelihood of cancer immune escapes. The TIDE scores of the low- and high-risk group were calculated, and the result revealed that the high-risk group had a higher TIDE score, suggesting that the high-risk group had a higher possibility of immune escape and a poorer response to immunotherapy (Fig. 6D).
3.10 Analysis of the four genes associated with a pan-cancer risk model
We examined the expression differences of ADAMTS14, COL7A1, HSPG2, and TIMP3 in 43 cancers compared with the corresponding normal tissues and found that they were differentially expressed in most cancers (Fig. 7A-D). We then analyzed the association between the expression of these four genes in a variety of cancers and the expression of immune regulatory genes and immune checkpoint genes. The results showed that these four genes were associated with the expression of immune regulatory genes and immune checkpoint genes in most of 43 cancers, suggesting that these four genes affected cancer progression by interfering with immune regulation and immune checkpoint (Fig. 7E-L).