1. Explore the prognostic genes related to lymphangiogenesis in the BLCA cohort
To further identify the key genes in the lymphangiogenesis gene set, we collected the clinical information of BLCA patients, and used univariate Cox regression analysis to screen out the prognostic genes in BLCA. In addition, 43 prognostic-related genes (p < 0.005) were screened out (Fig. 1A).
2. Functional enrichment
We performed GO and KEGG enrichment analysis on 43 prognostic genes and found that these prognostic genes were significantly enriched in peptidyl-tyrosine phosphorylation, peptidyl-tyrosine modification, ERK1 and ERK2 cascade pathways in GO enrichment (Fig. 1B). In the process of KEGG enrichment, there were pathways such as Focal adhesion, Rap1 signaling pathway, and Ras signaling pathway (Fig. 1C).
3. Identification of prognosis-related genes and construction of a prognostic model
We further screened out the characteristic genes in BCa by the lasso algorithm. Finally, we obtained seventeen lymphangiogenesis genes with prognostic significance and their regression coefficients (Fig. 2A-C). We obtained the best risk score value corresponding to each sample for subsequent analysis. Better overall survival was observed in the low-risk group of both the training and test groups (Fig. 2D-E). The ROC curves of the two sets both suggested that the model had good validation performance (Fig. 2F-G).
4. Immune microenvironment
Through a thorough analysis of the correlation between risk score and tumor immune infiltration, we explored the intrinsic molecular mechanism underlying the impact of risk score on the progression of bladder cancer. We assessed the percentage of 22 different types of immune cells that had infiltrated and observed variations in immune cell composition between the low-risk and high-risk groups (Fig. 3A). The findings indicated a notable reduction in Monocytes, Dendritic cells resting, Dendritic cells activated, T cells CD8, T cells focal helper and T cells regulatory (Tregs) in the high-risk group. T cells CD4 memory resting, Macrophages M0, Macrophages M1, Macrophages M2 significantly increased (Fig. 3B). Furthermore, our investigation delved deeper into the association between the risk score and the composition of immune cells, revealing a significant positive correlation between the risk score and various immune cell types such as Macrophages M0, Macrophages M2, Neutrophils, etc., though exhibiting a substantial inverse relationship with Tregs, activated dendritic cells, and CD8 T cells (Fig. 3C). Finally, the expression differences in immune-related chemokines, immunosuppressants, immune stimulators, immune receptors and other genes between the two groups are presented (Fig. 4).
5. Clinical predictive value of the model
We utilized the GDSC database to predict the responsiveness of each tumor sample to chemotherapy, and further explored the relationship between risk score and the sensitivity of common chemotherapy drugs. The study findings indicated a strong correlation between risk score and patients' sensitivity to ATRA, ABT.888, MS.275, and Roscovitine (Fig. 5A). Furthermore, we explored the signaling pathways involved in the high-low risk correlation model to explore the underlying mechanisms by which the risk score affects tumor progression. The GSVA results indicated that the differential pathways of the two groups of patients were mainly enriched in signaling pathways such as EPITHELIAL MESENCHYMAL TRANSITION, APICAL JUNCTION, and HYPOXIA (Fig. 5B). The GSEA results indicated that the involved pathways were ECM-receptor interaction, IL-17 signaling pathway, and TNF signaling pathway (Fig. 5C). The network of molecular interactions within each specific pathway is displayed (Fig. 5D).
6. Validation of the prognostic model with external datasets
Afterwards, we analyzed the difference in survival outcomes between the two groups in the GEO database, and significant survival differences were also found (Fig. 6A-B). The ROC curve was used to assess the performance of the risk score in external datasets, thereby determining its accuracy. The area under the curve (AUC) for predicting the 5-year survival of two gene expression datasets, GSE13507 and GSE32894, was calculated to be 0.62 and 0.79, respectively (Fig. 6C-D).
7. Incidence risk and independent prognosis analysis
We established a nomogram to accurately predict the prognosis of patients diagnosed with BCa (Fig. 6E). The outcome of the logistic regression analysis indicated that the risk score value played a substantial role in the scoring procedure of the nomogram predictive model across all of our collected samples. Calibration plots demonstrated accurate clinical predictions (Fig. 6F).
8. Relationship between the risk score and clinical indicators
For an exploration of the relationship between risk score and clinical indicators, we presented the results of each clinical index grouping in the form of a box plot. After applying the rank sum test (kruskal.test), it was determined that there was a significant difference in the distribution of risk score values among groups for various clinical indicators including age, fustat, grade, Stage, N, and T stage (Fig. 7). The reverse prediction of 17 model genes was conducted through the Mircode database, and 86 miRNAs and 764 pairs of mRNA-miRNA relationships were obtained, and visualized using Cytoscape (Fig. 8A).
9. Model gene-associated transcriptional regulation analysis
We used model genes for the gene set analysis, and we discovered that these genes are regulated by several transcription factors. Therefore, cumulative recovery curves were utilized to conduct enrichment analysis on these transcription factors. The analysis results indicated that the MOTIF annotation was cisbp__M2273 (Fig. 8B). In this motif, 10 prognostic genes were found to be enriched, including ADAM17, ANGPT1, CNTN1, ECM1, FN1, IGF1, MYO5A, NES, NTRK2 and STAT2. The NES was 4.5.
10. GWAS analysis
Next, we analyzed the GWAS data of BCa to identify the causative regions of 17 model genes in BCa. The Q-Q plot revealed significant single nucleotide polymorphism (SNP) loci associated with BCa identified by GWAS data (Fig. 8C). Through the precise location of the GWAS data, the key SNP sites distributed in the enrichment area were described. The SNP pathogenic regions corresponding to these 17 model genes are displayed (Fig. 8C). The significant SNP sites corresponding to the 17 model genes are presented in the table (GWAS data.xls).