Data collection
We obtained the mRNA expression profile data, clinical information, and copy number variation (CNV) information of the BLCA and control samples from The Cancer Genome Atlas (TCGA) via the UCSC Xena website (https://xenabrowser.net/datapages/). The CNV information was downloaded using the “RTCGA” R package and the mutation data were downloaded using the “TCGAbiolinks” R package. The GSE13507 dataset was downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The immunotherapy dataset of IMvigor210 for BLCA was obtained using the “IMvigor210CoreBiologies” package. The data of TCGA, GEO, and IMvigor210 are all publicly available. Therefore, this study did not require the approval of the local ethics committee. The research followed the data access policies and publication guidelines of TCGA, GEO, and IMvigor210.
Data preprocessing
To maintain data consistency, we first converted the IMvigor210 dataset through log2 (fpkm+1) and then used the “sva” R package to normalize the expression profiles of the three datasets.
Unsupervised cluster analysis
According to the expression data of 55 ferroptosis genes (a total of 60 ferroptosis genes, of which 55 were expressed in the three datasets), unsupervised cluster analysis was used to identify different ferroptosis expression types and further analyze the patient classification. The consensus clustering algorithm was used to determine the number of clusters and their stability. We used the “ConsensusClusterPlus” R package. The clustering method used was K-means, the distance used was Euclidean distance, and 1000 repetitions were performed to ensure the stability of the classification.
Gene Set Variation Analysis(GSVA) and single-sample Gene Set Enrichment Analysis (ssGSEA)
To study the differences in ferroptosis expression patterns in biological processes, we used the “GSVA” R package to perform GSVA analysis. GSVA analysis is a non-parametric, unsupervised method that is mainly used to estimate changes in pathways and biological process activity in samples. We downloaded the “c2.cp.kegg.v7.1.symbols.gmt” gene set from the MSigDB database (https://www.gsea-msigdb.org/gsea/index.jsp) for GSVA analysis. In order to evaluate the ratio and the difference of 28 immune cells in different ferroptosis expression subgroups (ferr.cluster), we used the ssGSEA analysis with the “GSVA” R package to obtain the infiltration ratio of 28 immune cells. Then, the samples of each ferr.cluster were compared by Wilcoxon test, and the different cells were subjected to Cox regression analysis to compare the prognostic differences.
Identify differentially expressed genes among differentferr.cluster
Based on the expression of 55 ferroptosis genes, we divided the BLCA samples in the three datasets into two categories, and used the “limma” R package to determine the differentially expressed genes between different categories. The significance standard for determining the difference gene was set as adj.P.Val<0.05 (P values were adjusted by Benjamini & Hochberg correction.), and the multiple of difference was greater than 2 times or less than 0.5 times.
Ferroptosis score (ferr.score) calculation
For the differential genes obtained in the previous step, the random forest method was used to remove redundant genes. Then survival analysis was performed on the remaining genes, and genes that were less related to survival were filtered out (P < 0.05 was considered to be related to survival). Finally, the remaining genes were used for principal component analysis (PCA), and the following formula was used to calculate the ferr.score:. The P for positive and N for negative. According to the median of the ferr.score, the samples were divided into high and low groups. The correlation between the two types of samples and the prognosis was further analyzed.
Correlation between ferr.score and other biological processes
Mariathasan et al. [19] constructed a set of genes to store genes related to certain biological processes, including immune checkpoints; antigen processing and presentation; EMT1, EMT2, EMT3, and other epithelial-mesenchymal transition (EMT) markers; DNA damage repair; mismatch repair; and nucleotide excision repair and other pathways. We used GSVA to calculate the enrichment score (ES) for these biological functions in each sample. Then, the Pearson correlation analysis was performed on the ferr.score and the ES score of these biological processes to further reveal the relationship between the ferr.score and related biological pathways.
CNV analysis
The GISTIC method was used to detect the common copy number change area in all samples based on SNP6 CopyNumber segment data. The parameters of the GISTIC method were set as follows: Q ≤ 0.05 was the significance level of the change; when determining the peak interval, the confidence level was 0.95. The analysis was performed through the corresponding MutSigCV module in the online analysis tool GenePattern (https://cloud.genepattern.org/gp/pages/index.jsf), which was developed by the Broad Research Institute.
Tumor immune dysfunction and exclusion (TIDE) and inhibiting concentration 50 (IC50)
Researchers from Harvard developed the TIDE (http://tide.dfci.harvard.edu/) tool to evaluate the clinical effects of immune checkpoint suppression therapy. A higher tumor TIDE prediction score is related to a poorer immune checkpoint suppression therapy efficacy and has a poor prognosis. The prognosis prediction of immune checkpoint suppression therapy in this study was completed by the TIDE score. We used the “pRRophetic” R package to estimate the IC50 value of drugs (Cisplatin and Gemcitabine) according to the expression profile, and then compared the IC50 differences in the ferr.score high and low group samples.
Statistical analysis
In the significance analysis between various scores, the Wilcoxon test was used to compare the differences between the two groups of samples. The Kaplan–Meier method was used to generate the survival curve for prognostic analysis, and the log-rank test was used to determine the significance of the difference. The receiver operating characteristic curve (ROC) was used to evaluate the ferr.score's prediction of immunotherapy, and the area under the curve (AUC) was quantified using the “pROC R” package. When displaying mutation maps, the “maftools” R package was used to present the mutation landscape of ferr.score patients with high and low subtypes. The “RCircos” R package was used to plot the chromosome distribution of 55 ferroptosis genes in 23 pairs of chromosomes.