Identification of prognostic m6A-related lncRNAs in BRCA patients.
The entire workflow was shown in Figure 1. Firstly, data were downloaded from UCSC Xena [14] and divided into a training set and a testing set. The construction of the model was based on the training set. For m6A-related lncRNAs, the correlation between 21 m6A genes and all 9391 lncRNAs was calculated. 931 lncRNAs related to m6A expression (|Pearson R| > 0.3 and q < 0.01) were sorted out. Then we used univariate Cox regression analysis to obtain 348 prognosis related lncRNAs. Finally, combining with other basic clinical information, we used LASSO Cox regression to identify 18 m6A-related lncRNAs for prognosis. The correlation between m6A genes and the 18 m6A-related lncRNAs in the TCGA training set was shown in Figure 2.
Construction of risk model
BRCA samples were categorized into high- and low-risk groups according to the median of the prognostic risk score of the comprehensive 18-lncRNAs model. However, we found that the performance for prognosis was not very satisfactory: P-values of the log-rank test were <0.0001 (Figure S1A) and 0.038 (Figure S1B) for the training set and the testing set, respectively. Therefore, we speculated that some other genes associated with these lncRNAs should be added to the model. Hence, we chose two basic clinical parameters (age and clinical stage) and 10 m6A-related lncRNAs which were significant for prognosis in multivariate Cox regression analysis as fixed factors, and other genes as punitive factors to build a LASSO Cox regression model. In this way, we screened out 29 other genes, including some mRNAs and some lncRNAs that were not related to m6A. Finally, age, clinical stage, 10 m6A-related lncRNAs and 29 genes were used as OS-related factors. The forest plot of multivariate Cox regression of these total 39 genes and clinical factors were shown in Figure 3.
Validation of the risk model of 10 m6A-related lncRNAs, 29 genes and clinical factors.
Samples were divided into high- and low-risk groups again based on the median of the prognostic risk score of the final comprehensive 39 genes model. The distribution of risk scores of the high-risk and low-risk groups was depicted in Figure 4A. The survival status and survival time of patients were shown in Figure 4B. The relative expression Z-score of the 10 m6A-related lncRNAs and 29 associated genes for each patient was shown in Figure 4C. The Kaplan-Meier analyses demonstrated that the overall survival (OS) of the low-risk group was significantly longer than that of the high-risk group (p < 0.0001) (Figure 4D).
Since this model showed good performance in the training set, we further applied it to the testing set and the entire set calculating the risk score for each patient in these two sets. The distribution of risk score, the survival status, relative expression Z-score, and survival time of the testing set and entire set were shown in Figure 5A-D and Figure 5E-H, respectively. As expected, there was a significant difference in survival analysis between patients in the high-risk group and low-risk group in the test set (Figure 5D, p =0.00015) and the entire set (Figure 5H, p < 0.0001). We also compared the disease-free interval (DFI), disease-specific survival (DSS) and progression-free interval (PFI) between the high-risk and low-risk group of BRCA patients, and found that the difference was significant in DSS (Figure S2B, p<0.0044), but not in DFI (Figure S2A, p=0.44) or PFI (Figure S2C, p=0.036).
Because of the addition of clinical factors to our model, we were concerned that patients in the same clinical stage could not be classified by our model. Therefore, we conducted Kaplan-Meier analyses by stratifying patients according to age, clinical stage, histological type (Figure 6 A-B, Figure 6 E-F, and Figure 6 I-J). In addition, we also conducted Kaplan-Meier analyses under the common prognostic biomarker classification of BRCA, such as PAM50 [23], progesterone (PR) status, estrogen (ER) status and human epidermal growth factor receptor 2 (HER2) status, we intended to observe whether the risk score of the comprehensive model could subdivide the patients under these classifications. Our model was able to distinguish the patients in further among all subtypes of PAM50 (Figure 6 C-D, Figure 6 G-H, Figure 6 K), HER2-negative (P<0.0001), HER2-positive (0=0.023), ER-negative (p=0.003), ER-positive (P<0.0001), PR-negative (p<0.0001) as well as PR-positive patients (P<0.0001) (Figure S3). These results indicate that the risk score of the 39-genes model was capable of predicting the prognosis of BRCA patients grouped by conventional biomarkers.
Identification of highly different components of the immune microenvironment in high- and low-risk groups
The enrichment of several immune cells, pathways, or functions in BRCA was further analyzed based on the comprehensive 39-genes model. The cell enrichment scores for immune cells, such as T helper cells 17 (Th17), effector memory T cells (Tem), dendritic cells (DC), monocytes, macrophages, and neutrophils, of the high risk group were higher than that in the low risk group (Figure7 A). On the contrary, some other immune cells had higher enrich scores in the low-risk group, such as T helper cells 1 (Th1), T helper cells 2 (Th2), follicular helper T cells (Tfh), central memory T cells (Tcm), natural killer T cells (NK), gamma delta T cells (Tgd), CD4 T cells and CD8 T cells (Figure 7 B). These data suggested that the environment of T-cell infiltration could be beneficial for the BRCA patients to have a better prognosis.
Next, we used gene ontology (GO) enrichment of differentially expressed genes between low-risk and high-risk group to study the pathways that were enriched. The results showed that positive regulation of inflammatory response and lipid droplet were enriched in the high-risk group (Figure 7C) while positive regulation of leukocyte chemotaxis and lymphocyte migration were enriched in the low-risk group (Figure 7D).
Based on the TCGA somatic mutation data, the tumor mutation burden (TMB) of each patient was calculated, which was significantly higher in the high-risk group than that in the low-risk group (p =5.5e−05, Figure 7E). However, exclusion score (p=0.2, Figure 7F), dysfunction score (p=0.087, Figure 7G) and TIDE (Tumor Immune Dysfunction and Exclusion) value (p=0.12, Figure 7H), which reflected tumor evasion, were not significantly different between the two groups of patients.
Evaluation of clinical features of BRCA and the comprehensive prognostic risk model of 39 genes
In order to evaluate the performance of the model, we compared the comprehensive 39-genes risk model with other characteristics, including age, clinical stage, histological type, PR status, ER status, HER2 status, PAM50 subtypes, as well as the model based solely on the 18 m6A-related lncRNAs. The Area Under Curve (AUC) of the 39-genes risk model was higher than the AUCs of other models, showing that the comprehensive prognostic risk model of the 39 genes for BRCA was more reliable (Figure 8A). A nomogram containing risk scores and characteristics was used to predict the OS incidences of OS in 1, 3, and 5 years. The risk score of the comprehensive 39-genes model showed predominant predictive ability in the nomogram (Figure 8B) compared with clinicopathological characteristics, PAM50 classification, and the model based only on m6A-related lncRNAs.
Interaction between m6A related lncRNAs and other associated genes
Our model was initially established based on m6A-related lncRNAs, but it did not achieve the desired effect. The model had good performance only when combined with 29 other genes, indicating that these m6A-related lncRNAs may interact with these genes, both these m6A-related lncRNAs and 29 other genes contributed to the survival of patients.
We referred to the method of finding the immune dysfunction genes in the TIDE [20], the m6A-related lncRNAs and other genes in the model were put into the Cox regression model, and the m6A related lncRNAs-genes pairs were screened with significant interaction coefficients (see Methods). The impact of lncRNA expression was then evaluated on the survival of patients on different paired-gene expression levels. As expected, m6A-related lncRNAs affected the role of some genes, which in turn influenced the survival of patients. For example, MAP2K6 was affected by AC121247.1. DNAJB5 was affected by AC121247.1, AC243960.3, and AL078581.2 (Figure9).