3.1 Landscape of mutation profiles of UCEC
We downloaded somatic mutation data for 542 UCEC patients from TCGA, selecting the “Masked Somatic Mutation” data type and the “VarScan2” workflow type. TCGA provides four types of mutation data: Annotated Somatic Mutation, an annotation file of somatic mutations in VCF format; Raw Simple Somatic Mutation, the original file of somatic mutations in VCF format; Aggregated Somatic Mutation, a protected mutation annotation file in MAF format; and Masked Somatic Mutation, an open-access mutation annotation file in MAF format. We then loaded the prepared MAF files and visualized the patients' mutation data with the “maftools” package. The detailed mutation information of each UCEC patient is shown in the waterfall plot (Figure 1). The clinical baseline characteristics of all 545 UCEC patients are summarized in Table 1; the mean age was 63.93±11.14 years.
To examine the variants in more depth, we further categorized and summarized the mutations. Missense mutation was the most frequent variant classification (Figure 2a), single nucleotide polymorphism (SNP) was the most common variant type (Figure 2b), and C>T accounted for the largest proportion of single-nucleotide variants (SNV, Figure 2c). In addition, the number of variants in each patient sample was calculated and displayed (Figure 2d), and the variant classifications were also summarized in a box plot (Figure 2e). We then listed the ten most frequently mutated genes: PTEN (64%), PIK3CA (48%), ARID1A (44%), TTN (38%), TP53 (37%), PIK3R1 (30%), KMT2D (26%), CTCF (26%), MUC16 (25%), and CTNNB1 (24%) (Figure 2f). Furthermore, we examined co-occurrence and mutual exclusivity among these mutated genes, with green indicating co-occurrence and brown indicating mutual exclusion. Many of the mutated genes co-occurred, whereas PTEN and TP53, as well as TP53 and ARID1A, were clearly mutually exclusive (Figure 2g). Finally, a gene cloud plot was drawn to summarize the mutated genes (Figure 2h).
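The summaries in Figure 2 (variant classification counts, SNV classes, and per-gene mutation frequencies) can be illustrated with a minimal Python sketch. This is not the maftools implementation. The records below are hypothetical toy data, the tuple layout is an assumption loosely modeled on MAF columns, and the frequency denominator is simply the set of samples present in the records.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (sample, gene, variant classification, ref allele, tumor allele).
records = [
    ("S1", "PTEN", "Missense_Mutation", "C", "T"),
    ("S1", "TP53", "Missense_Mutation", "G", "A"),
    ("S2", "PTEN", "Nonsense_Mutation", "C", "A"),
    ("S2", "PTEN", "Missense_Mutation", "A", "G"),
    ("S3", "PIK3CA", "Missense_Mutation", "T", "C"),
    ("S4", "PTEN", "Frame_Shift_Del", "AT", "-"),
]

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snv_class(ref, alt):
    """Collapse a substitution onto the pyrimidine strand (G>A is reported as C>T)."""
    if len(ref) != 1 or len(alt) != 1 or ref not in COMPLEMENT or alt not in COMPLEMENT:
        return None  # insertion, deletion, or other non-SNV event
    if ref in "AG":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def summarize(records):
    """Return variant classification counts, SNV class counts, and gene frequencies."""
    variant_counts = Counter(r[2] for r in records)
    snv_counts = Counter(
        c for c in (snv_class(r[3], r[4]) for r in records) if c is not None
    )
    samples = {r[0] for r in records}
    mutated = defaultdict(set)
    for sample, gene, *_ in records:
        mutated[gene].add(sample)  # a sample counts once per gene
    # Assumption: the denominator is the set of samples that appear in the records.
    gene_freq = {g: len(s) / len(samples) for g, s in mutated.items()}
    return variant_counts, snv_counts, gene_freq


variant_counts, snv_counts, gene_freq = summarize(records)
print(variant_counts.most_common(1))
print(snv_counts.most_common())
print(sorted(gene_freq.items(), key=lambda kv: -kv[1]))
```

A sample with several mutations in the same gene is counted once for that gene, so the frequency reflects the fraction of patients affected rather than the number of mutation events.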
3.2 Relationship of TMB level with survival prognosis and tumor grade in UCEC
Tumor mutation burden (TMB) is defined as the total number of somatic coding errors, base substitutions, insertions, and deletions detected per million bases. The TMB level of each UCEC sample was calculated, and patients were divided into a high TMB group and a low TMB group using the median as the cut-off value. TMB levels were then merged with patient survival and clinicopathological data by sample ID. Comparison of survival between the two groups showed that patients with high TMB had better overall survival (OS), and the difference was statistically significant (p = 0.048, Figure 3a). Surprisingly, higher TMB was correlated with more advanced pathological grades of UCEC (p = 0.002, Figure 3b). TMB appeared higher in patients aged 65 and younger than in older patients, but the difference was not statistically significant (p = 0.893, Figure 3c).
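The TMB calculation and median split described above can be sketched in a few lines of Python. The exome size of 38 Mb, the mutation counts, and the rule that samples exactly at the median fall into the low group are all assumptions made for illustration; the text does not state them.

```python
from statistics import median

EXOME_SIZE_MB = 38.0  # assumed capture size in megabases; not stated in the text


def tmb_per_mb(mutation_counts, exome_size_mb=EXOME_SIZE_MB):
    """Convert per-sample mutation counts into mutations per megabase."""
    return {sample: n / exome_size_mb for sample, n in mutation_counts.items()}


def split_by_median(scores):
    """Label each sample 'high' or 'low' relative to the median score.

    Assumption: samples exactly at the median are assigned to the low group.
    """
    cutoff = median(scores.values())
    groups = {s: ("high" if v > cutoff else "low") for s, v in scores.items()}
    return cutoff, groups


# Hypothetical mutation counts per sample.
mutation_counts = {"S1": 38, "S2": 76, "S3": 380, "S4": 19, "S5": 1900}
tmb = tmb_per_mb(mutation_counts)
cutoff, groups = split_by_median(tmb)
print(cutoff, groups)
```

The resulting group labels can then be joined to survival and clinicopathological records by sample ID before any survival comparison.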
3.3 Identification of differentially expressed genes and functional enrichment analysis
Transcriptome RNA-sequencing data of HTSeq-FPKM type were downloaded from TCGA, including 552 UCEC tissue samples and 23 adjacent non-tumor tissue samples. Comparing the transcriptomes of the two TMB groups yielded 427 differentially expressed genes. We selected the 40 most significantly different genes to draw a heatmap (Figure 4a). We then performed enrichment analyses on the differential genes, including Gene Ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, and gene set enrichment analysis (GSEA). GO comprises three categories: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC). In the BP category, humoral immune response, lymphocyte-mediated immunity, and complement activation were frequently enriched. In the CC category, the differential genes were mainly involved in immunoglobulin complex; external side of the plasma membrane; and immunoglobulin complex, circulating. In the MF category, antigen binding, immunoglobulin receptor binding, and receptor-ligand activity were the three main terms (Figure 4b). These results show that the major GO terms enriched among the differential genes were mainly related to the immune response. The top ten KEGG pathways were Neuroactive ligand-receptor interaction, Alzheimer disease, Cytokine-cytokine receptor interaction, Breast cancer, Cell adhesion molecules (CAMs), Gastric cancer, Hippo signaling pathway, Proteoglycans in cancer, MAPK signaling pathway, and Wnt signaling pathway (Figure 4c). Finally, we listed four representative GSEA results, in which high TMB was significantly enriched in pyrimidine metabolism, nucleotide excision repair, P53 signaling pathway, and fructose and mannose metabolism (Figure 4d).
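GO and KEGG over-representation analyses are commonly based on a hypergeometric (one-sided Fisher) test, which asks whether a term's genes appear among the differential genes more often than chance would predict. The sketch below shows that test in Python. Whether the study used exactly this test is an assumption, and the background size, term size, and overlap in the example are hypothetical; only the 427 differential genes come from the text.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background set
    n_term:       background genes annotated to the term
    n_selected:   differential genes drawn from the background
    n_overlap:    differential genes annotated to the term
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 427 differential genes from a background of 20000 genes,
# with 30 of them falling in a term that annotates 200 genes.
p_value = hypergeom_enrichment_p(20000, 200, 427, 30)
print(p_value)
```

In practice the p-values for all tested terms would also be adjusted for multiple testing, which this sketch omits.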
3.4 Comparison of immune cell infiltration between the high and low TMB groups
In 2015, Newman et al. of the Stanford University School of Medicine proposed CIBERSORT, a computational method that estimates the types and relative proportions of cells in a mixed cell population from its bulk RNA expression profile (28). Using the CIBERSORT algorithm, we evaluated the immune profile of each patient and compared the infiltration of 22 immune cell types between the high and low TMB groups. The immune infiltration profile of each sample is shown in Figure 5a, where each bar represents a patient and each color represents a cell component. The violin plot revealed that the infiltration levels of CD8 T cells, activated memory CD4 T cells, follicular helper T cells, and M1 macrophages were significantly higher in the high TMB group than in the low TMB group (Figure 5b). These results suggest that higher TMB was generally associated with greater immune infiltration in UCEC samples, which may enhance the patients’ anti-tumor immune response.
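The group comparison behind a violin plot such as Figure 5b can be illustrated with a rank-based test. The sketch below implements a two-sided Mann-Whitney U (Wilcoxon rank-sum) test with a normal approximation. The text does not say which test was used, so this choice is an assumption, and the infiltration fractions are hypothetical. The code does not perform the CIBERSORT deconvolution itself.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, p-value). No continuity correction is applied,
    and the approximation is rough for very small groups. The function fails
    with a division by zero if all pooled values are identical.
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for a tied block
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / sqrt(var_u)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical CD8 T cell fractions in the two TMB groups.
high_tmb = [0.12, 0.15, 0.20, 0.18, 0.22]
low_tmb = [0.05, 0.08, 0.10, 0.07, 0.11]
u_stat, p_val = mann_whitney_u(high_tmb, low_tmb)
print(u_stat, p_val)
```

A rank-based test is a natural fit here because estimated cell fractions are bounded and often skewed, so a test that assumes normality may be less appropriate.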
3.5 Construction of a TMB-related immune gene risk model for UCEC patients
By intersecting the immune-related genes with the differential genes, we obtained 108 differential TMB-related immune genes (DTIGs) (Figure 6a). Next, we performed multivariate Cox regression analysis on these DTIGs and identified 4 risk-related DTIGs (EDN3, FGF19, IL13RA2, TRAV21) and their coefficients, which were used to construct the risk model (Table 2). The risk score in the model was calculated as ∑ (coefficient × expression value) over these genes. We defined this risk score as the “TMB Risk Index” (TMBRI). The median TMBRI, 1.1734604, was used as the cut-off value, and patients were divided into a high-risk group (n=261) and a low-risk group (n=262). Kaplan–Meier (KM) survival analysis showed a significant difference in OS between the two groups, with worse OS in the high-risk group (Figure 6b). The 5-year survival rate of patients in the high-risk group was nearly 20% lower than that in the low-risk group (Table 3). Furthermore, ROC analysis was performed to verify the reliability of the risk model. The area under the ROC curve (AUC) was 0.670, indicating that the TMBRI risk model had some applicability in predicting the prognosis of patients with UCEC (Figure 6c). We also evaluated the potential relationship between mutants of the DTIGs in the risk model and immune infiltration in the microenvironment. Using the SCNA module of the TIMER database, the relationships between EDN3, FGF19, or IL13RA2 mutants and the infiltration of 6 immune cell types are shown in box plots (Figure 6d-f).
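The risk score described above is a weighted sum of expression values, followed by a median split. A minimal Python sketch under stated assumptions follows. The coefficients and expression values are hypothetical placeholders, since the fitted coefficients are reported in Table 2 and are not reproduced here, and assigning patients at the median to the low-risk group is also an assumption.

```python
from statistics import median

# Hypothetical coefficients for illustration only; the fitted values are in Table 2.
coefficients = {"EDN3": 0.10, "FGF19": 0.05, "IL13RA2": 0.20, "TRAV21": -0.30}


def risk_score(expression, coefficients):
    """Sum of coefficient * expression value over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def assign_risk_groups(scores):
    """Split patients at the median score.

    Assumption: patients exactly at the median go to the low-risk group.
    """
    cutoff = median(scores.values())
    groups = {p: ("high" if s > cutoff else "low") for p, s in scores.items()}
    return cutoff, groups


# Hypothetical expression values per patient.
patients = {
    "P1": {"EDN3": 1.0, "FGF19": 2.0, "IL13RA2": 0.5, "TRAV21": 1.0},
    "P2": {"EDN3": 4.0, "FGF19": 2.0, "IL13RA2": 3.0, "TRAV21": 0.0},
    "P3": {"EDN3": 0.0, "FGF19": 0.0, "IL13RA2": 1.0, "TRAV21": 2.0},
    "P4": {"EDN3": 2.0, "FGF19": 4.0, "IL13RA2": 5.0, "TRAV21": 1.0},
}
scores = {p: risk_score(expr, coefficients) for p, expr in patients.items()}
cutoff, groups = assign_risk_groups(scores)
print(scores, cutoff, groups)
```

The group labels produced this way are the input to the Kaplan–Meier and ROC analyses, which are not sketched here.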