Prognostic analysis of tumor mutation burden combined with immune inltration of Thymic epithelial tumors

Background: Thymic epithelial tumors (TETs) are uncommon neoplasms with poor prognosis and limited effective therapeutic options. This study aims to investigate the prognosis of tumor mutation burden (TMB) and the potential association with immune inltrates in TETs. Methods: Tumor mutation burden (TMB) was calculated using Maftools package and the samples were classied into high-TMB and low-TMB groups. Differentially expressed genes (DEGs) combined with immune cell inltration and survival rate were analyzed between the low-TMB and high-TMB groups. Results: Single nucleotide polymorphism (SNP) occurred more frequently than insertion or deletion, and C>T was the most common single nucleotide variants (SNV) in TETs. The results of Kaplan–Meier curve indicated that a high TMB was associated with worse clinical outcomes of TETs. Moreover, 3 hub immune genes associated with immune inltration were signicantly associated with prognosis. Besides, the TMB-related signature (TMBRS) model based on the three hub immune genes possessed good predictive value with area under curve (AUC) 0.729, and patients with higher TMBRS scores showed worse TETs outcomes. In addition, inltration levels of native CD4 + T cell, activated memory CD4 + T cell and follicular helper T cells in low-TMB group were higher than those in high-TMB group, which were correlated positively with prognosis of TETs. Conclusion: TETs patients with low TMB have better prognosis than those with high TMB, and TMB might affect the development of TETs by regulating immune inltration.

According to the WHO (2015) classi cation criteria, thymic epithelial tumors are classi ed based on the morphology of epithelial cells and its ratio to lymphocytes. TETs are divided into 6 subtypes, namely A, AB, B1, B2, B3 (thymoma) and C (thymoma) [2]. The aggressiveness and malignancy of tumors of each subtype increase sequentially. At present, surgery is the main treatment strategy for TETs, and for metastatic or incompletely resected tumors, radiotherapy and chemotherapy are the options often used in clinics [3]. To sum it all, there is no effective treatment method for advanced and recurrent TETs [4]. To this response, nding new therapy is imperative in improving the prognosis of patients with TETs.
Nowadays, immunotherapy is an emerging research approach in cancer treatment. It has become an effective method for the treatment of advanced or metastatic cancer [5].Tumor cells can survive and grow in the process of anti-tumor immune response through immune escape mechanism. However these cells can be eliminated through immunotherapy by activating and enhancing the anti-tumor immune response.
Tumor immunotherapy includes immuno-cell therapy, immune checkpoint inhibitors, small molecule synthesis inhibitors, and tumor vaccines [6] and the likes. Speci cally, immunotherapy has been proven to have de nite effects in the treatment of various cancers such as melanoma [7] and non-small cell lung cancer [8]. Thus, immunotherapy is expected to become an alternative therapy for TETs.
Tumor mutation burden (TMB) is de ned as the total number of somatic gene coding errors, base substitutions, gene insertion or deletion errors detected per million bases. From the existing literature, it has been shown that, tumors with high TMB can express more new antigens. And in doing so, these tumors are more likely to be recognized by the immune system. Therefore, tumors with high TMB are more sensitive to immunotherapy. In recent years, programmed cell death-1 (PD-1)/ programmed death-ligand 1 (PD-L1) checkpoint inhibitors therapy has opened a new chapter in the treatment of advanced TETs [9][10][11]. However, Rajan et al. [12] reported that, in patients with relapsed thymoma, PD-L1 inhibition has anti-tumor activity but this is accompanied by a high frequency of immune-related adverse events. Therefore, nding suitable biomarkers to improve the e cacy of immunotherapy is crucial in TMB immunotherapy.

Materials And Methods
Acquisition of data THYM gene mutation data, transcriptome data, and clinical data were downloaded from the TCGA database (https://portal.gdc.cancer.gov/) [11]. The data of gene mutation was from arScan2 variant Aggregation and Masking include 3,839 mutated genes. The transcriptome data of RNA-seq are from 123 Thymoma patients. Clinical data including survival time and survival status were also obtained. Upon standardizing the data, all analyses were performed using R software.

Assessment of TMB for patients and prognostic analysis
TMB is the total number of somatic gene coding errors, base substitutions, inserts, or deletions detected in a million bases. In this study, the genetic mutations were calculated in each sample by writing prescripts. The analysis was carried out through the "Maftools" package and the TMB were divided into the high-TMB group and the low-TMB group by median. Eventually, these were combined with clinical data for Kaplan-Meier analysis. The survival differences between the two groups were compared and the p values of the log-rank test were calculated.

GO/KEGG analysis and Gene set enrichment analysis (GSEA)
Based on the median of TMB, we were able to have 60 low-TMB samples and 57 high-TMB samples. We then identi ed differentially expressed genes (DEGs) into two groups with Fold Change (FC) = 1 and False Discovery Rate (FDR) < 0.05. We obtained a total of 3,936 different genes, and then drew the heatmap of different genes with the "PheatMap" package (top 40). Based on these different genes, we conducted GO and KEGG analysis through "clusterPro ler", "org.Hs.eg.db", "enrichplot" and "ggplot2". We described the enrichment results from three aspects, including biological process (BP), cellular component (CC), and molecular function (MF). We further identi ed Genome Encyclopedia (KEGG) pathways. The results were shown with the dotplot. Finally, we used GSEA software to carry out gene set enrichment analysis of the high-and the low-TMB groups, which was downloaded at https://www.gsea-msigdb.org/gsea/index.jsp.

CIBERSORT
CIBERSORT is a newly developed silicon algorithm that uses RNA-SEQ or microarray data from a large number of samples to characterize portions of a cell subpopulation. Transcriptome pro les were normalized via "limma" package.
Based on the CIBERSORT algorithm, we obtained the estimated scores of 22 immune cell subtypes of transcriptome pro les of THYM. Moreover, we analyzed 22 immune cell in ltrates between low-and high-TMB groups.
COX analysis and risk model of hub independent risk signature The latest immune genes were downloaded from IMMPORT database (www.immport.org/shared/home), and then intersection genes were chosen with differentially expressed genes (logFC > 2 and FDR < 0.05) through the Venn diagram. Then the expression data of the intersection genes and clinical data were merged. Univariate Cox analysis was uutilized to screen prognostic genes and multivariate Cox was utilized to further identify hub independent risk characteristics to establish risk model. The TMB related signature (TMBRS) was subsequently calculated with TMBRS = (core*EXPi), where core could be derived from the multivariate Cox analysis. Receiver operating characteristic (ROC) curve was drawn to assess the predictive power of TMBRS by "survivalROC" package, and Kaplan-Meier analysis with log-rank test was obtained by"survival" package.

TIMER
We then analyzed key genes in the study by the TIMER database (https://cistrome.shinyapps.io/timer/), including prognostic genes and major mutated genes. TIMER web server is a comprehensive resource for systematical analysis of immune in ltrates across diverse cancer types [13]. The abundances of six immune in ltrates (B cells, CD4 + T cells, CD8 + T cells, Neutrophils, Macrophages, and Dendritic cells) are estimated by TIMER algorithm, including the correlation between the gene expression, mutated genes, clinical outcome or somatic cells and the abundance of immune in ltrates.

Genome-wide mutation pro ling in TETs
We acquired the mutation data of a total of 123 TET patients from the TCGA database. The mutation data were downloaded and visualized using the "maftools" package in R software. Waterfall plot showed higher frequency of gene mutations in patients with TET, such as GTF2I (33%), HRAS (8%), TTN (5%), MUC16 (3%), TP53 (3%) and so on (Fig. 1A, H). In gene mutation correlation diagrams, most genes were independent and a few genes were synergistic. HRAS and GTF2I had a high synergistic correlation (p < 0.001) (Fig. 1B). Missense mutations were the most common type of mutation in patients with TETs ( Fig. 1C), SNP occurred more frequently than insertion or deletion (Fig. 1D), and C > T was the predominant mutation type detected (Fig. 1E). In addition, the number of mutated bases per sample was shown in Fig. 1F. In Fig. 1G, the mutation types were shown in different colors in box diagram. Figure 1. Landscape of mutation pro les in TET samples. A, Mutation information of each gene in each sample was shown in the waterfall plot. The annotation of mutation types were shown at the bottle with various colors and the number of mutation burden was listed in the bar chart above the legend. B, The relationship between mutated genes C-E, Based on statistical calculations of different types of mutations, where missense mutations accounted for the majority, SNP occurred more frequently than deletion or insertion, and C > T was the most common type of SNV. F-G, Illustration of tumor mutation burden in per samples. H, The top 10 mutated genes in TET.

TMB was associated with prognosis
In order to explore the relationship between TMB and the prognosis of patients with TETs, we downloaded the prognostic information of the patients and plotted a Kaplan-Meier curve. The results indicated that a low TMB was associated with a better clinical outcome of patients with TETs (Fig. 2). Genetic changes associated with TMB and functional pathway analysis To study the DEGs associated with TMB in TET patients, we divided the patients with TET into high TMB and low TMB groups. The heatmap showed TOP 40 DEGs in two TMB groups (Fig. 3A). The GO functional analysis revealed that, these mutant genes mainly enriched in plasma membrane signaling receptor complex, cell-cell junction, cell junction assembly and actin binding (Fig. 3B). In KEGG pathway analysis, PI3K-Akt signaling pathway, cytokine-cytokine receptor interaction, human papillomavirus infection, Rap1 signaling pathway and focal adhesion were top 5 signaling enriched (Fig. 3C). In addition, the GSEA results suggested that patients in high-TMB group tend to be more associated with tumor-related signaling pathways, including focal adhesions, ErbB signaling, ECM-receptor interaction and TGF-β signaling pathway. (Fig. 3D). Identi cation and Kaplan-Meier survival analysis of hub TMB-related immune genes We screened 97 differential immune genes from high TMB and low TMB groups, and then utilized univariate cox analysis to screened 13 survival related genes among these 97 differentially expressed immune genes (Table 1). Then, the multivariate cox regression analysis was performed to further select 3 hub TMB-related immune genes that are highly associated with prognosis. Higher expression levels of HCK correlated positively with poor prognosis, while lower levels of CD1B and CD1E correlated with a worse prognosis ( Figure,4A-C). In addition, we utilized the 3 genes to establish TMB-related signature (TMBRS) model. The AUC of ROC was 0.729, indicating that the high predictive accuracy for our identi ed TMBRS model (Fig. 4D). The patients were divided into two groups according to the median value of risk score, and patients in high-risk group had poor prognosis (Fig. 4D).

Associations of 3 hub TMB-related immune genes expression with immune cells in ltration
Then multivariate Cox is utilized to further identify hub independent risk characteristics and establish a risk model. We further screened three hub survival-related genes from 13 prognostic mutation related genes to construct our prognostic model. The 3 hub TMB-related immune genes expression between the high-and the low-TMB groups were shown in volcano plot in Fig. 5A. Among these, CD1B and CD1E were lowly expressed in high-TMB group, while HCK was highly expressed. More importantly, we further assessed the underlying relationships of the expression of these hub genes with immune in ltrates in TET microenvironment.The expression of HCK was negatively correlated with the in ltration of CD8 + T cell and CD4 + T cell, while the levels of CD1B and CD1E were positively correlated with immune in ltrates. These immune in ltrates include B cell, CD8 + T cell, CD4 + T cell, macrophages and dendritic cells (Fig. 5B-D). These suggest that, low-expression CD1B, CD1E and high-expression HCK in tumors can inhibit immuno-immersion.

Tumor in ltrating immune cells (TIICs) in high-and low-TMB groups
To investigate the correlation between TIICs and TMB in TETs, we rst used CIBERSORT to calculate in ltration of 22 immune cells in the patients with TETs (Fig. 6A). Besides, the Wilcoxon rank-sum test indicated that native CD4 + T cells, plasma cells, activated memory CD4 + T cell, follicular helper T cells and regulatory T cells were higher in ltrating in low-TMB group, while native B cells, activated NK cells, resting mast cells, activated dendritic cells, M0, M1and M2 macrophages showed higher in ltrating levels in high-TMB group (Fig. 6B). B, TIICs associated with TMB. Red means high TMB and green means low TMB Association of top mutated genes in TET with immune in ltrates GTF2I (33%) and HRAS (8%) are top mutated genes, which have synergistic mutation in patients with TET (Fig. 1H). The mutated GTF2I and HRAS can inhibit the in ltration levels of B cells, CD8 + T cells, CD4 + T cell and dendritic cells (Fig. 7A, B). Moreover, mutated HRAS can also inhibit the immune in ltration of macrophage (Fig. 7B).These results suggested that mutated GTF2I and HRAS might promote tumor development by inhibiting anti-tumor immune response, resulting in a poor prognosis for patients.

Discussion
Thymic epithelial tumors are rare tumors originating from thymic epithelial cells, which are related to many autoimmune diseases. Although they are sensitive to chemotherapy, there are almost no effective treatment methods for recurrent or refractory TETs. TMB is a new type of biomarker for cancers, which has proven e cacy in a variety of cancers, such as breast cancers, lung cancers, and colorectal cancers. Li et al. [14] found that high-TMB was associated with a worse prognosis, and it could promote tumor metastasis and development in patients with chromophobe cell carcinoma (chRCC). Zhang et al. [15] also reported that, the high-TMB patients showed poorer survival results, higher pathological stages and higher tumor grade in patients with clear renal cell carcinomas when compared with the low-TMB group.
Apparently, our results showed that high-TMB patients had worse survival outcomes, which was consistent with these studies.
Another important nding of our study was that, three differentially expressed immune-related genes (CD1B, CD1E, and HCK) were identi ed in the high and low TMB groups. CD1B and CD1E are members of the CD1 antigen-presenting molecule family, which is specialized in presenting lipid antigens to T cells [16]. They are mainly expressed in B cells, bone marrow cells and monocytes. The activation of CD1 restricts T cells in vivo, leading to rapid antitumor cytotoxicity and interferon-γ production, which could prevent tumor metastasis [17]. Bagchi et al. [18]found that CD1B-autoreactive T cells can effectively identify phospholipid antigens derived from tumors and exert antitumor immunity against CD1B + T cell lymphoma. Recently, it has been reported that the low expression of CD1B was associated with poorer biochemical recurrence-free survival in prostate cancer [19]. Moreover, CD1B could be used as a biomarker for the prognosis of breast cancer [20], and the low expression of CD1B might lead to the metastatic recurrence of breast cancer [21]. In addition, Yuan et al. [22] found that, the low expression of CD1B was related to the deterioration of lung adenocarcinoma. Compared with CD1B, CD1E is less studied, except that Yang et al. [23] found that CD1E could be used as a prognostic indicator of cervical cancer. Similar results to our study was that, CD1B and CD1E could be used as prognostic indicators of TETs, and the low expression levels might be related to the poor prognosis.
HCK is an important member of the receptor tyrosine protein kinase family and is expressed in myeloid monocytes and B lymphocytes [24].
It can regulate series of cell processes, including mitosis, differentiation, survival, migration and adhesion and the likes. In addition, HCK plays an important role in the integrin signaling pathway, which is closely related to tumor proliferation, invasion and metastasis [25]. It has been reported that, excessive activation of HCK is associated with various leukemias, such as chronic myelogenous leukemia (CML) and acute lymphoblastic leukemia, as well as solid malignancies including colorectal cancer, breast cancer and gastric cancer. In gastric tumor epithelial cells, over expression of HCK can induce alternative polarization of macrophages, activation of STAT3 and promote tumor growth [26]. Roseweir et al. [27] reported that, HCK was highly expressed in patients with colorectal cancer, it could promote tumor progression, inhibit local lymphocyte in ammatory in ltration and are related to the poor prognosis of patients. This poor prognosis is suggested to be due to effects on proliferation and facilitation of an alternative M2-like macrophage polarization. Moreover, HCK participated in the development of glioblastoma (GBM) by mediating the EMT process. This might become a promising therapeutic target for GBM [28].
In our study, we found that CD1B, CD1E and HCK were closely associated with the patient's prognosis. This may be related to its suppression of anti-tumor immune response. Therefore, a prognostic model (TMBRS) was developed using the 3 hub immune genes which can be very useful for survival prediction. Patients with high TMBRS revealed worse survival outcomes when compared with those with low TMBRS.
The tumor microenvironment, especially tumor immune in ltration affects the effects of tumor immunotherapy [29]. And interestingly, this effect plays a dual role in cancer, which can inhibit tumor growth or promote tumor growth [30].So far, there have been many studies on the relationship between immune cells and tumor immunotherapy. As the main force in tumor immunity, T cells can migrate to the tumorigenesis site at a higher rate and inhibit the survival of cancer cells [31].Once T cells (mainly activated effector cells) are reduced in tumor, this may lead to poor prognosis in tumor patients, like colon cancer, lung cancer or other cancers. As another important role of immunity, B cells also play a positive role in tumor immunity [32,33]. However, it is worth noting that, the anti-tumor function of B cells may also be limited to some extent. After all, many studies have shown that B cells cooperate with T cells to participate in the anti-tumor effect [34].The in ltration of macrophages plays a "double-edged sword" role in development of tumors. M1 macrophages have the role of killing tumor cells, while M2 macrophages are shown to promote tumor growth, so the effect of macrophages in TETs is no longer discussed here.
In our results, the survival rate of the high-TMB group was signi cantly lower than that of the low-TMB group. Considering the reasons of immune in ltration, there was no doubt that the signi cant decrease of T cell in ltration (including native CD4 + T cells, activated memory CD4 + T cell and follicular helper T cells were likely to be important factors in uencing the prognosis. In the high-TMB group, high in ltration of B cells could theoretically improve anti-tumor immunity, but the fact was that, patients in the high-TMB group were still a manifestation of poor prognosis. Therefore, we speculated that, due to the lack of T cells, the anti-tumor effect of B cells was obviously limited.
Tumor mutations can increase the probability of immunogenicity of cancer [35]. GTF2I is a signal-induced transcription factor that responds to various extracellular signaling pathways, including B and T cell receptor, triggering the tyrosine residue to get phosphorylated.
Many studies have reported that GTF2I is mutated in most thymoma, and it can potentially be used as a new therapeutic target for patients with thymoma [36,37]. HRAS is a member of the RAS oncogene family. Gene mutations in the RAS family are often found in thymic cancer.
Moreover, mutant HRAS may be related to Type 1 regulatory T (Tr1) cells, which differentiate in response to signals involved in T cell receptor (TCR), expressing high levels of immunosuppressive cytokine IL-10, but not Foxp3, and can suppress in ammation and promote immune tolerance [38].Our results showed that GTF2I (33%) and HRAS (8%) had a high synergistic correlation. Although they had no differences between low TMB and high TMB groups, they may indirectly affect the process of anti-tumor immunity.

Conclusion
In conclusion, the above studies suggested that TETs patients with low TMB have better prognoses than those with high TMB, and the three hub TMB-related immune genes (CDIB, CDIE, HCK) is an effective predictive model. The authors have declared no con icts of interest.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Consent for publication
Not applicable.
Ethics approval and consent to participate Not applicable.

Funding
This work was supported by Grant 31670121 and 31771277 from National Natural Science Foundation of China.

Authors' contributions
Pinglang Ruan and Dan Liu analyzed the data and drafted the manuscript. Lili Wang, Ling Qin, and Yurong Tan led the analytics. Yurong Tan edited the manuscript. All authors read and approved the nal manuscript.