Tumor Mutation Burden: A Predictive Biomarker for Gastric Cancer Immunotherapy

Background: Few studies have focused on the underlying relationship between the prognosis of tumor mutation burden (TMB) and immune cell inltration in gastric cancer (GC). This study aims to explore the relationship among TMB and various components in tumor microenvironment (TME). Methods: The transcription proles and somatic mutation data of 375 tumor and 32 normal samples were obtained from TCGA. The specic mutation information was summarized and visualized with waterfall chart, then number of TMB per million bases of each GC sample was calculated. Immune/stromal scores and tumor purity were calculated by the ‘ESTIMATE’ package, and the fractions of 22 immune cells in each sample were evaluated by CIBERSORT algorithm. Finally, Lass regression analysis was utilized to generate a prognostic scoring signature with TCGA cohort as the training set, while GES84437 cohort as the validation set. Results: Higher TMB indicated favorable overall survival (OS, P = 0.043) (cid:0) better disease specic survival (P = 0.029), and longer progression free interval (P = 0.004). TMB was positively correlated with MSI and tumor purity, while negatively associated with immune/stromal scores. Moreover, TMB high group has lower T cells CD4 memory resting (P < 0.001) and T cells regulatory (P < 0.001), and more T cells CD4 memory activated (P < 0.001) and T cells follicular helper (P = 0.009). More importanly, the inltration of dendritic cells activated predicted a worse OS, while T cells CD4 memory activated and T cells follicular helper meant a better OS. Finally, a nomogram combined TMB-related signature with clinicopathologic variables can successfully predict the OS with high accuracy and eciency. Conclusion: TMB can effectively reveal the immune inltration status in TME of GC, and might serve as a prognostic classier for individualized treatment of clinical decision-making.


Introduction
Gastric cancer (GC) is one of the most common gastrointestinal malignancies worldwide with high incidence and mortality. According to the latest International Cancer Research Agency data, there are approximately 930,000 new cases and 700,000 deaths each year, the incidence and mortality of GC rank fourth and second among all malignant tumors, respectively [1,2]. The treatment methods for cancer are surgical treatment, radiotherapy, chemotherapy, targeted therapy and immunotherapy. Among them, immunotherapy mainly includes immune checkpoint blocking therapy [3,4], cytokine therapy [5], chimeric antigen receptor T-cell immunotherapy [6] and cancer vaccine [7]. Currently, the most widely studied and applied immune checkpoints are cytotoxic T lymphocyte-associated antigen-4, programmed cell death protein-1 and programmed cell death 1 ligand-1. The immune checkpoint inhibitors approved by the US Food and Drug Administration for cancer treatment are ipilimumab, pembrolizumab, nivolumab and atezolizumab. Although the existing immune checkpoint inhibitors have achieved signi cant results in the immunotherapy of GC patients, there are still a large number of patients who are not sensitive to the existing immune checkpoint inhibitors. Therefore, it is urgent to clarify the mechanism of patients' differential response to immunotherapy. For instance, carefully exploring the patient's immune in ltration density in tumor microenvironment (TME), identifying the effect of individual immune cells, and looking for novel immune checkpoints. Additionally, it is necessary to nd biomarkers that can predict immunotherapy response and establish risk models for accurate assessment of patients' long-term overall survival (OS).
In essence, cancer is an abnormal cell proliferation disease caused by somatic gene mutations. These mutations mainly occur during DNA damage repair, DNA replication, cell division, and nucleic acid metabolism. Meanwhile, the number of somatic gene mutations will be further increased under the effect of external physical or chemical mutagenic factors. Therefore, the range, type and frequency of gene mutations (namely tumor mutation burden, TMB) can vary greatly due to differences in tumor types, living environment and genetic characteristics. TMB can cause changes in protein sequence. These abnormally expressed proteins can serve as neoantigens to bind to the type I or type II major histocompatibility complex, then they can be recognized by the immune system when presented to the cell surface, thereby activating T lymphocytes to produce immune response. Therefore, tumor immunogenicity is closely related to TMB. The higher the TMB, the higher the tumor immunogenicity.
Correspondingly, the components in TME will also show individual differences, especially the density of immune cells in ltration.
In recent years, TMB has been shown to be a potential marker for predicting the e cacy of various cancer immunotherapy [8][9][10]. However, few studies have focused on the underlying relationship between the prognosis of TMB and immune cell in ltration in GC. This study aims to explore the relationship among TMB and components in TME.

Materials And Methods
Acquisition and processing of publicly available data First, the transcription pro les, somatic mutation data and clinicopathologic information (including the survival time, survival status, age, sex, tumor grade and TNM stage) of GC patients were accessed from TCGA (http://cancergenome.nih.gov/) and GEO (GSE84437, https://www.ncbi.nlm.nih.gov/geo/) databases (as of December 2019), while the immune-related genes were accessed from the IMMPORT portal (https://www.immport.org/). Then, a total of 32 normal and 808 (375 from TCGA and 433 from GSE84437) tumor samples were enrolled in the study after processing the original data via Perl software.
Moreover, the Perl software was utilized to transform gene ID of the transcription pro le into gene symbol and match it with survival information. And somatic mutation data was visualized through 'maftools' package in collaboration with annotation le which we have prepared.
Calculation and evaluation of TMB in GC First, the TMB of each sample was calculated by Perl software with the guidance of annotation le, and matched with survival information via 'limma' package. Meanwhile, the Kaplan-Meier survival analysis with log-rank test completed by 'survival' and 'survminer' packages was used to evaluate the prognostic signi cance of TMB in GC. Based on Kaplan-Meier analysis, the patients were divided into TMB low and TMB high groups. Subsequently, we explored the relationship between TMB and clinicopathologic variables through chi-square test, conducted by 'limma' and 'ggpubar' packages.

Integrated analysis of TMB, MSI, TME and immune cells in ltration
Initially, we summarized the previous research and accessed the MSI of each sample from TCGA. Additionally, the study calculated the immune score, stromal score, and tumor purity of each sample in TME through 'ESTIMATE' package. And the content of 22 immune cells in each sample was identi ed by CIBERSORT software, run in R software and assisted by three packages ('e107', 'preprocessCore' and 'limma'). Then, the correlation analysis between TMB and MSI, TMB and immune score, TMB and stromal score, TMB and tumor purity, and TMB and immune cells was completed through 'ggplot2', 'ggpubar' and 'ggExtra' packages. Meanwhile, the 'limma' package was used to investigate the differences in MSI (immune score or stromal score or tumor purity or immune cells) between TMB low and TMB high groups. Moreover, Kaplan-Meier analysis was performed on immune cells. Subsequently, we explored the relationship between MSI (immune score or stromal score or tumor purity) and clinicopathologic variables through chi-square test, conducted by 'limma' and 'ggpubar' packages.

Immune checkpoints
The 38 validated immune checkpoints were summarized through extensive reading of the literature (Table 1) [11][12][13][14][15][16][17][18][19]. Subsequently, the differential expression of these immune checkpoints in the TMB low and TMB high groups was analyzed by the 'ggpubar' package and visualized with box plots. Differential analysis and pathway enrichment analysis across TMB groups First, Perl software was applied to separate the samples in the TMB low and TMB high groups. Then, under the criteria of |log 2 (FC)| > 1 and FDR < 0.05, the 'limma' package was utilized to identify the differentially expressed genes (DEGs) with Wilcoxon-test in the two groups, and the 'pheatmap' package was used for visualization. Subsequently, the DEGs symbol was converted to gene ID with 'org.Hs.eg.db' package, then Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis were completed with packages ('clusterPro fer', 'org.Hs.eg.db', 'enrichplot' and 'ggplot2').
Additionally, the TMB data was used for the phenotype prepared by Perl software, the GSEA software was downloaded from the home page (http:// software.broadinstitute.org/gsea/index.jsp), and the 'c2.cp.kegg.v6.2.symbols.gmt gene sets' accessed from MSigDB database (http://software.broadinstitute.org/gsea/msigdb/), which were utilized for KEGG enrichment analysis across two TMB groups. Finally, multiple GSEA was performed via 'ggplot2' package to merge the enrichment pathways with P < 0.05.

Establishment and validation of TMB-based signature
After log2-scale transformation of DEGs expression, we obtained 261 prognosis-related genes via univariate analysis, which were intersected with the genes in GSE84437, then enrolled the genes into Lass regression analysis to generate a prognostic scoring signature that could divide patients into high-risk and low-risk groups based on the mean risk score (RS) value. TCGA cohort was the training set, while GES84437 cohort was the validation set. The RS was calculated as the sum of the products of gene expression levels and coe cients, via the following formula: where 'i' and 'k' represent the 'i'th gene and the number of genes, respectively. Finally, Kaplan-Meier analysis with log-rank test are used to evaluate the accuracy and predictive e ciency of signature.

Construction and validation of nomogram
Initially, the prognostic variables in TCGA cohort (training set) were determined by univariate and multivariate analysis. Subsequently, a nomogram was generated through 'rms' package to predict 3-year and 5-year OS of GC, combining the clinical attributes and RS. The e ciency and accuracy of nomogram were separately evaluated through receiver operating characteristic (ROC) curves and calibration curves.
Subsequently, we constructed a nomogram with the same variables in the GSE84437 cohort (validation set) and evaluated it with ROC curves and calibration curves to verify the stability of the nomogram.

Genome-wide mutation analysis
The mutation data of 433 samples was obtained from the TCGA database and visualized through 'maftools' package. In summary, the variant classi cations were performed, where missense mutation was the most common category (Fig 1A). Among variant type, the proportion of single nucleotide polymorphism was much greater than deletion and insertion (Fig 1B). And the frequency of C > T was much higher than other single nucleotide variants ( Fig 1C). Additionally, the number of mutated bases was calculated in each sample (Fig 1D). The box plot was applied to show variant classi cations ( Fig   1E). And the histogram exhibited the top ten mutated genes, and the sequence of variants from high to low was TTN (48%), TP53 (44%), MUC16 (31%), ARID1A (25%), LRP1B (24%), SYNE1 (22%), FLG (19%), FAT4 (19%), CSMD3 (18%), and PCLO (17%) (Fig 1F). Moreover, the waterfall chart showed the detailed variant categories with different colors of mutated genes in 89.38% of samples ( Fig 1G). Cloud map ( Fig  1H) visualized the mutated genes with various colors, where the larger the font, the higher the frequency of gene variant. And the variant relationship between mutated genes was shown in Fig 1I, where brown indicated that two genes are mutually exclusive variant, while green represented common variant.

Evaluation of TMB in GC
After calculating the number of TMB per million bases in each GC sample, TMB was divided into TMB low and TMB high groups based on the median value. Kaplan-Meier analysis showed that TMB high indicated better OS ( Additionally, TMB was closely related to the patient's clinicopathologic variables (age, sex, stage, T stage and N stage), among which TMB was higher in elderly and female patients, and the higher the TMB, the earlier the TNM stage, T stage and N stage ( Fig   2D).

Joint analysis of TMB and MSI
We acquired the MSI of each sample in TCGA cohort from the previous research. Correlation analysis showed that TMB was positively correlated with MSI (Fig 3A), and MSI exhibited a signi cant difference between TMB low and TMB high groups, that was, MSI in TMB high group was higher than TMB low group ( Fig   3B). Additionally, based on the relationship among MSI and clinicopathologic variables (Fig 3C), MSI was closely related to age, sex and N stage, which was manifested as higher MSI in elderly female patients with earlier N stage.

Comprehensive analysis of TMB and TME
After calculating the immune score, stromal score and tumor purity in TME, correlation analysis demonstrated that TMB was negatively associated with immune/stromal scores, while positively correlated with tumor purity (Fig 4A). Subsequently, the t-test proved that the TMB low group had higher immune/stromal scores, while lower tumor purity (Fig 4B). Additionally, chi-square test was utilized to explore the relationship between TME and clinicopathologic variables (Fig 4C-4E), we observed that lowgrade patients had lower immune/stromal score and higher tumor purity than high-grade patients; patients with earlier T stage have lower stromal score and higher tumor purity; and patients with earlier TNM stage have higher tumor purity.

Integrated analysis of TMB and immune cells in ltration
The content of 22 immune cells in each sample was calculated based on the CIBERSORT algorithm, where different colors indicated cell types (Fig 5A), then the correlation analysis of TMB and immune cells was performed (Fig 5B). TMB was negatively correlated with B cells naive, mast cells resting, monocytes and T cells regulatory, while positively associated with T cells CD4 memory activated, T cells follicular helper, macrophage M0 and macrophage M1. Besides, the difference analysis (Fig 6A) demonstrated that the TMB low group has more T cells CD4 memory resting, T cells regulatory, monocytes, dendritic cells resting, mast cells resting and NK cells activated; while the TMB high group has more T cells CD4 memory activated, T cells follicular helper, mast cells activated, neutrophils, macrophages M0, macrophages M1 and NK cells resting. Furthermore, Kaplan-Meier curves (Fig 6B) revealed that the in ltration of dendritic cells activated and macrophage M2 predicted a worse OS, while NK cells activated and T cells CD8 meant a better OS. Additionally, the in ltration of T cells CD4 memory activated and T cells follicular helper also indicated the trend of better OS, but no statistical signi cance was seen, which may be the reason for the insu cient sample size.

Functional analysis of DEGs
A total of 816 DEGs were identi ed and the heatmap showed the rst 40 DEGs with signi cantly differentially expression across two TMB groups, where most DEGs were signi cantly up-regulated in the TMB low group (Fig 8A). According to GO annotation, biological process (BP) was mainly enriched in regulation of membrane potential, calcium ion homeostasis and regulation of blood circulation; cellular component (CC) was involved in collagen-containing extracellular matrix, contractile ber and myo bril; while molecular function (MF) was participated in receptor ligand activity, actin binding and glycosaminoglycan binding (Fig 8B), etc. Moreover, KEGG enrichment analysis revealed that DEGs were closely related to cAMP signaling pathway, calcium signaling pathway and ECM-receptor interaction, etc ( Fig 8C). Additionally, GSEA was completed with TMB as phenotype, the active pathways in TMB low group included calcium signaling pathway, cell adhesion molecules cams and ECM-receptor interaction ( Fig 8D); while cell cycle, DNA replication and P53 signaling pathway activated in TMB high group (Fig 8E).

Construction, evaluation and validation of TMB-based signature and nomogram
After univariate analysis of DEGs in the training set (TCGA cohort), 261 prognostic genes were obtained and intersected with the validation set (GSE84437) genes to construct a TMB-based prognostic scoring signature via Lasso regression analysis. Finally, a three-gene signature was identi ed for OS in GC, the RS could be calculated based on the expression levels of three genes and the relative coe cients. RS = (0.0816923878082827 * expression of SCG5) + (0.0398504629411058 * expression of SERPINA5) + (0.00259059404152605 * expression of SPARCL1). Additionally, whether in the training set (Fig 9A, P =   9.508e-03) or the validation set (Fig 9B, P = 4.273e-03), the Kaplan-Meier curves revealed that the OS in the low-risk group was better than that in the high-risk group.
In the training set, univariate analysis showed that age, TNM stage, N stage and RS were closely related to OS (Fig 9C). After statistical adjustment of above variables by multivariate analysis, age and RS were independent prognostic indicators of OS (Fig 9D). Subsequently, three clinical attributes (namely age, TNM stage and N stage) and RS were enrolled into the construction of nomogram for predicting 3-and 5year OS of GC patients (Fig 9E). The area under the curve of ROC curves for predicting 3-and 5-year OS was all 0.705 (Fig 9F). Additionally, the calibration curves showed that the prediction ability of nomogram for the 3-year and 5-year OS had no signi cant deviation from the actual reference line (Fig 9G). Moreover, the same variables were utilized to construct a nomogram again in the validation set, the ROC curves also showed a higher predictive e ciency (Fig 9H), and there was no deviation based on the calibration curves (Fig 9I).

Discussion
The study systematically and comprehensively analyzed the prognostic value of TMB in GC and its relationship with the components (especially the in ltration of immune cells) in TME by obtaining relevant information from public databases. It was found that higher TMB predicts better survival and clinicopathologic outcomes, which is closely related to higher T cells CD4 memory activated and T cells follicular helper, and lower T cells CD4 memory resting and T cells regulatory in TME. Additionally, a nomogram combined TMB-related signature with clinicopathologic variables was constructed, which can successfully predict the OS of GC patients with high accuracy and e ciency.
TMB has been shown to be a novel biomarker for predicting immunotherapy response in a variety of tumors. For example, Schrock et al. proved that TMB is an independent prognostic marker for the sensitivity of metastatic colorectal cancer to immune checkpoint [20]. Chae et al. proved that higher TMB can signi cantly improve the sensitivity of non-small cell lung cancer patients to anti-PD-1/PD-L1 immunotherapy, and enable patients to achieve better OS [21]. In HER2-positive refractory metastatic breast cancer, patients with higher TMB can achieve better survival outcomes [22]. In this study, higher TMB indicated better OS, higher DSS and longer PFI, in accordance with similar results in most other malignancies that higher TMB tended to induce local immune recognition and bring out improved prognosis. Additionally, higher TMB was observed in older women and patients with earlier stages, which meant that the higher the TMB, the slower the development and evolution of tumors, and the greater of elderly female patients to immunotherapy response, but this requires further experimental data to support it.
MSI is determined by mutations in damage-repair-related genes and can serve as a subset of TMB. This study revealed that MSI and TMB are positively correlated in GC, and patients with higher MSI are mostly distributed in the TMB high group. More importantly, MSI has a similar role to TMB in the initiation and development of tumorigenesis, that is, higher MSI is more common in elderly women and patients with earlier stages, which is consistent with the results of a meta-analysis [23]. In this meta-analysis, MSI mainly appeared in patients with stage I~III, with a distribution proportion of 11.2% in stage I, 21.2% in stage II, 10.8% in stage III and 7.9% in stage IV. Among female patients, the incidence of MSI was 46.2%, which is higher than microsatellite stability (33.7%) (OR=1.57, P<0.001). The mean age of MSI patients was 65.9 years, while MSS patients was 60.4 years, indicating that MSI was more common in elderly patients (OR=1.58, P<0.001). Studies have shown that MSI can increase the number of somatic mutations in tumors, which may induce innate anti-tumor immune responses and make tumors more sensitive to immune checkpoint inhibitors [24]. Therefore, patients with higher MSI may be the dominant population of immune checkpoint treatment, while MSI is a favorable predictor of therapeutic e cacy [25,26]. Consequently, this study once again proved that TMB high group patients may have a better immunotherapy response from the perspective of MSI.
Further analysis of TMB and TME, revealed that the immune/stromal score of the TMB high group was lower, while the tumor purity was higher. Additionally, low-grade tumors are characterized by low immune/stromal score and high tumor purity, while advanced patients have lower tumor purity. The above is consistent with previous research on glioma [27], that is, advanced glioma has higher immune/stromal score and low tumor purity characteristics, which is related to poor prognosis.
The above results highlight the importance and necessity of exploring the immune/stromal component of TME. Previous studies have found that CD4 + T cells, CD8 + T cells, B cells, and NK cells have vital immune surveillance and killing effects during tumorigenesis and progression, while macrophages, granulocytes, monocytes and dendritic cells assist tumor cells from immune surveillance and killing effect of T lymphocytes and NK cells through a variety of complex ways [28][29][30][31][32][33][34]. In view of the Kaplanmeier analysis and distribution density of 22 immune cells across two TMB groups, which provided a reasonable explanation for the different OS, DSS and PFI. In addition, we analyzed 38 validated immune checkpoints across two TMB groups. For example, FGL1 is normally secreted by hepatocytes and its high expression in malignant tumors is associated with poor prognosis. Wang et al. [13] found that FGL1 is a ligand of LAG3, and the interaction between FGL1 and LAG3 mediates tumor immune escape. VTCN1 receptor and PD-1 are co-expressed on the surface of CD8+ T cells [35], and up-regulated VTCN1 is closely related to the poor prognosis of malignant tumors [18]. Blocking VTCN1 will lead to a compensatory increase in PD-1, and blocking the two will enhance anti-tumor effect [35]. This study observed that CD28, CD40, CD40LG, IL12B, JAK1, PTPRC, TNFRSF4, TNFSF18 and VTCN1 are highly expressed in TMB low group, while IFNG, LDHA, PD-L1, and TNFSF9 are up-regulated in TMB high group, which provides new insights into the development of more effective immunotherapy.
The GO annotation and KEGG enrichment analysis of DEGs were completed, and GESA revealed active signaling pathways across two TMB groups, which further explained the difference in survival and clinicopathologic variables. Additionally, the TMB-related signature with high accuracy and e ciency was established to assess survival risk of GC patients. To our knowledge, this is the rst TMB-related predictive signature in GC though Lasso regression analysis, which can identify the combination of genes with the best predictive power. Moreover, a nomogram based on the TMB-related risk score and three clinicopathologic variables provides a visual method for predicting OS in GC, which is more accurate and effective than signature alone.
The published literature was analyzed to further understand the regulatory mechanism of three genes (SCG5, SERPINA5 and SPARCL1) in signature. SERPINA5 belongs to the serine protease inhibitor superfamily and has multiple functions such as preventing tumor cell metastasis and anti-angiogenesis [36], and its expression has been shown to be decreased in a variety of tumors [37][38][39][40]. For example, the DNA dose and expression level of SERPINA5 were signi cantly reduced in hepatocellular harcinoma (HCC) and negatively correlated with malignant progression. Additionally, SERPINA5 was observed to reduce the metastasis potential of HCC cells through direct interaction with bronectin and disruption of the bronectin-integrin signaling pathway [41]. SPARCL1 is a matricellular protein that is strongly expressed in the stomach, large intestine and lung [42]. Studies have revealed that SPARCL1 is an angiocrine tumor suppressor secreted by tumor vascular endothelial cells, with anti-adhesive, antiproliferative and anti-tumorigenic functions. It is frequently downregulated in tumors such as colorectal cancer or non-small cell lung cancer [42]. However, no studies have demonstrated the expression and regulatory function of SCG5 in GC or tumors. However, the study had limitations. First, basic study is required to demonstrate a potential regulatory relationship between TMB or MSI and immune in ltration. Second, large clinical studies are needed to reveal the effect of TMB or MSI on prognosis and clinicopathologic variables in GC patients. Third, laboratory studies are necessary to identify the effect of three genes signature on the biological behavior of GC.

Conclusion
This study observed that higher TMB means favorable survival and clinicopathologic outcomes for GC patients, which is closely related to the in ltration density of more T cells CD4 memory activated and T cells follicular helper, and lower T cells CD4 memory resting and T cells regulatory in TME. All above None.
Authors' contributions TF designed and supervised the project. RSX performed data analysis and wrote the paper. TF revised the manuscript.