Characteristics of Subtypes Within Microsatellite Instability-high Gastric Cancer Based on a Gene Signature Related to Immune Microenvironment Components

Background: Gastric cancer patients with microsatellite instability-high (MSI-H) status have a better clinical prognosis and higher response rate to immune checkpoint inhibitors. However, recent studies have suggested that some molecular pathways in MSI-H tumors could affect tumor immune microenvironment (TIME) components, thereby leading to immunotherapy resistance. We aimed to establish subtypes based on the TIME components of MSI-H gastric cancer and analyze the characteristics of each subtype. Methods: Cohorts from the Cancer Genome Atlas, the Asian Cancer Research Group, and Peking University Cancer Hospital were used for this study. CIBERSORT software was used to analyze the TIME components. A set of genes based on the TIME component characteristics, which we named the MSI-TIME signature, was dened using k-means cluster and differentially expressed gene analysis. Results: By using the MSI-TIME signature in the aforementioned cohorts for cluster analysis, the TIME subtypes within MSI-H gastric cancer (MSI-S1, MSI-S2) were established; the differences between the subgroups were reected in multiple aspects. The MSI-S1 subtype was characterized by a high density of CD8+ T cells, high expression levels of immune checkpoint molecules including PD-L1, PD-L2, CTLA-4, and a high T-cell inammation level. Patients with the MSI-S1 subtype could also benet from adjuvant chemotherapy. In contrast, the WNT/β-catenin pathway was enriched in the MSI-S2 subtype. Conclusion: We found that patients with MSI-H gastric cancer showed very different TIME characteristics and could be divided into two subtypes accordingly. These results might benet MSI-H gastric cancer patients developing individualized treatment strategies in the future.

promote resistance to immune checkpoint blockade [11]. In addition, a study on metastatic melanoma found that immunogenic neoantigen density could not explain the antitumor activity of the immune microenvironment [12]. Therefore, there may be different gene expression characteristics in patients with MSI-H that lead to different tumor immune microenvironment (TIME) components and, consequently, have different responses to immune checkpoint blockade.
Differences in gene expression pro les related to TIME characteristics in MSI-H gastric cancer are unclear.
In this study, we used CIBERSORT software for the deconvolution of immune cell subtypes based on the gene expression pro les of MSI-H gastric cancer patients from TCGA and ACRG databases.
Subsequently, we de ned MSI-H gastric cancer subtypes based on TIME in ltration patterns; a cohort with gene expression pro le data from our center, Peking University Cancer Hospital (PUCH), was used for further validation. Our study on MSI-H gastric cancer subtypes can help understand gene expression differences within MSI-H and may contribute to the development of new treatment strategies.

Methods
Gene expression data processing Gene expression data and corresponding clinical data for the TCGA gastric cancer cohort were obtained from the TCGA data portal site (https://portal.gdc.cancer.gov/repository) on January 12, 2020. The downloaded RNA-sequencing data with fragments per kilobase million values were transformed into transcripts per kilobase million values. According to MSI status data (TCGA clinical data element ID: 3226963), 67 gastric cancer samples with MSI-H status were identi ed among the available 375 gastric cancer samples.
The microarray gene expression data of the ACRG cohort (GSE62254, Affymetrix Human Genome U133 Plus 2.0 Array) were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih. gov/geo/). The data with log10-transformed robust multichip analysis signal intensity was used for further analysis. Among the 300 gastric cancer samples of the ACRG cohort, 68 gastric cancer samples with the MSI-H status were found according to the clinical feature data.
Additionally, a gastric cancer cohort including 31 samples with mismatch repair de ciency (dMMR) obtained from patients treated at PUCH between 2007 and 2010 was considered for further validation.
Gene expression pro ling was performed using the Agilent human mRNA & lncRNA Array 4.0 platform; detailed processing has been described in our previous study [13].

Somatic mutation data processing
The somatic mutation data (VarScan2 Variant Aggregation and Masking) were also obtained from the TCGA website. The tumor mutation burden (TMB) of each sample was calculated using the method proposed by Zaretsky et al. [14]. Next, the mutant-allele tumor heterogeneity (MATH), which is used to measure intratumor heterogeneity, was calculated using the maftools package in R software (Version: 3.6.0). The relative abundance of 22 tumor-in ltrating immune cell types, including B cells, T cells, natural killer  cells, macrophages, and dendritic cells, was estimated using CIBERSORT software (http://cibersort.stanford.edu/) according to gene expression data [15]. The evaluation of broblasts and stromal score were based on analysis via xCell software (https://xcell.ucsf.edu), which is a gene signature-based tool to infer immune and stromal cell types [16]. The k-means clustering method was used to identify TIME patterns and classify patients. The optimal number of TIME pattern clusters was determined using the NbClust R package.

Estimating and clustering the in ltrating cells in the TIME
Determining the MSI-H gastric cancer subtypes The analysis of differentially expressed genes (DEGs) among the TIME patterns was performed using the limma R package; genes with an adjusted P value < 0.05 were determined as DEGs. Based on these DEGs, we used a non-negative matrix factorization (NMF)-based unsupervised clustering method to de ne MSI-H gastric cancer subtypes. The NMF R package was used for the above clustering process.
Gene ontology (GO) and pathway enrichment analysis GO enrichment analysis of the DEGs was performed using the clusterPro ler R package. Signi cant GO terms were identi ed with P value < 0.05. Pathway analysis based on the different subtypes of MSI-H gastric cancer was performed using the gene set enrichment analysis (GSEA) JAVA program (http://software.broadinstitute.org/gsea/index.jsp). The analysis was conducted with 1000 gene set permutations, and pathways were ranked based on their enrichment scores; the top 20 pathways were selected for further analysis.

Statistical analysis
Comparisons among groups were performed for continuous parameters using Student's t-test for normally distributed parameters or Wilcoxon-Mann-Whitney rank sum test for non-normally distributed parameters. For the analysis of differential genes, the P values were converted to FDRs using the Benjamini-Hochberg method. Survival differences were compared using the log rank test and plotted using the Kaplan-Meier method. All statistical analyses were conducted using R software (Version: 3.6.0); statistical test results were two sided, and P values < 0.05 were considered statistically signi cant.

Results
Landscape of two different TIME patterns in the TCGA cohort A summary of the immune cell composition of the 67 MSI-H gastric cancer tissues in the TCGA cohort is shown in Supplemental Fig. 1. The ve most common immune cells were M0 macrophages, resting memory CD4 + T cells, M2 macrophages, CD8 + T cells, and M1 macrophages in descending order ( Fig. 1A). NbClust provides 30 indexes for determining the optimal number of clusters; for the cluster analysis of TIME components, among the range of tested clusters (2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15), two clusters were considered optimal, as supported by six indexes (Supplemental Fig. 2). Based on the k-means method, two distinct TIME pattern landscapes were evident, as shown in Fig. 1B. The rst pattern was characterized by an increase in the composition of resting memory CD4 + T cells, resting NK cells, M0 macrophages, activated mast cells, and neutrophils; whereas, the second pattern had a notable increase in the composition of CD8 + T cells, activated memory CD4 + T cells, regulatory T cells, gamma delta T cells, activated NK cells, M1 macrophages, and resting mast cells.

Functional annotation of the DEGs and subtyping of MSI-H gastric cancer
Using the limma R package, 77 DEGs were identi ed among the two MSI-H TIME patterns were of the TCGA cohort; of these, 65 DEGs were upregulated in the second TIME pattern and 12 DEGs were upregulated in the rst TIME pattern. The detailed results are shown in Supplemental Table 1. GO analysis using the clusterPro ler R package revealed that the 65 upregulated DEGs in the second pattern were mainly enriched in terms related to immune response, defense response, response to stimulus, and T cell activation, whereas the 12 upregulated DEGs in the rst pattern were mainly enriched in in ammatory response and collagen catabolism terms ( Fig. 1C and 1D). The NMF algorithm was applied to the 65 upregulated DEGs in the second pattern, and the optimal number of clusters was selected where the magnitude of the cophenetic correlation coe cient begins to fall (Supplemental Fig. 3). Then, the TCGA cohort fell into two distinct subgroups; 29 tumors were classi ed as MSI-H subtype 1 (MSI-S1) and 38 tumors were classi ed as MSI-S2 ( Fig. 2A). Clinical characteristics, including sex, age, tumor location, histology, Lauren type, and stage, were similar between the two MSI-H subtypes (Table 1). Furthermore, we also validated the obtained DEG subtyping results using the NMF method in the ACRG and PUCH cohorts. Analysis of the optimal number of clusters in both cohorts suggested the presence of two subgroups; among the 68 samples of the ACRG cohort, 33 tumors were classi ed as MSI-S1 and 35 tumors were classi ed as MSI-S2; whereas the 31 samples of the PUCH cohort were distributed as follows: 17 MSI-S1 and 14 MSI-S2 (Supplemental Fig. 4-7). Interestingly, the ratio of MSI-S1 to MSI-S2 in all three cohorts was close to 1:1.
By analyzing somatic mutation data, the top 20 frequently mutated genes of the two subgroups were noted, as shown in Fig. 3A. We identi ed 130 mutation genes with signi cant differences between the two subtypes (Supplemental Table 2); 13 mutation genes with P < 0.01 were shown in Fig. 3B. Previous studies have reported the association between ABCB4 and PIK3R2 and drug resistance in gastric cancer [19,20]; these data may help us explore the mechanism of their impact on the TIME in the future.
In addition, some studies have suggested that the level of TMB or MATH could re ect the tumor heterogeneity and that they were associated with immunogenic antigen density. Moreover, a comparison of the level of TMB and MATH between the two subgroups showed no signi cant difference (P = 0.501; 0.621, respectively) ( Fig. 3C and 3D). This result may suggest that the suppression of antitumor TIME in MSI-H gastric cancer did not result from the lack of antigens. Furthermore, we analyzed and compared the level of broblasts and the stromal score between the two subgroups and noted no signi cant differences (P = 0.105; 0.056, respectively) ( Fig. 3E and 3F). In other words, there may be some molecular pathway mechanisms that affect the composition of the TIME in MSI-H gastric cancer.
Validation of the gene expression pro le and TIME characteristics in the ACRG and PUCH cohorts To further validate the versatility of the subtypes based on the DEGs, we performed the corresponding analysis in the ACRG and PUCH cohorts. For the ACRG cohort, some immune checkpoint genes had signi cantly high expressions in the MSI-S1 subgroup, including PD-L1 (P < 0.001), PD-L2 (P < 0.001), CTLA-4 (P < 0.001) (Fig. 4A). Moreover, TIME component analysis revealed signi cantly high proportions of CD8 + T cells, activated memory CD4 + T cells, and M1 macrophages in the MSI-S1 subgroup and high proportions of resting memory CD4 + T cells, and M0 macrophages in the MSI-S2 subgroup (all P < 0.05, Fig. 4B). As for the PUCH cohort, we saw a similar expression pro le with high levels of PD-L1 (P < 0.001), PD-L2 (P < 0.001), and CTLA-4 (P < 0.001) along with a high proportion of CD8 + T cells (P = 0.003) in the MSI-S1 subgroup (Fig. 4C, 4D). In addition, the T cell-in amed signature was highly expressed in both of the ACRG and PUCH cohorts (Fig. 4E,4F). Notably, the expression of immune-related genes and the TIME component characteristics were consistent in all three cohorts.
GSEA was performed based on gene expression data from the MSI-S1 and MSI-S2 subgroups; the top 20 enriched pathways in the MSI-S2 subtype of all three cohorts are shown in Supplemental Table 3. Based on the top 20 pathways, we analyzed the common-enriched pathways in the three cohorts; bile acid metabolism, downregulation of K-ras signaling, and WNT/β-catenin pathways were enriched in MSI-S2 of all three cohorts (Fig. 5).
The Kaplan-Meier method revealed no signi cant difference in patient survival between the two subgroups (MSI-S1 and MSI-S2) in the ACRG and PUCH cohorts (Fig. 6A, 6D). However, we conducted further strati ed analysis based on whether adjuvant chemotherapy was received; patients in the MSI-S1 subgroups bene ted from adjuvant chemotherapy (P = 0.043; 0.050, respectively) (Fig. 6B, 6E), whereas those in the MSI-S2 subgroups were not affected by adjuvant chemotherapy (Fig. 6C, 6F).

Discussion
Our study focused on the TIME characteristics of MSI-H gastric cancer. We used three cohorts to analyze and validate the involved gene expression pro les and somatic mutation data; we found that one set of genes based on the TIME patterns, which we named the MSI-TIME signature, can divide MSI-H gastric cancer into two subgroups with completely different TIME characteristics; in other words, we were the rst to demonstrate TIME subtypes within MSI-H gastric cancer (MSI-S1, MSI-S2). The differences between the two subgroups were re ected in multiple aspects, including differences in in ltration of multiple immune cells, expression levels of immune checkpoints, T cell in ammation levels, and clinical chemotherapy bene t. These ndings will help us better understand the TIME characteristics of MSI-H gastric cancer and optimize the clinical application strategy of immunotherapy, chemotherapy, and targeted therapy.
In recent clinical trials on MSI-H/dMMR non-colorectal cancer patients (including gastric cancer, endometrial cancer, pancreatic cancer, etc.) receiving pembrolizumab treatment, the objective response rate (ORR) differed among different tumors, ranging from 0% in brain tumors to 57.1% in endometrial cancer [21]. In addition, the ORR of gastric cancer patients was 45.8%. Although these results showed that the response rate to immune checkpoint inhibitor therapy is relatively good in tumor patients with MSI-H, about half of the patients with gastric cancer still respond poorly. Interestingly, the grouping ratio of our MSI-H subtyping according to TIME characteristics is close to 1: 1, which is consistent with the ORR of the aforementioned clinical trial, suggesting that there are distinct TIME characteristics in MSI-H gastric cancer that affect the e cacy of immunotherapy. In recent studies, most biomarkers that could predict the e cacy of immune checkpoint inhibitors, except for the MSI-H/dMMR status, involved TMB, intratumor heterogeneity, immune checkpoint molecule expression levels, and tumor-in ltrating immune cell levels [9,14,[22][23][24]. We found that the expression levels of immune checkpoint molecules and the levels of tumor-in ltrating immune cells were signi cantly different between the two subgroups. Considering their predictive effects on the e cacy of immunotherapy, the PD-L1 molecular level and CD8 + T cell density should be measured before applying immunotherapy in MSI-H gastric cancer.
MSI-H tumors were often associated with defects in DNA mismatch repair, which led to higher levels of genetic mutations and neoantigens and eventually caused an active antitumor immune microenvironment; this is generally believed to improve the immunotherapy e cacy [25][26][27]. Our study found that the tumor mutation load and intratumoral heterogeneity level were comparable between the subgroups. In a previous study, the density of immunogenic neoantigens in melanoma could not su ciently explain the existence of the antitumor immune microenvironment [12]; considering this and our results, we believe that the differences in the TIME of MSI-H gastric cancer may not be related to TMB and MATH. Although we used mutation data analysis to nd some differential mutant genes between subgroups, the impact of these gene mutations on the TIME is still unknown; as they may have potential effects, further research is needed in the future. Moreover, we analyzed stromal cells and found no signi cant differences in the levels of broblasts (the main component of tumor stroma) between the two subgroups. Finally, we focused on the effects of molecular pathways; according to the results of the GSEA, terms for bile acid metabolism, downregulation of K-ras signaling, and WNT/β-catenin pathways were highly enriched in the MSI-S2 subgroup of the three cohorts. Some studies suggested that the WNT/ β-catenin pathway could promote tumor intrinsic resistance to immune checkpoint inhibitors [28][29][30].
Although the effects of bile acid metabolism and K-ras signaling on the antitumor immune response are unclear, our study may provide some potential targets for future research to help improve the e cacy of immunotherapy.
The effect of the MSI-H status on prognosis depends on the type of tumor. It is considered to lead to more aggressive malignancy in endometrial cancer [31], whereas it is associated with better clinical prognosis in colorectal and gastric cancers [32][33][34][35]. Recently, studies have discussed more about chemotherapy resistance in MSI-H tumors; the negative prognostic effects of adjuvant chemotherapy in MSI-H colorectal cancer have been con rmed [32,36]. Moreover, the National Comprehensive Cancer Network guidelines recommended that the MMR protein test be done before decision of adjuvant chemotherapy for resected stage II colon cancer; recent studies on gastric cancer also suggested that resected MSI-H gastric cancer could not bene t from adjuvant chemotherapy [36,37]. However, after dividing MSI-H gastric cancer into two subgroups, we realized that this is not the case in the MSI-S1 subtype; this may indicate that the MSI-H phenotype is not completely resistant to adjuvant chemotherapy and further supports individual screening; after all, a considerable number of patients can still bene t from adjuvant chemotherapy. Thus, since the TIME characteristics of MSI-S1 suggested better immunotherapy e cacy, testing whether a combination of immunotherapy and chemotherapy can achieve a better clinical prognosis in patients with this subtype is worth further research. Our current research is also not enough to explain the speci c mechanisms underlying the effectiveness of chemotherapy in these patients; thus, more in-depth research is necessary.

Conclusions
In summary, we found that patients with MSI-H gastric cancer showed very different TIME characteristics and could be divided into two subtypes accordingly. Based on the multi-omics and multi-cohort analyses of MSI-H gastric cancer subtypes, the two subgroups were found to have different levels of immune cell in ltration and enriched molecular pathways, which may affect the clinical e cacy of immunotherapy as well as chemotherapy and targeted therapy; however, further research is needed to con rm these conclusions. We do not believe that our research results can be quickly used for clinical practice; however, we believe that future molecular-level studies and well-designed prospective clinical trials can help MSI-H gastric cancer patients obtain the best treatment strategy.   DEGs to classify patients into MSI-S1 and MSI-S2 subtypes, clinicopathological parameters are shown as patient annotations. (b) Comparison of CYT score between the MSI-S1 and MSI-S2 subtypes (P < 0.001).

Figure 4
Gene expression pro le and TIME characteristics in the ACRG and PUCH cohorts. Comparison of the immune checkpoint gene expression levels between the MSI-S1 and MSI-S2 subtypes in the (a) ACRG cohort and (b) PUCH cohort. The range of P values are labeled above each boxplot with asterisks (*P < 0.05, **P < 0.01, ***P < 0.001). Comparison of the TIME components between the MSI-S1 and MSI-S2

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. supplementarymaterials.pdf