Evaluation of samples’ purity and ploidy
The purity and ploidy of the STAD samples calculated by the above method are shown in S2_Table. The CCFs of SSNVs are given in S3_Table, and those of SCNVs in S4_Table. Overall, we obtained 24359 clonal and 10480 subclonal SSNV events (69.9% clonal) and 129048 clonal and 5191 subclonal SCNV events (96.1% clonal).
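Each percentage above is the clonal share of all events, n_clonal / (n_clonal + n_subclonal). The minimal Python sketch below reproduces the two reported figures from the counts and shows how individual events could be labeled from their CCFs. The cutoff `CLONAL_CCF_CUTOFF = 0.8` and the function names are hypothetical; the CCF rule actually used is described in the Methods and is not restated here.

```python
# Hypothetical cutoff: an event whose CCF is at or above this value is called clonal.
# The rule actually used in the Methods may differ (e.g. one based on a CCF confidence interval).
CLONAL_CCF_CUTOFF = 0.8


def classify_event(ccf, cutoff=CLONAL_CCF_CUTOFF):
    """Label one event as 'clonal' or 'subclonal' from its cancer cell fraction (CCF)."""
    return "clonal" if ccf >= cutoff else "subclonal"


def count_events(ccfs, cutoff=CLONAL_CCF_CUTOFF):
    """Return (n_clonal, n_subclonal) for an iterable of CCF values."""
    labels = [classify_event(c, cutoff) for c in ccfs]
    return labels.count("clonal"), labels.count("subclonal")


def clonal_percentage(n_clonal, n_subclonal):
    """Percentage of clonal events among all events, rounded to one decimal place."""
    total = n_clonal + n_subclonal
    if total == 0:
        raise ValueError("no events to summarize")
    return round(100.0 * n_clonal / total, 1)


# Event counts reported in the text
print(clonal_percentage(24359, 10480))  # SSNV: 69.9
print(clonal_percentage(129048, 5191))  # SCNV: 96.1
```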
Genomic mutational signature analysis
Mutational signatures can reflect the potential effects of previous exposure to different carcinogens, as well as characteristic changes associated with DNA damage and repair in STAD tumors. Here, the Brunet algorithm of non-negative matrix factorization (NMF) was used to identify SNV signatures. To determine the optimal number of SNV signatures, we evaluated the cophenetic correlation coefficient and the residual sum of squares (RSS) for k = 2–10 (i.e., 2–10 SNV signatures). Based on these two metrics, k = 3 (i.e., three SNV signatures) was chosen as the optimal number (S1_Fig). The three SNV signatures extracted from the trinucleotide mutation patterns were designated signatures A–C. In terms of base substitutions, signature A was dominated by “C > T”; signature B mainly comprised “C > A”, “C > G” and “C > T”, with “C > G” occurring only in signature B; and signature C mainly comprised “T > G” (Fig. 1A). SNVs were classified as clonal or subclonal events based on CCF. The contributions of clonal and subclonal SSNVs to the three signatures did not differ significantly (S2_FigA), indicating that clonal and subclonal events had similar mutation patterns.
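As an illustration of how the rank survey might be run, the sketch below fits NMF to a contexts-by-samples mutation matrix with the Kullback-Leibler multiplicative updates of Lee and Seung (the update rules used in Brunet et al.'s formulation) and reports the RSS for each candidate k. It is a simplified stand-in, not the pipeline used here: the matrix is synthetic, there is a single random start per rank, and the cophenetic coefficient, which requires consensus clustering over many restarts, is omitted. Function names are hypothetical.

```python
import numpy as np


def nmf_kl(V, k, n_iter=1000, seed=0, eps=1e-10):
    """Factor a non-negative matrix V (contexts x samples) as W @ H using the
    Kullback-Leibler multiplicative updates of Lee and Seung.
    W holds the signatures (columns), H holds per-sample exposures."""
    rng = np.random.default_rng(seed)
    n_contexts, n_samples = V.shape
    W = rng.random((n_contexts, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Update exposures, then signatures; eps guards against division by zero.
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def rss(V, W, H):
    """Residual sum of squares of the reconstruction W @ H."""
    return float(np.sum((V - W @ H) ** 2))


def rss_by_rank(V, ranks, **kwargs):
    """Fit one NMF per candidate rank k and report its RSS."""
    return {k: rss(V, *nmf_kl(V, k, **kwargs)) for k in ranks}


# Toy catalogue: 96 trinucleotide contexts x 40 samples built from 3 hidden signatures.
rng = np.random.default_rng(1)
V = rng.random((96, 3)) @ (50 * rng.random((3, 40)))
for k, value in rss_by_rank(V, range(2, 6)).items():
    print(k, round(value, 2))
```

In practice the RSS curve is read for an elbow alongside the rank at which the cophenetic coefficient starts to drop; on this toy matrix the RSS should fall sharply up to k = 3, since the data were generated from three signatures.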
Identification and distribution of mutational signatures
To evaluate the heterogeneity of mutational signatures, the contribution of signatures A–C was calculated for each sample (a larger contribution indicates a higher proportion of that signature in the sample). Signature A accounted for a large proportion of mutations in most samples, whereas signatures B and C accounted for a high proportion only in specific samples (Fig. 1B). Using the 30 known mutational signatures provided by COSMIC, we calculated the cosine similarity between each of the three signatures and the COSMIC signatures, and found that signature B was highly similar to COSMIC signatures 3 and 13, and signature C to COSMIC signature 17 (Fig. 1C). Signature A was most similar to COSMIC signature 1 (S2_FigB). COSMIC signature 3 is associated with failure of DNA double-strand break repair by homologous recombination and is found mainly in breast, ovarian and pancreatic cancers. Besides gastric cancer, signature 17 has been found in esophageal cancer, breast cancer, liver cancer, lung adenocarcinoma, B-cell lymphoma and melanoma, but its aetiology remains unknown. Signature 1 has been found in all cancer types and in most cancer samples; it results from an endogenous mutational process initiated by spontaneous deamination of 5-methylcytosine, and its number of mutations correlates with age at cancer diagnosis (https://cancer.sanger.ac.uk/cosmic/signatures_v2).
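Both quantities used in this paragraph reduce to simple matrix operations. The sketch below, assuming signatures are stored as columns of a contexts-by-signatures matrix, computes relative per-sample contributions from an exposure matrix and the cosine similarity between de novo and reference signatures. The 4-context toy matrices and the names `ref_1`/`ref_2` are hypothetical placeholders for the 96-context profiles and the COSMIC v2 signature file.

```python
import numpy as np


def cosine_similarity_matrix(A, B):
    """Cosine similarity between every column of A (contexts x m signatures)
    and every column of B (contexts x n reference signatures); returns m x n."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    A_unit = A / np.linalg.norm(A, axis=0, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=0, keepdims=True)
    return A_unit.T @ B_unit


def relative_contributions(H):
    """Rescale each sample's exposures (one column of H) so that they sum to 1."""
    H = np.asarray(H, dtype=float)
    return H / H.sum(axis=0, keepdims=True)


def best_reference_match(sim, reference_names):
    """Name of the most similar reference signature for each de novo signature."""
    return [reference_names[j] for j in np.argmax(sim, axis=1)]


# Toy 4-context example; real use would compare 96-context profiles with the
# 30 COSMIC v2 signatures (loading that file is not shown).
denovo = np.array([[0.7, 0.0], [0.3, 0.1], [0.0, 0.2], [0.0, 0.7]])
reference = np.array([[0.6, 0.1], [0.4, 0.0], [0.0, 0.3], [0.0, 0.6]])
sim = cosine_similarity_matrix(denovo, reference)
print(best_reference_match(sim, ["ref_1", "ref_2"]))  # ['ref_1', 'ref_2']
```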
Analysis of clonal and subclonal genomic variation
The clonal/subclonal event data for SCNVs and SSNVs were integrated to analyze the clonal and subclonal structure of the STAD samples. Genes carrying SCNVs or SSNVs in more than 5% of samples were selected, yielding the 46 most frequent SCNV genes and 101 SSNV genes (Fig. 2, S5_Table). TP53, TTN and MUC16 were the most frequently mutated genes (> 20% of samples), and their mutations were predominantly clonal (S3_Fig, enrichment p < 0.05, S6_Table), suggesting that these genes are more likely to be mutated early. Clonal and subclonal mutations in common cancer genes such as the oncogene PIK3CA and the tumor suppressor CDH1 were relatively infrequent (< 10% of samples). MIEN1, GRB7 and PNMT had the most frequent copy-number gains (> 10% of samples), whereas CNVs of ERBB2, MYC, KRAS and HRAS were less frequent (6%-10% of samples, S4_Fig).
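The gene filter and the clonal-enrichment step could look like the sketch below. The text does not specify the enrichment test behind S6_Table; this example assumes a one-sided binomial test of whether a gene's clonal fraction exceeds the cohort-wide clonal fraction, and the gene names and counts are hypothetical.

```python
from math import exp, lgamma, log, log1p


def frequent_genes(gene_to_samples, n_samples, min_fraction=0.05):
    """Genes altered in more than `min_fraction` of the cohort's samples."""
    return sorted(
        gene for gene, samples in gene_to_samples.items()
        if len(set(samples)) / n_samples > min_fraction
    )


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def clonal_enrichment_p(n_clonal, n_subclonal, background_clonal_fraction):
    """One-sided test: is this gene's clonal fraction higher than the background?"""
    n = n_clonal + n_subclonal
    return binomial_upper_tail(n_clonal, n, background_clonal_fraction)


# Hypothetical gene with 60 clonal and 10 subclonal SSNVs, tested against the
# cohort-wide SSNV clonal fraction reported above (69.9%).
print(clonal_enrichment_p(60, 10, 0.699))
```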
Temporal ordering of mutations during tumor evolution
To analyze the mutations involved in the initiation and progression of STAD, the 46 SCNAs and 101 SSNVs with the highest mutation frequencies were sorted by CCF (Fig. 3A, S5_FigA-B). Overall, the CCF of SCNVs was significantly higher than that of SSNVs (rank test p < 1e-5; mean CCF 0.9287 vs. 0.9003). Gains dominated the SCNVs, and losses accounted for a very small proportion (gain/loss: 1214/3).
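The text reports only a "rank test"; the sketch below assumes a two-sided Wilcoxon rank-sum (Mann-Whitney U) test with the large-sample normal approximation, a common choice for comparing two CCF distributions. The CCF values are made up for illustration.

```python
from collections import Counter
from math import erfc, sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal
    approximation with a tie correction and no continuity correction.
    Returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical CCF values for a handful of SCNV and SSNV events
scnv_ccf = [1.0, 0.98, 0.95, 1.0, 0.91, 0.97]
ssnv_ccf = [0.85, 0.9, 0.72, 0.96, 0.88, 0.81]
u, p = rank_sum_test(scnv_ccf, ssnv_ccf)
print(round(mean(scnv_ccf), 4), round(mean(ssnv_ccf), 4), u, round(p, 4))
```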
For clarity of presentation, only gene pairs connected by ≥ 2 edges were retained, yielding 369 SCNA pairs (S7_Table) and 119 SSNV pairs (S8_Table, S10_Table). Edge enrichment analysis identified five early SCNA genes and eight early SSNV genes (S9_Table, S10_Table). In the temporal ordering of SSNVs, TP53, USH2A and GLI3 appeared earliest in STAD, suggesting that they may be driver events, whereas CTNNB1, LRP1B and ERBB4 appeared latest, suggesting a possible association with STAD progression (Fig. 3B). Unfiltered edges for all SSNVs are shown in S5_FigC. In the temporal ordering of SCNAs, MYC, KRT14 and KRT16 were classified as early and intermediate genes, and KRAS, ERBB2 and CCNE1 as late genes (Fig. 3C). Unfiltered edges for all SCNVs are shown in S5_FigD.
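The edge construction is not restated in this section. One common scheme, assumed here, is that whenever two genes are altered in the same sample, the gene with the higher CCF is placed earlier, giving a directed edge; edges are counted across samples, pairs with fewer than two edges are dropped, and genes are ranked by net outgoing edges. The sketch below implements that scheme on hypothetical genes; it ranks genes by net edge count rather than reproducing the enrichment test used for S9_Table and S10_Table.

```python
from collections import Counter
from itertools import combinations


def ordering_edges(sample_ccfs):
    """Count directed (earlier, later) gene edges across samples.

    sample_ccfs maps sample -> {gene: CCF}. Within a sample, the gene with the
    higher CCF is assumed to have arisen earlier; equal CCFs carry no ordering
    information and are skipped."""
    edges = Counter()
    for ccfs in sample_ccfs.values():
        for g1, g2 in combinations(sorted(ccfs), 2):
            if ccfs[g1] > ccfs[g2]:
                edges[(g1, g2)] += 1
            elif ccfs[g2] > ccfs[g1]:
                edges[(g2, g1)] += 1
    return edges


def filter_edges(edges, min_count=2):
    """Keep only ordered gene pairs supported by at least `min_count` samples."""
    return {pair: n for pair, n in edges.items() if n >= min_count}


def rank_by_earliness(edges):
    """Order genes by net outgoing edges (outgoing minus incoming), earliest first."""
    score = Counter()
    for (early, late), n in edges.items():
        score[early] += n
        score[late] -= n
    return sorted(score, key=lambda gene: (-score[gene], gene))


toy = {
    "s1": {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": 0.2},
    "s2": {"GENE_A": 0.9, "GENE_B": 0.6},
    "s3": {"GENE_A": 0.95, "GENE_C": 0.3},
    "s4": {"GENE_B": 0.7, "GENE_C": 0.4},
}
kept = filter_edges(ordering_edges(toy))
print(rank_by_earliness(kept))  # ['GENE_A', 'GENE_B', 'GENE_C']
```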
Relationship between clonal or subclonal events and prognosis
To study the effect of clonal or subclonal events on patient survival, we used the Kaplan-Meier method to analyze the association between clonal status and overall survival (OS) for the 46 high-frequency SCNA genes and 101 high-frequency SSNV genes (mutated in > 5% of samples). At a log-rank test threshold of p < 0.1, we identified five early genes (Fig. 4A-B, S6_Fig), 12 intermediate genes (Fig. 4C-D, S7_Fig) and eight late genes (Fig. 4E-F, S8_Fig) associated with OS. The Kaplan-Meier curves of OS showed that, for the early genes MYC and DNAH9, clonal events had a stronger association with prognosis than subclonal events. For intermediate and late genes, in contrast, both clonal and subclonal events were significantly associated with OS. Interestingly, unlike the other genes, clonal events in the intermediate gene OBSCN were associated with better prognosis, whereas subclonal events were associated with poorer prognosis.
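For reference, a minimal standard-library sketch of the Kaplan-Meier estimator and the two-group log-rank test (chi-square with 1 degree of freedom) is shown below. It is a generic implementation, not the software used for Fig. 4, and the survival times are hypothetical.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 for death, 0 for censored.
    Returns [(time, survival probability)] at each distinct death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    death_times = sorted({t for t, e in zip(times_a, events_a) if e}
                         | {t for t, e in zip(times_b, events_b) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical OS data (months) for patients with clonal vs subclonal events in one gene
clonal_t, clonal_e = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
subclonal_t, subclonal_e = [10, 11, 12, 13, 14], [1, 1, 1, 1, 1]
chi2, p = logrank_test(clonal_t, clonal_e, subclonal_t, subclonal_e)
print(round(chi2, 2), p < 0.1)
```

Under the screening rule described above, a gene would be retained when its log-rank p-value from such a comparison falls below 0.1.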
Relationship between clonal or subclonal events and clinical characteristics
Clonal and subclonal events of SCNAs and SSNVs were obtained as described above, and their relationship with clinical characteristics was analyzed using the clinical information provided by TCGA. Differences in clonal/subclonal events were examined across TNM category, stage, age, gender and tissue type. The number of subclonal events differed significantly across T stages (Fig. 5A), and the number of clonal events differed significantly by N stage, gender and tissue type (Fig. 5B-D). Males have a higher risk of gastric adenocarcinoma than females, which is consistent with our observation that males carried significantly more clonal events than females. Papillary and tubular tissue types carried significantly more clonal events than the other types. No significant differences in clonal/subclonal events were observed by M stage, overall stage, age or grade (S9_Fig).
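The test behind Fig. 5 is not named here. For a clinical variable with several levels, such as T stage, a natural nonparametric choice is the Kruskal-Wallis H test, assumed in the sketch below; with two groups, such as gender, it is equivalent to a two-sided rank-sum test with the normal approximation. The chi-square tail probability is computed from the standard series and continued-fraction expansions of the incomplete gamma function, and the per-stage event counts are hypothetical.

```python
from collections import Counter
from math import exp, lgamma, log


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with `df` degrees of freedom,
    computed as the regularized upper incomplete gamma Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    log_prefactor = -z + a * log(z) - lgamma(a)
    if z < a + 1:
        # Series for the lower function P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = 1
        while abs(term) > abs(total) * 1e-15:
            term *= z / (a + n)
            total += term
            n += 1
        return max(0.0, 1.0 - exp(log_prefactor) * total)
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = z + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return exp(log_prefactor) * h


def kruskal_wallis(*groups):
    """Kruskal-Wallis H test with tie correction; returns (H, p-value)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied; the test is undefined")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical numbers of clonal events per patient, grouped by T stage
t_stage_counts = {"T1": [3, 5, 4], "T2": [6, 7, 5], "T3": [9, 8, 10], "T4": [12, 11, 9]}
h, p = kruskal_wallis(*t_stage_counts.values())
print(round(h, 2), round(p, 4))
```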
Relationship between clonal or subclonal events and TMB/neoantigens
Tumor mutation burden (TMB) and neoantigens are important biomarkers for immune checkpoint therapy, and the emergence of clonal/subclonal events also has an important effect on tumor initiation and progression. We therefore analyzed the relationship between clonal/subclonal events and both TMB and neoantigens. Because TMB, neoantigen counts and clonal/subclonal event counts were not normally distributed (Shapiro-Wilk test p < 1e-5), Spearman correlation was used to evaluate their associations. Clonal events were highly significantly correlated with TMB and neoantigens (Fig. 6A-C), whereas the correlations of subclonal events with TMB and neoantigens were weak (Fig. 6D), suggesting that clonal events contribute substantially to tumor mutation burden and neoantigen production. Because mutations in mismatch repair (MMR) genes strongly affect genomic mutation burden, we further compared clonal/subclonal events between MMR-mutated (Mut) and non-mutated (WT) samples. Clonal/subclonal events were more numerous in the Mut group than in the WT group (Fig. 6E), but the difference was not significant, possibly because of the small size of the Mut group. TMB and neoantigens were significantly higher in the Mut group than in the WT group (Fig. 6F), but OS did not differ significantly between the two groups (Fig. 6G), indicating that although MMR defects have an important effect on genomic stability, their relationship with prognosis is complex.
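Spearman's rho is the Pearson correlation computed on ranks, with tied values given their average rank. The standard-library sketch below computes it for hypothetical per-sample clonal-event counts and TMB values; the associated p-value and the Shapiro-Wilk normality check are left to a statistics library.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values
clonal_counts = [12, 30, 7, 45, 22, 15]
tmb = [1.8, 4.1, 1.2, 6.0, 2.4, 2.5]
print(round(spearman(clonal_counts, tmb), 3))  # 0.943
```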