After multiple steps of screening, we obtained a total of 2318 candidate genes, which were accounted potentially driving the development and progression of GCs (Fig. 1, Fig. 2A). In the process of integrating multi-omics data, we required that the candidate genes should have concordant changes between copy number alteration (CNA) and mRNA expression. For example, the candidate gene DERL1 was recognized to be copy number amplification, which have significantly higher mRNA expression level in patients with CNA compared to wild-type patients (p <= 0.001, two-sided Wilcoxon’s rank-sum test; Fig. 2BD). Conversely, for candidate gene NAA15, patients with copy number deletion have significantly lower mRNA expression levels compared to wild-type patients (p <= 0.001, two-sided Wilcoxon’s rank-sum test; Fig. 2CE). As a result, the gene size decreases rapidly with the sample size increases when implemented the step that CNA concordant affects mRNA expression itself (Fig. 2AF). After integrating CNA and mutation profile, we found that the candidate genes have high variation frequency, which ranged from 16.9% to 69.8% (the average is 39.1%), and the variant samples size of most genes were between 11 and 20 (Fig. 2G). Also, we found that the number of genes with more than 30 variant samples reached 38 (Fig. 2G). This suggests that adding somatic mutation information can help us to conduct a more comprehensive genomic variation research.
Determining prognostic-related key driver genes (KD genes)
Studies have shown that cancer driver mutations can provide tumor cell with growth advantage, thus contribute to tumor initiation, progression or metastasis [27,28]. The key driver genes should have an impact on the patient's survival time. Therefore, it is urgent to identify key driver genes from numerous tumor genome events. First, we obtained two expression profile datasets of GCs from the GEO database (GSE13911 and GSE29272), and performed gene differential expression analysis (see Materials and methods in detail). Hierarchical clustering analysis showed that candidate genes in GC patients were significant expression disorders compared to normal patients (Fig. S1AB). By overlapping the differentially expressed genes with the above-identified candidate genes, we obtained the candidate driver genes with different expression levels of tumor tissues and normal tissues (388 in total). For each candidate driver gene, the cancer samples were split into two groups according to its copy number status (one group with CNA and the other without). Then, we used the clinical data of GCs for survival analysis based on CNA status (see Materials and methods in detail). Finally, we have obtained a total of 31 prognostic-related key driver genes (KD genes) (Supplementary Table 1). For example, the patients with KD gene ABCE1 CNA (deletion) had significantly better disease-free survival (DFS) (p < 0.0096, Log-rank test; Fig. 3A). In contrast, patients with KD gene SHFM1 CNA (amplification) had significantly worse overall survival (OS) (p < 0.033, Log-rank test; Fig. 3B). Further, we mapped the genomic variation landscape of the 31 prognosis-related KD genes in GC patients (Fig. 3C). We found that these KD genes have high genomic variation frequencies (including CNA and somatic mutation). For example, KIAA0196 (also known as WASHC5) has the highest genomic variation frequency (65.5%), which exhibited copy number amplification in 189 patients (where 23 patients were high-level amplification), and somatic mutations occurred in 12 patients (Fig. 3C). In addition, the variation frequency of KD gene ABCE1 (ATP-binding cassette E1) was 42.2%, which included copy number deletion in 103 patients (all were copy number loss) and somatic mutations occurred in 5 patients (Fig. 3C). ABCE1 is a member of the ATP-binding cassette transporters and regulates a broad range of biological functions including viral infection, cell proliferation, and anti-apoptosis. Previous research have shown ABCE1 plays an essential role in lung cancer progression and metastasis [29]. In our study, however, ABCE1 as a key driver gene associated with prognosis in GCs.
Construction of transcriptional dysregulatory networks of KD genes
Recent studies have confirmed that regulatory factors can affect the transcriptional activity of protein-coding genes, and thus contribute to disease [30,31]. Regulatory factors miRNA functions in RNA silencing and post-transcriptional regulation of gene expression, which via base-pairing with complementary sequences within mRNA molecules [32]. Transcriptional factors that are activators can help turn specific genes "on" or "off" by binding to nearby DNA [31]. For example, the oncogenic TF TAL1 can produce a modified autoregulatory circuitry that drives the oncogenic program in T-cell acute lymphoblastic leukemia [31]. Thus, KD genes should have closely functional association on biological network with regulatory factors. We obtained experimental and/or predicted miRNA-target gene pairs and TF-target gene pairs from known databases (Table 2; see Materials and methods in detail). After calculation, we determined the regulatory factors (TFs and miRNAs) with significant regulatory coefficients for each KD gene (Fig. S2AB). Simultaneously, using the regulatory relationship between regulatory factors and KD genes, we constructed a transcriptional dysregulatory network (Fig. 3D; see Materials and methods in detail). Consistent with previous studies, we found that regulatory factors, particularly miRNAs, play a crucial role in cell growth and development [33,34]. For example, KD gene SPOCK1 was regulated by 12 miRNAs, including miR-7-5p, miR-155-5p, miR-326, and miR-107 (Fig. 3DE). Research shows that SPOCK1 can promote the invasion and metastasis of gastric cancer through Slug-induced epithelial-mesenchymal transition [35]. While miR-155-5p can form a regulatory feedback loop with STAT1 and might trigger cancer immunoediting in order to allow tumor cells to escape from immunosurveillance and even to promote tumorigenesis [36]. Also, KD gene NR3C2 was regulated by 9 miRNAs including miR-7d-5p, miR-155-5p, miR-421 and miR-32-5p (Fig. 3DE). Studies have shown that NR3C2 play a role in transcription regulator and molecular transducer activity [37], and can be inhibited by migration inhibitory factor (MIF) induced signaling pathway, which was a key driver of disease aggressiveness in patients [38]. From the dysregulatory network, we found that miR-7d-5p, miR-155-5p and miR-135b-5p were synergistic regulated to KD genes SPOCK1 and NR3C2 (Fig. 3DE). In addition, several miRNAs could simultaneously regulate multiple KD genes, for instance, miR-140-5p regulated to KD genes SPOCK1 and ABCE1, and miR-26b-5p regulated to SPCS3 and SMARCA5 (Fig. 3D). Studies have shown that miR-140-5p has been characterized as a tumor suppressor in gastrointestinal cancer [39,40]. Especially, gastric cancer, miR-140-5p could suppress the proliferation, migration and invasion of tumor cells by regulating YES1 [41]. This indicates that these miRNAs play a "hubs" regulatory role in the process of biological metabolism and have a biological significance.
Table 2. Information of TF and miRNA regulating mRNA used in this study.
|
Interaction
|
Regulator
|
Target
|
Resources
|
TF-gene
|
626,331
|
752
|
19,257
|
Transfac[16], UCSC[17], Chipbase[18]
|
miRNA- gene
|
680,008
|
2,562
|
16,339
|
miRTarbase[19],starbase[20]
|
In the dysregulatory network, a total of 7 TFs are involved (Fig. 3DE, Fig. S2B), including TF MYC positive regulated to KD gene NAA15 (Padj = 9.95e-08, R = 0.32), CUX1 negative regulated to MRPL13 (Padj = 0.001, R = -0.20), and EGR1 negatively regulated to SMARCA5 (Padj = 0.022, R = -0.14). Similarly, we found that multiple TFs were synergistic regulated to KD gene. For example, TFs HNF4A, SREBF1 and POU2F1 simultaneously regulated the transcription of KD gene ETFDH, in which up-regulated by HNF4A (Padj = 3.5e-04, R = 0.22) and SREBF1 (Padj = 0.02, R = 0.14), while down-regulated by POU2F1 (Padj = 7.05e-06, R = -0.27) (Fig. 3DE, Fig. S2B). Previous researches have shown SREBF1 (also known as SREBP-1) is a key regulator of fatty acid metabolism and plays a pivotal role in the transcriptional regulation of different lipogenic genes that mediate lipid synthesis, which acts as a cancer promoter in human diseases [42,43]. The TFs as the dysregulatory factor of KD genes, we also found the "hubs" in the dysregulatory network. For instance, TF HNF4A regulated KD genes ETFDH and ACADVL (Padj = 0.33, R= 0.33), and ARNT regulated KD genes NR3C2 (Padj = 3.34e-04, R = 0.22) and DERL1 (Padj = 3.51e-05, R = 0.25) simultaneously (Fig. 3D). Wang H et al. generated a miRNA-TF regulatory network, and discovered 5 regulators might have critical roles in colorectal cancer pathogenesis, which was helpful to understand the complex regulatory mechanisms and guide clinical treatment [44]. Interestingly, in our study, KD genes were regulated by multiple types of dysregulatory factors. For example, KD gene ACADVL was regulated by TF NHF4A and miRNAs miR-100-5p and miR-99a-5p, while, KD gene RAA15 was regulated by MYC, miR-497-5p and miR-145-5p (Fig. 3DE). Hao S et al. revealed that 5 miRNAs (including miR-145, miR-497, miR-30a, miR-31, and miR-20a) were considered to regulate tumor cells proliferation through TFs [45]. These findings suggested that transcriptional regulators play a crucial role in the dysregulation of KD genes, and studies of these dysregulatory factors may facilitate biomarker discovery.
Functional mechanism portrays of KD genes and dysregulatory factors
In order to characterize the molecular mechanisms of KD genes, we first use the g:profiler online tool for functional enrichment analysis (see Materials and methods in detail). As a result, our KD genes were enriched in many types of biological functions, including Gene Ontology (MF, BP, CC), pathway (REAC and WP) (Fig. 4A). In details, KD genes were enriched in apoptosis-associated functions, like "Apoptosis", "Programmed cell death", "Apoptosis-related network due to altered Notch3 in cancer", and "Apoptosis induced DNA fragmentation", etc.. (Fig. 4B). In addition, several KD genes were enriched in immune-associated function, such as "Antigen processing: Ubiquitination & Proteasome degradation" (Fig. 4B). To further characterize our results, we sought to characterize cancer hallmark landscape of KD genes. In brief, we calculated the semantic similarity score between KD genes-related GO terms and known cancer hallmark-related GO terms [46]. We found that the semantic similarities between our KD genes and the of apoptotic-related cancer hallmark, "Evading Apoptosis", were 0.35. Also, the semantic similarities with two immune-related cancer hallmark ("Evading Immune Detection" and "Tumor Promoting Inflammation") were 0.3 and 0.48, respectively (Fig. 4C). For cancer hallmark "Genome Instability and Mutation", the semantic similarity with KD genes is the highest, reaching 0.65 (Fig. 4C), which indicates that genomic variation of KD genes have important functional mechanisms, including genome instability occurred, and thus play a carcinogenic role in biology [47]. Interestingly, our KD genes were enriched with the functions related to the synthesis and secretion of gastric hormones, such as "Synthesis, secretion, and deacylation of Ghrelin" (Fig. 4C). Ghrelin is an endogenous peptide hormone mainly produced in the stomach. Previous research have shown that ghrelin can be a promising therapeutic option for cancer cachexia [48]. In addition, KD genes are also enriched with functions related to cell growth, development and metabolism, such as "Biosynthetic process", "Negative regulation of biological process" and "Cellular macromolecule metabolic process".
Our above results have stated that the regulators of KD genes plays an important regulatory role in the carcinogenesis. Using g:profiler, we also observed that KD genes were enriched in the regulatory relationships with TFs and miRNAs (Fig. 4A). In order to study the functional mechanism of regulators, we used the enrichment analysis of KD genes to further characterize. By comparing the regulators we identified with the TFs and miRNAs enriched by KD genes, we found that 4 TFs and 5 miRNAs were confirmed by enrichment (Fig. 4D). The confirmed 4 TFs include EGR1 (enrichment significance P= 1.42e-04), HNF4A (P= 4.42e-03), POU2F1 (P= 7.67e-03) and MYC (P= 4.70e-04) (Fig. 4D; Supplementary table S2). Where EGR1 regulation of 10 KD genes such as SMARCA5, POLR3C, and MRPL13, etc. (Fig. 4D). TF HNF4A simultaneously regulated two KD genes ACADVL and ETFDH. In addition, TFs HNF4A and POU2F1 combination regulated KD gene HMGB2 (Fig. 3D; Supplementary table S2). It is worth noting that TF MYC and miR-139-5p combination regulated KD gene NAA15 (Fig. 3D; Supplementary table S2). While miR-139-5p (P= 0.028) acts as a confirmed miRNA by enrichment, it also regulated two KD genes HMGB2 and DERL1 (Fig. 3D; Supplementary table S2). In addition, miR-30a-5p (P= 0.021) regulated KD genes SPCS3, PPID and DERL1 (Fig. 3D; Supplementary table S2). The miRNAs confirmed by KD gene enrichment also includes miR-26b-5p (P= 0.013), miR-32-5p (P= 0.037) and miR-186-5p (P= 0.044) (Fig. 4D). In summary, functional enrichment analysis indicated that the KD genes were involved in vital biological functions, and further demonstrated that the regulators can be used as potential biomarkers for further experimental studies.
Drug response effects in preclinical cell models of GC
To explore the potential effects of KD genes on drug response, we evaluated whether their expression level could influence drug response across 38 preclinical cell models of GC from Cancer Cell Line Encyclopedia (CCLE). We found that multiple KD genes presented strong correlations with the drugs response in GC cells (Fig. 5A; see Materials and methods in detail). For example, irinotecan, as a broad spectrum anticancer drug, showed a significant positive correlation with the expression levels of 5 KD genes in GC cells, including KIAA0196 (R= 0.81, P= 0.02), POLR3C (R= 0.86, P= 0.01), RNF139 (R= 0.74, P= 0.04), DERL1 (R= 0.88, P= 0.007), TRMT12 (R= 0.79, P= 0.02) (Fig. 5A-C). This result indicated that these KD genes expression levels could enhance the resistance of irinotecan in GC cells. Studies have shown that patients with advanced gastric cancer are often treated with irinotecan monotherapy as salvage-line therapy [49].
Sorafenib is a multi-kinase inhibitor with activity against angiogenesis and RAF-MEK-ERK pathway, it could inhibit proliferation of human gastric cancer cell line, and may reverse resistance to cisplatin through down-regulating MDR1 expression [50]. In our study, however, the drug response of sorafenib in GC cells showed a significant negative correlation with the expression level of KD gene NAA15 (R= -0.6, P= 0.0092) (Fig. 5A-B). Moreover, paclitaxel, as a widely used anticancer drug, exhibited multiple response patterns at the expression level of KD genes. For instance, paclitaxel showed strong resistance in GC cells with upregulated SPOCK1 expression (R= 0.59, P= 0.01), while showing strong sensitivity in GC cells with upregulated HMGB2 (R= -0.67, P= 0.002) (Fig. 5AB and Fig. 5D). Studies have shown that nab-paclitaxel as second-line treatment in locally advanced inoperable or metastatic gastric and gastroesophageal junction carcinoma is an active chemotherapy regimen [51,52].
Interestingly, the KD genes we identified have response effects in multiple anticancer drugs (Fig. 5AE). For example, in addition to sorafenib, RAF265 (R= -0.67, P= 0.0023), Nultlin-3 (R= -0.65, P= 0.0037) and L-685458 (R= -0.64, P= 0.0042) all showed strong drug sensitivity in GC cells with upregulated NAA15 expression (Fig. 5AE). However, the small molecule compound ZD-6474 showed significant drug resistance at the expression level of NAA15 (R= 0.48, P = 0.045). In addition to irinotecan, multiple drugs showed strong resistance at the expression level of KD gene DERL1. For instance, drug responses, including AZD6244 (R= 0.55, P= 0.019), PD-0325901 (R= 0.54, P= 0.020) and panobinostat (R= 0.49, P= 0.041), were presented significant positive correlation with DERL1 (Fig. 5AE). Study shown that pan-histone deacetylase inhibitor panobinostat sensitizes gastric cancer cells to anthracyclines via induction of CITED2 [53]. Moreover, the drug response of multiple small molecules exhibited strong drug resistance on KD gene SPOCK1 (response effects from 0.47 to 0.59), while several small molecules showed strong drug sensitivity on KD gene PFAS (effect from 0.48 to 0.51) (Fig. 5AE). Besides, we further evaluated whether KD genes could influence drug response across 23 gastric cancer cell lines from cancer genome project (CGP). Indeed, the drug response patterns of several KD genes (such as SPCK1, PFAS) in CGP were similar to those in CCLE (Fig. S3). Noteworthy, drug response patterns of multiple KD genes were complementary in two cell line models, such as NAA15, RNF139, and ETFD (Fig.5E, Fig. S3). This result suggested that it is necessary to use a combination of two cell line models to explore drug response [54]. In summary, we used cellular models to study the drug response mechanisms of KD genes on the transcriptional level. The effect of small molecule compounds on KD genes can guide researchers for new drug research and development, and has potential application value.
Clinical application of KD gene signatures in gastric cancer patients
Among the above results, we have identified 31 prognostic-related KD genes (Supplementary Table 1). In order to explore the global clinical application value of all KD genes, we constructed sample grouping signatures based on the transcriptional level of KD genes and verified it in multiple sets of data. In briefly, we grouped patients based on the expression of KD genes using hierarchical clustering method (see Materials and methods). We found that when 265 GC patients were clustered into 4 groups, the model was best classified (Fig. 6A, see Materials and methods). The number of patients in these four groups were 146, 63, 16 and 40 in group 1-4, respectively (Fig. 6B). By calculating the Pearson dissimilarity between the samples, we find that the distance inside the group was relatively close, and the distance between the groups was far, which was in line with our research (Fig. 6B).
To reveal the clinical benefits of KD gene signature in GCs, the log rank test were be used to explore the survival outcomes between patient groups. The KM curve showed that the patients within group 2 have the best disease-free survival (DFS) time (Fig. 6C). By observing the patient's genomic variation events, we found that the patients in group 2 mainly carry variations in KIAA0196 (also known as WASHC5), EIF3H, MRPL13, MTSS1, DERL1, and RNF139 (Fig. S4A). Compared with group 2 patients, the patients within group 1 presented significantly worse DFS (P= 0.02, log rank test; Fig. 6C), and these patients mainly carried genomic events in TRMT12, TRPA1, TCEB1, MRPS28, PON2 and SHFM1 (Fig. S4A). Also, both group 3 and group 4 have shown significantly worse DFS (Fig. 6C). However, the four groups of KD gene signature did not show significant prognostic efficacy on the patient’s overall-survival (OS) time (Fig. S6A). In addition, we found that group 3, with worst survival, showed a higher proportion of LAUREN intestinal class (proportion 87.5%) and WHO tubular class (68.8%) (Fig. 6DE). The other three groups all showed a lower proportion of LAUREN intestinal classes (58.9%, 77.8%, 60.0% for group 1, group 2, group 4, respectively) and WHO tubular class (41.1%, 55.6%, 40.0% for group 1, group 2, group 4, respectively). In contrast, group 3 shows a lower percentage of LAUREN diffuse class and WHO poorly cohesive class (both were 6.25%) (Fig. 6DE). We also counted other clinical features of GCs between these groups, and found that the patients in group 3 showed older age, fewer tumor cells, and more tumor lymphatic infiltration, however, these features did not show statistical significance (Fig. S5).
To further elucidate the clinical value of our KD genes in GCs, we used extra data sets for validation. Similarly, through the expression level of KD genes, we subtyped 443 patients into 4 groups based on hierarchical clustering (Fig. S6BC; see Materials and methods). The KM curve indicated that these four groups of KD gene signature have significant prognostic efficacy in patient’s overall-survival (OS) time (P= 0.016, log rank test; Fig. 6F). Compared with group 1, the patients within both other three groups showed significantly worse OS time (P= 0.011, 0.003 and 0.042 for group 2, group 3 and group 4, respectively). Using multivariate COX proportional hazard model to correct some clinical features, including age, tumor stage, and number of lymph nodes, we found that the patients within both other three groups were risk factors compared to group 1 (group 2 (HR= 1.22, 95% IC [0.77-1.92], P= 0.039), group 3 (HR= 1.75, 95% IC [1.07-2.88], P= 0.026), and group 4 (HR= 1.27, 95% IC [0.65-2.50], P= 0.042) compared to the patients within group 1, respectively; Fig. 6G). This shows that the signatures of our KD gene were independent prognostic factors. The COX model also showed, compared to tumor stage I, both stage III (HR = 2.83, 95% IC [1.52-5.27], p = 0.001) and stage IV (HR = 6.23, 95% IC [2.99- 12.97], P= 1e-04) were risk factors (Fig. 6G). In addition, the patient’s age were showed correlation with the patient’s OS time (HR= 1.03, 95% IC [1.01-1.05], P= 8e-04). Similarly, our results suggest that the patients within group 3, as a poorer OS time group, have higher proportion of patients who enriched in more malignant tubular intestinal type subclass (Fig. S6D), and more advanced TNM stage T3 and N2 (Fig. S6E). In summary, we used KD genes to construct the group signatures for gastric cancer patients and validated their association with patient survival in both inherent data sets and extra data sets, which suggested that the KD genes we identified can be used as prognostic biomarkers for further study by basic experimental and clinical researchers.