Identication of novel prognosis biomarkers through integrating multi-omics data in gastric cancer

Background: Gastric cancer is a fatal gastrointestinal cancer with high morbidity and poor prognosis. The dismal 5-year survival rate urges reliable biomarkers to assess and improve prognosis of gastric cancer. Distinguishing driver mutations that were required for the cancer phenotype from passenger ones poses a formidable challenge for cancer genomics. Methods: We integrated the multi-omics data of 293 primary gastric cancer patients from The Cancer Genome Atlas (TCGA) projects to identify key driver genes by establishing the prognosis model of the patients. The combination of copy number alteration and somatic mutation analysis helped us to comprehensively reveal molecular markers of genomic variation. Integrating the transcription level of genes provides a unique perspective for us to discover dysregulated factors in transcriptional regulation. Results: The results, we identied 31 molecular markers of genomic variation comprehensively. For instance, the copy number alteration of WASHC5 (also known as KIAA0196) frequently recurrence in gastric cancer patients, which cannot be discovered using traditional methods based on signicant mutations. Further, we revealed several dysregulatory factors played a "hubs" regulatory role in the process of biological metabolism based on dysregulatory networks. Cancer hallmark and functional enrichment analysis showed that these key driver genes (KD genes) played a vital role in regulating programmed cell death. The drug response pattern and transcriptional signatures of KD genes reected its clinical application value. Conclusions: These ndings indicated that the KD genes could serve as novel prognostic biomarkers for further research on the pathogenesis of gastric cancers, while elucidating a multidimensional and comprehensive genomic landscape and highlighting the molecular complexity. This study integrates multi-omics datato discover novel driver molecules and their dysregulatory mechanisms based on copy number alteration, somatic mutation analysis and transcription level. We revealed clinical application value of KD genes through drug response pattern and transcriptional signatures. These results will pave the way towards understanding the potential mechanisms that govern the GC progression, which will be useful in clinical practice and might prompt the development of novel therapeutic target for GC patients.


Background
Gastric cancer (GC) is the fourth most common and remains the second cause of death of all malignancies worldwide [1,2]. Accumulating advances in the early diagnosis and treatment of GCs contribute to timely curative resection or chemotherapy for patients [3,4]. However, Metastasis and recurrence of GCs gradually formed due to tumor evolution, resulting in a poor prognosis with dismal 5year survival rate, which is only about 29.3% [3,4]. Genetic factors play an important role in GCs due to genomic variation and aberrant gene expression, leading to a malignant phenotype [5,6].
Several major cancer sequencing projects, such as The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET), have created a comprehensive catalog of genomic variations across all major cancer types, including GC [7,8]. Cancer genomics has produced extensive information on cancerassociated genes. In recent decades, basing on rapid development of high throughput technology, numerous biomarkers have been identi ed and applied to targeted treatment [9,10], such as HER2 (human epidermal growth factor receptor type 2) in breast cancer and EGFR (epidermal growth factor receptor) in lung cancer [11]. Driver mutations are required for the cancer phenotype, whereas passenger mutations are irrelevant to tumor development and accumulate through DNA replication [7,12]. Although many bioinformatics tools dedicated to driver genes identi cation have been developed [13,14], the number and speci city of cancer-driver genes remains a matter of debate, thus, distinguishing driver genes from passenger ones poses a formidable challenge for cancer genomics [15].
Owing to the clinically and genetically heterogeneity of cancer, the current screened driver genes of GCs are far from preventing and treating this fatal disease. Therefore, it is of critical to identify more reasonable biomarkers for assessing the response to therapy and predicting prognosis in GC patients. In this study, we integrated multi-omics data of GCs from TCGA cohort to identify prognostic-related key driver genes (KD genes), which drive the development and progression of GCs. We reveal the biological functions of KD genes, like programmed cell death, and the clinical characteristics, including the drug response patterns and the prognostic e cacy of expression signatures in GC patients.

Data source
We obtained multi-omics data of 293 TCGA primary gastric cancer patients from cBioPort data sources (https://www.cbioportal.org/), including genome-wide human SNP array 6.0 copy number alteration (level 3), IlluminaGA DNASeq mutation (level 2), IlluminaHiSeq RNASeqV2 mRNA expression (level 3) and miRNA expression (level 3), as well as clinical data of 265 patients. In addition, we acquired two data sets of gene expression pro le matched by the disease and normal samples, as well as corresponding clinical data, from Gene Expression Omnibus (GEO) database (accession code were GSE13911 and GSE29272, the sample size is 69 and 268, respectively). The relationships of transcriptional factor (TF) targeting mRNA was from Transfac [16], UCSC [17] and Chipbase [18], while that of miRNA regulating mRNA was from miTarbase [19] and Starbase [20]. Based on previous study [21], we found there were no major batch effects for the expression data or copy number data. In addition, we downloaded an extra data for gastric cancer patients from the Firhose database (https://gdac.broadinstitute.org/).

Building binary genomic variation pro le of coding genes
For mRNA expression data from RNAseqV2, we performed log 2 (RESM+1). For DNA copy number alteration (CNA), we retained CNA gain and loss, as well as high-level ampli cation and homozygous deletion discretized by GISTICv2 [22]. For mutation, silent (synonymous) substitutions were discarded.
We built a binary somatic CNA pro le of protein coding genes, where copy number altered was 1 and wild-type was 0. By integrating the CNA spectrum and the genomic mutations, a binary spectrum of the genomic variation of the coding genes are formed (1, 0 represents the variation and the wild type, respectively). Four requirements were for candidate key genes selection. (i) gene should have a dominant CNA type (ampli cation or deletion, binomial test, p≤0.05); (ii) RSEM of gene was above 0 in more than 75% of cancer samples; (iii) gene should have concordant changes between CNA and expression, that is, copy number ampli cation upregulates its own gene expression level, and copy number deletion downregulates expression level; (iv) in order to improve the accuracy of the calculation, we require the frequency of genomic variation to exceed 0.1.

Identi cation of prognosis-related Key Driver genes (KD genes)
We have downloaded two datasets of matched disease and normal sample gene expression pro les from GEO (GSE13911 and GSE29272), and performed the differential expression analysis using R package DESeq2 at FDR≤0.05 and pvalue ≤0.01. Then, we obtained the intersection of the differentially expression genes and the candidate key genes with genomic variation, and then used the clinical data in cBioPort to train clinical model. Considering the genomic variation and patient's survival information, including overall survival (OS) and disease-free survival (DFS), we screened for the prognostic genes whose genomic variation has an impact on patients survival (in both types of survival, OS and DFS, the minimum pvalue is required to be less than 0.05) used log rank test. Combining the above candidate key genes, we determined prognostic related key driver genes (KD genes).

Construction of transcriptional disregulation network
The experimental con rmation data of miRNA-target gene regulatory relationships from known miRNAtarget interaction databases (miTarbase and Starbase), and the experimental con rmation and prediction data of the regulatory relationship between TF and target genes from known TF-target interaction databases (Transfac, UCSC and Chipbase). Based on known knowledge of expression regulation [23], only regulatory relationships that satis ed the following requirements were remained: 1) miRNA's expression has negative (Pearson's correlation coe cient<0, FDR<=0.05) regulatory effect on its target gene's expression; 2) The relationship between TF and target gene must be signi cant (FDR<=0.05).
Benjamini & Hochberg correction were be used for among all relationships between regulators (miRNAs and TFs) and KD genes.

Functional enrichment analysis of KD genes
A web server g:Pro ler can be used to genes functional enrichment analysis, including Gene Ontology terms (like molecular function (MF), biological process (BP) and cellular component (CC)) and pathways from KEGG, Reactome and WikiPathways. This web server can also be used to gene set enrichment analysis, including miRNA targets from miRTarBase and regulatory motif matches from TRANSFAC. We inputted KD genes list on g:Pro ler web server, and used FDR < 0.05 to screen the functions and target factors (miRNAs and TFs) , which were thought to be enriched by KD genes.

Drug sensitivity analysis
Gastric cancer drug sensitivity data were generated from ongoing high-throughput screening performed by the Cancer Cell Line Encyclopedia (CCLE) from Broad Institute [24] and the Cancer Genome Project (CGP) at the Wellcome Trust Sanger Institute [25]. The values we obtained on the websites include the half maximal inhibitory concentration (IC50), genes expression level across cancer cell lines for each experiment. Totally, we acquired 24 anticancer compounds screened across 38 gastric cancer cell lines and 251 compounds across 25 gastric cancer cell lines from CCLE and CGP, respectively. The effect measures are the spearman's rank correlation coe cient between the drug IC50s and gene expressions (for example, an effect of -0.5 or 0.5 indicates decrease or increase in drug concentration, respectively).

KD genes construction grouping signatures
Using the Pearson correlation coe cient of all KD genes expression level, we measured the similarity of the patients with gastric cancer. Based on the matrix of the Pearson similarity, the patients were classi ed by using hierarchical clustering. To determine the optimal number of categories for the gastric cancer patients, we calculated the Elbow method for the number of categories from 1 to 10. The Elbow method was calculated the total within-cluster sum of square (WSS) using R function fviz_nbclust from packages "factoextra" and "NbClust". The WSS value was consistent with the classi cation performance of the model.

Results
Identi cation of candidate genes driving the development of gastric cancer based on multi-omics data Cancer genomics has produced extensive information on cancer-associated genes [26]. Driver mutations of cancer confers tumor cell growth advantages during carcinogenesis and disease progression, however, distinguishing driver mutations from passenger ones poses a formidable challenge for cancer genomics [7,15]. In order to identify the driver events in GCs and explore its effects in the progression of the tumor, we analyzed multi-omics data of 293 patients from the cBioPort data resource (see Materials and methods), including genomic data (copy number alteration and somatic mutation) and transcriptome data (the expressions of mRNAs, miRNAs and TFs) ( Table 1). We constructed a genomic variation binary spectrum of protein-coding genes in GCs by pre-processing multi-omics data and integrating gene CNA, somatic mutation and gene expression ( Fig. 1; see Materials and methods). After multiple steps of screening, we obtained a total of 2318 candidate genes, which were accounted potentially driving the development and progression of GCs ( Fig. 1, Fig. 2A). In the process of integrating multi-omics data, we required that the candidate genes should have concordant changes between copy number alteration (CNA) and mRNA expression. For example, the candidate gene DERL1 was recognized to be copy number ampli cation, which have signi cantly higher mRNA expression level in patients with CNA compared to wild-type patients (p <= 0.001, two-sided Wilcoxon's rank-sum test; Fig. 2BD). Conversely, for candidate gene NAA15, patients with copy number deletion have signi cantly lower mRNA expression levels compared to wild-type patients (p <= 0.001, two-sided Wilcoxon's rank-sum test; Fig.  2CE). As a result, the gene size decreases rapidly with the sample size increases when implemented the step that CNA concordant affects mRNA expression itself (Fig. 2AF). After integrating CNA and mutation pro le, we found that the candidate genes have high variation frequency, which ranged from 16.9% to 69.8% (the average is 39.1%), and the variant samples size of most genes were between 11 and 20 ( Fig.   2G). Also, we found that the number of genes with more than 30 variant samples reached 38 (Fig. 2G). This suggests that adding somatic mutation information can help us to conduct a more comprehensive genomic variation research.
Determining prognostic-related key driver genes (KD genes) Studies have shown that cancer driver mutations can provide tumor cell with growth advantage, thus contribute to tumor initiation, progression or metastasis [27,28]. The key driver genes should have an impact on the patient's survival time. Therefore, it is urgent to identify key driver genes from numerous tumor genome events. First, we obtained two expression pro le datasets of GCs from the GEO database (GSE13911 and GSE29272), and performed gene differential expression analysis (see Materials and methods in detail). Hierarchical clustering analysis showed that candidate genes in GC patients were signi cant expression disorders compared to normal patients (Fig. S1AB). By overlapping the differentially expressed genes with the above-identi ed candidate genes, we obtained the candidate driver genes with different expression levels of tumor tissues and normal tissues (388 in total). For each candidate driver gene, the cancer samples were split into two groups according to its copy number status (one group with CNA and the other without). Then, we used the clinical data of GCs for survival analysis based on CNA status (see Materials and methods in detail). Finally, we have obtained a total of 31 prognostic-related key driver genes (KD genes) (Supplementary Table 1). For example, the patients with KD gene ABCE1 CNA (deletion) had signi cantly better disease-free survival (DFS) (p < 0.0096, Log-rank test; Fig. 3A). In contrast, patients with KD gene SHFM1 CNA (ampli cation) had signi cantly worse overall survival (OS) (p < 0.033, Log-rank test; Fig. 3B). Further, we mapped the genomic variation landscape of the 31 prognosis-related KD genes in GC patients (Fig. 3C). We found that these KD genes have high genomic variation frequencies (including CNA and somatic mutation). For example, KIAA0196 (also known as WASHC5) has the highest genomic variation frequency (65.5%), which exhibited copy number ampli cation in 189 patients (where 23 patients were high-level ampli cation), and somatic mutations occurred in 12 patients (Fig. 3C). In addition, the variation frequency of KD gene ABCE1 (ATPbinding cassette E1) was 42.2%, which included copy number deletion in 103 patients (all were copy number loss) and somatic mutations occurred in 5 patients (Fig. 3C). ABCE1 is a member of the ATPbinding cassette transporters and regulates a broad range of biological functions including viral infection, cell proliferation, and anti-apoptosis. Previous research have shown ABCE1 plays an essential role in lung cancer progression and metastasis [29]. In our study, however, ABCE1 as a key driver gene associated with prognosis in GCs.

Construction of transcriptional dysregulatory networks of KD genes
Recent studies have con rmed that regulatory factors can affect the transcriptional activity of proteincoding genes, and thus contribute to disease [30,31]. Regulatory factors miRNA functions in RNA silencing and post-transcriptional regulation of gene expression, which via base-pairing with complementary sequences within mRNA molecules [32]. Transcriptional factors that are activators can help turn speci c genes "on" or "off" by binding to nearby DNA [31]. For example, the oncogenic TF TAL1 can produce a modi ed autoregulatory circuitry that drives the oncogenic program in T-cell acute lymphoblastic leukemia [31]. Thus, KD genes should have closely functional association on biological network with regulatory factors. We obtained experimental and/or predicted miRNA-target gene pairs and TF-target gene pairs from known databases ( Table 2; see Materials and methods in detail). After calculation, we determined the regulatory factors (TFs and miRNAs) with signi cant regulatory coe cients for each KD gene (Fig. S2AB). Simultaneously, using the regulatory relationship between regulatory factors and KD genes, we constructed a transcriptional dysregulatory network ( Fig. 3D; see Materials and methods in detail). Consistent with previous studies, we found that regulatory factors, particularly miRNAs, play a crucial role in cell growth and development [33,34]. For example, KD gene SPOCK1 was regulated by 12 miRNAs, including miR-7-5p, miR-155-5p, miR-326, and miR-107 (Fig. 3DE).

Functional mechanism portrays of KD genes and dysregulatory factors
In order to characterize the molecular mechanisms of KD genes, we rst use the g:pro ler online tool for functional enrichment analysis (see Materials and methods in detail). As a result, our KD genes were enriched in many types of biological functions, including Gene Ontology (MF, BP, CC), pathway (REAC and WP) (Fig. 4A). In details, KD genes were enriched in apoptosis-associated functions, like "Apoptosis", "Programmed cell death", "Apoptosis-related network due to altered Notch3 in cancer", and "Apoptosis induced DNA fragmentation", etc.. (Fig. 4B). In addition, several KD genes were enriched in immuneassociated function, such as "Antigen processing: Ubiquitination & Proteasome degradation" (Fig. 4B). To further characterize our results, we sought to characterize cancer hallmark landscape of KD genes. In brief, we calculated the semantic similarity score between KD genes-related GO terms and known cancer hallmark-related GO terms [46]. We found that the semantic similarities between our KD genes and the of apoptotic-related cancer hallmark, "Evading Apoptosis", were 0.35. Also, the semantic similarities with two immune-related cancer hallmark ("Evading Immune Detection" and "Tumor Promoting In ammation") were 0.3 and 0.48, respectively (Fig. 4C). For cancer hallmark "Genome Instability and Mutation", the semantic similarity with KD genes is the highest, reaching 0.65 (Fig. 4C), which indicates that genomic variation of KD genes have important functional mechanisms, including genome instability occurred, and thus play a carcinogenic role in biology [47]. Interestingly, our KD genes were enriched with the functions related to the synthesis and secretion of gastric hormones, such as "Synthesis, secretion, and deacylation of Ghrelin" (Fig. 4C). Ghrelin is an endogenous peptide hormone mainly produced in the stomach. Previous research have shown that ghrelin can be a promising therapeutic option for cancer cachexia [48]. In addition, KD genes are also enriched with functions related to cell growth, development and metabolism, such as "Biosynthetic process", "Negative regulation of biological process" and "Cellular macromolecule metabolic process".
Our above results have stated that the regulators of KD genes plays an important regulatory role in the carcinogenesis. Using g:pro ler, we also observed that KD genes were enriched in the regulatory relationships with TFs and miRNAs (Fig. 4A). In order to study the functional mechanism of regulators, we used the enrichment analysis of KD genes to further characterize. By comparing the regulators we identi ed with the TFs and miRNAs enriched by KD genes, we found that 4 TFs and 5 miRNAs were con rmed by enrichment (Fig. 4D) . While miR-139-5p (P= 0.028) acts as a con rmed miRNA by enrichment, it also regulated two KD genes HMGB2 and DERL1 ( Fig. 3D; Supplementary table S2). In addition, miR-30a-5p (P= 0.021) regulated KD genes SPCS3, PPID and DERL1 ( Fig. 3D; Supplementary table S2). The miRNAs con rmed by KD gene enrichment also includes miR-26b-5p (P= 0.013), miR-32-5p (P= 0.037) and miR-186-5p (P= 0.044) (Fig.  4D). In summary, functional enrichment analysis indicated that the KD genes were involved in vital biological functions, and further demonstrated that the regulators can be used as potential biomarkers for further experimental studies.

Drug response effects in preclinical cell models of GC
To explore the potential effects of KD genes on drug response, we evaluated whether their expression level could in uence drug response across 38 preclinical cell models of GC from Cancer Cell Line Encyclopedia (CCLE). We found that multiple KD genes presented strong correlations with the drugs response in GC cells (  (Fig. 5A-C). This result indicated that these KD genes expression levels could enhance the resistance of irinotecan in GC cells. Studies have shown that patients with advanced gastric cancer are often treated with irinotecan monotherapy as salvage-line therapy [49].
Sorafenib is a multi-kinase inhibitor with activity against angiogenesis and RAF-MEK-ERK pathway, it could inhibit proliferation of human gastric cancer cell line, and may reverse resistance to cisplatin through down-regulating MDR1 expression [50]. In our study, however, the drug response of sorafenib in GC cells showed a signi cant negative correlation with the expression level of KD gene NAA15 (R= -0.6, P= 0.0092) (Fig. 5A-B). Moreover, paclitaxel, as a widely used anticancer drug, exhibited multiple response patterns at the expression level of KD genes. For instance, paclitaxel showed strong resistance in GC cells with upregulated SPOCK1 expression (R= 0.59, P= 0.01), while showing strong sensitivity in GC cells with upregulated HMGB2 (R= -0.67, P= 0.002) (Fig. 5AB and Fig. 5D). Studies have shown that nab-paclitaxel as second-line treatment in locally advanced inoperable or metastatic gastric and gastroesophageal junction carcinoma is an active chemotherapy regimen [51,52].
Interestingly, the KD genes we identi ed have response effects in multiple anticancer drugs (Fig. 5AE). For example, in addition to sorafenib, RAF265 (R= -0.67, P= 0.0023), Nultlin-3 (R= -0.65, P= 0.0037) and L-685458 (R= -0.64, P= 0.0042) all showed strong drug sensitivity in GC cells with upregulated NAA15 expression (Fig. 5AE). However, the small molecule compound ZD-6474 showed signi cant drug resistance at the expression level of NAA15 (R= 0.48, P = 0.045). In addition to irinotecan, multiple drugs showed strong resistance at the expression level of KD gene DERL1. For instance, drug responses, including AZD6244 (R= 0.55, P= 0.019), PD-0325901 (R= 0.54, P= 0.020) and panobinostat (R= 0.49, P= 0.041), were presented signi cant positive correlation with DERL1 (Fig. 5AE). Study shown that panhistone deacetylase inhibitor panobinostat sensitizes gastric cancer cells to anthracyclines via induction of CITED2 [53]. Moreover, the drug response of multiple small molecules exhibited strong drug resistance on KD gene SPOCK1 (response effects from 0.47 to 0.59), while several small molecules showed strong drug sensitivity on KD gene PFAS (effect from 0.48 to 0.51) (Fig. 5AE). Besides, we further evaluated whether KD genes could in uence drug response across 23 gastric cancer cell lines from cancer genome project (CGP). Indeed, the drug response patterns of several KD genes (such as SPCK1, PFAS) in CGP were similar to those in CCLE (Fig. S3). Noteworthy, drug response patterns of multiple KD genes were complementary in two cell line models, such as NAA15, RNF139, and ETFD (Fig.5E, Fig. S3). This result suggested that it is necessary to use a combination of two cell line models to explore drug response [54]. In summary, we used cellular models to study the drug response mechanisms of KD genes on the transcriptional level. The effect of small molecule compounds on KD genes can guide researchers for new drug research and development, and has potential application value.

Clinical application of KD gene signatures in gastric cancer patients
Among the above results, we have identi ed 31 prognostic-related KD genes (Supplementary Table 1). In order to explore the global clinical application value of all KD genes, we constructed sample grouping signatures based on the transcriptional level of KD genes and veri ed it in multiple sets of data. In brie y, we grouped patients based on the expression of KD genes using hierarchical clustering method (see Materials and methods). We found that when 265 GC patients were clustered into 4 groups, the model was best classi ed (Fig. 6A, see Materials and methods). The number of patients in these four groups were 146, 63, 16 and 40 in group 1-4, respectively (Fig. 6B). By calculating the Pearson dissimilarity between the samples, we nd that the distance inside the group was relatively close, and the distance between the groups was far, which was in line with our research (Fig. 6B).
To reveal the clinical bene ts of KD gene signature in GCs, the log rank test were be used to explore the survival outcomes between patient groups. The KM curve showed that the patients within group 2 have the best disease-free survival (DFS) time (Fig. 6C). By observing the patient's genomic variation events, we found that the patients in group 2 mainly carry variations in KIAA0196 (also known as WASHC5), EIF3H, MRPL13, MTSS1, DERL1, and RNF139 (Fig. S4A). Compared with group 2 patients, the patients within group 1 presented signi cantly worse DFS (P= 0.02, log rank test; Fig. 6C), and these patients mainly carried genomic events in TRMT12, TRPA1, TCEB1, MRPS28, PON2 and SHFM1 (Fig. S4A). Also, both group 3 and group 4 have shown signi cantly worse DFS (Fig. 6C). However, the four groups of KD gene signature did not show signi cant prognostic e cacy on the patient's overall-survival (OS) time (Fig.   S6A). In addition, we found that group 3, with worst survival, showed a higher proportion of LAUREN intestinal class (proportion 87.5%) and WHO tubular class (68.8%) (Fig. 6DE). The other three groups all showed a lower proportion of LAUREN intestinal classes (58.9%, 77.8%, 60.0% for group 1, group 2, group 4, respectively) and WHO tubular class (41.1%, 55.6%, 40.0% for group 1, group 2, group 4, respectively).
In contrast, group 3 shows a lower percentage of LAUREN diffuse class and WHO poorly cohesive class (both were 6.25%) (Fig. 6DE). We also counted other clinical features of GCs between these groups, and found that the patients in group 3 showed older age, fewer tumor cells, and more tumor lymphatic in ltration, however, these features did not show statistical signi cance (Fig. S5).
To further elucidate the clinical value of our KD genes in GCs, we used extra data sets for validation. Similarly, through the expression level of KD genes, we subtyped 443 patients into 4 groups based on hierarchical clustering ( Fig. S6BC; see Materials and methods). The KM curve indicated that these four groups of KD gene signature have signi cant prognostic e cacy in patient's overall-survival (OS) time (P= 0.016, log rank test; Fig. 6F (Fig. S6D), and more advanced TNM stage T3 and N2 (Fig. S6E). In summary, we used KD genes to construct the group signatures for gastric cancer patients and validated their association with patient survival in both inherent data sets and extra data sets, which suggested that the KD genes we identi ed can be used as prognostic biomarkers for further study by basic experimental and clinical researchers.

Discussion
To date, although many bioinformatics tools dedicated to driver mutations identi cation have been developed [13,14], distinguishing driver mutations from passenger ones poses a formidable challenge [15]. Therefore, it is quite urgent to identify cancer drivers and understand it from a functional level. In this study, we integrated multi-omics data for identifying the cancer drivers and their dysregulatory factors in patients with gastric cancer. Applying it to 293 patients from TCGA, we identi ed 31 prognostic-related key driver genes (KD genes). By means of functional enrichment analysis of KD genes, we characterized their affected cancer hallmarks and the biological functions their involved, such as programmed cell death and antigen processes. The drug response pattern and transcriptional signatures of KD genes re ect its clinical application value. Combining DNA copy number alteration and mutation can help us avoid the limitations of traditionally identifying driver events. This study proves the integration of multiomics data enable to discover novel driver molecules and their dysregulatory mechanisms during tumorigenesis.
Cancer genes generally induce deregulated by their regulators, and exert driver roles in cancer. Based on the regulatory relationships between regulators and target genes [23], we constructed a transcriptional dysregulatory network of KD genes ( Fig. 3D; see Materials and methods in detail). The dysregulatory factors of KD genes play a crucial role in cell growth and development [33,34]. In our study, KD gene SPOCK1 was regulated by miR-155-5p, which can form a regulatory feedback loop with STAT1 and might trigger cancer immunoediting in order to allow tumor cells to escape from immunosurveillance and even to promote tumorigenesis [36]. In addition, the transcription of KD gene ETFDH were up-regulated by TF SREBF1. Previous researches have shown SREBF1 is a key regulator of fatty acid metabolism and plays a pivotal role in the transcriptional regulation of different lipogenic genes that mediate lipid synthesis, which acts as a cancer promoter in human diseases [42,43]. Our study also shown that both miRNAs and TFs play a "hubs" regulatory role in the dysregulation of KD genes, which suggested that dysregulatory factors play a crucial role in the process of biological metabolism. Therefore, studying these dysregulatory factors may facilitate discovery of biomarkers.
During process of genomic variation and natural selection, several driver events exhibited different combination mutational patterns to drive cancer formation, and formed evolutional dependence. These evolutional dependency drivers are always highly functional associated, such as participating in similar biological processes and mediating pathway crosstalk, and corporately promote clonal expansion or selective sweep [28,55]. Functional enrichment analysis results show that some KD genes are enriched apoptosis-associated (like "Programmed cell death" and "Apoptosis-related network due to altered Notch3 in cancer") and immune-associated function (like "Antigen processing: Ubiquitination & Proteasome degradation"), which was related to the corresponding cancer hallmarks, including "Evading Apoptosis", "Evading Immune Detection" and "Tumor Promoting In ammation", respectively. In addition, these KD genes offered shed new insights into molecular mechanisms and provided novel prognostic and drug response potential for clinical practice. These ndings suggested that the KD genes and dysregulated factors we identi ed by integrating multi-omics data could have important implications for understanding cancer evolution as well as for diagnostic and therapeutic approaches, and might play crucial roles and are worthy to be further explored.

Conclusions
This study integrates multi-omics datato discover novel driver molecules and their dysregulatory mechanisms based on copy number alteration, somatic mutation analysis and transcription level. We revealed clinical application value of KD genes through drug response pattern and transcriptional signatures. These results will pave the way towards understanding the potential mechanisms that govern the GC progression, which will be useful in clinical practice and might prompt the development of novel therapeutic target for GC patients.

Competing interests
The authors declare that there is no con ict of interest regarding the publication of this paper.

Funding
This work was supported by the Key Research Project from The Fourth A liated Hospital of Harbin Medical University (HYDSYYZ201507).
Authors' contributions LWZ and LGW contributed to conceive and design the research. NNL and YW performed data analysis and wrote the paper. WPC and YXW participated in the design of the study and mathematical analysis of data. All authors read and approved the nal manuscript. Figure 1 The overview Step1, The genomic variation spectrum of gastric cancer patients were constructed using TCGA multi-omics data (including copy number alteration, mutation and mRNA expression level), and the candidate genes were screened. Step2, Identi cation of key driver genes (KD genes) related to prognosis in gastric cancer patients based on expression data and clinical data. Step3, The transcriptional dysregulatory network of gastric cancer patients was constructed based on the relationship between known regulatory factors (miRNAs and TFs) and target genes.