Identification of P-DEGs
To identify the key genes affecting the prognosis of GC patients and the roles these genes play in GC progression, we performed multivariate and survival analyses on the gene expression matrices obtained from the GEO and TCGA databases to screen for genes significantly correlated with the prognosis of GC patients (P < 0.05). We obtained 997 and 805 prognosis-related genes from the GSE84427 and TCGA expression matrices, respectively. Mutual validation between the two datasets yielded 128 common P-DEGs, that is, genes found both among the 997 genes in GSE84427 and among the 805 genes in TCGA (Fig. 1A).
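The mutual-validation step amounts to filtering each cohort's per-gene survival P-values at 0.05 and intersecting the two resulting gene sets. The Python sketch below illustrates only that set logic. It assumes the per-gene survival tests have already been run, and the gene names and P-values are hypothetical placeholders, not study data.

```python
def prognostic_genes(p_values: dict[str, float], alpha: float = 0.05) -> set[str]:
    """Genes whose survival-association P-value is below alpha."""
    return {gene for gene, p in p_values.items() if p < alpha}


def common_prognostic_genes(cohort_a: dict[str, float],
                            cohort_b: dict[str, float],
                            alpha: float = 0.05) -> list[str]:
    """Genes significant in both cohorts (mutual validation), sorted by name."""
    return sorted(prognostic_genes(cohort_a, alpha) & prognostic_genes(cohort_b, alpha))


# Hypothetical per-gene P-values; placeholders, not study data.
geo_p = {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.003, "GENE_D": 0.04}
tcga_p = {"GENE_A": 0.02, "GENE_B": 0.01, "GENE_C": 0.30, "GENE_D": 0.049}

print(common_prognostic_genes(geo_p, tcga_p))  # ['GENE_A', 'GENE_D']
```

With the study's data, the two filtered sets would contain 997 and 805 genes, and their intersection would contain the 128 P-DEGs.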
Module analysis and centrality analysis of P-DEGs related PPI network
To study the molecular mechanisms that affect the prognosis of GC patients from a systems perspective, we constructed a PPI network of the P-DEGs. The network contained 124 nodes and 819 edges. Furthermore, we used the MCODE plug-in in Cytoscape to identify densely connected modules in the PPI network. The results showed four modules and one non-module region, with module scores of 8.667 (module 1), 7.455 (module 2), 4.111 (module 3) and 2.667 (module 4). Module 1, located at the center of the whole network and comprising 16 nodes and 65 edges, was the most densely interconnected region of the PPI network (Fig. 1B). The protein interactions in this top-ranked module may therefore be the strongest and most important part of the network, and module 1 was taken as the final result of the MCODE analysis. In parallel, to identify the key prognosis-related genes in this complex network, we performed centrality analysis. First, we used the CytoNCA plug-in to calculate three centrality parameters for each gene in the PPI network: degree, betweenness and eigenvector centrality. Next, we selected the genes ranked in the top 5% for all three parameters. Finally, the genes that met this criterion and also belonged to module 1 were defined as key genes: MYLK, MYL9, LUM and CAV1, all of which lie in module 1 and have high centrality (Fig. 1C).
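The key-gene criterion can be sketched in plain Python: select the top 5% of genes for degree, betweenness and eigenvector centrality, then keep only those that fall in MCODE module 1. This is not CytoNCA's implementation. The sketch assumes an unweighted, undirected network, computes unnormalized betweenness with Brandes' algorithm and eigenvector centrality by power iteration, and rounds the top-5% cutoff up to at least one gene. The helper `mcode_score` assumes the standard MCODE definition (subgraph density times node count, with density = edges / (n(n-1)/2)). Under that assumption, 16 nodes and 65 edges reproduce the module 1 score of 8.667. The toy graph and its labels are hypothetical.

```python
import math
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm, unweighted and undirected, unnormalized."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends


def eigenvector_centrality(adj, iters=1000, tol=1e-10):
    """Power iteration with (A + I) for stable convergence; unit Euclidean norm."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iters):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


def top_fraction(scores, fraction=0.05):
    """Nodes ranked in the top `fraction` by score (at least one node)."""
    k = max(1, math.ceil(fraction * len(scores)))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def key_genes(adj, module_nodes, fraction=0.05):
    """Top `fraction` on all three centralities and inside the given module."""
    tops = [top_fraction(f(adj), fraction)
            for f in (degree_centrality, betweenness_centrality, eigenvector_centrality)]
    return set.intersection(*tops) & set(module_nodes)


def mcode_score(n_nodes, n_edges):
    """MCODE module score: density * |V|, density = |E| / (|V|(|V|-1)/2)."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2) * n_nodes


# Hypothetical toy network: HUB is the most central node on all three measures.
edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"),
         ("HUB", "E"), ("A", "B"), ("C", "D")]
adj = build_adjacency(edges)
print(key_genes(adj, module_nodes={"HUB", "A", "B"}))  # {'HUB'}
print(round(mcode_score(16, 65), 3))                   # 8.667
```

Because the final step intersects the three top-5% sets with the module, a highly central gene outside module 1 would not be selected. Removing `HUB` from `module_nodes` in the toy example returns an empty set.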
Prognostic value of key genes in GC patients
To analyze the role of the key genes in GC progression, we further assessed the survival associations of the four key genes with the Kaplan-Meier (K-M) method. GC patients were divided into high- and low-expression groups according to the median expression of each gene. The survival curves showed that the expression of MYLK, MYL9, LUM and CAV1 was significantly correlated with the 5-year survival rate and overall survival time of GC patients in both the GEO and TCGA databases (P < 0.05). In the TCGA database, the median survival times of GC patients with low expression of MYLK, MYL9, LUM and CAV1 were 1.37, 1.41, 1.35 and 1.42 years, respectively, whereas those of patients with high expression were 1.06, 1.08, 1.15 and 1.06 years, respectively. Compared with GC patients with low expression of MYLK, MYL9, LUM and CAV1 (GEO, n = 217; TCGA, n = 190), patients with high expression of the key genes (GEO, n = 216; TCGA, n = 190) had a significantly poorer prognosis (P < 0.05, Fig. 2). These results were further verified in the GEO expression matrix. Univariate and multivariate Cox regression analyses of independent prognosis in the GEO and TCGA databases showed that the HRs of MYLK, MYL9, LUM and CAV1 were all greater than 1 (1.15, 1.18, 1.19 and 1.31, respectively; P < 0.05; Fig. 3). These results indicated that the key genes can independently affect the prognosis of GC patients, and they therefore have potential value as prognostic biomarkers and therapeutic targets for GC patients.
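The median-split survival comparison can be sketched as follows. Patients are labeled high or low by the cohort median expression, a Kaplan-Meier curve is estimated for each group, and the median survival time is read off as the first time the curve falls to 0.5 or below. The two groups are then compared with a two-group log-rank test. The section does not name the software used, so this pure-Python version is an assumption throughout. It places patients exactly at the median in the low group and uses a hypothetical toy cohort: times in years, events (1 = death, 0 = censored) and expression values.

```python
import math
from statistics import median


def median_split(values):
    """'high' if above the cohort median, else 'low' (ties at the median go low)."""
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_survival(curve):
    """Earliest time at which S(t) <= 0.5; None if the median is not reached."""
    return next((t for t, s in curve if s <= 0.5), None)


def logrank_p(times, events, groups):
    """Two-group log-rank test; chi-square (1 df) P-value."""
    g1 = sorted(set(groups))[0]
    o_minus_e = var = 0.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == g1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups) if x == t and e and g == g1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group g1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Hypothetical toy cohort: survival time (years), event (1 = death), expression.
times = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
events = [1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
expr = [9.1, 8.7, 8.9, 8.2, 8.5, 5.1, 4.8, 5.5, 4.2, 5.0]

groups = median_split(expr)
for label in ("low", "high"):
    t = [x for x, g in zip(times, groups) if g == label]
    e = [x for x, g in zip(events, groups) if g == label]
    print(label, "median survival:", median_survival(kaplan_meier(t, e)))
print("log-rank P:", round(logrank_p(times, events, groups), 4))
```

On this toy cohort, the high-expression group reaches its median survival at 1.0 year and the low-expression group at 3.5 years. This mirrors the direction reported above, where higher expression is associated with shorter median survival.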
GO and KEGG enrichment analyses
To better elucidate the mechanisms by which the key genes affect GC prognosis, we performed GO and KEGG enrichment analyses. GO analysis showed that most GO terms were significantly enriched in extracellular matrix organization, extracellular structure organization, cell-substrate adhesion, tissue migration, muscle contraction, muscle tissue development, mesenchymal development, etc. (Fig. 4). KEGG analysis showed that the related pathways were significantly enriched in focal adhesion, the PI3K-Akt signaling pathway, ECM-receptor interaction, cell adhesion molecules, proteoglycans in cancer, protein digestion and absorption, the cell cycle, the calcium signaling pathway, etc. (Fig. 5). These results suggested that the key genes affect the prognosis of GC patients mainly by influencing the invasion, migration and cell cycle of GC cells.
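The section does not state which statistical test underlies the GO and KEGG enrichment. A common choice, used for example by R's clusterProfiler, is a one-sided hypergeometric over-representation test followed by Benjamini-Hochberg adjustment. The sketch below assumes that approach, and the term names and gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation P-value, P(X >= k), X ~ Hypergeometric.

    k: query genes annotated to the term   n: size of the query gene list
    K: background genes annotated to term  N: size of the background
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values (FDR), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (hits in query list, query size, term size, background size).
terms = {"TERM_1": (6, 128, 150, 20000), "TERM_2": (2, 128, 300, 20000),
         "TERM_3": (4, 128, 40, 20000)}
raw = [hypergeom_enrichment_p(*counts) for counts in terms.values()]
for name, p, q in zip(terms, raw, benjamini_hochberg(raw)):
    print(f"{name}: P = {p:.3g}, adjusted P = {q:.3g}")
```

Each term is tested independently. The adjustment step then controls the false discovery rate across the many GO terms or KEGG pathways tested at once.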
Construction and validation of prognostic risk model of key genes
Based on multivariate Cox regression analysis, the key genes (MYLK, MYL9, LUM and CAV1) were integrated, and a prognostic risk model was established separately from the GEO and TCGA data. The risk score was calculated using the formula described in the Methods:

$$\text{risk score} = \mathrm{HR}_{\mathrm{MYLK}} \times E_{\mathrm{MYLK}} + \mathrm{HR}_{\mathrm{MYL9}} \times E_{\mathrm{MYL9}} + \mathrm{HR}_{\mathrm{LUM}} \times E_{\mathrm{LUM}} + \mathrm{HR}_{\mathrm{CAV1}} \times E_{\mathrm{CAV1}}$$

where $\mathrm{HR}_g$ and $E_g$ denote the hazard ratio and the expression level of gene $g$. […] survival rate, risk score and clinical features of GC patients can be estimated based on the total points. The risk scores were high in both the GEO and TCGA databases (approximately 10 in GEO and 9 in TCGA; Fig. 4), suggesting that the key genes may be useful prognostic markers and potential therapeutic targets.
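The risk-score formula is a weighted sum of the four genes' expression levels. The sketch below follows the formula as written and weights each gene by its HR. The HR values are taken from the per-gene Cox results reported earlier in this section purely for illustration; the model's actual weights are defined in the Methods. The expression values are hypothetical. Many published signatures instead weight each gene by its Cox coefficient β = ln(HR), so the sketch also shows that variant, which differs only in the weights passed in.

```python
import math

# Weights follow the formula as written (one HR per gene). These are the per-gene
# Cox HRs reported earlier, used only for illustration; the model's actual weights
# are those defined in the Methods.
HR = {"MYLK": 1.15, "MYL9": 1.18, "LUM": 1.19, "CAV1": 1.31}


def risk_score(expression: dict[str, float], weights: dict[str, float] = HR) -> float:
    """Risk score = sum over key genes of weight(gene) * expression(gene)."""
    return sum(w * expression[gene] for gene, w in weights.items())


# Hypothetical expression levels for one patient.
patient = {"MYLK": 2.0, "MYL9": 1.0, "LUM": 3.0, "CAV1": 1.0}
print(round(risk_score(patient), 2))  # 8.36

# Variant weighted by Cox coefficients beta = ln(HR) instead of HR.
beta = {gene: math.log(hr) for gene, hr in HR.items()}
print(round(risk_score(patient, beta), 3))
```

Either weighting ranks patients by a single score, and the score is then split at the cohort median to form the risk groups.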
To verify the reliability of the key genes, GC patients in the TCGA and GEO databases were divided into low-risk and high-risk groups according to the median risk score in each database. The survival curves showed that the prognosis of the high-risk group was worse than that of the low-risk group (Fig. 5, P < 0.05), and the number of deaths increased as the risk score increased (Fig. 5). Univariate and multivariate Cox regression analyses based on the gene matrix data showed that the key-gene risk score was independently correlated with the overall survival of GC patients (Table 1, P < 0.05). These results indicated that the key genes can serve as a meaningful reference for the prognosis of GC patients and may help guide subsequent treatment after surgery and/or chemoradiotherapy. MYLK, MYL9, LUM and CAV1 may therefore be potential targets for improving the prognosis of GC patients.
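The independent-prognosis check on the risk score rests on a Cox proportional hazards fit. As a minimal sketch, the code below fits a one-covariate Cox model by Newton-Raphson on the Breslow partial likelihood, using step halving, and reports the HR per unit of risk score with a Wald P-value. Because the risk score is the only covariate, this corresponds to the univariate analysis. A multivariate analysis would add clinical covariates, which the sketch omits. It also shows the median split into risk groups. The cohort is hypothetical toy data. A real analysis would use an established package such as R's `survival::coxph` or Python's lifelines.

```python
import math
from statistics import median


def partial_likelihood(beta, x, times, events):
    """Breslow log partial likelihood, score and information for one covariate."""
    ll = score = info = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        if not ei:
            continue  # censored patients contribute only through risk sets
        at_risk = [j for j, tj in enumerate(times) if tj >= ti]
        w = [math.exp(beta * x[j]) for j in at_risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, at_risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, at_risk))
        ll += beta * x[i] - math.log(s0)
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return ll, score, info


def cox_univariate(x, times, events, iters=100, tol=1e-10):
    """Newton-Raphson with step halving; returns (HR, two-sided Wald P-value)."""
    beta = 0.0
    ll, score, info = partial_likelihood(beta, x, times, events)
    for _ in range(iters):
        step = score / info
        while True:  # halve the step until the log-likelihood does not drop
            new = partial_likelihood(beta + step, x, times, events)
            if new[0] >= ll or abs(step) < tol:
                break
            step /= 2
        beta += step
        ll, score, info = new
        if abs(step) < tol:
            break
    se = 1 / math.sqrt(info)
    return math.exp(beta), math.erfc(abs(beta) / se / math.sqrt(2))


# Hypothetical toy cohort: patients with higher risk scores tend to die earlier.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
scores = [2.8, 3.1, 1.9, 2.5, 2.2, 0.9, 1.4, 1.1]

cut = median(scores)
groups = ["high" if s > cut else "low" for s in scores]
hr, p = cox_univariate(scores, times, events)
print(groups)
print(f"HR per unit risk score = {hr:.2f}, Wald P = {p:.3f}")
```

An HR above 1 for the risk score, as reported in Table 1, means the hazard of death rises as the score rises. The step-halving guard keeps each Newton update from lowering the partial likelihood on small or unbalanced cohorts.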