Identification of 664 GDEs shared by three GEO profiles
Three GEO cDNA profiles, GSE44077, GSE18842 and GSE33532, were selected to analyze the GDEs between cancer and normal lung samples. In total, 1133, 4459 and 3775 GDEs, comprising 691, 2505 and 2351 down-regulated and 442, 1954 and 1424 up-regulated genes, were identified in GSE44077 (Figure 1A), GSE18842 (Figure 1B) and GSE33532 (Figure 1C), respectively. Of these, 432 down-regulated and 232 up-regulated GDEs were shared among the three GEO profiles, as shown by Venn diagram analysis (Figure 1D, 1E).
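The overlap step can be illustrated with a minimal sketch. The per-dataset counts are taken from the text; the gene symbols in the toy lists are hypothetical placeholders, not the actual GDE lists, and the function names are illustrative only.

```python
# Counts reported in the text: (down-regulated, up-regulated, total) per dataset.
reported = {
    "GSE44077": (691, 442, 1133),
    "GSE18842": (2505, 1954, 4459),
    "GSE33532": (2351, 1424, 3775),
}

def totals_consistent(counts):
    """Return True if down + up equals the stated total for every dataset."""
    return all(down + up == total for down, up, total in counts.values())

def shared_genes(gene_sets):
    """Intersect gene-symbol sets; this is the central region of a Venn diagram."""
    sets = [set(s) for s in gene_sets]
    return set.intersection(*sets) if sets else set()

# HYPOTHETICAL toy lists standing in for the up-regulated genes of each dataset.
toy_up = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_B", "GENE_D"},
    {"GENE_B", "GENE_A", "GENE_E"},
]
toy_shared = shared_genes(toy_up)
```

Up- and down-regulated lists would be intersected separately, so that a gene counts as shared only when its direction of change agrees across all three profiles.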
Pathway enrichment analysis of shared GDEs by GO and KEGG
To further understand the pathways in which the 664 GDEs were mainly enriched, GO and KEGG analyses were conducted. GO analysis showed that, for the 232 up-regulated GDEs, the cellular components were enriched in centrosome, microtubule and kinetochore (Figure 2A), and the molecular functions were focused on metallopeptidase activity (Figure 2B). The biological processes were mostly enriched in cell growth and maintenance, spindle assembly and chromosome segregation (Figure 2C). Moreover, KEGG/biological pathway analysis showed that the up-regulated GDEs were mostly involved in mitosis and DNA replication (Figure 2D). Three of the four aspects (cellular component, signaling pathway and biological process) pointed to the mitotic cell cycle, indicating the potential value of the cell division process in targeted cancer treatment.
For the 432 down-regulated GDEs, the cellular components were primarily focused on the plasma membrane (Figure 2E), the molecular functions were enriched in receptor activity and cell adhesion molecule activity (Figure 2F), and the biological processes were mainly enriched in signal transduction and cell communication (Figure 2G). Additionally, KEGG/biological pathway analysis showed that the down-regulated GDEs mostly participated in hemostasis, cell surface interactions at the vascular wall and epithelial-to-mesenchymal transition (EMT) (Figure 2H).
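Enrichment of a gene list in a GO term or pathway is commonly scored with a one-sided hypergeometric test. The text does not state which test the enrichment tools applied, so the following is only a sketch of that common approach; the function name and parameters are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """One-sided enrichment p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    upper = min(K, n)
    # Number of ways to draw n genes with at least k annotated ones.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Small worked case: 10 background genes, 5 annotated, 5 drawn, all 5 annotated.
p_all_hits = hypergeom_enrichment_p(10, 5, 5, 5)  # equals 1 / C(10, 5)
```

In practice the resulting p-values would still need multiple-testing correction across all tested terms.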
Function module analysis based on PPI network
To identify genes potentially responsible for NSCLC development, a PPI network of the 664 GDEs was constructed with STRING, and the functional modules of the GDEs were analyzed. Based on the PPI network, the top three gene modules were identified, containing 69, 27 and 28 genes, respectively (Figure 3A); these modules were named Gene Cluster 1 (Figure 3D), Gene Cluster 2 (Figure 3B) and Gene Cluster 3 (Figure 3C).
GO and KEGG results revealed that most of the Cluster 1 genes were enriched in cell cycle (31/69), DNA replication (22/69) and Mitotic M-M/G1 (20/69) related signaling (Figure 3E). All the signaling pathways in which Cluster 1 genes were enriched were sorted in descending order by gene count and FDR value (Table 1). We focused primarily on the top module, related to cell cycle regulation, which contained the most GDEs in the network, and further performed survival analysis on all 31 cell cycle genes.
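The ranking of enriched terms can be sketched as follows. The gene counts and the cluster size come from the text; the FDR values are hypothetical placeholders (the real values are in Table 1), and using FDR in ascending order as a tie-breaker is an assumption, since the exact sort rule is not stated.

```python
CLUSTER1_SIZE = 69  # genes in Gene Cluster 1, from the text

# (term, gene_count, fdr). Counts are from the text; the FDR values are
# HYPOTHETICAL placeholders to be replaced with the values in Table 1.
rows = [
    ("DNA replication", 22, 1e-3),
    ("Mitotic M-M/G1", 20, 1e-3),
    ("Cell cycle", 31, 1e-3),
]

def rank_terms(terms):
    """Sort by gene count (descending), then by FDR (ascending, assumed)."""
    return sorted(terms, key=lambda r: (-r[1], r[2]))

def gene_ratio(count, size=CLUSTER1_SIZE):
    """Fraction of module genes annotated to a term, e.g. 31/69."""
    return count / size

ranked = rank_terms(rows)
```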
Survival analysis of Cluster 1 module genes
Univariate Kaplan-Meier overall survival analysis of the 31 cell cycle regulation genes in Gene Cluster 1 showed that 17 of the 31 genes were significantly correlated with patients' overall survival, including four spindle assembly checkpoint genes: BUB1 (Figure 4A), NDC80 (Figure 4C), MAD2L1 (Figure 4E) and AURKA (Figure 4G). GEPIA was then used to validate the differential expression of these genes in NSCLC versus normal lung samples, and the results showed increased expression of all four genes in cancer compared with normal samples (Figure 4B, 4D, 4F, 4H).
Further, multivariate Cox regression analysis showed that patient age, p-stage, M status and NDC80 expression were independent prognostic indicators in adenocarcinoma (Table 2), whereas T stage, M status and MAD2L1 expression were independent prognostic indicators in squamous cell carcinoma (Table 3).
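For readers unfamiliar with the method, the Kaplan-Meier estimate behind such survival curves can be sketched in a few lines. This is a generic product-limit estimator run on hypothetical toy data, not the tool or the patient data used in this study; in a gene-level analysis, one such curve would be computed for the high-expression group and one for the low-expression group.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve

# HYPOTHETICAL toy cohort: four patients, the second one censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```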
NDC80 and MAD2L1 association with NSCLC clinical features
To explore the association of NDC80 and MAD2L1 expression with LUAD and LUSC clinical features, we used two methods. First, the clinical information of 482 lung squamous cell carcinoma and 223 adenocarcinoma cases was downloaded from TCGA (the same data used for the Cox regression analysis). The results showed that NDC80 expression was significantly associated with age, smoking and stage in LUAD patients; expression tended to be higher in younger (<60 years), smoking and higher-stage patients (Table 4). MAD2L1 expression was significantly associated with lymph node and distant metastasis in LUSC; expression was higher in patients with lymph node metastasis but without distant metastasis (Table 5).
Second, UALCAN, an online analysis service also based on TCGA data (503 squamous cell carcinoma and 515 lung adenocarcinoma cases), was used for data exploration (Figure 5A-5N). The results also revealed that NDC80 expression was higher in smokers than in non-smokers and increased with the duration of smoking (Figure 5D), and that NDC80 expression tended to be higher in cases with lymph node metastasis (Figure 5G). Interestingly, the larger sample size also revealed that both NDC80 (Figure 5C) and MAD2L1 (Figure 5J) were expressed at higher levels in male than in female patients; hypothetically, this gender association might be related to the fact that most smokers were men rather than women.
NDC80 and MAD2L1 centered signaling pathways
The expression profiles of NDC80 and MAD2L1 were analyzed in various tumors using GEPIA, and we found that both NDC80 and MAD2L1 were broadly up-regulated across multiple human tumors, including lung adenocarcinoma and lung squamous cell carcinoma (Figure 6A, 6F).
To understand the potential functions of NDC80 and MAD2L1, we performed GO and KEGG analyses of the biological processes in which the genes mainly participate and the signaling pathways in which they are involved. The results revealed that, even in different subtypes of lung cancer (LUAD and LUSC), NDC80 and MAD2L1 shared biological functions. Both NDC80 (Table 6) and MAD2L1 (Table 7) were primarily involved in processes related to mitotic cell cycle regulation, for instance cell division, chromosome segregation and spindle assembly regulation.
Moreover, the NDC80- and MAD2L1-centered PPI networks showed a similar result: the genes related to NDC80 (Figure 6B) and MAD2L1 (Figure 6G) were both involved in cell cycle regulation and included BUB1B and AURKA. GEPIA analysis confirmed the correlations among NDC80, MAD2L1, BUB1B and AURKA in both LUAD (Figure 6C-6E) and LUSC (Figure 6H-6J).
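Pairwise expression correlation of this kind is typically summarized with a correlation coefficient. The text does not state which coefficient was used, so the sketch below assumes Pearson's r and runs on hypothetical toy expression values.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# HYPOTHETICAL expression values for two genes across four samples.
gene_1 = [1.0, 2.0, 3.0, 4.0]
gene_2 = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(gene_1, gene_2)
```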
Considering that a large proportion of current chemotherapy drugs were developed based on their association with the mitotic cell cycle, the correlation between NDC80, MAD2L1 and the cell division process indicates the potential value of these genes as two additional chemotherapy drug targets. However, more experiments and clinical trials will be needed to validate this hypothesis.