DEGs identification
To identify the DEGs between normal tissues and NPC tissues, we analyze the data of GSE12452, GSE53819 and GSE64634 datasets. Then, the volcano plots of DEGs were shown in Fig.2. A total of 276 (52 upregulated and 224 downregulated), 638 (188 upregulated and 450 downregulated) and 441 (67 upregulated and 374 downregulated) genes were identified, respectively. With the online tool Venny for the integrated analysis, 91 genes were overlapped among three datasets, including 5 upregulated and 86 downregulated genes (Fig.3). These 91 overlapping genes were confirmed as candidate DEGs and employed for further analysis.
GO term and KEGG pathway analysis
The top enriched GO terms and KEGG pathways were listed in Fig. 4. As Fig.4 shows, the 91 overlapping genes were highly associated with pathways including drug metabolism-cytochrome P450 pathway, metabolism of xenobiotics by cytochrome P450 pathway and chemical carcinogenesis pathway. With the analysis of GO terms, these genes were largely enriched in cell component (CC) including ciliary part, motile cilium, axoneme, microtubule and ciliary plasm. The enriched biological process (BP) notations included cilium organization, axoneme assembly and cilium movement.
Protein–protein interaction (PPI) network construction
The DEG expression PPI network was showed in Fig.5. The PPI network contained 91 nodes and 9 DEGs were recognized as hub genes (see Table 1) with node degree≥10. In addition, in this PPI network, we observed three significant models after MCODE analysis in Cytoscape and we choose the most significant module with MCODE score = 7.7 and number of nodes =9. The module revealed that seven of the nine genes (DNALI1, RSPH1, SPAG6, ARMC4, DNAI2, DNAH9, and RSPH4A) were belong to the hub genes. With regards to the functions, these DEGs were closely involved the cilium movement, ciliary part and cilium organization, in which DNALI1, TEKT1,ARMC4, DNAI2, DNAH9, RSPH4A,SPAG6 and RSPH1 were highly enriched19-27. ARMC4 is a top tumor suppressor gene and mutation occurs in breast cancer28. SNTN is a kind of apical structure protein and its abnormal expression was verified to be related to pathological and cancerous phenotypes29. These findings indicated that these tightly interactional genes associated molecular pathways and might activate pathogenesis of NPC.
Classifier construction
We built the RF classifier using the function in “RandomForest” package for R and added these DEGs to the classifier one by one in order of importance (supplementary table 1). For a variable, the higher decrease in Gini and the higher MeanDecreaseAccuracy measure, the greater is its importance in the classification process. The top 30 DEGs obtained from both rankings showed in Figure 6C. As Fig. 6A shows, the top 2, including CLIC6 and CLU, are the best predictors of the classifier, and have a good prediction power(AUC:0.985). The chloride intracellular channel (CLIC) family contains six homologs in human (CLIC1-6) which plays a critical role in the processes of human cancer. A previous study suggested that the expression level of CLICs (CLIC1,4,5 and 6) were significantly correlated with histological tumor grade in breast cancer30. Clusterin (CLU) was regulated by N,N'-Dinitrosopiperazine (DNP) and promotes NPC metastasis31. In the case of unknown clinical information, it is also very good at identifying tumors and healthy tissues based on two-dimensional representation of unsupervised classification analysis (Fig. 6B).
Identification of differentially methylated CpG sites (DMCs)
To our knowledge, the distribution of DMCs and their influence on key functional genomic elements including CpG islands in NPC are poorly defined. We studied the global methylation patterns of 91 DEGs across gene sub-regions and methylated genomic regions. CpG island hypermethylation is a landmark epigenetic modification of cancer. In our study, 3264 DMCs were identified from NPC tumor genomes, among of them, 2599 were hypermethylated CpG sites (hyper-CpGs) and 665 were hypomethylated CpG sites (hypo- CpGs). The number of hyper-CpGs was greater than the hypo- CpGs in NPC tiusses32. As the Fig.7 shows that 1132 hyper-DMCs were enriched the body, and 1959 of hyper-CpGs were distinctly enriched with CpG islands (79.80%), but very few hyper- CpGs located in the shelfs. Our analyses also suggested that the regions close to TSS (TSS1500, TSS200, and 5′UTR) and the CpG islands (shores), in general, have more DMCs in NPC tumor tiusses. In addition, the distribution of few hypo- CpGs were comparatively even among shelfs, s-shore and CpG islands, except for higher in N-shore. Expectedly, we observed that the CpG island and its neighborhood location (shores) were predominantly hypermethylated which is similar with an early study32,33. The promoter regions (TSS, 1stExon and 5′UTR) had nearly twice hyper-CpGs (63.41% vs 36.59%) compared to outside of promoter regions (body and 3′UTR), similar observations were also reported by Nitish Kumar Mishra et.al34. Hyper-CpGs in these promoter regions lead directly to the genetic inactivation of tumour-suppressor genes and silence transcriptional initiation35. This finding indicated that hypermethylation related genes may were highly associated with the cell division, cell cycle and development of NPC. Finally, by comprehensive analysis of DMCs and DEGs, we observed 5 differential methylation genes (DMGs), including CLDN10, CLIC6, LRRC10B, SORBS2 and UBXN10, which both hypermethylated and downregulated. The correlation analysis between the 5 DMGs and DMCs was generalized in Table 2.