3.1 Differentially Expressed Genes (DEGs) and Enriched pathways
A total of 126 tumor and 13 normal matched samples were included in DEGs analysis. DEGs analysis was performed using R package ‘edger’ and 3728 genes (1526 upregulated and 2202 downregulated genes) were identified with P.value<0.05 and log2foldchange>1 as the cutoff criterion (Figure 1A-B). The locations of the 100 DEGs with the lowest P.value were visualized on the human chromosomes using the ‘Rcircos’ package (Figure 1C). All DEGs were used to identify overrepresented GO terms and KEGG pathways (Table 1). GO analysis results showed that the most overrepresented GO terms in biological processes were enriched in mitotic nuclear division and chromosome segregation. In addition, the most enriched GO terms in molecular function and cellular component were DNA-dependent ATPase activity and chromosomal region, respectively. KEGG results showed that the most enriched KEGG pathway terms were Cell cycle, DNA replication, p53 signaling pathway, Human T-cell leukemia virus 1 infection, and Viral carcinogenesis.
3.2 Weighted Gene Co-expression Network Analysis (WGCNA)
After removed the outlier tumor samples, a total of 117 tumor samples were selected to do the WGCNA (Figure S1A). According to the scale-free topology fitting indices R2 and mean connectivity generated from 3728 DEGs expressions of 117 samples, threshold soft power β was set to be 5 (Figure 2A-B). All DEGs were submitted to WGCNA to construct the gene co-expression network and assigned to different modules by clustering dendrogram trees. We got 10 modules After setting the parameters and the module size were range from 127 to 825 (Figure 2C). The relationships between clinical-trait and gene modules were presented in Figure 2D. In the WGCNA results, the brown module showed a significant relationship to tumor grade and contains 263 genes. The genes from the brown module showed positive correlation with the tumor grade (Figure S1B).
3.3 Identification of protein-protein interactions (PPIs) and hub genes
In addition, PPIs of genes from the brown module were examined using the STRING database. We found that 40 genes formed a complex functional network, indicating that each of them has at least one functionally similar or interacted gene as the neighbor (Fig. 3). Remarkably, six genes (BUB1, CCNB2, CDC6, CDC20, CDK1, and MCM2) tended to be in the central hub of the network generated using the STRING database, thereby demonstrating the importance of these genes. We also evaluated the prognostic significance of six hub genes in patients from TCGA dataset. The K-M survival analysis revealed that the higher expression levels of BUB1, CCNB2, CDC6, CDC20, CDK1, and MCM2 were associated with the worse overall survival (Fig. 4A-F). The boxplots demonstrated that these hub genes were all highly expressed in tumor samples compared with the normal samples (Figure S2). The hub genes were also positively correlated with the tumor grades (Figure S3). Overall, these results suggested that the hub genes might play an oncogenic role in TSCC occurrence and development.
3.4 Virtual screening of compounds
The 3D structure of the protein is crucial for its interaction with other molecules and biological functions. In the current study, the 3D structures of hub proteins (CDC20, CDK1, and MCM2) were obtained from 5LCW, 4YC3, and 6XTX of Protein Data Bank dataset (Fig. 5). The active sites of CDC20, CDK1, and MCM2 were defined with SER377:HIS380, ALA145:PHE147, and PRO525:GLN531. A total of 1615 FDA approved drugs were taken from the ZINC15 database. After virtual screening by Libdock, 541, 841, and 1591 drugs were found to be eligible to bind stably with CDC20, CDK1, and MCM2, respectively. The top 3 drugs with the highest Libdock score of hub proteins were listed in Table 2.
3.5 Pharmacologic properties of compounds
Pharmacologic properties of all 9 selected compounds in Table 2 were predicted using the ADME module of Discovery Studio 2019, including aqueous solubility level, blood-brain barrier level, CYP2D6 binding, hepatotoxicity, human intestinal absorption level, and plasma protein binding properties (Table 3). The aqueous solubility prediction (defined in water at 25°C) indicated that all the compounds were soluble in water. Only 4 compounds were predicted to be inhibitors with CYP2D6, which was an essential enzyme in drug metabolism. As to hepatotoxicity, 3 compounds were found to be toxic. For human intestinal absorption, 6 compounds have good absorption level.
Toxicity results indicated that 3 compounds did not have developmental toxicity potential (Table 4). Considering all the results in Table 3 and Table 4, ZINC000100052685, ZINC000008214703 and ZINC000085537014 were identified to be the ideal leading compounds with high Solubility level, non-CYP2D6 inhibitors, less rodent carcinogenicity, together with Ames mutagenicity, and developmental toxicity potential compared with other compounds. Therefore, ZINC000100052685, ZINC000008214703, and ZINC000085537014 were confirmed as safe drug candidates and selected for the subsequent research.
3.6 Visualization of docking results from Libdock
There are 2 hydrogen bond interactions observed in the complex of ZINC000100052685 with CDC20 (Figure 6A) and 14 van der Waals interactions offered by Figure 7A. Four hydrogen bond interactions were observed in the complex of ZINC000008214703 with CDK1 (Figure 6B), as well as 14 van der Waals interactions offered by Figure 7B. Five hydrogen bond interactions were observed in the complex of ZINC000085537014 with MCM2 (Figure 6C), as well as 14 van der Waals interactions offered by Figure 7C.