Weighted co-expression network construction and key module analysis
We first performed the initial quality assessment using the average linkage method. Two outlier samples were removed after clustering. The remaining 339 cancer samples and 44 control samples with clinical information of LOI were used in subsequent analysis. The 2,601 variant genes in samples were with the largest variance via average linkage hierarchical clustering.
To estabilish a scale-free network, the scale index (Fig. 1a) and mean connectivity (Fig. 1b, c) were calculated. We found that the power value of β = 7 for the scale‑free topology for which the fitting index reached 0.85 (Fig. 1d). Different genes were subsequently grouped into modules according to the association of expression. Moreover, genes with similar expression patterns could be placed into different modules via average linkage clustering. Finally, a total of 10 modules were identified (Fig. 2). The correlations between MEs and LOI trait were explored. As shown in Fig. 3a, the result indicated that 10 co-expression modules were correlated with LOI phenotypes. Figure 3B showed 10 co-expression modules was associated with cancer status, especially turquoise and pink key modules. Then, scatter diagrams of GS for MM vs. LOI status in the turquoise and pink modules were plotted, respectively (Fig. 3c, d), which showed the genes in two modules were significantly related with LOI status. The correlation values were 0.4 (turquoise) and 0.59 (pink), P-values were 1.4 × 10−30 (turquoise) and 1.8 × 10−8 (pink), which indicated the turquoise and pink modules were high correlations with LOI status.
Enrichment analysis of key co-expression modules
To know the function of genes in the key co-expression modules, GO and KEGG analysis were performed. GO analysis showed that the turquoise module was associated with DNA replication, mitotic nuclear division, chromosome segregation, nuclear division and DNA−dependent DNA replication. KEGG analysis found that the turquoise module was associated with cell cycle, DNA replication, mismatch repair and p53 signaling pathway (P < 0.05) (Fig. 4a, b). Similarly, GO analysis indicated that the pink module was not only involved in squamous cell function, such as, epidermal cell differentiation, keratinocyte differentiation skin development, epidermis development and cornification (P < 0.05), but also involved in regulation of protein secretion, such as, peptidase activity, negative regulation of proteolysis, negative regulation of peptidase activity and negative regulation of endopeptidase activity (P < 0.05) (Fig. 4c). These results indicated that the turquoise module and pink modules played an important role in LOI of HNSCC.
PPI analysis and hub genes
To know the hub genes in the key modules, PPI analysis of STRING database were analyzed. Connect threshold was used to define the hub genes, 89 genes including top 5 genes, KIF18B, BUB1, BUB1B, KIF4A and EXO1 in turquoise module (connect threshold > 0.25) and 38 genes including top 5 genes, KRT78, CNFN, SLURP1, PRSS27 and CRCT1 in pink module (connect threshold > 0.10) were screened as candidate hub genes (Fig. 5 and Additional file 4: Table S4 and Additional file 5: Table S5). In addition, connect degree (>6) was further used to define the hub genes and then 24 genes (18 genes in turquoise module and 6 genes in pink module) were defined as hub genes.
Prognosis and expression analysis of hub genes
After excluding samples with no survival information/survival time less than 1 month, 339 cancer samples were used to evaluate the prognosis of 24 hub genes. The prognosis analysis showed that HNSCC with LOI had a poor clinical outcome than HNSCC without LOI (P < 0.05), which indicated that LOI was an important histological characteristic of HNSCC (Fig. 6a). The univariate survival analysis was further performed among hub genes by using R-package survival and the result indicated that CNFN was associated good survival (Fig. 6a), nevertheless, KIF18B, KIF23, PRC1, CCNA2, DEPDC1 and TTK were associated poor survival in LOI of HNSCC (P < 0.05) (Fig. 6c-h).
To determine the mRNA expression level of seven hubgenes (CNFN, KIF18B, KIF23, PRC1, CCNA2, DEPDC1 and TTK), we used the GEPIA database to validate the mRNA expression and found that CNFN was significantly downregulated in HNSCC and KIF18B, KIF23, PRC1, CCNA2, DEPDC1 and TTK were significantly upregulated (P < 0.05) (Fig. 6i). To further explore the protein expression of seven genes (CNFN, KIF18B, KIF23, PRC1, CCNA2, DEPDC1 and TTK), we further performed the protein expression analysis using the HPA databse (Fig. 6j). Statistical analysis indicated that CNFN expression was low and not detected (100%; n=4); KIF18B (66.7%; n=3), KIF23 (100%; n=4), PRC1 (75.0%; n=4), CCNA2 (66.7%; n=3), DEPDC1 (100%; n=3) and TTK (66.7%; n=3) were significantly medium and high expression (Fig. 6k).
Establishment of prognostic risk score formula
Using the LASSO method and multivariable Cox regression analysis, two mRNAs (CNFN and DEPDC1) were identified as integrated prognostic biomarkers for HNSCC patients. We then established a prognostic risk score formula based on the expression profiles of two prognostic mRNAs and their regression coefficients. The prognostic risk score formula was as follows: risk score = EXPDEPDC1 * 0.32636 + EXPCNFN * (-0.07544). The risk scores were calculated for all patients and split patients into a high-risk group (n = 165) and a low-risk group (n = 165) by using the median risk score as the cutoff value (Additional file 6: Table S6). The distribution of risk scores and survival status of the patients were shown in Fig. 7a and b.
We then assessed the prognostic value of the above risk formula by using Kaplan-Meier analysis. We found that the low-risk group had a better OS than the high-risk group (P < 0.001) (Fig. 7c). Moreover, time-dependent ROC analysis was also utilized to evaluate the prognostic capacity of the risk formula. The areas under the ROC curve at 1, 3 and 5 years were 0.582, 0.634 and 0.636, respectively, which suggested that the integrated two-mRNA signature had better utility than each single one (Fig. 7d).
Identification of small molecular agents
To know the small molecular agents targeted LOI in the turquoise and pink modules, we searched all drug-gene interaction in DrugBank database. Connection degree > 2 and P < 0.05 were used to screen the drug-module, five drug-module interactions (XL844, AT7519, AT9283, Alvocidib and Nelarabine) in turquoise module and three drug-module interactions (Benzamidine, L-Glutamine and Zinc) in pink module may be used to target LOI (P < 0.05) (Table 2). To further know the clinical application of 8 small molecular agents on head and neck cancer or solid tumor, we analyzed clinical trial registration of these small molecular agents using the clinicaltrials (https://clinicaltrials.gov/ct2/home). And as shown in Additional file 7: Table S7, although the study on Benzamidine was still unexplored, three clinical trials of L-Glutamine (NCT03015077, NCT02282839, NCT00006994) and three clinical trials of Zinc (NCT00036881, NCT03531190, NCT02868151) on head and neck cancer had been conducted. Meanwhile, XL844 (NCT00475917), AT7519 (NCT00390117, NCT02503709), AT9283 (NCT00443976, NCT00985868), Alvocidib (NCT00080990) and Nelarabin (NCT01376115) on solid tumor or cancer had also been studied. These results indicated that XL844, AT7519, AT9283, Alvocidib, Nelarabine, Benzamidine, L-Glutamine and Zinc may provide us new approach to block metastasis of lymph nodes.