A key genomic subtype associated with lymphovascular invasion in head and neck squamous cell carcinoma

Objective: Lymphovascular invasion (LOI), a key pathological feature of head and neck squamous cell carcinoma (HNSCC), predicts poor survival. However, the associated clinical characteristics remain uncertain, and the molecular mechanisms are largely unknown. Methods: Weighted gene co-expression network analysis was performed to construct gene co-expression networks and investigate the relationship between modules and LOI clinical trait. Functional enrichment and KEGG pathway enrichment analysis were performed for differentially expressed genesusing DAVID database. The protein-protein interaction network was constructed using Cytoscape software, and module analysis was performed using MCODE. Survival analysis and unsupervised hierarchical clustering were used to evaluate the relationships among LOI-associated genomic subtype, clinicopathological features and patient outcomes. And the potential targeted LOI molecular agents were identiﬁed with DrugBank. Results: 10 co-expression modules in two key modules (turquoise and pink) associated with tumor LOI were identiﬁed. Functional enrichment and KEGG analysis identified turquoise and pink modules played signiﬁcant roles in the progression of HNSCC. The 24 genes in two modules were identiﬁed as hub genes. Clustering analysis with seven hub genes set further divided cases into subtypes 1 and 2, which were signiﬁcantly associated with pathology-determined LOI status in both cohorts. The 10-year overall survival of subtype 2 was signiﬁcantly worse than that of subtype 1. Conclusions: Our research revealed the key co-expression modules and identiﬁed seven prognostic biomarkers, including CCNA2, CNFN, DEPDC1, KIF18, KIF23, PRC1, TTK, which provide some new insights into LOI of HNSCC. Additionally, the small molecular agents may be a candidate drug for treating LOI. LOI: Lymphovascular invasion (LOI), HNSCC: head and neck squamous cell carcinoma, DEGs: differentially expressed genes, MEs: GS: gene signiﬁcance, module membership, Weighted gene co-expression network analysis, MCODE: Molecular Complex PET-CT: positron emission tomography-computed


Abstract
Objective: Lymphovascular invasion (LOI), a key pathological feature of head and neck squamous cell carcinoma (HNSCC), predicts poor survival. However, the associated clinical characteristics remain uncertain, and the molecular mechanisms are largely unknown.
Methods: Weighted gene co-expression network analysis was performed to construct gene coexpression networks and investigate the relationship between modules and LOI clinical trait.
Functional enrichment and KEGG pathway enrichment analysis were performed for differentially expressed genesusing DAVID database. The protein-protein interaction network was constructed using Cytoscape software, and module analysis was performed using MCODE. Survival analysis and unsupervised hierarchical clustering were used to evaluate the relationships among LOI-associated genomic subtype, clinicopathological features and patient outcomes. And the potential targeted LOI molecular agents were identified with DrugBank.
Results: 10 co-expression modules in two key modules (turquoise and pink) associated with tumor LOI were identified. Functional enrichment and KEGG analysis identified turquoise and pink modules played significant roles in the progression of HNSCC. The 24 genes in two modules were identified as hub genes. Clustering analysis with seven hub genes set further divided cases into subtypes 1 and 2, which were significantly associated with pathology-determined LOI status in both cohorts. The 10-year overall survival of subtype 2 was significantly worse than that of subtype 1.
Conclusions: Our research revealed the key co-expression modules and identified seven prognostic biomarkers, including CCNA2, CNFN, DEPDC1, KIF18, KIF23, PRC1, TTK, which provide some new insights into LOI of HNSCC. Additionally, the small molecular agents may be a candidate drug for treating LOI.

Background
Head and neck squamous cell carcinoma (HNSCC), one of the most common pathological subtypes, nearly reach 90% of head and neck cancer [1]. Metastasis is the main cause of treatment failure and a important factor that affects the prognosis of HNSCC [2]. Thus, understanding the genomic changes of metastasis may be a valuable way to reduce the metastasis of lymph nodes.
In HNSCC, advanced TNM stage, histological grade and lymph node status, which indicate several high risk factors of metastatic disease, poor overall survival and disease free survival, are poor prognosis indicators [3,4,5]. Lymphovascular invasion (LOI) has been found to be associated with lymph node metastasis of HNSCC [6,7,8]. Thus, knowing effective molecular prognosticators of LOI can be a useful way to decrease the metastasis risk in HNSCC.
According to the recent studies, the clinical characteristics and parameters about LOI are still uncertain. Such as, the incidence of LOI in HNSCC was not consistent, which varied from 14 to 47% [9,10]. This huge difference could be due to the small sample sizes, distribution difference and the

Methods
Patient selection and data pre-processing Data information of HNSCC patients was downloaded from TCGA database. As shown in Table 1, RNA expression profiles and clinical survival data of 500 patients were obtained. Among these 500 patients, clinical prognosis data of 339 patients is available. According to the difference multiples (|logFC|>1) and the significance threshold (P < 0.05), 2248 genes that met the criteria were screened out as differentially expressed genes (DEG) (Additional file 1: Table S1). The intersection of DEGs from the NCBI-gene database (Additional file 2: Table S2) and OMIM database (Additional file 3: Table S3) was performed using the Venn Diagram package in R language. The study was approved by the

Construction of co-expression module network
The module-trait associations were considered as important clinical characteristics between the clinical phenotype and module eigengenes (MEs). As the method used by previous studies [11,12],

Enrichment of key co-expression modules analysis
As the method used by previous studies [12,13], the aberrantly expressed mRNAs in the key gene modules were selected to perform GO function analysis and KEGG pathway analysis. For GO analysis, the corresponding genes were divided into by biological process (BPs) analysis. Using the KEGG analysis, genes of the key co-expression modules were used to detect the gene modules functions, and P < 0.05 was considered as statistically significant.
PPI network analysis and identification of hub genes As method used by previous studies [14,15], the key gene co-expression module was further explored to predict gene function correlation using STRING database with a confidence score > 0.9.
Cytoscape was employed to screen significant gene pairs in the PPI network [16]. We further screen the modules of the PPI network by molecular complex detection (MCODE) analysis. The criteria of MCODE were as follows: degree cutoff = 2, node cutoff = 0.2, maximum depth = 100 and k-core = 2. At last, the 24 genes were selected as hub genes.
Survival analysis of Hub genes 24 hub genes were analyzed using univariate survival analysis. Seven genes with significant prognostic differences were selected as characteristic genes by P < 0.05. According to the expression profiles of characteristic genes, unsupervised hierarchical clustering was used to classify clinical samples, and Kaplan Meier was further used to explore the prognostic differences of the classified samples.

Identification of small molecular drugs
DrugBank is a comprehensive and systematic resource to explore detailed drug-target interaction information [17]. The turquoise and pink modules in PPI network were mapped onto the DrugBank database. The |connectivity score|>2 was used as cutoff value to identify molecular drugs targeted with HNSCC.

Statistical Analysis
Univariate analysis was performed using SPSS 17.0 (SPSS Inc., Chicago, IL, USA). Cumulative survival time was calculated and analyzed by the Kaplan-Meier and log-rank test. The difference between two groups was tested by a chi-square test or Fisher's exact test. P-value < 0.05 was considered statistically significant.

Weighted co-expression network construction and key module analysis
We first performed the initial quality assessment using the average linkage method. Two outlier samples were removed after clustering. The remaining 339 cancer samples and 44 control samples with clinical information of LOI were used in subsequent analysis. The 2,601 variant genes in samples were with the largest variance via average linkage hierarchical clustering. To estabilish a scale-free network, the scale index (Fig. 1a) and mean connectivity (Fig. 1b, c) were calculated. We found that the power value of β = 7 (scale free R 2 = 0.85) was optimum to perform further analysis (Fig. 1d).
Moreover, genes with similar expression patterns could be placed into different modules via average linkage clustering. Finally, a total of 10 modules were identified (Fig. 2). The correlations between MEs and LOI trait were explored. As shown in Fig. 3a, the result indicated that 10 co-expression modules were correlated with LOI phenotypes. Figure 3B showed 10 co-expression modules was associated with cancer status, especially turquoise and pink key modules. Then, scatter diagrams of GS for MM vs. LOI status in the turquoise and pink modules were plotted, respectively (Fig. 3c, d), which showed the genes in two modules were significantly related with LOI status. The correlation values were 0.4 (turquoise) and 0.59 (pink), P-values were 1.4 × 10 − 30 (turquoise) and 1.8 × 10 − 8 (pink), which indicated the turquoise and pink modules were high correlations with LOI status.

Enrichment analysis of key co-expression modules
To know the function of genes in the key co-expression modules, GO and KEGG analysis were performed. GO analysis showed that the turquoise module was associated with DNA replication, mitotic nuclear division, nuclear division and DNA − dependent DNA replication. KEGG analysis found that the turquoise module was associated with cell cycle, DNA replication, mismatch repair and p53 signaling pathway (Fig. 4a, b). Similarly, GO analysis indicated that the pink module was involved in peptidase activity, negative regulation of proteolysis, negative regulation of peptidase activity and negative regulation of endopeptidase activity (Fig. 4c). These results indicated that the turquoise module and pink modules played an important role in LOI of HNSCC.

PPI analysis and hub genes
To know the hub genes in the key modules, PPI analysis of STRING database were analyzed. 89 genes in turquoise module and 38 genes in pink module were screened as candidate hub genes ( Fig. 5 and Additional file 4: Table S4 and Additional file 5: Table S5). In addition, connect threshold ( 6)  prognostic differences were screened out as characteristic genes by screening P < 0.05 (Fig. 6).
To further validate these results, hierarchical clustering was conducted on the TCGA cohort using these seven genes, the heatmap showed patients of HNSCC were clustered into two groups: cluster 1 (n = 82) and cluster 2 (n = 248) (Fig. 7a). Compared with cluster 2, Cluster 1 had better survival probability (Fig. 7b). These results indicated that the seven genes can differentiate different LOI subtypes and seven genes maybe serve as the biomarker for LOI of HNSCC.

Identification of small molecular agents
To know the small molecular agents targeted LOI, we searched the DrugBank database and found that five drug-module interactions in turquoise module (XL844, AT7519, AT9283, Alvocidib and Nelarabine) and three drug-module interactions in pink module (Benzamidine, L-Glutamine and Zinc) may be used to target LOI ( Table 2). These results indicated that XL844, AT7519, AT9283, Alvocidib, Nelarabine, Benzamidine, L-Glutamine and Zinc may provide us new approach to block metastasis of lymph nodes.

Discussion
Metastasis is the main cause of HNSCC treatment failure [18]. Nodal metastatic disease is considered as a independent factor for poor survival in HNSCC [19,20,21]. Several clinicopathologic parameters have validated as associated with nodal metastasis, such as tumor size [9], tumor depth [22], tumor differentiation [23], histological grade [24] and LOI [4]. In this study, we performed a systematic analysis of LOI in HNSCC from the molecular level to the clinical level by comprehensive integrating genomic analysis. We found a key genomic subtype associated with lymphovascular invasion in HNSCC and small molecular agents including XL844, AT7519, AT9283, Alvocidib, Nelarabine, Benzamidine, L-Glutamine and Zinc may provide us new approach to block LOI.
With the application of sequencing techniques, genomic studies have transformed from individual genes aberrant expression to systematically integrating genomics mutation and chromatin remodel study. However, the molecular mechanisms of LOI are still unclear. TCGA database has listed several genomic landscape study of HNSCC from worldwide, which provides us a chance to integrate genomics data and understand the molecular changes of LOI. In this study, we conducted a coexpression network module of HNSCC and found the turquoise and pink modules were significantly associated with LOI. Function enrichment analysis indicated that the key gene modules function was mainly involved in the BPs of DNA replication, mitotic nuclear division, nuclear division and DNA − dependent DNA replication. Pathway enrichment analysis validated that the genes in the key module were enriched in cell cycle, DNA replication, mismatch repair and p53 signaling pathway, which indicated that the key gene modules play an important role in LOI in HNSCC.
It is essential to early diagnosis of LOI, as HNSCC patients with LOI may require more timely treatment [25,26]. Despite the development and application of MRI and PET-CT to assess LOI in HNSCC, the detection rate of early-stage LOI is still rarely low [27]. In this study, the hub genes in the key modules related to LOI were screened and hierarchical clustering showed that subtypes including seven genes (CCNA2, CNFN, DEPDC1, KIF18, KIF23, PRC1 and TTK1) could differentiate the risk of LOI.
And the subtypes of seven genes would be potential biomarkers of LOI.
However, the targeted treatment for LOI is lacking and unreliable. DrugBank provides comprehensive molecular information about drugs and their targets for the treatment of LOI. Based on interactions of drug and key modules, we found 8 small molecular agents that could target LOI, including Benzamidine, L-Glutamine, Zinc, XL844, AT7519, AT9283, Alvocidib and Nelarabine. A recent study found that AT7519 and Alvocidib, a cyclin-dependent kinase inhibitor, has been shown to have potential for anticancer effects for cancer treatment [28,29,30,31,32]. XL844, a specific inhibitor of Chk1 and Chk2 kinase, has been found to effectively sensitize cancer cells to induce cell cycle arrest [33]. These results indicated the 8 small molecular agents may target hub genes of hub genes and can treat LOI of HNSCC.

Conclusions
In summary, the current study was intended to identify LOI with comprehensive bioinformatics analysis and we found a key subtype including CCNA2, CNFN, DEPDC1, KIF18, KIF23, PRC1 and TTK1 may be potential biomarkers and predict progression in LOI of HNSCC. Benzamidine, L-Glutamine, Zinc, XL844, AT7519, AT9283, Alvocidib and Nelarabine may be a candidate drug for treating LOI.

Consent for publication
Not applicable.

Availability of data and materials
All of the data of this study has been downloaded from TCGA database

Competing interest
The authors declare that they have no competing interests.     The prognostic value of seven hub genes in HNSCC. Ten years survival analysis of seven hub genes (CCNA2, CNFN, DEPDC1, KIF18, KIF23, PRC1 and TTK1).