Built and verification of prognostic metabolic gene signatures
TCGA database contained 529 patients and 327 metabolic genes (198 upregulated and the largest logFC is CA9; 129 downregulated and the largest logFC is CA6; Table S1) were involved in TCGA‐LIHC to train the prognostic model. Meanwhile, the GSE65858 dataset contained 270 HNSCC tissue samples. The univariate Cox regression shown 51 survival‐related genes and we found the most high-risk gene was LCLAT1 (HR=1.144, 95% CI=1.044~1.251); the most low-risk gene was CHDH (HR=0.580, 95% CI=0.400~0.839) Table 1. Then Lasso‐penalized Cox analysis found 30 genes to build the prognostic model. The cohorts were then divided into high- and low-risk groups using the median risk score value as a cut-off.
Survival results and multivariate examination
OS analysis demonstrated that HNSCC with high risk group had a more terrible prognosis than that with low risk group (P<0.01) Fig. 1. Detailed prognostic signature information of HNSCC groups is visualized in Fig. 2, which indicated that the mortality rate was higher for HNSCC patients in the high-risk group and related to the lower OS. The result of univariate and multivariate Cox regression analysis showed that our prognostic model is an independent prognostic factor for OS Fig. 3. We next study whether metabolic-related genes patterns could serve as an early predictor of incidence of HNSCC by ROC curve and the model demonstrated an AUC of 0.745 in TCGA and 0.618 in GEO. Taken together, these results indicate that the prognostic model has moderate sensitivity and specificity. Meanwhile, the predictor of clinicopathological in HNSCC was also analyzed and we found the AUC for age, gender, grade, stage, T, M and N were 0.520, 0.495, 0.568, 0.606, 0.577, 0.476 and 0.673, respectively in TCGA and the AUC for age, gender, stage, T, M, N, smoking and HPV16-pos were 0.599, 0.531, 0.622, 0.606, 0.616, 0.550, 0.614, 0.519 and 0.397 respectively in GEO Fig 4.
Constructing and validating a predictive nomogram in TCGA and in GEO cohort
Nomogram was construct by involving clinicopathological and the prognostic model. Through the LASSO logistic regression algorithm, the most important prediction markers are selected through the training data set, and a strong contribution is made to the final prediction model. The model ultimately included 7 features in TCGA: age, gender, grade, stage, T, M and N and 8 features in TCGA: age, gender, stage, T, M, N, smoking and HPV16-pos Fig 5. Integrating our prognostic model with clinicopathological raise the forecasting sensitivity and specificity for forecasting 1‐, 3‐, and 5‐year OS and bring some net benefit which might useful for clinical management.
Gene set enrichment analyses
Then samples were divided into high‐ and low‐risk groups as training set to distinguish the potential function and elucidate the significant survival difference utilizing GSEA. Annotated gene sets c2.cp.kegg.v6.0.symbols.gmt was selected as the reference gene sets, which includes terms with NOM<0.05. Gene set permutations were executed multiple times for every examination. Great majority the metabolism‐related pathways were enriched in the high‐risk group (such as the metabolism of galactose, nicotinate/nicotinamide and pantothenate/COA biosynthesis or metabolic disease related such as the maturity‐onset diabetes of the young), whereas most of the non-metabolism related pathways were enriched in the low‐risk group (base excision repair, spliceosome, homologous recombination, nucleotide excision, DNA replication) Fig 6 and Table 2.
Online database analysis
We used multidimensional survey ways to explore CA6, CA9, LCLAT1 and CHDH based on thousands of variations in copy numbers or gene expressions in patients whom with HNSCC in detail to provide new insights into the potential functions, expression patterns, molecular mechanisms and distinct prognostic. Just like our results, CA6 were significantly under-expressed, while CA9 were significantly overexpressed in HNSCC cancer both in the TIMER database (Figure 7) Oncomine (Figure 8). In spite of lack in the Oncomine datas, the mRNA expression of LCLAT1 was also obvious overexpressed and CHDH under-expression in HNSCC in the TIMER database. Representative protein expressions of CA9, LCLAT1, and CHDH were explored in the HPA database, as shown in Figure 9. However, CA6 cannot be found on the website. CA9 has the most frequent genetic changes (10%), and amplified mutations are the most common changes (Figure 10). In summary, the abnormal expression of these genes has been further verified in HNSCC, and genetic changes may help explain the abnormal expression of these genes to a certain extent.