DOI: https://doi.org/10.21203/rs.2.23081/v1
Background: Head and Neck Squamous Cell Carcinoma (HNSCC) is a common head and neck malignancy that originates from lips, mouth, paranasal sinuses, oropharynx, larynx, nasopharynx, and other pharyngeal cancers. Since high-throughput expression data is now available, it is feasible to use global gene expression data to analyze the relationship between metabolic-related gene expression and clinical outcomes in HNSCC patients.
Method: In this study, we used RNA sequencing (RNA-seq) data from cancer genomic maps (TCGA) and validated in the GEO dataset to shine the dysfunctional metabolic microenvironment and given potential biomarkers for metabolic therapy.
Results: TCGA database contained 529 patients and 327 metabolic genes (198 upregulated and the largest logFC is CA9; 129 downregulated and the largest logFC is CA6. Cox regression model found 51 prognosis‐related genes and we found the most high-risk gene was LCLAT1 (HR=1.144, 95% CI=1.044~1.251); the most low-risk gene was CHDH (HR=0.580, 95% CI=0.400~0.839). We next study whether metabolic-related genes patterns could serve as an early predictor of incidence of HNSCC by ROC curve and the model demonstrated an AUC of 0.745 in TCGA and 0.618 in GEO. Meanwhile, the predictor of clinicopathological in HNSCC was also analyzed and we found the AUC for age, gender, grade, stage, T, M and N were 0.520, 0.495, 0.568, 0.606, 0.577, 0.476 and 0.673, respectively in TCGA and the AUC for age, gender, stage, T, M, N, smoking and HPV16-pos were 0.599, 0.531, 0.622, 0.606, 0.616, 0.550, 0.614, 0.519 and 0.397 respectively in GEO.
Conclusion: Our research found a novel metabolic signature gene for HNSCC prognosis prediction based on the TCGA dataset. Our signatures may reflect metabolic microenvironment disorders and provide useful biomarkers for metabolic therapy and response treatment prediction.
Head and Neck Squamous Cell Carcinoma (HNSCC) is a head and neck malignancy that originates from lips, mouth, paranasal sinuses, larynx, nasopharynx, and other pharyngeal cancers1. As the sixth most common type of malignant tumors, with more than 655,000 new cases and 90,000 deaths every year2. Currently, smoking, drinking, and human papillomavirus (HPV) infection are considered risk factors for HNSCC occurrence and prognosis3. Unfortunately, the 5-year survival rate is still below 50% due to the usual lack of early symptoms when HNSCC is detected early, while the survival rate is reduced to 35% due to local recurrence and metastasis4. The occurrence and development of HNSCC is a complex process including multiple molecules. LASP1, a chaperone of heat shock protein family A member 1A (HSPA1A) can lead to colony formation, cell proliferation, invasion and cell cycle G2/M phase transition in HNSCC5. Kim et al reported that mechanistic and functional roles of CXCR7 as a key regulator of oncogenic TGF-β1/Smad2/3 signaling in HNSCC. In addition, Hsu et al6 notes the driver-oncogene role of atypical cadherin 1 (FAT1) in mediation of proliferation, cell-death evasion, chemoresistance and oncogenicity in oral squamous cell carcinoma (OSCC). As for the histological type of HNSCC and multiple anatomical sites, tumor markers vary widely. Therefore, it is possible to identify more valuable HNSCC drug targets by screening for changes in gene function networks linked to tumor formation and progression.
Loss of tumors suppressors and activation of oncogenes contribute to metabolic reprogramming in cancer, leading to improve nutrient uptake to provide energetic and biosynthetic pathways8. In the 1920s, Otto Warburg first reported that tumors took up distinctly more levels of glucose when compared with normal tissues indicating that these cells were shuttling glucose via the glycolytic fermentation pathway9. Recent studies have shown that immune cells have unique metabolic characteristics that affect their immune function. For example, macrophage polarization is associated with unique metabolic characteristics related to iron metabolism, energy metabolism, and lipid metabolism10,11. More and more studies have investigated the prognostic role of these metabolic genes in cancer, but the role and mechanism of metabolism are still unclear. Hu et al12 found that mutationally activated KRAS strikingly increased the glutathione biosynthesis and intracellular cystine level in lung adenocarcinoma. Yoo et al found that the SLC1A5 variant is a mitochondrial glutamine transporter used for metabolic reprogramming of pancreatic cancer and the knockout and overexpression of the SLC1A5 variant alters the growth of cancer cells and tumors, thereby supporting carcinogenesis13. However, there is no systematic assessment of metabolic-related genes and predicting the overall survival (OS) of HNSCC patients or the characteristics of response to immunotherapy. Since high-throughput expression data is now accessible, it is realistic to use global gene expression data to analyze the relationship between metabolic-related gene expression and clinical outcomes in HNSCC patients. In this study, we used RNA sequencing (RNA-seq) data from cancer genomic maps (TCGA) and validated in the Gene Expression Omnibus (GEO) dataset to show the dysfunctional metabolic microenvironment and given useful biomarkers for metabolic therapy.
The targeted records of RNA-seq data in LUAD patients was get from TCGA (http://www. cancergenome.nih.gov), a web-based resource, provides a user-friendly interface for detailed views of alternative mRNA data. validated dataset downloads from GSE65858 data set from the GEO database. Through the metabolic pathways included in the Gene set enrichment analysis (GSEA) database, all metabolic related genes are extracted. One millionth transcript normalization and log2 transformation were used for expression profiling. The selection of metabolic genes for succeeding prognostic analysis not only satisfies consistent expression patterns in the TCGA cohort, but is also listed in the GSE65858 data set.
Lasso-penalized Cox regression and univariate Cox regression analysis were selected to explore the prognosis metabolic-related genes and build the gene signature14. The prognostic gene signature was demonstrated as risk score = (CoefficientmRNA1 × expression of mRNA1) + (CoefficientmRNA2 × expression of mRNA2) + ⋯ + (CoefficientmRNAn × expression mRNAn). Related clinical data of HNSCC patients were also downloaded and enrolled in our study. Basis its median number, each parameter has been isolated low-risk (< median number) groups and high-risk (≥ median number). We used Kaplan–Meier survival analysis to analyze the survival significance between the study and control groups.
Nomograms are proverbially used for cancer prognosis, primarily owing to their ability to decrease statistical predictive models into a single numerical assess of the probability of an event, such as recurrence or death, that is tailored to the profile of an individual patient15. A time-dependent receiver operating characteristic (ROC) curve was carried out to assess the predictive correctness of prognostic signatures in patients with HNSCC. Univariate and multivariate Cox regression analyses was used to analyze the relationship between immune-related genes clinicopathological parameters.
Validated dataset downloaded from the GSE65858 dataset in the GEO database. We not only assessed the risk scores of patients with genetic characteristics, but also calculated the ROC analysis, Kaplan-Meier analysis, and the construction and verification of the nomogram were the same as the TCGA-HNSCC cohort. To reveal the underlying basic approach to gene signatures in the Kyoto Encyclopedia of Genes and Genomes (KEGG), GSEA was used to look up the rich terms in C2 in the TCGA-HNSCC or GSE65858 cohort. P < 0.05 and false discovery rate q < 0.25 were statistically significant. The mRNA level (Oncomine database and TIMER database) and protein level (The Human Protein Atlas database) further verified the expression of prognostic genes in gene signatures. We used CBioportal to study genetic alterations in gene signatures.
Use list-by-list delete techniques to handle missing data, and if any single value is missing, that data will exclude the entire sample from the survey. Benjamini–Hochberg's method of converting P values to FDR. Data were checked using R (version 3.5.3) and R Bioconductor software packages. We use Perl language for data matrix and data processing based on P less than 0.5.
Built and verification of prognostic metabolic gene signatures
TCGA database contained 529 patients and 327 metabolic genes (198 upregulated and the largest logFC is CA9; 129 downregulated and the largest logFC is CA6; Table S1) were involved in TCGA‐LIHC to train the prognostic model. Meanwhile, the GSE65858 dataset contained 270 HNSCC tissue samples. The univariate Cox regression shown 51 survival‐related genes and we found the most high-risk gene was LCLAT1 (HR=1.144, 95% CI=1.044~1.251); the most low-risk gene was CHDH (HR=0.580, 95% CI=0.400~0.839) Table 1. Then Lasso‐penalized Cox analysis found 30 genes to build the prognostic model. The cohorts were then divided into high- and low-risk groups using the median risk score value as a cut-off.
Survival results and multivariate examination
OS analysis demonstrated that HNSCC with high risk group had a more terrible prognosis than that with low risk group (P<0.01) Fig. 1. Detailed prognostic signature information of HNSCC groups is visualized in Fig. 2, which indicated that the mortality rate was higher for HNSCC patients in the high-risk group and related to the lower OS. The result of univariate and multivariate Cox regression analysis showed that our prognostic model is an independent prognostic factor for OS Fig. 3. We next study whether metabolic-related genes patterns could serve as an early predictor of incidence of HNSCC by ROC curve and the model demonstrated an AUC of 0.745 in TCGA and 0.618 in GEO. Taken together, these results indicate that the prognostic model has moderate sensitivity and specificity. Meanwhile, the predictor of clinicopathological in HNSCC was also analyzed and we found the AUC for age, gender, grade, stage, T, M and N were 0.520, 0.495, 0.568, 0.606, 0.577, 0.476 and 0.673, respectively in TCGA and the AUC for age, gender, stage, T, M, N, smoking and HPV16-pos were 0.599, 0.531, 0.622, 0.606, 0.616, 0.550, 0.614, 0.519 and 0.397 respectively in GEO Fig 4.
Constructing and validating a predictive nomogram in TCGA and in GEO cohort
Nomogram was construct by involving clinicopathological and the prognostic model. Through the LASSO logistic regression algorithm, the most important prediction markers are selected through the training data set, and a strong contribution is made to the final prediction model. The model ultimately included 7 features in TCGA: age, gender, grade, stage, T, M and N and 8 features in TCGA: age, gender, stage, T, M, N, smoking and HPV16-pos Fig 5. Integrating our prognostic model with clinicopathological raise the forecasting sensitivity and specificity for forecasting 1‐, 3‐, and 5‐year OS and bring some net benefit which might useful for clinical management.
Gene set enrichment analyses
Then samples were divided into high‐ and low‐risk groups as training set to distinguish the potential function and elucidate the significant survival difference utilizing GSEA. Annotated gene sets c2.cp.kegg.v6.0.symbols.gmt was selected as the reference gene sets, which includes terms with NOM<0.05. Gene set permutations were executed multiple times for every examination. Great majority the metabolism‐related pathways were enriched in the high‐risk group (such as the metabolism of galactose, nicotinate/nicotinamide and pantothenate/COA biosynthesis or metabolic disease related such as the maturity‐onset diabetes of the young), whereas most of the non-metabolism related pathways were enriched in the low‐risk group (base excision repair, spliceosome, homologous recombination, nucleotide excision, DNA replication) Fig 6 and Table 2.
Online database analysis
We used multidimensional survey ways to explore CA6, CA9, LCLAT1 and CHDH based on thousands of variations in copy numbers or gene expressions in patients whom with HNSCC in detail to provide new insights into the potential functions, expression patterns, molecular mechanisms and distinct prognostic. Just like our results, CA6 were significantly under-expressed, while CA9 were significantly overexpressed in HNSCC cancer both in the TIMER database (Figure 7) Oncomine (Figure 8). In spite of lack in the Oncomine datas, the mRNA expression of LCLAT1 was also obvious overexpressed and CHDH under-expression in HNSCC in the TIMER database. Representative protein expressions of CA9, LCLAT1, and CHDH were explored in the HPA database, as shown in Figure 9. However, CA6 cannot be found on the website. CA9 has the most frequent genetic changes (10%), and amplified mutations are the most common changes (Figure 10). In summary, the abnormal expression of these genes has been further verified in HNSCC, and genetic changes may help explain the abnormal expression of these genes to a certain extent.
Proliferating cancer cells must maintain sufficient energy and a library of metabolic intermediates to build the macromolecules required for proliferation, including DNA, proteins, and lipids16. Because the metabolic profile of tumor cells distinguishes them from normal cells and is essential for their growth and survival, metabolic signaling pathways have become ideal targets for therapeutic intervention in cancer patients. In this study, we identified a novel and effective metabolic-related prognostic signature gene based on the TCGA dataset, and verified its validity in the GSE65858 dataset, indicating that the signature has strong prognostic value. Our signature may represent the metabolic status of patients with HNSCC and provide potential biomarkers and targets for metabolic signaling pathways therapeutic intervention.
In our study, we downloaded transcriptome data from the TCGA database and verification in GEO dataset, metabolic-related genes extract from the GSEA metabolic signaling pathways. We first study the connections between differentially expressed of RNA sequencing, immune-related genes and transcription factors in LUAD patients. Univariate Cox regression model found 51 survival-related genes and we found the most high-risk gene was LCLAT1, the most low-risk gene was CHDH. Lysocardiolipin acyltransferase 1 (LCLAT1, also known as LYCAT; AGPAT8; ALCAT1) whichis an Acyl-CoA:lyso-CL acyltransferase and an isoform of MLCLAT117. Cardiolipin (CL) types of polyunsaturated fatty acids, especially DHA (C22: 6n3), increased in ALCAT1-expressing cells, while C16-C18 fatty acids decreased significantly18. A recent study found that ALCAT1 is critical for coupling mitochondrial respiration and metabolic plasticity. Wang et al reported that forced expression of ALCAT1 in primary hepatocytes led to multiple defects including steatosis, defective autophagy, and mitochondrial dysfunction20. Until now, there was no study about the role of ALCAT1 in tumors, thus we suppose ALCAT1 plays a regulatory role in cardiolipin remodeling in response to oxidative stress and stimulate mitochondrial activity in HNSCC cancers and needs more relate studies to conduct. Choline dehydrogenase (CHDH) encoded a choline dehydrogenase that localizes to the mitochondrion and variations in this gene can affect susceptibility to choline deficiency. CHDH strongly predicted clinical outcome in breast cancer patients receiving tamoxifen monotherapy21. Choline is an essential nutrient required for methyl group metabolism and CHDH associated with an increased risk of breast cancer22. While there was no study focus on the role of CHDH in HNSCC and more researches are needed.
We next study whether metabolic-related genes patterns could serve as an early predictor of incidence of HNSCC by ROC curve and the model demonstrated an AUC of 0.745 in TCGA and 0.618 in GEO. Integrating our prognostic model with clinicopathological raise the predicting sensitivity and specificity for predicting 1-, 3‐, and 5‐year OS and bring some net benefit which might help clinical management. Then we try to distinguish the potential function and elucidate the significant survival difference utilizing GSEA. Great majority of the metabolism‐related pathways were enriched in the high‐risk group (such as the metabolism of galactose, nicotinate/nicotinamide and pantothenate/COA biosynthesis or metabolic disease related such as the maturity‐onset diabetes of the young), whereas most of the non-metabolism related pathways were enriched in the low‐risk group (base excision repair, spliceosome, homologous recombination, nucleotide excision, DNA replication). Galactose is essential for human metabolism and plays a definitive role in energy transfer and galactosylation of complex molecules. Nicotinamide adenine dinucleotide (NAD) plays a central role in energy metabolism and integrates cell metabolism with signaling and gene expression23. NAD biosynthesis is dependent on nicotinamide/nicotinate single nucleotide adenylate transferase24. From this perspective, high-risk patients may benefit from metabolic therapy, while low-risk patients may benefit from non-metabolic therapy. However, much effort is needed to study the relationship between signatures and metabolic microenvironments and metabolic therapies. On the other hand, the results provide a promising direction for elucidating the underlying molecular mechanisms of signatures. In conclusion, our signatures may reflect metabolic microenvironment disorders and provide potential biomarkers for metabolic therapy and treatment response prediction.
TCGA database contained 529 patients and 327 metabolic genes (198 upregulated and the largest logFC is CA9; 129 downregulated and the largest logFC is CA6. Carbonic anhydrases (CAs) are a large class of zinc metal enzymes that catalyze the reversible hydration of carbon dioxide. They are involved in various biological processes, including bone resorption, respiration, calcification, and acid-base balance. Study found that pancreatic ductal adenocarcinomas (PDACs) cells that express activated KRAS increase expression of CA9, via stabilization of hypoxia-inducible factor 1 subunit alpha (HIF1A) and HIF2A, to regulate pH and glycolysis25. Similarly, CA9 is an independent prognostic factor for OSCC patients and therefore a potential therapeutic target26. Carbonic anhydrase 6 (CA6, also known as CA-VI; GUSTIN) encodes several isoenzymes of carbonic anhydrase, which are only found in salivary glands. Saliva and proteins may play a role in the reversible hydration of carbon dioxide. CA6 is a specific marker of salivary gland serous acinar cells and acinar cell carcinoma (AciCC). CA6 has the same sensitivity and specificity as DOG1 in the differential diagnosis of AciCC and breast analogs (MASC)27. However, the role of CA9 and CA6 in human HNSCC prognostic remains to be elucidated. Thus, carbonic anhydrases may help us understand the protein disease pathogenesis and progression in future study. Our study identified a novel metabolic signature gene for HNSCC prognosis prediction base on TCGA data set. Our signatures may reflect metabolic microenvironment disorders and provide potential biomarkers for metabolic therapy and treatment response prediction. However, signatures must be verified in more independent cohorts and functional experiments of predictive metabolic genes. At the same time, our study has limitations. On the one hand, our results have not been verified in clinical samples, and on the other hand, because of the relatively small number of patients, our results cannot provide accurate clinical data. We will carry out a lot of related research in the follow-up work, and finally translate it into the development of new strategies for precision cancer medicine.
Our research found a novel metabolic signature gene for HNSCC prognosis prediction based on the TCGA dataset. Our signatures may reflect metabolic microenvironment disorders and provide useful biomarkers for metabolic therapy and response treatment prediction.
HNSCC:head and neck squamous cell carcinoma; TCGA:cancer genome atlas; HPV:human papillomavirus; ROC:receiver operating characteristic; GSEA:Gene set enrichment analysis; GEO:Gene Expression Omnibus; OS:over survival.
Ethics approval and consent to participate
Not applicable.
Consent to public
Not applicable.
Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
Competing interests
The authors declare that they have no competing interests.
Funding
This work is not supported by grants.
Authors’ contributions
W.Z.H. and Z.W. designed and analyzed the research study; W.Z.H. wrote and revised the manuscript, W.Z.H. collected the data and all authors have read and approved the manuscript.
Acknowledgements
Not applicable
Table 1. Prognostic related metabolic genes, HR> 1 is a high-risk gene, and HR <1 is a low-risk gene.
Id |
HR |
HR.95L |
HR.95H |
pvalue |
HEXB |
1.038 |
1.006 |
1.071 |
0.019 |
ACAT1 |
1.087 |
1.034 |
1.143 |
0.001 |
P4HA1 |
1.019 |
1.007 |
1.030 |
0.001 |
GNPDA1 |
1.065 |
1.013 |
1.119 |
0.013 |
POLE2 |
0.882 |
0.791 |
0.982 |
0.023 |
ACACB |
0.649 |
0.445 |
0.945 |
0.024 |
SMS |
1.007 |
1.002 |
1.011 |
0.001 |
AGPS |
1.053 |
1.010 |
1.097 |
0.014 |
PYGL |
1.009 |
1.003 |
1.014 |
0.003 |
ACAA1 |
0.890 |
0.802 |
0.988 |
0.029 |
MTHFD2 |
1.035 |
1.017 |
1.052 |
4.745 |
PLCB3 |
1.027 |
1.008 |
1.045 |
0.005 |
POLE |
0.857 |
0.764 |
0.960 |
0.008 |
POLD2 |
1.013 |
1.003 |
1.022 |
0.006 |
MINPP1 |
1.082 |
1.015 |
1.153 |
0.016 |
KYNU |
1.074 |
1.014 |
1.136 |
0.013 |
PIK3C2B |
0.825 |
0.729 |
0.932 |
0.002 |
CHDH |
0.580 |
0.400 |
0.839 |
0.003 |
G6PD |
1.003 |
1.000 |
1.006 |
0.033 |
POLD1 |
0.959 |
0.924 |
0.994 |
0.025 |
PTDSS1 |
1.016 |
1.001 |
1.030 |
0.031 |
ENTPD1 |
0.813 |
0.710 |
0.930 |
0.002 |
LCLAT1 |
1.144 |
1.044 |
1.251 |
0.003 |
PFKP |
1.012 |
1.001 |
1.021 |
0.024 |
PIP4K2A |
0.927 |
0.859 |
0.998 |
0.046 |
DNMT1 |
0.959 |
0.932 |
0.986 |
0.003 |
ADK |
1.040 |
1.017 |
1.061 |
0.001 |
NEU1 |
1.033 |
1.001 |
1.066 |
0.041 |
GATM |
0.897 |
0.827 |
0.972 |
0.008 |
TXNDC12 |
1.022 |
1.000 |
1.043 |
0.046 |
ADA |
1.036 |
1.019 |
1.053 |
2.829e-05 |
PAFAH1B2 |
1.043 |
1.013 |
1.074 |
0.004 |
PLA2G2D |
0.864 |
0.778 |
0.958 |
0.005 |
DGKQ |
0.908 |
0.845 |
0.974 |
0.007 |
NAGK |
0.922 |
0.869 |
0.976 |
0.005 |
FTH1 |
1.002 |
1.000 |
1.002 |
0.010 |
ADH7 |
1.006 |
1.000 |
1.010 |
0.030 |
ACOX3 |
0.897 |
0.830 |
0.969 |
0.005 |
SHMT1 |
0.923 |
0.861 |
0.988 |
0.022 |
ASNS |
1.030 |
1.011 |
1.049 |
0.001 |
HPRT1 |
1.029 |
1.016 |
1.041 |
5.217e-06 |
ATIC |
1.038 |
1.016 |
1.061 |
0.001 |
LDHA |
1.002 |
1.000 |
1.003 |
0.007 |
PRPS1 |
1.035 |
1.014 |
1.055 |
0.001 |
NADSYN1 |
1.037 |
1.001 |
1.074 |
0.040 |
GSTO1 |
1.003 |
1.000 |
1.006 |
0.049 |
TXNRD1 |
1.005 |
1.000 |
1.010 |
0.034 |
RDH11 |
1.029 |
1.006 |
1.052 |
0.012 |
PAICS |
1.024 |
1.005 |
1.043 |
0.010 |
|
|
|
|
|
Table 2. Gene sets enriched in phenotype high and low.
Gene set name |
SIZE |
NES |
NOM P-val |
|
|
KEGG_GALACTOSE_METABOLISM |
23 |
1.62 |
0.010 0.034 0.046 0.031 0.033 0.000 0.002 0.020 0.022 0.013 |
|
|
KEGG_NICOTINATE_AND_NICOTINAMIDE_METABOLISM |
19 |
1.45 |
|
||
KEGG_RENAL_CELL_CARCINOMA |
64 |
1.41 |
|
||
KEGG_PANTOTHENATE_AND_COA_BIOSYNTHESIS |
16 |
1.59 |
|
||
KEGG_MATURITY_ONSET_DIABETES_OF_THE_YOUNG |
25 |
1.53 |
|
||
KEGG_BASE_EXCISION_REPAIR |
33 |
-1.88 |
|
||
KEGG_SPLICEOSOME |
109 |
-1.82 |
|
||
KEGG_HOMOLOGOUS_RECOMBINATION |
24 |
-1.65 |
|
||
KEGG_NUCLEOTIDE_EXCISION_REPAIR |
44 |
-1.64 |
|
||
KEGG_DNA_REPLICATION |
36 |
-1.63 |
|
||
NES: normalized enrichment score; NOM: nominal; Gene sets with NOM P-val<0.05 are considered as significant. |
|
||||