Identification of Cuproptosis-related Differentially Expressed LncRNAs in HNSCC
The complete clinical information and corresponding RNA-Seq data of 501 HNSCC patients were downloaded from the TCGA database, and 16876 lncRNAs and 19938 mRNAs were used to obtain 448 DElncRNAs with adj. P < 0.05 and |logFC| ≥ 1 (Fig. 2A). We obtained 19 validated human cuproptosis-related genes from the literature. Then, the correlation between lncRNAs in the TCGA database and the cuproptosis-related mRNAs was analyzed by the Pearson correlation coefficient method, and 783 CRlncRNAs were identified by Pearson correlation coefficients with P < 0.001 and |R| > 0.4. Finally, we identified 39 cuproptosis-related DElncRNAs as potential prognostic lncRNAs (Fig. 2B).
The LncRNA–mRNA Co-Expression Network
To evaluate the relationship between these 39 cuproptosis-related DElncRNAs and the representative cuproptosis-related genes, we established a co-expression network using Cytoscape. There were 10 mRNAs associated with 39 lncRNAs (P < 0.001, |R| > 0.4; Fig. 2C). A Sankey diagram was established to explore the relationship between cuproptosis-related DElncRNAs and cuproptosis-related genes (Fig. 2D).
Identification of Prognostic Cuproptosis-related Differentially Expressed lncRNAs
In this study, we randomly segregated these HNSCC cases into a training set of 351 cases and a validation set of 150 cases at an approximate ratio of 7:3. The epidemiological and clinical characteristics of each HNSCC patient in the two sets are shown in Table 1. These cuproptosis-related DElncRNAs with prognostic capacity were evaluated using Cox univariate regression analysis of the OS data of HNSCC patients in the training set, and, 8 prognostic cuproptosis-related DElncRNAs in HNSCC were identified (Fig. 3A). Six cuproptosis-related DElncRNAs were “protective” genes, while AC092115.3 and LINC01063 could be treated as “risk” genes.
Table 1
The clinical characteristics of HNSCC patients in the training, validation and overall sets.
Characteristics
|
Overall
|
Validation
|
Training
|
P
|
Age
|
|
|
|
|
≤ 65
|
326(65.07%)
|
93(62.00%)
|
233(66.38%)
|
0.401
|
> 65
|
175(34.93%)
|
57(38.00%)
|
118(33.62%)
|
Gender
|
|
|
|
|
female
|
133(26.55%)
|
35(23.33%)
|
98(27.92%)
|
0.3399
|
male
|
368(73.45%)
|
115(76.67%)
|
253(72.08%)
|
Grade
|
|
|
|
|
G1
|
61(12.18%)
|
18(12.00%)
|
43(12.25%)
|
0.9277
|
G2
|
299(59.68%)
|
87(58.00%)
|
212(60.40%)
|
|
G3
|
119(23.75%)
|
36(24.00%)
|
83(23.65%)
|
|
G4
|
2(0.40%)
|
1(0.67%)
|
1(0.28%)
|
|
unknow
|
20(3.99%)
|
8(5.33%)
|
12(3.42%)
|
|
T classification
|
|
|
|
|
T1
|
47(9.38%)
|
15(10.00%)
|
32(9.12%)
|
0.6357
|
T2
|
151(30.14%)
|
49(32.67%)
|
102(29.06%)
|
|
T3
|
118(23.55%)
|
37(24.67%)
|
81(23.08%)
|
|
T4
|
185(36.93%)
|
49(32.66%)
|
136(38.74%)
|
|
T classification
|
|
|
|
|
N0
|
214(42.71%)
|
65(43.34%)
|
149(42.45%)
|
0.9677
|
N1
|
76(15.17%)
|
21(14.00%)
|
55(15.67%)
|
|
N2
|
198(39.52%)
|
59(39.33%)
|
139(39.60%)
|
|
N3
|
9(1.80%)
|
3(2.00%)
|
6(1.71%)
|
|
unknow
|
4(0.80%)
|
2(1.33%)
|
2(0.57%)
|
|
M classification
|
|
|
|
|
M0
|
486(97.00%)
|
145(96.67%)
|
341(97.15%)
|
0.4521
|
M1
|
4(0.80%)
|
0(0.00%)
|
4(1.14%)
|
|
unknow
|
11(2.20%)
|
5(3.33%)
|
6(1.71%)
|
|
Construction and Validation of a Prognostic Model
To reduce multicollinearity and further identify the 8 prognostic cuproptosis-related DElncRNAs that were significantly correlated with the prognosis of HNSCC patients, through LASSO Cox regression analysis, three cuproptosis-related DElncRNAs (AL109936.2, MIR9-3HG and LINC01063) were used to establish a risk score to predict the OS of HNSCC patients in the TCGA training set. The cvfit and lambda curves are shown in Figs. 3B, C. In this model, the risk score of each patient was determined based on the following formula: Risk score = AL109936.2 * (-0.30836) + MIR9-3HG * (-0.20797) + LINC01063 * 0.29029. Then, the correlation heatmap further summarized the correlations between these 3 CRlncRNAs and 20 representative cuproptosis-related genes (Fig. 3D).
Next, we continued to estimate the prognostic values of the model. The risk score for all HNSCC patients in each set was calculated according to the formula mentioned above and ranked in ascending order. Then, HNSCC patients in each set were divided into high-risk and low-risk groups based on their high and low risk scores in the training set, respectively. Each patient’s survival outcome, risk status and lncRNA expression levels in the training set were shown in Fig. 4A, D and G, and we found that the mortality rate of patients gradually increased with increasing risk scores. Kaplan–Meier survival analysis demonstrated that the OS for the high-risk group was evidently worse than that for the low-risk group in the training set (Fig. 4J). ROC curve analysis was used to measure the sensitivity and specificity of this model and it demonstrated an AUC of 0.646 (training set), which was more than that of the other factors scrutinized demonstrating the robustness of our model in predicting HNSCC survival (Fig. 4M). Moreover, we found that the area under the ROC curves (AUC) of the prognostic model for OS was 0.646 at 1 year, 0.696 at 3 years, and 0.633 at 5 years (Fig. 4P). To better assess its predictive accuracy, the above results were replicated in both the validation set (Fig. 4B, E, H, K, N, Q) and the overall set (Fig. 4C, F, I, K, N, Q). In addition, the results of PCA (Fig. 4S, T, U) also suggested heterogeneity between the high- and low-risk subgroups. These results guarantee the credibility and practicality of our data.
To assess whether the risk score could be used as an independent risk factor for HNSCC, we performed Cox univariate and multivariate regression analyses in combination with clinical factors. Univariate and multivariate analyses of risk scores based on the training set (Fig. 5A, B), the validation set (Fig. 5C, D) and the overall set (Fig. 5E, F) all showed that the risk score of this model could be independent of other clinical factors and effectively predict the survival of patients (P < 0.05). In addition, we constructed a nomogram based on the risk score and clinical factors, which is more practical for predicting the survival of patients in clinical practice (Fig. 6A). We built calibration curves to evaluate the performance of the nomogram, and the results showed good calibration (Fig. 6B).
Gene Set Enrichment Analysis
According to the median risk score, GSEA was conducted to sort out significantly enriched pathways between the high-risk and low-risk groups. The GSEA results showed that there were 43 enriched pathways that were significantly enriched in the low-risk group, while there was no considerable enrichment of pathways in the high-risk group. Among these pathways were immune-associated and cancer-associated pathways, including T-cell receptor signaling pathway, B-cell receptor signaling pathway, Fc epsilon RI signaling pathway, leukocyte transendothelial migration, natural killer cell-mediated cytotoxicity, chemokine signaling pathway, cell adhesion molecules (CAMs), tight junction and apoptosis (Fig. 7).
Analysis of the Tumor Environment and Immune Infiltration
While studying the immune cell infiltration of all HNSCC patients in the TCGA database, our results showed that the main components of the immune environment were T lymphocytes and macrophages as obtained by the CIBERSORT algorithm (Fig. 8A). The stromal score (substrate cells in tumor tissue) and immune score (immune cell infiltration in tumor tissue) were determined with the ESTIMATE algorithm to explore differences in immune cell infiltration between the low- and high-risk groups. Surprisingly, we found differences in the immune score, but not the stromal score or ESTIMATE score (Fig. 8B). Comparing the proportion of immune cells between the high-risk and low-risk groups, we found significant differences in B cells, CD8 + T cells, immature dendritic cells, macrophages, mast cells, follicular helper T cells, T helper cell type 2 (Th2) cells, and tumor infiltrating lymphocytes between the two groups (Fig. 8C). Moreover, the association between immune cells in HNSCC indicated interactions within immune microenvironments (Fig. 8D).
ssGSEA results revealed a conspicuous alteration between the low- and high-risk groups in terms of T-cell functions. We found that there were significant differences in immune function scores including checkpoint, HLA, inflammation promoting, T-cell co-stimulation and Type II IFN response scores between the two groups (Fig. 9A). Subsequently, we checked the expression changes of the immune checkpoint genes in the high-risk and low-risk groups. As shown in Fig. 9B, a total of 31 tested common checkpoint genes displayed significant differences between the two groups. Many tested immunotherapy targets, such as PDCD1 (PD-1), IDO1, LAG3 and CTLA4 were highly expressed in the low-risk group.
Analysis of the Correlation Between the Risk Score and Chemotherapeutics
Cisplatin, cetuximab, docetaxel and paclitaxel are chemotherapy drugs commonly used for the treatment of HNSCC(Pfister et al., 2020), so we attempted to identify correlations between the risk score and the effectiveness of four common anticancer drugs. We found that high-risk patients were more sensitive to cisplatin and cetuximab than low-risk patients, while they were more resistant to docetaxel and paclitaxel than low-risk patients (Fig. 10A, B, C, D). These results can be helpful for the precise treatment of HNSCC.