1. Responder and non-responder in LUSC
We conducted the study as described in the flow chart (Fig. 1). The RNA-sequencing data of 502 LUSC patients were extracted from TCGA (Supplementary Tables 1A, 1B, and 1C). In the TIDE algorithm prediction of immunotherapy response, the 502 LUSC patients were divided into responder (n = 147) and non-responder (n = 355) groups. (Fig. 2A, Supplementary Tables 2). We then used CIBERSORT to calculate the abundance of cell infiltration in the tumor microenvironment for each LUSC RNA expression data (Supplementary Table 3), such as B cells naïve, B cells memory, plasma cells, T cells CD8, T cells CD4 memory resting, T cells CD4 memory activated et al. In addition, we also found significant differences in the distribution of macrophages M2, plasma cells, and T cells follicular helper in LUSC between the two groups (Fig. 2B). The ESTIMATE R package calculated the ESTIMATE scores of the two groups, and the results showed that the ESTIMATE scores of the non-responding group were higher than the responding group (p < 0.0001, Fig. 2C), indicating higher tumor purity and distribution of infiltrating stromal and immune cells in the responding group. Furthermore, the responder group showed a significantly higher TMB score (p = 0.0186, Fig. 2D, Supplementary Tables 4).
2. The identification of DEGs
To explore the potential biological feature under the responder group and non-responder group, a total of 44 DEGs were obtained between the two subtypes in LUSC (Supplementary Table 5). Four upregulated DEGs (including PLA2G12B, ANKS4B, HMGCS2, and C18orf26) and forty downregulated DEGs (including MYH13, LOC284688, C21orf96, RXFP3, CLDN19, OPN4, ADCYAP1R1, C1orf68, OBP2A, DIRC1, FGF5, MMP20, FAM71E2, CDH4, HTR1B, KCNA1, IGFL3, SPATA8, SPINK6, INSRR, IGF2AS, ZCCHC5, LOC100130386, TRIML2, EFNA2, C9orf11, CGB5, CCL27, GUCA1A, KLHL34, C3orf20, FOXB1, GALNT9, FAM19A3, C9orf144B, CASP14, NTN3, RPL3L, MAFA, and IRGM) were revealed (Fig. 3A).
3. Enrichment analysis of DEGs
Then GO, and KEGG enrichment analyses were performed on these DEGs. We got 20 enrichments according to BP and 18 enrichments according to CC (Supplementary Table 6). The less p -value and more significant enrichment were shown with the greater node size. The same color indicated the same function group. As shown in Fig. 3B, DEGs were mainly enriched in response to lipid (ADCYAP1R1, DYNAP, HTR1B, IRGM, and TRIML2), axonogenesis (CDH4, EFNA2, FOXB1, and NTN3), positive regulation of phosphorus metabolic process (ADCYAP1R1, CLDN19, DYNAP, GUCA1A, INSRR, IRGM), organic cyclic compound metabolic process (FOXB1, GUCA1A, HMGCS2, and MAFA), sensory perception (CLDN19, GUCA1A, KCNA1, OBP2A, and OPN4), nucleus (CASP14, CLDN19, FAM205A, FOXB1, MAFA, RTL3, and TRIML2), integral component of the plasma membrane (ADCYAP1R1, CDH4, HTR1B, INSRR, KCNA1, OPN4, and RXFP3), intracellular (ADCYAP1R1, C3orf20, CASP14, CLDN19, DYNAP, EQTN, FAM205A, FOXB1, GALNT9, HMGCS2, HTR1B, IRGM, KCNA1, MAFA, MYH13, NTN3, RPL3L, RTL3, and TRIML2). KEGG enrichment analysis obtained 19 typical pathways, including CDO in myogenesis, myogenesis, GPCR ligand binding, Class A/1 (Rhodopsin-like receptors), developmental biology, G alpha (i) signaling events, cell-cell junction organization, Ras signaling pathway, formation of the cornified envelope, cell junction organization, neuroactive ligand-receptor interaction, GPCR downstream signaling, signaling by GPCR, Keratinization, cell-cell communication, cell adhesion molecules (CAMs), axon guidance, tight junction, peptide ligand-binding receptors (Fig. 3C and Supplementary Table 7).
4. Regulatory network of Transcription factors for DEGs
A total of 318 TF gene expressions from the TCGA database were matched with the Cistrome Cancer database (Supplementary Table 8). The associations between TFs and DEGs were based on the correlation coefficient filter > 0.4 and p-value filter < 0.05. After the final screening, 19 pairs of TFs-DEGs were identified based on co-expression analysis (Supplementary Table 9 and Fig. 3D). Including 17 positive correlation coefficients (EBF1 and HTR1B, EBF1 and ZCCHC5, ETS1 and ZCCHC5, FLI1 and ZCCHC5, FOXA2 and PLA2G12B, FOXP3 and ZCCHC5, GATA3 and CASP14, GATA3 and FAM71E2, HNF1B and PLA2G12B, HNF4A and ANKS4B, MEF2C and ZCCHC5, NFE2 and PLA2G12B, RBP2 and PLA2G12B, RUNX1T1 and HTR1B, RUNX1T1 and ZCCHC5, SNAI2 and RPL3L, TCF21 and ZCCHC5) and two negative correlation coefficients (FOXA1 and C1orf68, PPARG and FAM71E2).
5. Overall survival analysis of DEGs between responder and non-responder groups in LUSC
The Kaplan-Meier plot analysis was performed to clarify the relation between the DEGs and LUSC overall survival. Among the 44 immune-related DEGs, 13 were significantly associated with overall survival of LUSC (p < 0.05). including ZCCHC5 (HR = 0.7, p = 0.023), FAM71E2 (HR = 1.59, p = 0.0079), DIRC1 (HR = 1.59, p = 0.0066), INSSR (HR = 1.35, p = 0.016), C1orf68 (HR = 1.39, p = 0.0086), EFNA2 (HR = 1.53, p = 0.0093), ESkine (HR = 1.14, p = 0.28), GUCA (HR = 1.51, p = 0.0018), HTR1B (HR = 1.31, p = 0.025), KLHL34 (HR = 1.41, p = 0.031), OBP2A (HR = 1.4, p = 0.04), RXFP3 (HR = 1.42, p = 0.0065), and SPINK6 (HR = 0.71, p = 0.048) in LUSC (Fig. 4).
6. Construction of prognostic model for LUSC based on Lasso analysis
We obtained the DEGs between the responder and non-responder groups and discovered their relation with LUSC overall survival from the above analysis. After the selection of lasso regression, when log (lambda) was between − 3 and − 4, 11 out of 44 DEGs were defined as an ideal element of the immune-related DEGs signature model, including MMP20, C18orf26, CASP14, FAM71E2, OPN4, CGB5, DIRC1, C9orf11, SPATA8, C9orf144B, and ZCCHC5(Fig. 5A and B). The risk score for predicting prognostic risk in LUSC patients was then calculated with the following formula: Risk score = (-0.08×Exp MMP20) + (-0.039×Exp C18orf26) + (0.016×Exp CASP14) + (-0.005×Exp FAM71E2) + (-0.145×Exp OPN4) + (0.0952×Exp CGB5) + (-0.167×Exp DIRC1) + (-0.163×Exp C9orf11) + (0.0249×Exp SPATA8) + (0.0805×Exp C9orf144B) + (0.231×Exp ZCCHC5). Subsequently, the LUSC patients were divided into high (n = 247) and low-risk score (n = 247) groups according to the mean value of risk scores (Supplementary Table 10). Figure 5C and D revealed that the low-risk score group obtained a significantly more favorable overall survival than the high-risk score group (Fig. 5C and D). The heatmap showed the different expressions of the identified genes in the prognosis model between high and low-risk score groups (Fig. 5E).
Furthermore, the prediction value of this signature model was evaluated by PCA and ROC. The results showed that the AUC is 0.67 and all LUSC samples can be well divided into high-risk and low-risk groups (Fig. 6A and B). In addition, we assessed the distribution of immune cells between high and low-risk score groups in LUSC, B cells memory, B cells naïve, neutrophils, and NK cells activated were significantly differentially expressed. (Fig. 6C).
7. Validation of the Prognostic Signature
The clinical data were obtained from the TCGA database (Supplementary Table 11), including gender (male and female), age (aged < = 65 and > 65), anatomic subdivision (L-lower, L- middle, L-upper, R-lower, R-middle, and R-upper), follow-up outcome (partial remission/response, complete remission/response, progressive disease, and stable disease), number pack-years smoked (packs from 0.15 to 240), pathologic T (tumor size, including T1, T2, T3, T4, and TX ), pathologic M (tumor metastasis, including M0, M1, and MX), pathologic N (tumor lymph node metastasis, including N0, N1, N2, and NX), pathologic stage (Stages I, II, III, and IV), person neoplasm cancer status (tumor or tumor-free), radiation therapy (no or yes), targeted molecular therapy (no or yes), and status (alive or dead). The univariate Cox regression analysis revealed that age_at_initial_diagnosis, pathologic_M, pathologic_T, pathologic_stage, cancer status, and risk score were correlated significantly with overall survival (Fig. 6D). The multivariate Cox regression analysis revealed that age_at_initial_diagnosis, pathologic_M, cancer status, and risk score possibly acted as an independent risk factor in LUSC (Fig. 6E). The heatmap showed risk group had a significant association with clinical features, including pathologic T, pathologic M, and pathologic stage (Fig. 7).