Construction and Validation of the IRGPI
Table 1 summarizes the clinical characteristics of the patients enrolled in this study. A total of 1179 IRGs were detected by the platforms listed in sTable1 in the Supplement. From these, 694431 IRGPs were constructed, of which 97.4% were excluded. The log-rank test was used to assess the association between the remaining 18396 IRGPs and overall survival in the derivation cohort. The 401 IRGPs with a p-value below 0.01 were entered into a Cox proportional hazards regression model with the least absolute shrinkage and selection operator (LASSO) penalty (Fig 1.a, b). The final 68 IRGPs, comprising 123 unique immune-related genes, and their LASSO coefficients are shown in sTable2 in the Supplement; these IRGPs were used to construct the IRGPI in the derivation data set. Based on time-dependent ROC curve analysis, the optimal cutoff for the IRGPI was 1.32 (sFig2 in the Supplement). Survival curves of the low- and high-risk groups were estimated with the Kaplan–Meier method and compared with the log-rank test in the derivation and validation cohorts (Fig 1.c and e). In the TCGA derivation cohort, the AUC of the time-dependent ROC curves at 1, 3 and 5 years was 0.854, 0.953 and 0.944, respectively (Fig 1.d). In the GSE37745 validation cohort, the AUC of the time-dependent ROC curves at 1, 3 and 5 years was 0.764, 0.728 and 0.710, respectively (Fig 1.f).
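The pair construction and scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the scoring convention (1 when the first gene of a pair is expressed below the second), the 80% filter for near-constant pairs, the strict "greater than" comparison against the cutoff, and all gene names, expression values and coefficients are hypothetical. The text above does not state the exclusion rule or these conventions. Only the pair count and the cutoff of 1.32 come from the text.

```python
from itertools import combinations
from math import comb


def build_irgps(expr):
    """Build pairwise scores from an expression table.

    expr maps gene name -> list of expression values (one per sample).
    Returns a dict mapping (gene_a, gene_b) -> list of 0/1 scores.
    Assumed convention: score is 1 when gene_a < gene_b in that sample.
    """
    pairs = {}
    for a, b in combinations(sorted(expr), 2):
        pairs[(a, b)] = [1 if x < y else 0 for x, y in zip(expr[a], expr[b])]
    return pairs


def drop_uninformative(pairs, max_fraction=0.8):
    """Drop pairs whose score is identical in more than max_fraction of samples.

    The 0.8 threshold is an assumption made for this sketch.
    """
    kept = {}
    for key, scores in pairs.items():
        frac_ones = sum(scores) / len(scores)
        if max(frac_ones, 1 - frac_ones) <= max_fraction:
            kept[key] = scores
    return kept


def irgpi(pair_scores, coefficients):
    """Linear predictor for one sample: sum of coefficient * pair score."""
    return sum(coefficients[p] * pair_scores[p] for p in coefficients)


def risk_group(score, cutoff=1.32):
    """Dichotomize at the cutoff (strict '>' is an assumption)."""
    return "high" if score > cutoff else "low"


# Number of unordered pairs that 1179 genes can form.
print(comb(1179, 2))  # 694431

# Hypothetical toy data: 3 genes measured in 5 samples.
toy_expr = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 1, 4, 3, 6],
    "GENE_C": [9, 9, 9, 9, 9],
}
toy_pairs = drop_uninformative(build_irgps(toy_expr))
print(sorted(toy_pairs))

# Hypothetical coefficient for the single surviving pair.
toy_coef = {("GENE_A", "GENE_B"): 0.5}
first_sample = {p: s[0] for p, s in toy_pairs.items()}
print(risk_group(irgpi(first_sample, toy_coef)))
```

Because each score depends only on the ordering of two genes within one sample, it needs no normalization across platforms, which is the usual motivation for pair-based signatures.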
Validation of the IRGPI as a Prognostic Factor in Early-Stage LUSC
Hazard ratios between the high-risk and low-risk groups defined by the IRGPI are shown in the forest plot (Fig 2). The IRGPI stratified patients with early-stage (stage I and II) LUSC into distinct prognostic groups. In early-stage LUSC, the hazard of death in the high-risk group was 10.51 times that in the low-risk group (HR, 10.51; 95% CI, 6.96-15.86; p<0.001) in the TCGA derivation cohort, 2.26 times (HR, 2.26; 95% CI, 1.2-4.25; p=0.009) in the GSE37745 validation cohort, and 3.2 times (HR, 3.2; 95% CI, 0.98-10.4; p=0.042) in the GSE41271 validation cohort. In stage I LUSC, the hazard of death in the high-risk group was 9.77 times that in the low-risk group (HR, 9.77; 95% CI, 5.86-16.3; p<0.001) in the derivation cohort and 2.33 times (HR, 2.33; 95% CI, 1.11-4.88; p=0.021) in the GSE37745 validation cohort. In stage II LUSC, the IRGPI remained prognostic in the derivation cohort (HR, 11.39; 95% CI, 5.66-22.9; p<0.001) and in the GSE41271 validation cohort (HR, 9.8; 95% CI, 0.98-98.2; p=0.02). Table 2 presents the univariate and multivariate analyses of the IRGPI and clinical characteristics. In the GSE37745 validation cohort, the IRGPI (HR, 2.95; 95% CI, 1.5-5.79; p=0.002) and older age (HR, 1.05 per year increase; 95% CI, 1.01-1.1; p=0.026) were independent risk factors for poor prognosis.
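The relation between a hazard ratio and its 95% CI can be made concrete with a short worked calculation. The sketch below assumes the intervals are Wald intervals, that is, symmetric around log(HR) with half-width 1.96 standard errors. This is an assumption, since the text does not state how the intervals were obtained. Under that assumption the standard error and a Wald p-value can be recovered from the reported numbers. The recovered p-value need not match the reported one, which may come from a different test (for example the log-rank test); that, too, is an assumption. The function name `wald_from_ci` is hypothetical.

```python
from math import erf, log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(hr, lower, upper):
    """Recover (standard error of log HR, z statistic, two-sided p-value).

    Assumes the reported interval is exp(log(HR) +/- Z_95 * SE).
    """
    se = (log(upper) - log(lower)) / (2 * Z_95)
    z = log(hr) / se
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return se, z, p


# Early-stage LUSC estimates quoted in the text above.
reported = {
    "GSE37745": (2.26, 1.2, 4.25),
    "GSE41271": (3.2, 0.98, 10.4),
}
for cohort, (hr, lo, hi) in reported.items():
    se, z, p = wald_from_ci(hr, lo, hi)
    print(f"{cohort}: SE={se:.3f}, z={z:.2f}, Wald p={p:.3f}")
```

An interval whose lower bound is below 1 always gives a Wald p-value above 0.05, so a reported p-value below 0.05 alongside such an interval suggests that the two quantities come from different procedures.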
Immune Infiltration related to IRGPI
Gene ontology (GO) analysis of the 123 unique LUSC IRGs is shown in sFig1 in the Supplement. Most of the biological processes were related to cytokine biosynthesis, secretion and metabolism. The molecular functions were concentrated in cytokine receptor binding, receptor-ligand activity and receptor regulator activity, among others. In the early-stage LUSC TCGA cohort, the infiltration percentages of 22 immune cell types are shown in Fig 3.a and Fig 3.b. Patients in the high-risk group had higher proportions of tumor-infiltrating neutrophils, monocytes and activated mast cells (1.63% vs 0.72%, p=0.001; 0.57% vs 0.35%, p=0.041; 1.64% vs 1.03%, p=0.007, respectively). In contrast, infiltration of CD8+ T cells and T follicular helper cells was lower in the high-risk group than in the low-risk group (6.94% vs 9.63%, p=0.004; 2.15% vs 3%, p=0.002, respectively) (Fig 3.c).
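A group comparison of this kind can be sketched with a rank-based test. The text above does not name the test behind the reported p-values, so the Mann-Whitney U test below is an assumed stand-in, chosen because cell-fraction estimates are typically skewed. The implementation uses the normal approximation without a tie correction, and the infiltration percentages are hypothetical toy values, not data from the study.

```python
from math import erf, sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return u1, p


# Hypothetical neutrophil percentages for five tumors per group.
high_risk = [1.9, 1.4, 2.2, 1.1, 1.7]
low_risk = [0.6, 0.9, 0.5, 1.2, 0.8]
u, p = mann_whitney_u(high_risk, low_risk)
print(f"U={u}, p={p:.3f}")
```

With samples this small an exact p-value would be preferable to the normal approximation; the approximation is kept here only to keep the sketch short.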
Comparison of biomarkers for LUSC
We summarized current biomarkers for LUSC and compared them in Table 3. Li et al. constructed a model with 6 lncRNAs from the TCGA LUSC cohort; the area under the curve (AUC) of this 6-lncRNA signature for 3-year survival was 0.672 in the training cohort[11]. Hu et al. constructed a 3-lncRNA signature for LUSC, and the AUC of this model for 3-year survival was 0.629 in the training cohort[12]. Qi et al. identified 12 miRNAs closely related to the overall survival of patients with LUSC[13]. Yang et al. identified the diagnostic role of miRNA-486-5p in the TCGA LUSC cohort[26]. Li et al. found that 60 genes were significantly associated with overall survival in LUSC patients[10]. Shi et al. identified 6 methylation biomarkers for LUSC diagnosis[27]. Li et al. created prognostic predictors based on alternative splicing events for NSCLC patients, and the AUC of the prognostic predictor was over 0.8 in the TCGA LUSC cohort[9]. Choi et al. found that MLL2 mutations predicted poor prognosis in both TP53-mutant and TP53 wild-type LUSC[8]. Gao et al. identified a prognostic model containing 5 genes, and the AUC of the model for predicting survival at 1, 3 and 5 years was 0.692, 0.722 and 0.651, respectively[28]. For the TCGA derivation cohort, the AUC of the IRGPI at 1, 3 and 5 years was 0.854, 0.953 and 0.944, respectively. For the GSE37745 validation cohort, the AUC of the IRGPI at 1, 3 and 5 years was 0.764, 0.728 and 0.710, respectively.