C>T transitions predominated in ADC and C>A transversions in SCC (Fig. 1A). The enrichment of C>A transversions was associated with smoking status, as has been observed in other cancers for which smoking is a significant risk factor. In SCC, the most frequently mutated genes were TP53 (41/47, 87.2%), TTN (38/47, 80.8%), CSMD3 (24/47, 51.1%), KMT2D (20/47, 42.6%), RYR2 (19/47, 40.4%), and CDKN2A (19/47, 40.4%; Fig. 1B). In the ADC subgroup, EGFR (27/44, 61.4%), TP53 (25/44, 56.8%), BCLAF1 (16/44, 36.4%), RHPN2 (13/44, 30.0%), SIGLEC10 (12/44, 27.3%), and ANKRD38C (11/44, 25.0%) were the most frequently mutated (Fig. 1C). Most patients in the SCC subgroup (44/47, 93.6%) were former or current smokers, whereas almost half of the patients in the ADC subgroup had a smoking history; we therefore compared the genomic characteristics of smokers and non-smokers within the ADC subgroup. C>T transitions were frequent in non-smokers, whereas C>A transversions were enriched in smokers (Supplemental Fig. 1A). We also found that TP53 (12/20, 60.0%), BCLAF1 (8/20, 40.0%), EGFR (8/20, 40.0%), RHPN2 (7/20, 35.0%), and USH2A (7/20, 35.0%) were frequently mutated in tobacco smokers, whereas EGFR (19/24, 79.2%), TP53 (13/24, 54.2%), BCLAF1 (8/24, 33.3%), ORTE24 (7/24, 29.2%), ERLL11 (7/24, 29.2%), and SIGLEC10 (7/24, 29.2%) were commonly mutated in patients without a smoking history (Supplemental Fig. 1B). EGFR mutations were present in 61.4% of ADC patients (vs. 27% in Caucasian NSCLC patients)[5] and KRAS mutations in 4% of ADC patients (vs. 32% of Caucasian patients)[5], suggesting markedly distinct genotypes among patients of different ethnicities[24]. Moreover, C>T transitions were the most frequent substitution type in ADC patients, whereas C>A transversions occurred more frequently in NSCLC patients without EGFR mutations (Supplemental Fig. 1C).
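The transition and transversion labels used above follow the standard convention of reporting each single-nucleotide substitution from its pyrimidine reference base. The following is a minimal sketch of that convention only; the function name `substitution_class` is hypothetical and is not taken from the analysis pipeline used in this study.

```python
# Illustrative helper (hypothetical name): collapse an SNV onto its pyrimidine
# reference base (C or T) and label it as a transition or a transversion.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def substitution_class(ref: str, alt: str) -> tuple[str, str]:
    """Return (collapsed substitution, 'transition' or 'transversion')."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    # A purine reference is reported on the opposite strand (G>T becomes C>A).
    if ref in PURINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    # Transitions stay within a base class (purine<->purine or
    # pyrimidine<->pyrimidine); transversions switch class.
    same_class = (ref in PURINES) == (alt in PURINES)
    return f"{ref}>{alt}", "transition" if same_class else "transversion"
```

Under this convention, a G>A change is counted as C>T (a transition) and a G>T change as C>A (a transversion).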
The most common co-mutations in EGFR-mutated ADC patients were TP53 (16/27, 59.3%), BCLAF1 (12/27, 44.4%), RHPN2 (9/27, 33.3%), and ASTE1 (8/27, 29.6%), whereas the most frequently mutated genes in ADC without EGFR mutation were TP53 (9/17, 52.9%), SIGLEC10 (7/17, 41.2%), DST (5/17, 29.4%), LRP1B (5/17, 29.4%), MUC16 (5/17, 29.4%), NAV3 (5/17, 29.4%), OBSCN (5/17, 29.4%), TRIM48 (5/17, 29.4%), TTN (5/17, 29.4%), ZFHX3 (5/17, 29.4%), and ZFHX4 (5/17, 29.4%; Supplemental Fig. 1D).
We then analyzed the association of individual gene mutations with DFS. Mutations of NELL1, HERC2, LTN1, CYHR1, MUC5B, CUBN, OR4C15, PDE4DIP, PIK3CA, NBPF10, EYS, and GPR32 were associated with worse DFS in SCC patients, whereas USH2A mutation was associated with better DFS in SCC patients (Supplemental Fig. 2A). Moreover, mutations of RELN, HMCN1, OR2L8, and NALCN were associated with better DFS in ADC patients, whereas mutated MUC5B was associated with poor DFS in ADC patients (Supplemental Fig. 2B). However, no mutated gene remained associated with DFS after false discovery rate correction for multiple testing.
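The text does not state which false discovery rate procedure was applied. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, which is a common choice for per-gene survival tests; the function name and the example p-values are hypothetical and are not data from this study.

```python
# Minimal Benjamini-Hochberg sketch (assumed procedure; hypothetical name).
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up per-gene log-rank p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
```

A gene would be called significant only if its adjusted value falls below the chosen FDR level, which explains how nominally significant single-gene associations can disappear after correction.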
Furthermore, we assessed the ten oncogenic signaling pathways that are genetically altered at high frequency in ADC and SCC[25]. More genes in these pathways were mutated in SCC than in ADC. Among the pathways, RTK-RAS (82% in SCC and 84% in ADC) and TP53 (93% in SCC and 59% in ADC) were frequently altered. In the RTK-RAS pathway, ARHGAP35 (5/47, 10.6%), FLT3 (5/47, 10.6%), IRS2 (4/47, 8.5%), KSR2 (4/47, 8.5%), and NF1 (4/47, 8.5%) were mutated in SCC, whereas alterations of EGFR (27/44, 61.4%), IRS2 (3/44, 6.8%), ARHGAP35 (2/44, 4.5%), ROS1 (2/44, 4.5%), ERBB4 (2/44, 4.5%), and RASGRP4 (2/44, 4.5%) occurred more frequently in ADC. The HIPPO and NOTCH pathways ranked third and fourth among the signaling pathways enriched with genetic variants in NSCLC in our study. In the HIPPO pathway, the most frequently mutated genes were FAT1 (12/47, 25.5%), FAT3 (9/47, 19.1%), CRB1 (8/47, 17.0%), DCHS1 (7/47, 14.9%), and FAT4 (7/47, 14.9%) in SCC, whereas mutations of FAT3 (6/44, 13.6%), HMCN1 (6/44, 13.6%), DCHS2 (5/44, 11.4%), FAT2 (3/44, 6.8%), and DCHS1 (2/44, 4.5%) occurred in ADC (Fig. 2C). In the NOTCH pathway, the predominant alterations were FBXW7 (7/47, 14.9%), SPEN (5/47, 10.6%), and CNTN6 (5/47, 10.6%) in SCC, whereas NUMBL (3/44, 6.8%), MAML3 (3/44, 6.8%), and THBS2 (3/44, 6.8%) were mutated in ADC (Fig. 2E). In addition, variants of the RTK-RAS and TP53 signaling pathways were predominant in non-smoking and smoking ADC patients, respectively (Supplemental Fig. 3A). A quarter of ADC patients with a smoking history (5/20, 25.0%) harbored variants in TGF-β signaling pathway genes, compared with only one patient (1/24, 4.2%) in the non-smoking ADC subgroup (Supplemental Fig. 3A). ADC patients with wild-type EGFR also had a higher proportion of variants in WNT signaling genes (7/17, 41.2% vs. 7/27, 25.9% in EGFR-mutated ADC) and MYC signaling genes (3/17, 17.6% vs. 2/27, 7.3% in EGFR-mutated ADC) (Supplemental Fig. 3B). We then analyzed the association of mutations in each pathway with patient survival, but found no association between alteration of any single oncogenic pathway and DFS in either SCC or ADC (Supplemental Fig. 4A and B).
Association of comprehensive immunogenomic profiling with DFS in ADC and SCC patients
We next performed comprehensive immunogenomic profiling of the WES data from both ADC and SCC patients, covering HLA-I number, HLA LOH, TMB, the DNA repair pathways, and the antigen presentation machinery, and predicted neoantigens on the basis of HLA-binding affinity (< 500 nM for peptides derived from somatic SNVs and indels). The HLA-I allele distribution was similar to that reported in the Allele Frequency Net Database for the China Jiangsu Han (HLA-A/B, n=3238) and China South Han pop 2 (HLA-C, n=1098) populations (Fig. 3A). Most patients had six HLA-I alleles in both SCC (32/47, 68.1%) and ADC (34/44, 77.3%). Nearly half of the SCC patients had HLA LOH, compared with only a third of the ADC patients (Fig. 3B). Neither HLA number nor HLA LOH differed significantly by EGFR mutation status or tobacco smoking history (Supplemental Fig. 5A-B). Our survival analysis revealed that neither HLA number nor HLA LOH status was associated with DFS of SCC or ADC patients (Supplemental Fig. 5C-D and Table 2).
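The neoantigen criterion stated above (predicted HLA-binding affinity < 500 nM) can be sketched as a simple filter followed by a per-Mb normalization. This is an illustrative sketch under assumptions, not the pipeline used in this study: the record layout, the choice to count each unique peptide once, and the `covered_mb` denominator are all assumptions, and the peptides in the usage example are invented.

```python
from dataclasses import dataclass

AFFINITY_CUTOFF_NM = 500.0  # binding-affinity threshold stated in the text


@dataclass(frozen=True)
class PeptidePrediction:
    """One predicted peptide-HLA pair (hypothetical record layout)."""
    peptide: str
    hla_allele: str
    ic50_nm: float  # predicted binding affinity in nM


def predicted_neoantigens(predictions):
    """Unique peptides bound by at least one HLA allele below the cut-off.

    Counting each peptide once is an assumption of this sketch.
    """
    return {p.peptide for p in predictions if p.ic50_nm < AFFINITY_CUTOFF_NM}


def neoantigen_load_per_mb(predictions, covered_mb):
    """Neoantigen load (NAL) per Mb; `covered_mb` is an assumed denominator."""
    if covered_mb <= 0:
        raise ValueError("covered_mb must be positive")
    return len(predicted_neoantigens(predictions)) / covered_mb


# Invented example records, for illustration only.
example = [
    PeptidePrediction("PEPTIDEA", "HLA-A*02:01", 120.0),
    PeptidePrediction("PEPTIDEA", "HLA-B*07:02", 800.0),
    PeptidePrediction("PEPTIDEB", "HLA-A*02:01", 500.0),  # not below cut-off
    PeptidePrediction("PEPTIDEC", "HLA-C*07:02", 499.9),
]
```

Because the threshold is a strict inequality, a peptide predicted at exactly 500 nM is excluded in this sketch.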
The DNA damage repair (DDR) signaling includes eight pathways: checkpoint factors (CPF), homologous recombination repair (HRR), Fanconi anemia (FA), nucleotide excision repair (NER), mismatch repair (MMR), base excision repair (BER), nonhomologous end-joining (NHEJ), and DNA translesion synthesis (TLS)[26]. Genomic variants in the CPF (93% in SCC and 61% in ADC) and HRR (43% in SCC and 25% in ADC) pathways were the most prominent in both SCC and ADC. Only two SCC patients had no mutation in any DDR pathway, whereas one SCC patient had mutations in seven of the DDR pathways. Overall, as many as 42.6% of SCC patients harbored genomic alterations in at least three DDR pathways. In contrast, ten ADC patients (10/44, 22.7%) had no DDR-related mutations and a quarter of ADC patients had deficiency in at least one DDR pathway (Fig. 4C), and mutations in at least two DDR pathways were observed in 45.5% of ADC patients (20/44; Fig. 4D). SCC patients thus showed a higher prevalence of gene alterations in the DNA repair pathways and the antigen presentation machinery than ADC patients (Fig. 4C). Nevertheless, the frequency of mutated genes in the DNA repair pathways and the antigen presentation machinery was similar in ADC patients with and without EGFR mutations (Supplemental Fig. 6A). Interestingly, patients with a tobacco smoking history had a distinct distribution of mutations in DDR signaling genes, and severe deficiency of DDR signaling was associated with TMB and NAL (Supplemental Fig. 6B).
Deficient antigen presentation impairs tumor neoantigen production and contributes to immune escape in lung cancer[23]. In this study, we examined mutations in antigen presentation machinery (APM)-related genes for association with prognosis in these Chinese lung cancer patients, but found no association (Supplemental Fig. 7C-D). When tumors were dichotomized by DDR index (low, gene mutations in < 3 pathways; high, gene mutations in ≥ 3 pathways), a low DDR index was associated with poor DFS in SCC patients (Fig. 4A and Table 2), although the DDR index had no predictive value for chemotherapy in SCC patients (Fig. 4A). Interestingly, the DDR index was significantly associated with the neoantigen load but not with TMB in SCC patients, suggesting a key role of the DDR index in neoantigen production in SCC (Supplemental Fig. 8A). HLA number and LOH did not differ between the low and high DDR index groups (Supplemental Fig. 8B). In ADC patients, TMB and NAL differed significantly between the low and high DDR index groups (Supplemental Fig. 8C), although the DDR index was not associated with DFS (Supplemental Fig. 8D). There was also no significant difference in HLA number or LOH between the low and high DDR index groups in ADC (Supplemental Fig. 8E).
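The DDR index dichotomy described above (low, mutations in < 3 pathways; high, mutations in ≥ 3 pathways) can be written as a count over the eight DDR pathways. The sketch below is illustrative only: the function names are hypothetical, and the gene-to-pathway map uses placeholder gene labels rather than the curated assignments of reference [26].

```python
# The eight DDR pathways named in the text.
DDR_PATHWAYS = ("CPF", "HRR", "FA", "NER", "MMR", "BER", "NHEJ", "TLS")
HIGH_DDR_MIN_PATHWAYS = 3  # cut-off stated in the text (>= 3 pathways is high)


def ddr_index(mutated_genes, pathway_genes):
    """Count the DDR pathways that contain at least one mutated gene."""
    mutated = set(mutated_genes)
    return sum(
        1
        for pathway in DDR_PATHWAYS
        if mutated & set(pathway_genes.get(pathway, ()))
    )


def ddr_group(index):
    """Dichotomize the DDR index into the 'low' and 'high' groups."""
    return "high" if index >= HIGH_DDR_MIN_PATHWAYS else "low"


# Placeholder gene labels; NOT the real pathway membership.
toy_pathway_genes = {
    "CPF": {"GENE_A", "GENE_B"},
    "HRR": {"GENE_C"},
    "FA": {"GENE_D"},
    "MMR": {"GENE_E"},
}
toy_patient = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_X"]
```

In this toy example, two mutated genes in the same pathway count once, so the patient has a DDR index of 3 and falls in the high group.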
Next, we identified an average of 4.48 somatic mutations and 2.44 predicted neoantigens per Mb in SCC samples, compared with 2.28 somatic mutations and 0.99 predicted neoantigens per Mb in ADC samples. Thus, SCC had higher TMB and NAL than ADC (Fig. 3C). However, TMB and NAL did not differ significantly between EGFR-mutated and EGFR wild-type ADCs (TMB, 0.46 to 32.3 mutations per Mb with a median of 1.66 mutations per Mb; NAL, 0.26 to 21.82 neoantigens per Mb with a median of 0.98 neoantigens per Mb), although the range of TMB and NAL was narrow in EGFR-mutated ADC (TMB, 1.11 to 5.27 mutations per Mb with a median of 2.31 mutations per Mb; NAL, 0.52 to 1.64 neoantigens per Mb with a median of 1.45 neoantigens per Mb) (Supplemental Fig. 6B and Supplemental Fig. 9B). Similarly, there was no significant difference in TMB or NAL between former or current smokers and non-smokers (Supplemental Fig. 6A and Supplemental Fig. 9A).
A high TMB (cut-off, 4 mutations per Mb) was associated with better DFS in the 91 NSCLC patients overall (Supplemental Fig. 9C). However, TMB was not associated with DFS in ADC (Supplemental Fig. 9D) or SCC patients (Supplemental Fig. 9E) analyzed separately. We then stratified patients by NAL (cut-off, > 2 neoantigens per Mb) and again found no statistically significant association in ADC (Supplemental Fig. 9F). In SCC patients, however, a low NAL was associated with poor DFS (hazard ratio [HR] = 2.56, 95% CI: 1.15-5.68, p=0.021; Fig. 4B and Table 2). Our multivariate analysis also showed that NAL was an independent prognostic predictor for SCC patients; the DDR index was not included in the multivariate analysis because of its strong association with NAL (see Supplemental Table 1). Furthermore, adjuvant chemotherapy improved DFS in SCC patients with a lower NAL (Fig. 4B). These results prompted us to explore the association between the DDR pathways and NAL in SCC in more detail. As shown in Fig. 4C, patients with a high NAL were enriched in the high DDR index group, and the CPF, FA, and HRR pathways were the most frequently mutated in the high DDR index and high NAL groups in SCC, indicating that deficiency in the DNA damage repair pathways contributed to neoantigen production and that neoantigen-directed immune surveillance favored SCC patients. In ADC patients, however, immune escape may rely on other mechanisms (Fig. 4C). In addition, we compared TMB and NAL in NSCLC and found that one half of the oncogenic mutations did not create a neoantigen and that indel variants generated 1.75-fold as many neoantigens as SNVs across ADC and SCC, suggesting that TMB is not a good surrogate marker of immunogenic neoantigens (Fig. 4D).
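The comparison of TMB and NAL above rests on two quantities: the fraction of mutations that yield no predicted neoantigen and the neoantigen yield of indels relative to SNVs. The text does not state how these were computed; the sketch below assumes a per-mutation yield, and the function names and toy records are hypothetical rather than study data.

```python
from collections import Counter


def neoantigen_yield_by_class(mutations):
    """Mean predicted neoantigens per mutation for each variant class.

    `mutations` is an iterable of (variant_class, n_predicted_neoantigens);
    a per-mutation yield is an assumption of this sketch.
    """
    n_mutations, n_neoantigens = Counter(), Counter()
    for variant_class, count in mutations:
        n_mutations[variant_class] += 1
        n_neoantigens[variant_class] += count
    return {cls: n_neoantigens[cls] / n_mutations[cls] for cls in n_mutations}


def fraction_without_neoantigen(mutations):
    """Fraction of mutations that produce no predicted neoantigen."""
    mutations = list(mutations)
    if not mutations:
        raise ValueError("no mutations supplied")
    return sum(1 for _, count in mutations if count == 0) / len(mutations)


# Invented records, for illustration only.
toy_mutations = [
    ("SNV", 1), ("SNV", 0), ("SNV", 1), ("SNV", 0),
    ("indel", 2), ("indel", 0),
]
toy_yield = neoantigen_yield_by_class(toy_mutations)
toy_fold = toy_yield["indel"] / toy_yield["SNV"]
```

A fold value above 1 means that, per mutation, indels contribute more predicted neoantigens than SNVs, which is one reason a raw mutation count can diverge from the neoantigen load.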
In addition, we assessed the genomic features of SCC stratified by NAL. C>A transversions were frequent in the high NAL subgroup, whereas C>T transitions were enriched in the low NAL subgroup of SCC patients (Supplemental Fig. 10A). The most frequent mutations in the high NAL SCC subgroup were TP53 (26/28, 92.9%), TTN (25/28, 89.3%), CSMD3 (20/28, 71.4%), USH2A (15/28, 53.6%), RYR2 (14/28, 50.5%), and CDKN2A (13/28, 46.4%; Supplemental Fig. 10B), whereas mutations of TP53 (17/19, 89.5%), TTN (13/19, 68.4%), KMT2D (7/19, 36.8%), BCLAF1 (6/19, 31.6%), FLG (6/19, 31.6%), and LRP1B (6/19, 31.6%) were more frequent in the low NAL SCC subgroup (Supplemental Fig. 10B). Among the signaling pathways, RTK-RAS and TP53 were frequently altered in both high and low NAL SCC cases. Interestingly, alterations of the ten signaling pathways were more frequent in the high NAL SCC group than in the low NAL SCC group (Supplemental Fig. 10C and Supplemental Table 2). However, there was no significant difference in HLA number or LOH between the high and low NAL SCC groups (Supplemental Fig. 10D and Supplemental Table 2). In line with the previous results, patients with a high NAL had more frequent mutations in the DNA damage repair pathways, and NAL was positively associated with the DDR index, suggesting that deficiency in DDR signaling contributed to the production of neoantigens (Supplemental Fig. 10E and Supplemental Table 2). However, mutations in the antigen presentation machinery were equally distributed between the high and low neoantigen burden subgroups, indicating that deficient antigen processing may arise through mechanisms other than gene mutations (Supplemental Fig. 10F and Supplemental Table 2).
In summary, the predicted neoantigen load was a more useful indicator of immunogenicity than TMB and provided an effective stratification variable for predicting disease outcome and benefit from adjuvant chemotherapy in patients with lung squamous cell carcinoma.