Identification of predictive biomarkers from differential DFS
A total of 171 patients from the ADJUVANT trial with available baseline surgical specimens were enrolled for genomic profiling (Fig. 1). The baseline characteristics of the patients included in this exploratory cohort are summarized in Supplementary Table 1. Comprehensive genomic profiling of 422 cancer-related genes revealed comparable frequencies of the most frequently mutated genes between the two treatment groups (Supplementary Fig. 1). EGFR 19del (49% vs. 45%), L858R (47% vs. 53%), and copy number gain (CN gain, 17% vs. 26%) were similarly distributed in the adjuvant gefitinib and VP groups. Other co-mutations, including TP53 (70% vs. 64%), MCL1 (30% vs. 16%), RB1 (25% vs. 15%), NKX2-1 (20% in both), CDKN2A (16% vs. 19%), PIK3CA (14% vs. 17%), MDM2 (14% vs. 9%), and CTNNB1 (7% vs. 18%), also showed similar frequencies between the two groups. Of note, a total of 76/171 (44%) patients carried TP53 DNA-binding domain missense mutations (exons 4-8). However, co-drivers frequently found in advanced disease, e.g., BRAF mutations and amplifications of ERBB2 or MET 13, 15, 26, were not as prevalent in our early-stage cohort.
We adopted the commonly used approach of testing DFS-based gene-by-treatment interaction effects to identify predictive genetic biomarkers for guiding treatment selection 22, 27. We evaluated the predictive power of each mutated gene and identified the following five predictive markers with significant treatment interactions (Table 1 and Methods): RB1 alterations [interaction hazard ratio (iHR) 4.07 (95% confidence interval (CI) 1.56-10.58), P=0.004], NKX2-1 CN gain [iHR 0.26 (95% CI 0.10-0.68), P=0.006], CDK4 CN gain [iHR 0.14 (95% CI 0.03-0.77), P=0.024], TP53 exon 4/5 missense mutations [iHR 0.33 (95% CI 0.12-0.93), P=0.035], and MYC CN gain [iHR 0.10 (95% CI 0.01-0.98), P=0.048]. Here, an iHR below 1 (a negative log iHR) indicated relatively better survival with adjuvant TKI, whereas an iHR above 1 (a positive log iHR) indicated relative benefit with adjuvant chemotherapy. Importantly, the treatment interactions remained significant for these five predictors even after adjusting for clinical parameters (Supplementary Table 2). The negative adjuvant TKI predictor, RB1 alterations, combined RB1 mutations and RB1 CN loss, since the two were functionally similar and each showed only marginal significance of treatment interaction owing to the small sample size of each category (Supplementary Table 3). In addition, because missense mutations in different TP53 exons might show distinct prognostic or predictive effects 28, 29, these exons were analyzed separately. As with RB1 alterations, TP53 exon 4 and exon 5 missense mutations (but not those in exons 6-8) each showed marginal significance for treatment interaction and were therefore combined into a single predictive factor. Further, in the prognostic analysis, we found that TP53 exon 4/5 missense mutations [multivariate HR 2.69 (95% CI 1.60-4.52), P<0.001] and TP53 nonsense mutations [multivariate HR 1.69 (95% CI 1.08-2.65), P=0.022] were both significantly correlated with worse outcomes irrespective of treatment arm, in concordance with the role of TP53 as a negative prognostic factor (Supplementary Fig. 2, 3a, 3b).
Other genetic aberrations that were significantly associated with prognosis are summarized in Supplementary Figure 3 and Supplementary Table 4.
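To make the direction and strength of the reported interactions easier to read, the following minimal sketch back-calculates an approximate Wald z score and two-sided P-value from each reported iHR and its 95% CI. It assumes that the CIs are symmetric on the log scale and were built with a critical value of 1.96; the function name `interaction_z` is hypothetical and this is not the authors' code. Because the inputs are rounded published values, the results only approximate the statistics in Table 1.

```python
import math

def interaction_z(ihr, ci_low, ci_high, z_crit=1.96):
    """Approximate Wald z score and two-sided P-value from an HR and its 95% CI.

    Assumes the CI is symmetric around log(HR), i.e. log(HR) +/- z_crit * SE.
    """
    log_hr = math.log(ihr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Rounded iHRs and 95% CIs as reported in the text above.
reported = {
    "RB1 alterations": (4.07, 1.56, 10.58),
    "NKX2-1 CN gain": (0.26, 0.10, 0.68),
    "CDK4 CN gain": (0.14, 0.03, 0.77),
    "TP53 exon 4/5 missense": (0.33, 0.12, 0.93),
    "MYC CN gain": (0.10, 0.01, 0.98),
}

for marker, (ihr, low, high) in reported.items():
    z, p = interaction_z(ihr, low, high)
    # z > 0 corresponds to iHR > 1 (relative benefit with chemotherapy),
    # z < 0 corresponds to iHR < 1 (relative benefit with adjuvant TKI).
    direction = "favors VP" if z > 0 else "favors gefitinib"
    print(f"{marker}: z = {z:+.2f}, P = {p:.3f} ({direction})")
```

Working on the log scale makes the sign convention explicit: an iHR below 1 maps to a negative z score and an iHR above 1 to a positive one.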
Integrated MINERVA score via genomic signature
Each of the five biomarkers can individually predict treatment outcomes for the patient subgroup harboring that specific genetic alteration; however, a multigene signature integrating all mutational events at the patient level is essential for estimating a patient’s overall response, given the molecular heterogeneity of early-stage NSCLC. We therefore constructed the MINERVA score to quantitatively assess individual tumors and their corresponding treatment responses by summing the z scores from the individual gene-by-treatment interaction tests of the five selected markers. The resultant MINERVA scores of the 171 tumors ranged from -7.09 to 2.88, including 81 tumors (47.4%) that did not carry any alterations in the predictive genes (score=0) (Supplementary Fig. 4 and Methods). Based on the score distribution, we chose cutoffs of -0.5 and 0.5 to categorize the patients into three subgroups, with a lower score representing a better response to adjuvant TKI. In the pre-categorized population, gefitinib significantly prolonged the median DFS and increased the 2-year DFS rate, similar to the intention-to-treat (ITT) and modified ITT populations 6 (Fig. 2a). Remarkably, after categorization by MINERVA, the three subgroups demonstrated distinct treatment responses and underlying molecular profiles (Fig. 2b, c). The Highly TKI-Preferable group [HTP, N=60, 35% (score≤-0.5)] showed significant superiority of adjuvant gefitinib [HR 0.21 (95% CI 0.10-0.44)] and was enriched for copy number gain of NKX2-1, CDK4, and MYC, and for TP53 exon 4/5 missense mutations. The TKI-Preferable group [TP, N=87, 51% (score -0.5 to 0.5)] showed improved DFS with adjuvant gefitinib, similar to the pre-categorized and ITT populations [HR 0.61 (95% CI 0.35-1.07)]. This subgroup was characterized by the absence of most predictive biomarkers, except for sporadic co-existence of NKX2-1 and RB1 alterations, whose effects offset each other because of their opposing iHRs (Table 1).
Moreover, a small subset of patients, the Chemo-Preferable group [CP, N=24, 14% (score≥0.5)], despite having EGFR-positive tumors, showed greater response and longer DFS under VP treatment [HR 3.06 (95% CI 0.99-9.53)] and harbored RB1 alterations (Fig. 2c).
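The scoring and categorization described above can be sketched as follows. The z scores in `Z_SCORES` are approximations back-calculated as ln(iHR)/SE from the rounded iHRs and CIs reported in the text, not the authors' exact values, and the names `minerva_score` and `minerva_group` are hypothetical. The handling of scores exactly equal to ±0.5 is also an assumption, since the text assigns those boundaries to both adjacent groups.

```python
# Approximate per-marker z scores (illustrative, derived from rounded
# published iHRs and CIs; the exact values used by the authors may differ).
Z_SCORES = {
    "RB1": 2.87,
    "NKX2-1": -2.75,
    "CDK4": -2.37,
    "TP53_exon4_5": -2.12,
    "MYC": -1.97,
}

def minerva_score(alterations):
    """Sum the z scores of the predictive markers altered in one tumor."""
    return sum(Z_SCORES[marker] for marker in alterations)

def minerva_group(score, low=-0.5, high=0.5):
    """Map a score to HTP, TP, or CP using the -0.5 and 0.5 cutoffs."""
    if score <= low:
        return "HTP"  # Highly TKI-Preferable
    if score >= high:
        return "CP"   # Chemo-Preferable
    return "TP"       # TKI-Preferable

examples = [
    [],                         # no predictive alteration, score 0
    ["RB1"],                    # single marker with iHR above 1
    ["NKX2-1", "RB1"],          # opposing markers largely cancel out
    ["NKX2-1", "CDK4", "MYC"],  # several markers with iHR below 1
]
for tumor in examples:
    score = minerva_score(tumor)
    print(tumor, round(score, 2), minerva_group(score))
```

Summing signed z scores lets markers with opposing interaction effects offset each other, which is why a tumor carrying both NKX2-1 CN gain and an RB1 alteration lands near zero in this sketch.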
In the TP group, the Kaplan-Meier curves resembled those observed in the pre-stratified and ITT populations 6 (Fig. 2a, e), indicating that adjuvant gefitinib achieved a superior DFS. Importantly, the survival curves of the post-categorized HTP and CP populations did not converge at any point (Fig. 2d, f). In HTP, the Kaplan-Meier curves separated widely as early as six months, with a slow descent of the adjuvant gefitinib arm (median DFS, 34.5 months; P<0.001). Conversely, the VP arm dropped sharply to a median DFS of 9.1 months, with all patients having recurred by 36 months. The relative benefit of gefitinib was therefore represented by a 6.4-fold increase in the 2-year DFS rate [70.3% (95% CI 55.8-88.7) vs. 11.0% (95% CI 3.1-38.7)] and a 25.4-month longer median DFS (Fig. 2d). In the CP group, the Kaplan-Meier curves diverged at 18 months, with an immediate sharp decline of the gefitinib arm towards a median DFS of 19.3 months. Meanwhile, 70% of the VP arm remained disease-free after 24 months (median DFS, 34.2 months; P=0.041). The superiority of adjuvant VP was reflected by a 1.7-fold increase in the 2-year DFS rate [69.2% (95% CI 48.2-99.5) vs. 41.6% (95% CI 19.9-86.8) for gefitinib] and a 14.9-month longer median DFS (Fig. 2f).
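The relative-benefit figures quoted above follow directly from the reported 2-year DFS rates and median DFS values. As a quick arithmetic check (a sketch using only numbers from the text; the helper name `fold_change` is hypothetical):

```python
def fold_change(rate_better, rate_worse):
    """Ratio of two survival rates given in percent."""
    return rate_better / rate_worse

# HTP group: gefitinib vs. VP
htp_fold = fold_change(70.3, 11.0)  # 2-year DFS rates (%)
htp_gain = 34.5 - 9.1               # difference in median DFS (months)

# CP group: VP vs. gefitinib
cp_fold = fold_change(69.2, 41.6)   # 2-year DFS rates (%)
cp_gain = 34.2 - 19.3               # difference in median DFS (months)

print(f"HTP: {htp_fold:.1f}-fold, {htp_gain:.1f} months")
print(f"CP:  {cp_fold:.1f}-fold, {cp_gain:.1f} months")
```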
Stratification of OS benefit by MINERVA score
OS is generally considered the standard endpoint for clinical trials. Although adjuvant gefitinib significantly improved DFS relative to adjuvant VP, the DFS benefit in the ITT population of the ADJUVANT trial did not translate into a significant difference in OS 30, probably owing to the combined influences of downstream treatment crossover and genetic heterogeneity within the patient population. Hence, we further applied MINERVA in an attempt to stratify OS.
As expected, OS of the 171 pre-categorized patients involved in this study showed no difference between the two treatment groups [median, 76.9 months in the gefitinib group vs. 67.1 months in the VP group; HR 0.87 (95% CI 0.57-1.35), P=0.54] (Fig. 3a and Supplementary Fig. 5). Promisingly, MINERVA also stratified the OS benefit. In HTP, gefitinib treatment led to significantly longer OS [median, not reached in the gefitinib group vs. 48.7 months in the VP group; HR 0.43 (95% CI 0.21-0.88), P=0.018], with a clear and early separation of the Kaplan-Meier curves (Fig. 3b, c). Conversely, adjuvant VP treatment was associated with longer OS in the CP group after 18 months, although the difference did not reach statistical significance [median, 36.4 months in the gefitinib group vs. not reached in the VP group; HR 2.47 (95% CI 0.76-8.02), P=0.12] (Fig. 3b, e). OS in TP mirrored that of the pre-categorized cohort, suggesting no difference between the treatments (Fig. 3a, d). Likewise, the 2-, 3-, and 5-year survival rates of the categorized subgroups demonstrated similar trends, with the survival differences between the two treatments in both the HTP and CP groups widening over time (Supplementary Fig. 6). The 5-year OS rates of gefitinib-treated HTP patients and VP-treated CP patients were 67.3% (95% CI 52.4-86.4) and 61.5% (95% CI 40.0-94.6), respectively, both of which were significantly higher than those attained in the pre-categorized cohort [gefitinib, 55.7% (95% CI 46.2-67.0); VP, 51.5% (95% CI 41.2-64.3)].
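For readers less familiar with how the medians above are read off a Kaplan-Meier curve, and why a median can be "not reached", the following is a minimal product-limit sketch. The follow-up times and event indicators are hypothetical toy values, not trial data, and the function names are illustrative; a real analysis would normally rely on an established survival package.

```python
def kaplan_meier(times, events):
    """Product-limit estimate as a list of (time, survival) at each event time.

    times  : follow-up time per patient (e.g. months)
    events : 1 if the event (death or recurrence) occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve

def median_survival(curve):
    """First time at which survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

# Hypothetical toy arms (NOT trial data).
arm_a = kaplan_meier([6, 12, 12, 20, 30, 40], [1, 1, 0, 1, 0, 0])
arm_b = kaplan_meier([10, 25, 40, 50, 60], [1, 0, 0, 0, 0])
print(median_survival(arm_a), median_survival(arm_b))
```

In the second toy arm the estimated survival never falls to 50%, so the function returns `None`, which corresponds to a median reported as "not reached".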
Internal validation of MINERVA score
We employed both ten-fold cross-validation and leave-one-out cross-validation (LOOCV) as internal validation procedures to evaluate the robustness of the MINERVA score. A relatively superior survival with adjuvant gefitinib treatment was observed in both the HTP and TP subgroups, with an average 3.5- and 1.9-fold increase in the 2-year DFS rate, respectively (Fig. 4a). The median DFS in these two subsets also increased by an average of 20 and 15 months, respectively (Fig. 4b). In contrast, in the CP group the 2-year gefitinib-to-VP DFS ratio was less than 1 and the median DFS difference was negative for all repeats, suggesting a greater survival benefit from adjuvant VP in this population. Among the 100 mock MINERVA scores generated, 75% demonstrated a significant treatment interaction with P<0.05, while 86% demonstrated an interaction P<0.1 (Fig. 4c). We further validated the functionality of the original MINERVA score by LOOCV. Adjuvant VP treatment in the HTP group was associated with markedly reduced DFS and OS (Fig. 4d, g). Meanwhile, adjuvant gefitinib treatment in the CP group was evidently inferior, similar to the results estimated in Figures 2 and 3.
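The two resampling schemes named above can be sketched generically as index splits over the 171 patients. This is an illustration only: the fold assignment, the random seed, and the function names are assumptions, and the step of re-deriving a mock score on the training patients and evaluating it on the held-out patients is deliberately omitted because its details are given in the Methods rather than here.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, held_out) index lists for one round of k-fold CV."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, held_out

def loocv_indices(n):
    """Yield (train, held_out) index lists, holding out one patient at a time."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

n_patients = 171
ten_fold = list(k_fold_indices(n_patients, k=10))
loo = list(loocv_indices(n_patients))
print(len(ten_fold), sorted(len(test) for _, test in ten_fold))
print(len(loo))
```

Repeating `k_fold_indices` with different seeds would give multiple rounds of ten-fold splits; whether the 100 mock scores in the text arose from such repeats is an assumption not confirmed in this section.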