Genomic Signatures De ne Three Subtypes of EGFR-Mutant Stage II-III NSCLC With Distinct Adjuvant Therapy Outcomes


 The ADJUVANT study reported the comparative superiority of adjuvant gefitinib over chemotherapy in disease-free survival (DFS) of resected EGFR-mutant stage II-IIIA non-small cell lung cancer (NSCLC). However, not all patients experienced favorable clinical outcomes with TKI, raising the necessity for further biomarker assessment. By comprehensive genomic profiling of 171 tumor tissues from the ADJUVANT trial, five predictive biomarkers were identified (TP53 exon4/5 mutations, RB1 alterations, and copy number gains of NKX2-1, CDK4, and MYC. Then we integrated them into the Multiple-gene INdex to Evaluate the Relative benefit of Various Adjuvant therapies (MINERVA) score, which categorized patients into three subgroups with relative disease-free survival and overall survival benefits from either adjuvant gefitinib or chemotherapy (Highly TKI-Preferable, TKI-Preferable, and Chemotherapy-Preferable groups). This study demonstrates that predictive genomic signatures could potentially stratify resected EGFR-mutant NSCLC patients and provide precise guidance towards future personalized adjuvant therapy.


Introduction
Cisplatin-based adjuvant chemotherapy currently constitutes the standard-of-care after curative surgery for stage IIA-IIIB resected non-small cell lung cancer (NSCLC) 1,2 . However, the 5-year survival rate still remains unsatisfactory, with alarming levels of grade 3 toxicity observed in more than 60% of the patients 3 . Hence, alternative adjuvant regimens with epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKI) have been studied through several prospective trials 4,5 . The randomized phase III ADJUVANT study has actually presented signi cant prolonged disease-free survival (DFS) in EGFRmutant NSCLC, after adjuvant ge tinib, as compared to the DFS after chemotherapy with vinorelbine and cisplatin (VP) 6 . Two phase 2 trials, SELECT and EVAN, have shown improved 2-year DFS with erlotinib. Early revelations of the ADAURA trial also presented remarkable improvements of DFS with the third generation EGFR-TKI, osimertinib 7,8,9 . However, approximately 19% to 40% of TKI-treated patients still relapse after these trials 6, 7 , suggesting the inadequacy of EGFR-sensitizing mutants alone as a biomarker for adjuvant treatment selection.
The mixed clinical responses of NSCLC with targeted therapy can be attributed to molecular heterogeneity caused by different clonal populations, aggregated in a particular tumor, undergoing stagespeci c evolution. Resulting selective pressure then further induces subclonal mutations, and promotes tumor expansion 10, 11, 12 13 The most prevalent co-mutations, such as alterations in the TP53, RB1 and NKX2-1, usually cooperate to promote a local growth advantage, and support clonal expansion throughout tumor development 10,11,15 . In advanced EGFR mutant NSCLC, tumors with concurrent TP53 or RB1 mutations then further disrupted genome stability and exerted higher risks for histological transformation and TKI resistance 16 . In addition to gene level alterations, co-mutations on the exon levels can also affect patient outcomes 17 .
As early-stage NSCLC also shows a high degree of intratumor heterogeneity with divergent evolutionary lineages 14 , the established norm of estimating only a single driver oncogene through randomized trials for adjuvant targeted therapies fails to address the underlying complications of intratumor molecular heterogeneity. In this regard, the development of next-generation sequencing (NGS) technology has accelerated the analysis and integration of huge bulks of genomic signatures, thereby increased the focus on developing multi-gene predictive models for therapeutic decisions 18, 19 . Currently, in most singlearmed cohort studies, biomarkers were analyzed for their prognostic effects by comparing survival differences between mutant and wildtype patients. However, the more challenging question is whether these biomarkers result in distinct outcomes under different treatments to ultimately guide therapeutic decisions. Therefore, it is important to distinguish predictive markers from the prognostic ones at rst. The frequently used term "predict the prognosis" in many biomarker studies may confuse readers of the accurate de nition for these two types of biomarkers. Speci cally, a predictive biomarker differentiates treatment-speci c survival bene ts in biomarker-positive or negative patients 20 and further improves patients' treatment outcomes, while a prognostic biomarker discriminates good or poor survival of patients regardless of treatment. For example, aberrations in the tumor suppressor TP53 gene are known to correlate with worse prognosis comparing to TP53 wildtype cancers 17,21 .
Moreover, to reduce the complications in choosing the appropriate statistical tests, a standard and reliable analytical method has been endorsed by Rothwell 22 and applied in numerous studies 23,24 . As suggested, testing subgroup-treatment effect interaction is a prerequisite in reporting the predictive signi cance other than subjective observations of the survival curves. Subsequently, a linear discriminant using summation of all predictive values over the set of selected biomarkers is usually adopted for composite score development 25 . Herein in this study, we conducted a thorough explorative analysis of cancer-related genes through NGS of tumor tissues from the EGFR-mutant patients of the ADJUVANT trial, in an attempt to address important co-mutations and identify key predictive biomarkers for adjuvant treatment. We also integrate them into a robust predictive score that can categorize patients into subgroups with distinct survival bene ts under either adjuvant ge tinib, or chemotherapy for precision care.

Results
Identi cation of predictive biomarkers from differential DFS 171 patients from the ADJUVANT trial with available baseline surgical specimens have been enrolled for genomic pro ling (Fig. 1). The basic characteristics of the patients included in this exploratory cohort have been summarized in Supplementary Table 1. Comprehensive genomic pro ling of 422 cancerrelated genes revealed comparable frequencies of the highest mutated genes between the two treatment groups ( Supplementary Fig. 1). EGFR 19del (49% vs. 45%), L858R (47% vs. 53%), and copy number gain (CN gain, 17% vs. 26%) were equally distributed in the adjuvant ge tinib and VP groups. Other comutations, including TP53 (70% vs 64%), MCL1 (30% vs 16%), RB1 (25% vs 15%), NKX2-1 (20% in both), CDKN2A (16% vs 19%), PIK3CA (14% vs 17%), MDM2 (14% vs 9%), and CTNNB1 (7% vs 18%) also presented similar frequencies between the two cohorts. Of note, total 76/171 (44%) patients carried TP53 DNA binding domain missense mutations (exons 4-8). However, co-drivers frequently found in advanced diseases, e.g. BRAF mutations, ampli cations of ERBB2, or MET 13,15,26 , were not as prevalent in our early-stage cohort. We adopted the popular approach of testing DFS-based gene-by-treatment interaction effects to identify predictive genetic biomarkers for guiding treatment selection 22,27 . We evaluated the predictive power of each mutated gene, and identi ed the following ve predictive markers with signi cant treatment interactions ( Importantly, the treatment interactions remained signi cant for these ve predictors even after adjusting for clinical parameters (Supplementary Table 2). The negative adjuvant TKI predictor, RB1 alterations, combined RB1 mutations and RB1 CN loss, since they were functionally similar and both presented marginal signi cance of treatment interaction due to small sample size of each category (Supplementary  Table 3). Besides, as missense mutations on different TP53 exons might show distinct prognostic or predictive effects 28, 29 , these exons were analyzed separately. Like RB1 alterations, both TP53 exon 4 and 5 missense mutations (but not exons 6-8) showed marginal signi cance for treatment interactions and were therefore combined as a single predictive factor. Further, for prognostic analysis, we found that TP53 exon4/5 missense mutations [multivariate HR 2.69 (95% CI 1.60-4.52), P<0.001] and TP53 nonsense mutations [multivariate HR 1.69 (95% CI 1.08-2.65, P=0.022)] were both signi cantly correlated with worse outcomes irrespective of treatment arms, in concordance with TP53 as a factor for negative prognosis ( Supplementary Fig. 2, 3a, 3b). Other genetic aberrations that were signi cantly associated with prognosis were summarized in Supplementary Figure 3 and Supplementary Table 4.
Integrated MINERVA score via genomic signature Each of the ve biomarkers individually can predict the treatment outcomes for patient subgroups harboring each speci c genetic alteration, although, a multigene signature integrating all mutational events at patient level is essential for estimating a patient's overall response to the molecular heterogeneity of early-stage NSCLC. We, therefore, constructed a MINERVA score to quantitatively assess individual tumors and their corresponding treatment responses by summing z scores from individual treatment-by-interaction test of the ve selected genes. The resultant MINERVA scores of all the 171 tumors ranged from -7.09 to 2.88, including the 81 tumors (47.4%) that did not carry any alterations in the predictive genes (score=0) (Supplementary Fig. 4 and Methods). Based on the score distribution, we chose cutoffs between -0.5 and 0.5 to categorize the patients into three subgroups with lower score representing better response to adjuvant TKI. In the pre-categorized population, ge tinib signi cantly prolonged the median DFS, and increased the 2-year DFS rate, similar to the intention-to-treat (ITT) and modi ed ITT populations 6 (Fig. 2a). Remarkably, after categorization by MINERVA, the three subgroups demonstrated distinct treatment responses and underlying molecular pro les (Fig. 2b, c). The Highly TKI-Preferable group [HTP, N=60, 35% (score≤-0.5)] expressed signi cant superiority with adjuvant ge tinib [HR 0.21 (95% CI 0.10-0.44)], and was enriched with copy number gain of NKX2-1, CDK4 and MYC, and TP53 exon 4/5 missense mutations. The TKI-Preferable group [TP, N=87, 51% (score -0.5 to 0.5)] showed improved DFS among the pre-categorized and ITT populations [HR 0.61 (95% CI 0.35-1.07)]. Besides, this subgroup was characterized by the absence of most predictive biomarkers, except for sporadic coexistence of NKX2-1 and RB1 alterations, with contrasting effects due to opposing iHRs (Table 1). Moreover, a small subset of patients, the Chemo-Preferable Group [CP, N=24, 14% (score≥0.5)], despite having EGFR-positive tumors, showed greater response and enhanced DFS [HR 3.06 (95% CI 0.99-9.53)] under VP treatment, and harbored RB1 alterations (Fig. 2c).
In the TP group, the Kaplan-Meier estimate depicted similar curvatures as those observed in the pre- Strati cation of OS bene t by MINERVA score OS is generally considered as the standard endpoint for clinical trials. Although adjuvant ge tinib has shown signi cantly improved DFS relative to adjuvant VP, the DFS bene ts in the ITT population did not translate into a signi cant difference in OS of the ADJUVANT trial 30 , probably due to the combined in uences of downstream treatment crossovers and the genetic heterogeneity among the patient population. Hence, we further used MINERVA in an attempt to achieve strati cation of OS.
As expected, OS of the 171 pre-categorized patients involved in this study showed no difference between the two treatment groups (median, 76.9 months in the ge tinib group vs 67.1 months in the VP group; HR 0.87 (95% CI 0.57-1.35), P=0.54) ( Fig. 3a and Supplementary Fig. 5). Promisingly, MINERVA successfully demonstrated the strati cation of OS bene t as well. In HTP, ge tinib treatment led to signi cantly longer OS [median, not reached in the ge tinib group vs 48.7 months in the VP group; HR 0.43 (95% CI 0.21-0.88), P=0.018] with a clear and early separation of the Kaplan-Meier curves (Fig. 3b, c). Conversely, adjuvant VP treatment substantially improved OS in the CP group after 18 months [median, 36.4 months in the ge tinib group vs not reached in the VP group; HR 2.47 (95% CI 0.76-8.02), P=0.12] (Fig. 3b, e). OS in TP mirrored that of the pre-categorized cohort, suggesting no differences between the treatments (Fig.  3a, d). Likewise, the 2-, 3-and 5-year survival rates of the categorized subgroups demonstrated similar trends, with the survival differences between the two treatments in both HTP and CP groups widened over time ( Supplementary Fig. 6) Internal validation of MINERVA score We employed both ten-fold cross validation as well as LOOCV methods (as internal validation procedures) to evaluate the robustness of our MINERVA score. A relatively superior survival with adjuvant ge tinib treatment was observed in both HTP and TP subgroups, with an average of 3.5-and 1.9-fold increase in the 2-year DFS rate, respectively (Fig. 4a). The median DFS in these two subsets also increased by an average of 20 and 15 months, respectively (Fig. 4b), while the 2-year ge tinib-to-VP DFS ratio was less than 1, and the median DFS difference negative for all repeats in the CP group, suggesting greater survival bene t by adjuvant VP in this population. Among the 100 mock MINERVA score generated, 75% demonstrated signi cant treatment interaction with P-values <0.05, while 86% demonstrated interaction P-values <0.1 (Fig. 4c). We further validated the functionality of the original MINERVA score by LOOCV method. Adjuvant VP treatment in the HTP group was associated with markedly reduced DFS and OS (Fig. 4d, g). Meanwhile, adjuvant ge tinib treatment in the CP group was evidently inferior, similar to previously estimated results in Figures 2 and 3.

Discussion
To date, several prospective clinical trials, including our ADJUVANT trial, have presented the superiority of adjuvant TKI in early-stage EGFR-mutant NSCLC. The ADJUVANT trial had reached a median OS of 75.5 months by the database lock date, which is one of the best survival outcomes ever recorded for this patient population 31 . However, ge tinib's DFS superiority started declining after 36 months, and did not ultimately translate into OS bene t, raising concerns over achieving clinical cure by adjuvant TKI 32 . The heterogeneity in time-to-recurrence and OS observed within the ADJUVANT cohorts suggested high intertumor molecular heterogeneity in early-stage EGFR-mutant NSCLC 11 , necessitating additional predictive biomarkers to rede ne personalized adjuvant therapy.
In this rst biomarker exploration of adjuvant TKI in resected NSCLC, we selected ve genes that could independently predict the relative bene t between adjuvant TKI and chemotherapy. The multigene MINERVA score then integrated these biomarkers and effectively compensated for individuals' molecular heterogeneity. Notably, the three risk groups separated using this score counteracted the controversial impermanence of DFS bene t with exciting strati cation of overall survival bene ts as well. For each risk groups, we also found characteristic enrichment of biomarkers, possibly explaining their differential responses to adjuvant treatments.
In the CP group, RB1-altered/EGFR-mutant patients showed better survival with adjuvant VP rather than ge tinib. In advanced NSCLC, TKI-resistant RB1-inactivated/EGFR-mutated adenocarcinoma clones have been found to transdifferentiate into small-cell lung cancer (SCLC) and become more responsive to chemotherapy 33,34 . One of the potential mechanisms of SCLC transformation might be the disruption in expression of cell-state-determining factors due to RB1 inactivation 16, 35 . The resulting lineage plasticity then converts the therapy-dependent cancer cells to those that express neuroendocrine lineage markers, making them refractory to targeted treatments 36, 37 . RB1 often demonstrates mutual exclusivity with cell cycle pathway genes, and re ects chemosensitivity in rapidly progressing tumors 38 , which is in line with our CP population. In hope to eradicate this subclone, researchers have developed an upfront trial in which patients with advanced stage were assigned to receive TKI and small-cell directed chemotherapy (platinum plus etoposide) alternately (NCT03567642). Further research is required to explore whether TKI insensitivity of RB1-inactive/EGFR-mutant patients in the adjuvant setting also arise from early histological transformation events.
Despite the small VP-favoring population, patients in the HTP subgroup presented signi cant bene ts from adjuvant ge tinib therapy. Genomic analysis showed the enrichment of other four biomarkers, among which copy number gain of NKX2-1 received nearly equal weightage as RB1, but in an opposite predictive direction that favors the choice of ge tinib. NKX2-1 copy number gain is a widely prevalent oncogene found in up to 30% of EGFR-mutant patients 10 . NKX2-1 ampli cation has been more frequently observed in TKI-treated patients with extended progression-free survival (≥24 months) 15 . These ndings were consistent with its enrichment in the HTP population with relative ge tinib bene t.
Previous studies mainly reported favorable prognosis with NKX2-1 expression in mixed onco-driver, tumor stages and pathological backgrounds 39,40 , while our study is the rst to demonstrate the predictive value of NKX2-1 copy number gain to favor adjuvant TKI treatment in a more de ned population.
TP53 as a tumor suppressor occurred in more than 50% of NSCLC, with mutations of complicated functional properties 17 . Unlike most tumor suppressor genes, missense mutations in the critical DNA binding domain (exons 4 to 8) are the most common variants in TP53, which are associated with the lower disease control rate and poorer survival outcomes with TKI treatment in contrast to TP53 wildtypes in EGFR-mutant NSCLC 28, 41 . Apart from poor prognosis associated with loss-of-function TP53 mutations, studies have also revealed varied prognostic effects of missense mutations on different TP53 exons, suggesting possible functional divergence 29,43,44,45,46 . Interestingly, these missense variants could also be predictive for worse adjuvant chemotherapy outcomes compared to observation in resected NSCLC 42 . Along these lines, in this study, the predictive power of TP53 variants was also assessed by exons. In the multigene model, the positive predictivity of TP53 exons 4/5 missense mutations suggested that patients harboring these TP53 mutations would relatively bene t more from ge tinib than VP. We did not consider exons 6-8 because they lacked predictive signi cance for adjuvant therapies under the treatment-interaction test. In concert with the consensus on TP53's negative prognostic effect, we did observe signi cantly lesser outcomes in TP53-positive patients despite the treatment they received.
Ideally, the development of a multi-gene clinical predictor requires a well-designed prospective validation with appropriate assumptions and sample size structured to address the underlying molecular heterogeneity. However, like most exploratory data analyses of clinical studies, our study lacked an external validation cohort. This challenge is unique in our case as there were no available equivalent public or clinical datasets of adjuvant TKI-treated patients with regular follow-up of survival outcome at the time of this study, and any prospective validation trials might span over another decade to reach maturity. Moreover, this score incorporated a relatively small training cohort, which may introduce biased biomarker selection, or an over tted model. Therefore, it is important to exploit stringent statistical procedures to minimize cherry-picking during post hoc analyses. Cross-validating all the steps of model construction allowed us to evaluate whether the current algorithm could be uniformly applied to the entire cohort. Besides, only baseline specimen was examined in our study, while dynamic minimal residual disease (MRD) detection might provide additional information for the application of precise adjuvant TKI. However, consensus opinion on MRD's de nition and detection technologies needed to be settled rst.
Other limitations include insu cient tissue availability for retrospective genomic analysis of all the enrolled participants. However, both clinical characteristics and survival outcomes of the pre-categorized patients in this study were matched with those of the ITT population from the ADJUVANT trial.

Conclusions
The present exploratory retrospective analysis of the ADJUVANT trial unraveled the genetic constructs of EGFR co-mutations in stage II and III resected NSCLC. Further, the interplay between identi ed predictive markers and clinical outcomes was carefully examined, and incorporated into a multi-gene predictive score to aid the adjuvant paradigm. Our evidential MINERVA score presents a fresh perspective for future studies to examine its clinical validity, thereby guiding the development of more personalized adjuvant therapies, and their transition from bench to bedside.

Adjuvant Treated patients
All patients had Stage II-IIIA (N1-N2), EGFR-mutant NSCLC, underwent complete surgical resection and were randomized and treated in the ADJUVANT/CTONG1104 trial (NCT01405079) 6 . All except 27 patients initiated either adjuvant ge tinib or intravenous vinorelbine plus cisplatin between September 2011 and April 2014. All patients provided written informed consent for participating in ADJUVANT/CTONG1104 and this prede ned biomarker study. EGFR-activating mutations were evaluated using ampli cation-refractory mutation system PCR at time of enrollment.
Per protocol, patients were assessed for disease-free survival by chest CT scan and abdominal ultrasound every 3 months, brain MRI every 6 months, bone scan every 12 months from baseline until disease relapse or death (whichever comes rst) for up to 3 years. The survival after 3 years will be followed up semi-annually with telephone. Secondary endpoints, including overall survival, 3-year DFS rate, 5-year DFS rate, and 5-year OS rate, were also evaluated. Patients who were alive or lost to follow-up were censored on the last day they con rmed survival. The baseline demographics, clinical characteristics, and the primary end point data of ADJUVANT-CTONG1104, including the DFS, were collected from the previous publication 6 . Overall survival (OS) was updated either by phone interview, or on-site follow-up of the enrolled patients 30 . The study was approved by the research ethics boards of the participating hospitals, and was conducted in accordance with the ethical principles of the Declaration of Helsinki.

Sample details
Archived formalin-xed para n-embedded (FFPE) tumor tissue specimens of 175 patients from the ADJUVANT/CTONG1104 trial were obtained from 25 centers (Figure 1). Only patients with su cient and quali ed tumor tissue samples who had been treated by either adjuvant arms were included in this study.
Of these, a total of 171 patients positive for EGFR by NGS were subjected to further biomarker analyses (including 95 from the ge tinib group and 76 from the VP group).

DNA Extraction and Sequencing Library Preparation
Samples were sent to the CAP/CLIA (College of American Pathologists and Clinical Laboratory Improvements Amendments) accredited central laboratory at Nanjing Geneseeq Technology Inc. (Nanjing, China) for genomic DNA extraction and hybridization capture-based targeted NGS of 422 cancer-relevant genes. Protocols from previous publication were followed for both experimental procedures as well as sequence data analysis 48 . In detail, ve to eight 10 µm FFPE sections were rst de-para nized with xylene and then used for genomic DNA (gDNA) extraction by QIAamp DNA FFPE Tissue Kit (Qiagen) following the manufacturer's instructions. The extracted gDNA samples were quanti ed on Qubit 3.0 uorometer (Thermo Fisher Scienti c) and its purity was measured on Nanodrop 2000 (Thermo Fisher Scienti c), following by fragmentation to a size around 350 bp by using Covaris M220 sonication system (Covaris) and then puri ed by size selection with Agencourt AMPure XP beads (Beckman Coulter).
Fragmented gDNA were used to prepare DNA libraries with KAPA hyper library preparation kit (KAPA Biosystems) according to the manufacturer's protocol. Libraries were then subjected to PCR ampli cation and puri cation with Agencourt AMPure XP beads before targeted enrichment.
Libraries with different sample indices were rst pooled together to a total DNA amount of 2 µg and then subjected for targeted enrichment with IDT xGen Lockdown Reagents and a customized enrichment panel (Integrated DNA Technologies) covering the exonic regions of 422 genes and the introns of 16 fusion genes. The captured library was further ampli ed with Illumina p5 (5' AAT GAT ACG GCG ACC ACC GA 3') and p7 (5' CAA GCA GAA GAC GGC ATA CGA GAT 3') primers in KAPA Hi HotStart ReadyMix (KAPA Biosystems, Wilmington, MA) and puri ed with Agencourt AMPure XP beads. Sequencing libraries were quanti ed by qPCR with KAPA Library Quanti cation kit (KAPA Biosystems) and its size distribution was examined on Bioanalyzer 2100 (Agilent Technologies). The nal libraries were sequenced on Illumina Hiseq 4000 platform for 150 bp paired-end sequencing according to the manufacturer's instructions. All experimental procedures were performed using validated assays.

Sequencing Data Analysis
Raw sequencing data was analyzed by a validated automation pipeline. In brief, raw data were rst subjected to bcl2fastq for demultiplexing and then Trimmomatic 49 for FASTQ le quality control to remove low quality (base phred score below 30) and N bases. Reads were aligned to the reference human genome hg19 by Burrows-Wheller Aligner (BWA-mem, v0.7.12) 50 . PCR duplicates were removed by Picard.
Genome Analysis Toolkit (GATK 3.4.0) was employed to apply the local realignment around indels and recalibrate base quality score. Single-nucleotide variations (SNVs) and insertion/deletion mutations were called using VarScan2 with the following parameters: 1) for mutations with more than 20 recurrences in COSMIC, minimum variant allele frequency (VAF) = 0.01 with at least 3 minimum variant supporting reads; 2) for others, minimum VAF = 0.02 with at least 5 minimum variant supporting reads; in addition, all variants also need to meet the standards of minimum read depth = 20, minimum base quality = 25, variant supporting reads mapped to both strands, and strand bias no greater than 10%. CNV detection was using a self-developed pipeline, which has been validated in 38 samples against their droplet digital polymerase chain reaction (ddPCR) results as "gold standard". The system noise in copy  Smaller values of MINERVA correspond to greater chance of bene ting from TKI than from chemotherapy as an adjuvant treatment. Based on the distribution of MINERVA scores, the patients could be effectively categorized into three subgroups with scores of <-0.5, -0.5 to 0.5, and >0.5, with distinct DFS outcomes.
The relative survival bene t was compared using a 2-year DFS probability ratio and median DFS difference between ge tinib and VP arms 51 .
Internal Validation of the MINERVA score Ten-fold cross validation (CV) was used to evaluate the performance of our predictive model. In each iteration, interaction effect was assessed by cox proportional hazard model based on 90% of the samples. An identical modeling procedure as described above was performed for genetic marker selection and model construction. Mock MINERVA scores were then calculated for the remaining 10% samples based on the model developed in the training samples. A complete set of scores can be computed through ten repeated 10-fold CV and patients were assigned to one of the three MINERVA subgroups accordingly (HTP <-0.5, TP -0.5~0.5, or CP >0.5). Predictive effect of the mock MINERVA scores was then evaluated with interaction p-value. This process was repeated 100 times. Relative adjuvant treatment bene t within each subgroup was compared using 2-year disease-free survival probability and median disease-free survival difference between ge tinib and VP arms.
Next, we performed leave-one-out cross validation (LOOCV) to obtain individual scores. Each patient was omitted in alternation and then scored by MINERVA constructed on the remaining 170 patients (including both gene marker selection and model construction). The patient was then assigned to one of subgroups with the same score cutoffs. The entire procedure was repeated until each patient was left out once, scored and grouped. Kaplan-Meier curves were estimated for each subgroup to evaluate relative bene t of adjuvant ge tinib to VP therapy using 2-year disease-free survival ratio, median disease-free survival difference and the log-rank p values.

Statistical analysis
Univariate and multivariate analyses of the association of biomarkers, treatment interaction, and clinical factors with DFS were performed with the Cox proportional hazard regression model. Kaplan-Meier curves of DFS and OS were estimated for each subgroup, and statistically compared using the log-rank test. A two-sided P-value <0.5 was considered statistically signi cant. All statistical analyses were performed using R software (version 3.5.0).

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.