Development cohort
We collected a development cohort of 283 consecutive cytology samples from patients with lung adenocarcinoma. The samples were acquired via endobronchial ultrasound–guided FNA (64.7%, n=183), thoracentesis or paracentesis (16.6%, n=47 [46 pleural samples]), computed tomography–guided FNA (13.4%, n=38), and ultrasound-guided FNA (5.3%, n=15); most samples were cell block preparations (97.9%, n=277). Metastases accounted for 82.7% (n=234) of cases. The most common metastatic sites were lymph nodes (66.7%, n=156), followed by pleural or peritoneal fluid (20.1%, n=47), soft tissue (3.8%, n=9), bone (3.4%, n=8), adrenal glands (3.0%, n=7 [5 left-sided]), liver (2.1%, n=5), and other sites (0.9%, n=2). The demographic characteristics and clinicopathologic data for the development cohort are summarized in Table 1. The cohort consisted primarily of older individuals (median age 65.4 years; range: 27.5-90.2 years), and 151 (53.4%) patients were women. Most patients were white, were current or former smokers, and had stage IV disease at the time of data collection. All cases underwent FISH testing for ALK, ROS1, MET, and/or RET. FISH results were negative in 250 (88.3%) cases; the remaining cases were positive for rearrangement or amplification of ALK (5.7%, n=16), MET (1.8%, n=5), RET (0.7%, n=2), or ROS1 (0.4%, n=1), or were indeterminate (3.2%, n=9). Aneuploidy, defined as an increase or decrease in the number of fluorescent signals observed in a cell, was present in 193 (68.2%) cases, absent in 60 (21.2%), indeterminate in 27 (9.5%), and not assessed in three (1.1%).
Mutational analysis, including NGS and PCR-based methods, was performed in 273 (96.5%) cases. NGS was performed in 77.0% (n=218) of the specimens and detected mutations in 188 (86.2%) of them. PCR-based testing was performed in 55 (19.4%) cases and detected mutations in 15 (27.3%) (9 in EGFR, 6 in KRAS, and 0 in BRAF). Of the cases with PCR-based testing, two had inadequate DNA and 38 were negative on single-gene testing. These 40 cases, as well as the 10 cases in which mutational analysis was not performed, were excluded from further analysis, leaving 233 cases. Mutations were most frequent in TP53 (46.0%, n=107), KRAS (33.5%, n=78), and EGFR (26.2%, n=61). According to our proposed classification, 34 (14.6%) cases were classified as sTRU, 43 (18.5%) as sPP, and 46 (19.7%) as sPI. Cases with co-mutations included 26 (11.2%) with EGFR/TP53 and 34 (14.6%) with KRAS/TP53. There were 21 (9.0%) cases with mutations in genes other than EGFR, KRAS, and TP53 (non-TRUPPPI subtype) and 29 (12.4%) cases with no mutations detected.
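The seven categories above should partition the 233 analyzable cases, which is easy to confirm from the reported counts. The sketch below does that arithmetic and also shows one possible gene-based assignment rule. The rule is an assumption inferred from the category names and counts (EGFR without TP53 as sTRU, KRAS without TP53 as sPP, TP53 alone as sPI); it is not taken from the authors' methods, and the function names are hypothetical.

```python
from collections import Counter

# Subtype counts reported for the 233 analyzable development-cohort cases.
REPORTED_COUNTS = {
    "sTRU": 34, "sPP": 43, "sPI": 46,
    "EGFR/TP53": 26, "KRAS/TP53": 34,
    "non-TRUPPPI": 21, "no mutation": 29,
}


def simplified_subtype(mutated_genes):
    """Assign a simplified subtype from a set of mutated gene symbols.

    ASSUMED rules (inferred, not the authors' stated definition):
    EGFR without TP53 -> sTRU, KRAS without TP53 -> sPP, TP53 alone -> sPI.
    EGFR is checked before KRAS, an arbitrary choice for rare EGFR+KRAS cases.
    """
    genes = set(mutated_genes)
    has_tp53 = "TP53" in genes
    if "EGFR" in genes:
        return "EGFR/TP53" if has_tp53 else "sTRU"
    if "KRAS" in genes:
        return "KRAS/TP53" if has_tp53 else "sPP"
    if has_tp53:
        return "sPI"
    return "non-TRUPPPI" if genes else "no mutation"


def percentages(counts):
    """Return the total and each category's share of it, in % to one decimal."""
    total = sum(counts.values())
    return total, {k: round(100 * v / total, 1) for k, v in counts.items()}


if __name__ == "__main__":
    total, pct = percentages(REPORTED_COUNTS)
    print(total)       # 233
    print(pct["sPI"])  # 19.7
    # Hypothetical per-case gene sets, for illustration only.
    cases = [{"EGFR", "TP53"}, {"KRAS"}, {"TP53"}, {"STK11"}, set()]
    print(Counter(simplified_subtype(c) for c in cases))
```

The totals and one-decimal percentages reproduce the figures in the paragraph, which supports reading the seven groups as mutually exclusive.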
The simplified molecular subtypes were significantly associated with age, race/ethnicity, smoking status, and aneuploidy (Table 2). To identify further associations, we compared variables between patients within a given molecular subtype and the remaining patients. The sTRU subtype was associated with Asian race/ethnicity (23.5% vs. 7.6%, p=0.027) and never-smoker status (52.9% vs. 18.7%, p<0.001). The sPP subtype was associated with white race/ethnicity (86.0% vs. 73.0%, p=0.042). The sPI subtype was associated with male sex (63.0% vs. 41.2%, p=0.008). The EGFR/TP53 subtype was associated with younger age (mean age 56.9 vs. 65.8 years, p<0.001), Asian (19.2% vs. 8.7%) and Hispanic race/ethnicity (19.2% vs. 6.3%, p=0.026), never-smoker status (46.2% vs. 20.9%, p=0.016), and a lower frequency of aneuploidy (60.0% vs. 80.1%, p=0.038). The KRAS/TP53 subtype was associated with current smoking (55.9% vs. 21.7%, p<0.001). The non-TRUPPPI subtype was not associated with any of the covariates, whereas the no-mutation subgroup was associated with never-smoker status (41.4% vs. 21.2%, p=0.045) and a higher frequency of aneuploidy (95.8% vs. 75.3%, p=0.019).
Validation cohort
To validate these findings and determine the prognostic impact of our subtypes, we used a validation cohort (n=428) composed of core-needle biopsy or resection specimens from patients with lung adenocarcinoma for whom treatment and follow-up data were available. Histomorphologic subtypes (e.g., mucinous, lepidic, acinar, and solid) were reported in 28.3% (n=121) of the pathology reports. The mutational data for this cohort were based only on NGS because PCR-based single-gene analysis did not assess all three target genes (EGFR, KRAS, and TP53). We also included the 63 patients from the cytology cohort in the GEMINI database for whom treatment and follow-up data were available; NGS results were available for 85.7% (n=54) of these cases.
Mutational profiling of lung adenocarcinoma patients in the validation cohort
Mutational data were available for 484 (98.6%) patients in the combined validation cohort. NGS and PCR analyses yielded a total of 835 mutations/variants in 421 (87.0%) of these patients. The median tumor cell content was 40% (range: 20%-95%). Most genomic alterations were missense mutations (75%, n=618), followed by in-frame deletions (7%, n=58), nonsense (6.6%, n=55) and frameshift (5%, n=40) mutations, duplications (2.1%, n=18), complex mutations/indels (1.8%, n=15), splice-site mutations (1.4%, n=12), and gene amplifications (1.1%, n=10). Transversions included G>T (27%, n=222), T>G (7%, n=60), C>A (1.5%, n=13), and A>C (1%, n=8); transitions included G>A (13%, n=106), C>T (12%, n=99), A>G (4%, n=34), and T>C (1%, n=9). The most common protein alterations were KRAS-G12C (n=58), EGFR-L858R (n=51), EGFR-E746_A750del (n=45), KRAS-G12V (n=36), KRAS-G12D (n=27), and EGFR-T790M (n=27). The mutational data for all 491 cases in the validation cohort, stratified by simplified molecular subtype, are summarized in Figure 1.
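The base-change counts above list complementary substitutions separately (for example, G>T and C>A are the same change read from opposite DNA strands). The sketch below, using only the reported counts, classifies each change as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion, and optionally collapses complementary pairs onto the pyrimidine reference base, a common convention in mutational-signature analysis. The function names are hypothetical.

```python
# Reported single-base substitution counts (reference, alternate), validation cohort.
REPORTED = {
    ("G", "T"): 222, ("T", "G"): 60, ("C", "A"): 13, ("A", "C"): 8,
    ("G", "A"): 106, ("C", "T"): 99, ("A", "G"): 34, ("T", "C"): 9,
}

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Transition if both bases are purines or both pyrimidines, else transversion."""
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def pyrimidine_context(ref, alt):
    """Rewrite a substitution on the strand whose reference base is C or T."""
    if ref in PURINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def tally(counts, key):
    """Sum counts after mapping each (ref, alt) pair through `key`."""
    out = {}
    for (ref, alt), n in counts.items():
        k = key(ref, alt)
        out[k] = out.get(k, 0) + n
    return out


if __name__ == "__main__":
    print(tally(REPORTED, substitution_class))
    # {'transversion': 303, 'transition': 248}
    print(tally(REPORTED, pyrimidine_context))
    # {'C>A': 235, 'T>G': 68, 'C>T': 205, 'T>C': 43}
```

Collapsed this way, C>A (G>T on the other strand) is the single most frequent change in the reported counts.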
Clinical and histomorphologic associations according to simplified molecular subtypes
The simplified molecular subtypes were significantly associated with age, race/ethnicity, sex, smoking status, stage, histomorphology, and FISH results (Table 3). As in the development cohort, we compared variables between patients within a given molecular subtype and the remaining patients. The sTRU subtype was associated with slightly older age (mean age 66.4 vs. 63.2 years, p=0.015), never-smoker status (62.2% vs. 26.3%, p<0.001), Asian race/ethnicity (17.6% vs. 6.6%, p=0.022), metastatic tumors (61.5% vs. 41.7%, p=0.013), non-mucinous (95.5% vs. 70.7%, p=0.014) and lepidic (63.6% vs. 36.4%, p=0.030) histology, and stage IV disease (55.4% vs. 41.6%, p=0.007). The sPP subtype was associated with slightly older age (mean age 65.9 vs. 63.1 years, p=0.026), a lower proportion of never-smokers (10.8% vs. 37.4%, p<0.001), black race/ethnicity (10.8% vs. 6.5%, p=0.033), mucinous (42.9% vs. 17.4%, p=0.005) and non-acinar (80.0% vs. 58.1%, p=0.035) histology, non-metastatic tumors (69.1% vs. 50.9%, p=0.016), absence of ALK/ROS1/MET/RET abnormalities on FISH (96.8% vs. 86.6%, p=0.003), and stage II disease (18.6% vs. 7.1%, p<0.001). The sPI subtype was associated with male sex (55.4% vs. 39.4%, p=0.01), a lower proportion of never-smokers (16.9% vs. 34.9%, p=0.003), and stage III disease (32.5% vs. 20.2%, p=0.047). Like the sTRU subtype, the EGFR/TP53 subtype was associated with Asian race/ethnicity (18.9% vs. 6.3%, p<0.001) and never-smoker status (55.4% vs. 27.6%, p<0.001); unlike the sTRU subtype, however, it was associated with younger age (mean age 59.1 vs. 64.5 years, p<0.001), as well as with smaller tumors (mean tumor size 3.27 cm vs. 3.96 cm, p=0.042). In contrast, the KRAS/TP53 subtype was associated with Hispanic race/ethnicity (12.0% vs. 5.8%, p=0.035), a lower proportion of never-smokers (6.0% vs. 34.8%, p<0.001), and solid (45.5% vs. 11.8%, p=0.011) and non-lepidic (90.9% vs. 55.5%, p=0.026) histology. The non-TRUPPPI subtype was associated with mucinous histology (62.5% vs. 22.1%, p=0.022), whereas the no-mutation subgroup was associated with never-smoker status (46.0% vs. 29.7%, p=0.035) and acinar histology (57.1% vs. 31.0%, p=0.043). No significant associations were identified between subtype and alcohol intake.
Prognostic associations according to simplified molecular subtype classification
We assessed overall survival (OS) in the validation cohort as previously described. The median follow-up time was 1.87 years (interquartile range: 0.9-3.5 years), and the median OS was 5.93 years (95% CI: 4.57 years-not reached). We fitted a multivariable Cox proportional hazards regression model to assess associations between OS and age, sex, alcohol intake, stage, molecular subtype, and treatment (surgery, radiation, and/or chemotherapy); these covariates were selected on the basis of univariable analyses with a p value cutoff of 0.25. Older age (hazard ratio [HR]=1.03, 95% CI: 1.01-1.05) and stage IV disease (HR=6.51, 95% CI: 2.49-17.04, p=0.006) were associated with worse OS. Patients in the sTRU subtype tended to have better OS than those in the other subtypes, although this difference did not reach statistical significance (HR=0.42, 95% CI: 0.18-1.00, p=0.051), whereas patients in the KRAS/TP53 subtype had significantly worse OS (HR=2.15, 95% CI: 1.02-4.53, p=0.043) (Figure 2A & B). Patients who underwent surgical resection had better OS than those who did not (HR=0.33, 95% CI: 0.18-0.60, p<0.001). Figure 3A shows significant differences in OS within this surgically treated subset. We observed statistically significant differences in OS between the sTRU, sPP, and sPI subtypes (Figure 3B); however, differences between these subtypes did not reach statistical significance when patients were stratified into early (I and II) or late (III and IV) stages (not shown). Interestingly, compared with patients who underwent surgery, patients who received chemotherapy did not have significantly different OS, whereas those who received radiation therapy had significantly worse OS regardless of molecular subtype (HR=1.87, 95% CI: 1.18-2.96, p=0.007).
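Because Cox coefficients are estimated on the log-hazard scale, a reported HR, its 95% CI, and its p value are linked by a simple relation: SE = (ln U - ln L) / (2 × 1.96) and z = ln(HR) / SE, where L and U are the CI limits. The sketch below uses that relation as a quick consistency check on the estimates in this section. It assumes Wald-type CIs that are symmetric on the log scale, which the section does not state explicitly; the function name is hypothetical.

```python
from math import log
from statistics import NormalDist


def wald_p_from_ci(hr, lower, upper, level=0.95):
    """Approximate two-sided Wald p value from a hazard ratio and its CI.

    ASSUMES the CI was built as exp(log(HR) +/- z * SE), the usual
    construction for Cox model estimates.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    se = (log(upper) - log(lower)) / (2 * z_crit)       # standard error of log(HR)
    z = log(hr) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # HRs and 95% CIs as reported in this section.
    for label, hr, lo, hi in [
        ("KRAS/TP53", 2.15, 1.02, 4.53),
        ("surgery", 0.33, 0.18, 0.60),
        ("radiation", 1.87, 1.18, 2.96),
    ]:
        print(label, round(wald_p_from_ci(hr, lo, hi), 4))
```

For these three estimates the back-calculated values come out close to the reported p=0.043, p<0.001, and p=0.007; small differences are expected because the published HRs and limits are rounded.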
Notably, OS did not differ significantly between the sTRU and EGFR/TP53 subtypes (log-rank test p=0.84), either in all patients (not shown) or in patients who underwent surgery (Figure 3C), suggesting that these two subtypes could represent a single group. Conversely, the KRAS/TP53 subtype showed the poorest OS both among all patients (HR=2.15, 95% CI: 1.02-4.53, p=0.043) (Figure 2B) and among those who underwent surgery (HR=1.935, 95% CI: 0.923-4.058) (Figure 3D), although the latter estimate did not reach statistical significance.
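The log-rank test cited above compares two survival curves by accumulating, at each observed death time, the difference between observed and expected deaths in one group under the null hypothesis of equal hazards. A minimal standard-library sketch of the textbook two-group version follows; the function name and toy data are hypothetical, and a real analysis would normally rely on an established survival-analysis package.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if death observed, 0 if censored.
    Returns (chi-square statistic with 1 degree of freedom, p value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in death_times:
        at_risk = [g for tt, _, g in data if tt >= t]  # still followed at time t
        n = len(at_risk)
        n_a = at_risk.count(0)
        d = sum(1 for tt, e, _ in data if tt == t and e)  # deaths at t, both groups
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n  # expected deaths in group A under H0
        if n > 1:  # hypergeometric variance; zero when one subject is at risk
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Toy data (hypothetical): every subject in group A dies before anyone in B.
    chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
    print(round(chi2, 3), round(p, 3))
```

A p value as large as 0.84 corresponds to a very small chi-square statistic, meaning observed and expected deaths were nearly equal in the two subtypes being compared.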