Patient characteristics
A total of 41 treatment-naïve patients with NSCLC consented to participate in the study. The cohort was predominantly male (68.3%; 28/41), with a median age of 65 years (range, 36 to 81 years). Most patients (78.0%; 32/41) were diagnosed with lung adenocarcinoma, 7 with squamous cell carcinoma, and 2 with neuroendocrine tumor. Most patients (92.7%; 38/41) had locally advanced to advanced disease (stage IIIB to IV); the remaining 3 patients had stage IIIA disease. Table 1 lists the baseline clinicopathologic features of the cohort.
Table 1
Baseline clinicopathologic features of the cohort
Clinicopathologic features | n = 41; n(%) |
Age |
Median (range), years | 65 (36-81) |
Gender |
Male | 28 (68.3%) |
Female | 13 (31.7%) |
Smoking status |
Smoker | 27 (65.9%) |
Never smoker | 14 (34.1%) |
Histology |
Lung adenocarcinoma | 32 (78.0%) |
Lung squamous cell carcinoma | 7 (17.1%) |
Other NSCLC | 2 (4.9%) |
Degree of cellular differentiation of sputum cytology |
Low | 20 (48.8%) |
Medium | 9 (22.0%) |
High | 7 (17.1%) |
NA | 5 (12.2%) |
Location of primary tumor |
Central | 22 (53.7%) |
Peripheral | 19 (46.3%) |
Stage |
≤ IIIA | 3 (7.3%) |
IIIB-IIIC | 7 (17.1%) |
IV | 31 (75.6%) |
Abbreviations: NA, not applicable; NSCLC, non-small-cell lung cancer |
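The percentages in Table 1 follow directly from the reported counts and the cohort size of 41. The following is a minimal sketch of that arithmetic; the function name `pct` and the dictionary layout are illustrative and not part of the study's analysis code.

```python
# Recompute Table 1 percentages from the reported counts (n = 41).
# `pct` and `table1_counts` are illustrative names, not from the study.
COHORT_SIZE = 41

def pct(count, total=COHORT_SIZE):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

table1_counts = {
    "Male": 28,
    "Female": 13,
    "Smoker": 27,
    "Never smoker": 14,
    "Differentiation: low": 20,
    "Differentiation: medium": 9,
    "Differentiation: high": 7,
    "Differentiation: NA": 5,
}

for label, count in table1_counts.items():
    print(f"{label}: {count} ({pct(count)}%)")
```

Within each category the counts sum to 41, so the percentages should sum to 100% up to rounding.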
Sample distribution and quality control of the samples
Matched tumor, blood, and sputum samples were collected from all patients; however, some samples were unavailable for mutation profiling because of insufficient sample volume (n = 23) or inadequate DNA quality for library construction (n = 2). Tumor samples were available for 38 patients and blood samples for 39 patients. Sputum supernatant and sputum sediment samples were available from 29 and 35 patients, respectively. Table S1 summarizes the distribution of the cohort according to sample type.
DNA was extracted from a total of 141 available samples, with an average DNA yield of 1571.8 ng for tumor, 121.9 ng for plasma, 2766.0 ng for sputum supernatant, and 8144.5 ng for sputum sediment (Figure S1A). Library complexity and insert size were similarly distributed in tumor biopsy samples and sputum sediments, and this distribution was distinct from that of plasma and sputum supernatant samples (Figure S1B). A majority of the tumor and sputum sediment samples had insert sizes between 150 and 250 base pairs, whereas most of the plasma and sputum supernatant samples had insert sizes between 150 and 175 base pairs (Figure S1B). The sequencing achieved a median depth of 1,275× for tumor samples, 16,326× for plasma samples, 10,549× for sputum supernatant, and 16,660× for sputum sediment (Figure S1C).
We then compared the detection rates and maximum allelic fraction (maxAF) of plasma, sputum supernatant, and sputum sediment samples using matched tumor as reference. The detection rates for the 168-gene panel were 76.9% for plasma (n = 39), 72.4% for sputum supernatant (n = 29), and 65.7% for sputum sediment samples (n = 35) (Fig. 1A). The detection rates for the 8 oncogenic driver genes and TP53 (9 genes) were 71.8% for plasma, 62.1% for sputum supernatant, and 51.4% for sputum sediment samples (Fig. 1B). Using tumor tissue samples as reference, sputum supernatant, sputum sediment, and plasma samples achieved a positive predictive value (PPV) of 80.4%, 55.6%, and 73.3% and a sensitivity of 36.9%, 31.3%, and 50.0%, respectively, across the 168 genes common to the panels. When only the 8 classic oncogenic driver genes were considered, the same sample types achieved a PPV of 85.7%, 86.7%, and 90.9% and a sensitivity of 50.0%, 39.4%, and 51.3%, respectively. The maxAF was significantly higher in tumor samples than in plasma (P < 0.001), sputum supernatant (P < 0.001), and sputum sediment (P < 0.001) samples for both the 168 genes and the 9 genes (Fig. 1C; Fig. 1D). However, maxAF was similar in plasma and sputum supernatant samples for both the 168 genes (P = 0.81; Fig. 1C) and the 9 genes (P = 0.55; Fig. 1D).
These data indicate that the DNA extracted from sputum supernatant and sputum sediment samples has adequate quality and sufficient quantity for NGS-based genomic profiling.
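To make the PPV and sensitivity figures concrete, the sketch below applies the standard definitions with the tumor sample as the reference: PPV is the share of mutations called in the test sample that are also found in tumor, and sensitivity is the share of tumor mutations recovered in the test sample. It assumes mutations are compared as exact (gene, variant) pairs, which the text does not specify; the function name and the example variants are hypothetical placeholders, not study data.

```python
# Sketch of PPV and sensitivity with tumor tissue as the reference.
# Assumption: a mutation matches only if gene and variant are identical.
def ppv_and_sensitivity(reference, test):
    """Return (PPV, sensitivity) of `test` calls against `reference` calls."""
    true_positives = len(reference & test)
    ppv = true_positives / len(test) if test else float("nan")
    sensitivity = true_positives / len(reference) if reference else float("nan")
    return ppv, sensitivity

# Hypothetical placeholder calls for one patient (not from the cohort).
tumor_calls = {
    ("GENE_A", "var1"),
    ("GENE_B", "var2"),
    ("GENE_C", "var3"),
    ("GENE_D", "var4"),
}
sputum_calls = {
    ("GENE_A", "var1"),
    ("GENE_B", "var2"),
    ("GENE_E", "var5"),
}

ppv, sensitivity = ppv_and_sensitivity(tumor_calls, sputum_calls)
print(f"PPV = {ppv:.1%}, sensitivity = {sensitivity:.1%}")
```

In this toy case two of three sputum calls are confirmed in tumor and two of four tumor mutations are recovered, which shows how a sample type can combine a relatively high PPV with a lower sensitivity.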
Mutation detection in sputum supernatant and sediment
Mutation profiling of sputum supernatant samples detected a total of 106 mutations in 52 genes from 21 patients, corresponding to a detection rate of 72.4% (Fig. 2A). Of these mutations, 81 were missense mutations, 4 were Indels, 6 were frameshift variants, 4 were splice-site variants, 2 were stop-gained variants, 3 were CNVs, and 6 were genomic rearrangements. The most frequently mutated genes in sputum supernatants were TP53 (31.0%), EGFR (13.8%), KRAS (13.8%), and ALK (13.8%). Among the eight classic NSCLC oncogenic driver genes, actionable mutations were detected in 14 patients, including EGFR mutations (p.L858R, n = 1; p.E746_A750del, n = 3), EML4-ALK fusions (n = 4), KRAS G12V/C/D/Q61H mutations (n = 4), ERBB2 A622S mutation (n = 1), and CD74-ROS1 fusion (n = 1). No mutations were detected in BRAF, MET, or RET in our cohort.
Meanwhile, a total of 276 mutations in 75 genes were detected in matched sputum sediment samples from 14 patients, corresponding to a detection rate of 60.9%. Detection rates were comparable between sputum supernatant samples and their matched sputum sediment samples (78.3% vs 65.2%; P = 0.51). Considering all mutations detected with the 168-gene panel in the 23 patients with both sample types, the mutations detected in sputum supernatant and sediment samples were 47.8% concordant. Figure 2B illustrates the mutations detected in either or both of the sputum supernatant and sediment samples from the 23 patients. By mutation type among the 168 genes, fusions achieved a concordance rate of 80.0% and SNVs and Indels a concordance rate of 30.4%, whereas CNVs had a very low detection rate: a CNV was detected in only 1 sputum supernatant sample and in no sputum sediment samples (Table S2). Actionable mutations were detected in the matched sputum supernatant and sediment samples from 8 patients, including EML4-ALK fusions (n = 3), EGFR exon 19 deletion (n = 2), KRAS G12C/Q61H mutations (n = 2), and CD74-ROS1 fusion (n = 1). Moreover, a significantly higher maxAF was observed in sputum supernatant samples than in their corresponding sediment samples (P < 0.001). The median maxAF was 1.26% (range: 0.0%-9.2%) in sputum supernatant and 0.79% (range: 0.0%-14.1%) in sputum sediment.
These data suggest that DNA extracted from both sputum supernatant and sediment could be utilized for mutation profiling; however, the abundance of mutations is significantly higher in sputum supernatant than in the corresponding sediment samples.
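As a consistency check on the sputum supernatant figures reported in this section, the per-type mutation counts can be summed and the patient-level detection rate recomputed from the 21 positive patients among the 29 with supernatant samples. The sketch below is only that arithmetic; the variable names are illustrative.

```python
# Tally the supernatant mutation types reported in the text and
# recompute the patient-level detection rate (21 of 29 patients).
supernatant_mutation_types = {
    "missense": 81,
    "indel": 4,
    "frameshift": 6,
    "splice-site": 4,
    "stop-gained": 2,
    "CNV": 3,
    "rearrangement": 6,
}

total_mutations = sum(supernatant_mutation_types.values())
detection_rate = round(100 * 21 / 29, 1)

print(f"Total mutations: {total_mutations}")
print(f"Detection rate: {detection_rate}%")
```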
Concordance of sputum supernatant and sputum sediment with the matched tumor sample
Figure 3A illustrates the mutation profile of tumor tissue samples. Among the 26 patients with both sputum supernatant and tumor samples, 41 mutations were detected in both sample types (Fig. 3B), corresponding to a concordance rate of 69.2%. Fusion detection was highly concordant between sputum supernatant and matched tumor samples, at 75.0%. SNVs and Indels were only 34.0% concordant, while CNVs were detected in only 2 sputum supernatant samples, resulting in a concordance rate of 7.7% (Table S3). It is worth noting that actionable fusions, including EML4-ALK and CD74-ROS1, could be detected in sputum supernatant samples with a high concordance of 83.3% (5/6) relative to tumor samples (Table S3). Meanwhile, among the 32 patients with both sputum sediment and tumor samples, 45 mutations were detected in both sample types (Fig. 3C), corresponding to a concordance rate of 37.8%. The concordance rate relative to tumor samples was significantly higher for sputum supernatants than for sputum sediments (69.2% vs. 37.8%; P = 0.031).
These data indicate that sputum supernatant samples are better than their sediment fraction in reflecting tumor-related mutations.
Concordance of sputum supernatant with their matched plasma sample
Figure 4A illustrates the mutation profile of plasma samples. Among the 28 patients with both sputum supernatant and plasma samples, 32 mutations were detected in both sample types, corresponding to a concordance rate of 53.6% (Fig. 4B). Sputum supernatant and plasma samples had comparable detection rates (71.4% vs. 67.9%; P = 1), but sputum supernatant samples had a significantly higher median allelic fraction than their matched plasma samples (P = 0.034). Sputum supernatant and matched plasma samples were highly concordant in detecting fusions, at 83.5%. SNVs and Indels were 31.0% concordant, while CNVs were detected in only 3 sputum supernatant samples, resulting in a concordance rate of 7.7% (Table S4).
These data indicate that sputum supernatant samples have detection rates comparable to those of plasma samples, suggesting their utility as an alternative sample type for comprehensive genomic profiling, particularly for non-CNV mutations.
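The comparison of detection rates between sputum supernatant and plasma can be illustrated with a two-sided Fisher's exact test. This section does not name the statistical test used, so the choice of Fisher's exact test is an assumption; the counts 20/28 and 19/28 are inferred from the reported 71.4% and 67.9% in 28 patients. The sketch uses only the Python standard library and exact rational arithmetic, and the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    Returns the p-value as an exact Fraction.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    return sum(
        (prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_observed),
        Fraction(0),
    )

# Counts inferred from the reported percentages (assumption):
# supernatant 20 detected / 8 not; plasma 19 detected / 9 not.
p_value = fisher_exact_two_sided(20, 8, 19, 9)
print(f"P = {float(p_value):.2f}")
```

Under these assumptions the test yields P = 1, in line with the reported value. Because the two sample types come from the same patients, a paired test such as McNemar's test would also be a reasonable choice.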
Sputum supernatant from smokers and non-smokers
Next, we investigated the clinical factors associated with a better detection rate for induced sputum samples. None of the clinical features analyzed, including age, gender, disease stage, smoking history, and histology, was significantly associated with the mutation detection rate in either sputum supernatant or sediment samples (Table S5). However, significantly higher maxAF (P = 0.018; Table S5) and AF (P = 0.021; Figure S2) were observed in the sputum supernatant samples from smokers than in those from non-smokers.
These data suggest that sputum supernatant samples, particularly from smokers, could provide valuable genetic information.
Case vignette
Of the 13 patients evaluable for sputum cytology, 38.5% (5/13) were identified with malignant cells. Figure 5 illustrates the apparent heterogeneous cell nuclei in the sputum cytology of samples from three patients. The three patients had stage IVA-IVB lung cancer of various histologies.
Figure 5A shows the sputum cytology findings for Patient P23, a 56-year-old female non-smoker diagnosed with stage IVA pulmonary sarcomatoid carcinoma. TP53 c.993+1G>C and PIK3CA p.H1047R were detected in both the matched tissue and sputum supernatant samples but were undetected in the sputum sediment sample. With no actionable mutations detected, she received pemetrexed, carboplatin, and bevacizumab as front-line therapy.
Figure 5B shows the sputum cytology findings for Patient P27, a 55-year-old male smoker diagnosed with stage IVB well-differentiated squamous cell lung carcinoma. EGFR exon 19 deletion E746_A750 (19del) was detected in his matched tissue, plasma, and sputum supernatant samples but was undetected in the sputum sediment sample. His disease achieved partial response with cisplatin, paclitaxel, and pembrolizumab as the front-line regimen. Upon detection of EGFR 19del, he received icotinib as the second-line regimen.
Figure 5C shows the sputum cytology findings for Patient P41, a 66-year-old female non-smoker diagnosed with stage IVB poorly differentiated lung adenocarcinoma. EGFR 19del was detected in all her matched samples, including tissue, plasma, sputum supernatant, and sputum sediment samples. In addition to the EGFR 19del, EGFR copy number amplification and TP53 c.783-1G>T were detected in her tissue sample but were undetected in the other sample types. She received icotinib as the front-line regimen and achieved complete response.