The utility of sputum supernatant as an alternative liquid biopsy specimen for next-generation sequencing-based genomic profiling

Background: Comprehensive mutation profiling has become a standard clinical practice in the management of advanced lung cancer. In addition to tissue and plasma, other body fluids are also being actively explored as alternative sources of tumor DNA. In this study, we investigated the potential of induced sputum obtained from patients with non-small cell lung cancer (NSCLC) for mutation profiling.
Methods: Capture-based targeted sequencing was performed on matched tumor, plasma, and induced sputum samples of 41 treatment-naïve patients with NSCLC using a 168-gene panel.
Results: Comparative analysis of mutation detection using the matched tumor sample as reference revealed detection rates of 76.9% for plasma, 72.4% for sputum-supernatant, and 65.7% for sputum-sediment samples. Plasma, sputum-supernatant, and sputum-sediment achieved positive predictive values of 73.3%, 80.4%, and 55.6% and sensitivities of 50.0%, 36.9%, and 31.3%, respectively, relative to tumor samples for 168 genes. Sputum-supernatants had significantly higher concordance rates relative to matched tumor samples (69.2% vs. 37.8%; P = 0.031) and maximum allelic fraction (P < 0.001) than their matched sputum-sediments. Sputum-supernatants had comparable detection rates (71.4% vs. 67.9%; P = 1) but significantly higher maximum allelic fraction than their matched plasma samples (P = 0.003). Furthermore, sputum-supernatant from smokers had a significantly higher maximum allelic fraction than sputum-supernatant from non-smokers (P = 0.021).
Conclusions: Our study demonstrated that the supernatant fraction from induced sputum is a better sampling source than its sediment and performs comparably to plasma samples. Induced sputum from NSCLC patients could serve as an alternative medium for next-generation sequencing-based mutation profiling. Our study contributes to the growing knowledge of alternative media for NGS-based mutation profiling.


Introduction
In recent years, targeted therapy has transformed the treatment and management of patients with lung cancer into a more personalized approach [1-4]. Personalized medicine mainly relies on the accurate detection of actionable mutations and identification of patients who would benefit from targeted therapy.
Molecular testing, whether with conventional single-gene assays (including amplification refractory mutation system and direct sequencing) or with next-generation sequencing, has treated tissue biopsy as the gold standard specimen; however, the biopsy of primary tumor tissues requires invasive sampling procedures [5-8]. The rapid technological advancements in molecular assays have improved the diagnostic accuracy in detecting tumor-specific somatic mutations from DNA extracted from samples collected through minimally invasive procedures, including liquid biopsy specimens [5-8]. Since blood is easily accessible and permits repeated sampling, plasma-based mutation profiling using next-generation sequencing (NGS) has now become widely used in clinical oncology practice for diagnosis, treatment monitoring, and assessment of mechanisms of drug resistance [8]. Malignant non-blood biological fluids that are in close contact with the tumors, including pleural effusion, ascites, and cerebrospinal fluid, are gaining attention as specimens for molecular testing due to their effectiveness in reflecting tumor genomic profiles [9-12]. Meanwhile, other easily accessible biological fluids, including sputum, that likely contain tumor-derived DNA are being actively explored for mutation detection [9,13-17]. Sputum has been explored in the detection of genetic and epigenetic alterations in patients with various stages of lung cancer and in cancer-free chronic smokers who are at higher risk of developing lung cancer [9,13,15-27]. These studies consistently demonstrate that induced sputum samples contain circulating cell-free DNA derived from the lungs and lower respiratory tract and are attractive candidate liquid biopsy media for lung cancer diagnosis [13,15-27].
However, very few studies have explored its use in next-generation sequencing (NGS)-based genomic profiling. In this study, we investigated the potential of induced sputum obtained from treatment-naïve patients with non-small-cell lung cancer (NSCLC) as a medium for comprehensive mutation profiling. To achieve this aim, we performed capture-based targeted NGS on matched tumor, plasma, and sputum from 41 patients. We then performed a comparative analysis to identify the fraction of induced sputum specimens that yields the better mutation detection rate and to establish the utility of induced sputum as an alternative source of tumor DNA for mutation profiling by comparing it with matched tumor and plasma samples.

Patient selection
Treatment-naïve patients who were diagnosed with locally-advanced to advanced NSCLC at our hospital between October 2018 and June 2019 were included in this study. The main inclusion criteria for this cohort were as follows: 1. Pathologically confirmed NSCLC; 2. Locally-advanced to advanced disease stage; 3. Have not received prior systemic therapy; 4. Submitted matched tissue, plasma, and sputum samples; and 5. Provided consent for the use of their clinical and molecular data. This study was approved by the Medical Ethics Committee of Xiangya Hospital Central South University (approval number: 201911306) and performed in accordance with the Declaration of Helsinki. Written informed consent was provided by all patients for the use of their biological samples.
Collection and preparation of sputum samples
Before sputum induction, the patients were administered 400 µg albuterol by inhalation, and their lung function was measured by spirometry as forced expiratory volume in 1 second (FEV1) before and 10 minutes after albuterol inhalation. Sputum induction was performed with hypertonic saline (4.5%) inhalation for 15 minutes for patients with FEV1 ≥ 1 L, and isotonic saline (0.9%) was used for patients with FEV1 < 1 L. An aliquot of the expectorate was reserved for gene sequencing, and another aliquot was analyzed cytologically for the presence of cancer cells or heterogeneous cell nuclei.
The induced sputum samples (~8 mL) collected from each of the patients were treated with 0.25% pancreatin at 37 °C with agitation at 660 rpm for 30 minutes. The digestion condition was adjusted according to the viscosity of the sputum to a maximum of 1:2.5 sputum to pancreatin ratio and/or extension of incubation time until complete liquefaction of the sputum sample. The digest was centrifuged at 3,000 × g for 10 minutes at 4 °C. The sediment fraction was reconstituted in the remaining 1 mL of supernatant and stored at -80 °C until DNA extraction. Meanwhile, the supernatant fractions were transferred to fresh tubes, centrifuged at 16,000 × g for 10 minutes at 4 °C to remove cell debris, aliquoted into fresh tubes, and stored at -80 °C until DNA extraction.
Collection and preparation of tissue and plasma samples
Whole blood samples (approximately 10 mL) and tissue biopsy samples were collected from each of the patients. Lung tumor tissue samples were obtained by biopsy and processed into formalin-fixed, paraffin-embedded (FFPE) cell blocks for storage. Plasma was separated from blood samples collected in EDTA-treated tubes by centrifugation (1,500 × g, 4 °C, 10 minutes). Plasma fractions were transferred into fresh tubes, centrifuged to remove cell debris (16,000 × g, 4 °C, 10 minutes), aliquoted into fresh tubes, and stored at -80 °C until DNA extraction.
DNA isolation and capture-based targeted DNA sequencing
DNA isolation and targeted sequencing were performed at Burning Rock Biotech, a College of American Pathologists (CAP)-accredited/Clinical Laboratory Improvement Amendments (CLIA)-certified commercial clinical laboratory, according to optimized protocols as described previously [10,11]. DNA was extracted from FFPE tissue biopsy samples and sputum-sediment samples using appropriate QIAamp DNA tissue kits (Qiagen, Hilden, Germany). Circulating cell-free DNA (cfDNA) was extracted from 4-5 mL of plasma samples and 15 mL of sputum-supernatant samples using a QIAamp Circulating Nucleic Acid kit, according to the manufacturer's standard protocol (Qiagen, Hilden, Germany). DNA quality was assessed using a Qubit 3.0 fluorimeter with the dsDNA high-sensitivity assay kit (Life Technologies, CA, USA). Tissue DNA was sheared using an M220 ultra-focused sonicator (Covaris, MA, USA). Fragments between 200 and 400 bp from the sheared tissue DNA and cfDNA were purified (Agencourt AMPure XP Kit, Beckman Coulter, CA, USA) and hybridized with capture probe baits using a commercial panel consisting of 168 genes. After purification and amplification of the capture-hybrid library, the quality and the size of the fragments were assessed with a high sensitivity DNA kit using the 2100 Bioanalyzer instrument (Agilent Technologies, CA, USA). Indexed samples were sequenced on a NextSeq 500 (Illumina, Inc., CA, USA) with paired-end reads and an average sequencing depth of 1,000× for tumor samples and 10,000× for plasma, sputum supernatant, and sputum sediment samples.

NGS data analysis
The NGS sequence data from the patients were mapped to the reference human genome (hg19) using Burrows-Wheeler Aligner version 0.7.10 [28]. Local alignment optimization, duplication marking, and variant calling were performed using the Genome Analysis Tool Kit version 3.2 [29] and VarScan version 2.4.3 [30]. Variants were filtered using the VarScan fpfilter pipeline; loci with depth less than 100 were filtered out. Base-calling in plasma and tissue samples required at least 8 supporting reads for single nucleotide variations (SNV) and 2 and 5 supporting reads for small insertion-deletion variations (Indel), respectively. Variants with population frequency over 0.1% in the ExAC, 1000 Genomes, dbSNP, or ESP6500SI-V2 databases were grouped as single nucleotide polymorphisms and excluded from further analysis. Remaining variants were annotated with ANNOVAR (2016-02-01 release) [
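The filtering thresholds described above can be illustrated with a minimal sketch. It is not the laboratory's pipeline: the function and parameter names are hypothetical, and it assumes that the Indel thresholds of 2 and 5 supporting reads apply to plasma and tissue, respectively. Thresholds for sputum fractions are not stated in the text and are therefore not modeled.

```python
# Thresholds quoted in the text; all names below are hypothetical.
MIN_DEPTH = 100                                   # loci below this depth are dropped
MIN_SNV_READS = 8                                 # supporting reads for an SNV
MIN_INDEL_READS = {"plasma": 2, "tissue": 5}      # assumed reading of "respectively"
MAX_POPULATION_FREQ = 0.001                       # 0.1% in population databases


def passes_filters(kind, depth, alt_reads, pop_freq, sample_type):
    """Return True if a candidate variant survives the depth, read-support,
    and population-frequency filters sketched above."""
    if sample_type not in MIN_INDEL_READS:
        raise ValueError(f"no thresholds stated for sample type: {sample_type}")
    if depth < MIN_DEPTH:
        return False
    if pop_freq > MAX_POPULATION_FREQ:
        return False  # treated as a polymorphism and excluded
    if kind == "SNV":
        return alt_reads >= MIN_SNV_READS
    if kind == "Indel":
        return alt_reads >= MIN_INDEL_READS[sample_type]
    raise ValueError(f"unsupported variant kind: {kind}")


# Hypothetical example: an Indel with 4 supporting reads passes in plasma
# but not in tissue under the assumed thresholds.
print(passes_filters("Indel", 5000, 4, 0.0, "plasma"))
print(passes_filters("Indel", 1200, 4, 0.0, "tissue"))
```
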

Statistical analysis
The detection rate was defined as the proportion of samples detected with mutations relative to the total number of samples of the same sample type. Maximum allelic fraction (maxAF) was defined as the maximum fraction of the mutant allele detected from a sample regardless of mutation or gene. The concordance rate was defined as the proportion of the total number of mutations detected from one sample type relative to the reference sample type. Statistical analyses were performed using Fisher's exact test, paired Student's t-test, or Wilcoxon signed-rank test, as applicable, in R software. A P-value of less than 0.05 was considered statistically significant.

Sample distribution and quality control of the samples
Matched tumor, blood, and sputum were collected from all the patients; however, some samples were unavailable for mutation profiling due to insufficient sample volume (n = 23) and inadequate DNA quality for library construction (n = 2). Tumor samples were available for 38 patients, and blood samples were available for 39 patients. Sputum supernatant and sputum sediment samples were available from 29 and 35 patients, respectively. Table S1 summarizes the distribution of the cohort according to sample type.
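As an illustration of the detection rate and maxAF definitions given under Statistical analysis, the following minimal sketch computes both from per-sample mutation lists. The data structure (a mapping from patient ID to a list of mutation and allelic-fraction pairs), the function names, and the example values are all hypothetical and are not taken from the study data.

```python
def detection_rate(samples):
    """Proportion of samples with at least one detected mutation."""
    if not samples:
        raise ValueError("no samples supplied")
    detected = sum(1 for mutations in samples.values() if mutations)
    return detected / len(samples)


def max_af(mutations):
    """Largest mutant allelic fraction in one sample, regardless of gene.
    A sample with no detected mutation is given a maxAF of 0.0."""
    return max((af for _, af in mutations), default=0.0)


# Hypothetical sputum-supernatant calls for three patients.
supernatant = {
    "P1": [("EGFR 19del", 0.092), ("TP53 splice", 0.013)],
    "P2": [],
    "P3": [("KRAS G12C", 0.004)],
}

print(detection_rate(supernatant))
print({patient: max_af(calls) for patient, calls in supernatant.items()})
```
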

Patient characteristics
DNA was extracted from a total of 141 available samples, with an average DNA yield of 1571.8 ng for tumor, 121.9 ng for plasma, 2766.0 ng for sputum supernatant, and 8144.5 ng for sputum sediment (Figure S1A). The distribution of the library complexity and insert size of all the sequenced samples revealed a similar distribution for tumor biopsy samples and sputum sediments, which was distinct from plasma and sputum supernatant samples (Figure S1B). A majority of the tumor and sputum sediment samples had insert sizes between 150 and 250 base pairs, while most of the plasma and sputum supernatant samples had insert sizes between 150 and 175 base pairs (Figure S1B). The sequencing achieved a median depth of 1,275× for tumor samples, 16,326× for plasma samples, 10,549× for sputum supernatant, and 16,660× for sputum sediment (Figure S1C).
These data indicate that the DNA extracted from sputum supernatant and sputum sediment samples has adequate quality and sufficient quantity for NGS-based genomic profiling.
Meanwhile, a total of 276 mutations in 75 genes were detected from matched sputum sediment samples from 14 patients, revealing a detection rate of 60.9%. Detection rates were comparable between sputum supernatant samples and their matched sputum sediment (78.3% vs. 65.2%; P = 0.51). By considering the overall number of mutations detected from 23 patients with both sample types using the 168-gene panel, the mutations detected from both sputum supernatant and sediment samples were 47.8% concordant. Figure 2B illustrates the mutations detected in both or either of the sputum supernatant and sediment samples from the 23 patients. Based on the distribution of mutation types among the 168 genes, fusions achieved a concordance rate of 80.0%, and SNVs and Indels had a concordance rate of 30.4%, while CNVs had a very low detection rate, with a CNV detected in only 1 sputum supernatant sample and in no sputum sediment samples (Table S2). Actionable mutations were detected from the matched sputum supernatant and sediment from 8 patients, including EML4-ALK fusions (n = 3), EGFR exon 19 deletion (n = 2), KRAS G12C/Q61H mutations (n = 2), and CD74-ROS1 fusion (n = 1). Moreover, a significantly higher maxAF was observed in sputum supernatant samples than in their corresponding sediment samples (P < 0.001). The median maxAF in sputum supernatant was 1.26% (range: 0.0%-9.2%), while the median maxAF in sputum sediment was 0.79% (range: 0.0%-14.1%).
These data suggest that DNA extracted from both sputum supernatant and sediment could be utilized for mutation profiling; however, the abundance of mutations is significantly higher in sputum supernatant than in its corresponding sediment samples.
Concordance of sputum supernatant and sputum sediment with the matched tumor sample
Figure 3A illustrates the mutation profile of tumor tissue samples. Comparing the mutation profile of 26 patients with both the sputum supernatant and tumor samples, 41 mutations were detected from both samples (Fig. 3B), revealing a concordance rate of 69.2%. The detection of fusions from sputum supernatant and matched tumor samples was highly concordant, achieving 75.0%. SNVs and Indels were only 34.0% concordant, while CNVs were only detected from 2 sputum supernatant samples, resulting in a concordance rate of 7.7% (Table S3). It is worth noting that actionable fusions including EML4-ALK and CD74-ROS1 can be detected in sputum supernatant samples with a high concordance of 83.3% (5/6) relative to tumor samples (Table S3). Meanwhile, analysis of the mutation profile of 32 patients with both sputum sediment and tumor samples demonstrated the detection of 45 mutations from both samples (Fig. 3C), revealing a concordance rate of 37.8%. The concordance rates relative to tumor samples were significantly higher for sputum supernatants than for sputum sediments (69.2% vs. 37.8%; P = 0.031).
These data indicate that sputum supernatant samples are better than their sediment fraction in reflecting tumor-related mutations.
Concordance of sputum supernatant with their matched plasma sample
Furthermore, Fig. 4A illustrates the mutation profile of plasma samples. Comparing the mutation profile of 28 patients with both the sputum supernatant and plasma samples, 32 mutations were detected from both samples, revealing a concordance rate of 53.6% (Fig. 4B). Sputum supernatant and plasma samples had comparable detection rates (71.4% vs. 67.9%; P = 1), but sputum supernatant had a significantly higher median allelic fraction than its matched plasma sample (P = 0.034). Sputum supernatant and matched plasma samples were highly concordant in detecting fusions, achieving 83.5%. SNVs and Indels were 31.0% concordant, while CNVs were only detected from 3 sputum supernatant samples, resulting in a concordance rate of 7.7% (Table S4).
These data indicate that sputum supernatant samples have detection rates comparable to those of plasma samples, suggesting their utility as an alternative sample for comprehensive genomic profiling, particularly for non-CNV mutations.
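A comparison against the tumor reference of the kind reported above can be sketched as follows. The sketch assumes the standard definitions of positive predictive value (shared calls divided by all calls in the test sample type) and sensitivity (shared calls divided by all calls in the tumor reference), and it assumes that a mutation is identified by a (patient, gene, alteration) key pooled across patients. Neither assumption is confirmed by the text, and the example calls are hypothetical.

```python
def compare_to_reference(test_calls, reference_calls):
    """Classify calls as shared, test-only, or reference-only and derive
    PPV and sensitivity relative to the reference sample type."""
    test, reference = set(test_calls), set(reference_calls)
    shared = test & reference
    return {
        "shared": shared,
        "test_only": test - reference,
        "reference_only": reference - test,
        # None when the denominator is empty, to avoid dividing by zero.
        "ppv": len(shared) / len(test) if test else None,
        "sensitivity": len(shared) / len(reference) if reference else None,
    }


# Hypothetical calls keyed by (patient, gene, alteration).
supernatant_calls = {
    ("P01", "EGFR", "19del"),
    ("P02", "KRAS", "G12C"),
    ("P03", "TP53", "R273H"),
}
tumor_calls = {
    ("P01", "EGFR", "19del"),
    ("P02", "KRAS", "G12C"),
    ("P02", "TP53", "Y220C"),
    ("P04", "ALK", "EML4-ALK"),
}

result = compare_to_reference(supernatant_calls, tumor_calls)
print(len(result["shared"]), result["ppv"], result["sensitivity"])
```
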
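The detection-rate comparison above can be illustrated with a small, dependency-free sketch of a two-sided Fisher's exact test. The study ran its statistics in R; this Python function is only an illustration. The counts 20/28 and 19/28 are inferred from the reported 71.4% and 67.9% in 28 patients, and treating the comparison as an unpaired 2×2 table is an assumption that the text does not confirm.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties are not lost to rounding.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(p_value, 1.0)


# Inferred counts: detected / not detected in 28 matched patients.
p = fisher_exact_two_sided(20, 8, 19, 9)
print(round(p, 3))
```

With these inferred counts the two margins are symmetric, so the observed table is one of the two most probable tables and the P-value is 1, in line with the reported P = 1.
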

Sputum supernatant from smokers and non-smokers
Next, we investigated the clinical factors that are associated with a better detection rate for induced sputum samples. None of the clinical features analyzed, including age, gender, disease stage, smoking history, and histology, was statistically correlated with mutation detection rate in either sputum supernatant or sediment samples (Table S5). However, significantly higher maxAF (P = 0.018; Table S5) and AF (P = 0.021; Figure S2) were observed in the sputum supernatant samples from smokers than from non-smokers.
These data suggest that sputum supernatant samples, particularly from smokers, could provide valuable genetic information.

Case Vignette
Of the 13 patients evaluable for sputum cytology, 38.5% (5/13) were identified with malignant cells. Figure 5 illustrates the apparent heterogeneous cell nuclei in the sputum cytology of samples from three patients. The three patients had stage IVA-IVB lung cancer of various histologies. Figure 5A shows the sputum cytology findings for Patient P23, a 56-year-old female non-smoker diagnosed with stage IVA pulmonary sarcomatoid carcinoma. TP53 c.993+1G>C and PIK3CA p.H1047R were detected from both the matched tissue and sputum supernatant samples but were undetected from the sputum sediment sample. With no actionable mutations detected, she received pemetrexed, carboplatin, and bevacizumab as front-line therapy. Figure 5B shows the sputum cytology findings for Patient P27, a 55-year-old male smoker diagnosed with stage IVB well-differentiated squamous cell lung carcinoma. EGFR exon 19 deletion E746_A750 (19del) was detected from his matched tissue, plasma, and sputum supernatant samples, but was undetected from the sputum sediment sample. His disease achieved partial response with cisplatin, paclitaxel, and pembrolizumab as the front-line regimen. Upon detection of EGFR 19del, he received icotinib as the second-line regimen. Figure 5C shows the sputum cytology findings for Patient P41, a 66-year-old female non-smoker diagnosed with stage IVB poorly-differentiated lung adenocarcinoma. EGFR 19del was detected from all her matched samples, including tissue, plasma, sputum supernatant, and sputum sediment samples. In addition to the EGFR 19del, EGFR copy number amplification and TP53 c.783-1G>T were detected from her tissue samples but were undetected in the other sample types. She received icotinib as the front-line regimen and achieved complete response.

Discussion
Exfoliative cytology, which involves the microscopic study of the cells exfoliated from tumors in various samples including saliva, sputum, and bronchial secretions, has been a well-established non-invasive procedure for providing diagnostic information [34,35]. The diagnostic accuracy of sputum cytology for lung cancer diagnosis has been demonstrated to achieve a specificity of 90%, sensitivity of 87%, and positive predictive value of 79%, which in the absence of necrotizing pneumonia could exceed 95% [34]. As early as 1994, the use of sputum samples had been explored in the detection of gene mutations using polymerase chain reaction (PCR)-based single gene assays [13,14,17,18,25]; however, only one study thus far has explored its use in NGS-based mutation profiling [16]. In our study, we have demonstrated the feasibility of using sputum supernatant as an alternative liquid biopsy specimen for comprehensive molecular profiling. The quality and quantity of circulating cell-free DNA isolated from the supernatant and sediment fractions from the induced sputum samples of our cohort were adequate for molecular profiling. Based on its higher positive predictive value (80.4% vs. 55.6%) and significantly higher concordance rate with tumor tissue samples as gold standard (69.2% vs. 37.8%; P = 0.031), sputum supernatant is a better fraction than the sediment for the accurate detection of tumor-related non-CNV mutations. The higher concordance rate with tumor samples also suggests two important points: first, the concentration of circulating tumor DNA found in sputum supernatant is higher than in the sputum sediment fraction; and second, the molecular profile derived from sputum supernatant samples more accurately reflects the non-CNV mutations found in the primary lung tumor tissues.
The overall concordance rate in mutations detected from sputum samples relative to the matched tumor tissue samples observed in our cohort (69.2%) was consistent with the recent study by Wu and colleagues, which demonstrated a 74% overall concordance rate [16]. The comparable mutation detection rates, particularly in actionable non-CNV mutations, between sputum supernatant and plasma samples further suggest the feasibility of using sputum as an alternative liquid biopsy specimen for molecular testing. Similar to the observations by Wu and colleagues [16], our study also demonstrated differences and similarities in mutation profile in matched sputum fractions, plasma, and tumor tissues, which might be related to spatial genetic heterogeneity inherent in small volume needle biopsy samples from tumor tissues. However, since sputum supernatant and sediment fractions were derived from a single collection tube and were only physically separated in vitro, it is safe to conclude that sputum supernatant is the optimal fraction for molecular profiling.
In clinical practice, the alternative use of induced sputum can minimize the need for obtaining tumor samples using tissue biopsy in molecular testing applications. Due to convenience and easy accessibility, sputum specimens can also be used when blood samples are difficult to obtain. Our observations on the significantly higher maxAF in sputum supernatant than plasma samples suggest that sputum supernatant might be useful in patients with early-stage lung cancer and can also be explored for monitoring of treatment response. Previous studies have demonstrated the feasibility of detecting genetic and epigenetic alterations in sputum samples from cancer-free chronic smokers as a strategy for early detection of lung cancer [13,18,19,22,25]. NGS-based molecular profiling of sputum supernatant samples was adequately sensitive in detecting actionable mutations, particularly fusions, which has clinical value in guiding the use of appropriate targeted therapies. However, based on the data from our cohort, the detection rate for CNVs was low for sputum samples, which could be due to the limit of detection of CNVs in circulating cell-free DNA. We speculate that the use of unique modifier identification-based capture probes could improve the detection rate of CNVs and other mutations with ultralow allele frequencies.
Our study is limited by the small cohort size, which included patients from only a single center. A study with a larger cohort is warranted to establish the utility of induced sputum specimens as a liquid biopsy specimen for NGS-based mutation testing and explore its application in treatment monitoring in advanced lung cancer patients or early detection of disease recurrence in both early and advanced lung cancer patients.

Conclusions
Our study demonstrates that the supernatant fraction of induced sputum specimens performs comparably to plasma samples. Induced sputum from patients with advanced-stage NSCLC could serve as an alternative source of tumor DNA for comprehensive mutation profiling. Our study contributes to the growing knowledge of alternative media for NGS-based mutation profiling.
Abbreviations
CNV, copy number variations; FEV1, forced expiratory volume; FFPE, formalin-fixed, paraffin-embedded; Indel, insertion-deletion variations; maxAF, maximum allelic fraction; NGS, next-generation sequencing; NSCLC, non-small-cell lung cancer; PPV, positive predictive value; SNV, single nucleotide variations
Declarations
Ethical approval: All procedures performed in studies involving human participants were performed in accordance with the Declaration of Helsinki. This study was approved by the Medical Ethics Committee of Xiangya Hospital Central South University (approval number: 201911306). Written informed consent was obtained from all participants included in the study.
Consent for publication: Consent for publication has been obtained.
Data Sharing and Data Accessibility: All authors confirm adherence to the policy. The data that support the findings of this study are available from the corresponding author upon reasonable request.

Figure 1
The mutation detection rate and allelic fraction from plasma and sputum supernatant were comparable.
A-B. Bar plots summarizing the mutation detection rates from 168 genes (A) and 9 genes (8 classic NSCLC oncogenic driver genes and TP53; B) with tumor samples as reference. C-D. Violin plots summarizing the maximum allele frequency from 168 genes (C) and 9 genes (D).

Figure 2
Sputum supernatant is a more optimal sputum fraction for molecular profiling than its corresponding sediment. A-B. Oncoprints illustrating the mutation landscape derived from the sputum supernatant of 29 patients (A) and the comparison between the mutation profiles derived from the sputum supernatant and its corresponding sediment in 23 patients (B). The colors indicate either the mutation types (A) or the status of each mutation, whether detected in both samples (Shared) or detected in only the supernatant (SPU only) or the sediment (Dregs only). Each column represents one patient. Each row represents a gene. Side bar represents the mutation rate of a certain gene. The oncoprint was summarized to only reflect the genes with 2 or more mutations detected.

Figure 3
Sputum supernatant reflects more tumor-related mutations than its corresponding sediment. A-C.