Utility of Cell-Free DNA from Bronchoalveolar Lavage Fluids Using Next Generation Sequencing in Predicting Malignant Solitary Pulmonary Nodules

Purpose: To explore the utility of cell-free DNA (cfDNA) from bronchoalveolar lavage fluid (BALF) using next-generation sequencing (NGS) in differentiating malignant from benign solitary pulmonary nodules (SPN). Methods: Between January 1st 2019 and January 1st 2021, 40 subjects who underwent computed tomography (CT) examination and were diagnosed with SPN were prospectively enrolled at Zhangzhou Affiliated Hospital of Fujian Medical University (Zhangzhou, Fujian, China). Pathological diagnoses were finally confirmed from tissue specimens obtained by surgical resection. For each patient, gene mutations were analysed using NGS in both cfDNA isolated from BALF and tissue. Results: Of the 40 patients, 55% were diagnosed with lung adenocarcinoma, 20% with benign nodules, and 10% with small cell carcinoma. Squamous carcinoma, adenosquamous carcinoma and large cell neuroendocrine carcinoma each accounted for 5%. Of the malignant SPN, 62.5% (20/32) had at least one alteration. The most common alterations were TP53 (31.25%), followed by EGFR (18.75%), KRAS (12.5%), PIK3CA (6.25%), ERBB2 (6.25%), ALK (6.25%) and ROS1 (6.25%). Some differences were seen in the heatmap of gene mutations in the histologic samples. There was a close correlation between the mutations found in tissue and in BALF. Among all 40 patients, the sensitivity, specificity, and concordance of BALF in predicting malignant nodules were 68.8%, 100%, and 75%, respectively. Conclusions: Tumor-specific mutations in cfDNA from BALF detected by NGS may be beneficial for predicting malignant SPN and may be taken into consideration for personalized cancer diagnosis.


Introduction
Lung cancer is generally detected at an advanced, inoperable stage, which makes it the most common cause of cancer-associated death in both sexes [1]. Therefore, early identification and intervention are of primary importance, especially for patients with asymptomatic and curable solitary pulmonary nodules (SPN) [2].
It is now accepted that fiberoptic bronchoscopy is the key diagnostic approach in cases of suspected lung cancer [3], and the diagnostic yield of cytologic analysis of bronchoalveolar lavage fluid (BALF) for peripheral lung cancer reaches 65% [4]. Circulating tumor cells in BALF may be sensitive for discriminating malignant SPN [5]. Detecting molecular indicators before the disease becomes clinically apparent may help decrease mortality [6]. Cell-free DNA (cfDNA) derived from tumors is likely to reflect the entire genomic landscape. Indeed, next-generation sequencing (NGS) for the detection of cfDNA has also made great breakthroughs [7].
To our knowledge, there is little information available in the literature about cfDNA from BALF analysed by NGS in patients with SPN. Schmidt's group succeeded in isolating cfDNA from BALF and identifying alterations. In particular, they focused on cancer patients and used polymerase chain reaction (PCR)/reverse transcriptase (RT)-PCR, and the positive rate of gene mutations was unsatisfactory [8]. In addition, Ye and colleagues investigated both surgically resected peripheral lung nodules and plasma DNA using NGS. However, they reported low concordance between tissue DNA and ctDNA mutations, so further work is required to optimize the model [9]. Buttitta and colleagues enrolled patients with lung adenocarcinoma who underwent surgical resection or biopsy and evaluated EGFR mutations in BALF [10]. Because genes such as EGFR and KRAS play an important role in the pathogenesis of lung cancer, accurate detection of their mutations can be associated with prognosis [11,12].
Thus, we aimed to explore lung tumour-associated alterations in cfDNA from BALF using NGS and to test their ability to predict malignant SPN in the early diagnosis of lung cancer.

Patient recruitment and sample collection
Between January 1st 2019 and January 1st 2021, 40 subjects who underwent computed tomography (CT) examination and were diagnosed with SPN (nodule size <1 cm) were prospectively enrolled at Zhangzhou Affiliated Hospital of Fujian Medical University (Zhangzhou, Fujian, China). The scanned DICOM data were imported into virtual bronchoscopic navigation (VBN) software (DirectPath V1.02, Cybernet Systems), which automatically created virtual bronchoscopic images of the target bronchi and established the guidance pathway to the lesion. A bronchoscope (Olympus BF-P260F, outer diameter 4.0 mm, working channel diameter 2.0 mm) was navigated to the target bronchus by the VBN system, and an ultrasonic probe (UM-S20-20R, Olympus) was advanced into the corresponding segment until a hypoechoic ultrasound image was obtained. The ultrasonic probe was then slowly withdrawn, and the distance from the opening of the segmental bronchus to the lesion area indicated by ultrasound was measured. According to the measured distance, the correctness of the operation path was checked twice with the ultrasonic probe. Warm saline (0.9% NaCl) was instilled in 20 ml aliquots through the working channel and recovered by pooling into sterile collection tubes (yield 15-35 ml). From each patient, 8-10 ml of BALF was collected and centrifuged at 2500g for 15 min to separate the supernatant, which was used for cfDNA extraction. Patients underwent surgery within one month after BALF collection, and pathological types were finally confirmed by pathologists. For each patient, both cfDNA isolated from BALF supernatant and tissue were analysed for mutations using NGS. TNM classification was performed in accordance with the 2020 NCCN index.
The study was approved by the ethics committee of Zhangzhou Affiliated Hospital of Fujian Medical University (ethics approval no. Zzsyy-2017-1116), and all patients provided written informed consent. All samples were tested in a centralized clinical testing center (Nanjing Geneseeq Technology Inc., Nanjing, China).

DNA extraction, target capture, and next-generation sequencing
According to the manufacturer's protocols, we purified cfDNA from BALF supernatants using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Hilden, Germany) and extracted genomic DNA from BALF sediment using the DNeasy Tissue Kit (Qiagen). Genomic DNA was qualified using a Nanodrop 2000 (Thermo Fisher), and the BALF-cfDNA fragment distribution was analyzed on a Bioanalyzer 2100 using the High Sensitivity DNA Kit (Agilent Technologies). All DNA was quantified using the dsDNA HS Assay Kit on a Qubit 3.0 Fluorometer (Life Technologies). Subsequently, we prepared sequencing libraries using the KAPA Hyper Prep Kit (KAPA Biosystems). Indexed DNA libraries were pooled up to 2 µg of total input and subjected to probe-based hybridization with the GeneseeqOne NGS panel targeting 425 predefined cancer-associated genes (Geneseeq Prime panel). Captured libraries were amplified with Illumina p5 and p7 primers in KAPA HiFi HotStart ReadyMix (KAPA Biosystems).

Data processing and bioinformatics analysis
We applied Trimmomatic for FASTQ file quality control [13], removing leading/trailing low-quality bases (quality score below 15) and N bases. The reads were then aligned to the hg19 human reference genome with the Burrows-Wheeler Aligner (bwa-mem) [14]. According to the instructions, we applied local realignment around indels and base quality score recalibration with the Genome Analysis Toolkit (GATK 3.4.0) [15]. Normal and tumor BAM files were paired and analysed with MuTect [16] using default parameters to identify somatic single nucleotide variants (SNVs). Small insertions and deletions (indels) were analyzed using SCALPEL (http://scalpel.sourceforge.net). For BALF-sDNA, we required a minimum variant allele frequency of 1% and at least 5 variant-supporting reads. For BALF-cfDNA, we required a minimum variant allele frequency of 1% and at least 5 variant-supporting reads for non-hotspot mutations, and a minimum variant allele frequency of 0.3% and at least 3 variant-supporting reads for hotspot mutations (defined as recurrence >=20 in the COSMIC database).
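As an illustration of the end-trimming rule described above, the following minimal Python sketch removes leading and trailing bases that are N or have a quality score below 15. It is a simplified stand-in for the idea only and is not Trimmomatic's implementation; the function name `trim_read` and the representation of a read as a base string plus a list of integer quality scores are assumptions made for this example.

```python
from typing import List, Tuple

MIN_QUALITY = 15  # quality threshold stated in the text


def trim_read(
    bases: str, quals: List[int], min_quality: int = MIN_QUALITY
) -> Tuple[str, List[int]]:
    """Trim leading/trailing bases that are N or below min_quality.

    Hypothetical helper for illustration; internal bases are left untouched.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")

    def is_bad(i: int) -> bool:
        return bases[i] == "N" or quals[i] < min_quality

    start, end = 0, len(bases)
    # Advance past low-quality or N bases at the 5' end.
    while start < end and is_bad(start):
        start += 1
    # Step back over low-quality or N bases at the 3' end.
    while end > start and is_bad(end - 1):
        end -= 1
    return bases[start:end], quals[start:end]


if __name__ == "__main__":
    print(trim_read("NACGTN", [2, 30, 30, 30, 30, 2]))
```

Only the read ends are trimmed in this sketch, so a low-quality base in the middle of a read is kept.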
We excluded SNVs and indels present in the 1000 Genomes Project and ExAC databases at a frequency >1%.
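The variant-calling thresholds above can be summarized as a single filtering rule. The sketch below is an illustrative Python rendering under stated assumptions: the `Variant` record, its field names, and the `passes_filter` function are hypothetical and are not part of the pipeline used in this study. Only the numeric thresholds (1% or 0.3% allele frequency, 5 or 3 supporting reads, COSMIC recurrence >=20, population frequency >1%) are taken from the text, and representing the 1000 Genomes and ExAC frequencies as one maximum value is a simplification.

```python
from dataclasses import dataclass

HOTSPOT_MIN_RECURRENCE = 20   # COSMIC recurrence defining a hotspot (from the text)
POPULATION_AF_MAX = 0.01      # exclude variants above 1% in 1000 Genomes / ExAC


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one called SNV or indel."""
    gene: str
    vaf: float               # variant allele frequency, as a fraction
    alt_reads: int           # variant-supporting reads
    cosmic_recurrence: int   # number of occurrences in COSMIC
    population_af: float     # assumed: max frequency across 1000 Genomes and ExAC


def is_hotspot(v: Variant) -> bool:
    return v.cosmic_recurrence >= HOTSPOT_MIN_RECURRENCE


def passes_filter(v: Variant, sample_type: str) -> bool:
    """Apply the thresholds described in the text to one variant."""
    if v.population_af > POPULATION_AF_MAX:
        return False  # likely a common germline polymorphism
    if sample_type == "BALF-sDNA":
        min_vaf, min_reads = 0.01, 5
    elif sample_type == "BALF-cfDNA":
        # Hotspot mutations get the more permissive cut-offs.
        min_vaf, min_reads = (0.003, 3) if is_hotspot(v) else (0.01, 5)
    else:
        raise ValueError(f"unknown sample type: {sample_type}")
    return v.vaf >= min_vaf and v.alt_reads >= min_reads


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    example = Variant("EGFR", vaf=0.005, alt_reads=4,
                      cosmic_recurrence=500, population_af=0.0)
    print(passes_filter(example, "BALF-cfDNA"), passes_filter(example, "BALF-sDNA"))
```

The example variant uses made-up values chosen to show that the same call can pass the hotspot cut-offs for BALF-cfDNA while failing the stricter BALF-sDNA cut-offs.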
SNV and indel annotation was then performed according to the hg19 reference genome and the 2014 versions of standard databases and functional prediction programs. Gene fusions were identified by FACTERA [17] and copy number variations (CNVs) were analyzed with ADTEx [18]. The log2 ratio cut-off for copy number gain was defined as 2.0 for BALF-sDNA and 1.6 for BALF-cfDNA samples. A log2 ratio cut-off of 0.6 was applied for copy number loss detection in all sample types.

Statistical analysis
Statistical analyses were conducted using SPSS 22.0 (Chicago, IL, USA). Categorical variables were described as number (percentage) and compared using the chi-square test or Fisher's exact test. Sensitivity and specificity were defined as follows: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), where TP is true positive, FN is false negative, TN is true negative, and FP is false positive. All p values are two-sided, and a p value of <0.05 was considered statistically significant.
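For illustration, the definitions above can be written as a short Python function. The counts used in the example (TP = 22, FN = 10, TN = 8, FP = 0) are not reported directly; they are inferred from the reported cohort composition (32 malignant and 8 benign nodules among 40 patients) together with the reported sensitivity and specificity, and should be read as an assumption. Concordance is taken here to mean overall agreement, (TP+TN)/N, which is also an assumption because the text does not define it.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity and specificity as defined in the text, plus overall agreement."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Assumed definition of concordance: fraction of cases classified correctly.
        "concordance": (tp + tn) / total,
    }


if __name__ == "__main__":
    # Counts inferred from the reported percentages (assumption, see lead-in).
    metrics = diagnostic_metrics(tp=22, fn=10, tn=8, fp=0)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Under these inferred counts the function gives 0.6875, 1.0 and 0.75, which round to the reported 68.8%, 100% and 75%.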

Results
Demographic and clinical characteristics of the study cohort
Table 1 shows the demographic and clinical characteristics of all subjects. We included 24 men and 16 women, and the median age was 60 years (range 39-72 years). Of the 40 patients, 55% were diagnosed with lung adenocarcinoma, 20% with benign nodules, and 10% with small cell carcinoma. Squamous carcinoma, adenosquamous carcinoma and large cell neuroendocrine carcinoma each accounted for 5%. The pathologic stage of all patients was stage I. Table 2 shows the pathological diagnosis and mutation types of the study cohort. Most malignant nodules harbored related driver gene alterations. Detailed distributions are shown in the heatmap of gene mutations in the BALF samples (Fig. 1). Patients had a median of one alteration (range, 0-10), and 62.5% of patients with malignant nodules (20/32) had at least one alteration. The most common alterations in BALF were TP53 (31.25%), followed by EGFR (18.75%), KRAS (12.5%), PIK3CA (6.25%), ERBB2 (6.25%), ALK (6.25%), NTRK (6.25%) and ROS1 (6.25%). TP53 and RB1 were the top two mutated tumor suppressor genes, with frequencies of 31.25% and 12.5%, respectively (Fig. 1). Other frequently mutated genes included the tumor suppressor gene FAT1 (6.25%), as well as ARID1A (6.25%) and KMT2B (6.25%). In addition, the heatmap of gene mutations in the histologic samples (Fig. 2) showed some differences, such as in ALK, ROS1, EGFR and TP53. There was a close correlation between the mutations found in the histologic samples and BALF (Tab. 3). A relatively close association also existed between the incidence of malignant nodules and positive mutation results from BALF (Tab. 4). Among all 40 patients, the sensitivity, specificity, and concordance of predicting malignant nodules with BALF were 68.8%, 100%, and 75%, respectively. [19,20].

Description of alterations and actionability of the detected alterations
Notably, positive KRAS and TP53 mutation results did not appear to be associated with histologic subtype. Further, the incidence of mutations in BALF samples was not inferior to that in histological tissue. Overall, our research may provide a clinically important method that uses BALF to predict malignant nodules in patients with suspected cancer.
Indeed, the differences in the positive rates of each driver gene might be attributed to sample size and source, clinical histological type and sequencing panel. The sensitivity of the bronchoscopic method in peripheral tumors may depend on tumor size and location [21]. Clinicians should accurately locate the SPN and obtain adequate BALF. A non-invasive approach to predict the malignancy of surgery-candidate SPN is urgently needed. So far, there are few data on clear relations between genetic alterations and tumour subtypes from BALF. Given the small sample size, predicting malignant nodules through driver mutation detection based on cfDNA should be interpreted with caution. Still, unlike other studies reporting a high incidence of KRAS mutations in adenocarcinomas [22] and a significantly low incidence of TP53 mutations in squamous cell carcinoma [23], we demonstrated a relatively close link between malignant nodules and KRAS and TP53 mutations from BALF.
Although some mutations that are not specific to histologic type occurred with equal frequency in NSCLC patients in our findings, the results still suggest that these genetic changes may be initial events leading to lung cancer development. EGFR and KRAS mutations may exist in some synchronous lesions [24]. In agreement with previous reports [25,26], our data showed that L858R and 19del were the most common EGFR mutations. This might be associated with the fact that EGFR mutations are more prevalent in NSCLC in Asia than in western countries. Unlike previous researchers who focused on patients with confirmed lung cancer [8-10], we speculate that early detection of disease-related genes by NGS could serve as an alert for people with SPN who are at high risk of lung cancer. Meanwhile, we hypothesize that, in specific patients, tumors may have selection mechanisms around certain genes or pathways that are important for carcinogenesis.
It is worth mentioning that an NGS panel targeting 425 predefined cancer-associated genes was used, and some rare mutations (such as NTRK, ARID1A, ARID2 and SETD2) were observed. Because NTRK fusions are potential therapeutic targets in tumors including NSCLC and sarcomas [27], we highlight the need to routinely detect them to broaden the therapeutic options. Considering that a tumor is normally a mixture of different cell types, inactivating mutations in several switch/sucrose non-fermenting subunits, such as ARID1A and ARID2, are identified in a significant proportion of lung tumors [28]. Besides, SETD2 has been described as a potent tumor suppressor in lung adenocarcinoma, and model systems have been developed to investigate chromatin deregulation in lung cancer [29].
Similar to previous findings [20,30], a close correlation existed between the mutations found in the histologic samples and BALF in our study, suggesting that BALF may provide a useful means of predicting malignant nodules. For patients who are afraid of surgery and lack definite malignant imaging manifestations, BALF enables less invasive testing [31]. Comprehensive analysis of gene mutations in BALF may be a helpful supplement to enhance the diagnostic yield in differentiating malignant SPN. Unlike traditional biopsies, cfDNA could provide an overview of all the mutations, allowing for more targeted treatment. In particular, personalized prediction may be recommended for patients with negative mutation results to avoid unnecessary surgery, enabling clinicians to judge treatment options more accurately.
The present findings, however, should be interpreted in light of some limitations. First, the sample size was too small to perform additional statistical analyses on factors that could affect the yield of the procedure, such as nodule location and nodule size. However, in line with previous observations [8,32], our findings showed relatively high positive rates of gene mutations, suggesting that sufficient DNA from cell-free lavage supernatants was available for NGS. Of note, BALF requires less caution in specimen handling and is expected to minimize the effect of tumor heterogeneity because of the release of cfDNA fragments. Secondly, because of cost, NGS was not performed on serum specimens, which may affect the correlation analysis between mutations found in serum and BALF to some extent. Nevertheless, the mutations in pathological tissue and BALF showed good agreement. Thirdly, it is noteworthy that KRAS mutations in BALF have been detected not only in patients with lung cancer but also in patients with benign lung disease [33]. Also, KRAS mutations could be linked with smoking or chronic inflammatory processes [23].
Although limited by its sensitivity, NGS detection of tumor-specific mutations in cfDNA from BALF specimens may be beneficial for predicting malignant SPN. Early cancer detection is by far the most economical and effective means of reducing cancer-specific mortality, and comprehensive targeted NGS enhances personalized cancer treatment.
Informed consent: Informed consent was obtained from all individual participants included in the study.
Consent to publish: The authors affirm that human research participants provided informed consent for publication.

Figure 1
Heatmap of gene mutation in the BALF sample

Figure 2
Heatmap of gene mutation in the histologic sample