RNF213 gene mutation of circulating tumor DNA in early diagnosis of NSCLC using targeted next-generation sequencing

The slides were then incubated with the primary antibodies at 4 °C overnight. Primary antibodies against KMT2D (cat. no. 27266-1-AP) and RNF213 (cat. no. 21028-1-AP) were obtained from Proteintech Group (Wuhan, Hubei, China). Primary antibodies against LRP-1B (cat. no. NBP2-49582) and CSMD3 (cat. no. NBP1-86371) were purchased from Novus Biologicals (Centennial, CO, USA). The slides were then washed, incubated with the secondary antibody (Goat anti-Rabbit IgG H&L (HRP), cat. no. ab205718; Abcam), developed with DAB, counterstained with hematoxylin, dehydrated and mounted.
The results were evaluated independently by two pathologists.


Introduction
To distinguish early stage lung cancer from benign lung nodules, especially lesions presenting as ground-glass opacity (GGO) or ground-glass nodule (GGN), we assessed gene mutations in circulating tumor DNA (ctDNA) from peripheral blood using targeted next-generation sequencing (NGS).

Methods
Patients with a single lung nodule, without mediastinal lymph node involvement or symptoms, whose nodules could not be definitively diagnosed by chest CT or lung cancer biomarkers, were enrolled. All patients received minimally invasive surgery but refused preoperative biopsy. Gene mutations in preoperative blood samples were detected by targeted NGS. Mutations with statistically significant differences between lung cancer and benign disease, grouped by postoperative pathology, were screened. Gene expression was determined by immunohistochemistry.
Highly expressed genes were selected as biomarkers to verify the mutations in peripheral blood.

Results
In the training set, the RNF213, KMT2D, CSMD3 and LRP1B genes were mutated more frequently in early stage lung cancer (25 cases) than in benign nodules (18 cases) (P<0.05). Immunohistochemistry showed high expression of RNF213 in lung cancers and low expression in benign diseases.
In the validation set of 28 samples, the RNF213 gene was mutated in 25% of lung cancer samples and showed high specificity (100%) and low sensitivity (25.9%). In GGO and GGN patients, RNF213 was mutated more frequently in early stage lung cancer than in benign diseases (P<0.05).

Conclusions
RNF213 gene mutation was observed more frequently in early stage lung cancer than in benign nodules. Mutation of the RNF213 gene in peripheral blood may be a highly specific biomarker that is valuable for the early diagnosis of lung cancer.

Background
Lung cancer remains a life-threatening malignancy with the highest morbidity and mortality in the world [1]. The five-year survival of lung cancer patients is still low [2,3] despite the current use of molecular diagnosis and targeted therapy. Early diagnosis and treatment are effective ways to improve the survival of lung cancer patients. Screening with low dose computed tomography (LDCT) can reduce lung cancer related mortality, and smaller lung nodules can be found at an early stage. However, the diagnosis may be difficult in cases with atypical CT imaging, and traditional biomarkers such as carcinoembryonic antigen (CEA), neuron-specific enolase (NSE) and cytokeratin 19 fragment (CYFRA 21-1) are not satisfactory for early diagnosis. Aspiration biopsy or surgery may be needed in most patients to confirm whether a nodule is malignant or benign.
An ideal diagnostic method should be simple, minimally traumatic, easy to perform and have a high positive rate.
Circulating cell-free DNA (cfDNA) consists of DNA fragments released through cell apoptosis and is widely present in blood, cerebrospinal fluid, urine and saliva [4,5]. cfDNA released by tumor cells through apoptosis and necrosis [6,7] is called circulating tumor DNA (ctDNA). Detection of blood ctDNA by liquid biopsy is important in the diagnosis, monitoring and prognosis of tumors [8].
A patient's ctDNA can provide a better understanding of the disease. ctDNA reflects the somatic genetic features of the primary tumor [9]; it can be detected in the peripheral blood of patients with advanced cancers and used for monitoring therapeutic effect [10,11]. Plasma ctDNA accounts for 0.01% of cfDNA [12]. Studies [13] have indicated that the concentration of ctDNA in plasma increases with stage, probably because of increasing tumor burden.
The very low level of detectable ctDNA in plasma and unknown mutations have limited its potential application in the diagnosis of early stage lung cancer.
With the improved sensitivity of next-generation sequencing (NGS), low concentrations of ctDNA in blood can be detected. At present, ctDNA in the blood of patients with advanced stage lung cancer has been studied for monitoring therapeutic effect. Fewer studies have addressed early stage lung cancer, either by detecting tumor DNA in tissue or by identifying ctDNA mutations with a limited number of genes [14]. Lung cancer-related genes such as EGFR, ALK and KRAS were usually used for targeted NGS in early stage lung cancer [15,16]. Only a few genes from the panel were used for targeted NGS. In addition, healthy individuals or individuals with benign nodules need to be included as a control group. So far, no study has addressed whether ctDNA can be detected in benign lung nodules, or whether ctDNA differs between early stage lung cancer and benign disease among undiagnosed lung nodules.
Here we study ctDNA through targeted NGS in small lung nodules that cannot be clearly diagnosed by chest CT. A panel of 560 tumor-related hot spot genes was used for targeted sequencing of plasma ctDNA in malignant and benign lung nodules. We aimed to find ctDNA mutations that differ between the two groups to guide the diagnosis of early stage lung cancer.

Patients
Patients with single lung nodules diagnosed in 2017-2018 were enrolled from the Second Hospital of Shandong University, Shandong Provincial Chest Hospital and the 960th Hospital of the People's Liberation Army of China. Lung cancer or benign disease could not be confirmed by chest CT. The largest diameter of the lesion was less than 5 cm and there was no involvement of mediastinal lymph nodes on CT imaging. The clinical stage was less than T2N0M0 (stage II, TNM staging, 7th edition) if the nodule was considered to be lung cancer. On preoperative routine examination, there were no metastatic lesions, and no patient had a history of other malignancy. Lung cancer related biomarkers such as CEA, NSE and CYFRA 21-1 could not help establish a definite diagnosis in these patients. All patients either refused biopsy or had lesions from which it was difficult to obtain tissue for histologic diagnosis.
All patients accepted minimally invasive thoracoscopic surgery.

Study design
A training set was established. In accordance with uniform diagnostic, inclusion and exclusion criteria, 42 of the 58 registered patients met the standard and passed the blood sample test. Qualified paired samples were sequenced by targeted NGS with a panel of 560 tumor-related hot spot mutant genes. ctDNA mutations were analyzed in lung cancer and benign disease according to the histopathological results. We selected the genes whose ctDNA mutation frequencies differed significantly between the lung cancer group and the benign disease control group. Immunohistochemical staining was performed on formalin-fixed paraffin-embedded (FFPE) tissue samples of these patients to analyze the expression of the selected genes. Finally, high expression of the selected genes in lung cancer was confirmed. A validation set of lung nodules with unknown pathology was established and sequenced with the same NGS panel to test the selected ctDNA mutations.

Blood sample preparation
A 10 ml sample of peripheral blood was collected 1-3 days before the operation. Blood samples in EDTA tubes were centrifuged for 10 min at 1,600 g at 4 °C, and white blood cells were collected and stored. The supernatants were further centrifuged at 16,000 g for 10 min at 4 °C, and plasma was collected and stored at −80 °C until use. White blood cell DNA was isolated using the DNA Isolation Kit for Mammalian Blood (Roche), and cfDNA was isolated using the QIAamp Circulating Nucleic Acid Kit (QIAGEN) according to the manufacturer's protocol. From 1 ml of plasma, 10-50 ng of cfDNA was obtained.
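To put the reported cfDNA yield in perspective, the following minimal sketch converts nanograms of cfDNA into haploid genome copies and then applies the 0.01% ctDNA share cited in the Background. The genome size (about 3.1e9 bp) and the average mass per base pair (about 650 g/mol) are assumptions introduced here for illustration, not values from this study, and the function names are hypothetical.

```python
# Rough estimate of how many tumor-derived copies of a locus are present
# in 1 ml of plasma. Genome size and mass per base pair are assumptions.

AVOGADRO = 6.022e23            # molecules per mole
GENOME_BP = 3.1e9              # assumed haploid human genome size (bp)
GRAMS_PER_MOL_PER_BP = 650.0   # assumed average mass of one base pair


def haploid_genome_mass_ng() -> float:
    """Mass of one haploid genome copy in nanograms."""
    grams = GENOME_BP * GRAMS_PER_MOL_PER_BP / AVOGADRO
    return grams * 1e9


def genome_equivalents(cfdna_ng: float) -> float:
    """Number of haploid genome copies contained in cfdna_ng of DNA."""
    return cfdna_ng / haploid_genome_mass_ng()


def expected_mutant_copies(cfdna_ng: float, ctdna_fraction: float) -> float:
    """Expected tumor-derived copies of a locus for a given ctDNA fraction."""
    return genome_equivalents(cfdna_ng) * ctdna_fraction


if __name__ == "__main__":
    for yield_ng in (10.0, 50.0):  # cfDNA yield per ml of plasma
        copies = genome_equivalents(yield_ng)
        mutant = expected_mutant_copies(yield_ng, 0.0001)  # 0.01% ctDNA
        print(f"{yield_ng:.0f} ng: ~{copies:.0f} copies, ~{mutant:.2f} mutant")
```

Under these assumptions, 10 ng corresponds to roughly 3,000 genome copies, so a 0.01% ctDNA share would leave less than one mutant copy per locus in 1 ml of plasma, which is consistent with the difficulty of detection in early stage disease described in the Background.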

Genomic DNA preparation and targeted sequencing
Genomic DNA was checked for degradation and contamination on 1% agarose gels, and its concentration was measured with the Qubit® DNA Assay Kit on a Qubit® 2.0 Fluorometer (Life Technologies, Carlsbad, CA).
We designed probes for the genes of interest on the Agilent website according to the design instructions to capture the target gene regions. Briefly, DNA was fragmented to 180-280 bp using a hydrodynamic shearing system (Covaris, Massachusetts, USA). Extracted DNA was then amplified by ligation-mediated PCR (LM-PCR), purified, and hybridized to the probes for enrichment.
Non-hybridized fragments were subsequently washed out. Both captured and non-captured LM-PCR products were subjected to real-time PCR to estimate the magnitude of enrichment. Each captured library was loaded on an Illumina HiSeq 4000 platform (Illumina, San Diego, California, USA), and high-throughput sequencing was carried out at an average depth of 1000×. Each captured library was sequenced independently to ensure that each sample met the desired average fold coverage.

Sequence data quality control
The original fluorescence image files obtained from the HiSeq platform were transformed into short reads (raw data) by base calling and recorded in FASTQ format, which contains the sequence information and the corresponding sequencing quality information. Clean reads were obtained by excluding reads containing adapter contamination and low-quality or unrecognizable nucleotides. Downstream bioinformatic analyses were based on these clean data. At the same time, the total number of reads, the sequencing error rate, the percentage of reads with average quality >20 and >30, and the GC content distribution were calculated.
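The quality metrics listed above can be illustrated with a short sketch. It is not the pipeline used in this study; it assumes Phred+33 quality encoding and four-line FASTQ records, estimates the error rate from the quality scores, and uses the hypothetical function names `parse_fastq` and `qc_summary`.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from four-line FASTQ records."""
    rows: List[str] = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(rows) - 3, 4):
        yield rows[i + 1], rows[i + 3]


def qc_summary(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute read count, expected error rate, %Q20, %Q30 and GC content."""
    n_reads = n_q20 = n_q30 = n_bases = n_gc = 0
    error_sum = 0.0
    for seq, qual in records:
        scores = [ord(ch) - PHRED_OFFSET for ch in qual]
        if not scores:
            continue  # skip empty reads
        mean_q = sum(scores) / len(scores)
        n_reads += 1
        n_q20 += mean_q > 20   # reads with average quality above 20
        n_q30 += mean_q > 30   # reads with average quality above 30
        n_bases += len(scores)
        n_gc += sum(base in "GCgc" for base in seq)
        # A Phred score Q corresponds to an error probability of 10^(-Q/10).
        error_sum += sum(10 ** (-q / 10) for q in scores)
    if n_reads == 0:
        raise ValueError("no reads found")
    return {
        "total_reads": n_reads,
        "error_rate": error_sum / n_bases,
        "pct_q20": 100.0 * n_q20 / n_reads,
        "pct_q30": 100.0 * n_q30 / n_reads,
        "gc_content": n_gc / n_bases,
    }
```

In practice the input would be the lines of a clean FASTQ file, for example `qc_summary(parse_fastq(open(path)))`, where `path` is a placeholder for a file on disk.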

Reads mapping and somatic genetic alteration detection
Valid sequencing data were mapped to the human reference genome (UCSC hg19) with the Burrows-Wheeler Aligner (BWA), and the original mapping results were stored in BAM format [17]. Then, SAMtools [18] and Picard (http://broadinstitute.github.io/picard/) were used to sort the BAM files and to perform duplicate marking, local realignment and base quality recalibration, generating the final BAM file for computing sequence coverage and depth.
MuTect and Strelka [19,20] were used to call somatic single nucleotide variations (SNVs) and small insertions and deletions (InDels), respectively, from paired tumor-normal samples. In addition to the default filters, somatic SNVs and InDels listed as polymorphisms in the 1000 Genomes Project [21] or the Exome Aggregation Consortium (ExAC) [22] with a minor allele frequency over 1% were removed. Subsequently, the VCF (Variant Call Format) files were annotated with ANNOVAR [23].
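The population-frequency filter described above can be sketched as follows. This is an illustration only: the record layout, the field names `maf_1000g` and `maf_exac`, and the variant identifiers are hypothetical and do not correspond to the actual output columns of any tool named in this section.

```python
from typing import Dict, List, Optional, Sequence

MAF_CUTOFF = 0.01  # 1% minor allele frequency, as stated in the text

# Hypothetical annotation fields holding population allele frequencies.
POPULATION_FIELDS: Sequence[str] = ("maf_1000g", "maf_exac")


def population_maf(variant: Dict[str, Optional[object]]) -> float:
    """Highest MAF reported by any database; a missing value counts as 0."""
    values = [variant.get(field) for field in POPULATION_FIELDS]
    return max((float(v) for v in values if v is not None), default=0.0)


def is_common_polymorphism(variant: Dict[str, Optional[object]],
                           cutoff: float = MAF_CUTOFF) -> bool:
    """True when the variant exceeds the MAF cutoff in any database."""
    return population_maf(variant) > cutoff


def filter_somatic(variants: List[Dict[str, Optional[object]]]
                   ) -> List[Dict[str, Optional[object]]]:
    """Keep only variants that are not common population polymorphisms."""
    return [v for v in variants if not is_common_polymorphism(v)]


if __name__ == "__main__":
    calls = [
        {"id": "var_A", "maf_1000g": None, "maf_exac": None},     # novel
        {"id": "var_B", "maf_1000g": 0.002, "maf_exac": 0.004},   # rare
        {"id": "var_C", "maf_1000g": 0.12, "maf_exac": None},     # common
    ]
    print([v["id"] for v in filter_somatic(calls)])
```

Taking the maximum across databases mirrors the "or" in the text: a variant is dropped if either resource reports it above the cutoff.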

Statistical analysis
BWA, Samblaster and Sambamba were used to align the sequenced data to the reference genome. MuTect was used to search for somatic single nucleotide variations (SNVs), and Strelka was used to search for somatic insertions and deletions (InDels). ANNOVAR was used to annotate the structure and function of the detected variants. Patients were divided into lung cancer and benign disease groups. ctDNA somatic SNVs in the two groups were compared using the chi-square test. General characteristics were analyzed by Student's t-test or one-way ANOVA.
Analyses were performed using SPSS Statistics version 23 (IBM Corp). A P value <0.05 was considered statistically significant.
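As an illustration of the chi-square comparison of mutation frequencies, the sketch below computes the Pearson statistic for a 2×2 table using only the standard library. It is not the SPSS procedure used in the study, and whether a continuity correction was applied there is not stated, so the Yates option is an assumption offered for comparison. The example counts are the RNF213 figures from the GGO/GGN analysis of this paper (10 of 36 lung cancers and 0 of 19 benign nodules mutated).

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int,
                   yates: bool = False) -> Tuple[float, float]:
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Rows are groups (e.g. cancer, benign); columns are mutated / not mutated.
    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one count")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # continuity correction
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    stat, p = chi_square_2x2(10, 26, 0, 19)
    stat_y, p_y = chi_square_2x2(10, 26, 0, 19, yates=True)
    print(f"uncorrected: chi2={stat:.2f}, P={p:.3f}")
    print(f"Yates:       chi2={stat_y:.2f}, P={p_y:.3f}")
```

With these counts both variants give P<0.05, in line with the reported result; because one expected cell count is small, an exact test could also be considered.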

Patients' general characteristics
In the training set, we collected 58 pairs of blood samples. Samples of patients diagnosed with stage III lung cancer according to postoperative pathology were excluded. After rejecting samples with contaminated genomes and unqualified samples, a total of 42 paired samples passed quality checks and were sequenced.
The patients' general characteristics are shown in Table 1. [...] benign disease patients (26.2%) in the group with a diameter less than 3 cm, while those with a diameter of 3 to 5 cm were 9 (21.4%) and 4 (9.5%). There was no statistically significant difference (P>0.05) in general characteristics between the two groups, including gender, smoking history and tumor size. The data of the two groups are comparable.

Biomarkers detection
Blood biomarkers including CEA, NSE and CYFRA 21-1 were measured before surgery. In the training set, the average values of these biomarkers were 3.04±1.64, 22.96±17.04 and 4.54±8.18 ng/ml in lung cancer, and 1.85±0.92, 20.20±7.02 and 1.66±0.86 ng/ml in benign disease, respectively, with no significant differences between the two groups (P>0.05 for each comparison). In the validation set, the average values of these biomarkers were 2.47±1. 30

Somatic mutation analysis
A total of 246 mutated genes were detected by targeted sequencing in the lung cancer and benign disease groups of the training set. There were 522 somatic mutations in total, including 374 detected in the lung cancer group and 148 in the benign disease group (Table 3).
Mutations of LRP1B, KMT2D, RNF213 and CSMD3 were more frequent in lung cancer than in benign disease, with statistically significant differences between the two groups (P<0.05). The WHSC1 gene was mutated in 10 lung cancer samples and 8 benign disease samples.
The GDNF gene was mutated in 7 lung cancer samples and 1 benign disease sample. For the other genes, fewer than 5 mutations were detected in either of the two groups. There were no statistically significant differences for these genes between the two groups (P>0.05).

Immunohistochemical results
In the training set, RNF213, LRP1B, KMT2D and CSMD3 were mutated at significantly different frequencies in lung cancer compared with benign disease according to the sequencing data. Immunohistochemistry (IHC) was performed on FFPE tissue specimens of 27 lung cancers and 14 benign diseases to detect the expression of RNF213, LRP1B, KMT2D and CSMD3. IHC was not carried out in one sample (B23) because too little tissue was available. Representative images of RNF213, KMT2D, CSMD3 and LRP1B expression are shown in Figure 5, which also summarizes high and low expression of the four genes. High expression of RNF213, KMT2D and CSMD3 was observed in lung cancer tissues, and low expression of these genes was observed in benign disease tissues. The differences between the two groups were statistically significant (P<0.05), especially for RNF213 (P<0.005). Low expression of LRP1B was observed in 26 lung cancer tissues and in 14 benign disease tissues; one lung cancer sample showed high expression. This difference was not significant (P>0.05).

Validation set results
Twenty-eight patients with lung nodules that could not be definitively diagnosed by chest CT were enrolled in the validation set. The largest diameter of every lesion was less than 3 cm on CT imaging. Malignant or benign nodules could not be confirmed by CT imaging or biomarkers. Blood samples underwent targeted sequencing by the same method. The numbers of the test genes RNF213, KMT2D, CSMD3 and LRP1B detected in lung cancer samples were 5, 5, 3 and 2, respectively. Postoperative pathology confirmed 20 samples as lung cancer and 8 as benign nodules. The RNF213 gene was mutated in 25% of lung cancer patients. KMT2D, CSMD3 and LRP1B were mutated in 15%, 10% and 10% of lung cancer patients, but KMT2D and CSMD3 mutations were also detected in 25% and 12.5% of benign diseases (Figure 6).
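With groups this small, an exact test gives a sense of how much evidence a 5-of-20 versus 0-of-8 split carries (the paper reports no RNF213 mutations among the benign validation samples). The sketch below uses Fisher's exact test, which is swapped in here for illustration; the study itself names the chi-square test, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that x of the col1 "mutated" samples fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    tolerance = observed * (1 + 1e-9)  # guard against rounding ties
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= tolerance)
    return min(1.0, p_value)


if __name__ == "__main__":
    # Rows: lung cancer, benign; columns: RNF213 mutated, not mutated.
    print(f"P = {fisher_exact_two_sided(5, 15, 0, 8):.3f}")
```

Under this test the validation-set split alone does not reach P<0.05, which matches the remark in the Discussion that the validation difference was not statistically significant.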

An analysis of GGO and GGN in all samples
We analyzed the sequencing data of all 70 samples together to check the reliability of the result.
Fifty-five patients were diagnosed with GGO or GGN on chest CT, including 36 with early stage lung cancer and 19 with benign disease. We detected RNF213 gene mutations in 10 (10/36, 27.8%) lung cancer samples and in no benign disease samples (P<0.05). All of these somatic mutations were missense mutations. The specificity of RNF213 gene mutation was 100% in the diagnosis of GGO and GGN, and its sensitivity was 27.8%.
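The sensitivity and specificity quoted above follow directly from the counts: 10 mutated and 26 unmutated lung cancers, and 0 mutated and 19 unmutated benign nodules. The sketch below reproduces them and, as an addition not found in the paper, attaches a Wilson 95% confidence interval to show how much uncertainty remains with groups of this size; the function names are hypothetical.

```python
import math
from typing import Tuple


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of cancers that carry the mutation."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of benign nodules without the mutation."""
    return tn / (tn + fp)


def wilson_interval(successes: int, total: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    tp, fn = 10, 26   # lung cancer: mutated, not mutated
    fp, tn = 0, 19    # benign: mutated, not mutated
    print(f"sensitivity = {sensitivity(tp, fn):.3f}", wilson_interval(tp, tp + fn))
    print(f"specificity = {specificity(tn, fp):.3f}", wilson_interval(tn, tn + fp))
```

Because no benign sample was mutated, the point estimate of specificity is 100%, but with 19 benign cases the lower Wilson bound is about 83%, which supports the authors' call for confirmation in larger cohorts.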

Discussion
Early diagnosis and treatment are effective means of improving the survival rate. Small lung lesions can be found on chest CT, which is the most common and valid examination for the diagnosis or screening of lung cancer [25,26]. Recently, early detection or screening with low dose computed tomography (LDCT) was shown to improve survival and reduce lung cancer specific mortality by the National Lung Screening Trial (NLST) and other studies [27,28].
Some lesions on CT are easily diagnosed as lung cancer, whereas others are difficult to classify as lung cancer or benign disease, especially when the lung nodule is small or the imaging features are not typical [29]. For example, a ground-glass opacity (GGO) or ground-glass nodule (GGN) is usually adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA) or atypical adenomatous hyperplasia (AAH) [30]. The high false positive and false negative rates of traditional biomarkers such as carcinoembryonic antigen (CEA), neuron-specific enolase (NSE) and cytokeratin 19 fragment (CYFRA 21-1) make the diagnosis more difficult [31,32]. These biomarkers are of little use in the early diagnosis of lung cancer. Aspiration biopsy may be needed to confirm whether the nodule is malignant or benign.
However, bleeding, pneumothorax, pain and possible tumor dissemination restrict its use in early diagnosis.
An ideal diagnostic method should be simple, convenient, safe and efficient. Using liquid biopsies to detect circulating biomarkers such as circulating tumor cells (CTCs), circulating tumor DNA (ctDNA) and exosomes may offer a relatively simple method to analyze early stage tumors [33]. Detection of ctDNA in peripheral blood is more commonly used, whereas CTCs are less commonly detected in early stage tumors.
Circulating cell-free DNA, consisting of DNA fragments released through cell apoptosis, is widely present in extracellular fluids such as blood, cerebrospinal fluid, urine and saliva [4,5]. The cfDNA of healthy people comes mainly from metabolism and apoptosis of cells including bone marrow cells, lymphocytes and normal tissue cells [34]. In patients with tumors, cfDNA fragments from tumor cells, known as circulating tumor DNA (ctDNA), are also released into peripheral blood through apoptosis and necrosis of tumor cells [6,7]. Plasma ctDNA, which consists of fragments of about 150-200 bp [35] carrying genetic information about the tumor, is of great significance for the diagnosis, treatment and monitoring of the disease.
Circulating tumor DNA has been used for monitoring therapeutic effect and for prognostic prediction in the treatment of malignancy because of the ctDNA levels found in advanced stage tumors [36]. The level of detected ctDNA increases with malignant progression [37,13]. The low level of ctDNA in early stage tumors makes detection difficult. Early diagnosis can provide tremendous benefits for the treatment of patients with malignant tumors [13,38]. With the development of sequencing technology, low levels of ctDNA can be detected in blood more easily and accurately, and more and more studies have applied this technology to investigate early stage tumors.
In the present study, we investigated ctDNA in early stage lung cancer and comparable benign disease found on chest CT. First, cfDNA could be detected in all benign lung disease samples, as previously reported for other solid tumors [39,40,41]. However, those studies did not consider the stage of malignancy when comparing with benign disease. The level of cfDNA in malignancy is related to tumor burden, such as tumor size, T stage and TNM stage [42]. Our data indicated that the level of cfDNA in early stage lung cancer was not significantly different from that in benign disease (0.53±0.66 ng/μl and 0.54±0.29 ng/μl, respectively; P>0.05). This result may be related to the low tumor burden in early stage lung cancer, indicating that early stage lung cancers release low levels of cfDNA into the bloodstream, similar to benign lung disease. Cell apoptosis and necrosis in benign tumors or diseases also increase cfDNA.
Elevated cfDNA concentrations alone did not fully distinguish between lung cancer and benign disease. In our study, targeted NGS was used to detect ctDNA in these DNA samples. The NGS panel covered 560 tumor-related hot spot genes to investigate mutations in early stage lung cancer. We found that a number of mutations were not related to gender, age, smoking history, tumor size, stage or pathology in the two groups. Some genes were mutated more frequently in lung cancer and others in benign disease. In the training set, the RNF213, KMT2D, CSMD3 and LRP1B genes were mutated more frequently in early stage lung cancer than in benign diseases. RNF213 gene mutations were found in 25.9% of lung cancer patients and in no benign disease patients. RNF213 gene mutation therefore shows high specificity for distinguishing lung cancer from benign disease.
To clarify the protein expression of these four genes in tissues, we performed immunohistochemistry on lung cancer tissues. RNF213, KMT2D and CSMD3 showed higher expression in lung cancer than in benign disease samples, especially RNF213. This differential expression between the two groups may be due to amino acid changes caused by the genetic alterations.
Finally, a validation experiment was carried out to study the diagnostic value of these four genes.
The RNF213 gene was mutated in 25% of lung cancer patients and was not mutated in benign diseases, whereas KMT2D, CSMD3 and LRP1B were mutated less frequently in both groups. The same high specificity of RNF213 gene mutation was shown in the validation set, although the difference was not statistically significant, probably because of the small number of samples. Across all 70 samples of the training and validation sets, the frequency of RNF213 gene mutation in lung cancer differed significantly from that in benign lung diseases. In addition, we investigated RNF213 in all GGO and GGN patients of the study. RNF213 mutations were found in 27.8% of lung cancer samples and in no benign disease samples (P<0.05), with similarly high specificity. A larger number of randomized controlled samples needs to be studied to further confirm these results.
The RNF213 gene encodes ring finger protein 213, a protein containing a RING finger domain [43]. It has been found in some malignant tumors such as ovarian cancer, gastric cancer and liver cancer [44,45,46], yet there have been few studies of RNF213 gene mutation in malignant tumors.
RNF213 has been reported to be a possible tumor suppressor in malignancy [47]. To our knowledge, we are the first to find RNF213 gene mutations in ctDNA of early stage lung cancer, with a statistically significant difference compared with benign lung disease. The missense mutations of RNF213 change the amino acid sequence and may thereby affect protein function. Such mutations may result in loss of its tumor suppressor function and promote tumor development and progression in lung cancer. The underlying mechanisms need to be elucidated in the future.

Conclusions
In conclusion, the concentration of cfDNA is not a good biomarker for distinguishing early stage lung cancer from benign lung diseases. RNF213 gene mutation in ctDNA, detected through targeted NGS, may be used for the molecular diagnosis of malignant and benign lung nodules. The effect of KMT2D, CSMD3 and LRP1B should be further confirmed in more samples.
To the best of our knowledge, this is the first report that RNF213 gene mutation in ctDNA was detected by targeted NGS in early stage lung cancer. It has a high specificity of 100% and a sensitivity of about 26% in the diagnosis or screening of lung cancer. A larger-scale randomized controlled trial is needed to verify this finding in the future. In addition, the underlying mechanisms by which these genes contribute to the recurrence and development of lung cancer need further study. The results of our study may be useful in the diagnosis and treatment of early stage lung cancer.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Figure 1 The most frequently mutated gene was WHSC1. It was mutated in 10 and 8 samples of the two groups, respectively (P>0.05). RNF213, KMT2D, CSMD3 and LRP1B were more frequently mutated in lung cancer than in benign disease (P<0.05). Genes mutated fewer than three times are not listed in the heat map (all data are shown in the supplementary files).

Mutations of the validation set. RNF213 gene mutations were detected in five lung cancer samples and in none of the benign control group. RNF213 gene mutation has a high specificity of 100% and a sensitivity of about 25%, consistent with the data in the training set.

Supplementary Files
This is a list of supplementary files associated with this preprint: Figure S1.tif, Table S1.xls, Table S2.xlsx.