Whole exome sequencing of non-small cell lung cancer tumor samples
We performed WES on the fresh-frozen sample at an average depth of 1289x and on the FFPE sample at an average depth of 1067x. WES identified 1719 somatic variants in the fresh-frozen sample and 8201 in the FFPE sample, of which 1111 were concordant between the two samples. The distribution of variants differed by sample type (fresh-frozen vs. FFPE). Nonsynonymous and frameshift variants, which alter the amino acid sequence and can affect protein function, made up 11% of putative variants in the fresh-frozen sample and 38% in the FFPE sample (Fig. 1b). Low-level variants with VAF < 5% numbered 155 in the fresh-frozen sample and 2852 in the FFPE sample (Fig. 1c).
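Concordance between the two call sets can be thought of as a set intersection. The sketch below assumes that "concordant" means an identical chromosome, position, reference and alternate allele; the `Variant` and `concordance` names are illustrative and not part of any pipeline described here.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A somatic call identified by position and allele (same reference build assumed)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(calls_a: set, calls_b: set) -> dict:
    """Count calls shared by two samples and express them as a fraction of each call set."""
    shared = calls_a & calls_b  # exact match on (chrom, pos, ref, alt)
    return {
        "concordant": len(shared),
        "fraction_of_a": len(shared) / len(calls_a),
        "fraction_of_b": len(shared) / len(calls_b),
    }
```

Applied to the reported counts, 1111 shared calls correspond to 64.6% of the 1719 fresh-frozen calls but only 13.5% of the 8201 FFPE calls, which is one way to see how many FFPE-only calls there are.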
Cancer exome sequencing typically reports somatic sequence variants in genes with diagnostic, prognostic, or predictive clinical evidence, or in genes from cancer pathways, gene families, or functional groups that are targets of therapeutic agents. The FoundationOne CDx panel provides a comprehensive list of 324 such genes [22], and in this manuscript we define cancer-related genes as those included in the FoundationOne CDx panel. Among all WES-detected variants, 37 in the fresh-frozen sample and 176 in the FFPE sample were in cancer-related genes; of these, 6 in the fresh-frozen sample and 60 in the FFPE sample were low-level variants with VAF < 5% (Fig. 1c). Among the low-level variants in cancer-related genes, there were 2 nonsynonymous and 1 frameshift variant in the fresh-frozen sample and 33 nonsynonymous and 3 frameshift variants in the FFPE sample (Fig. 1d). The FFPE sample likely showed more low-level variants than the fresh-frozen sample because of DNA damage that occurs during fixation and storage, such as cytosine deamination and oxidation [23, 24, 25]. Cytosine deamination causes C:G > T:A changes, whereas oxidative DNA damage causes C:G > A:T changes. Although the DNA extracted from the FFPE sample was enzymatically repaired before the experiments, unrepaired damage may persist at low levels and be detected as base changes. We further examined these 66 low-level cancer-related variants (6 fresh-frozen, 60 FFPE) with BDA qPCR/Sanger analysis, along with another 160 randomly selected low-level variants.
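The selection of low-level cancer-related variants amounts to two filters: gene membership and a VAF below 5%. A minimal sketch follows; `CANCER_RELATED` is a hypothetical five-gene stand-in, not the 324-gene FoundationOne CDx list, and the `Call` record is an assumed data layout.

```python
from collections import Counter
from dataclasses import dataclass

LOW_LEVEL_VAF = 0.05  # VAF < 5% defines a low-level variant in the text

# Hypothetical stand-in: a handful of genes, NOT the full FoundationOne CDx list.
CANCER_RELATED = {"TERT", "MYC", "ABL1", "GATA3", "SMARCA4"}


@dataclass(frozen=True)
class Call:
    gene: str
    category: str  # e.g. "Nonsynonymous", "Frameshift", "Intron"
    alt_reads: int
    depth: int  # assumed > 0

    @property
    def vaf(self) -> float:
        return self.alt_reads / self.depth


def low_level_cancer_related(calls):
    """Keep calls in cancer-related genes whose WES VAF is below the low-level cutoff."""
    return [c for c in calls if c.gene in CANCER_RELATED and c.vaf < LOW_LEVEL_VAF]


def by_category(calls):
    """Tally functional categories, as in the nonsynonymous/frameshift breakdown."""
    return Counter(c.category for c in calls)
```

For example, a TERT call with 5/289 variant reads (1.7%) passes both filters, while a cancer-gene call at 20% VAF or a low-level call in a gene outside the set is dropped.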
BDA qPCR/Sanger confirmation of WES-called putative variants with VAF < 5%
Among the variants with VAF < 5%, we selected 94 from the fresh-frozen sample and 132 from the FFPE sample for confirmatory BDA qPCR/Sanger analysis, including all WES-detected low-level variants in cancer-related genes from the FoundationOne CDx list (6 in fresh-frozen, 60 in FFPE). The lowest VAF among all WES-called variants was 0.5%; because BDA assays have an analytical LoD of 0.1%, all putative variants in the 0.5%-5% VAF range could be examined.
The NGSure BDA designer takes a chromosomal position as input and accounts for bioinformatic features such as amplicon GC content and the presence of pseudogenes and/or common germline SNPs near the priming and blocking regions (Fig. 2a). A BDA design consists of a forward primer, a reverse primer, and a blocker. The blocker is complementary to the known (wildtype) template sequence and overlaps the forward primer, forcing the two to compete for binding. On templates carrying the variant, the blocker binds with a mismatch bubble at the variant site. This weakened binding allows the primer to displace the blocker, thereby promoting selective amplification of the variant sequence. Compounded over multiple cycles, this differential amplification yields over 1000-fold enrichment of the variant [17] (Fig. 2b).
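Why a modest per-cycle advantage compounds into large enrichment can be illustrated with a deliberately simplified model: each cycle copies a fixed fraction of variant templates and a smaller, blocker-suppressed fraction of wildtype templates. The efficiencies used below are hypothetical, and this constant-efficiency model is an assumption for illustration, not the published BDA thermodynamic design.

```python
import math


def per_cycle_ratio(eff_variant: float, eff_wildtype: float) -> float:
    """Relative growth of variant vs. wildtype copies in one cycle.

    eff_* is the fraction of templates copied per cycle (0 = none, 1 = perfect doubling).
    """
    return (1.0 + eff_variant) / (1.0 + eff_wildtype)


def enrichment_fold(eff_variant: float, eff_wildtype: float, cycles: int) -> float:
    """Fold-enrichment after `cycles`, assuming constant per-cycle efficiencies."""
    return per_cycle_ratio(eff_variant, eff_wildtype) ** cycles


def cycles_for_fold(target: float, eff_variant: float, eff_wildtype: float) -> int:
    """Smallest cycle count whose compounded enrichment reaches `target`."""
    r = per_cycle_ratio(eff_variant, eff_wildtype)
    if r <= 1.0:
        raise ValueError("blocker gives the variant no amplification advantage")
    return math.ceil(math.log(target) / math.log(r))


def enriched_fraction(vaf: float, fold: float) -> float:
    """Variant fraction of the product after enriching an input at `vaf` by `fold`."""
    variant = vaf * fold
    return variant / (variant + (1.0 - vaf))
```

With hypothetical efficiencies of 0.9 (variant) and 0.5 (wildtype), `cycles_for_fold(1000, 0.9, 0.5)` returns 30. Under the same model, an input at the 0.1% LoD enriched 1000-fold makes up about half of the product (`enriched_fraction(0.001, 1000)` is roughly 0.50), a fraction Sanger sequencing can read.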
Figure 2c-e shows representative Sanger traces for 3 putative variants to which customized BDA assays were applied. If the sequence at the locus of interest matched the WES call, the variant was confirmed (Fig. 2c), and qPCR Cq values were then used to estimate its VAF. If the sequence at the locus of interest matched the wildtype, the variant was disconfirmed (variant absent in the sample). If the sequence matched neither the wildtype nor the WES-called variant, the experiment was repeated at least twice to confirm the result. Figure 2e shows a case in which WES called Chr7:2944337 G > A, whereas Sanger sequencing of the BDA amplicon revealed an insertion of A rather than a G > A substitution, indicating that the variant was misidentified by WES. Of the 226 variants that underwent BDA analysis, 5 were classified as misidentified by WES. We examined these 5 variants by amplicon-based NGS at a median depth of 5000x per amplicon, and all 5 results were concordant with the BDA analysis. Figure 2f shows the IGV view validating the Chr7:2944336_2944337 insA variant detected by BDA Sanger sequencing (Fig. 2e). For all WES-misidentified variants, Sanger traces from triplicate BDA reactions and IGV pileups of amplicon-based NGS reads are presented in Supplementary Fig. S3. Note that NGSure custom assays do not require further verification by amplicon-based sequencing, as BDA qPCR/Sanger is sufficiently accurate; amplicon-based sequencing was used in this study only to validate our method.
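The three-way decision rule above can be sketched as a small function. Two parts are assumptions: representing each outcome as a short local sequence string (the sequences in the checks are made up), and the `vaf_from_cq` helper. The text states only that Cq values were used to estimate VAF; the helper shows one common approach, comparing a blocked BDA reaction with an unblocked reference reaction, and is not necessarily the authors' calculation.

```python
from enum import Enum


class Outcome(Enum):
    CONFIRMED = "variant confirmed"
    ABSENT = "variant absent"
    MISIDENTIFIED = "variant misidentified"
    REPEAT = "inconclusive: repeat BDA reaction"


def classify_locus(observed: str, wildtype: str, wes_call: str,
                   repeats_agree: bool = False) -> Outcome:
    """Compare the Sanger-read sequence at the locus with the two expected sequences."""
    if observed == wes_call:
        return Outcome.CONFIRMED
    if observed == wildtype:
        return Outcome.ABSENT
    # Neither expected sequence: report a misidentification only once
    # repeated reactions give the same unexpected sequence.
    return Outcome.MISIDENTIFIED if repeats_agree else Outcome.REPEAT


def vaf_from_cq(cq_blocked: float, cq_reference: float, efficiency: float = 1.0) -> float:
    """Hypothetical VAF estimate from the Cq gap between a blocked (BDA) reaction and
    an unblocked reference reaction on the same input.

    Assumes equal per-cycle efficiency in both reactions and negligible wildtype
    amplification in the blocked reaction, so each extra cycle halves the estimate
    when efficiency is 1.0.
    """
    return (1.0 + efficiency) ** -(cq_blocked - cq_reference)
```

Under these assumptions, a blocked reaction crossing threshold 5 cycles after the reference implies a VAF of about 2^-5, or 3.1%.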
At loci where somatic variants were detected, the mean depth was 527x in the fresh-frozen sample and 131x in the FFPE sample (Fig. 3a). For the 155 fresh-frozen and 2852 FFPE somatic variants with < 5% VAF in WES, the mean depth was 824x (range 95x-2585x) and 243x (range 1x-1570x), respectively. Although both samples were sequenced to an average depth above 1000x, the actual depth at FFPE-detected variants was substantially lower. This is likely due to the uneven coverage of FFPE-derived DNA, attributable primarily to its short fragment length. For the fresh-frozen sample, 842M raw reads were obtained, of which 535M (63.7%) were unique reads, and 97.3% of target bases were covered above 20x. For the FFPE sample, 857M raw reads were obtained, of which 181M (21.1%) were unique reads, and 71.9% of target bases were covered above 20x.
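A back-of-envelope calculation shows how a low unique-read fraction erodes usable depth. This sketch assumes that the reported average depth counts reads before duplicate removal, and it treats duplicates as adding no independent information. It ignores the uneven coverage discussed above, which lowers depth further at many loci.

```python
def unique_fraction(unique_reads: float, raw_reads: float) -> float:
    """Fraction of raw reads remaining after duplicate removal."""
    return unique_reads / raw_reads


def effective_depth(nominal_depth: float, unique_frac: float) -> float:
    """Crude depth in independent molecules, assuming duplicates add reads but not information."""
    return nominal_depth * unique_frac


# Reported FFPE read counts (181M unique of 857M raw) and average depth (1067x).
ffpe_unique = unique_fraction(181e6, 857e6)
print(f"FFPE unique fraction: {ffpe_unique:.1%}")  # 21.1%
print(f"rough effective depth: {effective_depth(1067, ffpe_unique):.0f}x")
```

Under these assumptions, the nominal 1067x shrinks to roughly 225x of independent coverage.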
Among the 226 WES-detected low-level variants that underwent confirmatory BDA qPCR/Sanger analysis, the overall disconfirmed rate was 52% (117/226, Table 1): 35% (33/94) in the fresh-frozen sample and 64% (84/132) in the FFPE sample. This indicates a high false-positive rate for WES in detecting low-level variants.
Table 1
Summary of WES-detected variant confirmation results by blocker displacement amplification (BDA). FF, fresh-frozen; disconfirmed rate = (variant absent + variant misidentified)/N.
Sample | N | Variant Confirmed | Variant Absent | Variant Misidentified | Disconfirmed Rate |
FF | 94 | 61 | 33 | 0 | 35% |
FFPE | 132 | 48 | 79 | 5 | 64% |
Total Tested | 226 | 109 | 112 | 5 | 52% |
FF (cancer-related) | 6 | 1 | 5 | 0 | 83% |
FFPE (cancer-related) | 60 | 11 | 47 | 2 | 82% |
Total Tested (cancer-related) | 66 | 12 | 52 | 2 | 82% |
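The Disconfirmed Rate column follows directly from the counts, with both absent and misidentified calls counted as disconfirmed. The sketch below copies the per-sample counts from Table 1 and pools them column-wise to form the totals; the function names are illustrative.

```python
TABLE_1 = {
    # sample: (confirmed, absent, misidentified), copied from Table 1
    "FF": (61, 33, 0),
    "FFPE": (48, 79, 5),
    "FF (cancer-related)": (1, 5, 0),
    "FFPE (cancer-related)": (11, 47, 2),
}


def disconfirmed_rate(confirmed: int, absent: int, misidentified: int) -> float:
    """Absent and misidentified calls both count as disconfirmed."""
    n = confirmed + absent + misidentified
    return (absent + misidentified) / n


def pooled(*rows):
    """Sum per-sample counts column-wise to form a 'Total Tested' row."""
    return tuple(sum(col) for col in zip(*rows))


for name, counts in TABLE_1.items():
    print(f"{name}: N={sum(counts)}, disconfirmed {disconfirmed_rate(*counts):.0%}")

total = pooled(TABLE_1["FF"], TABLE_1["FFPE"])
total_cancer = pooled(TABLE_1["FF (cancer-related)"], TABLE_1["FFPE (cancer-related)"])
print(f"Total: N={sum(total)}, disconfirmed {disconfirmed_rate(*total):.0%}")
print(f"Total (cancer-related): N={sum(total_cancer)}, "
      f"disconfirmed {disconfirmed_rate(*total_cancer):.0%}")
```

Running it reproduces every N and Disconfirmed Rate entry in Table 1.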
Among the 66 cancer-related variants, the disconfirmed rate was as high as 82% (54/66, Table 1). The cancer-related variants confirmed in this study are listed in Table 2 (Sanger traces in Supplementary Figs. S1-S2). The 12 confirmed variants comprise 10 substitutions, 1 small deletion (a 2-bp deletion in a dinucleotide repeat), and 1 large deletion (62 bp). Confirmation results for all 226 variants are listed in Supplementary Table S1.
Table 2
WES-detected low-level variants in cancer-related genes confirmed by blocker displacement amplification (BDA).
Sample | DNA Variant (GRCh38) | Gene | Protein Variant | Category | WES VAF (variant reads/total reads) |
FF | chr12:g.18381739_18381740del | PIK3C2G | - | Intron | 3.9% (15/382) |
FFPE | chr5:g.1293375G > A | TERT | p.Ser504Leu | Nonsynonymous | 1.7% (5/289) |
FFPE | chr7:g.2915318G > A | CARD11 | p.Arg920Cys | Nonsynonymous | 1.4% (4/296) |
FFPE | chr8:g.127740689G > A | MYC | p.Glu366Lys | Nonsynonymous | 2.1% (4/187) |
FFPE | chr9:g.130885328G > A | ABL1 | p.Arg1032Gln | Nonsynonymous | 3.7% (10/270) |
FFPE | chr10:g.8058457G > A | GATA3 | p.Val132Ile | Nonsynonymous | 1.7% (14/828) |
FFPE | chr13:g.109783068C > T | IRS2 | p.Gly996Ser | Nonsynonymous | 1.4% (5/361) |
FFPE | chr19:g.11010434G > A | SMARCA4 | p.Arg726His | Nonsynonymous | 3.3% (7/209) |
FFPE | chrX:g.53194684G > A | KDM5C | p.Arg1165Cys | Nonsynonymous | 4.5% (9/198) |
FFPE | chr1:g.36466451_36466512del | CSF3R | p.Ser813ProfsTer48 | Frameshift | 3.5% (3/86) |
FFPE | chr15:g.90085380G > A | IDH2 | p.Gly325= | Synonymous | 3.1% (7/224) |
FFPE | chr19:g.42273661G > A | CIC | p.Ser626= | Synonymous | 2.2% (15/676) |
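Two quantities in Table 2 can be rederived from the entries themselves: the length of a range deletion in HGVS genomic notation, and the WES VAF from the read counts. The regex below handles only the simple `chrN:g.start_enddel` form used in this table and is an assumption, not a general HGVS parser. A deletion whose length is not a multiple of 3 shifts the reading frame only if it falls within coding sequence; the 2-bp PIK3C2G deletion is intronic.

```python
import re

# Matches only simple range deletions such as "chr1:g.36466451_36466512del".
DEL_RE = re.compile(r"^chr(?P<chrom>[0-9XY]+):g\.(?P<start>\d+)_(?P<end>\d+)del$")


def deletion_length(hgvs_g: str) -> int:
    """Number of deleted bases; HGVS ranges are inclusive at both ends."""
    m = DEL_RE.match(hgvs_g)
    if m is None:
        raise ValueError(f"not a simple range deletion: {hgvs_g}")
    return int(m["end"]) - int(m["start"]) + 1


def shifts_frame(length: int) -> bool:
    """True if a coding deletion of this length would disrupt the reading frame."""
    return length % 3 != 0


def vaf_percent(alt_reads: int, total_reads: int) -> float:
    """WES VAF as shown in Table 2, rounded to one decimal place."""
    return round(100 * alt_reads / total_reads, 1)
```

For the CSF3R entry, `deletion_length` returns 62, which is not a multiple of 3 and is consistent with the frameshift annotation, and `vaf_percent(3, 86)` returns the tabulated 3.5%.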
In the fresh-frozen sample, variants with > 3% VAF had a higher confirmed rate (87%) than variants with ≤ 3% VAF (50%), whereas in the FFPE sample all VAF brackets had a low confirmed rate (36% overall) (Fig. 3b). This is likely due to differences in sequencing depth. From the 155 low-level somatic variants in the fresh-frozen sample (mean depth 824x), we selected 94 variants for BDA analysis in an unbiased manner; their mean depth was 774x (minimum 95x), and all had at least 4 variant reads in WES. From the 2852 low-level somatic variants in the FFPE sample (mean depth 243x), we selected 132 variants in the same manner; their mean depth was 200x (minimum 43x), and 131/132 had at least 3 variant reads in WES. In the FFPE sample, the relatively low sequencing depth at called variants hinders accurate variant calling, so FFPE-called variants are less reliable than those called in the fresh-frozen sample.
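The depth argument can be made concrete with a simple binomial read-sampling model, which is an assumption and not the variant caller used in this study. Consider a caller that reports any site whose observed VAF reaches a fixed cutoff. At low depth, a few background error or damage reads are enough to cross that cutoff, and a true low-level variant is less likely to be sampled above it. The 1% cutoff, 0.3% background rate, and 2% true VAF are hypothetical.

```python
from math import ceil, comb


def p_at_least(k: int, depth: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, p)."""
    return 1.0 - sum(comb(depth, i) * p**i * (1.0 - p) ** (depth - i) for i in range(k))


def call_probability(depth: int, alt_rate: float, vaf_cutoff: float) -> float:
    """Probability that a site reaches `vaf_cutoff` when alt reads arise at `alt_rate`.

    alt_rate is the true VAF for a real variant, or the background error/damage
    rate for a site without a variant (the result is then a false-call rate).
    """
    k = max(1, ceil(round(vaf_cutoff * depth, 9)))  # minimum alt reads needed
    return p_at_least(k, depth, alt_rate)


# Hypothetical parameters: 1% calling cutoff, 0.3% background, 2% true variant.
for depth in (200, 800):
    false_call = call_probability(depth, 0.003, 0.01)
    sensitivity = call_probability(depth, 0.02, 0.01)
    print(f"{depth}x: false-call rate {false_call:.3f}, sensitivity {sensitivity:.3f}")
```

Under these assumptions, raising depth from 200x to 800x lowers the chance that background noise alone crosses the cutoff (from roughly 12% to well under 1%) and raises the chance that a true 2% variant does.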
Disconfirmed rates were similar across variant functional categories (Fig. 3c). In the fresh-frozen sample, the disconfirmed rates of nonsynonymous, frameshift, synonymous, intron, and regulatory variants ranged from 32% (nonsynonymous) to 43% (frameshift). In the FFPE sample, they ranged from 59% (nonsynonymous) to 77% (regulatory).
In contrast, disconfirmed rates varied widely by alteration type. A > C/T > G substitutions had the highest disconfirmed rate in both the fresh-frozen and FFPE samples (Fig. 3d). In the FFPE sample, C > T/G > A substitutions accounted for 60% (79/132) of the WES-detected variants tested and 92% (44/48) of the confirmed variants, and their disconfirmed rate was significantly lower than that of other alteration types. These results are consistent with cytosine deamination in FFPE samples, which increases C > T/G > A changes [23, 24, 25].
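Labels such as "C > T/G > A" pool a substitution with its reverse-complement partner, since the two describe the same base-pair change read from opposite strands. The sketch below maps any single-base substitution to one of the six pooled classes. It lists first the member whose reference base is A or C, which is our reading of the labels used in the text. The `FFPE_DAMAGE` lookup restates the two damage signatures named earlier.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Damage signatures named in the text for FFPE DNA, keyed by pooled class.
FFPE_DAMAGE = {
    "C > T/G > A": "cytosine deamination (C:G > T:A)",
    "C > A/G > T": "oxidative damage (C:G > A:T)",
}


def substitution_class(ref: str, alt: str) -> str:
    """Pool a single-base substitution with its reverse-complement partner."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT" or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    comp_ref, comp_alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    # List the member with an A or C reference base first, e.g. "A > C/T > G".
    if ref in "AC":
        first, second = (ref, alt), (comp_ref, comp_alt)
    else:
        first, second = (comp_ref, comp_alt), (ref, alt)
    return f"{first[0]} > {first[1]}/{second[0]} > {second[1]}"


# The ten substitutions in Table 2: nine G > A and one C > T (IRS2).
table2_subs = [("G", "A")] * 9 + [("C", "T")]
print(Counter(substitution_class(r, a) for r, a in table2_subs))
```

Applied to Table 2, all ten confirmed substitutions in cancer-related genes fall into the C > T/G > A class, the signature the text associates with cytosine deamination.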