Modelling random DNA fragmentation
To begin our study we required a model that would accurately reflect the properties of a stochastically fragmented DNA sample. The odds that a region targeted by a PCR assay will be interrupted by a DNA breakage in randomly fragmented DNA depend on the length of the region and the size of the fragments. These odds are effectively determined by establishing two adjacent fragment-sized sliding windows (wherein the end of one fragment is the start of another) and calculating the number of times a region is fully within the first fragment window, compared to the number of times the region is situated within both windows (Fig. 1).
This model is represented in Eq. 1, which determines the probability that a region of DNA will remain unbroken for a given fragment length:
proportion intact = \(\frac{\text{f} – \text{r} + 1}{\text{f}},\) (1)
where r is the length of the DNA region and f is the length at which the DNA is fragmented. However, DNA samples do not fragment at a single length but rather as a distribution, and by incorporating size distribution profiles, which contain the concentration of DNA at each fragment length, the proportion of intact target regions within a fragmented DNA sample can be calculated, as detailed in Eq. 2:
proportion intact = \(\frac{\sum _{f=r }^{n}\frac{\text{f} – \text{r} + 1}{\text{f} }{ C}_{f}}{\sum _{f=m }^{n}{C}_{f}}\), (2)
where n is the length of the longest fragment within the sample, m is the length of the shortest fragment, and Cf is the concentration of each fragment length (i.e., pg/µl).
Designing universal PCR assays for fragmented clinical cancer samples
We next sought to design qPCR and ddPCR assays that could be used to interrogate DNA fragmentation. A major focus of this assay design was to incorporate design elements that would enable the assays to be used on clinical cancer samples, as these samples are some of the most common types to undergo stochastic fragmentation. However, cancer samples are also prone to chromosomal amplifications and deletions within the genome16–18, and PCR assays that intersected with frequently amplified/deleted regions would result in inaccurate measures of concentration when these copy number aberrations (CNAs) occurred (i.e., the concentration of a region that is unique in the human reference genome is assumed to correspond to the overall number of genome copies within the measured sample). To control for this, we undertook an analysis to determine the regions of the human genome that were least affected by CNAs. CNA data that had been tested for statically significant gain or loss was retrieved from the Catalogue of Somatic Mutations in Cancer (COSMIC release v78)19,20. This data was filtered to exclude cell line samples and samples missing total copy number or minor allele values. Only 27 of the 10,637 samples remaining after this filtering were not derived from The Cancer Genome Atlas (TCGA) data21. We, therefore, opted to exclusively use these 10,610 TCGA samples to better ensure a dataset with experimental and analytical consistency in determining copy number changes (S1 Table).
After filtering out regions that were not covered by Affymetrix copy number probes (e.g., centromeres) the only regions completely devoid of CNAs were telomeric and likely artefactual. Outside of telomeres the minimum CNA region contained 5 samples. To determine a reasonable threshold for low copy number variation that might provide us with enough region space to meet the requirements of our assay design, we calculated the number of samples with CNAs in commonly used copy number reference genes. We found that the “Human TaqMan® Copy Number Reference Assays” targeting RNase P and TERT offered by Applied Biosystems had CNAs in 61 and 360 of the total 10,610 samples, respectively, and the well-established standard reference gene RPP30 had CNAs in 23 samples. Based on this we set a threshold at the bottom 10th percentile of regions, excluding those where greater than 34 samples had significant copy number variation (Fig. 2A). After applying this filter, we were left with 621 megabases across 858 non-contiguous regions on 22 chromosomes.
We next designed a single-tube 4-plex quantitative PCR assay targeting these CNA neutral regions, which included a variety of design considerations to maximize the utility of the assay and minimize confounding effects. First, each assay would target a separate chromosome to minimize inaccurate quantification due to the remote possibility that one of the chromosomes, or at least a large portion, may be affected by CNAs. Given the size and number of regions, the second design consideration was identifying assay regions that would be unaffected by bisulfite conversion treatment, since the bisulfite conversion process is used to examine DNA methylation and is a common application in cancer genomics but also leads to substantial sample fragmentation and loss. To address this design consideration the CNA neutral regions were further analysed to identify primer and probe regions that were cytosine-free and would, therefore, be unaffected by the bisulfite conversion process. Notably, use of the assays on bisulfite material requires an extra step in qPCR data analysis to correct for the fact that only one DNA strand is quantified, resulting in a positive shift of 1 cycle threshold when compared to the unconverted genomic DNA (gDNA) counterpart.
The third design criterion was to enable assessment of the degree of sample fragmentation using this 4-plex assay. To achieve this, two of the assays were designed to be 125 bp in length, and two were designed to be 175 bp long. By taking the ratio of concentrations for the long to short assays, a quantitative metric for sample fragmentation can be imputed for any sample.
Finally, we sought to establish the combination of fluorescent probe chemistries that would enable successful multiplexing quantitation using either standard qPCR or ddPCR. In qPCR four different probe fluorophores (FAM, HEX, Cy5 and Texas Red) were used, whereas ddPCR 4-plex was achieved using a method developed by Dobnik et al. (2016)22 that uses two FAM probes and two HEX probes and varies probe concentrations to alter the resulting levels of fluorescence amplitude, allowing for the detection of two targets per fluorescence channel (S1 and S2 Figs).
After all these design criteria were successfully implemented, we next undertook experiments to verify the amplification fidelity and efficiency of each of the four assays. The fidelity of the assays was established by performing standard PCR and qPCR on a variety of sample types (buffy coat DNA, cfDNA, and bisulfite-converted DNA) and analysing the PCR products by standard DNA gel electrophoresis to confirm that only a single PCR amplicon was produced in singleplex (S3 Fig), and that multiplex assays produced only two bands of the expected sizes (S4 and S5 Figs). Next, the amplification efficiencies of all assays were determined using LinRegPCR window-of-linearity analysis23, and standard titration curves; this was done for all four amplicons in both singleplex and multiplex configurations, for both genomic and bisulfite-converted DNA, using both fluorescent dye and PrimeTime qPCR probes in multiple fluorophore configurations (Fig. 2B-C and Table 1). Notably, all assays demonstrated > 90% amplification efficiency across all conditions, indicating robust performance. Primer and probe sequences can be found in S2 Table.
Table 1
Universal 4-plex qPCR amplification efficiencies.
Multiplex
|
Amplicon Size
(bp)
|
Assay
name
|
Chr
|
Fluorophore
|
DNA
|
Efficiency
|
Standard Curve
|
Window-of-Linearity
|
%
|
R2
|
Median (%)
|
MAD
|
Singleplex
|
125
|
UQ02
|
2
|
SYTO 9
|
genomic
|
93.1
|
0.997
|
94.5
|
3.6
|
bisulfite
|
96.7
|
0.997
|
102.0
|
1.6
|
Singleplex
|
125
|
UQ09
|
9
|
SYTO 9
|
genomic
|
93.3
|
0.999
|
104.1
|
1.2
|
bisulfite
|
97.3
|
0.997
|
97.7
|
3.0
|
Singleplex
|
175
|
UQ14
|
14
|
SYTO 9
|
genomic
|
91.3
|
0.997
|
96.9
|
1.2
|
bisulfite
|
90.3
|
0.993
|
112.9
|
1.1
|
Singleplex
|
175
|
UQ11
|
11
|
SYTO 9
|
genomic
|
90.7
|
0.998
|
103.9
|
2.8
|
bisulfite
|
92.1
|
0.997
|
103.4
|
3.3
|
2-plex
|
125
|
UQ02
|
2
|
FAM
|
genomic
|
92.3
|
0.999
|
91.6
|
4.8
|
bisulfite
|
92.0
|
0.993
|
95.9
|
1.3
|
175
|
UQ14
|
14
|
HEX
|
genomic
|
93.0
|
0.999
|
98.0
|
7.0
|
bisulfite
|
94.3
|
0.994
|
93.6
|
3.8
|
2-plex
|
125
|
UQ09
|
9
|
FAM
|
genomic
|
94.3
|
0.999
|
100.1
|
13.1
|
bisulfite
|
91.7
|
0.997
|
99.4
|
6.1
|
175
|
UQ11
|
11
|
HEX
|
genomic
|
93.8
|
0.999
|
92.7
|
8.0
|
bisulfite
|
94.0
|
0.995
|
87.9
|
8.8
|
4-plex
|
125
|
UQ02
|
2
|
FAM
|
genomic
|
95.3
|
0.997
|
105.8
|
6.5
|
bisulfite
|
98.2
|
0.991
|
102.8
|
4.2
|
UQ09
|
9
|
HEX
|
genomic
|
91.5
|
0.998
|
99.2
|
8.1
|
bisulfite
|
92.1
|
0.996
|
103.0
|
5.0
|
175
|
UQ14
|
14
|
Texas Red
|
genomic
|
94.2
|
0.998
|
96.8
|
3.5
|
bisulfite
|
92.6
|
0.992
|
112.1
|
8.5
|
UQ11
|
11
|
CY5
|
genomic
|
94.1
|
0.997
|
100.1
|
4.2
|
bisulfite
|
92.8
|
0.992
|
100.7
|
6.1
|
4-plex
|
125
|
UQ02
|
2
|
Texas Red
|
genomic
|
99.1
|
0.997
|
96.6
|
8.2
|
UQ09
|
9
|
CY5
|
genomic
|
101.7
|
0.996
|
103.3
|
4.5
|
175
|
UQ14
|
14
|
HEX
|
genomic
|
99.2
|
0.996
|
102.2
|
2.7
|
UQ11
|
11
|
FAM
|
genomic
|
102.1
|
0.997
|
108.5
|
2.6
|
Evaluating DNA fragmentation model with universal PCR quantitation assays
The [long]/[short] ratios of any two target region lengths can be determined by applying the following equation to fragment size distribution data (Eq. 3):
[long]/[short] = \(\frac{\sum _{f=b }^{n}\frac{\text{f} – \text{b} + 1}{\text{f} }{ C}_{f}}{\sum _{f=s}^{n}\frac{\text{f} – \text{s} + 1}{\text{f} }{ C}_{f}}\), (3)
where b is the length of the longer region and s is the length of the shorter region.
To evaluate how well our model of stochastic fragmentation fit with experimental results we compared [175bp]/[125bp] ddPCR and qPCR ratios with those derived using Eq. 3 on Agilent 2100 Bioanalyzer fragment size concentration data. This analysis was performed on seven levels of increasing fragmentation induced by the ultrasonication of pooled buffy coat gDNA. The ddPCR and qPCR [175bp]/[125bp] ratios of our sonicated samples both showed high goodness-of-fit for ratios derived using Eq. 3, with R-square values of 0.995 and 0.989 for ddPCR and qPCR, respectively (Fig. 3A-B).
Effects of fragmentation on DNA quantification
Quantification of DNA samples affects all subsequent experimental steps and can lead to costly experimental failures if this step is not performed accurately. Therefore, to further extend our study we next compared the effects of fragmentation on nucleic acid quantification techniques using our sonicated DNA samples, referred to here by their peak (modal) fragment sizes: 150, 195, 283, 694, 828, 1082, and 1504 bp.
One overlooked aspect of DNA fragmentation is that it results in fewer adjacent base pairs for fluorescent DNA dyes to intercalate when dye-based fluorometric methods are used. Thus predictably, and as other studies have noted1,2, the mean DNA concentration measured by fluorescence spectroscopy (Qubit 2.0) decreased with increasing fragmentation (p < 0.001; one-way ANOVA), with untreated gDNA measuring at 50.40 ng/µl (SD = 0.72), and the most fragmented sample (150 bp) at 35.27 ng/µl (SD = 2.14), which calculate to 14,400 (SD = 206) and 10,100 (SD = 613) genome copies, respectively, assuming 1 genome weighs 3.5 pg based on the following formula:
$$\text{A}\text{m}\text{o}\text{u}\text{n}\text{t} \left(\text{p}\text{g}\right) = \frac{\text{l}\text{e}\text{n}\text{g}\text{t}\text{h} \left(\text{b}\text{p}\right)\text{*}\text{p}\text{g}/\text{g}\text{*}\text{w}\text{e}\text{i}\text{g}\text{h}\text{t} \text{o}\text{f} \text{b}\text{p} (\text{g}/\text{m}\text{o}\text{l}\text{e}/\text{b}\text{p})\text{*}\text{c}\text{o}\text{p}\text{i}\text{e}\text{s} \left(\text{m}\text{o}\text{l}\text{e}\text{c}\text{u}\text{l}\text{e}\text{s}\right)}{\text{A}\text{v}\text{o}\text{g}\text{a}\text{d}\text{r}\text{o}{\prime }\text{s} \text{n}\text{u}\text{m}\text{b}\text{e}\text{r} (\text{m}\text{o}\text{l}\text{e}\text{c}\text{u}\text{l}\text{e}\text{s}/\text{m}\text{o}\text{l}\text{e})}$$
(4)
$$\text{A}\text{m}\text{o}\text{u}\text{n}\text{t} \left(\text{p}\text{g}\right) = \frac{\text{3,234,830,000}\text{*} {10}^{12} \text{*}650\text{*}1}{6.022\text{*}{10}^{23}}$$
For absorption spectroscopy (Nanodrop 1000), the mean measurement for intact gDNA was 68.40 ng/µl (SD = 1.97), which calculates to 19,600 (SD = 563) genome copies. Although there was no dose-dependent trend towards decreasing concentration with increasing fragmentation, a one-way ANOVA did show a significant difference in concentration (p < 0.001), and a Tukey's HSD test found the concentration of intact gDNA to be significantly higher than all seven levels of fragmentation (p < 0.001). The highest mean concentration measured for the fragmented gDNA was 63.10 ng/µl (SD = 0.79; 150 bp) and the lowest was 57.43 ng/µl (SD = 0.32; 283 bp), which calculate to 18,100 (SD = 226) and 16,400 (SD = 92) genome copies, respectively.
Both qPCR and ddPCR measured substantial downward trends in concentration with increasing fragmentation (Fig. 3C). This decline in amplifiable copies with increasing fragmentation reflects an increasing number of breakages in the targeted regions, the magnitude of decline being greater for the 175 bp amplicon as longer target regions are more likely to be cleaved. ddPCR on the intact gDNA measured 18,984 (SD = 765) and 19,058 (SD = 608) mean copies for the two 125 bp assays and 18,905 (SD = 308) and 19,306 (SD = 246) for the two 175 bp assays.
The mean absorbance spectroscopy estimate for the number of genome copies in our intact gDNA sample was only 2.8% greater than the combined mean of the four ddPCR assays (M = 19063, SD = 150). Whereas, the mean number of genome copies estimate for fluorescence spectroscopy was 25% lower, suggesting this method also underestimated intact, not just fragmented, DNA concentration. Our results, therefore, show that absorbance spectroscopy is the most accurate method for quantifying overall nucleic acid concentration, regardless of the degree of fragmentation. However, this technique lacks sensitivity and becomes increasingly inaccurate at the lower end of its analytical range (1–5 ng/ul)24. Absorbance spectroscopy is also highly susceptible to reporting falsely high concentrations due to protein contamination and/or phenolic compounds that absorb UV. PCR-based quantification is highly sensitive and most accurately measures the amount of amplifiable DNA at the amplicon length used. Our universal multiplex assay and accompanying online tool Fragment Calculator, which we detail in the following section, extends this ability to estimate the amount of amplifiable DNA of any given region length, while also providing an estimate of overall concentration when working with human genomic or bisulfite-converted DNA.
Fragment Calculator
In addition to describing the fragmentation of the sample, the dual 175 and 125 bp assays, combined with representative DNA samples, can also be leveraged to estimate the concentration of any other sized DNA region. To better enable this we designed the Fragment Calculator online tool to provide a more quantitative and actionable estimate of fragmentation (www.primer-suite.com/fragcalc). This tool uses measured 175 bp and 125 bp concentrations and the [175bp]/[125bp] ratio to estimate the average fragment length of a genomic or bisulfite-converted human DNA sample, the total number of genome copies in a measured sample, as well as the number of amplifiable (unbroken) instances of a DNA region of any length. This tool uses the fragment size distribution data of our seven sonicated DNA samples with average fragment lengths of 254, 291, 428, 493, 590, 745, and 1274 bp, a highly fragmented FFPE DNA sample with an average fragment length of 92 bp to represent the lower bounds of random fragmentation, and four gDNA samples with average fragment lengths of 6714, 15422, 34625 and 41496 bp for the upper bounds (S1 File).
The number of intact copies of an input DNA region length is estimated by taking the two [175bp]/[125bp] ratios from our representative fragment size distribution data that an input [175bp]/[125bp] ratio falls between (x1, x2), calculating the corresponding [125bp]/[input size] ratios using Eq. 3 on these size distribution data (y1, y2), determining the slope between these points to estimate the [125bp]/[input size] ratio corresponding to the input [175bp]/[125bp] ratio, and dividing the 125 bp concentration by this ratio. For example, if the concentration measured for a fragmented DNA sample is 1000 copies for the 125 bp amplicon and 700 copies for the 175 bp amplicon, the input [175bp]/[125bp] ratio is 0.7, which falls between the [175bp]/[125bp] ratios of the 291 bp (0.669) and 428 bp (0.778) reference samples. To estimate the concentration of a 50 bp region, for example, the corresponding [125bp]/[50bp] ratios determined using Eq. 3 are 0.585 and 0.707, for the 291 bp and 428 bp reference samples, respectively. The 50 bp concentration is then calculated using the following linear equation:
$$\text{y}=mx+{y}_{0},$$
5
where m is the slope and y0 is the y-intercept. The number of genome copies is also estimated using this same method by dividing the input 125 bp concentration by the [125bp]/[1bp] ratio. Similarly, the average fragment length is estimated using the [175bp]/[125bp] ratios from our fragment size distribution data (x1, x2) and their corresponding average fragment lengths (y1, y2) (Fig. 4).
Importantly, Fragment Calculator assumes fragment distributions for the samples being estimated to be similar to those of our representative samples. However, in our experience working with these assays, we have found FFPE samples do not behave like untreated DNA samples. The [175bp]/[125bp] ratio for FFPE samples is generally much lower than the ratio calculated from the size distributions of these samples using Eq. 3. This reveals that there is generally less amplifiable DNA in FFPE samples than their size distribution profiles suggest, which we hypothesise is likely due to a combination of single-stranded breaks and incomplete reversal of DNA crosslinking. Our assays are, therefore, a better indicator of the amount of amplifiable FFPE treated DNA than fragment size distribution data from microfluidic capillary electrophoresis instruments like the Agilent 2100 Bioanalyzer.
Further complicating this, however, is evidence that even regions of the same length can have substantially different concentrations of amplifiable FFPE treated DNA. Some of our routine quality control and quantification analyses of FFPE treated samples have revealed vast differences in the number of copies measured by the two 125 bp assays, and these differences are consistent among numerous FFPE samples (S2 File). Despite assays having the same length amplicons, differences in the number of amplifiable copies are likely to occur at high degrees of fragmentation, for instance, due to differences in binding efficiencies among primers when their target regions are truncated. Indeed, we regularly observe statistically significant differences in the number of copies measured by assays of the same size in highly fragmented pooled buffy coat gDNA samples subjected to ultrasonication, some examples of which are forthcoming. However, these differences are relatively small in magnitude and may be due to sequence-specific biases in sonication-induced scission25,26. We hypothesise that the much greater differences we observe in FFPE samples may emerge due to differences in the degree to which crosslinking is reversed among regions, as well as potential differences in their susceptibility to DNA breakage. These differences may reflect an underlying nucleosome footprint given that formaldehyde cross-linking is more efficient in nucleosome-bound DNA, as evidenced by the FAIRE-Seq (Formaldehyde-Assisted Isolation of Regulatory Elements) technique27.
Universal multiplex comparison of bisulfite conversion kits
Since PCR-based assays that target both genomic and bisulfite-converted DNA provide more accurate measures of bisulfite conversion recovery than other quantification techniques28, we next assessed the performance and utility of our universal multiplex assay to compare the recovery and degree of fragmentation of three commonly used commercial bisulfite conversion kits (MethylEasy Exceed, EZ DNA Methylation-Gold, and EZ DNA Methylation-Lightning) across three starting concentrations (500, 50 and 5 ng) using high molecular weight (HMW) gDNA.
A three-way ANOVA on the qPCR results found significant effects of starting concentration (p < 0.001), assay (p < 0.001), and conversion kit (p < 0.001) on recovery (Fig. 5A). Additionally, a significant interaction was found between starting concentration and kit (p < 0.001), resulting from an increase in recovery with decreasing concentration in MethylEasy Exceed but a decrease in EZ DNA Methylation-Gold and EZ DNA Methylation-Lightning. Trends were similar for the 125 bp and 175 bp assays, except in MethylEasy Xceed where the proportional increase in mean recovery between 50 ng and 5 ng was greater in 125 bp assays (22%, SD = 12 vs. 32%, SD = 5) compared to the 175 bp assays (16%, SD = 10 vs. 20%, SD = 5; Fig. 5B). As for fragmentation, a two-way ANOVA found a significant effect of conversion kit on the [175bp]/[125bp] ratio (p < 0.001), no significant effect of starting concentration (p = 0.251), but a significant interaction between kit and concentration (p = 0.027) arising from a decrease in the [175bp]/[125bp] ratio of MethylEasy Xceed with decreasing starting concentration.
Due to the low starting concentration and recovery of the 5 ng samples, we did not have enough sample left for ddPCR analysis and therefore only ran the 500 ng and 50 ng samples. In addition to the three commercial kits, we also included our in-house bisulfite conversion protocol in these ddPCR comparisons (Fig. 5C). A three-way ANOVA showed similar results to the qPCR analysis, with significant effects of starting concentration (p < 0.001), assay (p < 0.001), and conversion kit (p < 0.001) on recovery, and a significant interaction between kit and concentration (p = 0.001). Similar to qPCR, this interaction resulted from declines in the mean recovery of similar proportions between 500 ng and 50 ng in all kits except MethylEasy Xceed, which showed a mild increase (13%, SD = 4 vs. 16%, SD = 11). A two-way ANOVA found a slight statistically significant difference in the [175bp]/[125bp] ratios among conversion kits (p = 0.033), which a Tukey's HSD test showed resulted from a significant difference (p = 0.048) between DNA Methylation-Lightning (M = 0.83, SD = 0.05) and MethylEasy Xceed (M = 0.75, SD = 0.07). Our in-house method and DNA Methylation-Gold had mean ratios of 0.77 (SD = 0.04) and 0.81 (SD = 0.07), respectively. To estimate the absolute nucleic acid recovery and average fragment size after bisulfite conversion we used our Fragment Calculator tool on combined qPCR and ddPCR results (Table 2).
Table 2
Recovery and fragmentation of bisulfite kits using universal quantification 4-plex.
Sample
(ng)
|
Conversion Kit
|
175 bp % recoverya
|
125 bp % recoverya
|
[Big]/
[Small]
|
Total % Recoveryb
|
Average size (bp)b
|
500
|
EZ DNA Methylation-Gold
|
52 ± 6
|
61 ± 5
|
0.85
|
90
|
635
|
EZ DNA Methylation-Lightning
|
38 ± 5
|
45 ± 3
|
0.84
|
69
|
574
|
MethylEasy Xceed
|
15 ± 6
|
19 ± 7
|
0.79
|
31
|
467
|
50
|
EZ DNA Methylation-Gold
|
39 ± 9
|
46 ± 9
|
0.84
|
68
|
617
|
EZ DNA Methylation-Lightning
|
31 ± 11
|
36 ± 12
|
0.85
|
54
|
636
|
MethylEasy Xceed
|
15 ± 9
|
21 ± 12
|
0.72
|
41
|
362
|
aCombined average of qPCR and ddPCR of both assays of the same length; ±, standard deviation
bEstimate using Fragment Calculator tool
|
Effects of amplicon size and nucleosome positioning on PCR-based cfDNA quantitation
Snyder et al. (2016)11 identified nucleosome protection peaks using deep sequencing of pooled cfDNA samples. Implicit in these analyses is the fact that nucleosome position correlated with the enrichment of fragments at specific locations, which could only occur if nucleosome positions were at least somewhat conserved among people. However, it was unclear the extent to which these peaks might shift between individuals. If little movement occurs and peaks are instead universally conserved, this would have important implications for assay design. Targeting such peaks would maximise an assay’s sensitivity in cfDNA while failing to consider nucleosome protection could severely reduce sensitivity.
Snyder et al. (2016)11 calculated a Windowed Protection Score (WPS) for each nucleotide position within the mappable human genome by summing the number of sequenced 120–180 bp cfDNA fragments that wholly overlap a centred 120 bp window and subtracting the number that truncate within this window. Peaks in nucleosome-mediated protection were then called by identifying contiguous regions of elevated WPS. Using the nucleosome protection peaks determined for the pooled healthy sample CH01, we designed two cfDNA assays targeting nucleosome protection peaks that could also be used for bisulfite-converted DNA material: a 95 bp assay targeting chromosome 2 (cfUQ02) with an above-average WPS of 108 and maximum distance of 62 bp from the local maxima, and a 100 bp assay targeting chromosome 11 (cfUQ11) with a below-average WPS of 30 and a maximum distance of 56 bp. The mean WPS of the nearly 13 million peaks identified in the CH01 sample is 63.7 (SD = 41.4). We also designed several staggered PCR assays of varying lengths to flank each of these regions.
15 cfDNA samples isolated from the blood plasma of breast cancer patients were profiled using dye-based ddPCR to compare the number of amplifiable copies of our universal cfDNA assays along with these staggered assays. We observed that some samples displayed substantial differences in amplifiable copies among assays whereas others did not and that this appeared to coincide with the technique used for cfDNA isolation. We measured the fragmentation profiles of these samples and found 6 displayed characteristic ~ 166 peaks with no sign of HMW contamination, which we thus classified as true cfDNA (Fig. 6A), 6 had little to no cfDNA peak and were reclassified as contaminating HMW DNA (Fig. 6B), and 3 had strong cfDNA peaks but also possible or likely contamination by HMW DNA and were excluded from analysis (S7 Fig). Although high levels of HMW DNA can occur in cfDNA due to non-apoptotic cell death (e.g., necrosis), we suspect the source in these samples was instead the result of poor plasma separation and extraction. Regardless of its source, we only expect to find nucleosome-mediated patterns of fragmentation in the DNA of apoptosed cells, and HMW DNA is likely to obscure these patterns.
To normalise among samples of the same category the concentration measured for each assay was divided by the mean concentration of all assays within each region (chr11 or chr2), giving a ratio to mean copies (assay/sample mean). For ddPCR on HMW gDNA, all assays specific for unique regions should measure the same number of copies within the same sample. Therefore, the ratio of copies measured for a single assay to the mean copies of all assays should be 1:1 for intact gDNA, regardless of proximity to nucleosome peaks. Consistent with this, a one-way ANOVA on samples classified as contaminating HMW DNA found no statistical difference in ratio to mean copies among assays in the chr11 (p = 0.100) and chr2 (p = 0.239) regions. HMW gDNA samples extracted from the blood of 15 healthy individuals were also used as negative controls and similarly showed little variation in ratio to mean copies among assays. No significant difference was found among assays in the chr2 region (p = 0.084). However, a significant difference was detected in the chr11 region (p = 0.004), resulting from a minor effect of amplicon length on the number of amplifiable copies (Fig. 6C). A similar trend appears to exist in the contaminating HMW DNA; however, its effects likely did not reach statistical significance due to the smaller sample size (6 vs. 15).
In contrast, the ratio to mean copies for cfDNA decreased with increasing distance from the nucleosome peak, with the highest ratio for each region being our universal cfDNA assays (cfUQ11 and cfUQ2). However, given that cfDNA is highly fragmented, differential amplicons sizes are likely to result in differences in the number of amplifiable copies, therefore confounding the effects of nucleosome protection. To control for this we used ultrasonication and gel purification to produce a blood pooled gDNA sample with a similar level of fragmentation as cfDNA, which we measured in four technical replicates for each assay to compare the effects of random fragmentation on the number of amplifiable copies. In the chr11 region, which had the greatest variance in amplicon size among assays, similar ratios were observed in the sonicated DNA and cfDNA for each assay tested (Fig. 6D). A two-way ANOVA comparing these two sample types found a significant difference among assays (p < 0.001) but no statistically significant interaction between sample type and assay, signifying that only the cfDNA level of fragmentation, and not nucleosome protection, was affecting the number of amplifiable copies (p = 0.637). These results show that even small differences in amplicon length can have a significant impact on the number of amplifiable copies at such high levels of fragmentation but proximity to the nucleosome protection peak is likely providing little to no differential protection within this region.
Conversely, the assays targeting the chr2 region were far less variable in length and showed little difference in ratio to mean copies in the sonicated DNA, especially when compared to the cfDNA. A one-way ANOVA on the sonicated samples within this region did find significant differences in concentration ratios among assays (p = 0.001); however, the magnitudes of these differences were small, they did not track with differences in amplicon length, and they appear to result from a positional effect, perhaps resulting from a sequence-specific bias in fragmentation within this region. Unlike the chr11 assays, the ratio to mean copies for the chr2 assays tracked the distance from the nucleosome peak in cfDNA, rather than the amplicon length. A two-way ANOVA comparing the sonicated and cfDNA samples found a significant difference among assays (p < 0.001) as well as a significant interaction between assay and sample (p < 0.001), which supports cfDNA having an effect on the number of amplifiable copies in this region beyond that caused by its level of fragmentation on differently sized amplicons (Fig. 6E). Notably, a one-way ANOVA on the cfDNA samples showed no significant difference (p = 0.495) in ratio to mean copies (M = 1.00, SD = 0.10 vs. M = 0.97, SD = 0.13) for the two assays with the most similar maximum distances from the nucleosome peak (92 and 99 bp) and only 1 bp difference in length (100 vs. 99 bp.). Whereas, the 50 bp distance (92 vs. 142 bp) separating the two 100 bp amplicons resulted in a significant decrease (M = 1.00, SD = 0.10 vs. M = 0.78, SD = 0.07; p < 0.001), and the 95 bp universal cfDNA assay with the smallest maximum distance from the nucleosome peak (62 bp) had a significantly higher ratio (M = 1.25, SD = 0.06) than each of the other three assays (p < 0.001; Tukey’s HSD). Despite HMW contamination, the three samples with substantial cfDNA size peaks excluded from this analysis also revealed differences in copies among assays that match a nucleosome-mediated fragmentation pattern in the chr2 region (S3 File).
To further explore and confirm these results we designed probes for one flanking assay per region (in addition to the probes already designed for the cfUQ11 and cfUQ02 universal cfDNA assays), selecting those with the greatest difference between the sonicated and cfDNA samples. Where necessary, the forward or reverse primer for each assay was redesigned to normalise all amplicons to 100 bp while maintaining the same maximum distance from the nucleosome peak. We ran these assays in duplex ddPCR on cfDNA samples extracted from the blood plasma of 34 patients with colorectal cancer and 10 patients with brain cancer, as well as gDNA samples from the blood of 20 healthy donors and four technical replicates of the sonicated gDNA. We then calculated the ratio of copies for the assay furthest to the assay closest to the nucleosome peak (chr11 = [102bp]/[56bp] and chr2 = [142bp]/[62bp]) and compared the four sample types for each region. For the chr11 region, a one-way ANOVA found no significant difference between the ratios of the colorectal (M = 0.95, SD = 0.11) or brain cancer (M = 0.98, SD = 0.11) cfDNA, gDNA (M = 1.00, SD = 0.06), or sonicated DNA (M = 1.07, SD = 0.03) samples (p = 0.081). Although not significant, these differences tended towards a slight nucleosome-mediated protective effect (Fig. 6F).
Conversely, a one-way ANOVA found a significant difference (p < 0.001) among sample types for the chr2 region. A post hoc Tukey HSD test showed this difference was due to a drop in the [142bp]/[62bp] ratio in cfDNA, with gDNA (M = 1.00, SD = 0.09) and sonicated DNA (M = 1.00, SD = 0.03) being placed in one homogenous subset, and colorectal (M = 0.67, SD = 0.10) and brain cancer (M = 0.62, SD = 0.12) cfDNA placed in another (p < 0.001). These results strongly reinforce our previous findings, showing that, unlike the chr11 nucleosome peak, the chr2 peak provides substantial and consistent protection from fragmentation among individuals. Furthermore, comparison across these two regions revealed that the stronger chr2 protection peak resulted not only in greater protection than the weaker chr11 peak but greater degradation in the adjacent valley (Fig. 6G). A two-way ANOVA found significant differences (p < 0.001) in the ratio to mean copies between the four assays, and no significant interaction (p = 0.189) between the colorectal and brain cancer samples, indicating that the differences between assays were similar for these two cohorts. A Tukey HSD test showed significant differences between all four assays, with the chr2:142bp (M = 0.81, SD = 0.09), chr11:102bp (M = 0.95, SD = 0.06), chr11:56bp (M = 1.00, SD = 0.08), and chr2:62bp (M = 1.24, SD = 0.10) assays each being placed into separate homogenous subsets (α = 0.025). These results are consistent with cfDNA protection peaks being the result of nucleosome occupancy. As predicted, the protection peak with a low WPS provided weaker but more even protection within its occupied region and the peak with a high WPS provided greater but more narrow protection, thus validating the WPS metric that Snyder et al. (2016)11 applied in their analyses.