Digital PCR (dPCR) and qPCR mediated determination of transgene copy number in the forage legume white clover (Trifolium repens)

Obtaining data on transgene copy number is an integral step in the generation of transgenic plants. Techniques such as Southern blot, segregation analysis, and quantitative PCR (qPCR) have routinely been used for this task, in a range of species. More recently, use of Digital PCR (dPCR) has become prevalent, with a measurement accuracy higher than qPCR reported. Here, the relative merits of qPCR and dPCR for transgene copy number estimation in white clover were investigated. Furthermore, given that single copy reference genes are desirable for estimating gene copy number by relative quantification, and that no single-copy genes have been reported in this species, a search and evaluation of suitable reference genes in white clover was undertaken. Results demonstrated a higher accuracy of dPCR relative to qPCR for copy number estimation in white clover. Two genes, Pyruvate dehydrogenase (PDH), and an ATP-dependent protease, identified as single-copy genes, were used as references for copy number estimation by relative quantification. Identification of single-copy genes in white clover will enable the application of relative quantification for copy number estimation of other genes or transgenes in the species. The results generated here validate the use of dPCR as a reliable strategy for transgene copy number estimation in white clover, and provide resources for future copy number studies in this species.


Introduction
The first downstream steps in the production of genetically modified (GM) crop lines involve identification of transformed events and determination of transgene copy number. Typically, events with low transgene copy number are preferred in order to avoid potential transgene silencing, and copy number variation due to segregation and recombination [1]. Furthermore, identification of plants homozygous for the transgene(s) is required as part of the generation of stable transgenic lines. Historically, this task was performed by test crosses, which entail crossing the transgenic plants with non-transgenic plants in order to analyse segregation ratios [2,3]. This method is time consuming, given that it involves an extra generation for evaluating progenies.
Traditionally, transgene copy number has been determined by Southern blot [4]. Although the results generated using this technique are very reliable, Southern blot can be labour intensive [5,6]. As an alternative, qPCR has gained favour [6,7]. This technique enables high throughput copy number estimation in a shorter timeframe, with lower DNA quantities required than Southern blots [6][7][8]. The use of qPCR to identify homozygous plants for the inserted transgene is an advance over test crosses, as it is faster and less labour intensive [8]. Zygosity determination by qPCR in transgenic plants has been described in a number of crops [6][7][8]. However, difficulties differentiating homozygous and hemizygous transgenic plants using qPCR have been reported [9].
More recently, Digital PCR (dPCR) has been implemented for copy number determination. This technique involves partitioning the sample into a large number of reactions contained in small volumes, which can range from nanolitres to picolitres [10]. Some of these partitions will contain target nucleic acid and will be positive, whilst others will not and will be negative for the reaction. This information enables an estimation of the number of molecules, assuming a Poisson distribution [11]. Digital PCR was reported to be more accurate and sensitive compared to qPCR [5,12]. Furthermore, given the technique is highly accurate and less laborious than Southern blot, it has been advanced as a reliable substitute for Southern blot for transgene copy number studies in GM plants [5,13,14].
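The Poisson step described above can be sketched as follows. This is a minimal illustration, not the calculation performed by any particular instrument software: the function names are hypothetical, and the partition volume is a user-supplied assumption rather than a value taken from this study.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Mean number of target copies per partition.

    Under a Poisson model the fraction of negative partitions is exp(-lam),
    so lam = -ln(1 - positive / total).
    """
    if total <= 0 or not 0 <= positive < total:
        # positive == total would give ln(0): the sample is saturated.
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copies_per_microlitre(positive: int, total: int, partition_volume_nl: float) -> float:
    """Target concentration in copies/uL for an assumed partition volume (nL)."""
    if partition_volume_nl <= 0:
        raise ValueError("partition volume must be positive")
    # Convert nanolitres to microlitres before dividing.
    return copies_per_partition(positive, total) / (partition_volume_nl * 1e-3)
```

Because the correction is logarithmic, a sample in which half of the partitions are positive contains about 0.69 copies per partition rather than 0.5, which is why the raw positive fraction is not used directly.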
Copy number estimation using qPCR or dPCR is usually performed by relative quantification. Therefore, identification of one or more reliable reference genes is necessary. Ideally, a reference gene for copy number estimation should be single copy and not be subject to genotypic variation in copy number within the species [6]. Single copy genes have been identified for use as references in a number of species [5][6][7]. However, there are no detailed reports on searches for reference genes for copy number estimation in white clover.
In this study dPCR and qPCR were evaluated for their accuracy and precision in transgene copy number estimation in white clover. Transgenic plants analysed in this work contain the transgenes Isopentenyl transferase (IPT), nodule enhanced Malate dehydrogenase (TrneMDH), and Alfalfa mosaic virus coat protein gene (CP-AMV) in a single T-DNA (Fig. 1) [15]. Optimized duplexed qPCR and dPCR reactions for estimating transgene copy number by relative quantification in white clover are described. Work on identification of a single copy gene for use as a reference for copy number estimation in white clover was also undertaken, and two suitable genes were identified. These findings provide a useful resource for future analyses of white clover.

Materials and methods
Transgenic plants used in this work were generated by Agrobacterium-mediated transformation using a binary plasmid containing the transgenes IPT, TrneMDH, and CP-AMV in a single T-DNA (Fig. 1). For selection of reference genes, the eight candidate genes in Table 1 were evaluated by qPCR for consistent amplification in simplex reactions. Reactions were performed using Qiagen QuantiTect Probe PCR kit (Qiagen) in a 20 µL final volume, 600 nM of forward and reverse primers, 200 nM of HEX-labelled probes, and 1 × of master mix. Reaction conditions were the following: 10 min at 95 °C, and 40 cycles of: 10 s at 95 °C, 30 s at 60 °C, and 10 s at 72 °C. Only genes whose reaction curves exhibited Cq values below 30 and a low variation among samples in the 30 events studied were selected. PDH and ATP-dependent protease genes, which exhibited consistent amplification by qPCR, and a low variation among samples, were evaluated by dPCR in duplex reactions with different primer and probe combinations directed to the respective transgenes. The performances of the reactions were ranked by estimating signal-to-noise ratios, calculated by dividing the mean fluorescence amplitude of positive droplets by the mean amplitude of negative droplets. Copy number of the insert was estimated by dPCR and qPCR. Duplex PCRs were performed including an internal reference gene for both qPCR and dPCR. The single copy genes PDH and ATP-dependent protease were used as the internal references for dPCR assays, and only PDH for qPCR assays. Primers and probes were designed with primer3 [16], and annealing temperatures were manually set to 60 °C for primers, and to > 68 °C for fluorescently labelled probes. GC content was selected between 40 and 60%. Primer and probe sequences used are presented in Table 2.
Fig. 1 (caption fragment) … T terminator. Grey triangles represent XmnI restriction sites; black triangles represent EcoRV restriction sites; black bars marked with asterisks represent PCR amplification sites [15]
Sequence information of the remaining oligonucleotides evaluated is provided in Online Resource 1. For qPCR, reactions were performed using Qiagen QuantiTect Probe PCR kit (Qiagen) in a 20 µL final volume, 600 nM of forward and reverse primers, 200 nM of probes, and 1 × of master mix. Reaction conditions were the following: 10 min at 95 °C, and 40 cycles of: 10 s at 95 °C, 30 s at 60 °C, and 10 s at 72 °C. Probes were labelled with FAM and HEX for transgenes of interest and reference genes respectively. For dPCR, digested samples were prepared in dPCR reaction mixtures with four units of either EcoRV or XmnI enzyme, or samples were added to the mixture without enzyme for the undigested treatment. The three treatments, EcoRV and XmnI digested, and undigested, were run separately. Reaction mixtures were prepared in a 24 µL final volume, 600 nM of forward and reverse primers, 200 nM of probes, 12 µL of 2 × ddPCR super mix for Probes (Bio-Rad, Hercules, CA, USA), and 20-50 ng of DNA. Emulsified reaction droplets were generated using a droplet generator AutoDG™ (Bio-Rad) and a DG32 Automated droplet generator cartridge (Bio-Rad) containing a 20 µL reaction mixture and 70 µL of dPCR droplet generation oil (Bio-Rad) per well. The droplet emulsions generated were transferred to 96-well PCR plates and PCR was performed in a T100 thermal cycler (Bio-Rad) according to the ddPCR super mix for Probes protocol, with some modifications: 10 min at 95 °C, and 40 cycles of: 30 s at 95 °C, 1 min at 60 °C with 0.2 °C increments per cycle until 60 °C, and then 30 s at 72 °C. The fluorescence of each thermal cycled droplet was measured using a QX100 droplet reader (Bio-Rad). Data were analysed using the Bio-Rad QuantaSoft™ software version 1.7, and the threshold for determination of positive and negative droplets was set manually. Samples with fewer than 10,000 accepted droplets were not included in the analysis.
Two technical replicates per transgene were performed for dPCR. Coefficients of variation among samples were calculated as [(SD/mean)*100] where SD is the standard deviation. Estimated copy numbers from undigested samples

Results
Identification of a reference gene is integral to conducting copy number estimations by relative quantification. The ideal reference for these purposes is a single or low copy gene exhibiting a stable copy number in the studied species [6,17]. Given that there is limited genomic data available from white clover [18], identification of such a gene is challenging. With this objective in mind, BLASTN searches of the NCBI white clover EST database were performed to identify putative orthologues of single copy genes reported in Arabidopsis [19]. Eight white clover ESTs with high sequence identity to the genes from Arabidopsis were identified (Table 1). Primers were designed and evaluated by qPCR for consistent amplification (Cq values below 30 in undiluted samples) among the 30 transgenic events available.
Three genes (PDH, ATP-dependent protease, and Ribosomal protein) were compared for their stability in copy number between samples. Variation of Cq values between the genes, calculated as ΔCq [20], was estimated for each sample. In this analysis, we assumed that the difference in the Cq of two genes remains constant in different samples only if the copy number of these two genes is constant among samples. To calculate the Cq difference, each gene was compared with the remaining two genes across the 30 samples. The SD of ΔCq for the 30 samples was then calculated and the mean of the SD was estimated in order to select the gene whose copy number was most stable among samples. The lowest SD of the ΔCq was observed between PDH and ATP-dependent protease (0.98), and higher values were observed when PDH or ATP-dependent protease were compared to Ribosomal protein (2.5 in both cases). Based on these data, PDH and ATP-dependent protease were selected and evaluated for use as references for copy number estimation by dPCR.
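The pairwise ΔCq comparison can be illustrated with a short sketch. It assumes one Cq value per gene per sample, listed in the same sample order, and uses the sample standard deviation; the text does not state which SD estimator was applied, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import stdev


def delta_cq_sd(cq_by_gene: dict) -> dict:
    """Return {(gene_a, gene_b): SD of (Cq_a - Cq_b) across samples}.

    cq_by_gene maps a gene name to its list of Cq values, one per sample,
    with every list in the same sample order. A small SD means the two
    genes keep a constant Cq offset, consistent with stable copy number.
    """
    lengths = {len(values) for values in cq_by_gene.values()}
    if len(lengths) != 1:
        raise ValueError("every gene needs the same number of samples")
    result = {}
    for gene_a, gene_b in combinations(sorted(cq_by_gene), 2):
        deltas = [a - b for a, b in zip(cq_by_gene[gene_a], cq_by_gene[gene_b])]
        result[(gene_a, gene_b)] = stdev(deltas)
    return result
```

The gene pair with the smallest SD is the one whose relative abundance varies least among samples, which is the selection criterion described above.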
All the transgenic plants used in this work contain the transgenes Isopentenyl transferase (IPT), nodule enhanced Malate dehydrogenase (TrneMDH), and Alfalfa mosaic virus coat protein gene (CP-AMV) in a single T-DNA (Fig. 1) [15]. Duplexed reactions were optimized for transgene copy number estimation by qPCR and dPCR. Two pairs of primers directed to each of the three transgenes were designed and tested in duplex reactions. FAM-labelled probes were used to detect the transgenes, and HEX-labelled probes the reference gene. Amplification efficiency of primer/probe combinations for each transgene and reference gene was evaluated by estimating signal-to-noise ratios of dPCR results, by dividing the mean fluorescence amplitude of positive droplets by the mean amplitude of negative droplets. High signal-to-noise ratios are indicative of better separation between positive and negative droplet fluorescence signals, which enables a more reliable copy number estimation. The single primer pair and probe combinations with the highest signal-to-noise ratios per transgene were selected for the analysis. These were IPT1, TrneMDH1, and CP-AMV1 (Table 2 and Online Resource 2). All the primer-probe combinations directed to the target transgenes and the two reference genes exhibited signal-to-noise ratios higher than two (Online Resource 2), indicative of an optimal separation between positive and negative reactions [21].
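A minimal sketch of the signal-to-noise calculation follows, assuming droplet amplitudes for one channel have been exported as a list of numbers and that a manually chosen threshold separates positive from negative droplets. The function name and the export format are assumptions, not features of the instrument software.

```python
from statistics import mean


def signal_to_noise(amplitudes: list, threshold: float) -> float:
    """Mean amplitude of positive droplets divided by that of negative droplets.

    Droplets at or above the threshold are counted as positive.
    """
    positives = [a for a in amplitudes if a >= threshold]
    negatives = [a for a in amplitudes if a < threshold]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative droplet")
    return mean(positives) / mean(negatives)
```

A ratio above two, the cut-off referred to above, means the positive cluster sits at more than twice the mean amplitude of the negative cluster.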
Transgenes were digested with restriction enzymes for a better transgene copy number resolution (Fig. 1). Discrimination of one or more target transgenes in a single droplet is not otherwise possible by dPCR. Enzyme digestions within the transgene but outside the amplicon can resolve this issue, as they increase the probability of the target being contained in different droplets. For qPCR, primer efficiencies were estimated using the formula $E = 10^{-1/\mathrm{slope}}$ from a plot of Cq versus log cDNA dilution [22]. Primer efficiencies were 0.96 for IPT, 0.94 for TrneMDH, 0.93 for CP-AMV, 0.91 for PDH, and 0.98 for ATP-dependent protease, which are near optimal values for relative quantification [23]. However, ATP-dependent protease exhibited a low R-squared of 0.84.
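The standard-curve calculation can be sketched as below. The function fits Cq against log10 dilution by ordinary least squares and returns the slope, the amplification factor 10^(-1/slope), and R-squared. Under the common convention a factor of 2 corresponds to 100% efficiency, so a fractional efficiency would be the factor minus 1; whether the values reported above follow that convention is an assumption, and the function name is hypothetical.

```python
def standard_curve(log10_dilution: list, cq: list) -> tuple:
    """Least-squares fit of Cq versus log10(dilution).

    Returns (slope, amplification_factor, r_squared), where
    amplification_factor = 10 ** (-1 / slope).
    """
    n = len(cq)
    if n < 3 or n != len(log10_dilution):
        raise ValueError("need at least three paired points")
    mean_x = sum(log10_dilution) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_dilution)
    syy = sum((y - mean_y) ** 2 for y in cq)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_dilution, cq))
    if sxx == 0 or syy == 0:
        raise ValueError("dilutions and Cq values must both vary")
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, 10 ** (-1 / slope), r_squared
```

R-squared is returned alongside the factor because a poorly fitting curve, such as the 0.84 noted above, makes the efficiency estimate itself unreliable.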
Transgene copy number was estimated using dPCR, by calculating ratios between concentrations of the target transgene and the reference gene. For qPCR, copy number was calculated by the ΔΔCt method [23]. Event 34, which exhibited a single T-DNA insertion, and low SD among replicates by dPCR, was selected from the 30 events available as the calibrator sample for estimating transgene copies by qPCR. Given that white clover is allotetraploid and has two sub-genomes [18], genes used as a reference, if single copy, would have 2 copies at a single locus in each sub-genome. Therefore, if amplification occurs in the two homeologous genes, the expected ratio between the reference gene and the transgene of interest will be 4:1 when there is a single transgene insertion.
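Both calculations can be sketched in a few lines. The dPCR function scales the target-to-reference concentration ratio by the number of reference copies per genome, set to 4 here to match the 4:1 expectation stated above. The qPCR function applies the usual ΔΔCt expression, which assumes near-100% amplification efficiency for both assays and a calibrator of known copy number. Function and parameter names are hypothetical, and this is an interpretation of the text rather than the authors' exact procedure.

```python
def dpcr_copy_number(target_conc: float, reference_conc: float,
                     reference_copies: int = 4) -> float:
    """Transgene copies per genome from dPCR concentrations (same units).

    reference_copies=4 assumes both homoeologues of the reference gene
    amplify, each with two alleles (allotetraploid white clover).
    """
    if reference_conc <= 0:
        raise ValueError("reference concentration must be positive")
    return reference_copies * target_conc / reference_conc


def qpcr_copy_number(ct_target: float, ct_reference: float,
                     cal_ct_target: float, cal_ct_reference: float,
                     calibrator_copies: float = 1.0) -> float:
    """Transgene copies relative to a calibrator sample (delta-delta-Ct)."""
    delta_sample = ct_target - ct_reference
    delta_calibrator = cal_ct_target - cal_ct_reference
    # One cycle less than the calibrator means twice as much target.
    return calibrator_copies * 2 ** (-(delta_sample - delta_calibrator))
```

The dPCR route needs only the two measured concentrations, whereas the qPCR route cannot be evaluated without the calibrator Ct values.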
In order to compare the correspondence between copy number estimates obtained using each of the reference genes, dPCR was performed on the transgenic events using both the ATP-dependent protease and PDH genes as references. Estimates of copy numbers for each of the transgenes using the ATP-dependent protease and PDH genes were highly correlated (R² = 0.96-0.99) (Fig. 2).
Transgene copy number among the 30 transgenic events was estimated by dPCR to be between 1 and 10 using either reference gene (Fig. 3). Forty-three percent of the generated events exhibited putative single copy transgene insertions, whilst no events with transgene copy numbers below one were identified.
The coefficient of variation (%CV) of the estimated copy number was markedly lower for dPCR relative to qPCR in most events analysed (Fig. 3). Mean %CV values for each transgene across events were between 1.8 and 3.7 fold lower in dPCR (5.3, 5.5, and 5.2 for IPT, TrneMDH, and CP-AMV respectively) than in qPCR (11.6, 10.3, and 19.4) (Online Resource 3).
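The %CV used for this comparison follows the (SD/mean)*100 definition given in the methods. A minimal sketch, assuming the sample standard deviation across technical replicates (the estimator is not stated in the text) and a hypothetical function name:

```python
from statistics import mean, stdev


def percent_cv(replicates: list) -> float:
    """Coefficient of variation in percent: (SD / mean) * 100."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    average = mean(replicates)
    if average == 0:
        raise ValueError("mean of replicates is zero")
    return stdev(replicates) / average * 100
```

For example, replicate copy number estimates of 0.9 and 1.1 give a %CV of about 14, whereas identical replicates give 0.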
Results generated by qPCR and dPCR were compared, and accuracy and precision were estimated. A linear correlation was observed between transgene copy number estimations by qPCR and dPCR, with the highest R² value observed for the CP-AMV transgene (Fig. 4). The slopes of the linear functions were greater than one for all three transgenes. This is evidence of a tendency for higher estimates of copy number by qPCR over dPCR.
Furthermore, transgene copy number approximated integer values more closely when estimated by dPCR. This was observed in 19 cases for IPT, 18 for TrneMDH and 17 for CP-AMV, in 29 of the 30 events evaluated. The observation of values closer to integers represents a reduction in ambiguity when rounding copy number estimates up or down.
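One way to make the "closer to an integer" comparison concrete is to measure each estimate's distance to its nearest whole number and count the events in which the dPCR estimate is nearer. The sketch below is illustrative only: the text does not describe how the comparison was scored, and the function names are hypothetical.

```python
def integer_deviation(estimate: float) -> float:
    """Absolute distance from an estimate to its nearest integer (0 to 0.5)."""
    return abs(estimate - round(estimate))


def count_dpcr_closer(dpcr_estimates: list, qpcr_estimates: list) -> int:
    """Number of events where the dPCR estimate is nearer to an integer.

    Both lists hold one copy number estimate per event, in the same order.
    """
    if len(dpcr_estimates) != len(qpcr_estimates):
        raise ValueError("lists must cover the same events")
    return sum(
        integer_deviation(d) < integer_deviation(q)
        for d, q in zip(dpcr_estimates, qpcr_estimates)
    )
```

An estimate of 1.05 rounds to one copy with little doubt, while 1.45 could plausibly be one or two copies, which is the ambiguity referred to above.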

Discussion
An ideal reference gene for transgene copy number determination should be single or low copy, and exhibit minimal variation among genotypes within the species analysed [6,17]. The two candidate reference genes identified in this work exhibit characteristics validating their single-copy status. Almost half of the transgenic events analysed were revealed to have single copy T-DNA insertions when quantified using either of the genes as reference. This is in agreement with previous observations in white clover [24], and other plant species transformed by Agrobacterium [25,26]. Theoretically, if the candidate reference genes were present in more than one copy, transgene copy numbers below one would be observed. Thus, the absence of such events provides strong evidence of the single copy status of these candidate reference genes. Furthermore, the correspondence in copy number observed in the different events using ATP-dependent protease and PDH provides evidence that the copy number of these two genes remains unchanged among the studied samples.
The observation of lower coefficients of variation by dPCR in this study provides evidence of the greater precision of dPCR, relative to qPCR, in copy number estimation. dPCR is usually described as more precise than qPCR [11,27]. The techniques were compared by Głowacka et al. [5] in tobacco where a similar trend was reported.
Despite the high levels of precision generally observed in qPCR results in this work, there have been uncertainties in discriminating hemizygous from homozygous plants in previous studies. Bubner et al. [9] identified a detection limit of two-fold for differentiation of hemizygous from homozygous plants, as in some cases SD for ΔCq was above 0.5, which represents a twofold difference ($2^{(0.5 \times 2)} = 2$). By contrast, Xu et al. [28] reported that the technique could be effective for zygosity determination in maize if some parameters are optimized. In our work, although there were some cases where SD was over 0.5, limiting differentiation between single and double copies, average SD of ΔCq values were 0.14 for IPT, 0.13 for TrneMDH, and 0.26 for CP-AMV using three replicates, indicative of high precision. However, ambiguous results were observed in some events, for example, events 2, 6, 9 and 33 in transgene TrneMDH, where it was not possible to discriminate between single and double copy T-DNA insertions. By contrast, unequivocal single copy T-DNA insertions were predicted by dPCR for these events. The occurrence of such events suggests that qPCR is suboptimal for definitive differentiation of hemizygous and homozygous events.
Fig. 2 Correlation of copy number estimations for each of the three transgenes when using the reference genes ATP-dependent protease versus Pyruvate dehydrogenase (PDH), measured by dPCR
Fig. 3 Comparison of copy number estimates with dPCR and qPCR. Copy numbers measured using dPCR and qPCR by relative quantification (left y-axis). dPCR data was generated using ATP-dependent protease as a reference, and qPCR data was generated using PDH as a reference. %CV dPCR and %CV qPCR are the coefficients of variation of measured copy numbers using either technique (right y-axis)
The results generated in this study provide a strong case for selecting dPCR over qPCR for copy number estimation. However, other factors need to be considered before a decision is made. Platform and reaction costs are significantly lower for qPCR [29]. Furthermore, for some platforms, more steps, and as a consequence more time, are required for setting up dPCR reactions [29]. On the other hand, the higher precision observed in dPCR could allow a reduction in the number of technical replicates, which would offset the cost difference. More importantly, use of dPCR enables copy number estimation by relative quantification without the need for a previously identified calibrator sample, which is necessary for qPCR [8,28]. An additional advantage of dPCR is that high primer efficiencies are not an absolute requirement for optimal assays [30]. By contrast, for reliable qPCR results this parameter should approach 100%, and similar efficiencies need to be observed for the gene(s) of interest and the reference gene [31].
Enhanced accuracy in the differentiation of homozygous and hemizygous transgenic plants is highly pertinent to the breeding of GM plants. Given white clover is a highly self-incompatible allogamous species [32], crossing non-transgenic with transgenic plants is usually performed between populations to maintain heterozygosity in the genome [2,3]. This system entails inter-crossing hemizygous F1 plants to obtain homozygous plants at F2 [2,33]. Identification of homozygous plants at this step is typically carried out by test-cross, which is both labour intensive and time demanding. Thus, the establishment of a fast and reliable method for determining transgene zygosity in white clover at F2 is highly relevant. The higher accuracy and precision observed in dPCR when compared to qPCR in the present study demonstrate that the technique is well suited to these purposes.
In this work two single copy genes were identified for use as references in white clover for transgene copy number estimation by relative quantification. We further described the optimization of single duplexed reactions for estimating transgene copy number using qPCR and dPCR. The information outlined here provides resources for future work in copy number determination in the species. The generally low variability observed among replicates in qPCR assays supports the use of this technique for high throughput transgene copy number estimation in white clover, and we consider that the technique is highly suited to primary screening and identification of low copy number events at the T0 generation. However, implementation of a transgene copy number screening methodology using dPCR would be the optimal strategy for high accuracy copy number determination, and to support the breeding of transgenic white clover varieties.