Agrobacterium tumefaciens-mediated transformation (AMT) is the most widely used method for generating transgenic plants. In nature, this soil bacterium transforms dicotyledonous plants by translocating part of its DNA, the transferred (T)-DNA, into plant cells, where it integrates into the plant’s genome1. Subsequent expression of the T-DNA-encoded Agrobacterium genes causes crown gall disease. Within Agrobacterium, the T-DNA is located on a tumor-inducing (Ti) plasmid, flanked by 25 bp repeat sequences, the left and right border repeats (LB and RB). These sequences are recognition sites for the virulence proteins VirD1 and VirD2, which generate the single-strand breaks required to liberate the T-DNA as a single-stranded DNA molecule, the T-strand2. The VirD2 protein remains covalently bound to the 5’ end of the T-strand3,4 and pilots it into the plant cell through the type IV secretion system5, which is assembled by the Agrobacterium virulence program upon detection of wounded plant cells6. The T-DNA is subsequently imported into the nucleus7, where it integrates at a random position in the genome8. The molecular mechanism by which the T-DNA is integrated into the plant genome remained enigmatic until it was recently found, in Arabidopsis thaliana, that this process critically depends on polymerase theta (Pol θ)9, a host protein that repairs DNA double-strand breaks (DSBs) via end joining. Extensive genetic and biochemical research over the last few years has established that, in a multitude of species, Pol θ repairs DSBs by using a few complementary bases in 3’ protruding ssDNA break ends to extend one break end, using the other end as a template10,11. This biochemical property, combined with occasional primer-template switching, explains two characteristic features observed at sites where Pol θ-mediated end joining (TMEJ) of genomic DSBs takes place: microhomology and so-called templated insertions. These features are also prevalent at the junctions of T-DNA integration sites9,12; in plant transgenesis, templated insertions have also been described as “filler” sequences13,14. However, while TMEJ provides a logical model for connecting the 3’ end of a T-DNA to a potentially resected genomic break (Fig. 1a), how the 5’ end of the T-DNA is captured has not yet been elucidated. It is also currently unknown how plant cells remove the covalently attached VirD2 from 5’ T-DNA ends to allow integration.
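To make the primer-template logic described above concrete, the following is a deliberately simplified toy model, not a model of the actual enzyme. It takes two 3’ overhangs, each written 5’→3’, looks for the longest 3’-terminal stretch of one end (up to a hypothetical cap, `max_mh`) whose reverse complement occurs in the other end, and then “extends” that end by copying the template beyond the annealed site. The function names and the cap are illustrative assumptions. Factors that shape real TMEJ outcomes, such as mismatch tolerance, distance from the break terminus and template switching, are not modeled.

```python
# Toy model of microhomology-primed extension (illustrative assumption,
# not the biochemistry of Pol theta in detail).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def prime_and_extend(primer: str, template: str, max_mh: int = 6):
    """Anneal the 3' terminus of `primer` to `template`, then extend it.

    primer, template: two 3' overhangs, each written 5'->3'.
    Returns (microhomology length, extended primer), or (0, primer) when
    no 3'-terminal stretch of up to `max_mh` nt can anneal.
    """
    for k in range(min(max_mh, len(primer)), 0, -1):
        # The primer's last k bases pair antiparallel with this template k-mer
        site = template.find(revcomp(primer[-k:]))
        if site >= 0:
            # Extension copies template[:site], the part 5' of the annealed
            # region; the new bases are its reverse complement.
            return k, primer + revcomp(template[:site])
    return 0, primer
```

For example, with `primer="TTTTCAG"` and `template="GGGCTGTT"`, the 3’-terminal CAG anneals to CTG in the template (3 bp microhomology) and the extended product is `TTTTCAGCCC`. Taking the first match found by `str.find` is an arbitrary choice made for this toy.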
To study capture of T-DNA by the Arabidopsis genome, and in particular attachment of the RB, we developed an NGS-based method, which we termed TRANSGUIDE (T-DNA random integration site genome-wide unbiased identification), that identifies hundreds of T-DNA-genome junctions (at both LB and RB) in pools of root-transformed Arabidopsis cells (Fig. 1b). We use custom-made software to filter for high-quality, reliable outcomes and to annotate individual T-DNA integration junctions for potentially relevant features, such as genomic position, loss of T-DNA or genomic sequence, degree of microhomology, and absence or presence of filler DNA. The outcomes of this pipeline reliably represent in vivo biology: using PCR and Sanger sequencing on the input material, we validated 23 of 24 predictions (Supplementary Data 1).
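As an illustration of the kind of junction annotation described above (and not the actual TRANSGUIDE software, whose implementation is not described here), the sketch below classifies a single junction read. It assumes that the read has already been oriented to start at the first base of a known T-DNA end sequence, and that the matching genomic flank, ending where the read ends, has already been located by mapping. Bases matched by both references are scored as microhomology, bases matched by neither as filler, and T-DNA loss is counted with any microhomology attributed to the T-DNA. Only exact matches are used, so sequencing errors are not tolerated; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Junction:
    tdna_loss: int       # T-DNA end nucleotides missing from the read
    microhomology: int   # bases assignable to either T-DNA or genome
    filler: str          # bases derived from neither reference


def common_prefix_len(a: str, b: str) -> int:
    """Number of identical leading characters in a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def annotate_junction(read: str, tdna_end: str, genome_flank: str) -> Junction:
    """Toy annotation of one T-DNA-genome junction read (hypothetical helper).

    Assumes `read` starts at the first base of `tdna_end` (T-DNA sequence up
    to the border nick) and that `genome_flank` ends where the read ends.
    """
    t = common_prefix_len(read, tdna_end)                  # T-DNA-matching prefix
    g = common_prefix_len(read[::-1], genome_flank[::-1])  # genome-matching suffix
    overlap = t + g - len(read)
    loss = len(tdna_end) - t
    if overlap >= 0:
        # Prefix and suffix overlap: the shared bases are microhomology
        return Junction(loss, overlap, "")
    # A gap between the two matching parts is filler DNA
    return Junction(loss, 0, read[t:len(read) - g])
```

For instance, with `tdna_end="GGATCCAAGT"`, the read `GGATCCAAGCTTAGC` joined to the flank `TTTAAGCTTAGC` yields 1 nt of T-DNA loss and 3 bp of microhomology (AAG), whereas the read `GGATCCAAGTTTGCAGCAGC` joined to `AAACAGCAGC` yields no T-DNA loss and the filler `TTG`.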
Using this technology, we obtained a collection of ~2200 RB-genome junctions and ~5100 LB-genome junctions upon transformation of the Col-0 ecotype (Supplementary Data 2). Consistent with earlier findings15, these junctions are scattered across the entire genome, with the exception of the pericentromeric regions (Supplementary Fig. 1). Arguing for a prominent role of Pol θ in integration, we found (very similar) filler DNAs to be abundantly present at both RB- and LB-genome junctions (Fig. 1c). However, the percentages were not identical: 32 % of RB-genome junctions versus 39 % of LB-genome junctions contained fillers. The degree of junctional microhomology (median of 1 bp for RB- versus 3 bp for LB-genome junctions) and the loss of terminal nucleotides also differed between RB and LB (Fig. 1d, 1e and Supplementary Fig. 2). As microhomology usage and filler formation are hallmark features of Pol θ activity, these data suggest an unequal involvement of this enzyme in the attachment of the two T-DNA ends.
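The two summary statistics quoted above, median junctional microhomology and the percentage of junctions carrying filler DNA, can be computed per border from annotated junctions along the lines of the following sketch. The input format and the choice to score microhomology only at filler-free junctions are our assumptions here; the toy values in the usage note are illustrative and are not data from this study.

```python
from statistics import median


def summarize_border(junctions):
    """Summarize annotated junctions for one T-DNA border (RB or LB).

    junctions: list of (microhomology_bp, filler_seq) tuples, one per unique
    junction; filler_seq is "" when no filler is present.
    Assumption: microhomology is only scored at filler-free junctions.
    Returns (median microhomology in bp or None, % of junctions with filler).
    """
    if not junctions:
        raise ValueError("no junctions to summarize")
    no_filler = [mh for mh, filler in junctions if not filler]
    with_filler = len(junctions) - len(no_filler)
    median_mh = median(no_filler) if no_filler else None
    return median_mh, 100.0 * with_filler / len(junctions)
```

For example, `summarize_border([(0, ""), (1, ""), (1, ""), (2, ""), (0, "ACT")])` returns `(1.0, 20.0)`: a median of 1 bp over the four filler-free junctions, and one of five junctions carrying filler.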
We previously found that Arabidopsis plants deficient for Pol θ (teb mutants) are completely recalcitrant to AMT, arguing for an essential role of Pol θ in genomic capture of T-DNA. Here we further substantiate this conclusion by demonstrating an almost complete absence of T-DNA-genome junctions in DNA isolated from root-transformed Pol θ-deficient plants: instead of a few hundred T-DNA integrations, we obtained only a few cases in pools of teb calli (Fig. 1f). To exclude potential methodological distortions, e.g. resulting from PCR steps within TRANSGUIDE, we also performed AMT competition experiments: we mixed DNA from wild-type and teb plants that had been transformed with nearly identical yet barcoded T-DNA constructs, and afterwards attributed T-DNA junctions to the appropriate genotype. These internally controlled experiments corroborate our finding that genomic T-DNA capture is Pol θ-dependent (Fig. 1f, Supplementary Data 3). Of note, while the almost complete absence of T-DNA junctions in teb material unequivocally demonstrates that TRANSGUIDE outcomes for wild-type plants represent bona fide biology, we cannot conclude that the residual T-DNA-genome junctions found in teb samples represent completed T-DNA integration, as opposed to, e.g., one-sided capture, in vivo recombination, or PCR artifacts. Interestingly, however, and in agreement with a recent report16, the molecules representing genomic capture in teb are almost exclusively RB-genome junctions (Fig. 1f). Together with the weaker signature of Pol θ activity at RB-genome junctions than at LB-genome junctions in Pol θ-proficient plants, this result may point to another, redundant molecular mechanism capable of attaching the 5’ end of T-DNA to the plant genome.
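The competition design can be pictured as a simple demultiplexing step: because the wild-type and teb T-DNAs differ only by a barcode, each recovered junction can be attributed to its genotype of origin after the pooled DNA has been processed together. The sketch below assumes exact barcode matches, one read per unique junction, and hypothetical barcode sequences; it is not the barcode design used in this study.

```python
from collections import Counter


def attribute_junctions(junction_reads, barcodes):
    """Assign each unique junction read to the genotype whose barcode it carries.

    junction_reads: iterable of read sequences, one per unique junction.
    barcodes: dict mapping genotype name -> barcode sequence (hypothetical).
    Reads matching no barcode, or more than one, are counted as "unassigned".
    """
    counts = Counter()
    for read in junction_reads:
        hits = [genotype for genotype, bc in barcodes.items() if bc in read]
        counts[hits[0] if len(hits) == 1 else "unassigned"] += 1
    return counts
```

With `barcodes={"wild type": "ACGTAC", "teb": "TGCATG"}` and the reads `GGACGTACTT`, `CCACGTACAA`, `TTTGCATGCC` and `AAAAAAAAAA`, the counts are 2 for wild type, 1 for teb and 1 unassigned. Keeping ambiguous reads in a separate bin avoids attributing them to either genotype.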
The obvious candidate for end-joining activity other than TMEJ is canonical non-homologous end joining (cNHEJ), another pathway that repairs genomic DNA breaks. Previous analyses of AMT in cNHEJ-deficient Arabidopsis led to conflicting results: whereas some labs reported reduced T-DNA integration17-20, others found no effect21-23 or even elevated frequencies23,24. We investigated a potential involvement of cNHEJ in T-DNA capture by monitoring shoot development and performing TRANSGUIDE upon root transformation of cNHEJ-deficient ku70 and lig4 Arabidopsis mutants. We observed a reduced number of shoots in cNHEJ-deficient plants (Fig. 2a + 2b), arguing that cNHEJ affects stable transformation but is not essential for it. TRANSGUIDE of calli subsequently revealed a profound effect on the composition of T-DNA-genome junctions, specifically at the RB side (Fig. 2c-e): whereas LB-genome junctions found in ku70 and lig4 mutant roots are indistinguishable from those found in wild type, RB-genome junctions isolated from cNHEJ mutant plants were characterised by an increased degree of microhomology (median of 3 bp in ku70 and lig4, versus 1 bp in wild type). In fact, the microhomology distribution of RB-genome junctions in cNHEJ mutants is similar to that of LB-genome junctions in both cNHEJ-deficient and -proficient contexts (Supplementary Fig. 3). This increased usage of microhomology is accompanied by increased loss of T-DNA sequence at the RB end, as well as by an increased percentage of junctions containing fillers (Supplementary Fig. 3), which were of similar length to those observed in wild type (Fig. 2e). We conclude that capture of the T-DNA 3’ end critically depends on intrinsically mutagenic TMEJ, whereas the 5’ end can be attached to the genome via two redundant activities, i.e. TMEJ and cNHEJ.
The identification of two end-joining pathways capable of attaching the T-DNA 5’ end to the plant genome raises the question of which enzymatic activity removes the bacterial VirD2 protein that is covalently bound to the outermost 5’ nucleotide of the T-DNA. Although the sequence of events leading to completed T-DNA integration is unknown, one can envisage a scenario in which Pol θ-mediated genomic capture of the T-DNA 3’ end leads, simply by DNA synthesis using the T-DNA as a template, to conversion of the single-stranded T-DNA into dsDNA (see Fig. 1a). The resulting structure would strikingly resemble DSB ends that occur during meiotic recombination (by SPO11) or follow from some types of chemotherapy (TOP2 poisons), both of which have proteins covalently attached to their 5' termini25,26. Removal of these end-blocking proteins is a prerequisite for DSB repair, and one demonstrated mechanism for their removal involves MRE11-catalyzed nicking of the protein-linked strand distal to the DSB terminus27. Arabidopsis MRE11 null mutant plants are sterile, hampering their analysis28; however, a hypomorphic mre11 allele (mre11-2) exists, which in the homozygous state confers sensitivity to DNA-damaging agents yet supports plant development29. We inspected T-DNA integration in this mutant background and found the RB-genome junction spectrum altered, but inversely to what was observed in cNHEJ mutants: instead of a more pronounced TMEJ signature, we observed a clear depletion of TMEJ hallmarks in mre11-2, i.e. less microhomology at the junctions and reduced filler size (Fig. 2f + 2g). We conclude that MRE11 functionality is needed for Pol θ-mediated capture of the T-DNA 5’ end; when it is impaired, only cNHEJ can perform this function. Interestingly, we find a wild-type profile for LB-genome junctions in mre11 mutant plants (Supplementary Fig. 4), which could mean either that MRE11 is not needed to process genomic breaks for capturing a T-DNA, or that the hypomorphic mre11-2 allele encodes a protein still capable of this activity. One prediction that follows from our genetic analyses is that while single cNHEJ and mre11 mutant plants are proficient for AMT, double mutants may not be. This is indeed what we observe: whereas 30-60 % of calli derived from AMT-treated ku70, ku80 and mre11 mutant plants form shoots on selective medium (which we use as a proxy for stable T-DNA integration), we find none in ku70 mre11 and ku80 mre11 double mutant plants (Fig. 2h + 2i, Supplementary Fig. 5). Corroborating the absence of shoots, TRANSGUIDE competition experiments also revealed a dramatic reduction in the number of junctions in mre11 ku70 and, to a somewhat lesser extent, in mre11 lig4 calli (Supplementary Fig. 6). Expression of a T-DNA-encoded β-glucuronidase (GUS) marker demonstrates that the absence of T-DNA integration in the double mutants is not caused by impaired T-DNA transfer (Supplementary Fig. 7).
That cNHEJ can attach the 5’ end of the T-DNA to the genome when MRE11 is impaired argues for another activity able to remove VirD2. The fact that most RB-genome junctions show no loss of the T-DNA’s outermost 5’ nucleotides suggests the action of an enzyme that cleaves the phosphotyrosyl bond between VirD2 and the 5′ phosphate of the DNA, thereby generating a ligatable end that can be used by cNHEJ. Previous work in a variety of biological systems has identified tyrosyl-DNA phosphodiesterase 2 (TDP2) as possessing such biochemical activity30; we therefore next assayed Arabidopsis plants deficient for the orthologous protein. Root tissue from tdp2 mutant plants was efficiently transformed by Agrobacterium, as visualized by shoot formation from selected calli, demonstrating that TDP2 is not essential for T-DNA integration (Fig. 2i + 2j). However, like mutations in cNHEJ, TDP2 deficiency also alters the junctional spectrum, specifically of RB-genome junctions, which shifts towards a typical TMEJ profile (Fig. 2k, Supplementary Fig. 8). This outcome is consistent with a model in which TDP2 acts to facilitate cNHEJ, and in line with this interpretation, we find that AMT is severely impaired in mre11 tdp2 double mutant plants (Fig. 2i + 2j, Supplementary Fig. 5).
We next reasoned that mutant backgrounds with impaired T-DNA integration, owing to an inability to capture the 5’ end, should be proficient for AMT in situations where 3’ attachment of a T-DNA is sufficient to produce cells that stably transmit the T-DNA. Such T-DNAs have been created previously: T-DNAs that contain, at their 5’ side, so-called telomere repeat arrays (TRAs), long stretches of sequence consisting exclusively of (TTTAGGG)n, can trigger the formation of new telomeres following genomic capture of their 3’ end31 (see Fig. 3a for a schematic representation). Two types of outcomes are found upon AMT of TRA-containing T-DNAs: type I, canonical T-DNA integration at a random position in the genome, and type II, telomere formation-dependent integration, which is accompanied by loss of the DNA positioned between the new and the former telomere31. Likely because the accompanying loss of distal chromosomal DNA provokes haploinsufficiency and is therefore counter-selected, type II integrations in full-grown plants are preferentially found proximal to chromosomal ends (within ~2.5 Mb). We next performed AMT experiments using TRA-containing T-DNA (in parallel with control T-DNAs) in the aforementioned genetic backgrounds. A lig4 mutant background was used to assay cNHEJ deficiency, as Ku is involved in maintaining telomere homeostasis and also strongly affects de novo telomere formation31-33. In agreement with cNHEJ being required for AMT in plants with disturbed MRE11 function, we found profoundly reduced shoot formation in lig4 mre11 mutant plants transformed with control T-DNA, although not to the same extent as observed for ku70 mre11 and ku80 mre11, which failed to produce shoots altogether (Fig. 3b + 3c). However, successful AMT with a telomere-forming T-DNA construct did not require functional cNHEJ in the mre11-2 mutant background (Fig. 3b + 3d), supporting the conclusion that cNHEJ action is specific to genomic attachment of the 5’ end of T-DNAs.
In agreement with the prediction that these integrations are predominantly of type II, TRANSGUIDE revealed a profound overrepresentation of LB junctions mapping near the ends of chromosomes (Fig. 3e, Supplementary Fig. 9). The finding that AMT was reduced for mre11 tdp2 mutant roots even with TRA-containing T-DNA, yet not in the respective single mutants (Fig. 3b + 3d), argues that VirD2 covalently bound to the 5’ end also blocks de novo telomere formation.
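Whether LB junctions are enriched near chromosome ends can be gauged by comparing the observed fraction of junctions within a window of either end with the fraction of the genome that such windows cover. The sketch below uses the ~2.5 Mb window mentioned above and the Arabidopsis TAIR10 chromosome lengths as commonly reported (an assumption; substitute the lengths of the reference actually used). It ignores mappability and the pericentromeric regions, which lack junctions, so it provides only a rough baseline; the function names are hypothetical.

```python
# Chromosome lengths (bp) of the Arabidopsis TAIR10 assembly as commonly
# reported; assumption: replace with the reference actually used.
TAIR10_LENGTHS = {
    "Chr1": 30_427_671,
    "Chr2": 19_698_289,
    "Chr3": 23_459_830,
    "Chr4": 18_585_056,
    "Chr5": 26_975_502,
}


def distance_to_end(chrom, pos, lengths=TAIR10_LENGTHS):
    """Approximate distance (bp) from a position to the nearest chromosome end."""
    return min(pos, lengths[chrom] - pos)


def fraction_near_end(junctions, window=2_500_000, lengths=TAIR10_LENGTHS):
    """Observed fraction of (chrom, pos) junctions within `window` of an end."""
    near = sum(1 for chrom, pos in junctions
               if distance_to_end(chrom, pos, lengths) <= window)
    return near / len(junctions)


def expected_fraction(window=2_500_000, lengths=TAIR10_LENGTHS):
    """Fraction of the genome lying within `window` of a chromosome end."""
    covered = sum(min(2 * window, length) for length in lengths.values())
    return covered / sum(lengths.values())
```

With five chromosomes and two ends each, 2.5 Mb windows cover 25 Mb of the ~119 Mb assembly, so roughly 21 % of uniformly scattered junctions would be expected near ends by chance. For the toy input `[("Chr1", 1_000_000), ("Chr1", 15_000_000), ("Chr4", 18_000_000)]`, `fraction_near_end` returns 2/3.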
Following our previous elucidation of how, during AMT, the 3’ end of a T-DNA molecule is attached to the plant genome, we have here identified the mechanisms by which the 5’ end can be attached. In contrast to the T-DNA’s 3’ end, which because of its chemical composition (i.e. a 3’ hydroxyl at the terminus of a ssDNA molecule) is an ideal substrate for TMEJ, the 5’ end needs additional processing to create a ligatable end. Our data suggest that MRE11 acts to liberate the 5’ end to facilitate TMEJ, whereas TDP2 acts to allow genomic attachment via cNHEJ.
Given the biochemical properties of both MRE11 and TDP2, i.e. acting on dsDNA, we consider it likely that single-stranded T-DNA molecules are first converted to a double-stranded configuration prior to 5’ attachment. One potential mechanism for such conversion is genomic capture of the T-DNA 3’ end followed by DNA synthesis using the genomic end as a primer. In this way, a new “extended” DSB end is created (see Fig. 3f) in which the VirD2 protein blocks 5’ to 3’ resection. Such a structure is conceptually similar to a meiotic SPO11-bound DSB end or to a stalled TOP2 cleavage complex, substrates whose repair requires protein removal by either MRE11 or TDP2. However, the relatively efficient “transient” expression of T-DNA-encoded genes in plants deficient for Pol θ argues for dsDNA formation also in the absence of genomic capture. It is conceivable that free-floating T-DNA molecules can also react with each other via the identified end-joining mechanisms prior to genomic capture, a process that may underlie two as yet unexplained AMT phenomena: i) extrachromosomal T-circles34,35, and ii) T-DNA conglomerates that were recently found to make up a large proportion of AMT outcomes36,37.
The observation of cNHEJ-mediated attachment of T-DNA 5’ ends also in Pol θ-proficient cells reveals that a proportion of integrations used both pathways, i.e. cNHEJ for 5’ and TMEJ for 3’ attachment, as was previously hypothesized38. This finding may explain many seemingly contradictory observations in mutant analyses that have confounded AMT research for several decades: the usage of cNHEJ over MRE11-stimulated TMEJ to capture the 5’ end may depend on context, such as the AMT protocol, the reagents used, and the tissue that is targeted. cNHEJ repairs DSBs in G1 and in pre-replicative DNA in S phase39, whereas recent work in mammalian cells argues for TMEJ in late S/G2/M phases of the cell cycle40; it is thus tempting to speculate that the cell-cycle stage of the host cell at the time of infection may dictate pathway choice and AMT outcome. Indeed, comparing the genome-T-DNA junction signature of AMT events derived from somatic transformation with that of events from germline transformation reveals that TMEJ is more prominently used to attach the 5’ end of T-DNA in germ cells9,12 (Supplementary Fig. 2).
Apart from providing mechanistic understanding, we aim to unravel the biology of (T-)DNA integration to allow for improved biotechnological strategies to develop transgenic crops. Recent work demonstrated that homology-directed gene targeting in Pol θ-deficient plants occurs without undesired integration of AMT reagents41, which otherwise contaminates gene targeting in wild-type conditions. Here, we find that combined inhibition of MRE11 and cNHEJ activities, for which inhibitors are available, also precludes random integration. We envisage that an increased understanding of how exogenously provided DNA molecules interact with the genome of a host plant can help in developing precise genome-engineering approaches to benefit crop development.