T-DNA integration in plants requires MRE11- or TDP2-mediated removal of the 5’ bound Agrobacterium protein VirD2


Agrobacterium tumefaciens, a pathogenic bacterium capable of transforming plants through horizontal gene transfer, is nowadays the preferred vector for plant genetic engineering. The vehicle for transfer is the T-strand, a single-stranded DNA molecule bound by the bacterial protein VirD2, which guides T-DNA into the plant’s nucleus where it integrates. How VirD2 is removed from T-DNA, and which mechanism acts to attach the liberated end to the plant genome, is currently unknown. Here, using newly developed technology that yields hundreds of T-DNA integrations in somatic tissue of Arabidopsis thaliana, we uncover two redundant mechanisms for the genomic capture of the T-DNA’s 5’ end. Different from capture of the 3’ end of the T-DNA, which is the exclusive action of polymerase theta-mediated end joining (TMEJ), 5’ attachment is accomplished either by TMEJ or by canonical non-homologous end joining (cNHEJ). We further find that TMEJ needs MRE11, whereas cNHEJ requires TDP2 to remove the 5’-end blocking protein VirD2. As a consequence, T-DNA integration is severely impaired in plants deficient for both MRE11 and TDP2 (or other cNHEJ factors). In support of MRE11 and cNHEJ specifically acting on the 5’ end, we demonstrate rescue of the integration defect of double-deficient plants by using T-DNAs that are capable of forming telomeres upon 3’ capture. Our study provides a mechanistic model for how Agrobacterium exploits the plant’s own DNA repair machineries to transform them.

insertions. These features are also prevalent at the junctions of T-DNA integration sites 9,12; in plant transgenesis, templated insertions have also been described as "filler" sequences 13,14. However, while TMEJ presents a logical model for connecting the 3' end of a T-DNA to a potentially resected genomic break (Fig. 1a), the biochemistry of capture of its 5' end has not yet been elucidated. It is currently also unknown how plant cells remove the covalently attached VirD2 from 5' T-DNA ends to allow integration.
To study the capture of T-DNA by the Arabidopsis genome, and in particular the attachment of the RB, we developed an NGS-based method, which we termed TRANSGUIDE (T-DNA random integration site genome-wide unbiased identification), that allows us to identify hundreds of T-DNA-genome junctions (both LB and RB) in pools of root-transformed Arabidopsis cells (Fig. 1b). We employ custom-made software to filter for high-quality, reliable outcomes and to annotate individual T-DNA integration junctions with respect to potentially relevant features, such as genomic position, loss of T-DNA or genomic DNA sequences, degree of microhomology, and absence or presence of filler DNA. The outcomes of this pipeline reliably represent in vivo biology: using PCR and Sanger sequencing on the input material, we validated 23 out of 24 predictions (Supplementary Data 1).
Using this technology, we obtained a collection consisting of ~2200 RB-genome junctions and ~5100 LB-genome junctions upon transformation of the Col-0 ecotype (Supplementary Data 2). Consistent with earlier findings 15, these junctions are scattered across the entire genome, with the exception of the pericentromeric regions (Supplementary Fig. 1). Arguing for a prominent role for Pol θ in integration, we found (very similar) filler DNAs to be abundantly present at both RB- and LB-genome junctions (Fig. 1c); however, the percentages were not identical: 32% fillers at RB-genome junctions versus 39% at LB-genome junctions. Also, the degree of junctional microhomology (the median being 1 bp for RB- versus 3 bp for LB-genome junctions) and the loss of terminal nucleotides were different between RB and LB (Fig. 1d, 1e and Supplementary Fig. 2). As microhomology usage and filler formation are hallmark features of Pol θ activity, these data suggest an unequal involvement of this enzyme in the attachment of the two T-DNA ends.
We previously found that Arabidopsis plants deficient for Pol θ (teb mutants) are completely recalcitrant to AMT, arguing for an essential role for Pol θ in genomic capture of T-DNA. This conclusion is here further substantiated by demonstrating an almost complete absence of T-DNA-genome junctions in DNA isolated from root-transformed Pol θ-deficient plants: instead of finding a few hundred T-DNA integrations, we obtained only a few cases in pools of teb calli (Fig. 1f). To exclude potential methodological distortions, e.g. resulting from PCR steps within TRANSGUIDE, we also performed AMT competition experiments: we mixed DNA from wild type and teb that were transformed with nearly identical yet bar-coded T-DNA constructs and attributed T-DNA junctions to the appropriate genotype afterwards. These internally controlled experiments corroborate our finding that genomic T-DNA capture is Pol θ dependent (Fig. 1f, Supplementary Data 3). Of note, while the almost complete absence of T-DNA junctions in teb material unequivocally demonstrates that TRANSGUIDE outcomes for wild-type plants represent bona fide biology, we cannot conclude that the residual T-DNA-genome junctions found in teb samples represent completed T-DNA integration, as opposed to e.g. one-sided capture, in vivo recombination, or PCR artifacts. Interestingly, however, and in agreement with a recent report 16, we find the molecules representing genomic capture in teb to be almost exclusively RB-to-genome junctions (Fig. 1f). Together with the notion of a reduced signature of Pol θ activity at RB-genome junctions in Pol θ-proficient plants, as compared to LB-genome junctions, this result may point to another, redundant, molecular mechanism capable of attaching the 5' end of T-DNA to the plant genome.
The obvious candidate for end joining activity other than TMEJ is canonical non-homologous end joining (cNHEJ), another pathway to repair genomic DNA breaks. Previous analysis of AMT in cNHEJ-deficient Arabidopsis led to conflicting results: whereas some labs reported reduced T-DNA integration 17-20, others found no effects 21-23 or even elevated frequencies 23,24. We investigated a potential involvement of cNHEJ in T-DNA capture by monitoring shoot development and performing TRANSGUIDE upon root transformation of cNHEJ-deficient ku70 and lig4 Arabidopsis mutants. We observed a reduced number of shoots in cNHEJ-deficient plants (Fig. 2a + 2b), arguing that NHEJ action affects stable transformation but is not essential. TRANSGUIDE of calli subsequently revealed a profound effect on the composition of T-DNA-genome junctions, specifically at the RB side (Fig. 2c-e): whereas LB-genome junctions found in ku70 and lig4 mutant roots are indistinguishable from those found in wild type, RB-genome junctions isolated from NHEJ mutant plants were characterised by an increased degree of microhomology (median of 3 bp in ku70 and lig4, versus 1 bp in wild type). In fact, when plotted for the degree of microhomology, the distribution of RB-genome junctions in NHEJ mutant conditions is similar to that of the LB-genome junctions, in both NHEJ-deficient and -proficient contexts (Supplementary Fig. 3). This increased usage of microhomology is accompanied by increased loss of T-DNA sequence at the RB end, as well as an increased percentage of junctions containing fillers (Supplementary Fig. 3), which were of similar length to those observed in wild type (Fig. 2e). We conclude that capture of the T-DNA 3' end critically depends on intrinsically mutagenic TMEJ, whereas the 5' end can be attached to the genome via two redundant activities, i.e. TMEJ and cNHEJ.
The identification of two end joining pathways capable of attaching the T-DNA 5' end to the plant genome raises the question: which enzymatic activity removes the bacterial VirD2 protein that is covalently bound to the outermost 5' nucleotide of T-DNA? Although the sequence of events leading to completed T-DNA integration is unknown, one can envisage a scenario where Pol θ-mediated genomic capture of the T-DNA 3' end leads, simply by DNA synthesis using the T-DNA as a template, to conversion of the single-stranded T-DNA into dsDNA (see Fig. 1a). The resulting structure would have a striking resemblance to DSB ends that occur during meiotic recombination (by SPO11), or follow from some types of chemotherapy (TOP2 poisons), both of which have proteins covalently attached to their 5' termini 25,26. Removal of these end-blocking proteins is a prerequisite to DSB repair, and one demonstrated mechanism for their removal involves MRE11-catalyzed nicking of the protein-linked strand distal to the DSB terminus 27. Arabidopsis MRE11 null mutant plants are sterile, hampering their analysis 28; however, an mre11 hypomorphic allele (mre11-2) exists, which in a homozygous state confers sensitivity towards DNA damaging agents yet supports plant development 29. We inspected T-DNA integration in this mutant background and found the RB-genome junction spectrum altered, but inversely to what was observed in cNHEJ mutants: instead of a more profound TMEJ signature, we observed a clear depletion of TMEJ hallmarks in mre11-2, namely less microhomology at the junctions and reduced filler size (Fig. 2f + 2g). We conclude that MRE11 functionality is needed for Pol θ-mediated capture of the T-DNA 5' end; when it is impaired, only cNHEJ can perform this function. Interestingly, we find a wild-type profile for LB-genome junctions in mre11 mutant plants (Supplementary Fig. 4), which could either mean that MRE11 is not needed to process genomic breaks for capturing a T-DNA, or that the hypomorphic mre11-2 allele encodes a protein still capable of this activity.
One prediction that follows from our genetic analyses is that while single cNHEJ and mre11 mutant plants are proficient for AMT, double mutants may not be. This is indeed what we observe: whereas 30-60% of calli derived from AMT-treated ku70, ku80 and mre11 mutant plants form shoots on selective medium (which we use as a proxy for stable T-DNA integration), we find none in ku70 mre11 and ku80 mre11 double mutant plants (Fig. 2h + 2i, Supplementary Fig. 5). Corroborating the absence of shoots, we also found a dramatic reduction in the number of junctions in mre11 ku70, and (to a somewhat lesser extent) in mre11 lig4 calli using TRANSGUIDE competition experiments (Supplementary Fig. 6). Expression of a T-DNA-encoded β-glucuronidase (GUS) marker demonstrates that the absence of T-DNA integration in the double mutants is not caused by impaired T-DNA transfer (Supplementary Fig. 7).
The notion of cNHEJ being proficient in attaching the 5' end of the T-DNA to the genome when MRE11 is impaired argues for another activity able to remove VirD2. The fact that most RB-genome junctions are without loss of the T-DNA's outermost 5' nucleotides suggests the action of an enzyme able to cleave the phosphotyrosyl bond between VirD2 and the 5′ phosphate of the DNA, thereby generating a ligatable end that can be used by cNHEJ. Previous work in a variety of biological systems has identified tyrosyl-DNA phosphodiesterase 2 (TDP2) as possessing such biochemical activity 30; hence, we next assayed Arabidopsis plants deficient for the orthologous protein. Root tissue from such tdp2 mutant plants was efficiently transformed by Agrobacterium, as visualized by shoot formation from selected calli, demonstrating that TDP2 is not essential for T-DNA integration (Fig. 2i + 2j). However, similar to mutations in cNHEJ, TDP2 deficiency also alters the junctional spectrum, specifically of RB-genome junctions, which shifts towards a typical TMEJ profile (Fig. 2k, Supplementary Fig. 8). This outcome is consistent with a model where TDP2 acts to facilitate cNHEJ and, in line with this interpretation, we find that AMT is severely impaired in mre11 tdp2 double mutant plants (Fig. 2i + 2j, Supplementary Fig. 5).
We next reasoned that mutant backgrounds that have impaired T-DNA integration because of an inability to capture the 5' end would be proficient for AMT in situations where 3' attachment of a T-DNA is sufficient to produce cells that stably transmit T-DNA. Such T-DNAs have been previously created: T-DNAs that at their 5' side contain so-called telomere repeat arrays (TRAs), being long stretches of sequence exclusively consisting of (TTTAGGG)n, are able to trigger the formation of new telomeres following genomic capture at their 3' end 31 (see Fig. 3a for a schematic representation). Two types of outcomes are found upon AMT of TRA-containing T-DNAs, i.e. type I: canonical T-DNA integration at a random position in the genome, and type II: telomere formation-dependent integration, which goes together with loss of DNA positioned between the new and former telomere 31. Likely because of provoking haplo-insufficiency (providing counter-selection for viability), type II integrations are preferentially found proximal to chromosomal ends (within ~2.5 Mb) in full-grown plants. We next performed AMT experiments using TRA-containing T-DNA (in parallel to control T-DNAs) in the aforementioned genetic backgrounds. A lig4 mutant background was used to assay cNHEJ deficiency, as Ku is involved in maintaining telomere homeostasis and also strongly affects de novo telomere formation 31-33. In agreement with cNHEJ being required for AMT in plants with disturbed MRE11 function, we found profoundly reduced shoot formation in lig4 mre11 mutant plants transformed with control T-DNA, although not to the same extent as observed for ku70 mre11 and ku80 mre11, which failed to produce shoots altogether (Fig. 3b + 3c). However, successful AMT with a telomere-forming T-DNA construct did not require functional cNHEJ in the mre11-2 mutant background (Fig. 3b + 3d), supporting the conclusion that cNHEJ action is specific to genomic attachment of the 5' end of T-DNAs.
In agreement with the prediction that these integrations are predominantly of type II, we found upon inspection by TRANSGUIDE a profound overrepresentation of LB junctions mapping near the ends of chromosomes (Fig. 3e, Supplementary Fig. 9). The finding that AMT was reduced for mre11 tdp2 mutant roots even with TRA-containing T-DNA, yet not in the respective single mutants (Fig. 3b + 3d), argues that 5' covalently bound VirD2 is also a blocking entity to de novo telomere formation.
Following our previous elucidation of how, during AMT, the 3' end of a T-DNA molecule is attached to the plant genome, we have here identified the mechanisms by which the 5' end can be attached. In contrast to T-DNA's 3' end, which because of its chemical composition (i.e. a 3' hydroxyl at the terminus of a ssDNA molecule) is an ideal substrate for TMEJ, the structure of the 5' end needs additional processing to create a ligatable end. Our data suggest that MRE11 acts to liberate the 5' end to facilitate TMEJ, whereas TDP2 acts to allow genomic attachment via cNHEJ.
Given the biochemical properties of both MRE11 and TDP2, i.e. acting on dsDNA, we consider it likely that single-stranded T-DNA molecules are first converted to a double-stranded configuration prior to 5' attachment. One potential mechanism for such conversion is genomic capture of the T-DNA 3' end followed by DNA synthesis using the genomic end as a primer. In this way a new "extended" DSB end is created (see Fig. 3f) in which the VirD2 protein blocks 5' to 3' resection. Such a structure is conceptually similar to a meiotic SPO11-bound DSB end or to a stalled TOP2 cleavage complex: substrates that depend on either MRE11 or TDP2 for removal of the protein to facilitate repair. However, the observation of relatively proficient "transient" expression of T-DNA-encoded genes in plants deficient for Pol θ argues for dsDNA formation also in the absence of genomic capture. It is conceivable that free-floating T-DNA molecules can also react with each other via the identified end-joining mechanisms prior to genomic capture, a process that may underlie two as yet unexplained AMT phenomena: i) extrachromosomal T-circles 34,35, and ii) T-DNA conglomerates that were recently found to make up a large proportion of AMT outcomes 36,37.
The observation of cNHEJ-mediated attachment of T-DNA 5' ends also in Pol θ-proficient cells reveals that a proportion of the integrations have used both pathways, i.e. cNHEJ for 5' and TMEJ for 3' attachment, as was previously hypothesized 38. This finding may explain many seemingly contradictory observations in mutant analyses that have confounded AMT research for several decades: the usage of cNHEJ over MRE11-stimulated TMEJ to capture the 5' end may be context dependent with respect to the AMT protocol, the reagents used, and the tissue that is targeted. cNHEJ repairs DSBs in G1 and in prereplicative DNA in S phase 39, whereas recent work in mammalian cells argues for TMEJ in late-S/G2/M phases of the cell cycle 40, and it is thus tempting to speculate that the cell-cycle stage of the host cell when infected may dictate pathway choice and AMT outcome. Indeed, comparing the genome-T-DNA junction signature of AMT events derived from somatic transformation with those from germline transformation reveals that TMEJ is more prominently used to attach the 5' end of T-DNA in germ cells 9,12 (Supplementary Fig. 2).
Apart from providing a mechanistic understanding, we aim to unravel the biology of (T-)DNA integration to allow for improved biotechnological strategies to develop transgenic crops. Recent work demonstrated that homology-directed gene targeting in Pol θ-deficient plants goes without undesired integration of AMT reagents 41, which otherwise contaminates gene targeting in wild-type conditions. Here, we find that a combinatorial inhibition of MRE11 and cNHEJ activities, for which inhibitors are available, also precludes random integration. We envisage that an increased understanding of how exogenously provided DNA molecules interact with the genome of a host plant can help in developing precise genome-engineering approaches to benefit crop development.
Root transformation for TRANSGUIDE and shoot formation assay. Root transformations were performed as described previously 44, using disarmed Agrobacterium tumefaciens strain AGL1 45 harbouring pUBC (pUBC-YFP-Dest 46 with ccdB cassette removed), pUBC-2 (same as pUBC-YFP, but the sequence between secondary TRANSGUIDE primer and LB or RB nick was replaced by a semi-random 56 bp sequence), pWY82 47, pCAS9 (pDE-CAS9 48,49 with gRNA against PPO1; AT4G01690), or pCAMBIA3301 (Cambia). After co-culture, root explants were transferred to shoot induction medium with vancomycin and timentin to kill off remaining bacteria, and phosphinothricin to select for transformed plant cells. After 3 weeks of selection, calli were either harvested for TRANSGUIDE analysis (20 per sample) or transferred to fresh selection medium for assaying shoot formation. After a total of 6 weeks of selection, plates were photographed and calli were scored for shoot formation (without prior knowledge of the genotype); any leaf-like protrusion from callus tissue was considered shoot tissue.
Junction enrichment and sequencing. Enrichment of T-DNA-genome junctions was similar to the GUIDE-seq procedure 50. DNA extraction was performed with the Wizard genomic DNA isolation kit (Promega; Madison, WI, USA). Sonication was performed with a Bioruptor (Diagenode; Liège, Belgium) for 6 cycles (30 seconds on, 30 seconds off) on 'high' intensity. End repair, A-tailing, and Y-adapter ligation were performed with the NEBNext ultra II kit (New England Biolabs; Ipswich, MA, USA), and the library amplification was performed with Phusion polymerase (Thermo Scientific; Waltham, MA, USA). See Supplementary Table 1 for the primers that were used. Sequencing was performed on the Illumina MiSeq (300 bp paired end, v3 chemistry, at LGTC; Leiden, The Netherlands) and on the Illumina NovaSeq 6000 (150 bp paired end, v1.5 chemistry, at GenomeScan BV; Leiden, The Netherlands). Samples were demultiplexed using bcl2fastq2 conversion software v2.2 (Illumina; San Diego, CA, USA).
Junction calling. Reads were clipped to 150 bp and adapters removed (Trimmomatic 51). Reads with an identical unique molecular identifier (adapter UMI + 6 bp from forward read + 6 bp from reverse read) were combined into consensus sequences using custom software. Mapping was done with BWA-mem 52, using the default settings. Any remaining optical duplicates were excluded from the analysis. Read pairs were grouped into junctions based on their genomic positions. Second-in-pair reads were required to start with the secondary T-DNA primer and end with a genomic sequence. These reads were used to determine the exact genomic position, as well as filler and homology sequences and deletion length. First-in-pair reads (anchors) were counted for each junction, and indicated the number of fragments present in the sample that support the junction. For each junction we generated a consensus sequence and calculated the percentage of reads exactly matching the consensus (consensus match). The junctions were then filtered i) for duplicate positions between samples (barcode hopping was accounted for), ii) for number of anchors (at least 3), and iii) for the consensus match (at least 75%). For most analyses (except for junction number comparison) we applied an additional filter for fair comparison, because distances between primer and border were not constant: homology ≤ 57 bp, filler ≤ 22 bp, end deletion ≤ 26 bp.
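The filtering criteria above can be illustrated with a minimal Python sketch. This is not the custom software used in the study; the `Junction` record, the function names, and the column-wise majority consensus for equal-length reads are assumptions made for illustration. In particular, the sketch drops every junction whose position occurs in more than one sample, which is only one possible reading of filter i) and does not model the correction for barcode hopping.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Junction:  # hypothetical record, one per called junction
    sample: str
    chrom: str
    pos: int
    anchors: int            # first-in-pair reads supporting the junction
    consensus_match: float  # percentage of reads identical to the consensus
    homology: int           # bp
    filler: int             # bp
    end_deletion: int       # bp


def consensus_match_percent(reads):
    """Percentage of reads identical to a column-wise majority consensus.

    Assumes all reads have the same length (a simplification).
    """
    consensus = "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))
    return 100.0 * sum(read == consensus for read in reads) / len(reads)


def filter_junctions(junctions, min_anchors=3, min_match=75.0, comparable=True):
    """Apply filters i) to iii), plus the optional fair-comparison window."""
    samples_at_position = {}
    for j in junctions:
        samples_at_position.setdefault((j.chrom, j.pos), set()).add(j.sample)

    kept = []
    for j in junctions:
        # i) position seen in more than one sample (assumed: drop all copies)
        if len(samples_at_position[(j.chrom, j.pos)]) > 1:
            continue
        # ii) at least 3 anchors, iii) at least 75% consensus match
        if j.anchors < min_anchors or j.consensus_match < min_match:
            continue
        # additional window: homology <= 57, filler <= 22, end deletion <= 26 bp
        if comparable and (j.homology > 57 or j.filler > 22 or j.end_deletion > 26):
            continue
        kept.append(j)
    return kept
```

Setting `comparable=False` corresponds to the junction number comparison, for which the additional window was not applied.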
Competition assay. Roots were transformed with either pUBC (barcode 1) or pUBC-2 (barcode 2). 10 calli were collected per sample, and equal DNA amounts of 2 samples with different barcodes were combined prior to junction enrichment. During junction calling the reads were assigned to the sample of origin using the barcode. Junctions with duplicate positions within a sample pair were removed.
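As a rough illustration of the barcode-based assignment described above, the following Python sketch attributes a read to a construct by the sequence that directly follows the secondary T-DNA primer, and then removes positions called for both constructs of a pair. The barcode and primer sequences are short placeholders, not the actual pUBC or pUBC-2 sequences, and the function names are hypothetical.

```python
# Placeholder barcodes: the real constructs differ in the sequence between
# the secondary TRANSGUIDE primer and the border nick; these are NOT the
# actual sequences.
BARCODES = {"pUBC": "ACGTAC", "pUBC-2": "TGCATG"}


def assign_construct(read, primer, barcodes=BARCODES):
    """Return the construct whose barcode follows the primer, or None."""
    if not read.startswith(primer):
        return None
    downstream = read[len(primer):]
    for construct, barcode in barcodes.items():
        if downstream.startswith(barcode):
            return construct
    return None


def remove_shared_positions(junctions):
    """Drop junctions whose position was called for both constructs.

    `junctions` is a list of (construct, chromosome, position) tuples
    from one sample pair.
    """
    constructs_at_position = {}
    for construct, chrom, pos in junctions:
        constructs_at_position.setdefault((chrom, pos), set()).add(construct)
    return [
        (construct, chrom, pos)
        for construct, chrom, pos in junctions
        if len(constructs_at_position[(chrom, pos)]) == 1
    ]
```

Exact prefix matching is used here for simplicity; a real pipeline would likely need to tolerate sequencing errors in the barcode.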
Junction validation. Using the same DNA samples as used for TRANSGUIDE, we performed up to two PCRs (nested) followed by Sanger sequencing (Macrogen Europe BV; Amsterdam, The Netherlands) to determine the correctness of the called junctions. Junctions were selected semi-randomly, making sure different types of junctions (filler/non-filler, intact/non-intact, etc.) were included. See Supplementary Table 1 for the primers that were used.
GUS staining. After co-cultivation, root explants were stained overnight in phosphate buffer (pH 7.3) containing 1 mM K3Fe(CN)6, 1 mM K4Fe(CN)6, 10 mM Na2EDTA, 0.1% SDS, 0.1% Triton X-100 and 2 mM X-gluc, and destained using 70% ethanol.
integration success) after transformation with pCAMBIA3301. Error bars indicate the standard error of the mean. c-g, Overlapping histograms showing the frequency of junctions with the indicated degree of micro-homology (c, d, f), or filler presence (e, g) for wt (yellow) or mutant (shades of blue for cNHEJ mutants ku70 and lig4, and light red for mre11) junctions. The medians (dashed lines), the number of observations (n), and shifts in the mutant distribution relative to wt (s) are indicated. h, Photographs of representative calli of various genotypes (wt, mre11, mre11 ku70) after transformation with pCAS9, grown on either selective or non-selective medium. i, Bar charts showing the average percentage of calli with shoot tissue after transformation with pCAMBIA3301. Error bars indicate the standard error of the mean. j, Photographs of representative calli of tdp2 and mre11 tdp2 mutants with two different genetic backgrounds (Ws and a Col-0 / Ws mix) after transformation with pCAMBIA3301, grown on either selective or non-selective medium. k, Overlapping histograms showing the frequency of junctions with the indicated degree of micro-homology for wt (yellow) or tdp2 mutant (cyan). One-sided Student's t-tests were performed to test for significant reductions in T-DNA integration efficiency (b, i); ns: p≥0.05, *: p<0.05, **: p<0.01, ***: p<0.001, na: not enough observations. Mutants were compared to the wt of the same genetic background, except for mutants with mixed background, which were compared to the Col-0 wt. One-sided Wilcoxon rank-sum tests were performed to find the direction and significance level of the shifts in homology and filler distributions (c-g, k); ns: p≥0.05, *: p<0.05, **: p<0.01, ***: p<0.001. Scale bars, 1 cm in a, h, j.
MRE11 and TDP2 are required for 5' attachment of T-DNA. a, Schematic to illustrate that subsequent to genomic capture of its 3' end (1.), 5' telomere repeat array (TRA)-containing T-DNAs can be resolved either via de novo telomere formation (2a.) or via regular RB capture (2b.). In case of telomere formation, loss of a part of the broken chromosome may ensue. b, Photographs of representative calli of various genotypes (wt, lig4, tdp2, mre11, mre11 lig4, mre11 tdp2) with three different genetic backgrounds (Col-0, Ws, or a mix of Col-0 and Ws) after transformation without TRA (pUBC, -TRA) or with TRA (pWY82, +TRA) grown on selective medium. Scale bar, 1 cm. c, d, Bar charts showing the average percentage of calli with shoot tissue (a measure of T-DNA integration success) after transformation with pUBC (c) or pWY82 (d). Error bars indicate the standard error of the mean. One-sided Student's t-tests were performed to test whether mutants had decreased integration compared to the corresponding wt (in the case of Col-0/Ws the Col-0 wt was used); ns: p≥0.05, *: p<0.05, **: p<0.01, ***: p<0.001. Mutants were compared with wt of the same genetic background, with the exception of mre11 lig4 and mre11 tdp2, which were compared to the wt of the Col-0 background. Averages are based on different independent experiments; numbers above the bars indicate the total numbers of calli that were examined. e, Density plots showing the relative frequency of LB junctions after transformation with pWY82 (+TRA) or pUBC (-TRA) along all chromosome arms, comparing wt (yellow) and mutants (other colors). Mutants were compared to wt of the same genetic background, with the exception of mre11 lig4 and mre11 tdp2, which were compared to the wt of the Col-0 background. 0% indicates centromeric position and 100% telomeric; n indicates the number of mutant