Sequence analysis of DNA A+T content and curvature
The eccDNA replicon is heavily punctuated with sharp changes in A+T and G+C content, which may imply biological function (18), including replication initiation sites (19) (Figure 1). Overall, the eccDNA replicon is biased toward A+T, at 66%. A motif scan of the eccDNA replicon revealed a single exact match to the Extended Autonomous Consensus Sequence (EACS, 17bp), previously described in yeast and other eukaryotes (20) (Figure 1 and Figure S1). A 50bp window surrounding the EACS sequence had an A+T content of 76%, which is characteristically high for an origin of replication (Figure S1). Just upstream of the EACS sequence, two additional regions with elevated A+T content were found that are characteristic of DNA unwinding elements (DUEs) (Figure S1). DUE 1 is 43bp in length with 73% A+T content, and DUE 2 is 41bp in length with 62% A+T content (Figure S1). Nucleotide comparison of DUE 1 and DUE 2 did not reveal sequence similarity beyond the consistently elevated A+T content and a shared 6bp AATAAA motif (Figure S1). DNA curvature modeling of a 256bp window containing the EACS and the two predicted DUEs (287,484 – 287,739) revealed DNA bending, which is characteristic of reported origins of replication (Figure 3 and Table S1). Curvature is consistent from the beginning of DUE 1 to the end of DUE 2, with two sharper bends immediately before and after the EACS motif, which is indicative of low helical stability (Figure 2). Interestingly, this predicted origin of replication is contained within the eccDNA replicon gene, AP_R.00g000493, that contains a NAC domain. NACs are a large family of plant-specific transcription factors whose functions include apical shoot development (21), secondary wall formation (22), and responses to abiotic and biotic pressures (23). The closely related genomes of waterhemp (A. tuberculatus) and grain amaranth (A. hypochondriacus) were searched, and neither contains an annotated ortholog of the NAC gene.
However, orthologs were found in Spinacia oleracea, Beta vulgaris, and Chenopodium quinoa, which contained two copies. A nucleotide alignment of these sequences produced an overall pairwise identity of 63% with 756 identical sites, and an overall A+T content of 60% (Figure S2). Both DUE sequences contain insertions and deletions (indels) as well as single nucleotide variants (SNVs) in each of the 3 species, relative to the eccDNA replicon (Figure S2). Interestingly, the EACS sequence was more conserved among the other species, with seven variant positions across the 17 nucleotide consensus sequence. In addition, the sequences from the other species were less A+T rich than the eccDNA replicon (Figure S2).
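The A+T content and motif scan steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names (`at_content`, `sliding_at`, `find_motif`) and the toy sequences are hypothetical, and only the forward strand is scanned. The degenerate pattern in the usage note is the 11bp core ACS as commonly written in IUPAC codes, used here only as an example input.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

def at_content(seq):
    """Return the A+T content of seq as a percentage (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)

def sliding_at(seq, window=50, step=1):
    """Return (start, A+T %) for each full window along seq (0-based starts)."""
    return [
        (i, at_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]

def find_motif(seq, consensus):
    """Return 0-based start positions of forward-strand matches to an IUPAC consensus.

    A lookahead is used so that overlapping matches are also reported.
    """
    body = "".join(IUPAC[c] for c in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

For example, `find_motif(toy_seq, "AATAAA")` reports exact occurrences of the 6bp motif shared by the two DUEs, and `find_motif(toy_seq, "WTTTAYRTTTW")` scans for a degenerate consensus. A full analysis would also scan the reverse complement and would use the study's own consensus definition.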
Cloning and functional verification of EACS activity in yeast
By cloning +/- 1kb regions containing the putative origin of replication into a selectable ARS-less yeast vector, we observed dividing colonies, verifying that the eccDNA replicon ARS sequence is functional and can facilitate DNA replication in yeast (Table S1, Figure 3, and Figure S3). Recombinant yeast grew much more slowly and produced fewer colonies on plates with the eccDNA replicon ARS than with the control ARS, suggesting a possible role of cis-elements and trans-factors for efficiency in the plant (19) (Figure S3).
Transformation efficiencies for eccDNA ARS-containing plasmids are approximately 300-400 CFU/μg, as compared to 2.15 × 10^6 CFU/μg for pRS315 (Figure 3). Transformation efficiencies of the plasmids containing the eccDNA ARS do not differ significantly from one another, but differ significantly from those of the pRS305 and pRS315 plasmids, as determined by a two-tailed t-test with a 95% confidence level (Figure S4). To validate plasmid retention and stability, cells were passaged on selective plates three times. Passaged cells were used in a colony PCR to validate retention of the ARS sequence (Figure 3). The presence of bands in cells transformed with plasmids containing CS-ARS1, and their absence in wild-type cells and in cells transformed with pRS315, indicates that plasmid replication is attributable to the eccDNA ARS1.
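As an illustration of the arithmetic behind these comparisons, the sketch below computes transformation efficiency in CFU/μg and a two-sample t statistic. It rests on several assumptions: the function names and all colony counts are hypothetical, and the unequal-variance (Welch) form of the t-test is shown because the variant used in this study is not stated.

```python
from math import sqrt
from statistics import mean, variance

def cfu_per_ug(colonies, dna_ug, fraction_plated=1.0):
    """Transformation efficiency: colonies per microgram of DNA actually plated."""
    return colonies / (dna_ug * fraction_plated)

def welch_t(a, b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t-test.

    The degrees of freedom follow the Welch-Satterthwaite approximation.
    A two-tailed p-value would be read from the t distribution with this df.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical replicate colony counts, NOT data from this study.
ars_a = [cfu_per_ug(c, dna_ug=0.5) for c in (150, 175, 200)]
ars_b = [cfu_per_ug(c, dna_ug=0.5) for c in (160, 180, 190)]
t_stat, dof = welch_t(ars_a, ars_b)
```

At a 95% confidence level, the two groups would be called significantly different when the two-tailed p-value for `t_stat` at `dof` degrees of freedom falls below 0.05.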