Sequence analysis of DNA A + T content and curvature
The eccDNA replicon is heavily punctuated with sharp changes in A + T and G + C content, which may imply biological function (18), including the presence of replication initiation sites (19) (Fig. 1). Overall, the eccDNA replicon is A + T biased, with an A + T content of 66%. A motif scan of the eccDNA replicon revealed a single exact match to the 17 bp Extended Autonomous Consensus Sequence (EACS), previously described in yeast and other eukaryotes (20) (Figs. 1 and 2). A 50 bp window surrounding the EACS sequence had an A + T content of 76%, which is characteristic of an origin of replication (Fig. 2). Just upstream of the EACS sequence, we found two additional regions with elevated A + T content that are characteristic of DNA unwinding elements (DUEs) (Fig. 2). DUE 1 is 43 bp long with 73% A + T content, and DUE 2 is 41 bp long with 62% A + T content (Fig. 2). A nucleotide comparison of DUE 1 and DUE 2 revealed no sequence similarity beyond their elevated A + T content and a shared 6 bp AATAAA motif (Fig. 2). DNA curvature modeling of a 256 bp window containing the EACS and the two predicted DUEs (287,484–287,739) revealed DNA bending, which is characteristic of reported origins of replication (Fig. 3 and Table S1). Curvature is consistent from the beginning of DUE 1 to the end of DUE 2, with two sharper bends immediately before and after the EACS motif, indicative of low helical stability (Fig. 3). Interestingly, this predicted origin of replication lies within the eccDNA replicon gene AP_R.00g000493, which contains a NAC domain. NACs are a large family of plant-specific transcription factors whose functions include apical shoot development (21), secondary wall formation (22), and responses to abiotic and biotic stresses (23). Notably, searches of the genomes of the closely related species waterhemp (A. tuberculatus) and grain amaranth (A. hypochondriacus) found no annotated ortholog of this NAC gene.
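The windowed A + T calculations and the exact motif scan described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names, the window and threshold defaults, and the EACS consensus string (assumed here to be the IUPAC pattern WWWWTTTAYRTTTWGTT) are hypothetical and should be replaced with the values actually used. The scan reports exact IUPAC matches on both strands, and `elevated_at_regions` merges overlapping high-A + T windows as a rough way to flag DUE-like candidates.

```python
import re
from typing import Iterator

# IUPAC nucleotide ambiguity codes as regex character classes (DNA only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# ASSUMPTION: 17 bp EACS consensus in IUPAC notation; substitute the exact
# consensus used in the original motif scan.
EACS = "WWWWTTTAYRTTTWGTT"

# Complement table that also swaps IUPAC ambiguity codes (e.g. R <-> Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def at_content(seq: str) -> float:
    """Fraction of A + T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / acgt


def sliding_at(seq: str, window: int = 50, step: int = 1) -> Iterator[tuple[int, float]]:
    """Yield (0-based start, A + T fraction) for each full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, at_content(seq[start:start + window])


def elevated_at_regions(seq: str, window: int = 40, threshold: float = 0.6) -> list[tuple[int, int]]:
    """Merge overlapping windows with A + T >= threshold into half-open (start, end) intervals.

    The window and threshold defaults are hypothetical, not values from the study.
    """
    regions: list[tuple[int, int]] = []
    for start, frac in sliding_at(seq, window):
        if frac >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


def reverse_complement(seq: str) -> str:
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return sorted (start, strand) for every exact IUPAC match on either strand.

    Starts are 0-based positions on the forward strand.
    """
    pattern = "".join(IUPAC[c] for c in motif.upper())
    # Zero-width lookahead so overlapping matches are all reported.
    regex = re.compile(f"(?=({pattern}))")
    seq = seq.upper()
    hits = [(m.start(), "+") for m in regex.finditer(seq)]
    for m in regex.finditer(reverse_complement(seq)):
        # A reverse-strand match at p covers forward positions [n - p - L, n - p).
        hits.append((len(seq) - m.start() - len(motif), "-"))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGGCCC" + "AAAATTTATATTTAGTT" + "GGGCCC"  # hypothetical toy sequence
    print(scan_motif(toy, EACS))
    print(round(at_content(toy), 2))
```

Coordinates are 0-based and relative to the sequence passed in; a genomic offset would need to be added to map hits back onto the replicon.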
However, orthologs were found in Spinacia oleracea, Beta vulgaris, and Chenopodium quinoa, the last of which contained two copies. A nucleotide alignment of these sequences produced an overall pairwise identity of 63%, with 756 identical sites, and an overall A + T content of 60% (Figure S1). In each of the three species, both DUE sequences contain insertions and deletions (indels) and single nucleotide variants (SNVs) relative to the eccDNA replicon (Figure S1). The EACS sequence was more conserved than the DUEs among the other species, with seven variable positions across the 17 nt consensus sequence. In addition, the orthologous sequences from the other species were less A + T rich than the eccDNA replicon (Figure S1).
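Alignment summary statistics of this kind (mean pairwise identity, fully identical sites, overall A + T content, and variable positions within a window such as the 17 nt EACS) can be computed roughly as sketched below. Alignment programs differ in how they treat gaps, so this sketch makes its own assumptions: columns with a gap in either sequence are skipped when scoring a pair, and a gap counts as a distinct state when counting variable positions. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def _validate(aln: list[str]) -> list[str]:
    """Upper-case the sequences and check they share one aligned length."""
    aln = [s.upper() for s in aln]
    if len({len(s) for s in aln}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return aln


def pairwise_identity(aln: list[str]) -> float:
    """Mean percent identity over all sequence pairs.

    ASSUMPTION: columns where either sequence has a gap ('-') are skipped.
    """
    aln = _validate(aln)
    scores = []
    for a, b in combinations(aln, 2):
        compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if compared:
            same = sum(x == y for x, y in compared)
            scores.append(100.0 * same / len(compared))
    return sum(scores) / len(scores) if scores else 0.0


def identical_sites(aln: list[str]) -> int:
    """Number of columns where every sequence has the same non-gap base."""
    return sum(1 for col in zip(*_validate(aln)) if "-" not in col and len(set(col)) == 1)


def variable_positions(aln: list[str], start: int, end: int) -> int:
    """Columns in [start, end) where sequences differ; a gap counts as a state."""
    columns = list(zip(*_validate(aln)))[start:end]
    return sum(1 for col in columns if len(set(col)) > 1)


def alignment_at_content(aln: list[str]) -> float:
    """Percent A + T over all unambiguous, non-gap bases in the alignment."""
    bases = "".join(_validate(aln))
    acgt = sum(bases.count(b) for b in "ACGT")
    return 100.0 * (bases.count("A") + bases.count("T")) / acgt if acgt else 0.0


if __name__ == "__main__":
    toy = ["ATGC-A", "ATGCTA", "AAGC-A"]  # hypothetical three-sequence alignment
    print(round(pairwise_identity(toy), 1), identical_sites(toy))
```

Passing the alignment column range that spans the EACS to `variable_positions` gives a count comparable to the seven variable positions reported above, provided the same gap convention is used.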
Cloning and functional verification of EACS activity in yeast
We cloned ±1 kb regions containing the putative origin of replication into a selectable, ARS-less yeast vector and observed dividing colonies, verifying that the eccDNA replicon ARS sequence is functional and can facilitate DNA replication in yeast (Table S1 and Figure S2). Relative to the control ARS, recombinant yeast carrying the eccDNA replicon ARS grew much more slowly and produced fewer colonies, suggesting that additional cis-elements and trans-acting factors may contribute to replication efficiency in the plant (19) (Figure S2).