Homoeologous evolution of the allotetraploid genome of Poa annua L.

doi:10.21203/rs.3.rs-2729084/v1


Research Article


This work is licensed under a CC BY 4.0 License

Journal Publication

published 26 Jun, 2023

Read the published version in BMC Genomics →


Poa annua (annual bluegrass) is an allotetraploid grass and one of the most widely dispersed plant species on earth. Here, we report the chromosome-scale genome assemblies of P. annua’s diploid progenitors, Poa infirma and Poa supina. We find that the diploids diverged from their common ancestor 5.5–6.3 million years ago and hybridized to form P. annua ≤ 50,000 years ago. The diploid genomes are similar in chromosome structure and most notably distinguished by the divergent evolutionary histories of their transposable elements, leading to a 1.7× difference in genome size. We show that P. annua’s smaller (B) subgenome is preferentially accumulating genes and that its genes are more highly expressed. Whole-genome resequencing of several additional P. annua ecotypes revealed large-scale chromosomal rearrangements characterized by extensive TE-downsizing and evidence supporting the Genome Balance Hypothesis. The findings and genomic resources presented here will enable the development of homoeolog-specific markers for accelerated weed science and turfgrass breeding.

Poa annua L. is an allotetraploid (2n = 4x = 28) grass and one of the most ubiquitous plant species on earth. It has seeding populations on all seven continents and grows in 96% of cities around the world (Fig. 1b; Hemp, 2008; Aronson et al., 2014; Chwedorzewska et al., 2015). Poa annua has remarkable phenotypic variability with tens, if not hundreds, of subspecies, varieties, and morphologies recorded, leading some authors to characterize it as a species aggregate rather than a bona fide species (Gibeault, 1971; Vargas & Turgeon, 2003; Nosov et al., 2019). Poa annua is a problematic weed in urban, agricultural, and turfgrass ecosystems, partially due to its evolved resistance to more than 10 different herbicide modes of action (Heap, 2021). Despite its unfavorable reputation, P. annua has developed an agronomic niche on golf course putting greens, where it often invades and outcompetes turfgrass species that were bred to thrive under the intensive management conditions of 2–3 mm mowing heights (Lush, 1988). Some golf course superintendents view P. annua as an elite putting surface and allow it to slowly envelop the entire putting green. In fact, seven of the top ten golf courses in the United States utilize P. annua putting greens (top100golfcourses.com).

Tetraploid P. annua originated from an interspecific cross between diploid species, Poa infirma Kunth and Poa supina Schrader (Fig. 1a; Nannfeld, 1937; Soreng et al., 2010; Mao and Huff, 2012; Chen et al., 2016). The parental diploids can hybridize at low frequencies (0.20%) and the offspring are amphihaploid (i.e., plants that contain a single set of unpaired chromosomes for each subgenome; Darmency and Gasquez, 1997). Amphihaploid plants (2n = 14) are sterile at first but have been observed to spontaneously transition to fertile allotetraploids (2n = 28; Tutin, 1957), suggesting that P. annua’s path to polyploidy may have involved mitotic (somatic doubling) rather than meiotic error (unreduced gametes). Interestingly, amphihaploids are frequently found on golf course putting greens (Hovin, 1958), suggesting that polyploid P. annua can return to amphihaploidy (Ravi and Chan, 2010; Dunwell, 2010) in certain environmental conditions and may oscillate between the two cytotypes.

The parental diploids of P. annua are restricted to their niches where P. infirma thrives in arid Mediterranean climates and P. supina prefers the damp alpine regions of central Europe. The merger of their genomes into a single polyploid nucleus has led to exceptional transgressive versatility and niche expansion and likely occurred during the Quaternary glaciation (0–2.4 million years ago) when the diploids would have been forced out of their preferred habitats and may have come into close contact (Fig. 1b; Tutin, 1957). Allopolyploids like P. annua are typically subject to meiotic irregularities between parental chromosomes (homoeologs) and genomic instability as the cell works to reconcile the sudden presence of an additional genome (Stebbins, 1947; McClintock, 1984). Most allopolyploids eventually establish a ‘dominant’ subgenome, with greater gene content and higher expression of homoeologs as the species returns to a diploid-like state (diploidization; Thomas et al., 2006; Schnable et al., 2011; Wang et al., 2011; Edger et al., 2019). Here, we leverage the genomes of the diploid progenitors to accurately assign P. annua homoeologs to their appropriate parental origin. Using this methodology, we unravel P. annua’s polyploid evolutionary history with the goal to better understand its phenotypic plasticity and provide a valuable genetic resource for turfgrass breeders and weed scientists.

Genome assembly and annotation

The P. infirma (2n = 2x = 14) and P. supina (2n = 2x = 14) genomes each assembled into seven pseudomolecules that represented 96% of the estimated genome sizes by k-mer analysis and contained > 97% of the 1,614 core conserved orthologs in the Embryophyta OrthoDB (v10), supporting high-quality chromosome-level genome assemblies for both species (see methods; Supplementary Table 1; Supplementary Fig. 1). The chromosome-level assemblies represent the collapsed haploid (unphased) genomes for each species (n = 7). Chromosomes were named according to the nomenclature established by Robbins et al. (2022), where P. infirma contributes the ‘A’ subgenome to P. annua and P. supina contributes the ‘B’ subgenome. A prefix designates the species of origin, such that P. infirma chromosomes are ‘PiA’, P. supina’s are ‘PsB’, and P. annua’s are either ‘PaA’ or ‘PaB’ (Supplementary Fig. 2a).

Repetitive DNA and transposable elements (TEs) were annotated using custom-built repeat libraries and included class I retrotransposons as well as class II DNA transposons. Genes were predicted using the BRAKER2 pipeline on the repeat-masked genome assemblies (Brůna et al., 2021). Full-length IsoSeq transcripts from each species were incorporated with protein evidence from Arabidopsis and related grasses for ab initio gene prediction. In addition, we identified 14,743 long noncoding RNAs (lncRNAs) in the P. infirma genome and 13,963 in the P. supina genome. Poa annua contained approximately the additive number of lncRNAs of its diploid parents, with fewer lncRNAs in the A (infirma) subgenome (14,394) and more lncRNAs in the B (supina) subgenome (15,057).

Genome Characteristics And Synteny

The P. infirma genome is 1,125 Mb in length, which makes it 489 Mb (1.77×) larger than the P. supina genome (636 Mb), despite the two being closely related species and sister taxa within the section Ochlopoa. Most (76%) of the excess in genome size is due to orthologous chromosomes 1 and 2 being a combined 374 Mb larger in the P. infirma genome (Fig. 2a, b; Supplementary Fig. 3). The subgenomes of P. annua are similar in composition to the genomes of the diploid progenitors, with the A subgenome (1,116 Mb) being 1% shorter than the P. infirma genome and the B subgenome (662 Mb) being 4% larger than the P. supina genome (Supplementary Fig. 2c). The pattern is similar at the gene level, where the A (infirma) subgenome had 6% fewer genes than P. infirma (37,123 and 39,420, respectively), and the B (supina) subgenome had 4% more genes than P. supina (39,536 and 37,935, respectively). Overall, the P. annua reference genome is 99% of the length of its progenitor genomes and contains 99% of its parental genes, most of which (95%) are represented as colinear syntenic blocks (Fig. 2c; Supplementary Fig. 2c; Supplementary Fig. 4).
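The size comparison above can be reproduced with simple arithmetic. The following is a minimal sketch that uses only the genome sizes quoted in the text; the function and variable names are illustrative and are not part of the original analysis.

```python
# Genome sizes (Mb) as reported in the text.
P_INFIRMA_MB = 1125
P_SUPINA_MB = 636
# Combined excess (Mb) of chromosomes 1 and 2 in P. infirma, as reported.
CHR1_CHR2_EXCESS_MB = 374


def size_comparison(larger_mb, smaller_mb, partial_excess_mb):
    """Return (absolute difference, fold difference, share of difference explained)."""
    difference = larger_mb - smaller_mb
    fold = larger_mb / smaller_mb
    share = partial_excess_mb / difference
    return difference, fold, share


difference_mb, fold_difference, chr12_share = size_comparison(
    P_INFIRMA_MB, P_SUPINA_MB, CHR1_CHR2_EXCESS_MB
)

print(f"difference: {difference_mb} Mb")
print(f"fold difference: {fold_difference:.2f}x")
print(f"share from chromosomes 1 and 2: {chr12_share:.0%}")
```

With the reported values, this gives a 489 Mb difference, a 1.77-fold ratio, and a 76% share attributable to chromosomes 1 and 2.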

Poa infirma and P. supina chromosomes were 81% and 65% repetitive, respectively. These percentages amount to 489 Mb (1.77×) more repetitive DNA in P. infirma than P. supina, suggesting that TEs have played an outsized role in the disparate genome sizes between the two diploids, particularly on chromosomes 1 and 2. The majority of annotated repetitive sequences were classified as Gypsy and Copia long-terminal-repeat (LTR) retrotransposons (598 Mb (53%) of the P. infirma genome and 241 Mb (38%) of the P. supina genome). The sequence length of the non-repetitive portions in each diploid is very similar, totaling 211 Mb in the P. infirma genome and 225 Mb in the P. supina genome. The subgenomes of P. annua have slightly less repetitive DNA than their corresponding diploid progenitor genomes, with 7% less repetitive DNA in the A (infirma) subgenome and 2% less in the B (supina) subgenome.

Nucleotide Divergence And Molecular Dating

Genomic similarity can be assessed at the nucleotide level using average nucleotide identity (ANI), a useful indicator of genetic divergence between aligned sequences. The ANI between P. infirma (A) and P. supina (B) orthologous chromosomes is 95%. The ANI when comparing P. annua chromosomes to their corresponding parental sequences was 98% (i.e., PaA to PiA and PaB to PsB alignments; Supplementary Fig. 2b). To estimate divergence and hybridization times, we calculated the synonymous substitution rate (Ks) between homologous and homoeologous gene pairs. Gene pairs between P. infirma (A) and P. supina (B) have a peak at Ks = 0.065, which was used to estimate the date the two species diverged from their common ancestor. Ks between P. annua’s A subgenome and P. infirma (and also between P. annua’s B subgenome and P. supina) was very close to zero and was used to estimate the date that the two progenitor diploids hybridized to form P. annua. With a Poaceae mutation rate of 5.76174 × 10^−9 substitutions per synonymous site per year (De La Torre et al., 2017), our Ks values suggest that the diploids diverged from their common ancestor 5.5–6.3 million years ago (Mya) and hybridized to form polyploid P. annua 0–600,000 years ago (Supplementary Fig. 5). The most recent of the ancestral whole-genome duplication (WGD) events in the Poaceae is rho (ρ), which pre-dates the divergence of the BOP (C3 grasses) and PACMAD (C4 grasses) clades (McKain et al., 2016). Syntenic gene pairs from rho have a Ks = 1 in our Poa species, which corresponds to a date of 87 Mya; this largely overlaps with the reported rho WGD date of 85–97 Mya and helps to corroborate our methodology (Supplementary Fig. 6; Clark and Donoghue, 2018).
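The conversion from Ks to calendar time can be sketched as follows. This is a minimal illustration that assumes the common relationship T = Ks / (2r), where r is the per-year synonymous substitution rate; the text does not state the exact formula, but this one reproduces the reported dates. The function name is hypothetical.

```python
# Substitution rate quoted in the text
# (substitutions per synonymous site per year).
POACEAE_RATE = 5.76174e-9


def ks_to_years(ks, rate=POACEAE_RATE):
    """Convert a Ks value to a divergence time in years.

    Assumes T = Ks / (2 * rate): substitutions accumulate independently
    along both lineages after the split, hence the factor of two.
    """
    return ks / (2.0 * rate)


# Peak Ks between P. infirma and P. supina gene pairs, as reported.
divergence_mya = ks_to_years(0.065) / 1e6
# Ks of syntenic gene pairs retained from the rho duplication, as reported.
rho_mya = ks_to_years(1.0) / 1e6

print(f"diploid divergence: {divergence_mya:.2f} Mya")
print(f"rho WGD: {rho_mya:.1f} Mya")
```

Under this assumption, Ks = 0.065 falls inside the reported 5.5–6.3 Mya window and Ks = 1 rounds to the reported 87 Mya.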

To further evaluate the date of hybridization and explore the 1.7-fold difference in genome size between A and B, we examined the mutation rates between pairs of LTRs. LTRs multiply by escaping host silencing and ‘burst’ into activity for a short time before being re-silenced (McCue et al., 2013; Sigman and Slotkin, 2016). Repeats of an LTR are identical when inserted, owing to their copy-and-paste mode of transposition (vonHoldt et al., 2012). Mutations between an ancestral LTR and its transposed derivative are a reflection of its evolutionary divergence. Our analysis suggests that the A genome experienced a burst in proliferation of LTRs that climaxed ~ 340,000 years ago, while bursts of LTRs in the B genome occurred more recently, with the peak rate of transposition dating back to ~ 50,000 years ago (Fig. 3a). Because the density of LTR insertion times in P. infirma and P. supina closely mirrors that of P. annua’s A and B subgenomes, it is likely that those bursts occurred during the speciation of the diploids and prior to the hybridization event that formed P. annua. Thus, we suggest a narrower timeframe for P. annua hybridization at 0–50,000 years ago. We expect that the 489 Mb difference in TE content and genome size between P. infirma and P. supina is greatly impacted by the two species’ varying abilities to silence retrotransposons.

Single-gene Duplications

In addition to the WGD that formed P. annua, smaller scale duplications can also accompany polyploidy and are collectively referred to as single-gene duplications (Panchy et al., 2016). We identified 2,008 tandemly duplicated and 1,815 proximally duplicated genes in the P. infirma genome. These numbers are similar to those of P. supina, with 1,940 tandem and 1,914 proximal duplications. Compared with its progenitor genomes, allotetraploid P. annua has slightly fewer single-gene duplications in the A (infirma) subgenome (1,806 tandem and 1,736 proximal duplicated genes), and slightly more in the B (supina) subgenome (1,999 tandem and 2,160 proximal). Transposed duplications are another type of single-gene duplication and are thought to occur extensively after WGD (Zhao et al., 1998; Freeling et al., 2009; Qiao et al., 2019). We used the progenitor P. infirma and P. supina genomes as outgroups to identify pairs of transposed genes that were mobilized after the diploids hybridized to form P. annua (post-polyploidy). We found more transposed duplications in P. annua’s B subgenome than in its A subgenome (5,917 and 3,438 transposed genes, respectively; 63% of the total were in B). This result is similar to the pattern observed with proximal and tandem duplications and may point to a post-polyploidy expansion of the B subgenome and contraction of the A subgenome within P. annua.

Interestingly, 74% of transposed duplications in the B subgenome remained within B, while 46% of A duplications remained within the A subgenome, suggesting that inter-subgenomic duplications preferentially move from the A (infirma) subgenome and integrate into B (supina; Fig. 3b; χ² test, P < 0.0001). Inter-subgenomic transposed duplications are enriched for functions associated with Gypsy and Copia-type LTRs, suggesting that they are heavily involved with retrotransposon activity. Taken together with our molecular dating of LTRs, we expect that the observed bias in inter-subgenome transpositions reflects the two subgenomes’ uneven abilities to silence retrotransposons and is a continuation of the TE momentum that was established during the independent evolution of the diploids. The observed bias in inter-subgenome transpositions may point to a trans effect, where retrotransposons diffuse from the subgenome with higher TE content to the subgenome with lower TE content.

Homoeologous Exchanges

Crossing over between ancestrally related chromosomes is a common occurrence in newly formed allopolyploids, and such events are referred to as homoeologous exchanges (HEs; Gaeta and Pires, 2010; Mason and Wendel, 2020). We assessed HEs in P. annua (i.e., A segments in the B subgenome and B segments in the A subgenome) using the parental sequences as a guide to assign P. annua reads as being derived from either P. infirma or P. supina. We detected 1,299 homoeologous exchanges in the P. annua genome (Fig. 4; 657 A segments in the B subgenome and 642 B segments in the A subgenome). Almost 2% of P. annua gene annotations are within HEs. Of those, 68% are A to B, suggesting that there may be an asymmetric exchange of genic sequence between the two subgenomes (823 A genes in the B subgenome vs 385 B genes in the A subgenome; χ² test, P < 0.0001). The average length of an HE was 16 kb for A to B subgenome HEs and 13 kb for B to A subgenome HEs. Interestingly, 1.6% of the B subgenome consists of A sequences (10.4 Mb), while 0.7% of the A subgenome is B sequences (8.3 Mb). A to B HEs were most enriched for genes involved in gibberellin 3-beta-dioxygenase activity, while B to A HEs were enriched in genes involved in telomere maintenance. The largest HE is a 2.2 Mb Pa7A to Pa7B exchange containing 103 genes (Fig. 4c). Three of P. annua’s 26 annotated histone H3-K4 methylation genes reside in this 2.2 Mb HE. The differences in HEs between subgenomes point to a visible but tenuous biased accumulation of genes in the B subgenome.
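The asymmetry in genic HEs can be checked with a short calculation. The sketch below assumes a χ² goodness-of-fit test against an equal (50:50) split between directions; the text does not specify the null hypothesis used, so this is one plausible reading rather than the original procedure. The function name is illustrative.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Chi-square goodness-of-fit test of two counts against a 50:50 split.

    Returns (statistic, p_value) for one degree of freedom.
    """
    total = count_a + count_b
    expected = total / 2.0
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # With one degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Gene counts within HEs, as reported in the text.
a_to_b_genes = 823
b_to_a_genes = 385

statistic, p_value = chi_square_equal_split(a_to_b_genes, b_to_a_genes)
a_to_b_share = a_to_b_genes / (a_to_b_genes + b_to_a_genes)

print(f"A-to-B share: {a_to_b_share:.0%}")
print(f"chi-square statistic: {statistic:.2f}")
print(f"P < 0.0001: {p_value < 0.0001}")
```

Under the equal-split assumption, the reported counts give a 68% A-to-B share and a P value far below 0.0001, in line with the values quoted above.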

Fractionation Bias

Gene loss (fractionation) occurs via intrachromosomal recombination resulting in short deletions and is a typical behavior of ancient allopolyploids (Cheng et al., 2018). We compared the A and B subgenomes of P. annua to the A and B genomes of its progenitors and identified consistent gene retention (97%) across all chromosomes, likely reflecting the recent timescale of the P. annua WGD event (Supplementary Fig. 7). Although this result seems to clash with our observations at the single-gene and HE levels, it is important to note the distinction between these methodologies. The fractionation analysis used here (Joyce et al., 2016) calculates the number of genes retained in P. annua with respect to the syntenic sequences in the progenitor genomes. Consequently, single-gene duplications would only impact our fractionation analysis if they had duplicated in the progenitor genome but not in P. annua. The impact of HEs on our fractionation analysis is relatively small, since there are only 1,208 genes within HEs and most (~ 61%) have an ancestrally syntenic ortholog in the homoeologous subgenome and therefore would not impact fractionation values.

Homoeolog Expression And Subgenome Dominance

P. annua is typically described as having two distinct biotypes: plants with wild-type morphology and plants with dwarf-type morphology (Fig. 1a; sometimes referred to as annual- and perennial-types, respectively; Heide, 2001). Plants with wild-type habit resemble P. infirma, while the dwarf-types more closely resemble P. supina. Broad phenotypic plasticity has been reported where environmental factors such as animal disturbance, intense wind, soil properties, temperature, elevation, and even golf course-style management can influence P. annua to preferentially favor one biotype over the other (Williams et al., 2018). The two opposed morphologies likely play an important role in P. annua’s ability to infiltrate and persist across a spectrum of climatic conditions (Law et al., 1976).

Shimizu-Inatsugi et al. (2017) introduced the ‘polyploid plasticity hypothesis’, stating that an allopolyploid species might differentially utilize the expression profiles of its progenitor genomes depending on the environment. With agronomic and turfgrass breeding in mind, we aimed to test the hypothesis that P. annua might preferentially express genes from the B (supina) subgenome when exposed to mowing stress and the A (infirma) subgenome when allowed to grow in the absence of mowing stress (unmowed). We vegetatively propagated dwarf- and wild-type P. annua plants and subjected one clone to mowing stress for three months, while leaving the other clone unmowed for three months. We observed no correlation in the expression profiles between biotypes (dwarf or wild) across our biological replicates (Supplementary Fig. 8; Supplementary Fig. 9), indicating that dwarf- and wild-types exhibit similar transcriptional behavior under both mowed and unmowed conditions. After removing biotypes as a variable, we identified 5,505 and 6,400 differentially expressed pairs of homoeologs in our unmowed and mowed comparisons, respectively. We found that both mowed and unmowed plants showed a homoeolog expression bias favoring the B subgenome (Wilcoxon test: p = 0.001 and p = 0.0008, respectively), indicating that P. annua preferentially utilizes B (supina) genes regardless of mowing stress (Fig. 5). Although P. annua’s B subgenome expression bias is statistically significant in both treatment comparisons, the bias is not as evident as reported in other neo-allopolyploids (Flagel et al., 2008; Edger et al., 2017; Sigel et al., 2019; Bird et al., 2020), likely reflecting the recent timescale of the hybridization but perhaps also pointing to a more equitable relationship between P. annua’s subgenomes where primary metabolic function is partitioned across pairs of homoeologs (Supplementary Fig. 10).
Only chromosomes one, four, and six showed consistent expression bias toward B homoeologs, suggesting that these three chromosomes contribute disproportionately to homoeolog expression bias at the whole-genome level (Fig. 5). Thus, we conclude that, counter to the polyploid plasticity hypothesis, P. annua utilizes genes from both subgenomes with modest homoeolog expression bias favoring B (supina) genes irrespective of our environmental treatments.
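Homoeolog expression bias of the kind described above is often summarized per gene pair as a log2 ratio of B to A expression. The sketch below illustrates that idea only; it does not implement the Wilcoxon test or the differential expression pipeline used in this study. The expression values, the pseudocount, and the twofold threshold are all hypothetical choices made for illustration.

```python
import math


def log2_ratio(b_expr, a_expr, pseudocount=1.0):
    """log2(B / A) expression ratio; the pseudocount avoids division by zero."""
    return math.log2((b_expr + pseudocount) / (a_expr + pseudocount))


def summarize_bias(pairs, threshold=1.0):
    """Classify (A expression, B expression) homoeolog pairs by direction of bias.

    A pair is called biased when |log2(B / A)| >= threshold
    (threshold=1.0 corresponds to a twofold difference).
    """
    counts = {"B-biased": 0, "A-biased": 0, "balanced": 0}
    for a_expr, b_expr in pairs:
        ratio = log2_ratio(b_expr, a_expr)
        if ratio >= threshold:
            counts["B-biased"] += 1
        elif ratio <= -threshold:
            counts["A-biased"] += 1
        else:
            counts["balanced"] += 1
    return counts


# Hypothetical normalized expression values for five homoeolog pairs: (A, B).
example_pairs = [(10.0, 45.0), (30.0, 28.0), (80.0, 12.0), (5.0, 23.0), (15.0, 16.0)]
bias_counts = summarize_bias(example_pairs)
print(bias_counts)
```

A genome-wide bias would then appear as an excess of B-biased over A-biased pairs, which could be tested with a paired nonparametric test on the per-pair ratios.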

In addition to homoeolog expression analysis, we also used our transcriptional data to compare gene expression between pairs of recently transposed gene duplications that were identified during our analysis of single-gene duplications. We identified 973 pairs of transposed genes as being differentially regulated between their novel and ancestral copies. Of those, 847 (87%) were upregulated in the novel copy and most were A to B transpositions.

Whole-genome Resequencing And Large-scale Chromosomal Modifications

Homoeologous exchanges and bursts of activity in transposable elements contribute to genomic instability in polyploids but do not provide a satisfying explanation for the reported 80% variation in DNA content between P. annua ecotypes (Koshy, 1968; Mowforth and Grime, 1989). To explore intraspecific variation in P. annua at the whole-chromosome and DNA sequence level, we re-sequenced 13 geographically distinct genotypes and two additional elite breeding genotypes. Together, the 15 samples represent nine countries and four continents (Supplementary Fig. 11). The Illumina reads were aligned to the P. annua reference genome with a depth of coverage ranging between 13–26×. More than 99% of all reads mapped to the P. annua reference genome. SNP density across a 1 Mb sliding window showed large variability in sequence divergence within subgenomes, suggesting that there may have been multiple hybrid origins (Supplementary Fig. 12). Of the 76,541 gene annotations in the reference, we found that 7,808 were absent (dispensable) from at least one of the 15 samples, leaving 68,733 ‘core’ genes approximately evenly split between subgenomes (Supplementary Fig. 13; 52% of core genes were from the B subgenome). Dispensable genes were enriched for function in RNA-mediated transposon integration, suggesting that retrotransposons are actively proliferating in the species in a genotype-specific manner. In addition to core and dispensable genes, we used the diploid genomes to identify HEs and determine the parental origin for P. annua homoeologs across all 15 samples. There were 5,217 genes within HEs in at least one sample. Most (60%) genes within HEs were transferred from A to B subgenome. A to B HEs were enriched for functions associated with primary metabolism, while B to A HEs were enriched for functions associated with telomere maintenance, continuing to point toward a biased accumulation of genes in the B (supina) subgenome (Fig. 4b; χ² test, P < 0.0001).
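The core versus dispensable classification can be illustrated with a small presence/absence table. The sketch below assumes that presence calls per gene and sample have already been made (the criteria for calling a gene absent are not described in this section); the gene and sample names are hypothetical.

```python
def split_core_dispensable(presence):
    """Split genes into core (present in every sample) and dispensable.

    `presence` maps gene ID -> {sample name: True if the gene is present}.
    A gene absent from at least one sample is counted as dispensable.
    """
    core, dispensable = [], []
    for gene, by_sample in presence.items():
        if all(by_sample.values()):
            core.append(gene)
        else:
            dispensable.append(gene)
    return core, dispensable


# Hypothetical presence/absence calls for three genes in three samples.
toy_presence = {
    "gene_1": {"sample_A": True, "sample_B": True, "sample_C": True},
    "gene_2": {"sample_A": True, "sample_B": False, "sample_C": True},
    "gene_3": {"sample_A": True, "sample_B": True, "sample_C": True},
}
core_genes, dispensable_genes = split_core_dispensable(toy_presence)
print(core_genes, dispensable_genes)

# Consistency check on the totals reported in the text.
TOTAL_ANNOTATIONS = 76541
DISPENSABLE_REPORTED = 7808
core_reported = TOTAL_ANNOTATIONS - DISPENSABLE_REPORTED
print(core_reported)
```

The final subtraction reproduces the 68,733 core genes reported above.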

Reads mapped to the P. annua reference genome (and diploid progenitor genomes) provide a view of structural modifications at the whole-chromosome level. Using this approach, we identified striking variation in chromosome structure post-polyploidization. The largest is a 224 Mb deletion in the centromeric and pericentromeric region of chromosome 1A in some samples that amounts to 70% of the length of the reference chromosome (Fig. 6; Fig. 7; Supplementary Fig. 14). Coinciding with the deletion at 1A is a 32 Mb duplication at chromosome 1B. Split reads and improperly paired reads at the deletion and duplication breakpoints suggest that the duplicated region at 1B resides within the deleted region of 1A, and indeed, capillary electrophoresis using homoeolog-specific markers across the chromosomal breakpoint confirms this to be the case (Supplementary Fig. 15). The 1B duplication contains the highest density of LTRs across the chromosome, suggesting that the rearrangement most likely spans the centromere (Fig. 7b). There are 1,996 annotated genes and 133 functional enrichments (mostly transposon-associated categories) in the 224 Mb centromeric deletion. The 32 Mb homoeologous centromere brings 1,321 genes back and all but four of the functionally enriched categories.

Perhaps the most parsimonious path to this karyotype involves meiotic error, where 1A and 1B form a quadrivalent and adjacent disjunction leads to two 1A’s going to one pole and two 1B’s going to the other. When fertilized by a normal nucleus, the resulting offspring would be 1A1B1B1B (or 1A1A1A1B). Subsequent generations would lead to introgression of 1A at recombination sites, which would cause most of the genic regions of the displacing 1B chromosome to return to a 1A-like state. Alternatively, it is possible that dysploidy and Robertsonian rearrangements (fusion-fission) played an intermediate role, where again, introgression back to the population resulted in the observed karyotype (Schubert and Lysak, 2011). To our knowledge, cytological studies have not recorded any evidence of dysploidy in P. annua.

Chromosome 1A has more repetitive DNA (90%) than any other chromosome in the P. annua reference genome, which likely plays a role in the observed restructuring in some ecotypes (Kent et al., 2017). Most (99%) of the 224 Mb deleted region is low-complexity repetitive sequence, indicating that it would likely be wound into pericentromeric heterochromatin and suppressed from meiotic recombination (Beadle, 1932; Baker, 1958; Si et al., 2015). Fittingly, rearrangements at 1A appear to reside at the periphery of heterochromatic sequences (Fig. 7b). Large-scale chromosomal rearrangements in P. annua occur in intragenic recombination ‘hotspots’, where individual genotypes display some variability in the exact nucleotide coordinates of the breakpoint but are often within several kilobases of each other (Supplementary Fig. 16; Supplementary Fig. 17). Some individuals appear to contain large-scale variability between their parental haplotypes. For example, sample ‘Ohio’ has a copy of 1A that resembles the P. annua reference genome, while the other haplotype contains the 224/32 Mb centromeric displacement discussed above (Fig. 8; Supplementary Fig. 15). Such variability between haplotypes is surprising given that homologous chromosomes recognize each other by sequence similarity and incorrect pairing could lead to multivalents, which are associated with improper segregation and reduced fertility.

Variation Of EPSPS

P. annua is most commonly known as a noxious weed. It can be managed with both pre- and post-emergent herbicides, but repeated application has resulted in the evolution of multiple herbicide resistance pathways (Sammons & Gaines, 2014; Gaines et al., 2019). Glyphosate resistance has been particularly problematic for managers of P. annua. Glyphosate works by inhibiting the enzyme 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) in the shikimate pathway. Resistant P. annua plants have been reported to have increased EPSPS copy number and missense mutations (Brunharo et al., 2019). The A subgenome EPSPS gene is 3,891 bp in length, while the B subgenome copy is 9,092 bp. We identify large genotypic variation in the structure of EPSPS homoeologs, particularly in the B subgenome, where nearly 6 kb is deleted from the two largest introns in some genotypes (Supplementary Fig. 18). We do not see evidence of copy number variation in our resequencing data, but that is likely because none of our genotypes originated from a population with known herbicide resistance.

Here, we present the chromosome-scale genome sequences of Poa infirma and Poa supina, the diploid progenitor species of allotetraploid Poa annua. The genomic resources generated here, and in Robbins et al. (2022), comprise one of the first reports detailing an allopolyploid and its progenitors sequenced to chromosome level.

The plant cell’s response to WGD: Comparative genomics between the P. annua reference genome and the genomes of its diploid parents suggests that some tetraploid genotypes are remarkably unchanged since polyploidy. In contrast to paleo-allopolyploids where biased fractionation is a hallmark of diploidization, it appears that neo-allotetraploid P. annua is more accurately characterized by biased reshuffling, where the B (supina) subgenome has preferentially acquired genes from the A (infirma) subgenome in the absence of measurable loss. It is possible that reshuffling precedes fractionation, and if homoeolog expression bias and TE content are accurate early predictors of subgenome dominance, as is the case in other allopolyploids (Hollister and Gaut, 2009; Freeling et al., 2012; Edger et al., 2017), the bias accumulator of genes (i.e., the B subgenome) will be preferentially retained (dominant).

Retrotransposon response to WGD: Our data suggest that the subgenomes of P. annua have varying ability to inhibit LTRs, both within and between subgenomes. Because inter-subgenome defense likely involves silencing LTRs after they re-enter the nucleus (for review see Sabot and Schulman, 2006), we hypothesize that the observed bias in transposon movement from the A (infirma) subgenome to the B (supina) subgenome is driven by differences in the subgenomes’ ability to repress retrotransposons post-transcriptionally. We expect that newly formed allopolyploids with broadly divergent TE immunities will approach retrotransposon equilibrium as the subgenome with fewer TE inhibitory mechanisms is preferentially bloated by TEs.

The plant cell’s response to retrotransposons in light of WGD: Although the P. annua reference genome closely resembles the parental genomes, this is not the case for all P. annua individuals, with some genotypes appearing heavily restructured relative to the genomes of the diploid progenitors. It is likely not a coincidence that the observed chromosomal rearrangements result in the substitution of a heavily TE-parasitized region with a less parasitized homoeologous segment. It appears that WGD has provided Poa annua with the homoeologous ‘spare parts’ to purge highly parasitized sequences. This result supports the Genome Balance Hypothesis, which predicts that differences in the amount of pericentric heterochromatin between subgenomes (as observed between 1A and 1B) will cause chromosomes to move to the poles at uncoordinated times, and that the centromere of one of the parents will be retained to overcome those segregation issues (Freeling et al., 2015). We expect that our genome resequencing provides a snapshot of an organism caught in that act of positive selection for a balanced genome. The genomic resources detailed in this work should serve as a powerful tool for turfgrass breeders and herbicide biologists to facilitate better targeting of P. annua and accommodation for its unique evolutionary origin.

Collecting genomic and transcriptomic resources

Seeds of P. infirma were obtained from the turfgrass breeding collection at Penn State University and represent the only publicly available source of this species. The ‘Supranova’ cultivar was selected to represent P. supina as it is the most widely used cultivar on the market with agronomic application as a turfgrass, primarily known for its shade tolerance. Seeds were germinated on moist filter paper in petri dishes before being transferred to potting soil in a growth chamber at 20°C and 8-hour day lengths. A single genotype was selected for each species and clonally propagated.

Genomic DNA was extracted from fresh leaf tissue using the cetyltrimethylammonium bromide (CTAB) method as outlined by OPS Diagnostics protocols with minimal vortexing and cut pipet tips to promote high molecular weight DNA extractions. Sample integrity was verified using pulsed-field electrophoresis indicating an average range between 50–70 kb. DNA was sheared to 20 kb length using a Megaruptor (PacBio). HiFi libraries were constructed using the PacBio Express kit, v2.0, and size selection was performed on a SageELF (Sage Science) to obtain narrow 15–20 kb libraries for sequencing using a PacBio Sequel II (Brigham Young University, DNA Sequencing Center). Three 8M SMRT cells with 30-hour movies were used for each diploid. PacBio sequencing yielded 72 Gb of Q20 reads for P. infirma (29-fold coverage) and 45 Gb of Q20 reads for P. supina (30-fold coverage). For Omni-C proximity ligation (Dovetail Genomics), genomic DNA was re-extracted from the same genotypes after 72 hours of dark treatment. One proximity ligation library was prepared for each species and sequenced using the Illumina HiSeq platform to obtain 464 million reads (28× coverage) for P. infirma and 466 million reads (49× coverage) for P. supina using 75 × 75 bp paired-end reads. We pooled a variety of tissues and treatment types for full-length RNA-sequencing with Iso-seq (PacBio) to help facilitate high-quality gene annotations. Tissue types included germinating seedlings, fresh leaves and roots, and juvenile and mature inflorescences. Treatments included clonally propagated individuals exposed to 8-hour light, 16-hour light, cold (4°C) for two weeks, 1-inch simulated mowing stress for one week (five total cuts), and 100 mM NaCl for two weeks. Meristematic crown tissue was collected for each treatment. All RNA samples were extracted using the Qiagen RNeasy Plant Mini Kit.
RNAs for Iso-Seq were pooled and libraries were constructed using the PacBio express kit (v2.0). Each of the two Iso-seq libraries per species ran for 24 hours on an 8M SMRT cells with a Sequel II instrument and yielded 4,026,288 million P. supina transcripts and 3,689,421 P. infirma full-length Iso-seq transcripts.

Nuclear Genome Assembly

K-mers were extracted from long-read (HiFi) sequencing data using Jellyfish (v2.2.10; Marçais and Kingsford, 2011) with 21-mers and a hash with 100M elements (parameters ‘-m 21 -s 100M’). GenomeScope (v1; Vurture et al., 2017) was used to plot k-mers and estimate genome size, level of heterozygosity, and amount of repetitive sequence using 15,000 bp read lengths (Supplementary Fig. 19). K-mer analysis confirmed that P. supina was highly heterozygous and P. infirma was highly homozygous (Heide, 2001). As a result, we selected a different assembly pipeline for each species to accommodate its biology. The genome of the highly heterozygous obligate outcrosser, P. supina, was assembled with HiCanu (v2.1; Nurk et al., 2020) and purged to haplotig level using the Purge_Dups (v1.0.1) pipeline (Guan et al., 2020) with manual cutoffs adjusted according to its heterozygosity (calcuts parameters ‘-l 7 -m 40 -u 160’; minimum alignment score (-a) set to 80). Poa infirma is self-pollinated and highly homozygous, so we assembled its genome using HiFiasm (v0.3; Cheng et al., 2021) with its built-in haplotype purging algorithm, which is better suited to homozygous genome assemblies. The Benchmarking Universal Single-Copy Orthologs (BUSCO; v3.0.2) software was used to estimate assembly completeness and quality (Simão et al., 2015). We also scanned for incorrectly placed centromeric and telomeric repeats using ‘bedtools nuc’ and a 1 Mb sliding window to count the occurrences of common repetitive sequences found in the centromeres and telomeres of Poaceae. The purged haplotype assemblies and raw Omni-C reads were input into the HiRise pipeline (Dovetail Genomics) to scaffold contigs, identify chimeric scaffolds, and build a final genome assembly based on proximity ligation (Supplementary Fig. 1).
Taxonomic classification with Kraken2 (v2.1.1; Wood et al., 2019) was used to filter out potential contaminants from the final assemblies and verify that the chromosomes did not contain non-plant DNA, which may indicate a chimeric assembly. The P. infirma genome assembled into seven pseudomolecules and 873 supplementary scaffolds. The P. supina nuclear genome assembled into seven pseudomolecules and 357 supplementary scaffolds. Poa supina pseudomolecules ranged between 73 Mb and 115 Mb in length, while those of P. infirma ranged between 90 Mb and 331 Mb. The seven chromosomes of each species were re-oriented, if necessary, to reflect identical strand orientation across all pairs of orthologous chromosomes. Chromosomes were renamed according to pre-established chromosomal nomenclature, and large structural modifications between each diploid and the allotetraploid P. annua were verified by sequence alignment using minimap2 (v2.24; Li, 2018) with parameters ‘--secondary=no -cx asm10’.
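
To make the k-mer profiling step concrete, the following minimal Python sketch counts k-mers in a set of reads and builds the count histogram that a tool such as GenomeScope takes as input. It is an illustration only and not the Jellyfish implementation: the function names and the toy reads are hypothetical, canonical (strand-collapsed) counting is an assumption that is not stated in the text, and k = 3 is used only to keep the example readable (the analysis above used k = 21).

```python
from collections import Counter

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmer_counts(reads, k=21):
    """Count canonical k-mers (the lexicographically smaller of a k-mer
    and its reverse complement). Canonical counting is an assumption of
    this sketch; k-mers containing non-ACGT characters are skipped."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return dict(sorted(Counter(counts.values()).items()))


# Toy example with hypothetical reads and a small k.
toy_reads = ["ACGTAC", "GTACGT"]
toy_counts = canonical_kmer_counts(toy_reads, k=3)
toy_hist = kmer_histogram(toy_counts)
```

In a real profile, a heterozygous genome such as P. supina would be expected to show two coverage peaks in this histogram (heterozygous and homozygous k-mers), whereas a homozygous genome such as P. infirma would show essentially one.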

Chloroplast Genome Assembly

Raw whole-genome HiFi reads were mapped to the P. annua chloroplast reference genome (GenBank acc: NC_036973.1) using minimap2 (v2.24) with the ‘map-hifi’ preset. Samtools (v1.9; Li et al., 2009) was used to identify mapped reads with a minimum query length (mlen) > 8000, query value (qval) > 60 and GC content between 32–52%. Reads with a length > 20000 bp were then included in a final de novo assembly of the chloroplast genome with HiCanu (v2.1) using default parameters. A circular genome was predicted by HiCanu, which was subsequently trimmed as projected by HiCanu at the same starting point as the reference chloroplast genome. Sequence alignment of the circular chloroplast genomes for each species verified that P. infirma is the maternal parent to P. annua (Supplementary Fig. 20).
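
The length and GC-content criteria used to select chloroplast reads can be sketched in a few lines of Python. This is a hedged illustration rather than the Samtools command that was run: the function names are hypothetical, the thresholds are taken from the text, and the ‘qval’ criterion is deliberately left out because its exact definition is not given above.

```python
def gc_percent(seq):
    """Return the GC content of a sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def keep_chloroplast_read(seq, min_len=8000, gc_min=32.0, gc_max=52.0):
    """Apply the length (> 8000 bp) and GC (32-52%) criteria from the text.
    The quality criterion ('qval') is omitted in this sketch."""
    return len(seq) > min_len and gc_min <= gc_percent(seq) <= gc_max


def keep_for_assembly(seq, min_len=20000):
    """Second-stage length filter (> 20000 bp) applied before assembly."""
    return len(seq) > min_len


# Hypothetical reads: 10 kb at 40% GC, a short read, and a GC-only read.
read_ok = "AT" * 3000 + "GC" * 2000
read_short = "ACGT" * 100
read_gc_rich = "GC" * 5000
```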

Repeat Masking And LTR Insertion Times

De novo repeat libraries were created for each diploid assembly using the Dfam (v3.1; Hubley et al., 2016) database to classify transposable DNA sequences. RepeatModeler (v2.0.3; Flynn et al., 2020) with the parameter ‘-LTRStruct’ was used to model TE family relationships and identify repetitive elements by employing the programs RECON, RepeatScout, LTRHarvest (Ellinghaus et al., 2008) and LTR_retriever (v2.8.7; Ou and Jiang, 2018). The resulting TE consensus classification libraries were used as input into RepeatMasker (v4.1.2) to softmask each of the genome assemblies using the wublast engine. LTR_FINDER_parallel (v1.1; Ou and Jiang, 2019; with parameter ‘-harvest_out’) and LTR_retriever were run separately on all three species to calculate the insertion times for intact LTR elements. A rice mutation rate of 1.3 × 10^−8 substitutions per site per year was used to calculate insertion times using the formula T = K/2µ, where K is the divergence between the two LTRs of an element, calculated from LTR sequence identity, and µ is the neutral mutation rate in mutations per bp per year (Ma and Bennetzen, 2004).
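
As a worked illustration of the formula T = K/2µ, the Python sketch below converts the sequence identity between the two LTRs of one element into an estimated insertion time. The function name is hypothetical, and the optional Jukes-Cantor correction of K is an assumption of this sketch, not a step described in the text above.

```python
import math

# Rice mutation rate quoted in the text (substitutions per site per year).
RICE_MU = 1.3e-8


def ltr_insertion_time(identity, mu=RICE_MU, jukes_cantor=False):
    """Estimate LTR insertion time in years as T = K / (2 * mu).

    identity     -- fraction of identical sites between the two LTRs (0-1]
    mu           -- neutral mutation rate per bp per year
    jukes_cantor -- if True, correct the raw divergence for multiple hits
                    (an assumption of this sketch)
    """
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be in the interval (0, 1]")
    d = 1.0 - identity  # raw proportion of differing sites
    if jukes_cantor:
        if d >= 0.75:
            raise ValueError("divergence too high for Jukes-Cantor")
        k = -0.75 * math.log(1.0 - 4.0 * d / 3.0)
    else:
        k = d
    return k / (2.0 * mu)


# Example: two LTRs that are 97.4% identical.
t_raw = ltr_insertion_time(0.974)
t_jc = ltr_insertion_time(0.974, jukes_cantor=True)
```

With the uncorrected divergence, 97.4% identity corresponds to roughly one million years; the corrected estimate is slightly older because the correction accounts for sites that mutated more than once.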

Genome Annotation

RNA-sequencing runs SRR1634026 and SRR1634028 were downloaded from NCBI’s Sequence Read Archive database representing P. supina and P. infirma, respectively. Poa annua sequences from experiments SRR1634028, SAMD00020897, and SAMD00020898 were also acquired. All NCBI sequencing experiments were then trimmed for adapter content and low-quality bases using bbduk with ‘tbo tpe ktrim=r k=23 mink=11 hdist=1’. Because cleaned read sets from NCBI could exceed 20 gigabytes, we randomly subsampled each experiment run into a single 400-megabyte file. Each fastq file was aligned to the respective genome using the splice-aware aligner HISAT2 (v2.2.1; Kim et al., 2019). Iso-seq transcripts for each species were aligned using minimap2 with ‘-ax splice:hq -uf’. NCBI and Iso-seq alignment files were sorted by name and converted to bam format. The OrthoDB plant protein database (v10) was downloaded and expanded to include amino acid sequences of Poales annotations available on NCBI RefSeq and UniProt TrEMBL. BRAKER2 (v2.1.5) was run in ETP mode to incorporate both the enhanced OrthoDB protein data and the RNA alignment data from NCBI and Iso-seq to train GeneMark-ETP with proteins processed by ProtHint. Augustus was trained on the GeneMark-ETP predictions, and the final gene predictions incorporated hints from both sources. 5′ and 3′ UTRs were added in BRAKER2 using ‘--addUTR=on’ to call GUSHR. Annotations were filtered using sequence similarity to orthologous groups and phylogenies in the eggNOG (Huerta-Cepas et al., 2019) database (v2.0.5) using diamond alignments to retain only those annotations with fine-grained orthologous relationships. BUSCO (v3.0.2), run in transcriptome mode, showed that the majority (96% and 91%) of the 1,614 conserved embryophyta_odb10 orthologs were present in our P. infirma and P. supina chromosome annotations, respectively, supporting high-quality genome annotations.
Long noncoding RNAs were identified using RNAplonc (v1.1; Negri et al., 2019), which uses a classifier approach developed specifically for plants. The chloroplast genome assemblies were functionally annotated using GeSeq (Tillich et al., 2017).

Cytology

C-banded chromosome preparations were made from root-tip meristematic cells according to the protocol described by Mitchell et al. (2003) except that 0.02% colchicine, not trifluralin, was used to arrest microtubule formation for 2–4 hours at room temperature.

RNA-seq Expression Analysis And Homoeolog Expression Bias

Plants were collected from an ongoing field trial from the turfgrass breeding program at Penn State University. For each P. annua breeding line, at least one typical dwarf-type and one aberrant wild-type plant were collected from a genetically pure unmowed stand. Dwarf-types were defined as any genotype with diameter ≤ 1.5cm, while aberrant wild-types had a diameter ≥ 6cm. Plants were transplanted to a greenhouse (27°C high and 17°C low) and clonally propagated over two months. To simulate mowing treatment, one clone of each genotype was trimmed three times per week and maintained at 1.5 cm height and the other clone was left untrimmed. The experiment was conducted on 30 plants representing 15 unique genotypes (six dwarf-types and nine wild-types). Spacing on the bench was randomly assigned. Treatments were applied between May 10th and August 16th, 2020. All plants were allowed to grow unmowed for an additional three weeks prior to tissue collection to reduce the influence of wounding stress in our data analysis. Tissue was collected from the grass’s basal meristem.

Unique libraries for each sample were created using the Lexogen SENSE mRNA-seq library kit with the goal of producing long insert sizes (~485 bp) for simplified and accurate inference of parental origin across homoeologous pairs (Hu et al., 2020). A pilot study was conducted using a MiSeq with Nano kit reagents (v2) to obtain 500 Mb of 250 × 250 bp paired-end sequencing. The pilot analysis revealed that insert sizes were generally shorter than anticipated, with 75% of inserts being ≤ 260 bp. Adjusting for shorter library lengths, we sequenced the RNAs using an S1 flow cell on an Illumina NovaSeq 6000 (Penn State University, Genomics Core Facilities) to obtain 150 × 150 bp paired-end reads with ~48 million reads/sample.

We used Eagle-RC (v1.1.2; Kuo et al., 2018) to classify RNA-seq reads to their appropriate subgenome using explicit genotypic differences between them to calculate the likelihood that an RNA read came from a particular subgenome. Briefly, variant candidates for statistical inference were generated using reciprocal LAST (v1387; Frith et al., 2010) to identify homoeologous genes and an Eagle-RC Python script (homeolog_genotypes.py) to create the variant file (VCF). Reads were mapped to the parental genomes separately using STAR (v2.7.8a; Dobin et al., 2013). The EAGLE model subsequently evaluates the likelihood of each read’s subgenome origin based on genotypic variants and assigns a likelihood score. Alignments with SNP evidence to support subgenome origin are dubbed homoeolog-specific and quantified with featureCounts (v2.0.2; Liao et al., 2014). The resulting counts matrix was filtered to retain only the genes that had at least one read per sample. The ‘run_DE_analysis.pl’ script from the trinityrnaseq toolkit (v2.13.0; Haas et al., 2013) was used to run DESeq2 (v1.38; Love et al., 2014) on the counts matrix for subgenome-specific differential expression analysis. Because plant biotypes (dwarf or wild) were nested within treatments (mowed or unmowed), biotype as a variable was removed to prevent erroneous interpretation in our mowed vs unmowed and A vs B subgenome comparisons (Supplementary Fig. 9).
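
The count-matrix filter described above (retain only genes with at least one read in every sample) can be written as a short Python sketch. The function name, the dictionary layout, and the toy counts are hypothetical and are not taken from the featureCounts output used in the study.

```python
def filter_expressed_genes(counts, min_reads=1):
    """Keep genes whose count is >= min_reads in every sample.

    counts -- dict mapping gene ID to a list of per-sample read counts
    """
    return {
        gene: row
        for gene, row in counts.items()
        if all(c >= min_reads for c in row)
    }


# Hypothetical homoeolog-specific counts for three samples.
toy_counts = {
    "geneA": [5, 3, 1],   # kept: at least one read in every sample
    "geneB": [0, 10, 2],  # dropped: zero reads in the first sample
    "geneC": [1, 1, 1],   # kept
}
kept = filter_expressed_genes(toy_counts)
```

Requiring a read in every sample is a strict filter; it removes genes whose absence in some samples could otherwise dominate a subgenome comparison.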

Gene Ontology Enrichment Analysis

Gene ontologies and functional enrichments of differentially expressed genes were classified using the Trinotate pipeline. Blastp and Blastx (v2.12; Camacho et al., 2009) were used to align P. annua amino acid and coding sequence files against the UniProt Swiss-Prot database with parameters ‘-max_target_seqs 1 -outfmt 6 -evalue 1e-3’. Hmmscan (v3.3.2) was used to incorporate protein domain identification based on a query against the Pfam database. An id2go formatted file was then generated using the blastx, blastp, and hmmscan results to incorporate the Swiss-Prot and Pfam alignments using go-basic and pfam2go annotations from geneontology.org. The id2go formatted file was incorporated into ‘analyze_diff_expr.pl’ (trinityrnaseq toolkit) with the ‘--examine_GO_enrichment’ flag to call the R package Goseq to scan for enriched gene ontologies in our subgenome-specific differential expression matrix. The id2go formatted file was also used as input into the Goatools script ‘find_enrichments.py’ (Klopfenstein et al., 2018) to identify enriched ontologies in various subsets of genes of interest. Candidates from enriched subsets were further analyzed using EggNOG-mapper and BLAST for functional annotation at the single-gene level.

Comparative Genomics

The P. annua genome (PaA & PaB) and a concatenated file containing the diploid parents (PiA & PsB) were uploaded into the CoGe SynMap tool (Haug-Baltzell et al., 2017) with DAGChainer options ‘-D 20 -A 5’ and tandem duplication distance set to 10. Synonymous mutation (Ks) was calculated on the syntenic CDS pairs using CodeML of the PAML package. Ks values were plotted on a density plot to visualize Ks peaks associated with parental divergence and hybridization. For CoGe’s fractionation bias calculation, syntenic blocks were merged using the ‘Quota Align Merge’ algorithm with a maximum distance between two genes (-Dm) set to 40. Syntenic depth was calculated with the ‘Quota Align’ algorithm and ratio of coverage depth set to 1-to-2. The window size for fractionation bias was adjusted to 100 genes and set to only use syntenic genes in the target genome. MCScanX (Wang et al., 2012) was used to detect syntenic blocks of genes between P. annua and the diploid progenitors. The collinear file was input into SynVisio (Bandi, 2020), an interactive multiscale synteny visualization tool, to depict regions of shared homology. Syntenic pairs and macrosynteny in monocots were calculated using MCscan (Python version) with Ananas comosus and Brachypodium distachyon coding sequences and genomes downloaded from Phytozome (v12; Goodstein et al., 2012). A C-score of 0.99 was used to select only 1:1 orthologous blocks and is stringent enough to filter out syntenic blocks that were not LAST reciprocal best hits. Translated transcriptomes of model grasses were acquired through Phytozome and put into orthogroups using Orthofinder with the Diamond algorithm for similarity searches. Average nucleotide identity (ANI) was calculated using the gap-compressed per-base sequence divergence output (de tag) of a PAF formatted full genome assembly alignment using minimap2.
DupGen_finder (Qiao et al., 2019) was used to identify duplicated gene pairs and classify them as either WGD, tandem, proximal, transposed, or dispersed. A concatenated fasta file containing both parental diploids (PiA & PsB) was used as an outgroup so that the transposed classification included only those genes that were duplicated after the hybridization of P. annua.
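
The ANI calculation from the minimap2 ‘de’ tag can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and weighting each alignment by its block length (PAF column 11) is a choice made for this sketch, since the text does not say how per-alignment divergences were combined.

```python
def ani_from_paf(lines):
    """Return average nucleotide identity (%) from PAF records.

    Each record contributes 1 - de, where de is the gap-compressed
    per-base divergence reported in the 'de:f:' tag. Records are
    weighted by alignment block length (an assumption of this sketch).
    """
    total_len = 0
    weighted_div = 0.0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        block_len = int(fields[10])  # PAF column 11: alignment block length
        de = None
        for tag in fields[12:]:
            if tag.startswith("de:f:"):
                de = float(tag[5:])
                break
        if de is None or block_len == 0:
            continue
        total_len += block_len
        weighted_div += de * block_len
    if total_len == 0:
        raise ValueError("no alignments with a de tag were found")
    return 100.0 * (1.0 - weighted_div / total_len)


# Two hypothetical PAF records (values invented for illustration).
toy_paf = [
    "q1\t1000\t0\t1000\t+\tt1\t5000\t0\t1000\t980\t1000\t60\ttp:A:P\tde:f:0.02",
    "q2\t3000\t0\t3000\t+\tt1\t5000\t1000\t4000\t2970\t3000\t60\ttp:A:P\tde:f:0.01",
]
toy_ani = ani_from_paf(toy_paf)
```

Length weighting keeps many short, divergent alignments from dominating the estimate; an unweighted mean would be an equally valid reading of the description.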

Identification Of Homoeologous Exchanges

Homoeologous exchange regions were characterized using several different methods to assure accurate identification. First, CNVkit (v0.9; Talevich et al., 2016) was used to identify and visualize copy number variants by mapping HiFi (ccs) reads from P. annua onto the parental diploid genomes (PiA & PsB). Second, minimap2 with ‘-x map-hifi’ was used to map P. annua HiFi reads to the concatenated fasta containing the parental diploid genomes (PiA & PsB). The resulting bam file was input into SVIM (v1.0.2; Heller and Vingron, 2019) and used to detect structural variants from our long-read sequencing data and extract split-reads with translocation breakpoints, called BNDs by SVIM. Split-reads were extracted from the bam file and used to detect the beginning and endpoint of a homoeologous exchange block. Third, we used mmseqs (Steinegger and Söding, 2017) with parameters ‘easy-rbh -s 7.5’ to identify P. annua coding sequences with reciprocal best hits corresponding to the other subgenome (PaA genes with RBHs on PsB or PaB genes with RBHs on PiA). Finally, we used a primary mapping approach in which P. annua HiFi reads were aligned to a fasta file containing both parental diploid genomes (PiA & PsB). Reads with a primary mapping flag were retained and sorted into two pools: reads that mapped to the PiA genome and reads that mapped to the PsB genome. Both pools of P. annua reads were re-mapped to P. annua. If a P. annua read mapped best to the P. infirma (PiA) parent but subsequently mapped to P. annua’s B (supina) subgenome, it was a candidate for homoeologous exchange. All four homoeologous exchange methods were compared, and it was determined that the primary mapping approach was superior, as it was visually verifiable in the Integrative Genomics Viewer (IGV; Thorvaldsdottir et al., 2013) and produced HE statistics that were intermediate between those of the other methods. The JCVI chromosomal painting tool (jcvi.graphics.chromosome) was used to visualize P. annua’s HEs.
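
The decision rule behind the primary mapping approach can be expressed as a small Python sketch. It assumes that the two mapping rounds have already been summarised into one label per read; the function name, the dictionary layout, and the read IDs are hypothetical, and no alignment is performed here.

```python
# Subgenome of P. annua that each diploid parent is expected to match.
EXPECTED_SUBGENOME = {"PiA": "A", "PsB": "B"}


def homoeologous_exchange_candidates(parent_hit, subgenome_hit):
    """Flag reads whose best parental genome disagrees with the
    P. annua subgenome they re-map to.

    parent_hit    -- read ID -> 'PiA' or 'PsB' (primary alignment, round 1)
    subgenome_hit -- read ID -> 'A' or 'B' (re-mapping to P. annua, round 2)
    Returns a dict of read ID -> (parent, subgenome) for discordant reads.
    """
    candidates = {}
    for read, parent in parent_hit.items():
        sub = subgenome_hit.get(read)
        if sub is None or parent not in EXPECTED_SUBGENOME:
            continue  # read did not re-map, or label not recognised
        if sub != EXPECTED_SUBGENOME[parent]:
            candidates[read] = (parent, sub)
    return candidates


# Hypothetical reads: r2 and r4 are discordant.
round1 = {"r1": "PiA", "r2": "PiA", "r3": "PsB", "r4": "PsB", "r5": "PiA"}
round2 = {"r1": "A", "r2": "B", "r3": "B", "r4": "A"}
he_reads = homoeologous_exchange_candidates(round1, round2)
```

In practice, candidate reads would still need to cluster into contiguous blocks before a region is called a homoeologous exchange; a single discordant read is not sufficient evidence.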

Resequencing P. annua

Fifteen P. annua genotypes, thirteen of which were geographically distinct, were sequenced to survey genotypic variation across the species. Samples ‘Germany’ (W6 28152), ‘Nunavut’ (PI 236900), ‘India’ (PI 217625), and ‘Belgium’ (PI 442543) were acquired from the Germplasm Resources Information Network (GRIN) through the US Department of Agriculture. Samples ‘Washington’ (Tacoma), ‘Scotland’ (Galloway), ‘New Zealand’ (Manawata), ‘Arizona’, ‘Quebec’, ‘Wales’ (Aberystwyth), ‘Sweden’ (Särö), ‘New York’ (Pa-33) and ‘Ohio’ (Columbus) were acquired from a breeding collection maintained at Penn State University. Seeds were germinated on moist filter paper. A single genotype of each of the thirteen samples was transferred to potting soil (Promix) and grown in a greenhouse. In addition to the thirteen geographically distinct genotypes, two breeding lines were included. ‘Pa-14 dwarf’ and ‘Pa-14 wild-type’ are derived from the same breeding pedigree of an unstable line (Pa-14), where ‘wild’ describes an aberrant wild-type plant and ‘dwarf’ describes an agronomically desirable dwarf individual within the line. Plants were established from seeds and DNA was extracted from fresh leaf tissue using the CTAB method as described above. Plants were genotyped to confirm their status as authentic P. annua using the Trx2 nuclear gene with PCR parameters described in Mao and Huff (2012) and Patterson et al. (2005). Genomic DNA (300 ng) from each sample was input into the Illumina DNA PCR-Free Prep to create uniquely indexed libraries. The samples were pooled and an equimolar concentration was verified using a MiSeq Nano 150 × 150 bp run. The pooled sample was sequenced on a NovaSeq S1 (Penn State University, Genomics Core Facilities) with 150 × 150 bp paired-end sequencing to generate a target of 1.3–1.6 billion pairs and 15–20× coverage per genotype across the haploid genome (1.78 Gb in size; Robbins et al., 2022).
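
The relationship between the sequencing target and per-genotype coverage stated above can be checked with a short calculation. The sketch below assumes that the 1.3–1.6 billion read pairs refer to the whole pool and that the 15 libraries were pooled in equal proportion; the function name is hypothetical.

```python
def per_sample_coverage(read_pairs, read_length, n_samples, genome_size):
    """Expected fold coverage per sample for an evenly pooled run.

    read_pairs  -- total read pairs for the whole pool
    read_length -- length of each read in a pair (bp)
    n_samples   -- number of equally pooled libraries
    genome_size -- haploid genome size (bp)
    """
    total_bases = read_pairs * 2 * read_length
    return total_bases / n_samples / genome_size


# Values from the text: 150 x 150 bp pairs, 15 genotypes, 1.78 Gb genome.
low = per_sample_coverage(1.3e9, 150, 15, 1.78e9)
high = per_sample_coverage(1.6e9, 150, 15, 1.78e9)
```

Under these assumptions the expected depth is roughly 14.6× to 18.0× per genotype, which is in line with the lower part of the stated 15–20× target.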

Raw Illumina reads were trimmed for adapter sequences using bbduk as described above and aligned to the P. annua reference genome using bwa-mem2. Scaffolds corresponding to the chloroplast and mitochondrial genomes were included in the P. annua reference genome to prevent erroneous alignment of organellar sequences to the nuclear genome. Coverage across chromosomes and scaffolds was plotted using WGSCoveragePlotter.jar (jvarkit). Putative homoeologous exchanges were annotated in a similar way to that described above. Briefly, raw reads were mapped to a file containing the parental P. infirma (A) and P. supina (B) genomes. Primary alignments were re-mapped to the P. annua reference genome. Reads that mapped to a parental genome different from their P. annua subgenome were flagged as potential homoeologous exchanges. We then classified each coordinate in the P. annua reference as either derived from P. supina, derived from P. infirma, or novel (not derived from either parent). In contrast to the HE pipeline above that used HiFi (ccs) data, novel regions were annotated independently rather than being omitted from the bed file. This adjustment allowed more accurate visualization of short reads with jcvi.graphics.chromosome.

Presence/absence variants were analyzed using an SGSGeneloss-based protocol, described in Fernandez et al. (2022). Illumina reads were mapped to the parental genomes and subsequently the P. annua genome to identify HE regions as described above. The alignment file for each sample was converted to bed format using bamToBed from bedtools (v2; Quinlan et al., 2010). The alignment bed was merged with the gene annotation file using bedtools intersect to identify regions of overlap. If a gene’s coordinates contained < 20% coverage in the sample, it was deemed lost (dispensable) in that sample. If it had > 90% coverage in the opposite parent, it was deemed a gene within an HE.
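
The two thresholds used to call gene loss and HE membership can be illustrated with a minimal Python sketch. It interprets ‘coverage’ as the fraction of a gene’s length overlapped by aligned reads (breadth of coverage), which is an assumption; the function names and the toy intervals are hypothetical, and the two calls are reported independently because the text does not specify a precedence between them.

```python
def covered_fraction(gene_start, gene_end, intervals):
    """Fraction of a gene (half-open, BED-style coordinates) that is
    overlapped by at least one alignment interval."""
    gene_len = gene_end - gene_start
    if gene_len <= 0:
        raise ValueError("gene_end must be greater than gene_start")
    # Clip alignments to the gene and sort them by start position.
    clipped = sorted(
        (max(s, gene_start), min(e, gene_end))
        for s, e in intervals
        if e > gene_start and s < gene_end
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # merge overlapping alignments
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / gene_len


def classify_gene(sample_fraction, opposite_parent_fraction,
                  lost_below=0.20, he_above=0.90):
    """Apply the < 20% (lost) and > 90% (within an HE) thresholds."""
    return {
        "lost": sample_fraction < lost_below,
        "in_he": opposite_parent_fraction > he_above,
    }


# Hypothetical gene at 100-200 with three overlapping alignments.
frac = covered_fraction(100, 200, [(90, 120), (110, 150), (180, 250)])
call = classify_gene(frac, 0.0)
```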

Large-scale structural variants were annotated manually by analyzing the depth of coverage of each sample alignment to both progenitor and allotetraploid genomes. Duplicated and deleted sequences cause deviation from the mean and median coverage. Duplicated sequences align to the next most homologous coordinates in the reference genome and are visible as elevated coverage at that site. Deleted sequences are detectable by reduced coverage at the missing region. Transposed sequences are represented in equal proportion in the reference and in the sample; therefore, they do not cause deviations from the mean coverage. Split reads and improperly paired reads at the junctions of duplication and deletion breakpoints can be used to further specify the exact coordinates of the exchanges. Commonly used tools that identify structural variants such as Delly2, Manta, and Lumpy are not equipped to identify indels larger than several kilobases in length. Large-scale structural rearrangements were verified using homoeolog-specific primers and Sanger sequencing. Primer pair 1AF (5′- GGCGGACACCTTTGACACC) and 1AR (5′- GGATACTCAGACAATGATAG) amplify using standard PCR settings with a 53°C annealing temperature and 1:00 extension time. Primer pair 1AF (5′- GGCGGACACCTTTGACACC) and 1BR (5′- GGGTGACAGAGTTCCCAGTG) amplify using standard PCR settings with a 65°C annealing temperature and 1:20 extension time. 1AF to 1AR spans a chromosomal breakpoint and only amplifies in the absence of the 32/224 Mb structural modification. 1AF to 1BR spans the same chromosomal breakpoint but only amplifies in the presence of a rearrangement.

SNPs were identified from each of the 15 samples using their corresponding P. annua alignment file. Picard MarkDuplicates was used to tag duplicated reads and reduce the frequency of incorrect SNP calls. The duplicate-marked bam files were used to generate genotype likelihood calls across all samples and chromosomes using parameters ‘-q 40 --ff UNMAP,SECONDARY,QCFAIL,DUP’ with bcftools mpileup and subsequently input into bcftools call with default parameters. Variants were further filtered with vcftools using parameters ‘--remove-indels --maf 0.1 --max-missing 0.9 --minQ 30 --min-meanDP 10 --max-meanDP 80 --minDP 10 --maxDP 80’.

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and materials

Genome assembly and gene annotation files are available through the CyVerse CoGe platform. Raw sequence data are available in the Sequence Read Archive under NCBI BioProject PRJNA938153. Relevant plant resources are accessible through the Germplasm Resources Information Network and through the Pennsylvania State University Turfgrass Breeding Program Repository. This research complies with relevant institutional, national, international, and legislative guidelines relating to the handling of wild and cultivated plant materials.

Competing Interests

The authors declare that they have no competing interests.

Funding

This research was supported by the United States Golf Association under the Turfgrass and Environmental Research Program; the Pennsylvania Turfgrass Council (PTC); the Huck Institute of Life Sciences, Pennsylvania State University; the College of Agricultural Sciences, Pennsylvania State University; and Hatch Project PA 4592. This research was also supported by the USDA National Institute of Food and Agriculture, Hatch project 1023293. The funding bodies did not contribute to the design of the study, or the collection, analysis, and interpretation of data.

Authors’ contributions

CWB and DRH conceived and designed the project and all research activities. CWB, DRH, MDR, and BSB collected the samples, extracted DNA and RNA, and directed the Illumina and PacBio sequencing. CWB and MDR assembled and annotated the genomes. CWB performed the comparative genomics with contributions from MRS, ELP, NDH, and JPM. The chloroplast genome was assembled and annotated by JPM, and ENJ performed the cytology. CWB performed the gene expression analysis and analysis of EPSPS alleles. CWB designed and implemented the homoeolog-specific markers. CWB, JPM, ELP, and MDR coordinated data submission. CWB wrote the manuscript with review and revisions from all other authors.

Acknowledgements

Not applicable

Hemp A. Introduced plants on Kilimanjaro: tourism and its impact. Plant Ecol. 2008;197:17–29.
Aronson MFJ, La Sorte FA, Nilon CH, Katti M, Goddard MA, Lepczyk CA, et al. A global analysis of the impacts of urbanization on bird and plant diversity reveals key anthropogenic drivers. Proc Biol Sci. 2014;281:20133330.
Chwedorzewska KJ, Giełwanowska I, Olech M, Molina-Montenegro MA, Wódkiewicz M, Galera H. Poa annua L. in the maritime Antarctic: an overview. Polar Record. 2015;51:637–43.
Gibeault VA. Perenniality in Poa annua L. Thesis. Oregon State University; 1971.
Vargas Jr. JM, Turgeon AJ. Poa annua: Physiology, Culture, and Control of Annual Bluegrass. John Wiley & Sons; 2003.
Nosov NN, Tikhomirov VN, Machs EM, Rodionov AV. On polyphyly of the former section Ochlopoa and the hybridogenic section Acroleucae (Poa, Poaceae): insights from molecular phylogenetic analyses. Nordic Journal of Botany. 2019;37:njb.02015.
Heap I. International Survey of Herbicide Resistant Weeds. http://www.weedscience.org. Accessed 20 Dec 2022.
Lush WM. Biology of Poa annua in a Temperate Zone Golf Putting Green (Agrostis stolonifera/Poa annua). I. The Above-Ground Population. Journal of Applied Ecology. 1988;25:977–88.
Nannfeld JA. The chromosome numbers of Poa sect. Ochlopoa A. & Gr. and their taxonomical significance. Botaniska notiser. 1937;1937:238–54.
Soreng RJ, Bull RD, Gillespie LJ. Phylogeny and Reticulation in Poa Based on Plastid trnTLF and nrITS Sequences with Attention to Diploids. Diversity, phylogeny and evolution in the monocotyledons Aarhus Univ Press, Denmark p 619–643. 2010.
Mao Q, Huff DR. Evolutionary Origin of Poa annua L. Crop science. 2012.
Chen S, McElroy S, Dane F, Goertzen LR. Transcriptome Assembly and Comparison of an Allotetraploid Weed Species, Annual Bluegrass, with its Two Diploid Progenitor Species, Poa supina Schrad and Poa infirma Kunth. The Plant Genome. 2016;9.
Darmency H, Gasquez J. Spontaneous hybridization of the putative ancestors of the allotetraploid Poa annua. New Phytologist. 1997;136:497–501.
Tutin TG. A contribution to the experimental taxonomy of Poa annua L. A contr. University of Leicester. 1957.
Hovin AW. Meiotic Chromosome Pairing in Amphihaploid Poa annua L. American Journal of Botany. 1958;45:131–8.
Ravi M, Chan SWL. Haploid plants produced by centromere-mediated genome elimination. Nature. 2010;464:615–8.
Dunwell JM. Haploids in flowering plants: origins and exploitation. Plant Biotechnology Journal. 2010;8:377–424.
Paterson AH, Bowers JE, Chapman BA. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proceedings of the National Academy of Sciences. 2004;101:9903–8.
Stebbins GL. Types of polyploids; their classification and significance. Adv Genet. 1947;1:403–29.
McClintock B. The Significance of Responses of the Genome to Challenge. Science. 1984;226:792–801.
Thomas BC, Pedersen B, Freeling M. Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res. 2006;16:934–46.
Schnable JC, Springer NM, Freeling M. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl Acad Sci U S A. 2011;108:4069–74.
Wang X, Wang H, Wang J, Sun R, Wu J, Liu S, et al. The genome of the mesopolyploid crop species Brassica rapa. Nat Genet. 2011;43:1035–9.
Edger PP, Poorten TJ, VanBuren R, Hardigan MA, Colle M, McKain MR, et al. Origin and evolution of the octoploid strawberry genome. Nature Genetics. 2019;51:541–7.
Robbins MD, Bushman BS, Huff DR, Benson CW, Warnke SE, Maughan CA, et al. Chromosome-Scale Genome Assembly and Annotation of Allotetraploid Annual Bluegrass (Poa annua L.). Genome Biology and Evolution. 2023;15:evac180.
Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics. 2021;3:lqaa108.
De La Torre AR, Li Z, Van de Peer Y, Ingvarsson PK. Contrasting Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants. Mol Biol Evol. 2017;34:1363–77.
McKain MR, Tang H, McNeal JR, Ayyampalayam S, Davis JI, dePamphilis CW, et al. A phylogenomic assessment of ancient polyploidy and genome evolution across the Poales. Genome Biology and Evolution. 2016;evw060.
Clark JW, Donoghue PCJ. Whole-Genome Duplication and Plant Macroevolution. Trends in Plant Science. 2018;23:933–45.
McCue AD, Nuthikattu S, Slotkin RK. Genome-wide identification of genes regulated in trans by transposable element small interfering RNAs. RNA Biol. 2013;10:1379–95.
Sigman MJ, Slotkin RK. The First Rule of Plant Transposable Element Silencing: Location, Location, Location. The Plant Cell. 2016;28:304–13.
vonHoldt BM, Takuno S, Gaut BS. Recent Retrotransposon Insertions Are Methylated and Phylogenetically Clustered in Japonica Rice (Oryza sativa spp. japonica). Molecular Biology and Evolution. 2012;29:3193–203.
Panchy N, Lehti-Shiu M, Shiu S-H. Evolution of Gene Duplication in Plants. Plant Physiol. 2016;171:2294–316.
Zhao XP, Si Y, Hanson RE, Crane CF, Price HJ, Stelly DM, et al. Dispersed repetitive DNA has spread to new genomes since polyploid formation in cotton. Genome Res. 1998;8:479–92.
Freeling M. Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu Rev Plant Biol. 2009;60:433–53.
Qiao X, Li Q, Yin H, Qi K, Li L, Wang R, et al. Gene duplication and evolution in recurring polyploidization–diploidization cycles in plants. Genome Biology. 2019;20:38.
Gaeta RT, Chris Pires J. Homoeologous recombination in allopolyploids: the polyploid ratchet. New Phytol. 2010;186:18–28.
Mason AS, Wendel JF. Homoeologous Exchanges, Segmental Allopolyploidy, and Polyploid Genome Evolution. Front Genet. 2020;11.
Cheng F, Wu J, Cai X, Liang J, Freeling M, Wang X. Gene retention, fractionation and subgenome differences in polyploid plants. Nature Plants. 2018;4:258–68.
Joyce BL, Haug-Baltzell A, Davey S, Bomhoff M, Schnable JC, Lyons E. FractBias: a graphical tool for assessing fractionation bias following polyploidy. Bioinformatics. 2016;btw666.
Heide O. Flowering Responses of Contrasting Ecotypes of Poa annua and their Putative Ancestors Poa infirma and Poa supina. Annals of Botany. 2001;87:795–804.
Williams LK, Shaw JD, Sindel BM, Wilson SC, Kristiansen P. Longevity, growth and community ecology of invasive Poa annua across environmental gradients in the subantarctic. Basic and Applied Ecology. 2018;29:20–31.
Law R, Bradshaw AD, Putwain PD. Life-History Variation in Poa annua. Evolution. 1977;31:233–46.
Shimizu‐Inatsugi R, Terada A, Hirose K, Kudoh H, Sese J, Shimizu KK. Plant adaptive radiation mediated by polyploid plasticity in transcriptomes. Molecular Ecology. 2017;26:193–207.
Flagel L, Udall J, Nettleton D, Wendel J. Duplicate gene expression in allopolyploid Gossypium reveals two temporally distinct phases of expression evolution. BMC Biol. 2008;6:16.
Sigel EM, Der JP, Windham MD, Pryer KM. Expression Level Dominance and Homeolog Expression Bias in Recurrent Origins of the Allopolyploid Fern Polypodium hesperium. American Fern Journal. 2019;109:224.
Bird KA, Niederhuth CE, Ou S, Gehan M, Pires JC, Xiong Z, et al. Replaying the evolutionary tape to investigate subgenome dominance in allopolyploid Brassica napus. New Phytol. 2021;230:354–71.
Koshy TK. Evolutionary Origin of Poa annua L. in the Light of Karyotypic Studies. Can J Genet Cytol. 1968;10:112–8.
Mowforth MA, Grime JP. Intra-Population Variation in Nuclear DNA Amount, Cell Size and Growth Rate in Poa annua L. Functional Ecology. 1989;3:289–95.
Schubert I, Lysak MA. Interpretation of karyotype evolution should consider chromosome structural constraints. Trends in Genetics. 2011;27:207–16.
Kent TV, Uzunović J, Wright SI. Coevolution between transposable elements and recombination. Philos Trans R Soc Lond B Biol Sci. 2017;372:20160458.
Beadle GW. A Possible Influence of the Spindle Fibre on Crossing-Over in Drosophila. Proc Natl Acad Sci U S A. 1932;18:160–5.
Baker WK. Crossing Over in Heterochromatin. The American Naturalist. 1958;92:59–60.
Si W, Yuan Y, Huang J, Zhang X, Zhang Y, Zhang Y, et al. Widely distributed hot and cold spots in meiotic recombination as shown by the sequencing of rice F2 plants. New Phytologist. 2015;206:1491–502.
Sammons RD, Gaines TA. Glyphosate resistance: state of knowledge. Pest Manag Sci. 2014;70:1367–77.
Gaines TA, Patterson EL, Neve P. Molecular mechanisms of adaptive evolution revealed by global selection for glyphosate resistance. New Phytol. 2019;223:1770–5.
Brunharo CADCG, Morran S, Martin K, Moretti ML, Hanson BD. EPSPS duplication and mutation involved in glyphosate resistance in the allotetraploid weed species Poa annua L. Pest Management Science. 2019;75:1663–70.
Sabot F, Schulman AH. Parasitism and the retrotransposon life cycle in plants: a hitchhiker’s guide to the genome. Heredity. 2006;97:381–8.
Hollister JD, Gaut BS. Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Res. 2009;19:1419–28.
Freeling M, Woodhouse MR, Subramaniam S, Turco G, Lisch D, Schnable JC. Fractionation mutagenesis and similar consequences of mechanisms removing dispensable or less-expressed DNA in plants. Current Opinion in Plant Biology. 2012;15:131–9.
Edger PP, Smith R, McKain MR, Cooley AM, Vallejo-Marin M, Yuan Y, et al. Subgenome Dominance in an Interspecific Hybrid, Synthetic Allopolyploid, and a 140-Year-Old Naturally Established Neo-Allopolyploid Monkeyflower. Plant Cell. 2017;29:2150–67.
Freeling M, Xu J, Woodhouse M, Lisch D. A Solution to the C-Value Paradox and the Function of Junk DNA: The Genome Balance Hypothesis. Molecular Plant. 2015;8:899–910.
Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–70.
Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–4.
Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18:170–5.
Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 2020;30:1291–305.
Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36:2896–8.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2.
Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20:257.
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Hubley R, Finn RD, Clements J, Eddy SR, Jones TA, Bao W, et al. The Dfam database of repetitive DNA families. Nucleic Acids Research. 2016;44:D81–9.
Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences. 2020;117:9451–7.
Ellinghaus D, Kurtz S, Willhoeft U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics. 2008;9:18.
Ou S, Jiang N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiology. 2018;176:1410–22.
Ou S, Jiang N. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mobile DNA. 2019;10:48.
Ma J, Bennetzen JL. Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA. 2004;101:12404–10.
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15.
Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research. 2019;47:D309–14.
Negri T da C, Alves WAL, Bugatti PH, Saito PTM, Domingues DS, Paschoal AR. Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants. Briefings in Bioinformatics. 2019;20:682–9.
Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, et al. GeSeq – versatile and accurate annotation of organelle genomes. Nucleic Acids Research. 2017;45:W6–11.
Mitchell CC, Parkinson SE, Baker TJ, Jellen EN. C-Banding and Localization of 18S-5.8S-26S rDNA in Tall Oatgrass Species. Crop Science. 2003;43:32–6.
Hu G, Grover CE, Arick MA, Liu M, Peterson DG, Wendel JF. Homoeologous gene expression and co-expression network analyses and evolutionary inference in allopolyploids. Briefings in Bioinformatics. 2021;22:1819–35.
Kuo TCY, Hatakeyama M, Tameshige T, Shimizu KK, Sese J. Homeolog expression quantification methods for allopolyploids. Brief Bioinform. 2020;21:395–407.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
Frith MC, Hamada M, Horton P. Parameters for accurate genome alignment. BMC Bioinformatics. 2010;11:80.
Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–30.
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494–512.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15:550.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
Klopfenstein DV, Zhang L, Pedersen BS, Ramírez F, Warwick Vesztrocy A, Naldi A, et al. GOATOOLS: A Python library for Gene Ontology analyses. Sci Rep. 2018;8:10872.
Haug-Baltzell A, Stephens SA, Davey S, Scheidegger CE, Lyons E. SynMap2 and SynMap3D: web-based whole-genome synteny browsers. Bioinformatics. 2017;33:2197–8.
Wang Y, Tang H, DeBarry JD, Tan X, Li J, Wang X, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Research. 2012;40:e49–e49.
Bandi VK. SynVisio: A Multiscale Tool to Explore Genomic Conservation. Thesis. University of Saskatchewan; 2020.
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012;40:D1178–86.
Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol. 2016;12:e1004873.
Heller D, Vingron M. SVIM: structural variant identification using mapped long reads. Bioinformatics. 2019;35:2907–15.
Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35:1026–8.
Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics. 2013;14:178–92.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.
Tay Fernandez CG, Marsh JI, Nestor BJ, Gill M, Golicz AA, Bayer PE, et al. An SGSGeneloss-Based Method for Constructing a Gene Presence-Absence Table Using Mosdepth. Methods Mol Biol. 2022;2512:73–80.

No competing interests reported.

PoaSupplementaryFiguresAndTable.pptx

Editorial decision: Major revision
08 May, 2023
Reviewers agreed at journal
11 Apr, 2023
Reviews received at journal
10 Apr, 2023
Reviewers agreed at journal
06 Apr, 2023
Reviewers agreed at journal
04 Apr, 2023
Reviewers invited by journal
31 Mar, 2023
Editor assigned by journal
31 Mar, 2023
Editor invited by journal
31 Mar, 2023
Submission checks completed at journal
31 Mar, 2023
First submitted to journal
23 Mar, 2023
