Genome assembly and annotation
The P. infirma (2n = 2x = 14) and P. supina (2n = 2x = 14) genomes each assembled into seven pseudomolecules that represented 96% of the estimated genome sizes by k-mer analysis and contained > 97% of the 1,614-core conserved orthologs in the Embryophyta OrthoDB (v10), supporting high-quality chromosome-level genome assemblies for both species (see methods; Supplementary Table 1; Supplementary Fig. 1). The chromosome-level assemblies represent the collapsed haploid (unphased) genomes for each species (n = 7). Chromosomes were named according to a pre-established nomenclature presented by Robbins et al., (2022), where P. infirma contributes the ‘A’ subgenome to P. annua and P. supina contributes the ‘B’ subgenome. A prefix designates the species of origin, such that P. infirma chromosomes are ‘PiA’, P. supina’s are ‘PsB’, and P. annua’s are either ‘PaA’ or ‘PaB’ (Supplementary Fig. 2a).
Repetitive DNA and transposable elements (TEs) were annotated using custom built repeat libraries and included class I retrotransposons as well as class II DNA transposons. Genes were predicted using the BRAKER2 pipeline on the repeat-masked genome assemblies (Brůna et a., 2021). Full-length IsoSeq transcripts from each species was incorporated with protein evidence from Arabidopsis and related grasses for ab initio gene prediction. In addition, we identified 14,743 long noncoding RNAs (lncRNAs) in the P. infirma genome and 13,963 in the P. supina genome. Poa annua contained approximately the additive number of lncRNAs as its diploid parents with fewer lncRNAs in the A (infirma) subgenome (14,394) and more lncRNAs in the B (supina) subgenome (15,057).
Genome Characteristics And Synteny
The P. infirma genome is 1,125 Mb in length, which makes it 489 Mb (1.77⋅) larger than the P. supina genome (636 Mb), despite being closely related species and sister taxa within the section Ochlopoa. Most (76%) of the excess in genome size is due to orthologous chromosomes 1 and 2 being a combined 374 Mb larger in the P. infirma genome (Fig. 2ab; Supplementary Fig. 3). The subgenomes of P. annua are similar in composition to the genomes of the diploid progenitors, with the A subgenome (1,116 Mb) being 1% shorter than the P. infirma genome and the B subgenome (662 Mb) being 4% larger than the P. supina genome (Supplementary Fig. 2c). It’s a similar story at the gene level, where the A (infirma) subgenome had 6% fewer genes than P. infirma (37,123 and 39,420, respectively), and the B (supina) subgenome had 4% more genes than P. supina (39,536 and 37,935 respectively). Overall, the P. annua reference genome is 99% of the length of its progenitor genomes and contains 99% of its parental genes, most of which (95%) are represented as colinear syntenic blocks (Fig. 2c; Supplementary Fig. 2c; Supplementary Fig. 4).
Poa infirma and P. supina chromosomes were 81% and 65% repetitive, respectively. These percentages amount to 489 Mb (1.77⋅) more repetitive DNA in P. infirma than P. supina, suggesting that TEs have played an outsized role in the disparate genome sizes between the two diploids, particularly on chromosomes 1 and 2. The majority of annotated repetitive sequences were classified as Gypsy and Copia long-terminal-repeat (LTR) retrotransposons (598 Mb (53%) of the P. infirma genome and 241 Mb (38%) of the P. supina genome). The sequence length of the non-repetitive portions in each diploid is very similar, totaling 211 Mb in the P. infirma genome and 225 Mb in the P. supina genome. The subgenomes of P. annua have slightly less repetitive DNA than their corresponding diploid progenitor genomes, with 7% less repetitive DNA in the A (infirma) subgenome and 2% less in the B (supina) subgenome.
Nucleotide Divergence And Molecular Dating
Genomic similarity can be assessed at the nucleotide level using measures of average nucleotide identity (ANI) and is a useful indicator of genetic divergence between sequence alignments. The ANI between P. infirma (A) and P. supina (B) orthologous chromosomes is 95%. The ANI when comparing P. annua chromosomes to their corresponding parental sequences was 98% (i.e., PaA to PiA alignments and PaB to PsB; Supplementary Fig. 2b). To estimate divergence and hybridization times, we calculated the synonymous substitutions rate (Ks) between homologous and homoeologous gene pairs. Gene pairs between P. infirma (A) and P. supina (B) have a peak Ks = 0.065 and was used to estimate the date the two species diverged from their common ancestor. Ks between P. annua’s A subgenome and P. infirma (and also P. annua’s B subgenome and P. supina) was very close to zero and used to estimate the date that the two progenitor diploids hybridized to form P. annua. With a Poaceae mutational rate of 5.76174 ⋅ 10− 9 substitutions per synonymous site per year (De La Torre et al., 2017), our Ks values suggest that the diploids diverged from their common ancestor 5.5–6.3 million years ago (Mya) and hybridized to form polyploid P. annua 0–600,000 years ago (Supplementary Fig. 5). The most recent of the ancestral whole-genome duplication (WGD) events in the Poaceae is rho (ρ) and pre-dates the divergence of the BOP (C3 grasses) and PACMAD (C4 grasses) grasses (McKain et al., 2016). Syntenic gene pairs from rho have a Ks = 1 in our Poa species and corresponds to a date of 87 Mya, which largely overlaps with the reported rho WGD date of 85–97 Mya and helps to corroborate our methodology (Supplementary Fig. 6; Clark and Donoghue et al., 2018).
To further evaluate the date of hybridization and explore the 1.7-fold difference in genome size between A and B, we examined the mutation rates between pairs of LTRs. LTRs multiply by escaping host silencing and ‘burst’ into activity for a short time before being re-silenced (McCue et al., 2013; Sigman and Slotkin, 2016). Repeats of an LTR are identical when inserted, owing to their copy-and-paste mode of transposition (vonHoldt et al., 2012). Mutations between an ancestral LTR and its transposed derivative is a reflection of its evolutionary divergence. Our analysis suggests that the A genome experienced a burst in proliferation of LTRs that climaxed ~ 340,000 years ago, while bursts of LTRs in the B genome occurred more recently, with peak rate of transposition dating back to ~ 50,000 years ago (Fig. 3a). Because the density of LTR insertion times in P. infirma and P. supina closely mirror that of P. annua’s A and B subgenomes, it is likely that those bursts occurred during the speciation of the diploids and prior to the hybridization event that formed P. annua. Thus, we suggest a narrower timeframe for P. annua hybridization at 0–50,000 years ago. We expect that the 489 Mb difference in TE content and genome size between P. infirma and P. supina is greatly impacted by the two species varying abilities to silence retrotransposons.
Single-gene Duplications
In addition to the WGD that formed P. annua, smaller scale duplications can also accompany polyploidy and are collectively referred to as single-gene duplications (Panchy et al., 2016). We identified 2,008 tandemly duplicated and 1,815 proximally duplicated genes in the P. infirma genome. These numbers are similar to P. supina with 1,940 tandem and 1,914 proximal duplications. As compared to its progenitor genomes, allotetraploid P. annua has slightly fewer single-gene duplications in the A (infirma) subgenome (1,806 tandem and 1,736 proximal duplicated genes), and slightly more in the B (supina) subgenome (1,999 tandem and 2,160 proximal). Transposed duplications are another type of single-gene duplication and are thought to occur extensively after WGD (Zhao X.P. et al., 1998; Freeling M. et al., 2009; Qiao et al., 2019). We used the progenitor P. infirma and P. supina genomes as outgroup to identify pairs of transposed genes that were mobilized after the diploids hybridized to form P. annua (post-polyploidy). We found 63% more transposed duplications in P. annua’s B subgenome than in P. annua’s A subgenome (5,917 and 3,438 transposed genes, respectively). This result is similar to the pattern observed with proximal and tandem duplications and may point to a post-polyploidy expansion of the B subgenome and contraction of the A subgenome within P. annua.
Interestingly, 74% of transposed duplications in the B subgenome remained within B, while 46% of A duplications remained within the A subgenome, suggesting that inter-subgenomic duplications preferentially move from the A (infirma) subgenome and integrate into B (supina; Fig. 3b; χ2 test, P < 0.0001). Inter-subgenomic transposed duplications are enriched for functions associated with Gypsy and Copia-type LTRs, suggesting that they are heavily involved with retrotransposon activity. Taken together with our molecular dating of LTRs, we expect that the observed bias in inter-subgenome transpositions is a reflection of the two subgenomes uneven abilities to silence retrotransposons and is a continuation of the TE momentum that was established during the independent evolution of the diploids. The observed bias in inter-subgenome transpositions may point to a trans effect, where retrotransposons diffuse from the subgenome with higher TE content to the subgenome with lower TE content.
Homoeologous Exchanges
Crossing over between ancestrally related chromosomes is a common occurrence in newly formed allopolyploids and are referred to as homoeologous exchanges (HEs; Gaeta and Pires, 2010; Mason and Wendel, 2020). We assessed HEs in P. annua (i.e., A segments in the B subgenome and B segments in the A subgenome) using the parental sequences as a guide to assign P. annua reads as either being derived from P. infirma or P. supina. We detected 1,299 homoeologous exchanges in the P. annua genome (Fig. 4; 657A segments in the B subgenome and 642 B segments in the A subgenome). Almost 2% of P. annua gene annotations are within HEs. Of those, 68% are A to B, suggesting that there may be an asymmetric exchange of genic sequence between the two subgenomes (823 A genes in the B subgenome vs 385 B genes in the A subgenome; χ2 test, P < 0.0001). The average length of an HE was 16 kb for A to B subgenome HEs and 13 kb for B to A subgenome HEs. Interestingly, 1.6% of the B subgenome consists of A sequences (10.4 Mb), while 0.7% of the A subgenome is B sequences (8.3 Mb). A to B HEs were most enriched for genes involved in gibberellin 3-beta-dioxygenase activity, while B to A HEs were enriched in genes involved in telomere maintenance. The largest HE is a 2.2 Mb Pa7A to Pa7B exchange containing 103 genes (Fig. 4c). Three of P. annua’s 26 annotated histone H3-K4 methylation genes reside in this 2.2 Mb HE. The differences in HEs between subgenomes points to a visible but tenuous bias accumulation of genes in the B subgenome.
Fractionation Bias
Gene loss (fractionation) occurs via intrachromosomal recombination resulting in short deletions and is a typical behavior of ancient allopolyploids (Cheng et al., 2018). We compared the A and B subgenomes of P. annua to the A and B genomes of its progenitors and identified consistent gene retention (97%) across all chromosomes, likely reflecting the recent timescale of the P. annua WGD event (Supplementary Fig. 7). Although this result seems to clash with our observations at the single-gene and HE levels, it is important to note the distinction between these methodologies. The fractionation analysis used here (Joyce et al., 2016) calculates the number of genes retained in P. annua with respect to the syntenic sequences in the progenitor genomes. Consequently, single-gene duplications would only impact our fractionation analysis if they had duplicated in the progenitor genome but not in P. annua. The impact of HEs on our fractionation analysis is relatively small, since there are only 1,208 genes within HEs and most (~ 61%) have an ancestrally syntenic ortholog in the homoeologous subgenome and therefore would not impact fractionation values.
Homoeolog Expression And Subgenome Dominance
P. annua is typically described as having two distinct biotypes; plants with wild-type morphology and plants with dwarf-type morphology (Fig. 1a; sometimes referred to as annual- and perennial-types, respectively; Heide, 2001). Plants with wild-type habit resemble P. infirma, while the dwarf-types more closely resemble P. supina. Broad phenotypic plasticity has been reported where environmental factors such as animal disturbance, intense wind, soil properties, temperature, elevation, and even golf course-style management can influence P. annua to preferentially favor one biotype over the other (Williams et al., 2018). The two opposed morphologies likely play an important role in P. annua’s ability to infiltrate and persist across a spectrum of climactic conditions (Law et al., 1976).
Shimizu-Inatsugi et al. (2017), introduced the ‘polyploid plasticity hypothesis’ stating that an allopolyploid species might differentially utilize the expression profiles of its progenitor genomes depending on the environment. With agronomic and turfgrass breeding in mind, we aimed to test the hypothesis that P. annua might preferentially express genes from the B (supina) subgenome when exposed to mowing stress and the A (infirma) subgenome when allowed to grow in the absence of mowing stress (unmowed). We vegetatively propagated dwarf- and wild-type P. annua plants and subjected one clone to mowing stress for three months, while leaving the other clone unmowed for three months. We observed no correlation in the expression profiles between biotypes (dwarf or wild) across our biological replicates (Supplementary Fig. 8; Supplementary Fig. 9), indicating that dwarf- and wild-types exhibit similar transcriptional behavior under both mowed and unmowed conditions. After removing biotypes as a variable, we identified 5,505 and 6,400 differentially expressed pairs of homoeologs in our unmowed and mowed comparisons, respectively. We found that both mowed and unmowed plants showed a homoeolog expression bias favoring the B subgenome (Wilcoxon test: p = 0.001 and p = 0.0008, respectively), indicating that P. annua preferentially utilizes B (supina) genes regardless of mowing stress (Fig. 5). Although P. annua’s B subgenome expression bias is statistically significant in both treatment comparisons, the bias is not as evident as reported in other neo-allopolyploids (Flagel et al., 2008; Edger et al., 2017; Sigel et al., 2019; Bird et al., 2020), likely reflecting the recent timescale of the hybridization but perhaps also pointing to a more equitable relationship between P. annua’s subgenomes where primary metabolic function is partitioned across pairs of homoeologs (Supplementary Fig. 10). Only chromosomes one, four, and six showed consistent expression bias toward B homoeologs, suggesting that these three chromosomes contribute disproportionally to homoeolog expression bias at the whole-genome level (Fig. 5). Thus, we conclude that counter to the polyploid plasticity hypothesis, P. annua utilizes genes from both subgenomes with modest homoeolog expression bias favoring B (supina) genes irrespective of our environmental treatments.
In addition to homoeolog expression analysis, we also used our transcriptional data to compare gene expression between pairs of recently transposed gene duplications that were identified during our analysis of single-gene duplications. We identified, 973 pairs of transposed genes as being differentially regulated between their novel and ancestral copies. Of those, 847 (87%) were upregulated in the novel copy and most were A to B transpositions.
Whole-genome Resequencing And Large-scale Chromosomal Modifications
Homoeologous exchanges and bursts of activity in transposable elements contribute to genomic instability in polyploids but do not provide a satisfying explanation for the reported 80% variation in DNA content between P. annua ecotypes (Koshy, 1968; Mowforth and Grime, 1989). To explore intraspecific variation in P. annua at the whole-chromosome and DNA sequence level, we re-sequenced 13 geographically distinct genotypes and two additional elite breeding genotypes. Together, the 15 samples represent nine countries and four continents (Supplementary Fig. 11). The Illumina reads were aligned to the P. annua reference genome with a depth of coverage ranging between 13–26⋅. More than 99% of all reads mapped to the P. annua reference genome. SNP density across a 1Mb sliding window showed large variability in sequence divergence within subgenomes, suggesting that there may have been multiple hybrid origins (Supplementary Fig. 12). Of the 76,541 gene annotations in the reference, we found that 7,808 were absent (dispensable) from at least one of the 15 samples, leaving 68,733 ‘core’ genes approximately evenly split between subgenomes (Supplementary Fig. 13; 52% of core genes were from the B subgenome). Dispensable genes were enriched for function in RNA-mediated transposon integration, suggesting that retrotransposons are actively proliferating in the species in a genotype-specific manner. In addition to core and dispensable genes, we used the diploid genomes to identify HEs and determine the parental origin for P. annua homoeologs across all 15 samples. There were 5,217 genes within HEs in at least one sample. Most (60%) genes within HEs were transferred from A to B subgenome. A to B HEs were enriched for functions associated with primary metabolism, while B to A HEs were enriched for functions associated with telomere maintenance, continuing to point toward a biased accumulation of genes in the B (supina) subgenome (Fig. 4b; χ2 test, P < 0.0001).
Reads mapped to the P. annua reference genome (and diploid progenitor genomes) provide a view of structural modifications at the whole-chromosome level. Using this approach, we identified striking variation in chromosome structure post-polyploidization. The largest is a 224 Mb deletion in the centromeric and pericentromeric region of chromosome 1A in some samples that amounts to 70% of the length of the reference chromosome (Fig. 6; Fig. 7; Supplementary Fig. 14). Coinciding with the deletion at 1A is a 32 Mb duplication at chromosome 1B. Split reads and improperly paired reads at the deletion and duplication breakpoints suggest that the duplicated region at 1B resides within the deleted region of 1A, and indeed, capillary electrophoresis using homoeolog-specific markers across the chromosomal breakpoint confirms this to be the case (Supplementary Fig. 15). The 1B duplication contains the highest density of LTRs across the chromosome, suggesting that the rearrangement most likely spans the centromere (Fig. 7b). There are 1,996 annotated genes and 133 functional enrichments (mostly transposon-associated categories) in the 224 Mb centromeric deletion. The 32 Mb homoeologous centromere brings 1,321 genes back and all but four of the functionally enriched categories.
Perhaps the most parsimonious path to this karyotype involves meiotic error, where 1A and 1B form a quadrivalent and adjacent disjunction leads to two 1A’s going to one pole and two 1B’s going to the other. When fertilized by a normal nucleus, the resulting offspring would be 1A1B1B1B (or 1A1A1A1B). Subsequent generations would lead to introgression of 1A at recombination sites, which would cause most of the genic regions of the displacing 1B chromosome to return to a 1A-like state. Alternatively, it is possible that dysploidy and Robertsonian rearrangements (fusion-fission) played an intermediate role, where again, introgression back to the population resulted in the observed karyotype (Schubert and Lysak, 2011). To our knowledge, cytological studies have not recorded any evidence of dysploidy in P. annua.
Chromosome 1A has more repetitive DNA (90%) than any other chromosome in the P. annua reference genome, which likely plays a role in the observed restructuring in some ecotypes (Kent et al., 2017). Most (99%) of the 224 Mb deleted region is low-complexity repetitive sequence, indicating that it would likely be wound into pericentromeric heterochromatin and suppressed from meiotic recombination (Beadle, 1932; Baker, 1958; Si et al., 2015). Fittingly, rearrangements at 1A appear to reside at the periphery of heterochromatic sequences (Fig. 7b). Large-scale chromosomal rearrangements in P. annua occur in intragenic recombination ‘hotspots’, where individual genotypes display some variability in the exact nucleotide coordinates of the breakpoint but are often within several kilobases of each other (Supplementary Fig. 16; Supplementary Fig. 17). Some individuals appear to contain large-scale variability between their parental haplotypes. For example, sample ‘Ohio’ has a copy of 1A that resembles the P. annua reference genome, while the other haplotype contains the 224/32 Mb centromeric displacement discussed above (Fig. 8; Supplementary Fig. 15). Such variability between haplotypes is surprising given that homologous chromosomes recognize each other by sequence similarity and incorrect pairing could lead to multivalents, which are associated with improper segregation and reduced fertility.
Variation Of Epsps
P. annua is most commonly known as a noxious weed. It can be managed with both pre- and post-emergent herbicides, but repeated application has resulted in the evolution of multiple herbicide resistance pathways (Sammons & Gaines, 2014; Gaines et al., 2019). Glyphosate resistance has been particularly problematic for managers of P. annua. Glyphosate works by inhibiting the enzyme 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) in the shikimate pathway. Resistant P. annua’s have been reported to have increased EPSPS copy number variation and missense mutations (Bruhharo et al., 2019). The A subgenome EPSPS gene is 3,891 bp in length, while the B subgenome copy is 9,092 bp. We identify large genotypic variation in the structure of EPSPS homoeologs, particularly in the B subgenome, where nearly 6 kb is deleted from the two largest introns in some genotypes (Supplementary Fig. 18). We do not see evidence of copy number variation in our resequencing data, but that is likely because none of our genotypes originated from a population with known herbicide resistance.