Inference of ancestral diploid karyotype of C. sativa
Parsimony-based phylogenomic analysis can help clarify karyotype evolution. To understand the evolutionary trajectory of the ADK before the divergence of the three C. sativa sub-genomes, we analyzed syntenic conservation and chromosome repatterning between the genome of the ancestor of lineage Ⅰ and that of C. sativa. Here, we took the A. lyrata genome as a reference for the ancestral genome of lineage Ⅰ because their karyotypes are highly similar. By searching for homologous genes between the two genomes, we drew homologous gene dot-plots (Figs. 2 and 3), which showed the orthologous correspondence between the ancestral genome of lineage Ⅰ and the C. sativa genome.
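As a rough illustration of how homologous gene pairs become points in a dot-plot, the sketch below filters BLAST tabular hits by E-value and converts each retained pair to a pair of gene ranks. It assumes the standard 12-column tabular output (E-value in the 11th column); the function names, the E-value cut-off, the gene ranks, and the scores in the toy lines are hypothetical and are not the settings or results of this study. Only the gene identifiers are taken from the text.

```python
def parse_blast_tabular(lines, max_evalue=1e-5):
    """Return (query, subject) pairs from BLAST tabular lines that pass the cut-off."""
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            pairs.append((query, subject))
    return pairs


def dot_coordinates(pairs, order_x, order_y):
    """Convert gene pairs to (x, y) ranks; genes without a known rank are dropped."""
    return [(order_x[q], order_y[s]) for q, s in pairs
            if q in order_x and s in order_y]


# Toy hits: identifiers come from the text, all numeric columns are made up.
toy_hits = [
    "AL482377\tCsa16g006880.1\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-100\t350.0",
    "AL321151\tCsa04g046610.1\t88.0\t180\t21\t0\t1\t180\t1\t180\t1e-80\t300.0",
    "AL321151\tCsa16g006880.1\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20.0",
]
# Hypothetical gene ranks along each genome (chromosomes concatenated).
rank_lyrata = {"AL482377": 0, "AL321151": 1}
rank_sativa = {"Csa04g046610.1": 0, "Csa16g006880.1": 1}

pairs = parse_blast_tabular(toy_hits)
dots = dot_coordinates(pairs, rank_lyrata, rank_sativa)
print(dots)
```

The resulting coordinate list can be passed to any scatter-plot routine; collinear gene pairs would simply be drawn in a different color on top of the raw hits.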
In the homologous gene dot-plots of the two genomes, produced directly from BLASTP hits and further highlighted with inferred collinear genes, every chromosome in the ancestral genome has three homoeologous chromosomes or groups of homoeologous chromosome regions in the C. sativa genome. We found that five ACK chromosomes had nearly perfect orthologous correspondence with at least one complete chromosome in C. sativa (Fig. 3a, b, c, d and e), showing that each of the corresponding five ADK chromosomes (defined as ADK chromosomes 1, 2, 5, 6, 7) inherited the structure of an ACK chromosome (AK chromosomes 1, 3, 6, 7, 8) without prominent DNA rearrangements. In other words, during the formation of the ADK, five ADK chromosomes retained the structures of the corresponding ACK chromosomes almost perfectly.
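The one-to-three correspondence can be read off collinear gene pairs by grouping them by ancestral chromosome and listing the C. sativa chromosomes that carry enough of their orthologs. The following is a minimal sketch under that assumption; the function name, the `min_pairs` threshold, and the toy pairs are hypothetical. The toy input only mimics the AK7 to Cs10, Cs11, Cs12 relationship described in this section and is not real data.

```python
from collections import Counter, defaultdict


def homoeolog_groups(collinear_pairs, min_pairs=2):
    """Map each ancestral chromosome to the C. sativa chromosomes that share
    at least `min_pairs` collinear gene pairs with it."""
    counts = defaultdict(Counter)
    for anc_chr, cs_chr in collinear_pairs:
        counts[anc_chr][cs_chr] += 1
    return {
        anc_chr: sorted(cs for cs, n in per_cs.items() if n >= min_pairs)
        for anc_chr, per_cs in counts.items()
    }


# Toy input (hypothetical): one (ancestral chromosome, Cs chromosome) entry
# per collinear gene pair; the single Cs01 pair stands in for a spurious hit.
toy_pairs = (
    [("AK7", "Cs10")] * 3
    + [("AK7", "Cs11")] * 3
    + [("AK7", "Cs12")] * 2
    + [("AK7", "Cs01")]
)
groups = homoeolog_groups(toy_pairs)
print(groups)
```

A count threshold is the simplest possible filter; a real analysis would more likely weight by block length or coverage of the ancestral chromosome.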
Notably, the orthologous correspondence between AK2, 4, 5 and Cs4, 16 (Cs-G1) is nearly the same as that between AK2, 4, 5 and Cs6, 7 (Cs-G2) (Fig. 3g and h), indicating that Cs-G1 and Cs-G2 share two ancestral chromosomes, which formed mainly through reciprocal translocation of arms (RTA) and chromosome end–end joining (EEJ) among AK2, 4, 5. By searching synteny chains between the A. lyrata and C. sativa genomes, we further inferred that the crossing-over position in AK4 lies between genes AL482377 (corresponding C. sativa ortholog: Csa16g006880.1) and AL321151 (Csa04g046610.1), and that in AK5 lies between genes AL486375 (Csa04g046590.1) and AL486377 (Csa16g006870.1). Two trajectories could explain the evolution of these chromosomes. In the first, AK2 and AK4 crossed over near one telomere of each, resulting in EEJ to produce AK2/4 and a satellite chromosome consisting of two telomeres (and possibly a little DNA); a cross-over then occurred between AK5 and the neo-chromosome AK2/4, which experienced one extra translocation and a pericentric inversion, resulting in RTA between the two chromosomes to produce AK5/4 (ADK3) and AK2/4/5 (ADK4) (Fig. 4c). In the alternative trajectory, a cross-over between AK4 and AK5 resulted in RTA to produce AK5/4, forming ADK3, and the intermediate AK4/5. AK4/5 and AK2 then crossed over near one telomere of each, resulting in EEJ to produce AK2/4/5 and, likely, a satellite chromosome consisting of two telomeres (and possibly a little DNA). The neo-chromosome AK2/4/5 experienced one extra translocation and a pericentric inversion to form ADK4 (Fig. 4d). Whichever trajectory actually occurred, the satellite chromosome was likely lost, eventually reducing the chromosome number from 8 in ACK to 7 in ADK.
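The breakpoint inference described above amounts to walking along an ancestral chromosome and looking for adjacent genes whose C. sativa orthologs fall on different chromosomes. A minimal sketch follows, assuming the gene pairs are already ordered along the ancestral chromosome and that the chromosome number can be parsed from identifiers of the form `CsaNNg...`. The helper names are hypothetical, and the input holds only the two AK4 flanking pairs reported in the text.

```python
import re


def cs_chromosome(cs_gene_id):
    """Extract the chromosome number from an ID such as 'Csa16g006880.1'."""
    match = re.match(r"Csa(\d+)g", cs_gene_id)
    if match is None:
        raise ValueError(f"unexpected gene ID: {cs_gene_id}")
    return int(match.group(1))


def find_breakpoints(ordered_pairs):
    """Return adjacent ancestral genes whose orthologs switch chromosome."""
    breakpoints = []
    for (gene_a, cs_a), (gene_b, cs_b) in zip(ordered_pairs, ordered_pairs[1:]):
        if cs_chromosome(cs_a) != cs_chromosome(cs_b):
            breakpoints.append((gene_a, gene_b))
    return breakpoints


# Flanking genes of the AK4 crossing-over position, as given in the text.
ak4_chain = [
    ("AL482377", "Csa16g006880.1"),
    ("AL321151", "Csa04g046610.1"),
]
ak4_breaks = find_breakpoints(ak4_chain)
print(ak4_breaks)
```

On a full synteny chain, isolated mismatches would also trigger a switch, so a practical version would probably require a minimum run of genes on each side before calling a breakpoint.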
The orthologous correspondence between AK2, 4, 5 and Cs5, 9 (Cs-G3) (Fig. 3i) differs markedly from that between AK2, 4, 5 and Cs4, 16 (Cs-G1) or Cs6, 7 (Cs-G2), showing that Cs5, 9 have particular structures not shared with the other two sets of chromosomes (Cs4, 16 and Cs6, 7). At first sight, Cs-G3 does not appear to share the two ancestral chromosomes (ADK3, 4) with Cs-G1 and Cs-G2. However, the orthologous correspondence between Cs4, 16 or Cs6, 7 and Cs5, 9 (Fig. 3j and k) shows that Cs5, 9 were formed mainly by RTA between ADK3 and ADK4 (Fig. 5). By searching synteny chains between the A. lyrata and C. sativa genomes, we further inferred that the crossing-over position in ADK3 lies between genes AL486375 (corresponding C. sativa ortholog: Csa04g046590.1) and AL321151 (Csa04g046610.1), where the chromosome arms of AK4 and AK5 were joined, and that in ADK4 lies between genes AL476152 (Csa09g071500.1) and AL926342 (Csa09g071510.1). This finding provides clear evidence that Cs-G3 did inherit the karyotype structures of the two ancestral chromosomes (ADK3, 4), which it shares with Cs-G1 and Cs-G2.
Inferring evolutionary trajectories from ADK to extant C. sativa karyotype
Chromosome structure can help clarify phylogenomic relationships. In the homologous gene dot-plots, the orthologous correspondence between AK7 (ADK6) and Cs10, 11, 12 (Fig. 3d) suggested that one paracentric inversion is common to Cs10 and Cs11, from Cs-G1 and Cs-G2, respectively, but is absent from chromosome Cs12 of Cs-G3. This suggests that Cs-G1 and Cs-G2 did not diverge directly from the ADK, but share a common ancestor that carried one paracentric inversion relative to ADK6.
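The parsimony argument can be made concrete by scoring the inversion as a binary character on the three possible sub-genome trees. The sketch below counts changes with Fitch's small-parsimony procedure on rooted trees written as nested tuples, treating ADK6 as the outgroup carrying the ancestral state. The tree encodings, labels, and function name are assumptions for illustration; the character states (inversion present in Cs10 and Cs11, absent in Cs12) are taken from the text.

```python
def fitch(tree, states):
    """Return (state_set, change_count) for a rooted binary tree.

    Leaves are names (str); internal nodes are 2-tuples of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one change is needed on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1


# 1 = paracentric inversion present, 0 = absent (ancestral ADK6 arrangement).
states = {"ADK6": 0, "Cs10": 1, "Cs11": 1, "Cs12": 0}

trees = {
    "Cs-G1 + Cs-G2 sister": ("ADK6", (("Cs10", "Cs11"), "Cs12")),
    "Cs-G1 + Cs-G3 sister": ("ADK6", (("Cs10", "Cs12"), "Cs11")),
    "Cs-G2 + Cs-G3 sister": ("ADK6", (("Cs11", "Cs12"), "Cs10")),
}
scores = {name: fitch(tree, states)[1] for name, tree in trees.items()}
for name, score in scores.items():
    print(f"{name}: {score} change(s)")
```

Only the topology that groups Cs-G1 with Cs-G2 explains the shared inversion with a single event; the other two require either two independent inversions or an inversion followed by a reversal.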
The three sub-genomes and the C. sativa genome may have formed as follows. The ancestral diploid of C. sativa first differentiated into species A and B; species A then differentiated into species C and D after one paracentric inversion occurred in ADK6 (Fig. 5). In species C, a cross-over between ADK6 and ADK7 near one telomere of each chromosome resulted in chromosome end–end joining (EEJ), producing ADK6/7 and a satellite chromosome consisting of two telomeres and a little DNA. ADK5 in species D independently experienced one paracentric inversion (Fig. 3c and 5). In species B, a cross-over between ADK3 and ADK4, which experienced one translocation, resulted in reciprocal translocation of arms (RTA), producing ADK3/4 and ADK4/3, which experienced one pericentric inversion (Fig. 3j, k and 5). An RTA between ADK5 and ADK7 also occurred in species B, producing ADK5/7 and ADK7/5 (Fig. 3j and 5). The crossing-over position in ADK5 lies between genes AL489681 (corresponding C. sativa ortholog: Csa20g058860.1) and AL351869 (Csa02g002270.1), in the region where the centromere of ADK5 is located, and that in ADK7 lies between genes AL494932 (Csa20g041660.1) and AL494934 (Csa02g033470.1), in the region where the centromere of ADK7 is located. An initial hybridization event between species C (Cs-G1) and D (Cs-G2) produced a tetraploid genome, and an additional hybridization event between this tetraploid and species B (Cs-G3) eventually formed the extant hexaploid genome of C. sativa [19] (Fig. 5).
During the formation of the C. sativa karyotype, 14 chromosomes of C. sativa inherited the structures of ADK chromosomes. One paracentric inversion occurred in Cs-G2 to produce one new chromosome, whereas two RTAs, together with one translocation and one pericentric inversion, occurred in Cs-G3 to produce four new chromosomes. EEJ occurred in Cs-G1 to produce one new chromosome and one satellite chromosome. The loss of the satellite chromosome reduced the chromosome number from 21 to 20.
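The event bookkeeping summarized above can be checked with a toy model in which each chromosome is a tuple of blocks flanked by telomeres, RTA swaps the distal parts of two chromosomes, and EEJ fuses two chromosomes while splitting off a two-telomere satellite. This is a simplified sketch: the two-arm representation, the cut positions, and all function names are assumptions, and inversions are not modeled because they do not change chromosome number.

```python
TEL = "tel"


def make_chromosome(name):
    """Model a chromosome as telomere, two named arms, telomere."""
    return (TEL, f"{name}.arm1", f"{name}.arm2", TEL)


def rta(chrom_x, chrom_y, cut_x=2, cut_y=2):
    """Reciprocal translocation of arms: exchange the parts after each cut."""
    return chrom_x[:cut_x] + chrom_y[cut_y:], chrom_y[:cut_y] + chrom_x[cut_x:]


def eej(chrom_x, chrom_y):
    """End-end joining: fuse two chromosomes; the two telomeres removed at
    the junction form a satellite chromosome."""
    fused = chrom_x[:-1] + chrom_y[1:]
    satellite = (chrom_x[-1], chrom_y[0])
    return fused, satellite


def base_karyotype():
    """The seven ADK chromosomes."""
    return {f"ADK{i}": make_chromosome(f"ADK{i}") for i in range(1, 8)}


# Species C (Cs-G1): EEJ between ADK6 and ADK7; the satellite is assumed lost.
species_c = base_karyotype()
fused, satellite = eej(species_c.pop("ADK6"), species_c.pop("ADK7"))
species_c["ADK6/7"] = fused

# Species D (Cs-G2): one paracentric inversion in ADK5, count unchanged.
species_d = base_karyotype()

# Species B (Cs-G3): two RTAs, each replacing two chromosomes with two new ones.
species_b = base_karyotype()
species_b["ADK3/4"], species_b["ADK4/3"] = rta(
    species_b.pop("ADK3"), species_b.pop("ADK4"))
species_b["ADK5/7"], species_b["ADK7/5"] = rta(
    species_b.pop("ADK5"), species_b.pop("ADK7"))

total = len(species_c) + len(species_d) + len(species_b)
print(len(species_c), len(species_d), len(species_b), total)
```

Under these assumptions only the EEJ event changes the count, so the three sub-genomes contribute 6, 7, and 7 chromosomes, matching the reduction from 21 to 20.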