Nigella is characterized by relatively large genomes
According to flow cytometric estimation of the DNA content, N. orientalis has the largest genome among the seven species (12.44 Gbp/1C), followed by N. sativa (11.83 Gbp/1C) and N. damascena (11.72 Gbp/1C) (Table 1, Fig. 1). The values for the latter two species are similar to previous estimates based on Feulgen densitometry: 10.30 Gbp/1C and 10.58 Gbp/1C for N. damascena (Evans et al. 1972a; Evans et al. 1972b; Olszewska and Osiecka 1983) and 10.39 Gbp/1C for N. sativa (Bennett and Smith 1976). The slight differences might be explained by the different methods used (Feulgen densitometry versus flow cytometry) and/or the different reference standards used (P. sativum versus Allium cepa). N. hispanica (8732 Mbp/1C), N. arvensis (7851 Mbp/1C), N. integrifolia (7443 Mbp/1C) and N. bucharica (7398 Mbp/1C) have considerably smaller genomes than the other three species.
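Because the genome sizes above are quoted in both Gbp and Mbp, the comparison is easier to follow once all values share a unit. The following is a minimal sketch that converts the seven reported 1C values to Mbp and ranks them; the dictionary and function names are illustrative and are not part of the original analysis.

```python
# 1C genome sizes as quoted in the text, with the unit used in the text.
GENOME_SIZES = {
    "N. orientalis": (12.44, "Gbp"),
    "N. sativa": (11.83, "Gbp"),
    "N. damascena": (11.72, "Gbp"),
    "N. hispanica": (8732, "Mbp"),
    "N. arvensis": (7851, "Mbp"),
    "N. integrifolia": (7443, "Mbp"),
    "N. bucharica": (7398, "Mbp"),
}

# Conversion factors to Mbp (1 Gbp = 1000 Mbp).
FACTORS_TO_MBP = {"Mbp": 1, "Gbp": 1000}


def to_mbp(value, unit):
    """Convert a genome size to Mbp."""
    return value * FACTORS_TO_MBP[unit]


def ranked_sizes_mbp(sizes):
    """Return (species, size in Mbp) pairs, largest genome first."""
    converted = {sp: to_mbp(v, u) for sp, (v, u) in sizes.items()}
    return sorted(converted.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for species, mbp in ranked_sizes_mbp(GENOME_SIZES):
        print(f"{species}: {mbp:.0f} Mbp/1C")
```

Ranked this way, the three x = 6 species above 11 Gbp separate clearly from the four species below 9 Gbp.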
Two different karyotypes are prevailing in Nigella
To understand the chromosomal evolution in the genus Nigella, the karyotypes of the seven species were analyzed. Their chromosomes were mainly metacentric, with one or two telocentric chromosome pairs in each species. Based on their basic chromosome number, the seven species can be classified into two groups. The first group, comprising N. arvensis, N. damascena, N. hispanica, N. orientalis and N. sativa, has a basic chromosome number of x = 6 with a karyotype formula of 10m + 2t. N. bucharica and N. integrifolia belong to the second group, with a basic chromosome number of x = 7 and a karyotype formula of 8m + 2st + 4t. All these species fell into the 2A category of Stebbins’ asymmetry indices (Stebbins 1971) (Fig. 1). The size of the metacentric mitotic metaphase chromosomes ranged from 5.99 µm (N. bucharica) to 10.14 µm (N. damascena), and the telocentric chromosome size ranged from 4.04 µm (N. bucharica) to 5.51 µm (N. damascena). N. bucharica and N. integrifolia also have a pair of subtelocentric chromosomes with a size range from 4.54 to 4.71 µm. The total metaphase chromosome length was between 38.21 µm in N. bucharica and 53.17 µm in N. damascena (Table S5).
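The relation between the karyotype formulae and the basic chromosome numbers can be checked with simple arithmetic. The sketch below sums the chromosome counts in a formula and divides by the ploidy level; it assumes that all seven species are diploid, which is consistent with the formulae and x values given above but is an assumption of this example. The function names are illustrative.

```python
import re


def chromosomes_in_formula(formula):
    """Sum the chromosome counts in a karyotype formula such as '10m + 2t'.

    Recognized morphology codes: m (metacentric), sm (submetacentric),
    st (subtelocentric) and t (telocentric).
    """
    counts = re.findall(r"(\d+)\s*(sm|st|m|t)", formula)
    return sum(int(number) for number, _code in counts)


def basic_number(formula, ploidy=2):
    """Derive x from a somatic karyotype formula, assuming the given ploidy."""
    total = chromosomes_in_formula(formula)
    if total % ploidy != 0:
        raise ValueError("chromosome count is not divisible by the ploidy level")
    return total // ploidy


if __name__ == "__main__":
    for formula in ("10m + 2t", "8m + 2st + 4t"):
        print(formula, "->", chromosomes_in_formula(formula),
              "chromosomes, x =", basic_number(formula))
```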
The number of rDNA loci varies considerably between the species
The proportions of 45S rDNA repeats in the N. sativa, N. damascena and N. bucharica genomes were 0.72%, 0.38% and 0.65%, respectively, while the 5S rDNA proportions were 0.03%, 0.01% and 0.09%, as determined by RepeatExplorer analysis (Table 2). The consensus monomers of the rDNA sequences identified by TAREAN in N. sativa, N. damascena and N. bucharica are listed in Table S6. To determine the karyotype evolution among the seven Nigella species, FISH mapping of 45S and 5S rDNA loci on mitotic chromosomes was performed (Fig. 2). FISH with both ribosomal probes revealed considerable interspecific variation in the number and position of rDNA loci (Fig. 1 and 2). While three 45S rDNA-positive chromosome pairs were observed in N. sativa, N. orientalis, N. integrifolia and N. bucharica (Fig. 2A, 2B, 2C and 2D), four pairs of 45S rDNA loci were present in N. damascena and N. arvensis (Fig. 2E and 2F). N. hispanica revealed ten 45S rDNA loci, the highest number among the investigated species. Each chromosome displayed at least one ribosomal locus. Interestingly, one of the 45S rDNA sites in N. hispanica did not show a signal on its corresponding homologous chromosome, indicating hemizygosity (Fig. 1 and 2G). While in N. sativa, N. arvensis, N. hispanica and N. damascena 45S rDNA loci were found on metacentric and telocentric chromosomes, they were found exclusively on metacentric chromosomes in N. orientalis and on submetacentric and telocentric chromosomes in N. bucharica and N. integrifolia (Fig. 2). The 5S rDNA was found on one (N. integrifolia, N. bucharica, N. arvensis and N. hispanica) (Fig. 2C, 2D, 2F and 2G), two (N. sativa and N. damascena) (Fig. 2A and 2E) or three (N. orientalis) (Fig. 2B) chromosome pairs. The 45S rDNA loci are located mainly in either distal or proximal regions of the chromosome arms, while 5S rDNA arrays were also found interstitially.
The size of hybridization signals varied between chromosome pairs both within and between species (Fig. 2).
Molecular phylogenetic analysis of ITS and rbcL sequences correlates with the basic chromosome number
To determine the phylogenetic relationship among the analyzed Nigella species, the sequences of the nuclear ribosomal internal transcribed spacer (ITS) and the rbcL gene were used. The sequence length of ITS (ITS1-5.8S-ITS2) varied from 732 to 759 bp (Table S3), whereas rbcL sequences ranged from 871 to 1428 bp (Table S4) in the seven Nigella species. Aconitum carmichaelii, a distantly related species belonging to the same family, was used as an outgroup, and the resulting consensus had high bootstrap support values (Fig. 1). N. bucharica and N. integrifolia formed a robust cluster with 100% bootstrap support, and both of them have a basic chromosome number of n = 7. The other cluster included N. sativa, N. arvensis and N. hispanica (with 84% bootstrap support), to which N. damascena and N. orientalis were joined with lower support. All members of this cluster possess a chromosome number of n = 6.
The clustering of the seven Nigella species based on molecular phylogenetic analysis correlates with their basic chromosome number (n = 6 or 7). The phylogenetically close N. integrifolia and N. bucharica have the same chromosome number, similar genome sizes and similar rDNA-based karyotypes. Among the other five species, however, genome size and the number and chromosomal distribution of rDNA loci are diverse despite the shared chromosome number.
Retroelements are the dominating repeat type in Nigella while satellite sequences are rare
Low-pass sequencing of the N. sativa, N. damascena and N. bucharica genomes resulted in 4,232,251, 7,553,644 and 15,352,348 Illumina 150 bp paired-end reads, corresponding to 0.24×, 0.43× and 1.5× genome coverage, respectively. The GC content was 38% for the N. sativa and N. damascena genomes and 42% for N. bucharica. The repeat compositions were inferred from paired-end reads corresponding to approximately 0.2× of the genome for each analyzed species. The proportions of individual repeat types are presented in Table 2. About 57.52%, 59.01% and 64.73% of the N. sativa, N. damascena and N. bucharica genomes, respectively, are composed of high- or moderate-copy repeats. The majority of the repeats are retroelements (47.91% in N. sativa, 39.47% in N. damascena and 51.25% in N. bucharica), followed by unclassified repeats (6.95, 17.3 and 10.1%) and tandem repeats (0.75, 0.39 and 0.74% of rDNAs and 0.75, 1.21 and 1.45% of satellites). Among the retroelements, LTR retroelements are the most abundant in the N. sativa (47.91%), N. damascena (39.47%) and N. bucharica (51.25%) genomes. LTR retroelements in N. sativa include the Ty3-gypsy and Ty1-copia superfamilies, with proportions of 44.01% and 3.76% of the genome, respectively, while they make up 37.77% and 1.64% in N. damascena and 48.55% and 2.58% in N. bucharica. A major part of Ty3-gypsy (34.89% in N. sativa, 26.19% in N. damascena and 30.99% in N. bucharica) belongs to the chromoviral Tekay clade (Table 2). In contrast, DNA transposons contribute only 1.13%, 0.62% and 1.19% of the N. sativa, N. damascena and N. bucharica genomes, respectively, and only three common DNA transposons, EnSpm_CACTA, MuDR_Mutator and PIF_Harbinger, were identified. EnSpm_CACTA composes 0.72% of the N. sativa genome, but its proportion was much lower in N. damascena (0.24%) and N. bucharica (0.16%). MuDR_Mutator comprises about 0.32% of the N. sativa genome, 0.34% of the N. damascena genome and 0.71% of the N. bucharica genome.
Also, PIF_Harbinger composes 0.32% of the N. bucharica genome, but its proportion was lower than 0.1% in N. sativa (0.06%) and N. damascena (0.04%). The DNA transposon hAT was only detected in N. sativa (Table 2).
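As a quick consistency check on the percentages quoted in this section, the rDNA share of the tandem repeats should equal the sum of the 45S and 5S rDNA proportions reported for each species. The sketch below performs this comparison with the values from the text; the variable and function names are illustrative, and rounding to two decimals is an assumption that matches the precision of the reported values.

```python
# Genome proportions in percent, as quoted in the text.
RDNA_45S = {"N. sativa": 0.72, "N. damascena": 0.38, "N. bucharica": 0.65}
RDNA_5S = {"N. sativa": 0.03, "N. damascena": 0.01, "N. bucharica": 0.09}
RDNA_TOTAL = {"N. sativa": 0.75, "N. damascena": 0.39, "N. bucharica": 0.74}


def summed_rdna(species):
    """Return 45S + 5S rDNA proportion, rounded to two decimals."""
    return round(RDNA_45S[species] + RDNA_5S[species], 2)


def is_consistent(species, tolerance=1e-9):
    """Check that the summed value matches the reported rDNA total."""
    return abs(summed_rdna(species) - RDNA_TOTAL[species]) < tolerance


if __name__ == "__main__":
    for species in RDNA_TOTAL:
        print(species, summed_rdna(species), is_consistent(species))
```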
To compare the repeat compositions of the N. sativa, N. damascena and N. bucharica genomes, a comparative clustering analysis was performed. About a quarter of the top clusters (Fig. 3) are shared between the species, but not all of these clusters had similar abundance in the genomes. Of the 272 major repeat clusters in total, only 16 clusters (5.88%) were relatively evenly shared between the three genomes, and they were annotated as Ty1_copia-TAR and Tork, Ty3_gypsy-Athila, DNA transposon-EnSpm CACTA and rDNAs (Fig. 3). Up to 97 clusters (35.66%) were almost N. bucharica specific, and clusters shared between N. bucharica and either N. damascena or N. sativa were barely detectable. N. damascena and N. sativa contributed to 123 and 120 clusters, respectively, of which 77 clusters were shared between the two genomes, whereas 61 and 37 of them were highly enriched in or specific to N. damascena and N. sativa, respectively. The comparative analysis demonstrated that N. bucharica is relatively distinct from N. damascena and N. sativa. This result is in line with their phylogenetic relationships inferred from ITS and rbcL sequences. The monomer lengths and cluster proportions of satellites and high-copy retrotransposons identified by TAREAN are listed in Table S7. Most of the retrotransposons were common between N. sativa and N. damascena (Table S7).
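The cluster shares quoted above follow directly from the counts and the total of 272 major clusters. A minimal sketch of that calculation, with an illustrative function name and rounding to two decimals as in the text:

```python
TOTAL_CLUSTERS = 272  # major repeat clusters in the comparative analysis


def cluster_share(count, total=TOTAL_CLUSTERS):
    """Return the percentage of clusters, rounded to two decimals."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    print("evenly shared:", cluster_share(16), "%")
    print("almost N. bucharica specific:", cluster_share(97), "%")
```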
(Peri-)centromeric satellites reflect the phylogenetic relationship in Nigella
The application of the TAREAN pipeline (Novák et al. 2017) allowed the identification of repeat clusters in Nigella species. Two satellite repeats were identified in N. sativa, Ns-Sat1 (CL21) and Ns-Sat2 (CL144), representing 0.52% and 0.013% of the genome, respectively (Table 3). Only one satellite repeat, named Nd-Sat1 (CL23), was identified in N. damascena, where it corresponds to 0.52% of the genome. In contrast, four satellite repeats, Nb-Sat1 (CL21), Nb-Sat2 (CL129), Nb-Sat3 (CL64) and Nb-Sat4 (CL144), were identified in N. bucharica, representing 0.86%, 0.077%, 0.42% and 0.034% of the genome, respectively. All these repeats showed the globular graph layouts typical of satellites, and their consensus monomer sequences are available in Table S2. The monomers of Ns-Sat1, Ns-Sat2 and Nd-Sat1 are all 178 bp in length and AT-rich (e.g. 68% AT for Ns-Sat1) (Table 3 and Figure S3). Their sequence similarity ranged from 71.8% (between Ns-Sat1 and Ns-Sat2) to 78.8% (between Ns-Sat1 and Nd-Sat1). In addition, the monomer sequence of the Ty3_gypsy LTR-annotated retrotransposon, Ns-CL6, was reconstructed. To determine the chromosomal distribution of the identified repeats, the corresponding sequences were PCR amplified using the respective primers and labeled as FISH probes (Table 3).
After FISH, all metaphase chromosomes of N. sativa revealed centromeric Ns-Sat1 signals, while Ns-Sat2 localized to the (peri)centromeric regions of chromosomes 1 and 4 only (Fig. 4A,B). The Ns-CL6 probe, which represents a Ty3_gypsy LTR retrotransposon, produced evenly distributed signals, although with a lower density toward the distal chromosome regions (Fig. 4A). Nd-Sat1-specific signals were found in the centromeric regions of all N. damascena chromosomes (Fig. 4C). Ns-Sat1 also cross-hybridized to the centromeres of N. arvensis and N. hispanica (Fig. 4D and 4E). The Nd-Sat1 of N. damascena also cross-hybridized to the centromeres of N. orientalis (Fig. 4F). None of the Nd-Sat1, Ns-Sat1 and Ns-Sat2 probes cross-hybridized with N. integrifolia or N. bucharica. The observed clustering of centromeric repeats at one pole of the nuclei indicates a Rabl-like chromosome configuration in interphase nuclei of Nigella (Fig. 4A and E).
The observed hybridization signals of all four Nb satellite probes showed similar intensities, locations and numbers in N. bucharica and N. integrifolia (Fig. 1). Nb-Sat1 seems to co-localize with the 45S rDNA loci, since it is found in terminal positions on the short arms of the two telocentric and the submetacentric chromosomes. Nb-Sat2 revealed signals on the distal ends of the long arm of the telocentric chromosomes (Fig. 4G), while Nb-Sat3 and Nb-Sat4 showed signals in centromeric positions of all chromosomes (Fig. 1 and Fig. 4H). In at least some of the chromosomes, Nb-Sat3 seems to extend further toward the inner part of the centromeres than Nb-Sat4 (arrows in Fig. 4H). FISH with the Arabidopsis-like telomere repeat (TTTAGGG)n detected corresponding signals exclusively at both ends of all Nigella chromosomes (Fig. 4I).
The genome-wide repeat analysis of the three Nigella species indicated that retroelements, especially Ty3_gypsy LTR retrotransposons, are the main contributors to the relatively large genomes of Nigella. In contrast, the abundance and diversity of satellite DNAs are relatively low. Most of these satellites are located in (peri)centromeric regions. The (peri)centromeric satellite repeats of N. sativa (Ns-Sat1), N. damascena (Nd-Sat1) and N. bucharica (Nb-Sat3 and Nb-Sat4) are highly distinct and cross-hybridize only to closely related genomes, as indicated in Fig. 1.