Genotyping, sequencing and assembly of diverse cytoplasmic DNA haplotypes centering on Brassica genus
To distinguish the cytoplasmic DNA (cpDNA and mtDNA) haplotypes within Brassica genus, genotyping analysis through High Resolution Melting (HRM) method were performed within our germplasm collection of Brassica crops (Figure S1). Primers were designed being targeted on a set of intra/inter-specific cpDNA polymorphic sites that are identified previously [16] (Table S1). Three major haplotypes were identified in approximately 480 worldwide B. napus accessions. Two major cpDNA haplotypes were identified in 180 B. rapa accessions, while 180 B. juncea accessions seem to contain one major cpDNA haplotype. B. oleracea, B. carinata, B. nigra, B. maurorum (MM, 2n = 16), certain wild C-genome relatives and three B. napus cytoplasmic male sterility (CMS) lines were directly treated as each with a distinct haplotype for subsequent genome sequencing. B. cretica, B. incana, B. insularis and B. villosa represent the wild C-genome relatives. Polima [23, 24], nsa [25] and mori [26, 27] are the CMS lines. Certain relative materials, i.e. Eruca sativa (2n = 22), Raphanus sativus (2n = 18), Sinapis arvensis (2n = 24) and Moricandia arvensis (2n = 28), were also included to enrich this study (Table S2).
Cytoplasmic DNA was synchronously isolated from 72 accessions that represent for the major cytoplasmic haplotypes and morphological varieties (Table S2), using an optimized organelle isolation procedure (Materials and Methods), which can substantially remove nuclei and balance the proportion of cpDNA and mtDNA content. Reads mapping analysis demonstrated that the isolated total DNA contains an average ratio of 37.2% chloroplast DNA and 3.4% mitochondrial DNA, respectively, approximately 5-10 times higher than the ratio of cytoplasmic DNA in the total leaf DNA [28]. The cytoplasmic DNA mixture was then subjected to high-throughput sequencing with high depth (> 500x, Table 1). The obtained paired-end reads (150 bp) were directly mapped to a tandem sequence gather, which consists of 10 published chloroplast genome sequences all over Brassicaceae family. The mapped reads were extracted and de novo assembled by SOAPdenovo software package [29]. Generally, two or three large contigs were eventually generated for chloroplast genomes. Gaps were directly filled through manual jointing of the overlapping ends between each two contiguous contigs, and then verified by Sanger sequencing of the gap-spanning PCR fragments. All the obtained chloroplast genome sequences are provided in Additional file 3 (Appendix A).
General information of the genome-wide cpDNA and mtDNA variants in Brassica
The chloroplast and mitochondrial genome sequences of a B. napus strain 51218 [22] (GenBank: KP161617.1), which is an intermediate breeding material of nap mitotype, were respectively used as references to call the overall cpDNA and mtDNA basic variants for all the sequenced materials. The calling was conducted by standard BWA/Genome Analysis Toolkit (GATK) pipeline with manual inspection [30], and further randomly verified by Kompetitive Allele Specific PCR (KASP) analysis. A total of approximately 4700 reliable basic polymorphic sites, including 3880 SNP and 820 InDels, were identified for all the sequenced chloroplast haplotypes in Brassica genus. While, approximately 3400 polymorphic sites (2700 SNP and 700 InDels) were identified for the mitochondrial haplotypes (Table S3). The average SNP density in the whole chloroplast and mitochondrial genomes was 25 and 12 SNPs per kilo base (kb), respectively. The chloroplast variants were uniformly distributed along the reference genome, except the two 26-kb large inverted repeat regions, IRa and IRb (Figure 1). The mitochondrial variants showed a comparatively even distribution pattern along the reference genome; however, the variation frequencies are obviously much higher at the regions containing the open reading frame (ORF) genes (Figure 2).
Among the overall variants, 13.9 % and 18.1 % of them were identified as nonsynonymous for 47 cpDNA coding genes and 61 mtDNA coding genes, respectively. The materials of two B. napus mitochondrial haplotypes, below known as cam- and polima-types, possess approximately 300 basic variants when referring to the typical nap-type B. napus strain 51218 mitochondrial genome. Polima-type is close to cam-type with a difference of only about 50 conserved cpDNA variants (Table S3). Consistent difference patterns were also found for cpDNA variants as for the three cytoplasmic types. KASP analysis using the primers targeted to the B. napus mitotype-corresponding mtDNA and cpDNA polymorphic sites showed that nap, cam and polima cytoplasms accounted for 87.1%, 7.2% and 5.7% in the investigated B. napus population (Figure S2). Undoubtedly, nap-type is the predominant cytoplasmic DNA haplotype, as identified in previous studies [15, 16]. Most of the B. rapa materials are of the same cam-type in B. napus, another major haplotype accounting for a frequency of approximately 5.8% in the investigated B. rapa population have been identified and named as sarson-type hereinafter, since it mainly exists in B. rapa var. sarson accessions.
Intra-genus phylogeny within Brassica genus based on whole chloroplast genomes
Analyses based on whole chloroplast genomes or genome-wide variations instead of partial cpDNA fragments can infer a phylogeny that is much closer to the natural truth and with high resolution and reliability, even at lower taxonomic levels [14]. To forecast the evolutionary trajectories of Brassica crops, all the above-obtained whole chloroplast genomes were subjected to phylogenetic analysis. The phylogenetic trees tentatively conducted using the Maximum Likelihood method, neighbor-joining method and Bayesian method were almost identical. The trees comprising materials throughout each intra-species, Brassica genus and Brassicaceae family, respectively, were conducted stepwise by Maximum Likelihood method [31].
Chloroplast genome sequences of Raphanus sativus, Isatis tinctoria, Matthiola incana and Arabidopsis thaliana in Brassicaceae family (Data from NCBI, Additional file 3) served as outgroup to root the intra-specific trees. The results indicated that 13 B. rapa accessions, 14 B. juncea accessions, 24 B. napus accessions and 13 C-genome species each clustered well and were separately integrated into a species-specific group. The B. rapa separated a little branch containing only two accessions, which were classified as sarson-type cytoplasm (Figure S3). The B. juncea accessions did not diverge any secondary branches, indicating a lack of cytoplasmic genetic diversity (Figure S4). The B. napus cluster were split into two large branches, one branch containing the nap-type lines (e.g., the nuclear-genome sequenced cultivars Darmor/AC489 and ZS11/AC457), another branch further split into two little branches, containing cam-type (e.g., Shengli Rape/AC32) and polima-type (e.g., Jianyang Rape/AC399) lines, respectively (Figure S5). All the investigated cultivated B. oleracea (e.g., Cauliflower, Broccoli, Cabbage, Kohlrabi) and part of the wild B. oleracea were shown with one nearly identical chloroplast genome sequence. However, the C-genome wild relatives (B. villosa, B. insularis, B. cretica and B. incana) each contains a distinct haplotype and seem to possess abundant genetic diversity. All the C-genome species demonstrated a hierarchically clear pedigree from B. villosa stepwise to the cultivated B. oleracea (Figure S6).
A part of the above intra-specific materials were selected capable of maximumly representing each their intraspecific genetic diversities, and then together with Brassica nigra, B. carinata and B. maurorum, were combined to construct a larger tree comprising of materials all over Brassica genus. The cpDNA sequence data for materials Root mustard-1 (B. juncea), Sarsons-1 (B. rapa), Broccoletto-3 (B. rapa), Black mustard (B. juncea) and Ethiopian mustard (B. carinata) were added from Li et al., [18] to enrich the whole phylogenetic tree. The results indicated that Brassica genus was mainly divided into three clades, from which the maternal origin of the three natural allotetraploid species can be clearly inferred (Figure 3). All the B. rapa, B. juncea and quite a few B. napus accessions of both cam- and polima-type constitute Clade I, which further diverged two little branches containing B. rapa ssp. trilocularis (Sarsons) and polima-type B. napus, respectively. Three B. juncea accessions clustered only in Clade I without any further divergence from their co-clustered B. rapa accessions, thus indicating that B. juncea likely has a monophyletic maternal origin from cam-type B. rapa. Clade II comprises all the B. oleracea lines and other wild C-genome species, parallelly branched with Clade I. The branch, which comprises only the B. napus accessions with a same nap cytoplasmic type, is inserted in the middle of Clade II and separated certain C-genome wild relatives (B. insularis and B. villosa) from the remaining part, which contains all B. cretica, B. incana and the cultivated B. oleracea. Clade III comprises mainly B. nigra, B. carinata and B. maurorum accessions, indicating that B. carinata likely has a monophyletic maternal origin from B. nigra. The major cytoplasmic haplotype of B. nigra was designated as nigra-type cytoplasm. The wild species B. maurorum had been reported to be close to the B-genome species [32] and seems evolved earlier than all the remaining part in Clade III. The topological branches in this tree displayed a clear hierarchical pedigree from Clade III to Clade I (Figure 3). Taken together, Unlike B. juncea and B. caritana, the allotetraploid species B. napus were dispersedly distributed in the B. rapa and B. oleracea clusters, suggesting its diverse maternal origin from A-genome B. rapa and certain C-genome Brassica species (2n = 18), respectively.
Evolutionary relationships among Brassica and its relative genera
Unexpectedly, Raphanus sativus was inserted between Clade II and Clade III and bidirectionally close to B. villosa and B. maurorum in the Brassica phylogenetic tree (Figure 3), suggesting that Raphanus genus was associated with Brassica phylogeny, at least at cytoplasmic DNA level. To explore whether any more other genus (or species) also mingle with Brassica genus, a phylogenetic tree containing 54 (Thirteen in and 41 beyond Brassica genus) chloroplast genome sequences in Brassicaceae family was constructed (Figure 4). The tree displays an evolutionary pedigree of clear hierarchical architecture. The Brassicaceae family was divided into at least two large lineages, containing Arabidopsis/Matthiola and Draba/Brassica genera, respectively, which is congruent with previous studies [33, 34]. The results identified another three materials, Eruca sativa, Moricandia arvensis and Sinapis arvensis, which were also tightly integrated with the evolution of Brassica genus. Eruca sativa and Moricandia arvensis were located at the same position as Raphanus sativus, while three our sequenced and one public Sinapis arvensis (Sinapis-4) accessions displayed a scattered distribution fully merged together with the B-genome containing species in Clade III. These findings imply a tight evolutionary association among Brassica and these relatives. Cakile arabica, Orychophragmus diffusus, Alliaria grandifolia, Isatis tinctona and Scherenkiella parvula in Clade IV were shown to be close to Brassica cluster at cytoplasmic DNA level. Successful germplasm development through inter-specific sexual or somatic hybridization between Brassica species with Orychophragmus violaceus or Isatis tinctona [35, 36] could illustrate that the species in Clade IV are fairly close to Brassica.
Uncoupled evolution of mtDNA with cpDNA in B. napus CMS lines
Mitochondrial genome represents another half set of cytoplasmic DNA. To ascertain how about the Brassica phylogeny if being inferred based on mtDNA, the segmented sequences containing the mitochondrial allelic variants from each corresponding material inside and around Brassica genus were extracted and concatenated as each separate intact sequence. All the assembled sequences were subjected to phylogenetic analysis according to the above procedure used for chloroplast genomes. The obtained mitochondrial tree (Figure 5) displayed a pedigree largely resembling the tree derived based on cpDNA (Figure 3). Likewise, it also diverged into three clades, each of the natural Brassica materials possesses nearly identical evolutionary positions in both the cpDNA and mtDNA deriving trees, the same maternal origin relationships of the three Brassica allotetraploid crops were inferred. The location of four genera (Raphanus sativus, Eruca stivus, Moricandia arvensis and Sinapis arvensis) in the mtDNA derived tree were also integrated into Brassica genus, demonstrating that mtDNA evolved parallelly linked with cpDNA in Brassica genus. Nevertheless, differences happened for a few B. napus cytoplasmic male sterile lines, i.e., mori [26, 27] and nsa [25] CMS lines, which have been successfully utilized in hybrid breeding. Mori and nsa lines located in the cam-type and nap-type B. napus clusters, respectively, in the cpDNA deriving tree (Figure S5), and possess the identical natural cam-type and nap-type chloroplast sequences, respectively. However, they are clustered close to their mtDNA donor species in the mtDNA deriving tree (Figure 5), i.e., the B. napus mori and nsa sterile line each clustered together with Moricandia arvensis and Sinapis arvensis, respectively. The evolution of mtDNA in the B. napus CMS lines was uncoupled with cpDNA were identified.
Estimation of divergence times of Brassica crops
The phylogenetic tree containing 54 chloroplast genome sequences in Brassicaceae family (Figure 4) was subjected to estimate the divergence time for these investigated Brassica species, the timetree was conducted by Reltime [37]. Eucalyptus verrucata was set as outgroup. The timetree was calibrated referring to two previously estimated divergence times: 30-35 million years ago (Mya) which dated the speciation of genus Aethionema and 25-30 Mya which dated the separation of two large Brassicaceae clades including Arabidopsis and B. napus, respectively [33, 38]. The obtained timetree (Figure S7) indicated that Aethionema might be an ancient cruciferous genus and there were two major periods for species radiation in Brassicaceae family. During 25-18 Mya, certain genus emerged and separated from each other; and then during the second radiation period (15-6 Mya), most of the genus speciated and formed several large clades. Brassica genus emerged approximately at 4.85 Mya, begin maybe as a kind of B. nigra or B. rapa. Moricandia arvensis, Eruca stivus, Brassica maurorum, Raphanus sativus and Sinapis arvensis speciated at 3.15 Mya, 2.85 Mya, 2.17 Mya, 2.05 Mya and 1.42 Mya, respectively. The Brassica C-genome species (e.g., B. villosa and B. oleracea) separated from A-genome species (B. rapa) since 1.12 Mya. Three allotetraploid species (B. juncea, B. carinata and B. napus) speciated during the period 0.17-0.01 Mya or much later, which are consistent with the estimated originating time of ~7,500 years ago for B. napus [4] and the cultivation beginning time of ~7,000 years ago for B. juncea [7]. Brassica tetraploid species are much younger than other polyploidy crops, e.g., emerge of cotton (Gossypium hirsutum) at 1-2 Mya [39, 40] and emerge of soybean (Glycine max) at 0.8 Mya [41].