Sequencing of the diverse cytoplasmic Brassica DNA haplotypes
To distinguish the cytoplasmic DNA (cpDNA and mtDNA) haplotypes within Brassica genus, genotyping analysis through High Resolution Melting (HRM) method were performed in our germplasm collections (Figure S1). Primers were designed being targeted on a set of intra/inter-specific cpDNA polymorphic sites that were identified previously [16] (Table S1). Three major haplotypes were identified in approximately 480 worldwide B. napus accessions. Two major cpDNA haplotypes were identified in 180 B. rapa accessions, while 180 B. juncea accessions contain one major cpDNA haplotype. B. oleracea, B. carinata, B. nigra, B. maurorum (MM, 2n = 16), certain wild C-genome relatives and three B. napus cytoplasmic male sterility (CMS) lines were treated as each with a distinct haplotype for the subsequent genome sequencing. B. cretica, B. incana, B. insularis and B. villosa represent the wild C-genome relatives. Polima [23, 24], nsa [25] and mori [26, 27] are the CMS lines. Certain relative materials, i.e. Eruca sativa (2n = 22), Raphanus sativus (2n = 18), Sinapis arvensis (2n = 24) and Moricandia arvensis (2n = 28), were also included to enrich this study (Table S2).
Cytoplasmic DNA was synchronously isolated from 72 accessions that represent for all major cytoplasmic haplotypes and morphological varieties (Table S2), using an optimized organelle isolation procedure (Materials and Methods). This method can substantially help to remove nuclei and balance the proportions of cpDNA and mtDNA content. Reads mapping analysis demonstrated that the isolated total DNA contains an average ratio of 37.2% chloroplast DNA and 3.4% mitochondrial DNA, respectively, which is approximately 5-10 times higher than the ratio of cytoplasmic DNA in the total leaf DNA [28]. The cytoplasmic DNA mixture was then subjected to the high-throughput sequencing (with average sequencing depths above 500 x, Table 1). The obtained paired-end reads (150 bp) were directly mapped to a tandem sequence gather, which consists of 10 published chloroplast genome sequences across Brassicaceae family. The mapped reads were extracted and de novo assembled by SOAPdenovo software package [29]. Generally, two or three large contigs were eventually generated for the chloroplast genomes. Gaps were directly filled through manual jointing of the overlapping ends of each two contiguous contigs, and then verified by Sanger sequencing of the gap-spanning PCR fragments. All the obtained chloroplast genome sequences are provided in Additional file 3 (Appendix A).
Genome-wide cytoplasmic (cpDNA and mtDNA) variations in Brassica
The chloroplast and mitochondrial genome sequences of a B. napus strain 51218 [22], which is an intermediate breeding material of nap mitotype, were respectively used as reference sequences to call the overall cpDNA and mtDNA basic variants. The calling was conducted by standard BWA/Genome Analysis Toolkit (GATK) pipeline with manual inspection [30], and then randomly verified by Kompetitive Allele Specific PCR (KASP) analysis. A total of approximately 4700 reliable basic polymorphic sites, including 3880 SNP and 820 InDels, respectively, were identified for all the sequenced chloroplast haplotypes in Brassica genus. While, approximately 3400 polymorphic sites (2700 SNP and 700 InDels) were identified for the mitochondrial haplotypes (Table S3). The average SNP density in the chloroplast and mitochondrial genomes was 25 and 12 SNPs per kilo base (kb), respectively. The chloroplast variants were uniformly distributed along the reference genome, except the two 26-kb large inverted repeat regions, IRa and IRb (Figure 1), since these genomic regions were skipped due to the repetitive mapping of the same reads. The mitochondrial variants showed a comparatively even distribution pattern along the reference genome; however, their variation frequencies are obviously much higher at the regions containing the open reading frame (ORF) genes (Figure 2).
Among the overall variants, 13.9 % and 18.1 % were identified as nonsynonymous for 47 cpDNA coding genes and 61 mtDNA coding genes, respectively. The materials of two B. napus mitochondrial haplotypes, below known as cam- and polima-types, possess approximately 300 basic variants when referring to B. napus strain 51218 mitochondrial genome of nap-type. Polima-type is close to cam-type with a difference of only about 50 conserved cpDNA variants (Table S3). Consistent difference patterns were also found for cpDNA variants as for the three cytoplasmic types. KASP analysis using the primers targeted to the B. napus mitotype-corresponding mtDNA and cpDNA polymorphic sites detected that nap, cam and polima cytoplasms accounted for 87.1%, 7.2% and 5.7% in the investigated B. napus population (Figure S2). Undoubtedly, nap-type is the predominant cytoplasmic DNA haplotype, as identified in previous studies [15, 16]. Most of the B. rapa materials are of the same cam-type in B. napus, another major haplotype accounting for a frequency of approximately 5.8% in the investigated B. rapa population has been identified and named as sarson-type hereinafter, since it mainly exists in B. rapavar. sarson accessions.
The phylogeny of Brassica genus conducted based on the whole chloroplast genomes
Analyses based on the whole chloroplast genomes or genome-wide variations instead of partial cpDNA fragments can infer a phylogeny with much higher resolution and reliability, even at lower taxonomic levels [14]. To forecast the evolutionary trajectories of Brassica crops, all the above-obtained whole chloroplast genomes were subjected to phylogenetic analysis. The phylogenetic trees tentatively conducted using the Maximum Likelihood method, neighbor-joining method and Bayesian method were almost identical. To reduce the calculating amount and avoid a corpulent tree, the trees comprising materials throughout each intra-species, Brassica genus and Brassicaceae family, respectively, were conducted stepwise by Maximum Likelihood method [31].
Chloroplast genome sequences of Raphanus sativus, Isatis tinctoria, Matthiola incana and Arabidopsis thaliana in Brassicaceae family (Data from NCBI, Additional file 3) served as outgroup to root the intra-specific trees. The results indicated that 13 B. rapa accessions, 14 B. juncea accessions, 24 B. napus accessions and 13 C-genome species each clustered well and were separately integrated into a species-specific group. The B. rapa separated a little branch containing only two accessions, which were classified as sarson-type cytoplasm mentioned above (Figure S3). The B. juncea accessions did not diverge any secondary branches, indicating a lack of cytoplasmic genetic diversity (Figure S4). The B. napus cluster were split into two large branches, one branch containing the nap-type lines (e.g., the nuclear-genome sequenced cultivars Darmor/AC489 and ZS11/AC457), another branch further split into two little branches, containing cam-type (e.g., Shengli Rape/AC32) and polima-type (e.g., Jianyang Rape/AC399) lines, respectively (Figure S5). All the investigated cultivated B. oleracea (e.g., Cauliflower, Broccoli, Cabbage, Kohlrabi) and part of the wild B. oleracea were shown with one nearly identical chloroplast genome sequence. However, the C-genome wild relatives (B. villosa, B. insularis, B. cretica and B. incana) each contains a distinct haplotype. All the C-genome species demonstrated a hierarchically clear pedigree, from B. villosa stepwise to the cultivated B. oleracea (Figure S6).
A part of the above intra-specific materials were selected capable of maximumly representing each their intraspecific genetic diversities, and then together with Brassica nigra, B. carinata and B. maurorum, were combined to construct a larger tree comprising of materials all over Brassica genus. The cpDNA sequence data for materials Root mustard-1 (B. juncea), Sarsons-1 (B. rapa), Broccoletto-3 (B. rapa), Black mustard (B. juncea) and Ethiopian mustard (B. carinata) were added from Li et al., [18] to enrich the whole phylogenetic tree. The results indicated that Brassica genus was mainly divided into three clades, from which the maternal origin of the three natural allotetraploid species can be clearly inferred (Figure 3). All the B. rapa, B. juncea and quite a few B. napus accessions of both cam- and polima-type constitute Clade I, which further diverged two little branches containing B. rapa ssp. trilocularis (Sarsons) and polima-type B. napus, respectively. Three B. juncea accessions clustered only in Clade I without any further divergences from their co-clustered B. rapa accessions, thus indicating that the investigated B. juncea has a monophyletic maternal origin from cam-type B. rapa. Clade II comprises all the B. oleracea lines and other wild C-genome species, parallelly branched with Clade I. The branch, which comprises only the B. napus accessions with a same nap cytoplasmic type, is inserted in the middle of Clade II and separated certain C-genome wild relatives (B. insularis and B. villosa) from the remaining part, which contains all B. cretica, B. incana and the cultivated B. oleracea. Clade III comprises mainly B. nigra, B. carinata and B. maurorum accessions, indicating that the investigated B. carinata has a monophyletic maternal origin from B. nigra. The major cytoplasmic haplotype of B. nigra was designated as nigra-type cytoplasm. The wild species B. maurorum had been reported to be close to the B-genome species [32] and seems evolved earlier than all the remaining part in Clade III. The topological branches in this tree displayed a clear hierarchical pedigree, from Clade III to Clade I (Figure 3). Taken together, different from B. juncea and B. caritana, B. napus was dispersedly distributed in the B. rapa and B. oleracea clusters, suggesting its multiple maternal origins from A-genome B. rapa or certain C-genome Brassica species (2n = 18).
The evolution of Brassica tightly associates with a set of its close genera
Intriguingly, Raphanus sativus was inserted between Clade II and Clade III and bidirectionally close to B. villosa and B. maurorum in the Brassica phylogenetic tree (Figure 3), suggesting certain association between Raphanus genus and Brassica phylogeny. To explore whether any more other genus also mingle with Brassica genus, a phylogenetic tree containing 54 (Thirteen in and 41 beyond Brassica genus) chloroplast genome sequences in Brassicaceae family was constructed (Figure 4). The tree displays an evolutionary pedigree with a clear hierarchical architecture. The Brassicaceae family was basically divided into two large lineages, containing Arabidopsis/Matthiola and Draba/Brassica genera, respectively, which is congruent with the previous studies [33, 34]. Another three materials, Eruca sativa, Moricandia arvensis and Sinapis arvensis, were also identified to be tightly integrated with the evolution of Brassica genus. Eruca sativa and Moricandia arvensis were located at the same positions as Raphanus sativus, while three herein sequenced and one public Sinapis arvensis (Sinapis-4) accessions displayed scattered distribution that is fully merged together with the B-genome containing species in Clade III. These findings imply a tight evolutionary association among Brassica and these relatives. Cakile arabica, Orychophragmus diffusus, Alliaria grandifolia, Isatis tinctona and Scherenkiellaparvula in Clade IV were shown to be close to Brassica cluster at cytoplasmic DNA level. Successful germplasm development through inter-specific sexual or somatic hybridization between Brassica species with Orychophragmus violaceus or Isatis tinctona [35, 36] could partially support that the species in Clade IV are fairly close to Brassica.
Uncoupled inheritance of chloroplast and mitochondrial genomes in B. napus CMS lines
Mitochondrial genome represents another half set of cytoplasmic DNA. To ascertain how about the Brassica phylogeny if being inferred based on mitochondrial genomes, the segmented sequences containing the mitochondrial allelic variants from each corresponding material inside and around Brassica genus were extracted and concatenated as each separate intact sequence. All the assembled sequences were subjected to phylogenetic analysis according to the above same procedure used for chloroplast genomes. The obtained mitochondrial tree (Figure 5) displayed a pedigree largely resembling the tree that was derived based on cpDNA (Figure 3). Likewise, it also diverged into three clades, each of the natural Brassica materials possesses nearly identical evolutionary positions in both the cpDNA and mtDNA deriving trees, the same maternal origin relationships of the three Brassica allotetraploid crops were inferred. The location of four genera (Raphanus sativus, Eruca stivus, Moricandia arvensis and Sinapis arvensis) in the mtDNA derived tree were also integrated into Brassica genus, demonstrating that mtDNA evolved parallelly linked with cpDNA in Brassica genus. Nevertheless, differences happened for the B. napus cytoplasmic male sterile lines, i.e., mori [26, 27] and nsa [25] CMS lines, which have been successfully utilized in heterosis-driving hybrid breeding. Mori and nsa lines located in the cam-type and nap-type B. napus clusters, respectively, in the cpDNA deriving tree (Figure S5), and possess the identical natural cam-type and nap-type chloroplast sequences, respectively. However, they are clustered close to their mtDNA donor species in the mtDNA deriving tree (Figure 5), i.e., the B. napusmori and nsa sterile line each clustered together with Moricandia arvensis and Sinapis arvensis, respectively. The coupled inheritance of mitochondrial genomes and chloroplast genomes in the B. napus CMS lines has been disturbed.
Estimation of divergence times of Brassica crops
The phylogenetic tree containing 54 chloroplast genome sequences in Brassicaceae family (Figure 4) was subjected to estimate the divergence time for these investigated Brassica species, the timetree was conducted by Reltime [37]. It was calibrated referring to two previously estimated divergence times: 30-35 million years ago (Mya) which dated the speciation of genus Aethionema and 25-30 Mya which dated the separation of two large Brassicaceae clades including Arabidopsis and B. napus, respectively [33, 38]. Eucalyptus verrucata was set as the outgroup. The obtained timetree (Figure S7) indicated that Aethionema might be an ancient cruciferous genus and there were two major periods for species radiation in Brassicaceae family. During 25-18 Mya, certain genus emerged and separated from each other; and then during the second radiation period (15-6 Mya), most of the genus speciated and formed several large clades. Brassica genus emerged approximately at 4.85 Mya, and began maybe as a kind of B. nigra or B. rapa. Moricandia arvensis, Eruca stivus, Brassicamaurorum, Raphanus sativus and Sinapis arvensis speciated at 3.15 Mya, 2.85 Mya, 2.17 Mya, 2.05 Mya and 1.42 Mya, respectively. The Brassica C-genome species (e.g., B. villosa and B. oleracea) separated from A-genome species (B. rapa) since 1.12 Mya. Three allotetraploid species (B. juncea, B. carinata and B. napus) speciated during the period 0.17-0.01 Mya or much later, which are consistent with the estimated originating time of ~7,500 years ago for B. napus [4] and the cultivation beginning time of ~7,000 years ago for B. juncea [7]. The Brassica tetraploid species are much younger than certain other polyploidy crops, e.g., the emerge of cotton (Gossypium hirsutum) at 1-2 Mya [39, 40] and the emerge of soybean (Glycine max) at 0.8 Mya [41].