Sequences of DPL genes
According to previous phylogenetic analyses [27], there are six diploid genome types in Oryza (the A-, B-, C-, E-, F- and G-genomes), and rice (O. sativa) belongs to the A-genome. We sampled all 15 diploid species representing the six genome types in Oryza, together with the diploid Leersia perrieri as an outgroup, and obtained the sequences of all their DPL genes (Additional File 1). A striking insertion in the second intron of DPL1 in A-genome species (Additional File 2) makes this intron longer than the second intron of DPL2 [25], so we could distinguish DPL1 from DPL2 in A-genome species by the length of their second intron. Functional DPL1 and DPL2 are found in all A-genome species except O. barthii and O. glaberrima, in which a large deletion in DPL1 has caused loss of function [25]. Our sequence analysis confirmed that the pseudogenized DPL1 is fixed in African rice O. glaberrima and its wild ancestor O. barthii. In addition to these two species, and to three other A-genome species that carry non-functional but unfixed DPL1 alleles [25], we detected in O. nivara the same defective DPL1 as in O. sativa (Additional File 1). Non-functional DPL1 therefore occurs in six of the eight A-genome species, whereas defective DPL2 occurs only in the japonica subspecies. In the B-genome, we found no marked difference in second-intron length between the two DPL copies, but our whole-genome analysis showed that the two copies are located on chromosomes 1 and 6, respectively. Because these locations are similar to those in O. sativa, we assumed that the B-genome, like the A-genome, contains both DPL1 and DPL2. In the C- and E-genomes, we obtained only highly similar copies despite trying various PCR strategies. In the F- and G-genomes, only one DPL-like sequence was isolated from each. We used L. perrieri, from a genus closely related to Oryza, as an outgroup; BLAST searches against its whole genome found only one DPL-like gene.
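The copy-assignment rule used for A-genome species can be sketched in a few lines. This is a minimal Python illustration, not the authors' pipeline: the `DplCopy` record, the function name and the intron lengths in the example are hypothetical, and the only logic encoded is the rule stated above (the copy with the longer second intron is DPL1). It declines to assign labels when the two introns have equal length, which is roughly the situation described for the B-genome.

```python
from dataclasses import dataclass


@dataclass
class DplCopy:
    """One DPL copy from a single species (hypothetical record layout)."""
    copy_id: str
    intron2_len: int  # length of the second intron, in bp


def assign_a_genome_copies(copies: list[DplCopy]) -> dict[str, str]:
    """Label the two DPL copies of an A-genome species.

    Rule from the text: the copy carrying the intron-2 insertion,
    i.e. the one with the longer second intron, is DPL1.
    """
    if len(copies) != 2:
        raise ValueError("expected exactly two DPL copies")
    a, b = copies
    if a.intron2_len == b.intron2_len:
        # No length difference: the rule cannot tell the copies apart.
        raise ValueError("second introns have equal length; cannot assign")
    longer, shorter = (a, b) if a.intron2_len > b.intron2_len else (b, a)
    return {longer.copy_id: "DPL1", shorter.copy_id: "DPL2"}


if __name__ == "__main__":
    # Hypothetical intron-2 lengths, for illustration only.
    print(assign_a_genome_copies([DplCopy("copy_a", 900), DplCopy("copy_b", 400)]))
```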
Southern blot analysis
Based on previous phylogenetic analyses of Oryza [27, 28], we redrew a simplified rooted phylogenetic tree of the six genome types (Figure 2a). The tree indicates that the A- and B-genomes are the most recently diverged genome types within Oryza, whereas the G-genome diverged earliest, followed by the F-genome. A recent study of 13 genome types of Oryza showed that the F-genome and the common ancestor of the A-, B-, C- and E-genomes diverged approximately 15 million years ago [29]. Hence, we first used Southern blot analysis to estimate DPL copy number in five of the diploid genome types, to determine whether the DPL duplication originated within this time frame.
In our Southern blot analysis, we used three restriction endonucleases: BamHI, EcoRV and HindIII. The results for each endonuclease are shown in Figure 1. As a positive control, the A-genome shows two bands with each of EcoRV and HindIII. The B-genome likewise shows two bands with each of BamHI and EcoRV, indicating two DPL copies. Among the five genomes, only the E-genome shows bands with all three endonucleases, with two or three bands per digest, suggesting at least two DPL copies. In contrast to the A-, B- and E-genomes, which show two or more bands, the C- and F-genomes each showed only a single band, with one endonuclease, suggesting that they may carry only one copy. We therefore concluded that the DPL duplication occurred within Oryza, and accordingly restricted our phylogenetic analysis of DPLs to the genus in order to identify the origin of the duplicate genes precisely.
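Band counts are an indirect measure of copy number: a single copy with a recognition site inside the probed region gives two hybridizing fragments, while separate copies that yield fragments of similar size comigrate as one band. Where reference sequence is available, an in silico digest can help anticipate the expected pattern. The sketch below is not part of the original analysis; it assumes the standard recognition sites and top-strand cut positions of the three enzymes (BamHI G^GATCC, EcoRV GAT^ATC, HindIII A^AGCTT), and the function names, toy sequence and probe coordinates are hypothetical.

```python
import re

# Recognition site and top-strand cut offset within the site for each enzyme.
# BamHI: G^GATCC, EcoRV: GAT^ATC (blunt), HindIII: A^AGCTT.
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "EcoRV": ("GATATC", 3),
    "HindIII": ("AAGCTT", 1),
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Top-strand cut coordinates (0-based) for one enzyme in a linear sequence."""
    site, offset = ENZYMES[enzyme]
    # The lookahead also reports overlapping sites. All three sites are
    # palindromic, so scanning one strand finds every site.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def fragments(seq: str, enzyme: str) -> list[tuple[int, int]]:
    """(start, end) of each fragment. The sequence ends are treated as fragment
    ends, so the two terminal fragments underestimate true genomic sizes."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return list(zip(bounds, bounds[1:]))


def hybridizing_fragments(seq: str, enzyme: str, probe_start: int, probe_end: int):
    """Fragments that overlap the probe interval [probe_start, probe_end)."""
    return [(s, e) for s, e in fragments(seq, enzyme)
            if s < probe_end and e > probe_start]


if __name__ == "__main__":
    # Toy sequence with one internal BamHI site (illustrative only).
    toy = "A" * 10 + "GGATCC" + "T" * 10
    for enzyme in ENZYMES:
        hits = hybridizing_fragments(toy, enzyme, 0, len(toy))
        print(enzyme, [e - s for s, e in hits])
```

In this toy case a single copy gives two BamHI bands but only one band with EcoRV or HindIII, which is why patterns from several enzymes are compared before copy number is inferred.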
Phylogenetic analysis
To reconstruct a phylogenetic tree of Oryza DPL genes covering all diploid genome types, we used all the DPL sequences we obtained. All DPL sequences used in this study include both coding (CDS) and intron regions, except those of the F-genome, for which only the CDS was used because its intron sequences could not be aligned with the others.
Using the aligned nucleotide sequences (1423 bp, 122 parsimony-informative sites) of DPL genes from the diploid Oryza species and L. perrieri, we constructed a bootstrap consensus maximum-likelihood (ML) tree to illustrate their phylogenetic relationships (Figure 2b). The relationships among the six diploid genome types in our tree are broadly consistent with those revealed by previous studies (Figure 2a) [27, 30]. In our ML tree, the G-genome is again the earliest-diverging lineage of Oryza, followed by the F-genome. DPL copies from the E-, C-, B- and A-genomes form a large monophyletic group with 94% bootstrap support, which can be regarded as the sister lineage of the F-genome. This group consists of three monophyletic clades: an E-genome clade (88% bootstrap support), a C-genome clade (100%) and an A- plus B-genome clade (88%). Notably, the DPL copies of the E- and C-genomes each form a monophyletic clade, whereas those of the A- and B-genomes do not form separate genome-specific clades. Within the A- plus B-genome clade, DPL1 copies group into a single monophyletic branch with 99% bootstrap support, whereas DPL2 copies fall into three monophyletic branches: the B-genome, O. meridionalis (A-genome), and the remaining A-genome species.
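The parsimony-informative site count reported above follows the usual definition: an alignment column is informative if at least two character states each occur in at least two sequences. A minimal Python sketch of that count is shown below. Treating gaps, `?` and `N` as missing data is an assumption (programs differ in how they handle gaps and other ambiguity codes), and the toy alignment is made up.

```python
from collections import Counter

# Characters treated as missing data (an assumption; conventions vary).
MISSING = set("-?N")


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c for c in column.upper() if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative_sites(alignment: list[str]) -> int:
    """Count parsimony-informative columns in an aligned set of sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # zip(*alignment) walks the alignment column by column.
    return sum(is_parsimony_informative("".join(col)) for col in zip(*alignment))


if __name__ == "__main__":
    toy = ["AAGTA", "AAGCA", "ACTTA", "ACTCG"]  # made-up alignment
    print(count_informative_sites(toy))
```

In the toy alignment the first column is invariant and the last has a singleton G, so only the middle three columns count.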
Collinearity analyses and investigations of transposable elements
Whole-genome data for five species, three from the A-genome (O. sativa, O. rufipogon and O. glumaepatula), one from the B-genome (O. punctata) and the outgroup L. perrieri [29, 31], were obtained from the online database Ensembl Plants (http://plants.ensembl.org). We conducted collinearity analyses between the A- and B-genomes on DPL segments comprising each DPL gene and the five genes flanking it on each side (Figure 3, Additional File 3).
Strong collinearity between the A- and B-genomes was found for the DPL1 segments on chromosome 1 and for the DPL2 segments on chromosome 6, but no paralogs other than the DPL genes themselves were identified between the DPL1 and DPL2 segments. We also found two segments in L. perrieri that are collinear with the DPL1 and DPL2 segments, respectively, but the DPL-like gene occurs only in the segment collinear with the DPL2 segments.
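The text does not name the software used for the collinearity analysis. As a rough illustration of the idea only, the sketch below takes up to five genes on each side of a target gene and counts how many ortholog pairs keep the same relative order in the two windows, computed as the length of a longest increasing subsequence. Gene identifiers and the ortholog map are hypothetical inputs, and the simplification ignores inverted segments and tandem duplicates.

```python
from bisect import bisect_left


def flanking_window(genes: list[str], target: str, k: int = 5) -> list[str]:
    """Up to k genes on each side of `target` in an ordered gene list (target excluded)."""
    i = genes.index(target)
    return genes[max(0, i - k):i] + genes[i + 1:i + 1 + k]


def collinear_pairs(window_a: list[str], window_b: list[str],
                    orthologs: dict[str, str]) -> int:
    """Number of ortholog pairs kept in the same relative order in both windows.

    Computed as the longest strictly increasing subsequence of B-window
    positions, visited in A-window order. Only same-orientation collinearity
    is counted; an inverted segment would need the B window reversed.
    """
    pos_b = {gene: j for j, gene in enumerate(window_b)}
    positions = [pos_b[orthologs[g]] for g in window_a
                 if g in orthologs and orthologs[g] in pos_b]
    tails: list[int] = []  # tails[m]: smallest last position of a run of length m + 1
    for p in positions:
        idx = bisect_left(tails, p)
        if idx == len(tails):
            tails.append(p)
        else:
            tails[idx] = p
    return len(tails)


if __name__ == "__main__":
    # Hypothetical gene orders around a DPL gene on two chromosomes.
    chrom_a = ["a1", "a2", "a3", "a4", "a5", "a6", "DPL_A",
               "a7", "a8", "a9", "a10", "a11", "a12"]
    chrom_b = ["b1", "b2", "b3", "b4", "b5", "DPL_B",
               "b7", "b6", "b8", "b9", "b10", "b11"]
    orth = {f"a{n}": f"b{n}" for n in range(1, 13)}
    win_a = flanking_window(chrom_a, "DPL_A")
    win_b = flanking_window(chrom_b, "DPL_B")
    print(collinear_pairs(win_a, win_b, orth))
```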
We searched for transposable elements (TEs) in the intergenic regions upstream and downstream of DPL1 in the A- and B-genomes and identified TEs from several DNA transposon families, including Helitron, PIF, TcMar-Stowaway, CMC-EnSpm, hAT-Ac and MULE-MuDR (Additional File 4). We also found a pair of 9-bp repeats (GAKCTGCCA), one upstream and one downstream of DPL1, and the genomic region between the two repeats is orthologous between the A- and B-genomes (Additional File 4). However, we did not identify any target site duplication linking these TEs to DPL1.
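In the repeat GAKCTGCCA, K is the IUPAC ambiguity code for G or T. A small Python sketch for locating such a degenerate motif on both strands of a sequence is shown below; the function names and toy sequences are hypothetical, and this is not the tool used in the study.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. K <-> M, R <-> Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def iupac_to_regex(motif: str) -> str:
    return "".join(IUPAC[c] for c in motif.upper())


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """(start, strand) of every match of an IUPAC motif on either strand.

    Minus-strand hits are reported by their top-strand start coordinate.
    A palindromic motif would be reported once per strand.
    """
    seq = seq.upper()
    revcomp = motif.upper().translate(COMPLEMENT)[::-1]
    hits: list[tuple[int, str]] = []
    for strand, m in (("+", motif), ("-", revcomp)):
        pattern = re.compile(f"(?={iupac_to_regex(m)})")  # lookahead allows overlaps
        hits += [(h.start(), strand) for h in pattern.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    toy = "TTGAGCTGCCATTTGATCTGCCAA"  # made-up sequence with K = G and K = T
    print(find_motif(toy, "GAKCTGCCA"))
```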
Test for selection
We used the codeml program in PAML to test for significant differences in selective pressure on DPL genes under branch models. We focused on the ω ratios (ω = dN/dS) of four branches involving the A-, B- and C-genomes (Figure 4). ω1 denotes the selective pressure on DPL genes along the branch of the most recent common ancestor of the A-, B- and C-genomes. ω2 refers to the branch of the A- and B-genomes before the duplication. ω3 and ω4 denote the selective pressure on the DPL2 and DPL1 lineages, respectively, i.e., the two branches after the duplication. We first used model M0 as the null hypothesis, in which a single ω ratio is assumed for all branches. Model M1 places no restriction on the ω ratio of any branch, whereas model M2.0 assumes that the ω ratios of any two of the four branches differ. Both M1 and M2.0 differ significantly from M0 (Table 1), rejecting the null model M0. Models M2.1–M2.6 each assume that at least two of the ω ratios are equal, but none of them differs significantly from M2.0, supporting M2.0, in which ω1–ω4 all differ from one another. We therefore accepted M2.0 as the most suitable model for describing the selective pressures on DPL genes. Under this model, ω1 is much greater than one (10.716), which may be caused by nucleotide substitution saturation, because ω1 is estimated mainly from the outgroup L. perrieri, which is separated from the most recent common ancestor of the A-, B- and C-genomes by a long divergence time [29]. The ω2 ratio before the duplication (0.314) is lower than that of the DPL1 lineage (ω4 = 0.556) but higher than that of the DPL2 lineage (ω3 = 0.186), suggesting relaxed selective pressure on the DPL1 lineage and strengthened selective constraint on the DPL2 lineage after the duplication.
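Comparisons between nested codeml models, such as M0 versus M2.0, are usually made with a likelihood ratio test: twice the difference in log-likelihood is compared with a χ² distribution whose degrees of freedom equal the difference in the number of free parameters. A standard-library Python sketch follows. The log-likelihood values in the example are hypothetical, not the Table 1 values, and the χ² tail uses closed-form expressions that are valid only for integer degrees of freedom (`scipy.stats.chi2.sf` gives the same quantity if SciPy is available).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: e^{-y} * sum_{i=0}^{df/2 - 1} y^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from erfc(sqrt(y)) (df = 1), then add
    # y^(i - 1/2) e^{-y} / Gamma(i + 1/2) for each further step of 2 df.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y + (i - 0.5) * math.log(y) - math.lgamma(i + 0.5))
    return total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (2 * delta lnL, p-value) for a nested pair of models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, not taken from Table 1.
    stat, p = likelihood_ratio_test(lnl_null=-100.0, lnl_alt=-95.0, df=2)
    print(f"2dlnL = {stat:.2f}, p = {p:.4g}")
```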