Sequences of DPLs
According to previous phylogenetic analyses [26], Oryza contains six diploid genome types (A, B, C, E, F and G), and O. sativa belongs to the A-genome. We sampled all 15 diploid species representing the six genome types in Oryza, together with the diploid Leersia perrieri as an outgroup, and obtained the sequences of all their DPLs (Additional File 1: Table S1). Because a striking insertion in the second intron of DPL1 in A-genome species (Additional File 2: Figure S1) makes this intron longer in DPL1 than in DPL2 [24], we could distinguish DPL1 from DPL2 in A-genome species by the length of the second intron. According to previous research [24], two functional DPL copies are found in all A-genome species except O. barthii and O. glaberrima, in which a large deletion in DPL1 has caused loss of gene function. Our sequences confirmed that pseudogene alleles of DPL1 are fixed in African rice O. glaberrima and its wild ancestor O. barthii. No remarkable difference in the length of the second intron was found between the two DPL copies of the B-genome, but our whole-genome analysis showed that the two copies are located on chromosomes 1 and 6, respectively. Because these genomic locations are similar to those in O. sativa, we assumed that the B-genome, like the A-genome, contains both DPL1 and DPL2. In the C- and E-genomes, only highly similar copies were recovered, even though we tried various PCR strategies. In each of the F- and G-genomes, only one DPL-like sequence was isolated. We used L. perrieri, from a genus closely related to Oryza, as an outgroup, and BLAST searches against its whole genome recovered only one DPL-like gene.
Southern blot analysis
In the light of previous phylogenetic analyses of Oryza [26, 29], we redrew a simplified rooted phylogenetic tree of the six genome types in Oryza (Additional File 3: Figure S2). The tree indicates that the A- and B-genomes are the most recently diverged genome types within Oryza, whereas the G-genome diverged earliest, followed by the F-genome. A recent study on 13 genome types of Oryza showed that the F-genome and the ancestor of the A-, B-, C- and E-genomes diverged approximately 15 million years ago [28]. We therefore first used Southern blotting to detect the copy number of DPL in the five diploid genome types, in order to determine whether the duplication of DPLs originated within this time scale.
In our Southern blot analysis, we used three endonucleases: BamHI, EcoRV and HindIII. The results for each endonuclease are shown in Figure 1. As a positive control, the A-genome shows two bands with each of EcoRV and HindIII, in line with our expectation. The B-genome also shows two bands with each of BamHI and EcoRV, indicating two DPL copies. Of the five genomes, only the E-genome shows bands with all three endonucleases, with two or three bands per digest, suggesting at least two DPL copies. Unlike the A-, B- and E-genomes, which show two or more bands, the C- and F-genomes each show only one band, with a single endonuclease, so each may carry only one copy. We therefore speculate that the duplication of DPLs happened within Oryza, and we accordingly conducted our phylogenetic analysis of DPLs within the genus to identify the precise origin of the duplicate genes.
Phylogenetic analysis
To reconstruct a phylogenetic tree of the DPLs of Oryza covering all diploid genome types, we used both the DPLs we sequenced and those we downloaded. All genes used in this study include coding sequence (CDS) and intron sequences, except for the F-genome, for which only the CDS was used because its intron sequences could not be aligned with the others.
Using the aligned nucleotide sequences of the DPLs of diploid Oryza species and L. perrieri, which have a length of 1423 bp and contain 122 parsimony-informative sites, we constructed a bootstrap consensus ML tree to illustrate their phylogenetic relationships (Figure 2). The phylogenetic relationships of the six diploid genome types in our tree are broadly consistent with those revealed by previous studies [26, 30]. In our ML tree, the earliest diverged lineage of Oryza is again the G-genome, followed by the F-genome. The DPL copies of the E-, C-, B- and A-genomes form a large monophyletic group with 94% bootstrap support, which can be regarded as the sister lineage of the F-genome. This group consists of three monophyletic clades: the E-genome clade with 88% bootstrap support, the C-genome clade with 100% support, and the clade of the A- and B-genomes with 88% support. Notably, the DPL copies form a monophyletic clade within the E-genome and within the C-genome, but not within the A-genome or the B-genome taken separately. In the monophyletic clade of the A- and B-genomes, the DPL1 copies gather into one monophyletic branch with 99% bootstrap support, whereas the DPL2 copies gather into three monophyletic branches, comprising the B-genome, O. meridionalis of the A-genome, and the other A-genome species.
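The count of parsimony-informative sites quoted above follows a standard definition: an alignment column is informative when at least two different character states each occur in at least two sequences. The following minimal Python sketch illustrates that definition on a toy alignment. The function name, the treatment of gaps and ambiguous bases, and the example sequences are assumptions made for illustration; they are not taken from the study, which does not state which software produced the count.

```python
from collections import Counter


def parsimony_informative_sites(alignment, ignore="-N?"):
    """Return 0-based indices of parsimony-informative columns.

    A column is informative when at least two distinct states each
    occur in at least two sequences. Characters listed in `ignore`
    (gaps and ambiguity symbols, an assumption of this sketch) are
    not counted as states.
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    informative = []
    for col in range(lengths.pop()):
        counts = Counter(seq[col].upper() for seq in alignment)
        # States that are shared by two or more sequences.
        shared = [s for s, n in counts.items() if n >= 2 and s not in ignore]
        if len(shared) >= 2:
            informative.append(col)
    return informative


# Hypothetical four-sequence alignment, not data from the paper.
toy_alignment = [
    "ACGTA",
    "ACGTC",
    "ATGAC",
    "ATGAA",
]
informative_columns = parsimony_informative_sites(toy_alignment)
print(informative_columns)
```

A column in which one sequence differs from all the others (a singleton) is variable but not informative, which is why the number of informative sites is usually smaller than the number of variable sites.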
Collinearity analyses and investigations of transposable elements
We obtained five whole-genome sequences [28, 31] from the online database EnsemblPlant (http://plants.ensembl.org): three A-genome species (O. sativa, O. rufipogon and O. glumaepatula), one B-genome species (O. punctata) and one outgroup (L. perrieri). We then conducted collinearity analyses between the A- and B-genomes on DPL segments, each comprising a DPL gene and the five genes flanking it on each side (Figure 3, Additional File 4: Table S2).
Collinearity between the A- and B-genomes was strongly conserved in the DPL1 segments on chromosome 1 and in the DPL2 segments on chromosome 6, but no paralogs other than the DPLs themselves were identified between the DPL1 and DPL2 segments. We also found two segments in L. perrieri that show good collinearity with the DPL1 and DPL2 segments, but a DPL-like gene occurs only in the segment that is collinear with the DPL2 segments.
Transposable elements (TEs) were identified in the upstream and downstream intergenic regions of DPL1 in the A- and B-genomes. TEs from several DNA transposon families were found around DPL1, including Helitron, PIF, TcMar-Stowaway, CMC-EnSpm, hAT-Ac and MULE-MuDR (Additional File 5: Table S3). We found a 9 bp repeat sequence (GAKCTGCCA) both upstream and downstream of DPL1, and the regions between the repeats were orthologous between the A- and B-genomes. However, we failed to identify any target site duplications between the TEs and DPL1.
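The 9 bp repeat is written with the IUPAC ambiguity code K, which stands for G or T, so it covers both GAGCTGCCA and GATCTGCCA. As an illustration of how such a degenerate motif can be located in a sequence, the sketch below converts the motif to a regular expression and reports every match. The helper names and the example sequence are hypothetical, only the forward strand is scanned, and this is not the procedure used in the study, which does not describe how the repeat was found.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as GAKCTGCCA into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif="GAKCTGCCA"):
    """Return (0-based start, matched text) for every forward-strand hit.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Made-up sequence carrying one copy of each variant of the repeat.
toy_sequence = "TTGAGCTGCCAAACCCGATCTGCCATT"
hits = find_motif(toy_sequence)
for start, text in hits:
    print(start, text)
```

Scanning the reverse complement as well would be a natural extension if the orientation of the repeat were not known in advance.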
Test for selection
We used the codeml program of PAML to test for significant differences in selective pressure on DPLs under branch models. In this analysis, we focused on the ω ratios (ω = dN/dS) of four branches involving the A-, B- and C-genomes. ω1 indicates the selective pressure on DPLs in the branch of the most recent common ancestor of the A-, B- and C-genomes. ω2 refers to the selective pressure on DPLs in the branch of the most recent common ancestor of the A- and B-genomes, representing the lineage before the duplication. ω3 and ω4 show the selective pressure in the DPL2 and DPL1 lineages, respectively, representing the two branches after the duplication. We first used Model M0 as the null hypothesis, in which a single ω ratio is assumed for all branches. Model M1 places no restriction on the ω ratio of any branch, whereas under Model M2.0 the ω ratios differ between any two branches (ω1 to ω4). We found significant differences between M1 and M0 and between M2.0 and M0 (Table 1), rejecting the null hypothesis M0. Models M2.1 to M2.6 assume that at least two of the ω ratios are equal, but none of them differs significantly from M2.0, supporting M2.0, in which ω1 to ω4 all differ from one another. Hence we accepted M2.0 as the most suitable model to describe the selective pressures on DPLs. Under this model, ω1 is far greater than one (10.716), which may be caused by nucleotide substitution saturation, because ω1 is estimated mainly from the outgroup L. perrieri, which has a long divergence time from the most recent common ancestor of the A-, B- and C-genomes [28]. The ω2 ratio before the duplication is 0.314, which is lower than the ratio of the DPL1 lineage (ω4 = 0.556) but higher than that of the DPL2 lineage (ω3 = 0.186), suggesting a relaxation of selective pressure in the DPL1 lineage and a strengthening of selective constraint in the DPL2 lineage after the duplication.
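Comparisons between nested codeml models of this kind are conventionally made with a likelihood ratio test: twice the difference in log-likelihood is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The sketch below shows that calculation using only the Python standard library. The log-likelihood values and the degrees of freedom are invented for illustration and are not the values in Table 1; the function names are likewise assumptions of this sketch.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the closed-form series for the regularized upper incomplete
    gamma function at integer and half-integer shape parameters.
    """
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k < df/2} y^k / k!
        return math.exp(-y) * sum(
            y ** k / math.factorial(k) for k in range(df // 2)
        )
    # Odd df: erfc(sqrt(y)) plus a finite half-integer series.
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(
        y ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return tail


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods: a one-ratio null model against an
# alternative with three additional free omega parameters.
statistic, p_value = likelihood_ratio_test(-3050.0, -3042.0, df=3)
print(f"2*delta lnL = {statistic:.3f}, p = {p_value:.4g}")
```

With real output, the two log-likelihoods would be read from the codeml result files of the models being compared, and the degrees of freedom from the difference in their reported parameter counts.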