Insights on the process of reciprocal gene loss in the duplicate DPL genes of rice

Background: Reciprocal gene loss (RGL) of duplicate genes is an important genetic source of reproductive isolation, which is essential for speciation. Various RGL patterns have been revealed over the past decades, but the RGL process itself is still poorly understood. RGL of the duplicate genes DOPPELGANGER1 (DPL1) and DOPPELGANGER2 (DPL2) can lead to BDM-type hybrid incompatibility between two rice subspecies. The evolutionary history of these duplicate genes remains unclear, including the origin and mechanism of their duplication and their evolutionary divergence after it. In this study, we investigated the evolutionary history of the duplicate genes to gain insights into the process of RGL. Results: We reconstructed the phylogenetic relationships of DPL copies from all 15 diploid species representing the six genome types of the rice genus and found that all DPL copies from the most recently diverged A- and B-genomes form one monophyletic clade. Southern blot analysis also clearly detected two DPL copies only in the A- and B-genomes. Highly conserved collinearity was observed between the A- and B-genome segments containing DPL1, and between those containing DPL2, but not between the DPL1 and DPL2 segments. Analysis of transposable elements indicated that the DPL duplication is related to DNA transposons. Likelihood-based analyses with branch models showed a relaxation of selective constraint in the DPL1 lineage but an enhancement in the DPL2 lineage after the duplication. Sequence analysis also showed that defective DPL1 alleles occur in 6 of the 8 A-genome species, both wild and cultivated, whereas only one defective DPL2 allele occurs, in a cultivated rice subspecies. Conclusions: DPL rice

neofunctionalization and subfunctionalization, have been described by different authors [1,9,10]. Nonfunctionalization means that one copy of a duplicate pair loses function by accumulating degenerative mutations; neofunctionalization means that one copy acquires a new beneficial function while the other retains the original function; and subfunctionalization means that the two copies each retain part of the functions of the ancestral gene. To date, theoretical and empirical studies of duplicate genes have focused mainly on neofunctionalization and subfunctionalization [8,11,12]. In contrast, the evolutionary significance of nonfunctionalization has not been well documented, despite its prevalence among duplicate genes [10]. By comparing the genomic databases of several eukaryotic species, Lynch and Conery [10] found that most duplicate genes are silenced within a few million years, and that stochastic reciprocal gene loss (RGL) after gene duplication in isolated populations may play an important role in the origin of genomic incompatibilities, consistent with the Bateson-Dobzhansky-Muller (BDM) model of speciation [10,13]. In other words, RGL following gene duplication can lead to postzygotic reproductive isolation between recently diverged species [14-17].
It is now widely accepted that RGL is an important genetic source of postzygotic reproductive isolation, which is essential for speciation. For example, RGL has been shown to underlie postzygotic reproductive isolation between tetraodon and zebrafish [17] and between polyploid yeasts [16]. Maclean and Greig [15] generated artificial tetraploid hybrid yeasts and found divergent loss of function at multiple duplicate loci among populations, indicating that RGL can occur rapidly after gene duplication. RGL can also follow the duplication of a small genomic region [14,18]. The duplicated loci S27 and S28 were shown to cause reproductive isolation between Oryza sativa and O. glumaepatula [19]. RGL at this pair of loci can also cause reproductive isolation between O. sativa and O. nivara, even though S27 was inactivated by an independent mutation in O. nivara [20].
RGL is an important mechanism for creating reproductive isolation, but we still know little about how it proceeds. The evolutionary history of duplicate genes that have undergone RGL is crucial for understanding this process, yet such histories have not been well investigated [21-23]. Here, we use one well-characterized case to gain insights into the process of RGL. We chose DOPPELGANGER1 (DPL1) and DOPPELGANGER2 (DPL2), a pair of hybrid sterility genes in rice (O. sativa), as our study loci. RGL of DPL1 and DPL2 can lead to BDM-type hybrid incompatibility between O. sativa ssp. japonica and indica [24]. DPL1 and DPL2, located on chromosomes 1 and 6, respectively, encode highly conserved proteins of 94 and 95 amino acids and share the same gene structure, although their intron lengths differ markedly [24]. Functional DPLs are highly expressed in mature anthers, and pollen grains carrying defective alleles at both DPL1 and DPL2 fail to germinate. The defective DPL1 allele in indica arose from a 518-bp insertion in the coding region of the second exon, whereas the defective DPL2 allele in japonica is caused by a splice-site mutation in the second intron that results in a read-through protein. Because the two loci lie on different chromosomes and segregate independently, one quarter of the pollen grains of an indica × japonica F1 hybrid carry defective alleles at both DPL1 and DPL2 and therefore fail to germinate. A recent study showed that DPL2 is the ancestral copy and that the duplication occurred after the divergence of O. sativa and Brachypodium distachyon [24], but the precise origin of the duplication remains unknown. B. distachyon and O. sativa diverged roughly 50 million years ago [25], whereas functional loss of duplicated genes usually happens within a few million years of duplication [10].
For this reason, we speculate that the DPLs arose from a more recent duplication, within the last few million years, and that their functions may initially have been redundant. In this study, we explore the evolutionary history of the DPLs, including the origin and mechanism of the duplication and the evolutionary divergence of the two DPL copies after duplication, which we believe can provide insights into the process of RGL.
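The one-quarter figure for F1 pollen follows from simple Mendelian bookkeeping. The Python sketch below enumerates the haploid pollen genotypes of an indica × japonica F1, assuming independent assortment of the two loci (they lie on chromosomes 1 and 6) and 1:1 segregation at each locus. The allele labels "+" and "-" and the function names are illustrative only and are not taken from the original study.

```python
from fractions import Fraction
from itertools import product

# "+" = functional allele, "-" = defective allele (illustrative labels).
# Each locus lists (indica allele, japonica allele), following the text:
# indica carries defective DPL1, japonica carries defective DPL2.
F1_GENOTYPE = {
    "DPL1": ("-", "+"),
    "DPL2": ("+", "-"),
}


def pollen_classes(genotype):
    """Return {pollen genotype: frequency} for a diploid F1.

    Assumes unlinked loci (independent assortment) and 1:1 segregation
    at every locus, so each allele combination has frequency (1/2) ** n_loci.
    """
    loci = sorted(genotype)
    share = Fraction(1, 2 ** len(loci))
    classes = {}
    for alleles in product(*(genotype[locus] for locus in loci)):
        key = tuple(zip(loci, alleles))
        classes[key] = classes.get(key, Fraction(0)) + share
    return classes


def fraction_without_functional_dpl(genotype):
    """Sum the frequencies of pollen classes in which every DPL allele is defective."""
    return sum(
        (freq for gamete, freq in pollen_classes(genotype).items()
         if all(allele == "-" for _, allele in gamete)),
        Fraction(0),
    )


print(fraction_without_functional_dpl(F1_GENOTYPE))  # 1/4
```

Under these assumptions the four pollen classes are equally frequent, and only the class carrying both defective alleles lacks a functional DPL copy.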

Sequences of DPLs
According to previous phylogenetic analyses [26], there are six diploid genome types in Oryza (the A-, B-, C-, E-, F- and G-genomes), and O. sativa belongs to the A-genome. We sampled all 15 diploid species representing the six genome types in Oryza, plus the diploid Leersia perrieri as an outgroup, and obtained the sequences of all their DPLs (Additional File 1: Table S1). Because a conspicuous insertion in the second intron of DPL1 in A-genome species (Additional File 2: Figure S1) makes this intron longer than the second intron of DPL2 [24], we could distinguish DPL1 from DPL2 in A-genome species by the length of the second intron. According to a previous study [24], two functional DPL copies are present in all A-genome species except O. barthii and O. glaberrima, in which a large deletion in DPL1 has abolished its function. Our sequences confirmed that nonfunctional DPL1 alleles are fixed in African rice O. glaberrima and its wild ancestor O. barthii. In the B-genome, the second introns of the two DPL copies did not differ markedly in length, but our whole-genome analysis showed that the two copies are located on chromosomes 1 and 6, respectively. Because these locations match those in O. sativa, we assumed that the B-genome, like the A-genome, contains both DPL1 and DPL2. In the C- and E-genomes, only highly similar copies were recovered, despite various PCR strategies. In the F- and G-genomes, only one DPL-like sequence was isolated from each. We used L. perrieri, from a genus closely related to Oryza, as the outgroup; BLAST searches against its whole genome yielded only one DPL-like gene.
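The rule used to tell the two A-genome copies apart can be written as a small function. This is a hypothetical sketch, not the authors' code: the copy identifiers, intron lengths and the `min_difference` threshold are placeholders, and it assumes exactly two copies per species, with the longer second intron marking DPL1. It refuses to assign labels when the lengths are too similar, which mirrors why chromosomal location, rather than intron length, was needed for the B-genome.

```python
def assign_dpl_copies(second_intron_lengths, min_difference=1):
    """Label the two DPL copies of one A-genome species as DPL1 or DPL2.

    second_intron_lengths maps a placeholder copy identifier to the length
    (bp) of that copy's second intron. Assumes exactly two copies and that
    the A-genome-specific insertion makes the DPL1 intron the longer one.
    min_difference is a hypothetical threshold (bp) below which the copies
    are considered indistinguishable by this criterion.
    """
    if len(second_intron_lengths) != 2:
        raise ValueError("expected exactly two DPL copies")
    (short_id, short_len), (long_id, long_len) = sorted(
        second_intron_lengths.items(), key=lambda item: item[1]
    )
    if long_len - short_len < min_difference:
        # Intron length alone cannot separate the copies; another criterion
        # (e.g. chromosomal location) is needed.
        raise ValueError("second introns too similar in length to assign copies")
    return {long_id: "DPL1", short_id: "DPL2"}


# Placeholder identifiers and lengths, for illustration only.
print(assign_dpl_copies({"copy_a": 410, "copy_b": 1020}))
# {'copy_b': 'DPL1', 'copy_a': 'DPL2'}
```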

Southern blot analysis
Based on previous phylogenetic analyses of Oryza [26,29], we redrew a simplified rooted phylogenetic tree of the six genome types in Oryza (Additional File 3: Figure S2). The tree indicates that the A- and B-genomes are the most recently diverged genome types within Oryza, whereas the G-genome diverged earliest, followed by the F-genome. A recent study of 13 Oryza genomes showed that the F-genome and the ancestor of the A-, B-, C- and E-genomes diverged approximately 15 million years ago [28]. We therefore first used Southern blotting to determine the DPL copy number in five of the diploid genome types (A, B, C, E and F), to test whether the DPL duplication originated within this time frame.
In our Southern blot analysis, we used three endonucleases: BamHI, EcoRV and HindIII. The results for each endonuclease are shown in Figure 1. As a positive control, the A-genome showed two bands with both EcoRV and HindIII, as expected. The B-genome also showed two bands with both BamHI and EcoRV, indicating two DPL copies. Of the five genomes, only the E-genome produced bands with all three endonucleases, with two or three bands per digest, suggesting at least two DPL copies. Unlike the A-, B- and E-genomes, which showed two or more bands, the C- and F-genomes each showed only one band, with only one of the endonucleases, suggesting that each may carry only one copy. We therefore speculated that the DPL duplication happened within Oryza, and we restricted our phylogenetic analysis of the DPLs to this genus to identify the precise origin of the duplicate genes.
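The copy-number reasoning above can be made explicit. The sketch below is a simplification, not the authors' procedure: it treats the largest number of bands seen in any single digest as a lower bound on copy number. That holds only if no enzyme cuts inside the probed region (otherwise one copy can give two bands), and it ignores co-migrating fragments, which would hide copies. Enzyme names follow the text; the counts in the example are those reported above for the A- and B-genomes, with 0 standing for a digest that gave no bands.

```python
def min_copy_number(bands_per_digest):
    """Lower bound on the number of probe-hybridizing copies.

    bands_per_digest maps an enzyme name to the number of bands seen
    (0 if the digest gave no signal). Assumes no enzyme cuts inside the
    probed region, so each copy yields at most one band per digest and the
    largest band count is a minimum; co-migrating fragments may hide copies.
    Returns None when no digest produced a band.
    """
    counts = [n for n in bands_per_digest.values() if n > 0]
    return max(counts) if counts else None


a_genome = {"BamHI": 0, "EcoRV": 2, "HindIII": 2}
b_genome = {"BamHI": 2, "EcoRV": 2, "HindIII": 0}
print(min_copy_number(a_genome), min_copy_number(b_genome))  # 2 2
```

Taking the maximum rather than, say, the mode across digests keeps the estimate conservative: a digest can only under-resolve copies under the stated assumption, never over-count them.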

Phylogenetic analysis
To reconstruct a phylogenetic tree of the Oryza DPLs covering all diploid genome types, we used both newly sequenced and downloaded DPLs. All sequences used in this study include both coding sequence (CDS) and introns, except those of the F-genome, for which only the CDS was used because its intron sequences could not be aligned with the others.
Using an alignment of 1423 bp containing 122 parsimony-informative sites from the DPLs of diploid Oryza species and L. perrieri, we constructed a bootstrap consensus ML tree to illustrate their phylogenetic relationships (Figure 2). The relationships among the six diploid genome types in our tree are largely consistent with those reported previously [26,30]. In our ML tree, the earliest diverged lineage of Oryza is again the G-genome, followed by the F-genome. The DPL copies of the E-, C-, B- and A-genomes form a large monophyletic group with 94% bootstrap support, which can be regarded as the sister lineage of the F-genome. This group consists of three monophyletic clades: an E-genome clade (88% bootstrap support), a C-genome clade (100%) and an A- plus B-genome clade (88%). Notably, the DPL copies of the E-genome and of the C-genome each form a genome-specific monophyletic clade, whereas those of the A- and B-genomes do not. Within the A- plus B-genome clade, the DPL1 copies form one monophyletic branch with 99% bootstrap support, whereas the DPL2 copies fall into three monophyletic branches: the B-genome, O. meridionalis of the A-genome, and the other A-genome species.
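A site is parsimony-informative when at least two character states each occur in at least two sequences. The Python sketch below counts such columns in an alignment. It treats gaps, "?" and "N" as missing data, which is one common convention; programs differ in how they handle gaps, so this sketch is not guaranteed to reproduce the 122 sites reported above. The toy alignment is invented for illustration.

```python
from collections import Counter

MISSING = set("-?N")


def parsimony_informative_sites(alignment):
    """Count parsimony-informative columns in a list of aligned sequences.

    A column is informative when at least two different states each occur
    in at least two sequences. Gaps, '?' and 'N' are treated as missing.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must contain sequences of equal length")
    informative = 0
    for column in zip(*alignment):
        states = Counter(b.upper() for b in column if b.upper() not in MISSING)
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return informative


toy = ["ACGTA", "ACGTT", "AGGCA", "AGGCT"]
print(parsimony_informative_sites(toy))  # 3
```

In the toy alignment, columns 2, 4 and 5 each contain two states shared by two sequences, whereas the constant columns 1 and 3 carry no grouping information.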

Collinearity analyses and investigations of transposable elements
Using five whole-genome databases [28,31] Table S2).
Strong collinearity was found between the A- and B-genomes of Oryza, both for the DPL1 segments on chromosome 1 and for the DPL2 segments on chromosome 6, but no paralogs other than the DPLs themselves were identified between the DPL1 and DPL2 segments. We also found two segments in L. perrieri that are collinear with the DPL1 and DPL2 segments, respectively, but a DPL-like gene occurs only in the segment collinear with the DPL2 segments.
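Collinearity between two genomic segments is often summarized as the number of ortholog pairs that keep the same relative order. The Python sketch below computes this as a longest increasing subsequence over ortholog positions. It is a simplified illustration, not necessarily the method used in this study: ortholog assignments are assumed to be given, inversions and strand are ignored, and the gene identifiers are hypothetical.

```python
from bisect import bisect_left


def collinear_anchor_count(order_a, order_b, orthologs):
    """Size of the longest chain of ortholog pairs kept in the same order.

    order_a, order_b: gene identifiers in chromosomal order for two segments.
    orthologs: maps a gene of segment A to its ortholog in segment B.
    Computed as a longest strictly increasing subsequence of B positions;
    inversions and strand are ignored in this simplified sketch.
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    positions = [pos_b[orthologs[g]] for g in order_a
                 if g in orthologs and orthologs[g] in pos_b]
    tails = []  # tails[k]: smallest last position of an increasing chain of length k + 1
    for p in positions:
        k = bisect_left(tails, p)
        if k == len(tails):
            tails.append(p)
        else:
            tails[k] = p
    return len(tails)


# Hypothetical segments: a3/a4 are swapped relative to segment B, bX has no ortholog.
seg_a = ["a1", "a2", "a3", "a4", "a5"]
seg_b = ["b1", "b2", "bX", "b4", "b3", "b5"]
pairs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4", "a5": "b5"}
print(collinear_anchor_count(seg_a, seg_b, pairs))  # 4
```

A high count relative to the number of genes in each segment indicates conserved collinearity; two segments that share no ortholog pairs apart from one gene family score close to zero.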
Transposable elements (TEs) were identified in the upstream and downstream intergenic regions of DPL1 in the A- and B-genomes. TEs from several DNA transposon families were found around DPL1, including Helitron, PIF, TcMar-Stowaway, CMC-EnSpm, hAT-Ac and MULE-MuDR (Additional File 5: Table S3). We found a 9-bp repeat sequence (GAKCTGCCA, where K denotes G or T) both upstream and downstream of DPL1, and the regions between the repeats were orthologous between the A- and B-genomes. However, we failed to identify any target site duplication linking the TEs and DPL1.
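The 9-bp repeat is written with the IUPAC ambiguity code K (G or T). Direct repeats flanking an insertion are the usual signature checked when looking for target site duplications. The Python sketch below, a generic illustration rather than the pipeline used in this study, converts such a motif to a regular expression and reports its forward-strand occurrences upstream and downstream of a gene interval. The sequence and coordinates in the example are invented.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'GAKCTGCCA' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq, motif):
    """0-based start positions of all (possibly overlapping) forward-strand matches."""
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def flanking_repeats(seq, gene_start, gene_end, motif="GAKCTGCCA"):
    """Split motif hits into those fully upstream of gene_start and those
    starting at or after gene_end (0-based, end-exclusive gene coordinates)."""
    hits = find_motif(seq, motif)
    k = len(motif)
    upstream = [h for h in hits if h + k <= gene_start]
    downstream = [h for h in hits if h >= gene_end]
    return upstream, downstream


# Invented example: one copy of each K variant flanking a 10-bp "gene".
seq = "TT" + "GAGCTGCCA" + "A" * 10 + "GATCTGCCA" + "CC"
print(flanking_repeats(seq, gene_start=11, gene_end=21))  # ([2], [21])
```

Only the forward strand is scanned because target site duplications are direct repeats; a search for inverted repeats would also need the reverse complement.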

Test for selection
We used the codeml program of PAML to test for significant differences in selective pressure on the DPLs under branch models. We focused on the ω ratios (ω = dN/dS) of four branches involving the A-, B- and C-genomes. ω1 denotes the selective pressure on DPLs in the branch leading to the most recent common ancestor of the A-, B- and C-genomes. ω2 denotes the selective pressure in the branch leading to the most recent common ancestor of the A- and B-genomes, representing the lineage before the duplication. ω3 and ω4 denote the selective pressure in the DPL2 and DPL1 lineages, respectively, representing the two branches after the duplication. First, we used Model M0, which assumes a single ω ratio for all branches, as the null hypothesis. Model M1 places no restriction on ω, so that each branch has its own ratio, whereas Model M2.0 assigns a separate ω ratio to each of the four focal branches. We found significant differences between M1 and M0 and between M2.0 and M0 (Table 1). Under Model M2.0, ω1 is much greater than one (10.716); this may result from nucleotide substitution saturation, because ω1 is estimated mainly using the outgroup L. perrieri, which diverged long before the most recent common ancestor of the A-, B- and C-genomes [28]. The ω2 ratio before the duplication (0.314) is lower than that of the DPL1 lineage (ω4 = 0.556) but higher than that of the DPL2 lineage (ω3 = 0.186), suggesting a relaxation of selective constraint in the DPL1 lineage and an enhancement of selective constraint in the DPL2 lineage after the duplication.
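Nested branch models such as these are typically compared with a likelihood ratio test (LRT): twice the difference in log-likelihood is referred to a χ² distribution whose degrees of freedom equal the difference in the number of free parameters. The Python sketch below illustrates that calculation without third-party libraries, computing the χ² tail with the standard incomplete-gamma series and continued-fraction formulas. The log-likelihoods and parameter counts in the example are hypothetical placeholders, not the values in Table 1; in practice a library routine such as `scipy.stats.chi2.sf` would serve the same purpose.

```python
import math


def _lower_gamma_series(a, z):
    """Regularized lower incomplete gamma P(a, z) by series (used for z < a + 1)."""
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > abs(total) * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def _upper_gamma_cf(a, z, tiny=1e-300):
    """Regularized upper incomplete gamma Q(a, z) by continued fraction
    (modified Lentz method; used for z >= a + 1)."""
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(-z + a * math.log(z) - math.lgamma(a)) * h


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with df degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    if z < a + 1.0:
        return 1.0 - _lower_gamma_series(a, z)
    return _upper_gamma_cf(a, z)


def likelihood_ratio_test(lnl_null, lnl_alt, np_null, np_alt):
    """Return (2*(lnL_alt - lnL_null), df, p), with df = extra free parameters."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = np_alt - np_null
    return stat, df, chi2_sf(stat, df)


# Hypothetical values, NOT those of Table 1.
stat, df, p = likelihood_ratio_test(lnl_null=-2500.0, lnl_alt=-2490.0, np_null=20, np_alt=23)
print(f"2dlnL = {stat:.2f}, df = {df}, p = {p:.2e}")
```

codeml reports a log-likelihood (lnL) and the number of parameters (np) for each fitted model, which are the two quantities this test needs.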

Discussion
Origin of the DPL duplication
RGL has been confirmed to be an important source of reproductive isolation [14,18,19,24,32-34], but the evolutionary processes involved have not been well clarified. It has been reported that the duplication of the DPLs in rice occurred after the divergence of O. sativa and Brachypodium distachyon [24]; however, these two species diverged roughly 50 million years ago [25], whereas functional loss of duplicated genes usually happens within a few million years of duplication [10]. We therefore conducted Southern blot and phylogenetic analyses to determine the precise origin of the DPL duplication.
Our results suggest that the duplication of DPLs in rice happened in the most recent common ancestor of the A- and B-genomes. First, our Southern blot, whole-genome BLAST and sequencing analyses indicated that the duplication happened within Oryza, consistent with the speculation of a previous study [24]. Another study [27], however, proposed that the F-genome, the second most ancient lineage in Oryza, has two DPLs, one of which lies on chromosome 1 and was pseudogenized right after a species-specific duplication via double-strand break repair between non-allelic homologous chromosomes. Our Southern blot experiment showed that the F-genome has only one DPL copy. In the whole genome of O. brachyantha, we also found only one copy, on chromosome 6, at a location similar to that of DPL2 in the rice genome. Furthermore, we tried various PCR strategies to sequence DPLs in the most ancient lineage, the G-genome, and again obtained only one copy. Therefore, both the ancient G- and F-genomes have only one DPL copy, suggesting that the duplication originated within Oryza.
The phylogenetic relationships of diploid Oryza species in our ML tree are nearly identical to those in previous studies [26,29], offering a good phylogenetic framework for exploring the origin of the DPLs in rice. All DPL copies of the A- and B-genomes formed a monophyletic branch with high bootstrap support, indicating that the duplication of DPLs in rice originated in the common ancestor of the A- and B-genomes.
The A- and B-genomes diverged an estimated 6.76 million years ago [28]; thus, the duplication of rice DPLs may have happened much later than proposed by the previous study, in which the duplication was thought to have occurred after the divergence of O. sativa and B. distachyon [24]. The DPL copies of the C- and E-genomes used in the phylogenetic analysis did not fall within the A- plus B-genome clade, indicating that they are independent of the duplication that produced the rice DPLs.
The Southern blot analysis indicates that the C-genome has only one DPL copy, which is inconsistent with the previous study [24]. We therefore tried more than 10 primer pairs in all three C-genome species (Additional File 6: Table S4) and still obtained only one copy, although we retained two or more clones for each species in the phylogenetic analysis. In contrast, the Southern blot analysis indicates that the E-genome may have two or more copies, but various PCR strategies yielded only two highly similar sequences, which clustered in a single branch in the phylogenetic analysis. Like the A-genome, the B-genome has two copies, located on chromosomes 1 and 6, respectively. All DPL copies of the A- and B-genomes formed a monophyletic clade, indicating that the DPL duplication originated in the most recent common ancestor of the A- and B-genomes.

Mechanism of the DPL duplication
A recent study proposed that chromosomes 1 and 6 underwent a double-strand break repair event that produced the duplicated DPL1 in the O. brachyantha genome [27]. However, a copy generated in this way loses its functional structure and is pseudogenized immediately. Double-strand break repair is therefore unlikely to explain the DPL duplication in rice, because both DPL1 and DPL2 are functional in the A- and B-genomes.
Our collinearity analysis shows conserved collinearity among the DPL1 segments and among the DPL2 segments, but not between them, indicating that the two segments do not form a pair of syntenic blocks derived from whole-genome duplication (WGD). This means that the DPLs in O. sativa were not produced by WGD. A WGD in the ancestor of the Poaceae has been confirmed [25,35,36], but no shared ancestral region has been found between chromosomes 1 and 6 [37]. The DPL-containing segment of L. perrieri is collinear with the DPL2 segments, suggesting that DPL2 is the original copy, consistent with the previous study [24]. Unequal crossing-over between homologous chromosomes during meiosis results in tandem duplication [2]. Therefore, the duplication of DPLs was not produced by unequal crossing-over, as they

Evolutionary divergence of DPLs
The hypothesis of asymmetric evolutionary rates in duplicate genes has been widely accepted [10,21,42,43]. It posits that, after gene duplication, the two copies are likely to evolve at different rates.
This bias is common and can occur in any type of gene duplication [3,44]. An example of this bias is found in wild rice. The duplicated loci of S27 and contains the specific duplicated segment in the S27 locus, but several mutations in the coding and promoter regions may lead to the inactivity of S27. It is therefore reasonable to assume that S27 is more likely than S28 to accumulate mutations.
Our study of the DPLs offers a comprehensive example that shows not only a difference in selective pressure between duplicate copies but also a bias in which copy loses function. Our results indicate that the selective pressures on the two DPL copies diverged after the duplication, with the selective constraint on DPL1 relaxed and that on DPL2 strengthened. This is consistent with the observation that pseudogenized copies are much more common for DPL1 than for DPL2 in the A-genome, since functional DPL1 is lacking in 6 of the 8 A-genome species. rufipogon, and functional mRNA of DPL1 is absent in O. glumaepatula [24]. In addition, in our unpublished work, we found that O. nivara carries the same defective DPL1 allele as O. sativa. In O. barthii and O. glaberrima, DPL1 has lost the first CDS. In contrast, the defective DPL2 allele caused by a splice-site mutation in the second intron occurs only in japonica rice. This evidence suggests that the new copy, DPL1, is more likely to accumulate mutations and is in the process of losing function, whereas the original copy, DPL2, is more conserved and more likely to be retained. However, in japonica rice, functional DPL1 has been retained while the defective DPL2 allele occurs at high frequency [24,45], which gave rise to RGL within rice.
We therefore believe that the retention of the defective DPL2 allele in rice was caused by artificial selection during domestication.

Conclusion
In summary, the duplication of DPLs in rice originated in the common ancestor of the A- and B-genomes about 6.76 million years ago, much later than estimated in the previous study. DPL1 was duplicated from DPL2, possibly mediated by DNA transposons. After the duplication, the selective pressures differed markedly between the DPL1 and DPL2 lineages: DPL1 underwent a relaxation of selective constraint, whereas DPL2 experienced stronger selective constraint. The larger number of pseudogenized DPL1 copies in the A-genome indicates that DPL1 is redundant and may be undergoing pseudogenization. In contrast, the defective DPL2 allele occurs at high frequency in japonica rice, suggesting that artificial selection during domestication may have played an important role in establishing RGL in rice.

Species sampling, online data acquisition, DNA isolation and sequencing
To investigate sequence variation of DPLs within Oryza, we sampled all 15 diploid species covering the six genome types (A-, B-, C-, E-, F- and G-genomes) in Oryza, plus a diploid species from a genus closely related to Oryza, L. perrieri [26]. One to three accessions were sampled per species, and their DPL sequences were obtained either by clone sequencing or from online databases (Additional File 1: Table S1). A considerable number of DPL sequences were obtained from online databases.
In addition to the A- and C-genome sequences used in the previous study [24], we obtained DPL sequences of the B- and F-genomes and of L. perrieri by whole-genome BLAST searches. The genomic sequences of the B- and F-genomes were downloaded from the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/) [27,28]. We searched for DPLs in these whole-genome sequences using BLAST in BioEdit [46]. Sequences of DPL homologs in L. perrieri were obtained with the online BLAST tool of the Ensembl Plants database (http://plants.ensembl.org).
We used the DNA secure plant kit (TIANGEN, Beijing, China) for our DNA isolation.
Genomic DNA for Southern blot was only isolated from fresh leaves while DNA for PCR amplification was isolated from fresh or silica-gel desiccated leaves. DNA for Southern blot was evaluated by NanoDrop (Gene Company Limited, Hong Kong, China) and 0.8% agarose gel electrophoresis. DNA for PCR amplification was evaluated by 1.5% agarose gel electrophoresis.
PCR primers were designed with Primer Premier 6 (Premier Biosoft International, CA, USA) or taken from previous studies [24,45]. All primers are listed in Additional Files 6 and 7 (Tables S4 and S5). PCR amplifications were performed in 25 μl reactions using 2×Taq PCR MasterMix (TIANGEN, Beijing, China). All PCR products were cloned into pEASY-T1 vectors (TransGen Biotech, Beijing, China), and at least five independent clones were Sanger-sequenced (Sangon Biotech, Shanghai, China). For samples from the C-, E-, F- and G-genomes, we used various PCR strategies to amplify DPL copies. The DNA sequences of 12 diploid Oryza species obtained by our clone sequencing were submitted to GenBank (MK569018-MK569047).
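Because copy detection here relies on sequencing several clones per PCR, it can help to gauge how likely a second copy would be missed by chance alone. The Python sketch below computes the probability that every sequenced clone derives from the same copy, under the strong simplifying assumption that all copies are amplified and cloned with equal probability and that clones are independent draws. PCR bias or primer mismatches can easily violate this assumption, so the numbers are idealized lower bounds on the risk of missing a copy.

```python
from fractions import Fraction


def prob_only_one_copy_seen(n_clones, n_copies=2):
    """Chance that all sequenced clones derive from a single copy.

    Assumes every copy is amplified and cloned with equal probability and
    that clones are independent draws; real PCR bias breaks this assumption.
    """
    if n_clones < 1 or n_copies < 1:
        raise ValueError("need at least one clone and one copy")
    return n_copies * Fraction(1, n_copies) ** n_clones


for n in (2, 5, 10):
    print(n, prob_only_one_copy_seen(n))
# 2 1/2
# 5 1/16
# 10 1/512
```

Under these idealized assumptions, five clones from a template containing two equally amplifiable copies would show only one copy about 6% of the time; strong amplification bias would raise that probability considerably, which is one reason varied PCR strategies are useful.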

Southern blot analysis
To detect the copy number of DPL in Oryza, we conducted a Southern blot analysis.
We designed a 320-bp probe based on the most conserved region of DPL1 and DPL2 in rice (Additional File 2: Figure S1). Five diploid genome types (the A-, B-, C-, E- and F-genomes) were used in the experiments, with one species sampled for each (Additional File 1: Table S1). Samples with DNA concentrations above 500 ng/µl and no degradation were chosen, and all samples used in the same experiment had similar concentrations.
In each experiment, genomic DNA was divided into two equal parts, and each part was digested with one restriction endonuclease for 18 h at 37°C. Altogether, we used three endonucleases: BamHI, EcoRV and HindIII (Promega, Madison, USA). Each single-enzyme digest contained 4 μg DNA and 50 units of restriction enzyme.
The digested genomic DNA was fractionated by electrophoresis on 0.8% agarose gels in TAE buffer at 60 V for 2 h and then at 40 V for 6 h. After electrophoresis, the DNA was transferred to Biodyne Plus nylon membranes with a capillary blotting system using 20× SSC (3 M NaCl, 300 mM sodium citrate, pH 7.0) as transfer buffer, and the membranes were hybridized with a digoxigenin-labelled DNA probe prepared with the DIG-High Prime DNA Labeling and Detection Starter Kit II (Roche, Germany). Membranes were covered evenly with the probe and hybridized at 37°C for 24 h in a hybridization oven (HL-2000 HybriLinker, LABRepCo, UK). Bands were visualized with the same kit, and the chemiluminescent signals were captured on X-ray film.
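Reading Southern band counts as copy numbers assumes that none of the enzymes cuts inside the region recognized by the probe, because a site there can split one copy into two hybridizing fragments. The Python sketch below shows one simple way to check a sequence for BamHI (GGATCC), EcoRV (GATATC) and HindIII (AAGCTT) recognition sites. The input sequences are placeholders, since the probe sequence itself is not reproduced in this section.

```python
import re

# Recognition sequences (5'->3') of the three enzymes used in the Southern blots.
# All three are palindromic, so scanning one strand finds sites on both.
SITES = {"BamHI": "GGATCC", "EcoRV": "GATATC", "HindIII": "AAGCTT"}


def count_sites(seq, sites=SITES):
    """Count (possibly overlapping) recognition sites per enzyme in seq."""
    seq = seq.upper()
    return {enzyme: len(re.findall(f"(?={site})", seq)) for enzyme, site in sites.items()}


def enzymes_cutting_within(region, sites=SITES):
    """Enzymes with at least one recognition site inside region (e.g. the probe target)."""
    return sorted(e for e, n in count_sites(region, sites).items() if n > 0)


# Placeholder sequences, not the actual probe.
print(count_sites("GGATCCGATATC"))           # {'BamHI': 1, 'EcoRV': 1, 'HindIII': 0}
print(enzymes_cutting_within("TTAAGCTTAA"))  # ['HindIII']
```

An enzyme that appears in the output for the probe region would be a poor choice for copy counting, since each copy could then contribute more than one band.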

Phylogenetic analysis
To determine the origin of the duplicate DPL copies in rice, we conducted a phylogenetic analysis. We used all 15 diploid species [26] and the outgroup L. perrieri to reconstruct a phylogenetic tree from all available DPL sequences. All DPL sequences were initially aligned with a combination of CLUSTAL X [47] and MUSCLE [48], and the alignments were then refined manually. We used the final alignment for a Maximum Likelihood (ML) phylogenetic analysis under the Jukes-Cantor model [49] implemented in MEGA 6 [50]. The reliability of the phylogenetic tree was evaluated with 3,000 bootstrap replicates.
The ML phylogeny of Oryza reconstructed using the sequences of DPLs. The numbers below o