Balanced gene retention, transcription and methylation levels between subgenomes produced by soybean-specific paleo-tetraploidization

Background Polyploidization creates duplicated sets of genomes in a plant. Dominance by a subgenome is often observed in a polyploid species. Here, we studied the duplicated genomic regions in soybean, which was affected by a tetraploidization 13 million years ago. By referring to eudicot relatives, we detailed that duplicated genes are retained in a balanced manner among homoeologs. Moreover, we found that duplicated genes in colinearity in extant genome are expressed at balance. Notably, the DNA methylation patterns of duplicated genes shared similar methylation levels among three different soybean lines (W931A, WR016, and ZAYOU1), suggesting that either the epigenetic information could be maintained transgenerationally for millions of years or homoeologs in this paleo-autopolyploid species have similar methylation patterns under similar environment. These findings support the paleo-autotetraploidy of soybean. We proposed that, if taking maize as a model plant for studying paleo-allotetraploids, soybean is a model plant for studying paleo-autotetraploids.

corresponding soybean subgenome regions produced by the SST, it was proposed that the SST seems to have an autotetraploid nature [2]. A higher number of chromosomal rearrangements and higher frequency of retention of duplicated genes in soybean than in maize [3]. A similar proposition was raised by checking genomic fractionation statistically [4].
A relatively high degree of nuclear DNA methylation is a specific feature of plant genomes. Targets for cytosine DNA methylation in plant genomes are CG, CHG and CHH (H is A, T, or C) sequences. Gene Body CG and CHG methylation and suppression of centromeric CHH methylation are mediated by DECREASE IN DNA METHYLATION1 in promoters affecting the splicing of genes in rice [30].
In contrast to the autotetraploid nature of SST in soybean, maize was proposed to have a paleoallotetraploid nature. The maize duplicated regions display significant differences in gene loss and gene expression, transposable element accumulation, small interfering RNAs, and DNA methylation around genes [3,31]. Recently, a study shows that he DNA methylation levels of the two pear subgenomes show little deviation, and there is no expression advantage between the two subgenomes [32]. Here, we would use new data to check the retention, transcription, and methylation difference between the SST duplicated genes, shedding light on the evolutionary divergence and genomic stability of them after evolution during millions of years.

Calculation of Gene Expression Levels
For all expression sets, reads were aligned to the soybean genome using an ultrafast and memoryefficient tool for aligning sequencing reads to long reference sequences (Bowtie2) [35]. The align results were sorted and transfer 'bam' to 'sam' format by SAMTOOLS [36]. Following the alignment, estimates the expression levels for each soybean gene model using Cufflinks [37]. Using the published annotations of the Williams 82 v2.0 working gene list.

Methylation characterization
To characterize DNA methylation, three soybean samples (ZAYOU1, WR016, and W931A) were sequenced on an Illumina Hiseq 2000/2500 platform, and 100 bp/50 bp single-end reads were generated. Image analysis and base calling were performed with the standard Illumina pipeline, and finally 100 bp paired-end reads were generated.
Bismark software (version 0.12.5) [38] was used to perform alignments of bisulfite-treated reads to the soybean (Williams82) genome with the default parameters. The reference genome was first transformed into a bisulfite-converted version (C-to-T and G-to-A converted) and then indexed using bowtie2 [35]. Sequence reads were also transformed into fully bisulfite-converted versions (C-to-T and G-to-A converted) before alignment to similarly converted versions of the genome in a directional manner. Then sequence reads that produced a unique best alignment from the two alignment processes (original top and bottom strand) were compared with the normal genomic sequence, and the methylation state of all cytosine positions in the read was inferred. Reads that aligned to the same regions of the genome were regarded as duplicated reads.
To identify the methylation site, we modeled the sum s + ij of methylated counts as a binomial (Bin) random variable with methylation rate r ij , [Formula 1 Due to technical limitations the formula could not be printed here. You can find it in the manuscript file or in a separate supplemental file. ] We employed a sliding-window approach, which is conceptually similar to the approaches that have been used for bulk bisulfite sequencing (http://www.bioconductor.org/packages/2.13/bioc/html/bsseq.html). With window size w = 3,000 bp and step size 600 bp [39], the sum of methylated and unmethylated read counts in each window was calculated. Methylation level (ML) for each C site shows the fraction of methylated Cs, and is defined as: [Formula 2 Due to technical limitations the formula could not be printed here. You can find it in the manuscript file or in a separate supplemental file. ] The ML calculated was further corrected with the bisulfite non-conversion rate according to previous studies [40]. Given the bisulfite non-conversion rate r, the corrected ML was estimated as: [Formula 3 Due to technical limitations the formula could not be printed here. You can find it in the manuscript file or in a separate supplemental file. ] All P-values were calculated using cumulative binomial distributions assuming an equal chance of gene copies on two sets of homoeologous regions corresponding to a specified referring chromosome.

Results
Homoeologous gene colinearity By using our established software ColinearScan [41], we revealed 25,302 pairs of gene blocks, each with 5 or more colinear genes, within soybean; 4,391 pairs in barrel medic; and 50,672 colinear gene pairs between soybean and barrel medic. Owing to the existence of more ancient polyploidizations, the homologous pairs in soybean may not all be produced by the SST. By checking homologous correspondence between barrel medic and soybean (1:2), synonymous substitution rates or Ks values of colinear gene pairs, chromosomal rearrangement breakpoints, and colinear block complementarity relative to barrel medic chromosomes (Additional file 1: Figure S1), we tried to locate SST-derived homologous blocks. Even for genes produced by the same event, the variation of Ks values among duplicated genes could be sufficiently large to interfere the inferring whether genes were derived from that specific event. We used the median Ks to help distinguish the blocks from different events. is not rare at present [42]. Regardless of the cause, such incongruity does not substantially affect our interpretation of soybean genome formation below.
An interesting observation is that SST homoeologous regions have very similar gene retention patterns (Fig. 1). With barrel medic chromosomes as reference, we calculated gene retention rates in sliding windows of 100 genes and steps of 1 gene. Here, for homoeologous regions mapped onto a barrel medic chromosome, there were prominent variations in gene retention. In some regions, only a tiny fraction of genes was preserved, while in other regions, gene preservation reached 40%.
However, while the retention curves along the barrel medic chromosomes were volatile, we observed similar gene retention or loss between homoeologous/duplicated regions (Fig. 1): a total of 57.5% of homoeologous regions had less than 5% difference in gene retention rates (Additional file 3: Table   S1). This precludes their division into two subgenomes, and is in sharp contrast to the two easily distinguished maize subgenomes, a dominant one and a sensitive one [31]. Homoeologous regions with different gene retention levels often involve large chromosome patches that are missing, possibly due to occasional DNA removal (Fig. 1).

Similar homoeologous gene expression
Corresponding pairs of homoeologous soybean regions also showed similar gene expression (Additional file 4: Table S2). There were 8,205 SST-derived homoeologous gene pairs expressed, and 56.7% (4,653 pairs) had similar expression levels ( Fig. 2a; Additional file 5: Figure S3a-l). Along the barrel medic reference chromosomes, as with gene retention, there were prominent differences between certain peaks and troughs, but expression levels of preserved homoeologous genes showed similar patterns (Fig. 2b). Similar to the analysis of gene retention rate, we divided the homoeologous chromosomes into sliding windows and checked for differences in gene expression (see Methods for details). We found that the expression activity levels between homoeologous genes in nearly 80% of homoeologous regions for all chromosomes were not significantly different (p < 5%).

Similar homoeologous DNA methylation
Corresponding homoeolog pairs of homoeologous soybean regions were also similarly methylated. We characterized DNA methylation of three different soybean lines (W931A, WR016, and ZAYOU1). By referring to the barrel medic chromosomes, DNA methylation levels were statistically indistinguishable between most corresponding homoeologous regions ( Fig. 3a; Additional file 6: Figure   S4a-b; Additional file 7: Figure S5a-b). More than 45.0% (3,354 pairs) of homoeologous genes had methylation levels that were less than twofold different (Additional file 8: Table S3). Along the barrel medic reference chromosomes, the DNA methylation levels of homoeologous regions showed similar zigzagging patterns ( Fig. 3b; Additional file 6: Figure S4c-d). Just like gene retention and expression, we divided homoeologous chromosomes into sliding windows (see Methods), finding for all chromosomes that nearly 43-44% of regions had methylation levels that were indistinguishable at a significance level < 0.05 (Additional file 9: Table S4a); and 69%-71% regions were indistinguishable at a significance level P < 0.1 (Additional file 9: Table S4b).

Discussion
Polyploidization would produce duplicated copies of genomes, or subgenomes in a plant. Often nonequivalence is observed between subgenomes. Maize was an ancient allotetraploid, and it was found that the duplicated regions in maize genome are significantly unbalanced in terms of the preservation of duplicated genes [31]. Actually, maize duplicated regions that are orthologous to each sorghum chromosome showed distinctive gene retention differences, suggesting existence of a dominant subgenome and a sensitive one. Besides, genes constituting the dominant subgenome show a higher expression level. This makes a good proof for its paleo-allotetraploidy origin.
Here, we found that corresponding pairs of homoeologous soybean regions are similar in gene retention, expression, and methylation. This is distinctively different from the observation in maize. If the maize pattern is typical of an ancient allotetraploid, the situation in soybean shows an ancient tetraploid with two largely similar subgenomes, and therefore, can only explain by autotetraploidy.
This supports the previous classification of soybean as having a class I polyploid ancestor, or autotetraploid [4]. Taking maize as a model plant to study ancient allotetraploidy, soybean can be a model plant to explore the effect of ancient autopolyploidy.
As a typical epigenetic feature, DNA methylation was well documented in recent years at a wholegenome level [43,44]. As compared to genetic patterns, one may expect that the patterns of DNA methylation might be more easily lost over time. Although there has been recent evidence showing transgenerational inheritance of epigenetic changes imprinted by environmental factors [45], none has ever expected how long epigenetic changes to pass through time. In the present study, we found a high similarity of homoeologs in DNA methylation among the three soybean genotypes. Two hypotheses can be proposed to interpret the similarity of DNA methylation among homoeologs (duplicated gene copies): The first hypothesis is that the DNA methylations transgenerationally passed since the paleo-autotetraploidization ~13 million years ago, which means that epigenetic features could be kept for millions of years, even eternally. The epigenetic changes may be imprinted into genetic materials to translate into genetic information. If this is the case, the finding may obscure the boundary, to certain extent, between epigenetic and genetic information. The second hypothesis is that the similarity among homoeologs is due to sequence-based determination of DNA patterns.
The homoeologs in subgenomes of this ancient autotetraploid (soybean) are highly similar in sequence and gene order conservation and therefore could also have similar DNA methylation patterns since these three soybean genotypes are all from soybean breeding programs and their plants were all grown under similar conditions, a very different climate condition from13 millions year ago. The high similarity of DNA methylation among homoeologs in soybean is a very interesting discovery, regardless whether the DNA methylation was preserved transgenerationally for 13 million years or occurred similarity due to the sequence and gene order similarity.

Conclusion
Polyploidization creates duplicated sets of genomes in a plant. Dominance by a subgenome is often observed in a polyploid species. Here, we studied the duplicated genomic regions generated by tetraploidization 13 million years ago in soybean. By referring to barrel medic species, we detailed that duplicated genes are retained in balance among the same duplicated region. Duplicated genes in colinearity in extant genome are also expressed in balance. Notably, the DNA methylation levels of duplicated genes are similar in three different soybean lines (W931A, WR016, and ZAYOU1). Putting together, these findings support the ancient autotetraploid nature of soybean. We proposed that, if taking maize as a model plant for studying ancient allotetraploids, soybean can be a model plant for studying ancient autotetraploids.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare no competing financial interests.
Publisher's Note