Balanced duplicated gene expression supports an autotetraploid ancestor of Salicaceae plants

Background
Autopolyploids arise from multiplication of the genome of a single species, usually by direct doubling of diploid chromosomes. A polyploid formed by chromosome doubling within one species is called a homologous polyploid (autopolyploid). To check further whether the Salicaceae-common tetraploid was homologous or heterologous (allopolyploid), we performed gene collinearity analysis with grape as the outgroup and explored whether the two sets of poplar chromosomes or chromosomal regions have balanced gene expression levels and similar gene functions. A paired t-test showed that collinear duplicated genes were balanced in expression, as expected if the tetraploid ancestor arose by homologous whole-genome duplication, that is, autopolyploidization. Moreover, KEGG enrichment analysis and pathway annotation showed that most of the differentially expressed genes were related to metabolism. A comparison of different groups of flowering plants suggests that autopolyploidization may not provide the biological and evolutionary vigor needed to establish large plant groups of the kind observed in the Poaceae and Brassicaceae families. The present analysis contributes to understanding the biology and evolution of Salicaceae plants and beyond.

In the GO analysis, the duplicated regions showed no divergence in gene ontology enrichment.
For example, in the poplar duplicated regions orthologous to grape chromosome 2, the preserved collinear genes show no difference in GO term enrichment (Figure 4).

Tetraploidization and gene evolution
A gene family example can help illustrate gene copy number variation, gene loss, and divergent evolutionary rates. Calcium-dependent protein kinases (CDPKs) play crucial roles in the regulation of plant development and in tolerance of various environmental stresses. Gene expression profiling showed that a number of Populus CDPK genes are differentially expressed across tissues and developmental stages.
We therefore downloaded the CDPK gene family sequences of Arabidopsis thaliana and used them to search for and retrieve homologs in the poplar and grape genomes. A phylogenetic tree of these poplar and grape genes was constructed with MEGA (Figure 5). There are 9 genes in grape and 17 genes in poplar, a nearly doubled number of CDPK genes in poplar relative to grape. All of these genes are collinear within or between genomes, suggesting that the copy number increase in poplar is a direct outcome of its specific tetraploidization. No recent tandem duplication was found.
There is clear evidence of genome fractionation by gene loss. At least three poplar paralogs (of pt16G00564, pt05G01136, and pt14G01035, respectively) were lost after the tetraploidization, and one grape gene orthologous to pt16G01172 and pt06G01013 was lost. There are six subgroups, each with one grape gene and two corresponding poplar orthologs that were duplicated in the poplar tetraploidization.
In five of the six subtrees, the grape gene is the outgroup of the poplar duplicates, as expected.
In the remaining subtree, however, one poplar duplicate is the outgroup to the grape gene and the other poplar duplicate, an aberrant topology. This can be explained by an elevated evolutionary rate in the poplar duplicate that appears as the outgroup.

Discussion
A previous analysis of the Salicaceae-common tetraploidization found balanced gene losses between the duplicated regions, suggesting autopolyploidization [12]. Here, by analyzing gene expression data, GO annotations, and KEGG pathways, we found that the duplicated regions did not show any divergence in gene ontology enrichment, providing further lines of evidence that the event may have been an autotetraploidization. Our results are distinct from those reported for maize, where divergent gene loss rates and expression levels between duplicated regions suggested an allotetraploid nature of the maize-specific whole-genome duplication [25]. The present findings lay a foundation for further analysis of Salicaceae genomes and for understanding their biology.
More and more evidence shows that polyploidization has recursively affected the evolution of land

Inferring gene collinearity
Gene collinearity is valuable for understanding ancestral genome structure and evolution. Collinear genes are genes that have preserved ancestral gene order in extant genomes; they are often interrupted by other genes without collinear counterparts, which arise from gene loss, relocation, or insertion. To find collinear genes, we first inferred putative homologous genes by using BlastP (E-value cutoff 1e-5) [18]. This loose definition of gene homology does not jeopardize the aim of understanding genome structure, and it accommodates duplicated gene sequences that have often diverged considerably over tens of millions of years. By using ColinearScan [19], which implements a dynamic programming algorithm (parameter: at most 40 non-collinear genes in the gap between collinear genes), we inferred collinear gene blocks within poplar and between poplar and grape. By using the Nei-Gojobori approach implemented in PAML [20], we estimated synonymous nucleotide substitution rates (Ks) between the inferred collinear genes. With the gene collinearity and Ks information, we then constructed a dotplot between poplar and grape and showed the mean Ks of the poplar-grape collinear gene blocks.
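The idea of chaining homologous gene pairs under a gap limit can be illustrated with a minimal sketch. This is not the ColinearScan algorithm itself; it is a simplified quadratic-time dynamic program written for illustration, and it assumes that genes are represented by their order index along a chromosome, that only forward-oriented blocks are considered, and that the function name `longest_collinear_chain` and its parameters are hypothetical.

```python
def longest_collinear_chain(anchors, max_gap=40):
    """Return the longest forward-oriented chain of homologous gene pairs.

    anchors: iterable of (pos_a, pos_b), the gene-order indices of a
             homologous pair on chromosome A and chromosome B.
    max_gap: maximum number of intervening genes allowed between two
             consecutive chained pairs on either chromosome (assumed to
             mirror the gap parameter of 40 quoted in the text).
    """
    pts = sorted(set(anchors))
    n = len(pts)
    if n == 0:
        return []
    best = [1] * n    # best[i]: length of the longest chain ending at pts[i]
    prev = [-1] * n   # back-pointer used to recover the chain
    for i in range(n):
        ai, bi = pts[i]
        for j in range(i):
            aj, bj = pts[j]
            gap_a = ai - aj - 1
            gap_b = bi - bj - 1
            increasing = aj < ai and bj < bi
            if increasing and gap_a <= max_gap and gap_b <= max_gap:
                if best[j] + 1 > best[i]:
                    best[i] = best[j] + 1
                    prev[i] = j
    end = max(range(n), key=lambda k: best[k])
    chain = []
    while end != -1:
        chain.append(pts[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    pairs = [(1, 1), (2, 2), (3, 50), (5, 4), (60, 6), (6, 5)]
    print(longest_collinear_chain(pairs, max_gap=40))
```

In this toy input, the pairs (3, 50) and (60, 6) lie too far from their neighbors on one chromosome to join the chain when the gap limit is 40, which is how isolated homologous hits are separated from genuinely collinear blocks.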

Gene expression analysis
We downloaded expression data for five tissues (xylem, phloem, shoot, leaf, and root) and two wood-forming cell types (fiber and vessel) of poplar [17]. Genes with at least two counts per million (cpm) in at least three samples were retained and normalized using the trimmed mean of M-values (TMM) method [21].
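The filtering rule above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: it assumes raw read counts are stored as a dictionary mapping a gene identifier to a list of per-sample counts, it computes cpm from the total count of each sample, and it does not perform TMM normalization. The function name `cpm_filter` and its parameters are hypothetical.

```python
def cpm_filter(counts, min_cpm=2.0, min_samples=3):
    """Keep genes with at least `min_cpm` counts per million in at least
    `min_samples` samples.

    counts: dict mapping gene id -> list of raw read counts, one per sample
            (assumed layout; all lists must have the same length).
    Returns a dict with the retained genes and their raw counts.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    # Library size: total number of reads counted in each sample.
    lib_sizes = [sum(row[s] for row in counts.values()) for s in range(n_samples)]
    kept = {}
    for gene, row in counts.items():
        cpm = [1e6 * c / lib if lib else 0.0 for c, lib in zip(row, lib_sizes)]
        if sum(v >= min_cpm for v in cpm) >= min_samples:
            kept[gene] = row
    return kept


if __name__ == "__main__":
    toy = {
        "A": [999990, 999990, 999990, 1000000],
        "C": [1, 1, 1, 0],   # 1 cpm in three samples: filtered out
        "D": [9, 9, 9, 0],   # 9 cpm in three samples: retained
    }
    print(sorted(cpm_filter(toy)))
```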
We used grape as the reference. Using the 19 grape chromosomes as the reference, we tested whether expression differed between the two sets of poplar genomes by using a paired t-test.
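The paired test can be written out in a few lines. The sketch below assumes that each pair consists of the expression values of the two poplar duplicates orthologous to the same grape gene; the function name `paired_t` and the toy values are hypothetical. It returns only the t statistic and the degrees of freedom; a library routine such as `scipy.stats.ttest_rel` would also return the p-value.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic for two equally long lists of expression values.

    x[i] and y[i] are assumed to be the two duplicated copies of the same
    ancestral gene. Returns (t, degrees_of_freedom).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs are required")
    diffs = [a - b for a, b in zip(x, y)]
    mean_d = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        # Degenerate case: every pair differs by the same amount.
        t = 0.0 if mean_d == 0 else math.copysign(math.inf, mean_d)
    else:
        t = mean_d / (sd / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    copy_1 = [5.0, 7.0, 9.0, 6.0]
    copy_2 = [4.0, 5.0, 8.0, 6.0]
    print(paired_t(copy_1, copy_2))
```

A t statistic close to zero means that neither set of duplicates is consistently more highly expressed than the other, which is the pattern described here as balanced expression.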

GO and KEGG analysis
For the gene function analysis, we retrieved GO and KEGG annotations for the poplar genes [22,23]. Because nearly every gene has a single KEGG annotation, we compared KEGG annotations directly between the tetraploidy-derived duplicated genes in poplar, and we performed GO functional enrichment analysis on the GO annotations (default parameters).
We then selected the 500 genes with the greatest expression differences according to the t-test and performed KEGG enrichment analysis and KEGG pathway annotation on them.
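The statistical test behind the KEGG enrichment is not specified here. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below as an assumption rather than as the procedure actually used. The function names, the annotation layout (gene identifier mapped to a set of pathway identifiers), and the toy data are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def kegg_enrichment(selected, annotations):
    """Over-representation p-value per pathway for a selected gene set.

    selected:    iterable of gene ids (e.g. the most differentially
                 expressed genes).
    annotations: dict gene id -> set of pathway ids; its keys are taken
                 as the background gene set (an assumption).
    """
    background = set(annotations)
    chosen = set(selected) & background
    N, n = len(background), len(chosen)
    pathways = set().union(*annotations.values())
    results = {}
    for pathway in pathways:
        members = {g for g, pws in annotations.items() if pathway in pws}
        k = len(members & chosen)
        if k:
            results[pathway] = hypergeom_sf(k, N, len(members), n)
    return results


if __name__ == "__main__":
    # Ten background genes; three of them belong to the toy pathway "P1".
    ann = {f"g{i}": {"P1"} if i <= 3 else {"P2"} for i in range(1, 11)}
    print(kegg_enrichment(["g1", "g2", "g3"], ann))
```

In the toy example all three selected genes fall in pathway "P1", so its p-value is the probability of drawing exactly those three genes from ten, which is 1/120.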

Gene family analysis
For the gene family analysis, we downloaded all the genes of the CDPK family in Arabidopsis thaliana.

Phylogenetic tree of poplar and grape CDPK genes. CDPK protein sequences were aligned with ClustalW, and the phylogenetic tree was constructed with MEGA by the maximum likelihood method. Bootstrap values are based on 1000 replicates. Purple circles, pink triangles, and green triangles represent grape genes and the two sets of poplar genes, respectively. The numbers on the branches are bootstrap support values.