UXS family members in cotton and other plants
Six Arabidopsis UXS protein sequences were used as BLAST queries to identify cotton UXS proteins. In total, 112 UXS genes were identified across the plants examined (Table S1). A total of 10, 10, 18 and 20 UXS genes were found in Gr, Ga, Gh, and Gb, respectively. This indicates a 1–1 orthologous correspondence between the two diploid cottons and between the two tetraploids, and a near 1–2 correspondence between the diploids and the tetraploids, consistent with the likely loss of some copies after tetraploidization. Using the same approach, a total of 54 UXS homologs were identified in 12 other plants (Table S1). Except in Vv08G0862 and GhA05G097200, the core catalytic motifs of UXS, GXXGXXG and YXXXK, were conserved.
Seven primitive plant species have fewer UXS genes than higher plants. For example, there is only one UXS gene in each of Ch. reinhardtii, C. subellipsoidea, V. carteri and M. pusilla (Table S1).
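The check for the two conserved signatures mentioned above can be illustrated with a short Python sketch. It is only an illustration of how a GXXGXXG / YXXXK scan might be done with regular expressions; the two toy sequences are hypothetical and are not taken from Table S1, and the function names are assumptions introduced here.

```python
import re

# X in the signatures stands for any residue, so it becomes "." in a regex.
# A lookahead is used so that overlapping occurrences are also reported.
GLY_RICH = re.compile(r"(?=G..G..G)")   # GXXGXXG
CATALYTIC = re.compile(r"(?=Y...K)")    # YXXXK


def find_motifs(protein):
    """Return the 0-based start positions of each signature in a protein."""
    seq = protein.upper()
    return {
        "GXXGXXG": [m.start() for m in GLY_RICH.finditer(seq)],
        "YXXXK": [m.start() for m in CATALYTIC.finditer(seq)],
    }


def has_core_domain(protein):
    """True only if both signatures occur at least once."""
    hits = find_motifs(protein)
    return bool(hits["GXXGXXG"]) and bool(hits["YXXXK"])


# Hypothetical toy sequences (not real UXS proteins).
toy_conserved = "MKRILVTGGAGFIGSHLVDRLMEAGHEVTVLDNYSASKAAGE"
toy_divergent = "MKRILVTAGAGFIASHLVDRLMEAGHEVTVLDNFSASRAAGE"

print(find_motifs(toy_conserved))
print(has_core_domain(toy_divergent))
```

A presence/absence scan of this kind only flags candidate exceptions; it does not replace inspection of the alignment.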
Phylogeny, gene structure and conserved motifs
A total of 112 UXSs were split into 8 groups representing 4 subfamilies (I, II, III and IV) by the phylogenetic tree (Fig. 1). Groups I, II and III corresponded to those previously identified in A. thaliana [11, 27]. Seven UXS genes from 5 different green alga species were included in group IV, indicating that they might be orthologs that originated from a single ancestral gene. Each of the dicotyledon, monocot, moss, and lycophyte species contributed at least one UXS gene to group II and group III, and each of these two clades was further divided into three subgroups. Group II contained subgroups IIa, IIb and IIc, and group III contained subgroups IIIa, IIIb and IIIc. Group I contained members only from dicotyledon species and rice, and comprised 2 subgroups (Ia and Ib).
UXS members from phylogenetically closer species clustered together within the four groups. For example, group Ia contained members from dicots, whereas group Ib contained monocot-specific UXS genes. Multiple homologs within subtrees indicated duplication in the cotton and durian genomes. Most UXS members clustered in a way that reflects the evolutionary relationships of the species. However, some UXS members had subtree topologies that were inconsistent with the species relationships. The cotton genes were often split into subgroups by cacao and durian homologs, which should be their outgroups. This suggests that UXS genes expanded after species formation, possibly in relation to polyploidization, and that the expansion might have involved unbalanced evolutionary changes among duplicated genes. These findings are discussed further below.
Notably, Gossypium UXS genes have longer branches on the tree, which suggests that they have evolved faster than their homologs in the other plants. This may reflect adaptive evolution of UXS genes in cotton.
The above classification is supported by the exon-intron structures and conserved motifs of the genes. To investigate the possible structural evolution of UXSs, we characterized the exon-intron organization of the UXS genes (Fig. 2). In general, the exon-intron structure is highly conserved within a group, whereas the number of introns (0–13) and the intron length (22–1970 bp) vary widely across the UXS family. All genes in group I have a relatively small number (1–8) of exons. Group II genes have 4 to 7 introns. Group III genes have more (12–14) exons. Ol09G33309, Ol14G43376, Ol21G29801, Mp07G109504 and Cs03G64859 in group IV have no introns. As shown in Fig. 2, the UXS genes clustered in the same subfamily share similar exon-intron patterns. Vitis vinifera, as an older species, had distinctly longer introns.
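As a minimal sketch of how intron number and length follow from exon coordinates, the Python function below derives introns as the gaps between consecutive exons. The coordinate convention (1-based, inclusive) and the example gene models are assumptions made for illustration; they are not taken from Fig. 2.

```python
def introns_from_exons(exons):
    """Derive introns from exon coordinates of a single gene.

    exons: list of (start, end) tuples, 1-based and inclusive (assumed).
    Returns a list of (intron_start, intron_end, length) tuples.
    """
    ordered = sorted(exons)
    introns = []
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 <= end1:
            raise ValueError("exons overlap; check the gene model")
        length = start2 - end1 - 1
        if length > 0:  # directly adjacent exons leave no intron between them
            introns.append((end1 + 1, start2 - 1, length))
    return introns


# Hypothetical three-exon gene model: two introns of 22 bp and 1970 bp.
three_exon_gene = [(1, 120), (143, 300), (2271, 2400)]
# Hypothetical single-exon gene model: no introns.
single_exon_gene = [(1, 900)]

print(introns_from_exons(three_exon_gene))
print(len(introns_from_exons(single_exon_gene)))
```

With this convention a gene with n non-adjacent exons has n - 1 introns, which is why exon counts and intron counts for the same group differ by one.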
To better characterize the UXS gene family, conserved motifs were identified in the UXS proteins (Table S2) using MEME. In total, 20 distinct motifs were identified. The typical UXS motifs GXXGXXG and YXXXK were located in motif 6 and motif 1, respectively. Although the core catalytic domain of UXS is conserved, except in Vv08G0862 and GhA05G097200, the variable regions are mainly concentrated at the N termini. Overall, members of the same subgroup share similar motif compositions, indicating a high degree of functional conservation. Groups I and III have more motifs than groups II and IV. Groups I and III have the distinctive motif 20 and motif 15 at the N terminus. In most group III genes, motifs 18 and 6 replace motifs 10 and 12 of group I. In group I, several cotton genes have a specific motif 19. Compared with genes in groups I and III, group II and IV genes lack the N-motif. In group II, motif 15 is at the N terminus and motif 11 is the group-specific motif. Motif 11 partially exemplifies the distribution of the N-terminal domains. All these specific motifs may have contributed to the functional divergence of UXS genes.
In addition, analysis with the PSORT and TargetP 1.1 programs predicted that UXS proteins are located in the cytoplasm, chloroplast, endoplasmic reticulum (E.R.), and mitochondrion (Table S1). Genes with similar predicted subcellular localization fall in the same branch; for example, At02G28760, At03G46440, At05G59290, Pt01G237200, GhA05G097200, GbA05G009370, Dz06G0335 and Tc10G032030 are all predicted to be located in the chloroplast. This suggests that UXS gene function is related to subcellular location [9, 27].
Chromosomal synteny and duplication in Gossypium
The chromosomal location of each UXS gene was established in cotton. As shown in Fig. 3, UXS genes are distributed on 9 chromosomes in Ga, 9 in Gr, 18 in Gh, and 18 in Gb. Most chromosomes carry a single UXS gene, whereas chromosomes Gr09, Ga04, GbA05, and GbD04 each carry two.
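The chromosome distribution can be cross-checked against the gene counts reported earlier with a small tally. This is a sketch that assumes every UXS-bearing chromosome other than the four named ones carries exactly one gene; the variable names are introduced here for illustration.

```python
# Number of chromosomes carrying UXS genes, as reported for each species.
chromosomes_with_uxs = {"Ga": 9, "Gr": 9, "Gh": 18, "Gb": 18}

# Chromosomes reported to carry two UXS genes.
two_gene_chromosomes = {
    "Ga": ["Ga04"],
    "Gr": ["Gr09"],
    "Gh": [],
    "Gb": ["GbA05", "GbD04"],
}

# Homologs reported for the 12 other plants.
other_plant_homologs = 54


def implied_gene_count(species):
    """One gene per chromosome, plus one extra per two-gene chromosome.

    Assumes all remaining chromosomes carry exactly one UXS gene.
    """
    return chromosomes_with_uxs[species] + len(two_gene_chromosomes[species])


implied = {sp: implied_gene_count(sp) for sp in chromosomes_with_uxs}
total = sum(implied.values()) + other_plant_homologs

print(implied)
print(total)
```

Under that assumption the implied counts match the per-species totals of 10, 10, 18 and 20 genes and the overall total of 112.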
Gene duplications provide important information for analyzing gene evolution. UXS gene duplication was analyzed separately in the A, D and AD genomes. A total of 40 pairs of syntenic paralogs were identified among the 4 cotton species (Fig. 3, Table S3). UXS genes form 7, 12, 6 and 16 synteny-supported paralogous pairs in Ga, Gr, Gh, and Gb, respectively. By checking gene collinearity, gene structure and phylogeny, we found that syntenic paralogs had similar gene structures or motifs and fell in the same branch of the phylogenetic tree, for example Ga11G2843, Ga08G0887, Ga11G2843 and Ga03G0563. This indicates that UXS genes expanded through the ancestral decaploidy and more ancient genome duplications [28, 29].
Ks was estimated for the duplicated UXS genes (Table S3). Two paralogous pairs between the allotetraploid cottons (Gh and Gb) had Ks < 0.03, consistent with the time of divergence of Gb and Gh [24]. Eighteen paralogous pairs had Ks of 0.40–0.80, associated with the decaploidy that occurred approximately 16.6 (13.3–20.0) million years ago in Gossypium. In addition, around one fourth of the pairs had Ks of 1.3–2.0, corresponding to the paleo-hexaploidization (ancient hexaploidization) event shared among the dicots. Judging from collinearity, many UXS duplicates derive from whole-genome duplication (Table S3). Therefore, the expansion of UXS genes in cotton might have resulted from large-scale duplication events during evolution.
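The assignment of duplicated pairs to events by Ks window can be sketched as follows. The three windows are the ones stated above; the label strings, the "unassigned" category, and the example Ks values are assumptions for illustration and are not values from Table S3.

```python
from collections import Counter


def classify_ks(ks):
    """Assign a Ks value to one of the windows described in the text."""
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    if ks < 0.03:
        return "Gh-Gb divergence"
    if 0.40 <= ks <= 0.80:
        return "cotton decaploidy"
    if 1.3 <= ks <= 2.0:
        return "paleo-hexaploidization"
    return "unassigned"  # falls between or beyond the stated windows


# Hypothetical Ks values for a handful of paralogous pairs.
example_ks = [0.01, 0.02, 0.45, 0.62, 0.78, 1.0, 1.35, 1.9]
tally = Counter(classify_ks(k) for k in example_ks)

print(tally)
```

Pairs that fall outside every window are kept in a separate category rather than forced into the nearest event.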
Conserved microcollinearity indicated that many UXS genes may have been lost in cotton after decaploidization. For example, although G. raimondii was affected by a decaploidy event, it has far fewer than five times the number of UXS homologs found in grape (Table S4). Based on gene collinearity analysis, each of four cacao UXS genes had one to three orthologous copies in the cotton A or D subgenomes and 2 to 6 copies in the tetraploid cottons (Table S4), suggesting extensive gene loss after polyploidization.
Selective pressure analysis
The Ka/Ks ratio can be used to assess the selective pressure acting on coding sequences [30]. Ka/Ks was estimated for the cotton UXS paralogous pairs. The Ka/Ks ratios of the UXS pairs were all < 1 (Table S3), suggesting that the pairs were mainly subject to negative (purifying) selection.
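The conventional reading of a pairwise Ka/Ks ratio can be written as a short Python sketch: a ratio below 1 indicates negative (purifying) selection, a ratio of 1 neutral evolution, and a ratio above 1 positive selection. The function names and the example values are assumptions for illustration, not values from Table S3.

```python
import math


def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ka < 0 or ks < 0:
        raise ValueError("Ka and Ks cannot be negative")
    if ks == 0:
        return None
    return ka / ks


def selection_mode(ka, ks):
    """Interpret a pairwise Ka/Ks ratio in the conventional way."""
    omega = ka_ks_ratio(ka, ks)
    if omega is None:
        return "undefined"
    if math.isclose(omega, 1.0):
        return "neutral"
    return "purifying" if omega < 1 else "positive"


# Hypothetical pair: Ka = 0.05, Ks = 0.50 gives a ratio of 0.1.
print(selection_mode(0.05, 0.50))
```

A pairwise ratio averages over all sites and the whole history of the pair, so a value below 1 does not exclude positive selection on individual sites or lineages.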
Judging from the collinearity of homeologous UXS genes (Table S4), the UXS gene tree was often not reconstructed as expected. Paralogs produced by the cotton-specific decaploidy are expected to group together on the tree, with their Dz and Tc orthologs forming outgroups. However, we found five branches in which Tc and Dz orthologs were grouped together in a way that did not agree with the species relationships (Fig. 1). These aberrant groupings account for 100% of the homologous subgroups of Tc, Dz, and Gossypium genes. Moreover, in at least four cases the Tc and Dz orthologs fell between their cotton orthologs, splitting the cotton orthologs into two subgroups. For example, each of Tc02G030560, Tc06G005310, Tc10G032030, and Tc11G003610 was unexpectedly grouped with a subset of its cotton orthologs, with the other cotton orthologs forming the outgroup. These cases account for 80% of the 5 homologous subgroups of Tc, Dz, and Gossypium genes. This intrusion of Tc and Dz genes into their cotton orthologous clusters on the phylogenetic tree can be explained by accelerated accumulation of substitutions in a subset of the cotton paralogs.
Inferring the effect of natural selection on evolution requires a correct phylogenetic tree. Here, based on gene collinearity, we constrained the Tc and Dz UXS genes in each subgroup to cluster, as outgroups, with their corresponding collinear cotton orthologs, and thereby built trees with a corrected topology. Based on these corrected trees of UXS genes (Fig. 4), we tested for positive selection along each lineage. We found that different subgroups of cotton UXS genes have been under divergent evolutionary pressure. The subgroup displayed in Fig. 4a, with a common Tc ortholog Tc02G030580, has two branches under positive selection, one of which is a decaploidy-derived lineage (Ga08G0887) that was likely subjected to positive selection after the decaploidization event. In the subgroup displayed in Fig. 4b, one decaploidy-derived lineage and one D-subgenome gene were likely subjected to positive selection. In the subgroup displayed in Fig. 4c, no selection was detected. In the subgroup displayed in Fig. 4d, with a Tc ortholog Tc10G032030, three A-subgenome genes (GhA05G097200, GbA05G009370 and GhA03G040100) were likely under positive selection after the formation of the tetraploid cottons. In the subgroup displayed in Fig. 4e, GbA05G035510 and Ga04G1413 were likely under positive selection after the origin of the A genome.
Expression of UXS among various tissues and development
To better understand the tissue-specific expression profiles of cotton UXS genes, FPKM (fragments per kilobase of transcript per million mapped reads) values were used to assess their expression levels across different organs and developmental stages (Table S5). As shown in Fig. 5, of the 18 GhUXS genes, GhD11G119700, GhD10G225110, GhA11G114600 and GhA10G216100 were abundant in most tissues, suggesting that they might play general roles in plant growth and development. GhD11G208200 and GhD10G225110 were most highly expressed in petal. GhA11G114600 was preferentially expressed in fiber at 10 DPA. GhA10G216100 was most highly expressed in root. In contrast, expression of GhA02G050000 and GhD02G055100 was relatively low in all tissues. During fiber development, all transcripts were preferentially expressed at 10 or 20 DPA, with GhA11G114600 the most abundant.
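For reference, the standard FPKM normalization scales the fragment count of a transcript by its length in kilobases and by the library size in millions of mapped fragments. The sketch below implements that definition; the function name and the example numbers are assumptions for illustration and are not values from Table S5.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2000 bp transcript
# in a library of 20 million mapped fragments.
example_value = fpkm(500, 2000, 20_000_000)
print(example_value)
```

Because the value is normalized for both transcript length and sequencing depth, it allows rough comparison between genes and between libraries.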
GhA11G114600 was highly expressed in fiber, suggesting that UXSs play a role in cotton growth. Further analysis indicated that all of these highly expressed genes belonged to clade I of the phylogenetic tree. Moreover, these genes were mainly predicted to be located in the cytoplasm, with the exception of GhD10G225110.
We carried out quantitative real-time RT-PCR analysis of the GhUXS genes during fiber development (Table S6, Fig. 6). The data show that all 18 genes are expressed during fiber development and have distinct but partially overlapping expression profiles. The expression of most genes increased significantly from 10 DPA to 25 DPA, with peak values at about 20 DPA or 25 DPA. This implies that UXS genes are highly expressed during the overlapping stage of fiber primary and secondary cell wall synthesis. In contrast to the RNA-seq data, some genes were most highly expressed at 20 DPA. For example, GhD11G119700 was highly expressed at 20 DPA by qRT-PCR, whereas its RNA-seq level at that stage was lower. This inconsistency was probably a consequence of genotype-dependent expression. In addition, GhA11G114600, GhD11G119700, GhD05G097300 and GhD03G127900, which are predicted to be located in the cytoplasm, had relatively high expression. Together, these results suggest that the subcellular location of UXS proteins may affect their function.
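The text above does not state how relative expression was calculated from the qRT-PCR data. One widely used option is the 2^-ΔΔCt method, sketched below purely as an assumption; the reference gene, the choice of 10 DPA as calibrator, and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCt method (assumed, not confirmed by the text).

    ct_target, ct_reference: Ct values in the sample of interest.
    ct_target_cal, ct_reference_cal: Ct values in the calibrator sample.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: a target gene at 20 DPA versus 10 DPA (calibrator),
# each normalized to a hypothetical reference gene.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)
```

The method assumes roughly equal and near-100% amplification efficiency for the target and reference genes; when that does not hold, an efficiency-corrected calculation is more appropriate.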