Identification and comparative analysis of VQs in plants
To identify VQ family in G. hirsutum, G. barbadense, G. raimondii and G. arboretum, the AtVQ proteins were used as query sequences to search against the protein databases of the four Gossypium species, and the VQ-domain Pfam (PF05678) was also applied. In total, 89 GhVQs, 89 GbVQs, 45 GrVQs, and 45 GaVQs were identified and named in G. hirsutum, G. barbadense, G. raimondii and G. arboretum, respectively (Additional File 1, Supplemental Table S1). In addition, the physiological and biochemical properties of 268 VQs in Gossypium species were determined, including CDS length, GC count, isoelectric point (pI) and molecular weight (MW) (Additional File 1, Supplemental Table S1). The CDS lengths of these Gossypium VQs ranged from 279 bp (GhVQ89 and GbVQ89) to 1443 bp (GbVQ15), the average GC content of the transcripts was 46.01, their exon numbers varied from 1 to 9, and only a small percentage of VQs contained introns (3.37% GhVQs, 3.37% GbVQs, 6.67% GaVQs, and 31.11% GrVQs). The pI values varied from 4.159 (GbVQ33 and GbVQ78) to 11.496 (GhVQ07) and the MW values ranged from 10.346 kDa (GbVQ89) to 52.058 kDa (GbVQ15) (Additional File 2, Supplemental Fig. S1).
To perform comparative genomic analyses, we searched another 11 plant species for VQ proteins. The evolutionary relationships of the species and the number of their VQ genes are shown in Fig. 1. The data revealed that the numbers of VQs in A. trichopoda, P. dactylifera, V. vinifera, P. trichocarpa, and T. cacao were less than those in the four Gossypium species (Additional File 3, Supplemental Table S2). The comparative structure analysis of VQs showed that almost all the VQs had a few introns and encoded relatively small proteins, and only 3 GhVQs, 3 GbVQs, 3 GaVQs and 14 GrVQs had more than one intron. We speculated that the WGD events that occurred during the evolution of angiosperms increased the numbers of the Gossypium VQs, and these events helped the VQs to gain new functions through neofunctionalization. However, the evolutionary forces that shaped the current intron/exon gene structures remain unknown.
Phylogenetic analysis of VQs
To explore the relationships among VQs in Gossypium, we conducted a phylogenetic analysis of the VQs from the 15 plant species (Fig. 2), and a phylogenetic tree between Gossypium spp. and Arabidopsis was also constructed (Additional File 4, Supplemental Fig. S2). The tree contained 656 VQs. These proteins were divided into ten clades based on the nomenclature of the Arabidopsis VQs. The largest group (Group Ⅰ) contained 20 GhVQs, 20 GbVQs, 10 GaVQs, and 10 GrVQs. Group Ⅸ was the smallest group, including 4 GhVQs, 5 GbVQs, 3 GaVQs and 3 GrVQs. Previous research has verified that VQ proteins contain a conserved motif composed of F**hVQ*hTG [4, 5]. In our study, among the 656 VQs, 212 proteins (in Group Ⅰ and Group Ⅱ) had the amino acid “M” next to “VQ” (simple M-VQ model); 159 proteins (in Group Ⅲ, Group Ⅳ, and Group Ⅶ) had the amino acid “V” next to “VQ” (simple V–VQ model); and 285 proteins (in Group Ⅴ, Group Ⅵ, Group Ⅷ, Group Ⅸ and Group Ⅹ) had the amino acid “L” next to “VQ” (simple L-VQ model) (Additional File 5, Supplemental Fig. S3). The VQs with rarer amino acids of the Gossypium species were also scattered in Group Ⅰ to Group Ⅹ, and the clusters of VQs were similar to those in angiosperms [3].
Cis-regulation elements and structural composition of the VQs
The cis-regulation elements in the promoters (from 2000 bp to -1 bp) of the four Gossypium VQs were analyzed using the PlantCARE tool. 715, 701, 386 and 399 cis-regulation elements from the GhVQs, GbVQs, GaVQs and GrVQs were identified, respectively. Among these, seven kinds of hormone-responsive cis-regulation elements, ABRE (ABA-responsive element), P-box, TGA-box, TGA-element, TCA-element, CGTCA-motif, and GARE-motif were associated with ABA, ethylene, salicylic acid (SA), methyl jasmonate (MeJA), auxin (IAA), and gibberellin (GA), respectively; and six regulatory elements, MBS, TC-rich repeats, LTR, DRE-motif, W-box, and CCAAT-box were related to drought, cold stress, and pathogen defense (Fig. 3). Moreover, the promoters of 66 GhVQs, 69 GbVQs, 36 GrVQs and 34 GaVQs possessed WRKY-binding sites (W-box) (Fig. 3). The diversity of the cis-regulation elements in the promoters of VQs indicated that VQs might participate in regulating the Gossypium response to endogenous hormones and diverse environmental stimuli.
Motif compositions and exon-intron structures of the VQs were shown in Fig. 4. Combining the phylogenetic data groups of the four Gossypium VQs, we found that there were more motif types in Group Ⅳ, including Motif 1, Motif 4, Motif 7, Motif 9, and Motif 10; and followed by Group Ⅱ, containing Motif 1, Motif 2, Motif 3, Motif 6, and Motif 8. Not surprisingly, Motif 1 existed in almost all of the VQs, suggesting that it was the most conservative motif. The differences in motif composition among the four Gossypium VQs suggested that they might perform different functions in diverse Gossypium species. Most of the VQs had no intron, including 96.63% (86/89) of GhVQs, 96.63% (86/89) of GbVQs, 68.89% (31/45) of GrVQs, and 93.33% (42/45) of GaVQs. The remaining VQs, which were widely distributed in Group Ⅶ and Group Ⅹ, contained one to eight introns. In general, VQs in the same clades would share similar motif elements and structural compositions, indicating that the VQ members in the same subgroup could have similar functions.
Chromosomal distribution, synteny and duplication analysis
In this study, VQs were detected located in most chromosomes with a few exceptions. For example, Gh_A09 and Gh_D02 were in G. hirsutum, Gb_A09, Gb_A13 and Gb_D02 were in G. barbadense, Ga_Chr03 and Ga_Chr09 in G. arboretum, and Gr_Chr05 were G. raimondii (Fig. 5). For the two allotetraploid species of Gossypium, Gh_D05 (eight genes/~9%), Gb_A05 (eight genes / ~9%), and Gb_D05 (eight genes / ~9%) contained more VQ genes than other chromosomes, while Gh_A02, Gh_A03, Gh_A13, Gh_D08, Gh_D13, Gb_A02, Gb_A03, Gb_D08, Gb_D09, and Gb_D13 only contained one gene. For the two diploid species, Ga_Chr05 (eight genes / ~17.8%), Gr_Chr02 (seven genes / ~17.8%), Gr_Chr07 (seven genes / ~17.8%) and Gr_Chr09 (seven genes / ~17.8%) contained more VQs, and Ga_Chr08, Ga_Chr13, Gr_Chr04, Gr_Chr12, and Gr_Chr13 only contained one gene. Most VQs in the four Gossypium species were distributed at both ends of the chromosomes, which corresponded to the position of the telomere.
The orthologous VQs were first identified between G. hirsutum and G. arboreum with G. raimondii. A total of 83 GhVQs were orthologous genes in the two-diploid Gossypium, of which 40 gene pairs showed A genome origin, while 43 gene pairs showed D genome origin (Additional Files 6 and 7, Supplemental Fig. S4 and S7, Table S3). Subsequently, orthologous gene identification was also conducted between G. barbadense and G. arboreum with G. raimondii, and there were 84 orthologous GbVQs. Of which, 40 gene pairs showed A genome origin, while 44 gene pairs showed D genome origin (Additional Files 7 and 8, Supplemental Fig. S5 and S3). Orthologous genes between G. hirsutum and G. barbadense were also identified. It was found that 39 gene pairs in Gh_At and Gb_At subgenomes and 42 gene pairs in Gh_Dt and Gb_Dt subgenomes (Fig. 6 and Additional File 7, Supplemental Table S3). In addition, GhVQ27, GhVQ61, GhVQ68, GbVQ28, GbVQ60 and GbVQ67 had no orthologous genes in the diploid Gossypium species.
As previously described, duplication contributed to the expansion of genes in the polyploid events in plants [36]. The tetraploid species Gossypium of have undergone a genome duplication since their diverge from the two diploid species of Gossypium. In our study, we have identified the VQ duplication event, and the WGD/segmental event likely contributed to the expression regulation of VQs in Gossypium. The percentages of VQs derived from WGD were 60.47% in the At-subgenome of G. hirsutum, 63.04% in the Dt-subgenome of G. hirsutum, 69.04% in the At-subgenome of G. barbadense, 59.57% in the Dt-subgenome of G. barbadense, 62.22% in G. raimondii, and 57.78% in G. arboretum (Additional File 9, Supplemental Table S4). Gene duplication events after the divergence of Gossypium species resulted in a high number of paralogous genes in both allotetraploid Gossypium species.
Prediction of miRNA target sites
miRNA had been predicted to target the VQ genes in Arabidopsis [37, 38] and tea [39]. To determine the miRNA-mediated post-transcriptional regulation of VQs in two allotetraploid species of Gossypium, we predicted the target sites of miRNAs in the coding (CDS) regions of the GhVQs and GbVQs. In G. hirsutum, 46 sites of 34 GhVQs were detected that could be targeted by 22 miRNAs, while 46 sites of 32 GbVQs could be targeted by 21 miRNAs (Fig. 6 and Additional File 10, Supplemental Table S5). Of these, six VQ genes (GhVQ02, GhVQ40, GhVQ86, GbVQ02, GbVQ40 and GbVQ86) were predicted to be targeted by Ghr-miR172 in the CDS regions; and six VQ genes (GhVQ39, GhVQ52, GhVQ85, GbVQ39, GbVQ51 and GbVQ85) were targeted by Ghr-miR156 (Ghr-miR156a, Ghr-miR156b, Ghr-miR156c and Ghr-miR156d) at ten prediction sites. Ghr-miR172 and Ghr-miR156 were reported to be involved in some biological processes, including the responses to developmental cues and abiotic stress in plants [40-42]. However, it requires further experiments to verify the regulation mechanism and functions of those predicted miRNAs and their targets in Gossypium.
Expression pattern analysis and function verification
Expression profiles of the VQs in the two allotetraploid kinds of Gossypium were analyzed with available transcriptome data (Additional Files 11 and 12, Supplemental Fig. S6 and S7 and Supplemental Table S6). In this study, the GhVQs and GbVQs with average FPKM values > 5 and being present in at least two samples were identified as potentially expressed transcripts. Fifty-seven GhVQs and 39 GbVQs were selected, and their expression profiles tested in ten tissues, including the anther, bract, filament, leaf, petal, pistil, root, sepal, stem and torus (Fig. 7). For 57 GhVQs, 22 genes were highly expressed in roots and GhVQ82 had the highest expression level, and 14 genes were highly expressed in leaves. There were only a few genes expressed in the anther, bract, filament, petal, pistil, sepal, stem and torus (Fig. 7A). For 39 GbVQs, 17 GbVQs were strongly highly expressed in roots, and 12 genes were strongly expressed in leaves (Fig. 7B). The different expression profiles of VQ genes suggest that they have different functions in distinct tissues and developmental stages.
VQs were widely identified in the abiotic stress responses in angiosperms [3, 32]. In this study, the expression patterns of VQs in the allotetraploid Gossypium types under salt, drought, cold and heat stresses were analyzed using the published data. In total, 43 GhVQs and 37 GbVQs had different expression profiles under the four abiotic stress treatments (Fig. 8). Under salt stress, 29 GhVQs were significantly up-regulated at 12 h, and 21 GbVQs were up-regulated at 6 h (Fig. 8A and C). Upon PEG treatment, most of the GhVQs and GbVQs were highly expressed at 12 h (Fig. 8B and D). During cold stress, 33 GhVQs and 19 GbVQs were up-regulated at 24 h (Fig. 8E and G). Most of the GhVQs and GbVQs under the hot treatment were highly expressed at 1 h (Fig. 8F and H). To validate of the expression results of GhVQs in response to salt and drought stresses, we conducted qRT-PCR analyses of 12 GhVQs after treatments with PEG and salt treatment. In the presence of PEG, GhVQ08, GhVQ18, GhVQ62, GhVQ64, GhVQ80, and GhVQ84 had high expression levels at 48 h, while these GhVQs except GhVQ18 and GhVQ84, were highly expressed during 24–48 h under salt treatment (Fig. 9). The qRT-PCR results were slightly different from the RNA-seq data, but these findings suggest that some GhVQs were involved in plant response to drought and salt stresses.
Co-expression and interaction networks of GhVQs
To understand the putative roles of VQs in plant adaptation to drought and salt stresses, we conducted a co-expression analysis. Ten GhVQs were found to co-express with another 227 functional genes (Fig. 10 and Additional File 13, Supplemental Table S7). Among these, six and seven VQs were identified in different modules of drought stress and salt stress, respectively, while GhVQ37, GhVQ59 and GhVQ83 were detected coexisting during the two stress treatments. Moreover, these 227 genes co-expressing with ten GhVQs, contained multiple TFs, including domain AP2, bHLH, F-box, GRAS, p450, PLB03212, WD40 and WRKY (Fig. 10 A–E and Additional File 13, Supplemental Table S7). The functional regulation networks of the GhVQs were constructed using the website of STRING11.0 with the module reference of Arabidopsis association, and the results revealed that the GhVQs participated in plant defense interaction networks, including WRKYs, MYB15, MPK4, AR781, CSN5B and SIGAs (Fig.10 F). Indeed, VQ proteins could interact with WRKYs and other TFs to defend against abiotic stresses in cotton.