Identification of the FBP family members
A total of 41 FBP genes from four Gossypium species, including 14 in G. hirsutum (GhFBP), 15 in G. barbadense (GbFBP), 6 in G. arboreum (GaFBP), and 7 in G. raimondii (GrFBP), were identified(Supplementary file 1). In addition, in order to elucidate the evolutionary and phylogenetic relationship of the FBP genes, we identified 73 FBP family genes in nine other species, including 4 in A. thaliana, 5 in T. cacao, 12 in P. trichocarpa, 12 in G. max, 11 in Z. mays, 5 in V. vinifera, 6 in S. moellendorffii, 11 in P. patens, and 7 in O. sativa (Supplementary file 1). The number of FBP genes in tetraploid genomes of G. hirsutumand and G. barbadense (AD genome) is almost twice as those in diploid genomes of G. raimondii (D genome) and in G. arboreum (A genome). These two tetraploid Gossypium genomes were arisen from a natural hybridization between two ancestors of diploid G. raimondii and G. arboreum [34, 36, 38].
Phylogenetic analysis of the FBP Gene Family
To elucidate the evolutionary relationship of the identified FBP proteins among the Gossypium and other species, the amino acid sequences of all the FBP proteins were aligned to identify their phylogenetic similarities with orthologous using the neighbor-joining model of MEGA 7 and the phylogenetic tree was thus constructed as shown in Figure 1a. According to their evolutionary relationships, 114 FBP proteins were divided into 2 groups, cyFBPs that include 40 members and cpFBPs that include 74 members [7, 14]. The result of phylogenetic analysis indicated that the FBPs had a closer evolutionary relationship among the four Gossypium species as compared with those among other species. The phylogenetic result also indicated that among all the other species, cocoa had the closest evolutionary relationship to cotton species [34, 36]. Further phylogenetic analysis of FBPs in the four cotton species indicated that the cyFBPs were assorted into 3 subgroups, while the cpFBPs 4 subgroups (Figure 1b). Each subgroup of GossypiumFBPs consisted of 6 members including one from A genome (G. arboreum) and one from D genome (G. raimondii), two from G. hirsutum, and two from G. barbadense. As both of G. hirsutum and G. barbadense are consisted of At and Dt subgenomes, each subgenome provided one member in a sub-group of FBP family. There is only one subgroup in cpFBPs that had 5 FBPs, in which there is no FBP from G. arboreum identified (Figure 1b).
Gene structure and protein domain of FBP family members
The length of amino acid (aa) sequence of FBP proteins ranged from 341 to 608, 341 to 412, 341 to 428, and 341 to 606 in G. arboreum, G. raimondii, G. hirsutum, and G. barbadense, respectively. The cyFBP group had 18 members (42.87%), which had uniform length of 341 aa sequences with only two exceptions of Gorai.005G080300.1 and GB_A02G1288.1. The cpFBP group had 23 members (57.13%), which had varied length of aa sequences (Figure 1, Supplementary file 2). And The PIs values of the four cotton FBPs ranged from 5.00 to 7.68.
Totally 10 motifs were identified in FBP family in the four Gossypium species with each FBP contains 7 to 9 motifs in general (Figure 2a, Figure S1). The significant difference between cyFBP and cpFBP is that motif 5 is identified exclusively in the former while motif 9 in the latter. Each phylogenetic subgroup had a similar composition and arrangement of motifs, which were highly consistent with the result of phylogenetic analysis. The results also showed some minor variance of motif composition and arrangement between subgroups (Figure 2a).
Gene structure analysis also showed a consistent result as phylogenetic and protein motif analysis (Figure 2b). The exon number of FBP genes ranged from 3 to 12. cyFBPs had 11 to 12 exons while cpFBPs only 3-5 exons. The gene structure of each subgroup is almost the same, which indicated conserved evolution patterns of FBP family members. The cyFBP genes structure could be divided into three types (Figure 2). Both subgroups cyFBP 1 and cyFBP 2 had 12 exons and 11 introns with varied distribution between them. Subgroup cyFBP 3 has 11 exons and 10 introns. Contrast to cyFBPs, cpFBPs had much fewer exons. The cpFBP genes could be assorted into four subgroups. Subgroups cpFBP 1 and cpFBP 2 had 4 exons and 3 introns with different distribution between them. Subgroup cpFBP 4 had 8 exons and 7 introns; while subgroup cpFBP 3 had varied number of exons and introns, the exon number of this subgroup ranged from 3 to 8. The result also indicated that only FBP genes of G. raimondii had UTR structures. It indicated that cyFBPs had more complicated gene structure than cpFBPs did.
Analysis of cis-acting elements in the promoter regions of homologous FBP genes
To further understand how the FBP gene function, the composition and distribution of cis-regulatory elements (CRE) were identified in 5' untranslated region of 2000 bp of each gene in the PlantCare website (Figure 3). The results indicated that composition and distribution of CRE varied significantly across the whole FBP gene family. It also could be seen that the CRE had a highly consistency with the result of gene structure and protein domain and phylogenetic analyses. Each subcategory of FBP genes had same or similar composition and distribution of CREs in their 5’ upstream regions (Figure 3).
Further analysis indicated that the 5’ up-stream regions of FBP genes contain almost all the following categories of CRE: constitutive, inducible and tissue-specific. The constitutive CRE included the typical basic components such as TATA-Box and CAAT-Box. The inducible CRE included photo-responsive elements, ATCC-motif, Box 4, I-Box, Sp1, TCCC-motif, GAG-motif; gibberellin response elements (GARE-motif), P-box; abscisic acid responsive element (ABRE); salicylic acid reaction element, TCA-element; anaerobic induction element, ARE; stress-responsive element, TC-rich repeats, and MBS. In addition, GARE-motif was exclusively identified in the promoter region of one subcategory genes of GH_A02G0701.1, GH_D02G0715.1, GB_A02G0693.1, GB_D02G0741.1, GH_A02G1268.1 and GB_A02G1288.1.
Distribution and collinearity analysis of FBP gene family Gossypium species
In the genome of G. arboreum, FBP genes were identified on chromosomes A02, A03, A04, A10, A11 and A12, while in the genome of G. raimondii, on chromosomes D02, D05, D07, D08, D11, and D12. In the tetraploid genomes of G. hirsutum and G. barbadense, FBP genes had similar distribution on chromosomes At02, At04, At10, At11, At12, Dt02, Dt03, Dt04, Dt10, Dt11 and Dt12. Homologous analysis indicated that the homologous gene identified on A03 of G. arboreum was identified on chromosome At02 of G. hirsutum and G. barbadense.
Tandem and fragmental DNA duplication provided major forces that drive the formation of gene families [39, 40] as well as the genome evolution. In the current study, the duplicating events of cotton FBP genes were analyzed. Although the results did not support any tandem repeat event in the cotton FBP gene family, the collinearity analysis showed that in the two diploid species, the FBP genes were perfectly chromosome-pair-wise homologous (Figure 4a). While in the two tetraploid species, each FBP gene in one species (hirsutum or barbadense) had two homologous genes in both At and Dt subgenomes in its counterpart species (barbadense or hirsutum) (Figure 4d). Collinearity analysis between diploid and tetraploid species indicated that, in G. hirsutum each gene had two homologous genes in the two diploid species (Figure 4b), while in G. barbadense, two FBP genes on GbAt02 did not have homologous genes in raimondii and one FBP gene on GbDt12 did not have homologous gene in arboreum (Figure 4c).
Analysis of selection pressure of FBP genes in four cotton species
Calculating non-synonymous (Ka) and synonymous (Ks) substitution rates is a useful method for assessing sequence variation of protein orthologous in different species or taxa with unknown evolutionary states. The value of Ka/Ks represents the ratio between Ka and Ks of two homologous protein-coding genes. Ka/Ks>1 indicated that the gene was positively selected, and Ka/Ks=1 indicated that the gene was neutrally selected, while Ka/Ks<1 indicated that the gene was purified selected [41]. The Ka/Ks values of homogous FBP genes between G. arboreum and G. raimondii ranged from 0.05 to 0.62, those between G. hirsutum and G. arboretum or G. raimondii ranged from 0 to 0.8; those between G. barbadense and G. arboreum or G. raimondii ranged from 0 to 0.6; those between At and Dt paralogous genes in G. hirsutum and G. barbadense ranged 0.07 to 0.76 and 0.02 to 0.52, respectively (supplementary file 3). These results indicated that the FBP genes in four Gossypium species were under a purifying selection pressure.
FBP gene expression in fiber development, in responses to biotic and abiotic stresses
To explore the potential function of FBP genes in the growth and development of cotton fibers, we downloaded the cotton fiber transcriptome data from NCBI SRA database and reanalyzed the expression profiling of FBP genes. The results of FBP gene expression analysis showed that homologous genes GH_A02G0701.1 and GH_D02G0715.1 from G. hirsutum, and GB_A02G0693.1 and GB_D02G0741.1 from G. Barbadense have higher FPKM values in developing fibers at the stages of 20 days post-anthesis (DPA) and 25 DPA (supplementary file 4). Homologous genes GH_A02G1268.1 and GB_A02G1288.1 have high expression FPKM values in the early stage of the fiber development (0 DPA and 1 DPA ovule) (Figure 6a, b). Expressions of GH_D02G0715.1 and GH_A02G0701.1 in the secondary cell wall synthesis stage of fiber development through qRT-PCR verifications were highly consistent with transcriptome analysis (Figure 6c, d).
In plant response to Verticillium wilt stress, the FPKM values of FBP gene family that were extracted from previous transcriptome data showed that the homologous genes GH_A04G1526.1 and GH_D04G1869.1 had much higher expression value at 24 and 48 hours after inoculation (HAI) of Verticillium dahliae, with their high peaks were reached at 24 HAI (Figure 7a, supplementary file 4). These results suggested a certain biological functions of FBP genes in plant responses to Verticillium wilt stresses.
The results of qRT-PCR showed that both GH_A04G1526.1 and GH_D04G1869.1 had different expression behaviors in root tissues between the susceptible and resistant cultivars at the different developmental stages of Verticillium dahliae in plant after inoculation. In VW tolerant cultivar Jimian 11(J11), both GH_A04G1526.1 and GH_D04G1869.1 had immediate response to the inoculation of Verticillium dahliae and their expression levels reached a maximum at 12 HAI. Then it dropped rapidly and kept fairly low expression levels (Figure 7b and 7c). In VW susceptible cultivar ZZM, GH_A04G1526.1 and GH_D04G1869.1 acted differently, with GH_A04G1526.1 slightly increased its expression after inoculation till 48 HAI and then its expression increased rapidly and reached a peak at 72 HAI (Figure 7b), while GH_D04G1869.1 kept low expression throughout the experiment procedure (Figure 7c). These different responses suggested that GH_A04G1526.1 might take part in resistant reaction while GH_D04G1869.1 in susceptible reaction to Verticillium wilt of cotton.
The responses of FBP genes to salt stress were also evaluated using RNA transcriptome data analysis [42] under salt stresses (Figure 8, supplementary file 4). The transcriptome analysis indicated that six members of FBP family GH_A10G2530.1, GH_D10G2661.1, GH_A11G3741.1, GH_D11G3768.1, GH_A02G1268.1, and GH_D03G0740.1 had significant higher responsive expression to salt stress treatments (Figure 8a). In salt susceptible cultivar CCRI12, the tested genes had similar expression tendency in responses to salt pressure. The expressions of all 6 genes were significantly inhibited within three hours after the salt stress was imposed. This inhibition continued and reached highest till 12 hours after the initiation of stress. After this time, with time proceeds, the plant might develop some sorts of “adaption” mechanisms, their expression recovered to a certain level. In the salt tolerant semi-wild species (MAR85), the inhibition of these gene expressions was to a much less extent. It could be seen from the results the expression level of these genes at 12h in salt resistant material was almost twice than those in the salt sensitive materials (Figure 8).