Identification of FBP family members
A total of 41 FBP genes from four Gossypium species, including 14 in G. hirsutum (GhFBP), 15 in G. barbadense (GbFBP), 6 in G. arboreum (GaFBP), and 7 in G. raimondii (GrFBP), were identified in this report (Supplementary file 1). The number of FBP genes in the tetraploid genomes of G. hirsutum and G. barbadense (AD genome) was almost double those in the diploid genomes of G. raimondii (D genome) and G. arboreum (A genome). These two tetraploid Gossypium genomes arose from a natural hybridization between two ancestors of diploid G. raimondii and G. arboreum [36, 38, 40].
In addition, in order to elucidate the evolutionary and phylogenetic relationship of these FBP genes, we identified 73 FBP family genes in nine other species, including 4 in Arabidopsis thaliana, 5 in Theobroma cacao, 12 in populus trichocarpa, 12 in Glycine max, 11 in Zea mays, 5 in Vitis vinifera, 6 in Selaginella moellendorffii, 11 in Physcomitrella patens, and 7 in Oryza sativa (Supplementary file 1).
Phylogenetic analysis of the FBP Gene Family
To elucidate the evolutionary relationship of the identified FBP proteins between Gossypium and other species, the amino acid sequences of all the FBP proteins were aligned to identify their phylogenetic similarities with orthologs using the neighbor-joining model from MEGA 7, and a phylogenetic tree was thus constructed as shown in Figure 1a. According to their evolutionary relationships, 114 FBP proteins were divided into 2 groups: cytosolic FBPs, (cyFBPs) which included 40 members; and chloroplast FBPs, (cpFBPs) which included 74 members [7, 14]. The result of phylogenetic analysis indicated that FBPs had a closer evolutionary relationship between the four Gossypium species as compared with other species. The phylogenetic results also indicated that between all the other species, cocoa had the closest evolutionary relationship to the examined cotton species [36, 38]. Further phylogenetic analysis of FBPs from the four cotton species indicated that the cyFBPs were assorted into three subgroups, while the sorted into cpFBPs four subgroups (Figure 1b). Each subgroup of Gossypium FBPs consisted of six members, including one from the A genome (G. arboreum) and one from the D genome (G. raimondii), two from G. hirsutum, and two from G. barbadense. As both of G. hirsutum and G. barbadense are comprised of At and Dt subgenomes, each subgenome provided one member in each sub-group of the FBP family. There is only one subgroup in cpFBPs that had 5 FBPs, but there was no FBP from G. arboreum identified in these analyses (Figure 1b).
Gene structure and protein domain of FBP family members
The length of amino acid (aa) sequences of FBP proteins ranged from 341 to 608, 341 to 412, 341 to 428, and 341 to 606 in G. arboreum, G. raimondii, G. hirsutum, and G. barbadense, respectively. The cyFBP group had 18 members (42.87%), which had a uniform length of 341 aa with only two exceptions, namely Gorai.005G080300.1 and GB_A02G1288.1. The cpFBP group had 23 members (57.13%), which had a varied length of aa sequences (Figure 1, Supplementary file 2). The PI values of the four cotton FBPs ranged from 5.00 to 7.68.
In total, 10 motifs were identified in the FBP family in the four Gossypium species, with each FBP containing 7 to 9 motifs in general (Figure 2a, Figure S1). The significant difference between cyFBPs and cpFBPs was that motif 5 was identified exclusively in cyFBPs, while motif 9 was exclusively present in cpFBPs. Each phylogenetic subgroup had a similar composition and arrangement of motifs, which was highly consistent with the results of phylogenetic analysis. The results also showed some minor variance in motif composition and arrangement between the subgroups (Figure 2a).
Gene structure analysis also showed consistent results to our phylogenetic and protein motif analyses (Figure 2b). The exon number of FBP genes ranged from 3 to 12. cyFBPs had 11 to 12 exons, while cpFBPs only had 3–5 exons. The gene structure of each subgroup was almost the same, which indicated conserved evolution patterns for FBP family members. The cyFBP gene structures could be further divided into three types (Figure 2). Both subgroups cyFBP 1 and cyFBP 2 had 12 exons and 11 introns, with a varied distribution between them. Subgroup cyFBP 3 had 11 exons and 10 introns. In contrast to cyFBPs, cpFBPs had much fewer exons. The cpFBP genes could be sorted into four subgroups. Subgroups cpFBP 1 and cpFBP 2 had 4 exons and 3 introns, with different distributions between them. Subgroup cpFBP 4 had 8 exons and 7 introns, while subgroup cpFBP 3 had a varied number of exons and introns, and the exon number of this subgroup ranged from 3 to 8. The results also indicated that only FBP genes from G. raimondii had UTR structures. This indicated that cyFBPs had more complicated gene structures than cpFBPs had.
To further understand how FBP genes function, the composition and distribution of cis-regulatory elements (CRE) were identified in the 5' untranslated regions 2000 bp upstream of each gene from the PlantCare website (Figure 3). The results indicated that the composition and distribution of CREs varied significantly across the whole FBP gene family. It also could be seen that the CREs had a high congruency with the results of gene structure, protein domain, and phylogenetic analyses. Each subcategory of FBP genes had identical or similar compositions and distributions of CREs in their 5’ upstream regions (Figure 3).
Further analysis indicated that the 5’ up-stream regions of FBP genes contained almost all of the following categories of CREs: constitutive, inducible and tissue-specific. The constitutive CREs include typical basic components such as TATA-Boxes and CAAT-Boxes. Inducible CREs included photo-responsive elements, ATCC-motifs, Box 4, I-Boxes, Sp1, TCCC-motifs, GAG-motifs, gibberellin response elements (GARE-motifs), P-Boxes, abscisic acid responsive elements (ABREs), salicylic acid reaction elements, TCA-elements, anaerobic induction elements (AREs), stress-responsive elements, TC-rich repeats, and MYB binding site (MBS). In addition, the GARE-motif was exclusively identified in the promoter region of one subcategory of genes including GH_A02G0701.1, GH_D02G0715.1, GB_A02G0693.1, GB_D02G0741.1, GH_A02G1268.1, and GB_A02G1288.1.
Distribution and collinearity analysis of the FBP gene family Gossypium species
In the genome of G. arboreum, FBP genes were identified on chromosomes A02, A03, A04, A10, A11, and A12, while in the genome of G. raimondii, FBP genes were identified on chromosomes D02, D05, D07, D08, D11, and D12. In the tetraploid genomes of G. hirsutum and G. barbadense, FBP genes had similar distribution on chromosomes At02, At04, At10, At11, At12, Dt02, Dt03, Dt04, Dt10, Dt11, and Dt12. Homologous analysis indicated that a homologous gene identified on A03 of G. arboreum was identified on chromosome At02 in G. hirsutum and G. barbadense.
Tandem and fragmental DNA duplication provides major forces that drive the formation of gene families [41, 42] as well as whole genome evolution. In the current study, the duplication events of cotton FBP genes were analyzed. Although the results did not support any tandem repeat events occurring during the evolution of the cotton FBP gene family, collinearity analysis showed that in these two diploid species the FBP genes were perfectly chromosome-pair-wise homologous (Figure 4a). Meanwhile, in the two tetraploid species, each FBP gene from one species (hirsutum or barbadense) had two homologous genes in both the At and Dt subgenomes in its counterpart species (barbadense or hirsutum) (Figure 4d). Collinearity analysis between diploid and tetraploid species indicated that in G. hirsutum each gene had two homologous genes in the two diploid species (Figure 4b), while in G. barbadense, two FBP genes on GbAt02 did not have homologous genes in raimondii and one FBP gene at GbDt12 did not have a homologous gene in arboreum (Figure 4c).
Analysis of selection pressure of FBP genes in four cotton species
Calculating non-synonymous (Ka) and synonymous (Ks) substitution rates is a useful method for assessing sequence variation of protein orthologous in different species or taxa with unknown evolutionary states . The value of Ka/Ks represents the ratio between Ka and Ks of two homologous protein-coding genes. Ka/Ks > 1 indicates that a gene has been positively selected, while a Ka/Ks = 1 indicates that a gene has been neutrally selected, and a Ka/Ks < 1 indicates that a gene has been selectively purified . The Ka/Ks values of homologous FBP genes between G. arboreum and G. raimondii ranged from 0.05 to 0.62, while those between G. hirsutum and G. arboretum or G. raimondii ranged from 0 to 0.8. Those between G. barbadense and G. arboreum or G. raimondii ranged from 0 to 0.6, and the values between At and Dt paralogous genes in G. hirsutum and G. barbadense ranged 0.07 to 0.76 and 0.02 to 0.52, respectively (supplementary file 3). These results indicated that the FBP genes in these four Gossypium species were under purifying selection.
FBP gene expression in fiber development and in response to biotic and abiotic stresses
To explore the potential function of FBP genes in the growth and development of cotton fibers, we downloaded cotton fiber transcriptome data from the NCBI SRA database and reanalyzed the expression profiling of FBP genes. The results of FBP gene expression analysis showed that the homologous genes GH_A02G0701.1 and GH_D02G0715.1 from G. hirsutum, and GB_A02G0693.1 and GB_D02G0741.1 from G. Barbadense had higher FPKM values in developing fibers at 20 days post-anthesis (DPA) and 25 DPA (supplementary file 4). The homologous genes GH_A02G1268.1 and GB_A02G1288.1 had high expression FPKM values in the early stage of the fiber development (0 DPA and 1 DPA ovule) (Figure 6a, b). The expression of GH_D02G0715.1 and GH_A02G0701.1 in the secondary cell wall synthesis stage of fiber development through qRT-PCR validation assays were consistent with in silico transcriptome analysis (Figure 6c, d).
In plant response to Verticillium wilt stress, the FPKM values of the FBP gene family members that were extracted from the previously mentioned transcriptome data showed that the homologous genes GH_A04G1526.1 and GH_D04G1869.1 had much higher expression values at 24 and 48 hours after inoculation (HAI) with Verticillium dahliae, with their highest peaks being reached at 24 HAI (Figure 7a, supplementary file 4). These results suggested a certain biological function of FBP genes in plant responses to Verticillium wilt stress.
The results of qRT-PCR analysis showed that both GH_A04G1526.1 and GH_D04G1869.1 had different expression behaviors in root tissues between susceptible and resistant cultivars at different developmental stages of Verticillium dahliae after inoculation. In the VW tolerant cultivar Jimian 11(J11), both GH_A04G1526.1 and GH_D04G1869.1 had immediate responses to inoculation with Verticillium dahliae and their expression levels reached a maximum at 12 HAI. The levels then dropped rapidly and maintained fairly low expression levels (Figure 7b and 7c). In the VW susceptible cultivar ZZM, GH_A04G1526.1 and GH_D04G1869.1 acted differently, with GH_A04G1526.1 slightly increasing its expression after inoculation up to 48 HAI, followed by its expression increasing rapidly and reaching a peak at 72 HAI (Figure 7b), while GH_D04G1869.1 maintained low expression throughout the entire experimental procedure (Figure 7c). These different responses suggested that GH_A04G1526.1 might take part in resistant reactions, while GH_D04G1869.1 participated in susceptible reactions to Verticillium wilt in cotton.
The responses of FBP genes to salt stress were also evaluated using RNA transcriptome data analysis  under salt stress (Figure 8, supplementary file 4). Our transcriptome analysis indicated that six members of the FBP family, GH_A10G2530.1, GH_D10G2661.1, GH_A11G3741.1, GH_D11G3768.1, GH_A02G1268.1, and GH_D03G0740.1, had significantly higher responsive expression to salt stress treatments in foliage and two members, GH_A04G1526.1 and GH_D04G1869.1, had significantly higher responsive expression in roots (Figure 8). In the salt susceptible cultivar CCRI12, the tested genes that had expressions in foliage had similar expression tendencies in responses to salt pressure. Their expressions were significantly inhibited within three hours after salt stress was imposed. This inhibition continued and reached its highest at 12 hours after the initiation of stress. After this time, as time proceeded, the plant began to develop some sorts of “adaption” mechanisms, and their expression recovered to a certain level. In the salt tolerant semi-wild species MAR85, the inhibition of these genes was to a much smaller extent. It could be seen from our results that the expression levels of these genes at 12 h from salt resistant material were almost double those from the salt sensitive materials. These expression differences between two cultivars reached significant level at least in one treatment stage (Figure 8). Both GH_A04G1526.1 and GH_D04G1869.1 had significant higher responsive expressions in root tissues of CCRI12 than in root tissues of MAR85 (Figure 8).