Identification and sequence analysis of the cotton expansin gene family
We identified the expansin gene family in the G. hirsutum genome and initially obtained 98 candidate expansin genes. After analysis of the conserved expansin domains, 93 expansin genes containing both the DPBB-1 (domain I) and Pollen_allerg_1 (domain II) domains were retained for further analysis. Each expansin gene was named according to nomenclature guidelines. The detailed results are shown in Table S1. The expansin gene family comprised four subfamilies: EXPA, EXPB, EXLA, and EXLB. The same analysis was performed for G. arboreum and G. raimondii, in which 49 and 45 expansin genes were identified, respectively. These expansin genes were also divided into four subfamilies. The detailed results are shown in Table S2 and Table S3.
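The domain-based filtering step can be illustrated with a minimal sketch. It assumes each candidate has already been annotated with a list of domain labels; the candidate names and the `filter_expansins` helper are hypothetical, and the domain labels simply follow the spelling used in the text.

```python
# Domain labels as written in the text; a candidate is kept only if it has both.
REQUIRED_DOMAINS = {"DPBB-1", "Pollen_allerg_1"}


def filter_expansins(domain_hits):
    """Return the sorted names of candidates that contain every required domain."""
    return sorted(
        gene
        for gene, domains in domain_hits.items()
        if REQUIRED_DOMAINS <= set(domains)
    )


# Hypothetical annotation table (illustrative only, not data from the study).
example_hits = {
    "candidate_1": ["DPBB-1", "Pollen_allerg_1"],
    "candidate_2": ["DPBB-1"],
    "candidate_3": ["Pollen_allerg_1", "DPBB-1", "other_domain"],
    "candidate_4": [],
}

kept = filter_expansins(example_hits)
print(kept)
```

Applied to the real annotation, a filter of this kind would reduce the 98 candidates to the 93 genes carrying both domains.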
We analysed the biochemical properties of expansin proteins (Table S1). The pI values of expansin family members ranged from 4.65 (GhEXLB1l) to 12.01 (GhEXPA4c), with an average of 8.47. The pI values of all EXPA and EXLA members were above 7.0, except for those of GhEXPA8b and GhEXPA7d. In contrast, the pI values of all EXPBs and EXLBs were below 7.0, except for those of GhEXPB3a, GhEXPB3b, GhEXPB1a, GhEXPB1b, GhEXLB1d, and GhEXLB1j (Additional file 1: Table S1). The average MW of expansin family members was 27.42 kDa, ranging from 14.29 kDa (GhEXPA17e) to 41.53 kDa (GhEXPA5e). The length of expansin protein sequences ranged from 150 amino acids (aa) (GhEXLA17e) to 366 aa (GhEXPA5e), and the signal peptide length ranged from 17 aa (GhEXPA15d, GhEXPA15g and GhEXLA1d) to 35 aa (GhEXLA1a) (Additional file 1: Table S1).
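The text does not state which tool was used to compute the molecular weights. As a rough illustration only, the MW of a protein can be approximated by summing average residue masses and adding one water molecule; the function name `molecular_weight_kda` is hypothetical.

```python
# Average masses (Da) of amino acid residues, i.e. the amino acid minus one water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini


def molecular_weight_kda(sequence):
    """Approximate the average molecular weight of a protein sequence in kDa."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    total_da = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return total_da / 1000.0


# Toy example: the dipeptide Gly-Gly.
print(molecular_weight_kda("GG"))
```

Dedicated tools also handle ambiguous residues and modifications, so values from this sketch may differ slightly from those in Table S1.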
Multiple sequence alignment of the 93 expansin proteins from G. hirsutum showed that they shared similar sequence characteristics: most consisted of a signal peptide and conserved domains I and II (Additional file 2: Figure S1), consistent with the findings of a previous study [3]. The amino acid sequence of domain I was more conserved than that of domain II, especially among EXPA members (Additional file 2: Figure S1). Notably, almost all of the EXPAs (excluding GhEXPA13a, GhEXPA13b, and GhEXPA15d) and three EXLA members (GhEXLA17a, GhEXLA17b, and GhEXLA17c) contained a conserved motif (HFD) in domain I (Additional file 2: Figure S1). Members of EXPB, EXLB, and the six other EXLA members did not have the HFD motif. Six EXLAs contained an extra segment at the C terminus, named the EXLA extension. This feature was found only in the EXLA subfamily, and its amino acid sequence was as follows: “DIAK(Q)EGCS(F)P(H)CDD(Y)S(G)H(N)WR(-)”. In addition, a conserved motif named BOX 1 was found in almost all the expansin members (Additional file 2: Figure S1).
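The EXLA extension consensus can be turned into a search pattern. The sketch below assumes that a residue in parentheses is an alternative to the residue immediately before it, and that "(-)" marks an optional final residue; this reading of the notation is an interpretation, and the helper name and test sequences are hypothetical.

```python
import re

# "DIAK(Q)EGCS(F)P(H)CDD(Y)S(G)H(N)WR(-)" read as alternatives per position.
EXLA_EXTENSION = re.compile(r"DIA[KQ]EGC[SF][PH]CD[DY][SG][HN]WR?")


def has_exla_extension(sequence):
    """Return True if the protein sequence contains the EXLA extension pattern."""
    return EXLA_EXTENSION.search(sequence.upper()) is not None


# Made-up sequences for illustration, not real expansin proteins.
print(has_exla_extension("MKLDIAKEGCSPCDDSHWRAA"))  # main residue at each position
print(has_exla_extension("MKLDIAKEGCSPCDD"))        # truncated, no match
```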
Phylogenetic relationships, gene structure and protein motifs of cotton expansin genes
To evaluate the evolutionary relationships of cotton expansins, a phylogenetic tree was constructed. The expansins were divided into four major subfamilies, namely, EXPA, EXPB, EXLA and EXLB. The EXPA subfamily was the largest group, with 67 members, and the other subfamilies contained eight (EXPB), six (EXLA), and 12 (EXLB) members. The four expansin subfamilies comprised 15 subgroups (Fig. 1). EXPA-IV was the largest subgroup, with 17 expansin members, and EXPA-VII, EXPA-VIII, and EXPA-IX were the smallest subgroups, with only two expansin members each.
The results of gene structure (exon-intron organization) analysis showed that the expansin members included two to five exons, and the same subfamilies had similar characteristics of exon types (Fig. 2a, b). Most of the EXPA members had three exons (51 of 67 EXPA members). Twelve EXPAs had two exons, and four EXPAs had four exons. All members of the EXPB subfamily had four exons except for GhEXPB1a (five exons). Four EXLA members contained five exons, and two members had four exons. EXLB members had four (seven EXLBs) or five exons (five EXLBs).
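The subfamily sizes and exon counts reported above can be cross-checked with a short tally. The numbers are taken directly from the text; the variable names are only illustrative.

```python
# Subfamily sizes from the phylogenetic analysis.
subfamily_sizes = {"EXPA": 67, "EXPB": 8, "EXLA": 6, "EXLB": 12}

# Number of genes per exon count in each subfamily (exon count -> gene count).
exon_distribution = {
    "EXPA": {2: 12, 3: 51, 4: 4},
    "EXPB": {4: 7, 5: 1},   # all four exons except GhEXPB1a (five exons)
    "EXLA": {4: 2, 5: 4},
    "EXLB": {4: 7, 5: 5},
}

total_genes = sum(subfamily_sizes.values())

# Each subfamily's exon tally should add up to the subfamily size.
tallies_match = {
    name: sum(counts.values()) == subfamily_sizes[name]
    for name, counts in exon_distribution.items()
}

# Overall range of exon numbers across the family.
all_exon_numbers = {n for counts in exon_distribution.values() for n in counts}

print(total_genes, tallies_match, min(all_exon_numbers), max(all_exon_numbers))
```

The tallies add up to 93 genes in total, with exon numbers spanning two to five, in agreement with the text.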
We identified the conserved motifs in expansin proteins. As a result, a total of ten distinct motifs were identified (Fig. 2c, Additional file 2: Figure S2). The motifs of all cotton expansins had unifying features; for example, each expansin protein contained motif 5, and all of them contained motif 4, except for GhEXPA15d, GhEXPA17e, and GhEXLB1j. In addition, the type, arrangement, and number of motifs were similar within the same subfamily. More than half of the EXPA members (38/67) had seven motifs, and 21 members had six motifs. The EXPB, EXLA, and EXLB subfamilies possessed similar motif characteristics, and most of them contained five motifs (motifs 4, 5, 7, 8, and 9). GhEXPB2d and GhEXPB3b of the EXPB subfamily included four motifs, and three members (GhEXLB1g, GhEXLB1c, and GhEXLB1j) of EXLB also had the same number of motifs. These results showed that the EXPB, EXLA, and EXLB subfamilies had close evolutionary relationships. The similarities between gene structures and sequence motifs implied that cotton expansin family genes underwent duplication over evolutionary time.
Chromosomal location and collinearity analysis of the expansin gene family
The chromosomal locations of the GhEXP genes were determined in G. hirsutum (Fig. 3). The 93 expansin genes were distributed across 24 of the 26 chromosomes; none was located on Ghir_A02 or Ghir_D06. Chromosome Ghir_A05 contained eight expansin genes, whereas Ghir_A06 contained only one. The numbers of expansin genes located on the other chromosomes ranged from two to seven. In addition, some of the expansin genes were located in clusters; for example, both Ghir_A08 and Ghir_D08 possessed a gene cluster with four distinct EXLBs (Fig. 3). These results showed that the expansin genes were unevenly distributed across the chromosomes. Collinearity analysis showed that expansin genes were frequently collinear between the A and D sub-genomes (Fig. 4), suggesting that expansin genes with collinear relationships may have similar functions.
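The distribution figures can be checked for internal consistency. The sketch below assumes the chromosome naming pattern seen in the text (Ghir_A01 to Ghir_A13 and Ghir_D01 to Ghir_D13); per-chromosome counts other than those quoted are not reproduced here.

```python
# G. hirsutum is allotetraploid, with 13 chromosomes in each of the A and D sub-genomes.
all_chromosomes = [f"Ghir_{sub}{i:02d}" for sub in "AD" for i in range(1, 14)]

# Chromosomes reported to carry no expansin gene.
without_expansins = {"Ghir_A02", "Ghir_D06"}
with_expansins = [c for c in all_chromosomes if c not in without_expansins]

# Counts quoted in the text: eight genes on Ghir_A05, one on Ghir_A06.
quoted = {"Ghir_A05": 8, "Ghir_A06": 1}
remaining_genes = 93 - sum(quoted.values())
remaining_chromosomes = len(with_expansins) - len(quoted)

# The other chromosomes carry two to seven genes each, so the remainder
# must lie between 2 and 7 times the number of remaining chromosomes.
feasible = 2 * remaining_chromosomes <= remaining_genes <= 7 * remaining_chromosomes

print(len(with_expansins), remaining_genes, remaining_chromosomes, feasible)
```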
Investigation of cis-acting elements in the promoter regions of expansin genes
We identified the cis-acting regulatory elements of the cotton expansin gene family. The cis-acting regulatory elements of expansin genes were extremely diverse (Additional file 1: Table S4; Table S5). These elements were divided into 7 categories and 111 types, including 31 light-responsive elements, 7 development-related elements, 13 hormone-responsive elements, 5 environmental stress-related elements, 3 promoter-related elements, 7 site-binding-related elements and 44 other elements (no annotated function). Among them, the light-responsive and hormone-responsive types were especially abundant (Additional file 1: Table S4; Table S5).
Together, the 93 GhEXP genes possessed 15,200 elements, including 1,268 light-responsive elements, 144 development-related elements, 779 hormone-responsive elements, 409 environmental stress-related elements, 9,416 promoter-related elements, 81 site-binding-related elements and 3,103 other elements (Additional file 1: Table S4). Of the 93 GhEXP genes, 83 possessed a Box 4 element (part of a conserved DNA module involved in light responsiveness), 70 possessed a GT1 motif (light-responsive element), 57 had a G-box (cis-acting regulatory element involved in light responsiveness), 70 were enriched in ABRE elements (cis-acting element involved in abscisic acid responsiveness), 75 contained an ERE, 56 had a CGTCA motif as well as a TGACG motif (cis-acting regulatory elements involved in MeJA responsiveness), 73 harboured an ARE (cis-acting regulatory element essential for anaerobic induction), and 40 possessed an MBS (MYB binding site involved in drought inducibility). Moreover, these relatively abundant elements were also more conserved across the GhEXP gene family. In addition, all the GhEXP genes contained a CAAT-box and a TATA-box, which are core promoter elements in eukaryotes, and these were the most numerous elements (Additional file 1: Table S5).
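As a quick arithmetic check, the per-category element counts quoted above can be summed and expressed as shares of the total. The counts come from the text; the variable names are illustrative.

```python
# Number of cis-acting elements per category across the 93 GhEXP genes.
element_counts = {
    "light-responsive": 1268,
    "development-related": 144,
    "hormone-responsive": 779,
    "environmental stress-related": 409,
    "promoter-related": 9416,
    "site-binding-related": 81,
    "other": 3103,
}

total_elements = sum(element_counts.values())

# Percentage share of each category, rounded to one decimal place.
shares = {
    category: round(100.0 * count / total_elements, 1)
    for category, count in element_counts.items()
}

print(total_elements)
for category, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {share}%")
```

The categories sum to the reported 15,200 elements, and promoter-related elements (CAAT-box and TATA-box) make up the majority.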
Expression patterns of the expansin genes in cotton fibre
To comprehensively investigate the temporal expression patterns of the cotton expansin gene family, fibre samples at different developmental stages were used for transcriptome analysis. A heat map was constructed with these transcriptome data (Fig. 5). Of the 93 expansin genes, 86 were detected and displayed different expression patterns; the remaining seven were not detected in the transcriptome data. Although the expression patterns of expansin genes differed markedly, clustered expansin genes generally had similar expression patterns. For example, GhEXPA1d, GhEXPA15d, GhEXPA15a, GhEXPA4o, GhEXPA4a and GhEXPA4b were preferentially expressed during the fibre initiation and elongation stages (0 to 15 DPA), whereas GhEXLA1f and GhEXLA1c had higher expression during the middle and later stages of cotton fibre development (after 15 DPA). In addition, GhEXPA4f and GhEXPA2, two homologous genes located on the A and D sub-genomes, respectively, were sharply up-regulated from 3 DPA, with very similar expression patterns (Fig. 5), suggesting that they may have similar or complementary functions in cotton fibre development. To verify our transcriptome results, the GhEXP gene expression profiles were compared with publicly available RNA-seq data. The expression profiles of GhEXP genes were generally consistent with our transcriptome results (Fig. 5 and Figure S3).
To avoid missing potentially important expansin genes, we also analysed the transcript levels of the seven expansin genes that were not detected in the transcriptome data (Fig. 5; Table S1). qRT-PCR showed that these seven genes were scarcely expressed; the exception was GhEXLB1h, which showed low expression levels in ovules and fibres. In addition, these genes could be detected in other tissues, but their expression levels were not high (Additional file 2: Figure S4).
qRT-PCR analysis of the special expansin genes in cotton fibres
To further identify the key expansin genes involved in fibre cell growth, 14 expansin genes that were predominantly expressed at different developmental stages of cotton fibres were selected, and their expression levels were verified by qRT-PCR. These expansin genes were clearly up-regulated at the initiation, elongation, or transition stage (Fig. 6), and their expression trends were largely consistent with those in the transcriptome data (Additional file 2: Figure S5).
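The quantification method for the qRT-PCR data is not stated in this section. As an illustration only, the sketch below assumes the widely used 2^-ΔΔCt calculation, in which the target gene is normalised to a reference gene and then to a calibrator sample; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the 2^-ddCt method (assumed, not from the text)."""
    # Normalise the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the sample with the calibrator and convert to a fold change.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Made-up Ct values: the target amplifies two cycles earlier, relative to the
# reference gene, in the sample than in the calibrator.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold_change)
```

This calculation assumes roughly equal, near-100% amplification efficiency for the target and reference genes.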
We found that GhEXPA4o, GhEXPA1a, and GhEXPA8h were predominantly expressed at 0 DPA (Fig. 6a), suggesting that these three genes may function in the initial developmental stage of fibre cells. Nine expansin genes showed higher expression levels at the fibre elongation stage, with distinct expression characteristics (Fig. 6b). The expression of GhEXPA4a peaked at 3 DPA, and that of GhEXPA13a and GhEXPA4f peaked at 5 DPA. The expression levels of GhEXPA4q, GhEXPA8f, and GhEXPA2 were the highest at 7 DPA, and those of GhEXPA8g, GhEXPA8a, and GhEXPA4n peaked at 10 DPA (Fig. 6b). GhEXPA4f and GhEXPA2 are homologous genes in allotetraploid cotton that are located on the 10th chromosome of the A and D sub-genomes, respectively, and both genes are specifically expressed in cotton fibre cells. Moreover, we found that GhEXPA8a and GhEXPA8g were important genes during cotton fibre elongation. These results revealed that the expression peaks of the majority of these genes appeared from 7 to 10 DPA, a period usually referred to as the fast elongation stage. In addition, we obtained two expansin genes that were predominantly expressed at the transition stage, GhEXLA1c and GhEXLA1f (Fig. 6c). Both belong to the EXLA subfamily and have unclear biological roles. The expression levels of GhEXLA1c and GhEXLA1f were the highest at 20 DPA, which is the transition stage of fibre cells from fast elongation to secondary cell wall synthesis.
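The peak time points reported above can be grouped by developmental stage. In the sketch below, the peak DPA values come from the text, while the stage cut-offs (0 DPA for initiation, up to 15 DPA for elongation, later for transition) are a simplifying assumption based on the stage ranges mentioned in this section.

```python
from collections import defaultdict

# Peak expression time point (DPA) of each of the 14 genes, as reported.
PEAK_DPA = {
    "GhEXPA4o": 0, "GhEXPA1a": 0, "GhEXPA8h": 0,
    "GhEXPA4a": 3, "GhEXPA13a": 5, "GhEXPA4f": 5,
    "GhEXPA4q": 7, "GhEXPA8f": 7, "GhEXPA2": 7,
    "GhEXPA8g": 10, "GhEXPA8a": 10, "GhEXPA4n": 10,
    "GhEXLA1c": 20, "GhEXLA1f": 20,
}


def stage_of(dpa):
    """Assign a fibre developmental stage to a time point (assumed cut-offs)."""
    if dpa == 0:
        return "initiation"
    if dpa <= 15:
        return "elongation"
    return "transition"


genes_by_stage = defaultdict(list)
for gene, dpa in PEAK_DPA.items():
    genes_by_stage[stage_of(dpa)].append(gene)

# Genes peaking in the fast elongation window (7 to 10 DPA).
fast_elongation = [gene for gene, dpa in PEAK_DPA.items() if 7 <= dpa <= 10]

for stage, genes in genes_by_stage.items():
    print(stage, len(genes), genes)
print("fast elongation:", len(fast_elongation))
```

With these cut-offs the grouping gives three initiation genes, nine elongation genes and two transition genes, and six of the nine elongation genes peak between 7 and 10 DPA.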
To better understand the potential functions of the 14 expansin genes, their expression profiles were examined in 11 different tissues, including roots, hypocotyls, stems, leaves, calycles, petals, pollen, stigmas, and fibres at 0 DPA, 10 DPA and 20 DPA. The results showed that these genes presented distinct but partially overlapping expression patterns (Additional file 2: Figure S6).