Identification and sequence analysis of the cotton expansin gene family
We identified the expansin gene family in the G. hirsutum genome. Initially, 98 candidate expansin genes were obtained. Conserved-domain analysis confirmed that 93 of them contained both the DPBB-1 domain (domain I) and the Pollen_allerg_1 domain (domain II), and these 93 genes were retained for further analysis. Each expansin gene was named according to the nomenclature guidelines (Table S1). The expansin gene family comprised four subfamilies: EXPA, EXPB, EXLA, and EXLB. The same analysis was applied to Gossypium arboreum and Gossypium raimondii, in which 49 and 45 expansin genes were identified, respectively. These genes were also divided into the four subfamilies (Table S2 and Table S3).
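The domain-based confirmation step can be illustrated with a minimal sketch. It assumes that domain hits have already been collected per candidate gene (for example from a domain search tool); the gene IDs, the extra domain name, and the function name `confirm_expansins` are invented for illustration, and the domain labels are spelled as in the text above.

```python
# Domain labels as written in the text; a candidate is kept only if it has both.
REQUIRED_DOMAINS = {"DPBB-1", "Pollen_allerg_1"}


def confirm_expansins(domain_hits):
    """Return the sorted gene IDs whose hits include every required domain."""
    return sorted(
        gene
        for gene, domains in domain_hits.items()
        if REQUIRED_DOMAINS <= set(domains)
    )


# Toy input: hypothetical candidates, not genes from the study.
example_hits = {
    "candidate_1": ["DPBB-1", "Pollen_allerg_1"],
    "candidate_2": ["DPBB-1"],                                # lacks domain II
    "candidate_3": ["Pollen_allerg_1", "DPBB-1", "Other_domain"],
    "candidate_4": [],                                        # no domain hits
}

confirmed = confirm_expansins(example_hits)
print(confirmed)
```

In this toy set, only the candidates carrying both domains pass the filter, mirroring how the 98 candidates were reduced to 93 confirmed genes.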
We analysed the biochemical properties of the expansin proteins (Table S1). The pI values of the expansin family members ranged from 4.65 (GhEXLB1l) to 12.01 (GhEXPA4c), with an average of 8.47. The pI values of EXPA and EXLA members were above 7.0, except for GhEXPA8b and GhEXPA7d. In contrast, the pI values of EXPB and EXLB members were below 7.0, except for GhEXPB3a, GhEXPB3b, GhEXPB1a, GhEXPB1b, GhEXLB1d, and GhEXLB1j (Additional file 1: Table S1). The average molecular weight (MW) of the expansin family members was 27.42 kD, ranging from 14.29 kD (GhEXPA17e) to 41.53 kD (GhEXPA5e). The expansin protein sequences ranged in length from 150 amino acids (aa) (GhEXLA17e) to 366 aa (GhEXPA5e), and the signal peptides ranged from 17 aa (GhEXPA15d, GhEXPA15g and GhEXLA1d) to 35 aa (GhEXLA1a) (Additional file 1: Table S1).
Multiple sequence alignment of the 93 expansin proteins from G. hirsutum showed that they shared similar sequence characteristics: most consisted of a signal peptide followed by conserved domains I and II (Additional file 2: Figure S1), consistent with a previous study [3]. The amino acid sequence of domain I was more conserved than that of domain II, especially among EXPA members (Additional file 2: Figure S1). Notably, almost all EXPAs (excluding GhEXPA13a, GhEXPA13b, and GhEXPA15d) and three EXLA members (GhEXLA17a, GhEXLA17b, and GhEXLA17c) contained a conserved HFD motif in domain I (Additional file 2: Figure S1). Members of EXPB and EXLB, and the other six EXLA members, did not have the HFD motif. Six EXLA members contained an extra C-terminal segment named the EXLA extension, a feature found only in the EXLA subfamily, with the amino acid sequence “DIAK(Q)EGCS(F)P(H)CDD(Y)S(G)H(N)WR(-)”. In addition, a conserved motif named BOX 1 was found in almost all expansin members (Additional file 2: Figure S1).
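A short motif such as HFD can be located by a plain substring scan. The sketch below is only an illustration under stated assumptions: the sequences are invented toy strings rather than expansin proteins, the function name `find_motif` is hypothetical, and the optional `region` argument stands in for domain I coordinates that would have to come from a separate domain annotation.

```python
def find_motif(seq, motif="HFD", region=None):
    """Return 0-based start positions of `motif` in `seq`.

    `region` is an optional (start, end) pair; only matches lying entirely
    inside seq[start:end] are reported.
    """
    seq = seq.upper()
    start, end = region if region is not None else (0, len(seq))
    positions = []
    idx = seq.find(motif, start, end)
    while idx != -1:
        positions.append(idx)
        idx = seq.find(motif, idx + 1, end)
    return positions


# Toy sequences, invented for illustration only.
toy_with_motif = "MKAGHFDLSNN"
toy_late_motif = "MKAGYFDLSHFD"

print(find_motif(toy_with_motif))
print(find_motif(toy_late_motif, region=(0, 8)))
```

Restricting the scan to a region matters because a chance HFD elsewhere in the protein would otherwise be counted as the domain I motif.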
Phylogenetic relationships, gene structures and protein motifs of the cotton expansin genes
To evaluate the evolutionary relationships of the cotton expansins, a phylogenetic tree was constructed. The expansins were divided into four major subfamilies: EXPA, EXPB, EXLA and EXLB. The EXPA subfamily was the largest, with 67 members, and the other subfamilies contained eight (EXPB), six (EXLA), and 12 (EXLB) members. The four expansin subfamilies comprised 15 subgroups (Fig. 1). EXPA-IV was the largest subgroup, with 17 expansin members, whereas EXPA-VII, EXPA-VIII, and EXPA-IX were the smallest subgroups, with only two expansin members each.
Gene structure (exon-intron organization) analysis showed that the expansin genes contained two to five exons and that members of the same subfamily had similar exon arrangements (Fig. 2a, b). Most EXPA members (51 of 67) had three exons, whereas 12 EXPAs had two exons and four EXPAs had four exons. All members of the EXPB subfamily had four exons, except for GhEXPB1a (five exons). Four EXLA members contained five exons and two had four exons. EXLB members had either four exons (seven EXLBs) or five exons (five EXLBs).
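The exon counts reported above can be cross-checked against the subfamily sizes with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the figure of seven four-exon EXPB genes is derived from the eight EXPB members minus GhEXPB1a, and the variable names are illustrative.

```python
# Exon-number distribution per subfamily: {number of exons: number of genes}.
exon_distribution = {
    "EXPA": {2: 12, 3: 51, 4: 4},
    "EXPB": {4: 7, 5: 1},   # 8 EXPB members, one of which (GhEXPB1a) has five exons
    "EXLA": {4: 2, 5: 4},
    "EXLB": {4: 7, 5: 5},
}

# Genes per subfamily, summed over exon classes.
subfamily_sizes = {
    subfamily: sum(counts.values())
    for subfamily, counts in exon_distribution.items()
}

# Genes per exon class, summed over subfamilies.
genes_per_exon_count = {}
for counts in exon_distribution.values():
    for n_exons, n_genes in counts.items():
        genes_per_exon_count[n_exons] = genes_per_exon_count.get(n_exons, 0) + n_genes

total_genes = sum(subfamily_sizes.values())
print(subfamily_sizes, genes_per_exon_count, total_genes)
```

The per-subfamily sums reproduce the 67, 8, 6 and 12 members reported for EXPA, EXPB, EXLA and EXLB, and the grand total is 93.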
We identified the conserved motifs in the expansin proteins and found a total of ten distinct motifs (Fig. 2c, Additional file 2: Figure S2). The motifs of all cotton expansins had unifying features; for example, every expansin protein contained motif 5, and all except GhEXPA15d, GhEXPA17e, and GhEXLB1j contained motif 4. In addition, the type, arrangement, and number of motifs were similar within the same subfamily. More than half of the EXPA members (38/67) had seven motifs, and 21 members had six motifs. The EXPB, EXLA, and EXLB subfamilies had similar motif characteristics, and most of their members contained five motifs (motifs 4, 5, 7, 8, and 9). GhEXPB2d and GhEXPB3b of the EXPB subfamily had four motifs, as did three members of the EXLB subfamily (GhEXLB1g, GhEXLB1c, and GhEXLB1j). These results suggested that the EXPB, EXLA, and EXLB subfamilies had close evolutionary relationships. The similarities in gene structure and sequence motifs implied that the cotton expansin family genes had undergone duplication events over evolutionary time.
Chromosomal location and collinear analysis of the expansin gene family
The chromosomal locations of the GhEXP genes in G. hirsutum were determined (Fig. 3). A total of 93 expansin genes were distributed on 24 chromosomes, with none on Ghir_A02 or Ghir_D06. Chromosome Ghir_A05 contained eight expansin genes, whereas Ghir_A06 contained only one. The number of expansin genes on each of the other chromosomes ranged from two to seven. In addition, some expansin genes were located in clusters; for example, Ghir_A08 and Ghir_D08 each possessed a gene cluster with four distinct EXLBs (Fig. 3). These results showed that the expansin genes were unevenly distributed across the chromosomes. Collinearity analysis showed that expansin genes were frequently collinear between the A and D sub-genomes (Fig. 4), suggesting that expansin genes with collinear relationships may have similar functions.
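One simple way to flag positional clusters like those on Ghir_A08 and Ghir_D08 is to group neighbouring genes on the same chromosome whose start positions lie within a distance threshold. The sketch below is illustrative only: the text does not state how clusters were defined, so the `max_gap` and `min_size` values, the function name `find_clusters`, and all gene names and coordinates are assumptions.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_size=3):
    """Group genes on one chromosome whose consecutive starts are <= max_gap apart.

    `genes` is a list of (gene_id, chromosome, start) tuples. Returns a list of
    (chromosome, [gene_ids]) for every group with at least `min_size` genes.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        prev_start = None
        for start, gene_id in sorted(by_chrom[chrom]):
            # A gap larger than max_gap closes the current group.
            if prev_start is not None and start - prev_start > max_gap:
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current = []
            current.append(gene_id)
            prev_start = start
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters


# Toy coordinates, invented for illustration; not positions from Fig. 3.
toy_genes = [
    ("geneA", "chr1", 100_000),
    ("geneB", "chr1", 150_000),
    ("geneC", "chr1", 250_000),
    ("geneD", "chr1", 900_000),
    ("geneE", "chr2", 10_000),
    ("geneF", "chr2", 5_000_000),
]

print(find_clusters(toy_genes))
```

The threshold strongly influences what counts as a cluster, so any real analysis would need to justify it against the genome's gene density.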
Investigation of cis-acting elements in the promoter regions of expansin genes
We identified the cis-acting regulatory elements of the cotton expansin gene family. The results showed that the cis-acting regulatory elements of the expansin genes were extremely diverse (Additional file 1: Table S4; Table S5). These elements were divided into 7 categories and 111 types, including 31 light responsive elements, 7 development-related elements, 13 hormone responsive elements, 5 environmental stress-related elements, 3 promoter-related elements, 7 site-binding related elements and 44 other elements (no assigned function). Among them, the light responsive and hormone responsive types were especially abundant (Additional file 1: Table S4; Table S5).
Together, the 93 GhEXP genes possessed 15,200 elements: 1,268 light responsive elements, 144 development-related elements, 779 hormone responsive elements, 409 environmental stress-related elements, 9,416 promoter-related elements, 81 site-binding related elements, and 3,103 other elements (Additional file 1: Table S4). Of the 93 GhEXP genes, 83 possessed the Box 4 element (part of a conserved DNA module involved in light responsiveness), 70 had the GT1-motif (light responsive element), 57 had the G-box (cis-acting regulatory element involved in light responsiveness), 70 had ABRE elements (cis-acting element involved in abscisic acid responsiveness), 75 contained the ERE (ethylene-responsive element), 56 had the CGTCA-motif as well as the TGACG-motif (both cis-acting regulatory elements involved in MeJA responsiveness), 73 harboured the ARE (cis-acting regulatory element essential for anaerobic induction), and 40 possessed the MBS (MYB binding site involved in drought inducibility). Moreover, these relatively abundant elements were also more conserved across the GhEXP gene family. In addition, all GhEXP genes contained the CAAT-box and TATA-box, the core promoter elements in eukaryotes, which were also the most numerous elements (Additional file 1: Table S5).
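The category totals above can be checked, and their relative shares computed, with a short tally. This sketch uses only the counts given in the text; the dictionary keys and variable names are illustrative.

```python
# Number of cis-acting elements per category across the 93 GhEXP promoters.
element_counts = {
    "light responsive": 1268,
    "development-related": 144,
    "hormone responsive": 779,
    "environmental stress-related": 409,
    "promoter-related": 9416,
    "site-binding related": 81,
    "other": 3103,
}

total_elements = sum(element_counts.values())

# Share of each category as a percentage of all elements, rounded to one decimal.
category_share = {
    category: round(100 * count / total_elements, 1)
    for category, count in element_counts.items()
}

largest_category = max(element_counts, key=element_counts.get)
print(total_elements, largest_category, category_share)
```

The tally confirms that the seven categories sum to the stated total and that promoter-related elements (the CAAT-box and TATA-box among them) account for the largest share.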
Expression patterns of the expansin genes in cotton fibre
To comprehensively investigate the temporal expression patterns of the cotton expansin gene family, fibre samples from different developmental stages were used for transcriptome analysis, and a heat map was constructed from these transcriptome data (Fig. 5). Eighty-six expansin genes displayed diverse expression patterns, whereas the remaining seven were not detected in these transcriptome data. Although the expression patterns differed markedly among expansin genes, genes that clustered together generally had similar expression patterns. For example, GhEXPA1d, GhEXPA15d, GhEXPA15a, GhEXPA4o, GhEXPA4a and GhEXPA4b were preferentially expressed during the fibre initiation and elongation stages (0 to 15 DPA), whereas GhEXLA1f and GhEXLA1c were more highly expressed during the middle and later stages of cotton fibre development (after 15 DPA). In addition, GhEXPA4f and GhEXPA2, two homologous genes located on the A and D sub-genomes, were sharply up-regulated from 3 DPA with very similar expression patterns (Fig. 5), suggesting that they may have close or complementary functions during cotton fibre development. To verify our transcriptome results, the expression profiles of the GhEXP genes were further examined using publicly available RNA-seq data. These expression profiles were largely consistent with our transcriptome results (Fig. 5 and Figure S3).
So as not to miss potentially important expansin genes, we also analysed the transcript levels of the seven expansin genes that were not detected in the transcriptome (Fig. 5; Table S1). qRT-PCR showed that these genes were scarcely expressed in ovules and fibres, except for GhEXLB1h, which was expressed at a low level. These genes could be detected in other tissues, but their expression levels there were also low (Additional file 2: Figure S4).
qRT-PCR analysis of the special expansin genes in cotton fibres
To further identify key expansin genes involved in fibre cell growth, 14 expansin genes predominantly expressed at different stages of cotton fibre development were selected, and their expression levels were verified by qRT-PCR. These genes were evidently up-regulated at the initiation, elongation, or transition stages (Fig. 6), and their expression trends were largely consistent with the transcriptome data (Additional file 2: Figure S5).
We found that GhEXPA4o, GhEXPA1a, and GhEXPA8h were predominantly expressed at 0 DPA (Fig. 6a), suggesting that these three genes may function in the initiation of fibre cells. Nine expansin genes showed higher expression levels at the fibre elongation stages, each with distinct expression characteristics (Fig. 6b). The expression of GhEXPA4a peaked at 3 DPA, and that of GhEXPA13a and GhEXPA4f peaked at 5 DPA. The expression levels of GhEXPA4q, GhEXPA8f, and GhEXPA2 were highest at 7 DPA, and those of GhEXPA8g, GhEXPA8a, and GhEXPA4n peaked at 10 DPA (Fig. 6b). GhEXPA4f and GhEXPA2 are homologous genes in allotetraploid cotton, located on the 10th chromosomes of the A and D sub-genomes, respectively, and both are specifically expressed in cotton fibre cells. Moreover, GhEXPA8a and GhEXPA8g were two notable genes identified during cotton fibre elongation. These results revealed that the expression peaks of most of the elongation-stage genes (six of nine) appeared from 7 to 10 DPA, which is usually called the fast elongation stage. In addition, two expansin genes, GhEXLA1c and GhEXLA1f, were predominantly expressed at the transition stage (Fig. 6c). Both belong to the EXLA subfamily, whose biological roles remain unclear. The expression levels of GhEXLA1c and GhEXLA1f were highest at 20 DPA, the stage at which fibre cells transition from fast elongation to secondary cell wall synthesis.
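The stage assignment described above can be summarised by grouping the 14 genes by the time point of their expression peak. In the sketch below, the peak time points are taken from the text, while the stage boundaries in `stage_of` (0 DPA for initiation, up to 15 DPA for elongation, later for transition) are a simplifying assumption based on the ranges mentioned in this section, and the function and variable names are illustrative.

```python
from collections import defaultdict

# Peak expression time point (days post anthesis, DPA) for each gene, from the text.
PEAK_DPA = {
    "GhEXPA4o": 0, "GhEXPA1a": 0, "GhEXPA8h": 0,
    "GhEXPA4a": 3,
    "GhEXPA13a": 5, "GhEXPA4f": 5,
    "GhEXPA4q": 7, "GhEXPA8f": 7, "GhEXPA2": 7,
    "GhEXPA8g": 10, "GhEXPA8a": 10, "GhEXPA4n": 10,
    "GhEXLA1c": 20, "GhEXLA1f": 20,
}


def stage_of(peak_dpa, elongation_end=15):
    """Map a peak time point to a fibre development stage (assumed boundaries)."""
    if peak_dpa == 0:
        return "initiation"
    if peak_dpa <= elongation_end:
        return "elongation"
    return "transition"


genes_by_stage = defaultdict(list)
for gene, dpa in PEAK_DPA.items():
    genes_by_stage[stage_of(dpa)].append(gene)

# Genes peaking in the fast elongation window (7 to 10 DPA).
fast_elongation = [gene for gene, dpa in PEAK_DPA.items() if 7 <= dpa <= 10]

print({stage: len(genes) for stage, genes in genes_by_stage.items()})
print(len(fast_elongation))
```

Grouping by peak time reproduces the three, nine and two genes assigned to the initiation, elongation and transition stages, with six genes peaking in the 7 to 10 DPA window.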
To better understand the potential functions of the 14 expansin genes, their expression profiles were examined in 11 different tissue samples: roots, hypocotyls, stems, leaves, calycles, petals, pollen, stigmas, and fibres at 0 DPA, 10 DPA and 20 DPA. The results showed that these genes had distinct but partially overlapping expression patterns (Additional file 2: Figure S6).