Genome-wide identification of PUB gene family members in cotton
The hidden Markov model (HMM) of the U-box domain (PF04564) was downloaded from the Pfam 30.0 database and used as a query to identify candidate PUB members in four cotton genome databases using HMMER 3.0. SMART and Pfam 30.0 were then used to confirm that every candidate PUB member contained a U-box domain. In total, 93, 96, 185, and 208 PUBs were identified in the four sequenced cotton species G. raimondii (D5), G. arboreum (A2), G. hirsutum acc. TM-1 (AD1), and G. barbadense (AD2), respectively, and these PUBs were named GrPUB1-93, GaPUB1-96, GhPUB1A-89A/1D-91D/181-185 and GbPUB1A-98A/1D-98D/197-208 according to their locations on the chromosomes. The number of PUB genes in the tetraploid cottons was approximately twice that in the diploid cottons, suggesting that PUB genes were relatively conserved. The essential information about the gene names, chromosomal locations, lengths of the open reading frames (ORFs), types of protein domains, positions of the U-box domain and subcellular localizations of these gene family members can be found in the additional files (Additional file 2: Table S2, Additional file 3: Table S3, Additional file 4: Table S4 and Additional file 5: Table S5). The PUB proteins in cotton ranged from 49 to 1492 amino acids in length. The U-box domain contained approximately 75 amino acids and was almost identical in length across the family except in a few PUBs; for example, the U-box domains of GaPUB39 and GhPUB40D had only 32 and 50 amino acids, respectively. The subcellular localization analysis showed that PUB proteins could be found throughout the cell, including nuclear, cytoplasmic, chloroplast, plasma membrane, mitochondrial, and extracellular locations, although most PUB proteins were localized in the nucleus. Twenty different domains were found among all the cotton PUBs (Table 1), and the predominant domain mode was “U-box+ARM/HEAT”. Different domain modes may be associated with different functions of cotton PUBs.
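The position-based naming described above can be illustrated with a minimal sketch. It assumes that each gene record carries a chromosome label and a start coordinate; the record layout, the function name `name_by_position` and the toy identifiers are hypothetical and are not taken from the original pipeline.

```python
from collections import namedtuple

# Hypothetical record layout: gene identifier, chromosome label, start coordinate.
Gene = namedtuple("Gene", ["gene_id", "chrom", "start"])


def name_by_position(genes, prefix, chrom_order):
    """Assign sequential names by chromosome order, then by start coordinate."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    ordered = sorted(genes, key=lambda g: (rank[g.chrom], g.start))
    return {g.gene_id: f"{prefix}{i}" for i, g in enumerate(ordered, start=1)}


# Toy records; identifiers and coordinates are made up for illustration.
genes = [
    Gene("gene_c", "Chr02", 500),
    Gene("gene_a", "Chr01", 9000),
    Gene("gene_b", "Chr01", 1200),
]
names = name_by_position(genes, "GrPUB", ["Chr01", "Chr02"])
print(names)
```

For the tetraploid species the same idea would be applied separately to the At-subgenome, the Dt-subgenome and the scaffolds, with a subgenome suffix appended to each name.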
Structure and evolution analysis of PUBs in cotton
Gene structure diagrams of the PUBs and evolutionary trees were constructed (Additional file 6: Figure S1, Additional file 7: Figure S2, Additional file 8: Figure S3 and Additional file 9: Figure S4). Based on the evolutionary relationships, the PUB genes could be categorized into five subgroups (I-V). Subgroup I was composed of members with the domain modes “U-box + ARM” and “U-box only”, and the remaining subgroups were composed of members with the other domain modes. The exon number of PUB genes in cotton was highly divergent, ranging from 1 to 25, and approximately one third of the PUBs contained only one exon. In general, evolutionary relationship was correlated with gene structure: genes that were more similar in exon number and size tended to be more closely related. In G. hirsutum, GhPUB1A is 47 kb long, much longer than the other PUB genes, which may reflect the assembly and annotation of the cotton genome. The members of each subgroup in G. barbadense (AD2) differed considerably from those in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1), and this difference may be related to the different origins of these species. Therefore, the PUBs in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) were used for a combined evolutionary analysis. This analysis also yielded five subgroups, named i-v (Additional file 10: Figure S5), similar to the grouping obtained within each single genome, indicating that the PUB members were highly conserved. Furthermore, the analysis showed that GhPUB1A-89A were more closely related to GaPUB1-96 and that GhPUB1D-91D were more closely related to GrPUB1-93.
Chromosomal localization analysis of PUB genes in three cotton genomes
The MapInspect software was used to map the PUB genes onto the chromosomes based on their position information. Of the 93 genes in G. raimondii, 91 were distributed unevenly across the chromosomes and the other two were found on scaffolds (Figure 1A). Only a few genes were present on chromosomes 3, 4, and 12, whereas chromosome 5 contained the highest number of PUB genes (11 PUBs). In addition, the PUB genes on chromosomes 4, 6, 7, 11 and 12 were preferentially located towards the ends of the chromosomes. All 96 PUB genes identified in G. arboreum were localized on chromosomes (Figure 1B). The PUBs were also unevenly distributed among the chromosomes of G. arboreum: chromosome 1 contained the most PUB genes (14) and chromosome 3 contained the fewest (only 2). In addition, although chromosome 5 was only approximately 6 Mb long, 9 PUB genes were found on it, giving it the highest distribution density. In G. hirsutum, 91.4% (169/185) of the PUB genes were anchored onto chromosomes, of which 82 and 87 genes were found in the At- and Dt-subgenomes, respectively (Figure 2). Chromosome D07 carried the most PUB genes and chromosome D08 the fewest among the chromosomes of the At- and Dt-subgenomes of G. hirsutum, suggesting that the PUBs on these two chromosomes were relatively conserved and important for cotton growth. The situation in G. barbadense differed from that in G. hirsutum (Additional file 11: Figure S6). Together, these results indicated that the PUB genes were roughly equally distributed between the At- and Dt-subgenomes but unevenly distributed among the chromosomes, which may be related to the differentiation of these species.
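The per-chromosome counts and the distribution density discussed above amount to simple tallies. The following minimal sketch assumes a list of chromosome labels (one per anchored gene) and a table of chromosome lengths in Mb; the function name `pub_density` and the toy values are hypothetical, and only the 169/185 figure is taken from the text.

```python
from collections import Counter


def pub_density(gene_chroms, chrom_lengths_mb):
    """Return genes per Mb for every chromosome in chrom_lengths_mb."""
    counts = Counter(gene_chroms)
    return {c: counts[c] / chrom_lengths_mb[c] for c in chrom_lengths_mb}


# Toy data, made up for illustration: one chromosome label per anchored gene.
gene_chroms = ["Chr01"] * 3 + ["Chr02"] * 2
chrom_lengths_mb = {"Chr01": 60.0, "Chr02": 10.0}

density = pub_density(gene_chroms, chrom_lengths_mb)
densest = max(density, key=density.get)
print(densest, density)

# Share of anchored genes reported for G. hirsutum (169 of 185).
anchored_pct = round(100 * 169 / 185, 1)
print(anchored_pct)
```

A short chromosome with a moderate gene count can therefore rank first by density even when a longer chromosome carries more genes in absolute terms.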
Gene duplication analysis
Fragment duplications in a genomic region may result in the scattering of gene family members. Compared with other eukaryotes, plants generally have a higher rate of gene duplication. Recent studies have shown that G. raimondii has undergone at least two whole-genome duplications [26]. The diploid cotton A genome and D genome diverged about 5-10 million years ago [18], and the allotetraploid G. hirsutum arose from the hybridization of diploid cottons followed by chromosome doubling 1-2 million years ago. In this study, BLAST 2.2.31+ (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) was used for BLASTN and BLASTP (value 10) screening of homologous gene pairs among the identified cotton PUB genes. The uneven distribution of PUB genes on the chromosomes may be related to gene duplication or partial fragment duplication events during the long evolutionary history of the cotton genome. Each time such a whole-genome duplication event occurs, the entire genetic sequence of cotton is doubled, and over time the redundant genes are recombined or lost [23]. Previous studies have shown that gene duplication and post-segregation phenomena are two major driving forces of evolution [27]. Based on the multiple sequence alignment of the coding sequences and proteins in diploid cotton, 18 and 27 homologous gene pairs were discovered with MCScanX [28] in G. raimondii (D5) (Additional file 12: Figure S7A) and G. arboreum (A2) (Additional file 12: Figure S7B), respectively. Among these homologous gene pairs, 15 segmental duplications and 3 tandem duplications were found in G. raimondii, and 25 segmental duplications and 2 tandem duplications were found in G. arboreum. The relationship between these two diploid cottons and G. hirsutum was also analyzed (Additional file 13: Figure S8). In total, 197 homologous gene pairs were found between G. raimondii and G. hirsutum, of which 58.89% (116/197) were located in the Dt-subgenome, and 191 homologous gene pairs were found between G. arboreum and G. hirsutum, of which 55.50% (106/191) were located in the At-subgenome. These results indicated that more than half of the homologous genes in G. hirsutum were derived from the corresponding diploid cotton genome, whereas approximately 41.11%-44.50% of these homologous genes originated from the other diploid genome.
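The distinction between tandem and segmental duplications can be sketched as follows. This is a deliberately simplified illustration and not the MCScanX procedure: it assumes that a pair counts as tandem when both genes lie on the same chromosome with at most one intervening gene, and it labels every other pair as segmental. The function name `classify_pair`, the `max_intervening` parameter and the toy pairs are hypothetical.

```python
from collections import Counter


def classify_pair(a, b, max_intervening=1):
    """Classify a homologous pair; a and b are (chromosome, gene_rank) tuples.

    gene_rank is the ordinal position of the gene along its chromosome.
    Simplifying assumption: any pair that is not tandem is called segmental.
    """
    same_chrom = a[0] == b[0]
    intervening = abs(a[1] - b[1]) - 1
    if same_chrom and intervening <= max_intervening:
        return "tandem"
    return "segmental"


# Toy pairs, made up for illustration.
pairs = [
    (("Chr01", 10), ("Chr01", 11)),  # adjacent genes on the same chromosome
    (("Chr01", 10), ("Chr01", 12)),  # one intervening gene
    (("Chr01", 10), ("Chr01", 40)),  # same chromosome, far apart
    (("Chr01", 10), ("Chr05", 11)),  # different chromosomes
]
tally = Counter(classify_pair(a, b) for a, b in pairs)
print(tally)
```

Tallying the labels in this way gives the kind of per-species summary reported above, such as 15 segmental and 3 tandem duplications among the 18 pairs in G. raimondii.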
Expression pattern analysis of PUB genes in cotton
Based on previous transcriptome data for the PUBs under different stresses (including salt, drought, heat and cold) in G. hirsutum, 117, 148 and 119 PUB genes were found with FPKM > 1 in roots, stems and leaves, respectively, displaying tissue specificity. Approximately 21 PUB genes were not expressed in any of the three tissues, and they may be associated with other specific regulatory functions. All the PUB genes were categorized into five subgroups (I, II, III, IV and V), and similar expression patterns were found among the PUB genes (Additional file 14: Figure S9 and Additional file 15: Figure S10). In subgroup I, 18 PUB genes with pronounced expression differences were found, whereas the other PUB genes in subgroups II-IV showed consistent expression patterns under the different stresses. However, 4 PUB genes (GhPUB32A - GhPUB38D) in subgroup V showed only small expression differences under the different stresses.
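The FPKM > 1 criterion used above can be illustrated with a minimal sketch. It assumes that the expression values are held in a nested dictionary keyed by tissue and then by gene; the function name `expressed_genes`, the data layout and the toy values are hypothetical.

```python
def expressed_genes(fpkm_by_gene, threshold=1.0):
    """Return the set of genes whose FPKM is strictly above the threshold."""
    return {gene for gene, value in fpkm_by_gene.items() if value > threshold}


# Toy FPKM values, made up for illustration.
fpkm = {
    "root": {"geneA": 5.2, "geneB": 0.3, "geneC": 0.0},
    "stem": {"geneA": 2.1, "geneB": 1.5, "geneC": 0.4},
    "leaf": {"geneA": 0.8, "geneB": 3.0, "geneC": 0.1},
}

# Number of expressed genes per tissue.
counts = {tissue: len(expressed_genes(values)) for tissue, values in fpkm.items()}

# Genes that pass the threshold in none of the tissues.
all_genes = set().union(*(values.keys() for values in fpkm.values()))
expressed_anywhere = set().union(*(expressed_genes(v) for v in fpkm.values()))
silent = all_genes - expressed_anywhere

print(counts, silent)
```

Genes that fall in the last set correspond to the non-expressed group described in the text, while differing per-tissue counts reflect tissue specificity.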
The evolutionary relationships in Additional file 10: Figure S5 showed that GhPUB68A, GhPUB85A, GhPUB45D and GhPUB69D belonged to subgroup III, indicating that they were closely related to each other. The transcriptome data showed that GhPUB85A and GhPUB45D were highly expressed, whereas the expression of GhPUB68A and GhPUB69D was negligible. To investigate the functions of these homologous genes in cotton, qRT-PCR was used to examine their expression differences in G. hirsutum TM-1. Drought, salt and cold treatments were applied, and the results are presented in Figure 3. The high expression of GhPUB85A and GhPUB45D under the three stresses suggested that they responded actively to abiotic stresses, whereas GhPUB68A and GhPUB69D did not, which was in line with the previously reported transcriptome data. Interestingly, GhPUB85A and GhPUB45D reached their highest expression at 6 h under drought stress but at 12 h under salt and cold stresses, indicating that they responded to drought stress faster than to salt and cold stresses. However, the expression values of GhPUB85A and GhPUB45D differed significantly under the same stress conditions, suggesting that they contribute differently to the response to abiotic stresses.
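This section does not state how the qRT-PCR expression values were calculated. As an illustration only, the sketch below assumes the commonly used 2^-ΔΔCt method with a single reference gene; the function name `relative_expression` and the Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method (assumed here)."""
    # Normalize the target gene to the reference gene within each sample.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the untreated control.
    delta_delta = delta_treated - delta_control
    return 2.0 ** -delta_delta


# Made-up Ct values: the target amplifies two cycles earlier after treatment.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold)
```

Under this assumption, a fold change above 1 indicates induction relative to the untreated control, and comparing fold changes across time points (for example 6 h and 12 h) shows when the response peaks.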
In addition, GhPUB85A and GhPUB45D were cloned using cDNA from G. hirsutum TM-1 and ligated into the pEASY-Blunt Cloning Vector, and the constructs were sequenced to verify that the ligation was correct. The sequencing and enzyme digestion results showed that the recombinant vectors were correctly constructed. The red fluorescence vectors pBI121-GhPUB85A:RFP and pBI121-GhPUB45D:RFP were constructed to examine the subcellular localization of the two proteins (Figure 4), and the results showed that both were localized to the cell membrane, which was consistent with our prediction in Additional file 2: Table S2. In addition, two VIGS vectors, pYL156:GhPUB85A and pYL156:GhPUB45D, were constructed using In-Fusion technology to study the functions of the two genes under different stresses. Fifteen days after the VIGS infection, albino leaves were observed on the positive control plants, and all newly emerged leaves were white at the later stage, whereas the other plants were normal with no albino leaves (Figure 5a). We measured the expression levels in the control plants (CK) and in the pYL156-, pYL156:GhPUB85A- and pYL156:GhPUB45D-infected plants under the different stresses. The expression levels of the two genes decreased significantly after the VIGS infection under the different treatments (Figure 5b-d), which confirmed the success of the VIGS infection and suggested that the two genes play positive roles in responding to multiple stresses. These results also indicated that VIGS is an effective way to study gene function in cotton.