Identification of PUB gene family members in whole genome of cotton
The hidden Markov model (HMM) of the U-box domain (PF04564) was downloaded from Pfam 30.0 and used as a query to identify candidate PUB members in four cotton genome databases with HMMER 3.0. SMART and Pfam 30.0 were then used to confirm that every candidate contained a U-box domain. In total, 93, 96, 185 and 208 PUBs were identified in the four sequenced cotton species G. raimondii (D5), G. arboreum (A2), G. hirsutum acc. TM-1 (AD1) and G. barbadense (AD2), respectively, and were named GrPUB1-93, GaPUB1-96, GhPUB1A-89A/1D-91D/181-185 and GbPUB1A-98A/1D-98D/197-208 according to their chromosomal locations. The number of PUB genes in the tetraploid cottons was roughly twice that in the diploid cottons, suggesting that the PUB family is relatively conserved. The gene names, locations, open reading frame (ORF) lengths, protein domain types, U-box domain positions and subcellular localizations of these family members are listed in Additional file 1: Table S1, Additional file 2: Table S2, Additional file 3: Table S3 and Additional file 4: Table S4. Cotton PUB proteins range from 49 to 1492 amino acids (AA) in length, and the U-box domain contains about 75 amino acids. The length of the U-box domain was nearly constant, with a few exceptions: for example, the U-box domains of GaPUB39 and GhPUB40D have only 32 and 50 amino acids, respectively. Subcellular localization analysis showed that PUB proteins are distributed throughout the cell, including the nucleus, cytoplasm, chloroplast, plasma membrane, mitochondria and extracellular space, although most are located in the nucleus. Twenty different domains were found across all cotton PUBs (Table 1), and the predominant domain architecture was “U-box+ARM/HEAT”. Different domain architectures may be associated with different functions of cotton PUBs.
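The position-based naming described above can be illustrated with a minimal sketch. It assumes each gene is described only by a chromosome number and a start coordinate; the `Gene` record, the `name_by_position` helper and the toy gene IDs are hypothetical and are not taken from the study. The A/D subgenome suffixes used for the tetraploid species are not covered here.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str  # original annotation ID (hypothetical)
    chrom: int    # chromosome number
    start: int    # start coordinate in bp


def name_by_position(genes, prefix):
    """Return {gene_id: name}, numbering genes by chromosome, then by start."""
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    return {g.gene_id: f"{prefix}{i}" for i, g in enumerate(ordered, start=1)}


# Toy input; IDs and coordinates are invented for illustration only.
genes = [
    Gene("gene_c", chrom=2, start=500),
    Gene("gene_a", chrom=1, start=9000),
    Gene("gene_b", chrom=1, start=1200),
]
names = name_by_position(genes, "GrPUB")
print(names)
```

Genes on scaffolds would need an extra ordering rule (for example, placing them after all chromosome-anchored genes), which this sketch leaves out.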
Analysis of PUB gene structure and evolution in cotton
Gene structure diagrams and phylogenetic trees of the PUBs were constructed (Additional file 5: Figure S1, Additional file 6: Figure S2, Additional file 7: Figure S3 and Additional file 8: Figure S4). Based on their evolutionary relationships, the PUB genes could be divided into five subgroups (I-V). Subgroup I consisted of members with the “U-box + ARM” and “U-box only” domain architectures, and the remaining subgroups consisted of members with other domains. The exon number of cotton PUB genes is highly variable, ranging from 1 to 25, and approximately one third of the PUBs contain only one exon. In general, the evolutionary relationship correlates with gene structure: the more similar the number and size of the exons, the closer the evolutionary relationship. In G. hirsutum, GhPUB1A is 47 kb long, much longer than the other PUB genes, which may reflect the assembly and annotation of the cotton genome. The members of each subgroup in G. barbadense (AD2) differed greatly from those in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1), which may be related to the different origins of these species. The evolution of PUBs in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) was therefore analyzed together; five subgroups (ⅰ-ⅴ) were again found (Additional file 9: Figure S5), similar to the analysis of PUB genes in the individual species, suggesting that the PUB members are highly conserved. Furthermore, the analysis showed closer evolutionary relationships of GhPUB1A-89A with GaPUB1-96 and of GhPUB1D-91D with GrPUB1-93.
Location of PUB genes in three cotton genomes
MapInspect software was used to draw the distribution of PUB genes on the chromosomes based on their position information. Of the 93 genes in G. raimondii, 91 were unevenly distributed across the chromosomes and the other two were located on scaffolds (Figure 1A). Only a few genes were found on chromosomes 3, 4 and 12, whereas chromosome 5 carried the most (11 PUBs). In addition, the PUB genes on chromosomes 4, 6, 7, 11 and 12 were concentrated towards one end of the chromosome. All 96 PUB genes identified in G. arboreum were located on chromosomes (Figure 1B). Their distribution was also uneven: chromosome 1 carried the most PUB genes (14) and chromosome 3 the fewest (only 2). Moreover, chromosome 5 is about 6 Mb long but carries 9 PUB genes, giving it the highest gene density. In G. hirsutum, 91.4% (169/185) of the PUB genes were anchored on chromosomes, of which 82 and 87 were in the A and D subgenomes, respectively (Figure 2). In both the At and Dt subgenomes of G. hirsutum, chromosome 7 carried the most PUB genes and chromosome 8 the fewest, suggesting that the PUBs on these two chromosomes are relatively conserved and important for cotton growth. The situation in G. barbadense differed from that in G. hirsutum (Additional file 10: Figure S6): PUB genes were equally distributed between the At and Dt subgenomes but unevenly distributed among chromosomes, which may be related to the divergence of the two species.
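The per-chromosome counts and gene density discussed above can be computed as in the following sketch. It is not the MapInspect workflow; the function names, chromosome labels and lengths are hypothetical toy values chosen only to show the arithmetic (genes per Mb).

```python
from collections import Counter


def genes_per_chromosome(gene_chroms):
    """Count genes per chromosome from a list of chromosome labels."""
    return Counter(gene_chroms)


def density_per_mb(counts, chrom_lengths_bp):
    """Return genes per Mb for every chromosome with a known length."""
    return {
        chrom: counts.get(chrom, 0) / (length_bp / 1e6)
        for chrom, length_bp in chrom_lengths_bp.items()
    }


# Toy data (invented): one label per gene, and chromosome lengths in bp.
gene_chroms = ["Chr01", "Chr01", "Chr05", "Chr05", "Chr05"]
chrom_lengths_bp = {"Chr01": 8_000_000, "Chr05": 6_000_000}

counts = genes_per_chromosome(gene_chroms)
density = density_per_mb(counts, chrom_lengths_bp)
densest = max(density, key=density.get)
print(counts, density, densest)
```

Ranking by density rather than by raw count is what separates a short, gene-rich chromosome from a long chromosome that simply carries more genes.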
Gene duplication analysis
Segmental duplication of chromosomal regions may scatter gene family members across multiple chromosomes. Compared with other eukaryotes, plants generally have a higher rate of gene duplication. Recent studies have shown that G. raimondii has undergone at least two whole-genome duplication events [30]. The diploid cotton A and D genomes diverged about 5-10 Myr ago [18], and allotetraploid G. hirsutum arose from hybridization of diploid cottons followed by chromosome doubling 1-2 Myr ago. In this study, BLAST 2.2.31+ (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) was used for blastn and blastp (value 10) screening of homologous gene pairs among the identified cotton PUB genes. The uneven distribution of genes on the chromosomes may result from gene duplication or partial fragment duplication during the long evolutionary history of the cotton genome. Each time a whole-genome duplication occurs, the entire genetic sequence of cotton is doubled, and over time the redundant genes are recombined or lost [23]. Previous studies have shown that gene duplication and subsequent divergence are two major driving forces of evolution [32]. Based on multiple sequence alignment of the coding sequences and proteins of the diploid cottons, 18 and 27 homologous gene pairs were identified with MCScanX [37] in G. raimondii (D5) (Additional file 11: Figure S7A) and G. arboreum (A2) (Additional file 11: Figure S7B), respectively. These comprised 15 segmental and 3 tandem duplications in G. raimondii, and 25 segmental and 2 tandem duplications in G. arboreum. The relationship between the two diploid cottons and G. hirsutum was also analyzed (Additional file 12: Figure S8). In total, 197 homologous gene pairs were found between G. raimondii and G. hirsutum, of which 58.89% (116/197) were located on the Dt subgenome, and 191 homologous gene pairs were found between G. arboreum and G. hirsutum, of which 55.50% (106/191) were located on the At subgenome. These results show that more than half of the homologous genes in G. hirsutum were derived from the corresponding diploid genome, while approximately 41.11%-44.50% originated crosswise from the other diploid genome.
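The split of homologous pairs into tandem and segmental duplications can be sketched with a deliberately simplified rule. This is an assumption for illustration and not the MCScanX algorithm: a pair is labelled tandem when both genes lie on the same chromosome and are adjacent in gene order, and segmental otherwise. The `classify_pair` helper, the `max_rank_gap` parameter and the toy positions are hypothetical.

```python
from collections import Counter


def classify_pair(gene_a, gene_b, positions, max_rank_gap=1):
    """Label a homologous pair as 'tandem' or 'segmental'.

    positions maps gene -> (chromosome, rank), where rank is the gene's
    order along its chromosome. The adjacency rule is an assumption.
    """
    chrom_a, rank_a = positions[gene_a]
    chrom_b, rank_b = positions[gene_b]
    if chrom_a == chrom_b and abs(rank_a - rank_b) <= max_rank_gap:
        return "tandem"
    return "segmental"


# Toy data (invented gene names and positions).
positions = {
    "g1": ("Chr01", 10),
    "g2": ("Chr01", 11),
    "g3": ("Chr07", 42),
}
pairs = [("g1", "g2"), ("g1", "g3")]
summary = Counter(classify_pair(a, b, positions) for a, b in pairs)
print(summary)
```

A real analysis would take the pair list from the collinearity output and may use a distance-based rather than rank-based definition of adjacency.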
Expression pattern analysis of PUB genes in cotton
Based on previous transcriptome data from G. hirsutum under different stresses (salt, drought, low temperature and high temperature), 117, 148 and 119 PUB genes had FPKM > 1 in root, stem and leaf, respectively, indicating tissue specificity. Approximately 21 PUB genes showed no expression in any of the three tissues, which may reflect other specific regulation. Interestingly, three PUB genes (GhPUB58D, GhPUB55A and GhPUB67D) were consistently highly expressed in all three tissues under salt, drought, low-temperature and high-temperature stress. The PUB genes were divided into five subgroups (Ⅰ, Ⅱ, Ⅲ, Ⅳ and Ⅴ), and broadly similar expression patterns were found among the PUB genes (Additional file 13: Figure S9 and Additional file 14: Figure S10). In subgroup Ⅰ, 18 PUB genes showed large differences in expression, whereas the other PUB genes in subgroups Ⅱ-Ⅳ showed consistent expression patterns under the different stresses. In contrast, 4 PUB genes (GhPUB32A - GhPUB38D) in subgroup Ⅴ showed little difference in expression under the different stresses.
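The FPKM > 1 filter used above can be expressed as a short sketch. The data layout (a dictionary of per-tissue FPKM values), the function names and the toy genes are assumptions. Treating a gene as "not expressed" when its FPKM is exactly 0 in every tissue is one possible reading of the text, not a definition given in the study.

```python
FPKM_THRESHOLD = 1.0  # cutoff stated in the text


def expressed_per_tissue(fpkm, threshold=FPKM_THRESHOLD):
    """Count genes whose FPKM exceeds the threshold in each tissue."""
    tissues = sorted({t for values in fpkm.values() for t in values})
    return {
        tissue: sum(1 for values in fpkm.values()
                    if values.get(tissue, 0.0) > threshold)
        for tissue in tissues
    }


def silent_genes(fpkm):
    """Return genes with FPKM equal to 0 in every tissue (assumed definition)."""
    return sorted(g for g, values in fpkm.items()
                  if all(v == 0 for v in values.values()))


# Toy FPKM table (invented gene names and values).
fpkm = {
    "geneA": {"root": 12.0, "stem": 3.5, "leaf": 0.4},
    "geneB": {"root": 0.0, "stem": 0.0, "leaf": 0.0},
    "geneC": {"root": 0.8, "stem": 2.1, "leaf": 5.0},
}
expressed = expressed_per_tissue(fpkm)
silent = silent_genes(fpkm)
print(expressed, silent)
```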
Cloning and functional analysis of GhPUB85A and GhPUB45D
The phylogenetic tree in Additional file 9: Figure S5 showed that GhPUB68A, GhPUB85A, GhPUB45D and GhPUB69D all belong to subgroup ⅲ, indicating that they are closely related. The transcriptome data showed that GhPUB85A and GhPUB45D were highly expressed, whereas GhPUB68A and GhPUB69D were almost not expressed. To investigate how these homologous genes respond to stress, qRT-PCR was used to measure their expression in G. hirsutum TM-1 under drought, salt and low-temperature treatments (Figure 3). GhPUB85A and GhPUB45D were expressed at higher levels under all three stresses, suggesting that they respond actively to abiotic stress, whereas GhPUB68A and GhPUB69D did not. Interestingly, the expression of GhPUB85A and GhPUB45D peaked at 6 h under drought stress but at 12 h under salt and low-temperature stress, indicating that the two genes respond to drought faster than to salt or low temperature. However, the expression levels of the two genes differed significantly under the same stress, suggesting that they contribute differently to the abiotic stress response.
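Relative expression from qRT-PCR is often calculated with the 2^-ΔΔCt method. The text does not state which method was used, so the sketch below is an assumption; the function name, the reference-gene Ct and all Ct values are invented for illustration. It also shows how a peak time point such as "6 h" would be read off a time course.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method (assumed, not stated in the text)."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Toy time course: target-gene Ct at each time point (h); 0 h is the control.
REF_CT = 18.0  # reference-gene Ct, assumed constant across samples
target_ct = {0: 24.0, 6: 21.0, 12: 22.0}

fold_change = {
    hours: relative_expression(ct, REF_CT, target_ct[0], REF_CT)
    for hours, ct in target_ct.items()
}
peak_time = max(fold_change, key=fold_change.get)
print(fold_change, peak_time)
```

A lower Ct means more transcript, so the 3-cycle drop at 6 h in the toy data corresponds to an 8-fold increase relative to 0 h.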
Based on these expression results, GhPUB85A and GhPUB45D were cloned using cDNA from G. hirsutum TM-1 as the template and ligated into the pEASY-Blunt Cloning Vector for sequencing. Sequencing and enzyme digestion confirmed that the target fragments were of the expected length. The red fluorescent protein fusion vectors pBI121-GhPUB85A:RFP and pBI121-GhPUB45D:RFP were constructed to examine subcellular localization (Figure 4); both proteins localized to the cell membrane, consistent with our prediction. In addition, two VIGS vectors, pYL156:GhPUB85A and pYL156:GhPUB45D, were constructed using In-Fusion technology to study the functions of the genes under different stresses. Fifteen days after VIGS infection, the leaves of the positive control plants were clearly albino, and all new leaves that emerged later were also albino, whereas the other plants appeared normal (Figure 5a). The large changes in relative expression of the two target genes after VIGS infection under the different treatments indicated both successful VIGS infection and a positive role for the genes in responding to multiple stresses (Figure 5b-d), and also suggested that VIGS is an effective way to study gene function in cotton.