Identification and classification of AQP genes in Gossypium species
HMMER was used to identify AQP genes in Gossypium, and MIP domain (Pfam accessions, PF00230) as the model. The remaining sequences were checked for the existence of MIP domains using CDD (Conserved Domains Database). A total of 54, 56, and 111 candidate AQP genes were predicted in G. arboreum, G. raimondii, andG. hirsutum, respectively (Table S3). The number of AQPs in genome of G. hirsutum was more than that in G. arboreum (A2) and G. raimondii (D5). According to the prediction, upland cotton was derived from the hybridization of two progenitors with the diploid genome, and relevant chromosomes were doubled in tetraploid species. The nomenclature of AQPs was guided by phylogenetic analysis and sequence similarity. Though the genome of G. raimondii is smaller than the genome of G. arboreum, the number of AQPs in these two diploid cotton species is quite equal.
To study the origin and evolution of cotton AQP genes, the distribution of AQPs in 34 plant species was further analyzed (Fig. S1A-C). The XIP subfamily was absent in monocots and widely present in dicotyledons except for Brassica (Fig. S1B). The least amount of AQP was identified in Physcomitrella Patens (22) and Selaginella moellendorffii (18) (Fig. S1C). In monocots and eudicots which have experienced the recent genome-wide polyploidy, such as Zea mays, Gossypium, Brassica rapa, Brassica oleracea, Glycine max, Populus trichocarpa, Populus trichocarpa, and Solanum lycopersicum, the number of AQPs was significantly more than that of other species (Fig. S1C). In the allotetraploid species G. hirsutum and Brassica napus, the number of AQPs is almost equal to the sum of the two diploid ancestor species. The relationship between the number of AQP genes and the genome was identified. The results revealed that no significant correlation was identified between the number of AQPs and genome size (Fig. S1D). The number of AQP genes was positively correlated with the total number of genes in eudicots (Fig. S1E).
Phylogenetic analysis of AQP gene family
To query the evolution of AQP genes among three cotton species (G. raimondii, G. arboreum, and G. hirsutum), A. thaliana, Oryza sativa, and Phaseolus vulgaris, a Maximum Likelihood (ML) tree was constructed based on the amino acid sequences of AQPs using MEGA 7.0 (Fig. 1). The clustering data clearly showed that no AQP sequences of Arabidopsis and O. sativa were clustered into the clade of XIPs. Besides, the phylogenetic pattern of cotton AQPs was similarly to those of P. vulgaris, which clustered into five distinct subfamilies including PIPs, TIPs, NIPs, SIPs and XIPs. Compared with the other three species, AQPs from the three cotton species had a higher relative coefficient, which suggested that there is a closer relationship among the three cotton species. (Fig. 1). All the aquaporins of G. hirsutum were classified into five subfamilies consisting of 48 PIPs, 26 TIPs, 20 NIPs, 11 SIPs, and 6 XIPs (Fig. 1, Fig. 2A). The PIP subfamily which harbors 48 members was the biggest and divided into PIP1 (20 members) and PIP2 (28 members). The TIP subfamily was divided into five groups (TIP1 to TIP5), with 14 members in TIP1, six in TIP2, one in TIP3, three in TIP4, and two in TIP5. The NIP subfamily was divided into seven groups, six members in NIP1, four members in NIP7, and two in either NIP2, 3, 4, 5, and 6. The SIP subfamily was divided into SIP1 (nine members) and SIP2 (one member). Only one XIP group was identified, with six members. Three AQPs (GhPIP2,9_A, GhNIP5,1_A, GhSIP2,1_A) specifically existed in G. hirsutum, and two existed in G. arboreum (GaSIP1,4, GaSIP1,6) (Fig. 1, Table S4). The result suggesting AQPs in Gossypium are usually conserved, but the tetraploid cotton G. hirsutum had produced some specific AQP genes during the evolution and gene family duplication.
Basic characteristics and sequence analysis of AQPs
Bioinformatics analysis revealed that MW of GhAQPs ranged from 18.56 kDa to 82.36 kDa with pI ranged from 4.59 to 10.15. Comprehensive analysis of these five subfamilies indicated most members of GhPIPs were alkaline, with MW of approximately 30 kDa. However, the TIP subfamily members showed lower molecular weight. The MW of SIPs were smaller (<27 kDa) than the other four subfamilies with the exception GhSIP1,2_D (Kayum et al. 2017). The length of the aquaporins from the G. arboreum varied from 173 residues to 366 residues, with a median of 269 AAs per protein. The length of the aquaporins from the G. raimondii varied from 209 residues to 336 residues, with a median of 272 AAs per protein. The length of the aquaporins from the G. hirsutum varied from 181 residues to 759 residues, with a median of 275 AAs each protein. The predicted molecular weights and pI values f AQPs from the two diploid cotton species were within the ranges of 25.84-34.57 kDa and 4.8-10.15, respectively (Table S3).
Six TMDs (TM1-TM6), two NPA (Asn-Pro-Ala) motifs, and the ar/R filter were essential for the subjectivity of AQPs (Kaldenhoff and Fischer 2006). To detect the conservatism of AQPs, a sequence analysis of 111 putative GhAQPs was performed. Table S3 summarizes NPA motifs, the aromatic/Arg (ar/R) selectivity filter (H2, H5, LE1, and LE2), subcellular localization for the AQP protein sequences of Gossypium. Taking GhAQPs as an example, six TMDs (transmembrane domains) were identified in most of GhAQPs (70 of 111, 63.1%), except for 24 AQPs having five, two AQPs having four, two AQPs having three and 12 AQPs having seven TMDs, respectively.
NPA motifs showed the typical Asp-Pro-Ala residues in PIP and TIP subfamilies. In NIP subfamily, most of the members showed NPA motifs, but there are exceptions. For example, one GhNIP (GhNIP2,1_A) showed that proline (P) is replaced by a serine residue (S) in the first NPA motif, four GhNIPs (GhNIP5,1_A, GhNIP5,1_D, GhNIP6,1_A, GhNIP6,1_D), two GaNIPs (GaNIP6,1, GaNIP7,1) and two GrNIPs (GrNIP5,1, GrNIP6,1), showed that alanine (A) was altered by serine (S) or threonine (T) in the first NPA motif, the alanine (A) was exchanged by valine (V) in the second NPA motif. Members of SIP subfamily encoded motifs with a variable third residue in which alanine (A) was replaced by threonine (T) or leucine (L). In XIP subfamily, most of the members showed various in the third residues of the first NPA motif, in which alanine (A) was replaced by isoleucine (I) or valine (V). Especially, three members of aquaporins (GhTIP4,1_A, GhNIP7,2_D and GhXIP2,1_A) having only one NPA motif.
The ar/R selectivity filter showed various forms and family-specific sequences compared to the NPA motifs. In the PIP subfamily, the ar/R position showed conserved residues, phenylalanine (F) in H2, histidine (H) in H5, threonine (T) in LE1, and arginine (R) in LE2 (Table S3). In the TIP subfamily, the ar/R is formed by the motif of histidine/asparagine in H2, motifs of isoleucine/valine in H5, aspartate/serine in LE1, and valine/arginine/leucine in LE2. In the NIP subfamily, the ar/R position is formed by the motif of tryptophan/alanine/glycine in H2, valine/isoleucine/serine in H5, threonine/serine in LE1, and arginine in LE2. The SIP subfamily showed motifs of valine/phenylalanine/isoleucine in H2, valine/histidine in H5, threonine in LE1, and asparagine/serine/phenylalanine in LE2. The XIP subfamily showed motif of phenylalanine/valine/leucine/isoleucine in H2, valine in H5, glycine/alanine in LE1, and arginine in LE2.
Subcellular localization of GhAQPs
Subcellular localization is important for understanding the function of genes. Based on the subcellular localization prediction on the WolF PSORT tool, most PIPs are predicted to localize on the plasma membrane. TIPs were predicted to localize on the vacuolar membrane, plasma membrane as well as in cytoplasm. NIPs, SIPs, and XIPs were predicted to localize on the vacuolar membrane and plasma membrane. To determine the subcellular localization of GhAQP proteins, a C-terminal GFP fusion with GhAQPs driven by 35S promoter was constructed and the free GFP vector was used as a positive control (Fig. 5). Transient expression in tobacco leaves was performed by agroinfiltration. The GhPIP1,2, GhPIP2,3 and GhNIP5,1 were detected in the plasma membrane, GhTIP1,1 and GhTIP2,1 were detected in the vacuole membrane, while the GhSIP1,3, GhXIP1,1 and 35S-GFP signals were detected in both the nucleus and plasma membrane. It’s indicated that the subcellular localization of AQPs was consistent with the results predicted in WoLFPSORT.
Gene structure and motif analysis of AQPs in Gossypium
The exon-intron structures play crucial roles in plant evolution. To validate if the exon-intron structures influence GhAQPs evolution in cotton, we employed an online tool GSDS (Gene Structure Display Server 2.0 (gao-lab.org)) (Fig. S3). Most GhPIP subfamily members have four exons except for six members (GhPIP2,6_D, GhPIP1,2_A) which have five and ten exons, respectively. In the TIP subfamily, most members showed three exons. For an instance, GhTIP1,3_A, GhTIP1,3_D, GhTIP1,1_A, GhTIP1,1_D, GhTIP1,2_A, GhTIP1,2_D had three exons, but GhTIP4,1_D, GhTIP2,3_D had four exons. In the NIP subfamily, genes have five exons except for GhNIP5,1_A, GhNIP6,1_A, GhNIP7,1_A, GhNIP7,1_D. In the SIP subfamily, most of the genes had three exons, while GhSIP1,2_D had 12 exons and GhSIP1,2_A, GhSIP1,3_A, GhSIP1,4_D, GhSIP1,5_A, GhSIP1,5_D had only one exon. In the XIP subfamily, most of the members had three exons but GhXIP2,1_D just had one exon. The number of exons of 56 GrAQPs varied from 1 to 5 which was similar to 54 GaAQPs. In detail, most PIPs (20 GaPIPs and 20 GrPIPs) had four exons, most TIPs (10 GaTIPs and 10 GrTIPs) had three exons, most NIPs (8 GaNIPs and 8 GrNIPs) had five exons, most SIPs (3 GaSIPs and 3 GrSIPs) and XIPs (2 GaXIPs and 2 GrXIPs) had three exons. Together, these results indicated that the gene structure in each GhAQPs subfamily is conserved in Gossypium.
To investigate the functional roles of aquaporins in upland cotton, ten motifs among the 221 AQP proteins were screened by default parameters in MEME online tools (Fig. S4). Details of the ten motifs can be found in Fig. S5. Out of the ten screened motifs, eight were conserved in the PIP subfamily, four were detected in the SIP subfamily and six each were observed to be conserved in TIP, NIP, and XIP subfamilies respectively. We found that motif 1, 4, 5, and 6 appeared in all subfamilies. Motif 2, 3, 7, and 10 existed specifically in the PIP subfamily, motif 9 in the TIP subfamily, motif 8 in TIP and NIP subfamily. Moreover, we discovered that, three GhPIPs (GhPIP1,1_A/D GhPIP1,2_A/D, GhPIP1,3_A/D) lack motif 7, one (GhPIP2,13_A/D) deficient of motif 8 and one (GhPIP2,6_D) absent of motif 3, respectively. We found in Gossypium that, the structure and properties of GhAQPs in each subfamily were conserved against evolution, however, they vary between subfamilies.
Chromosomal location and duplication of AQPs in G. hirsutum
Based on the complete genome sequence of cotton, we deployed MapChart software to map the AQP genes to the chromosomes (Fig. 2). From the mapping analysis, we discovered that AQP genes are unevenly distributed on the chromosomes with the number of AQP genes on each chromosome varying from one to nine. In G. arboreum, AQP genes were found in 12 chromosomes and one scaffold except in chromosome A12. Chromosome A10 contained eight of the GaAQPs genes and was considered as having the largest number of chromosomes. In G. raimondii, AQP genes were found to be distributed on 13 chromosomes and a maximum number of nine predicted GrAQP genes were present on chromosome D11. The identified GhAQP genes were distributed randomly on 25 chromosomes and nine unmapped scaffolds. We also found that both chromosomes A10 and D10 contained the maximum number of eight AQP genes in G. hirsutum as showed in Fig. 2. There were 48 AQPs located on 12 At chromosomes, 54 AQPs on 13 Dt chromosomes, and nine located on each of the scaffolds. The number and location of GhAQPs in At sub-genome were similar to Dt sub-genome (Table S3), indicating the hybridization event by A and D diploid species. The majority of AQP genes were located on the proximate or the distal ends of the chromosomes. The physical locations of the AQP genes in the genome of Gossypium reflected the diversity and complexity of this gene family.
Genome duplication events existed in the process of plant evolution, which may result in the expansion of gene families. However, in G. hirsutum, the numbers and locations of duplicated genes were highly symmetric (Fig. 2, Table S5). A total of 115 duplicated gene pairs of AQPs that shared relatively high sequence similarities (aa identity>90%) were identified in Gossypium, indicating that AQPs were duplicated frequently. Among them, 10, 13, and 92 duplicated gene pairs were found in the genome of G. arboreum, G. raimondii, and G. hirsutum, respectively.
In G. arboreum, G. raimondii, and G. hirsutum 9, 9, and 84 gene pairs were confirmed to be segmentally duplicated genes, suggesting that segmental duplication played a crucial role in the expansion of the AQP gene family in allotetraploid cotton species G. hirsutum (Fig. S2, Table S5). Duplicated genes could undergo pseudogenization, sub-functionalization, or neo-functionalization during the progress of evolution (Cui et al. 2017). To further explore the different selective constraints on duplicated GhAQPs, the Ka/Ks ratio for 115 pairs of duplicated genes were performed (Table S5). All the values of Ka/Ks between gene pairs were less than 1, suggesting that, AQPs genes undergone strong purifying selection pressure after segmental duplication and WGD (whole genome duplication).
Expression pattern of GhAQPs under salt and osmotic stress
We found that, the expression patterns of homologous GhAQP genes on At and Dt sub-genomes were similar in salt or osmotic stress. (Fig. 4). Most GhAQPs (genes in the red box) belonging to the PIP subfamily were induced rapidly after 200 mM NaCl treatment and continued to be up-regulated until 3 h after treatment. A few GhAQPs (genes in the blue box) were induced at 3 h after salt stress and were continuously up-regulated until 12 h. Only three gene pairs (GhPIP1,9_A/D, GhPIP2,1_A/D, GhSIP1,2_A/D) were continuously up-regulated after salt stress. The expression pattern of most genes under osmotic stress and salt stress were consistent, except for GhPIP1,2_A, GhPIP1,2_A/D.
To validate the expression pattern of GhPIPs identified in upland cotton in response to salt or osmotic stress, we performed qRT-PCR (Fig. 5A). For easy description, we used GhPIP to represent the GhPIP_A/D gene pair. Among the 21 gene pairs, most of GhPIP genes were significantly induced after 150 mM NaCl treatment, while nine genes (GhPIP1,6, GhPIP1,11, GhPIP2,1, GhPIP2,11,GhPIP1,2, GhPIP1,4, GhPIP2,8, GhPIP2,10, GhPIP2,13) were not detectable. A total of ten GhPIP gene pairs (GhPIP1,3, GhPIP1,7, GhPIP1,9, GhPIP1,10, GhPIP2,2, GhPIP2,3, GhPIP2,5, GhPIP2,6, GhPIP2,7, GhPIP2,9) were up-regulated after salt stress. Among the ten up-regulated, five GhPIP genes (GhPIP1,10, GhPIP2,2, GhPIP2,3, GhPIP2,5, GhPIP2,7) showed the highest expression at 3 h, which is consistent with the RNA-seq data. Six GhPIP gene pairs (GhPIP1,1, GhPIP 1,5, GhPIP 1,8, GhPIP 2,4, GhPIP 2,12 and GhPIP 2,14) were alternately up- and down-regulated throughout the treatment time courses, however, these genes showed a radical up-regulation at 48 h. Three genes (GhPIP2,4, GhPIP2,12 GhPIP2,14) were significantly down-regulated at 12 h, while GhPIP1,8 was down-regulated at 3 h. GhPIP1,1 showed less down-regulation at 1 h and remain unchanged to 12 h, but up-regulated sharply at 48 h. GhPIP1,5 gene remains down-regulated from 1 h to 3 h, after which it was up-regulated to 48 h.
We observed four genes (GhPIP1,2, GhPIP2,3, GhTIP1,1, GhTIP2,1) were up-regulated at early stages under drought stress, and to verify their expression levels, we performed qRT-PCR (Fig. 5B). The expression of these genes reached the peak value at 2 h after osmotic stress except for GhPIP1,2, suggesting that these genes might be involved in osmotic stress response. Together this result showed that GhPIPs genes may be involved in abiotic stress in cotton
Cis-regulatory elements of GhAQP genes
Cis-regulatory sequences are linear nucleotide fragments of non-coding DNA that exist in front of the promoter region. The cis-regulatory elements of promotor have various functions depending on their types, locations, and orientation. To expound the potential function of GhPIPs, 1,500 bp upstream sequences of GhPIPs were extracted and used to predict cis-elements in promoter regions against PlantCARE database. A total of 272 cis-regulatory elements were detected in GhPIPs and GhTIPs. Of which, 85% were core promotor elements or binding sites of DNA binding protein. These elements respond to development and stress response, we found that there are 53% of motifs involved in light response, 23% involved in plant hormone-responsive, 10% involved in other-stress responsive, and 14% were other motifs whose functions were not defined. These results demonstrated that GhPIPs and GhTIPs have multiple roles in cotton developmental processes, as well as the abiotic stress response. Here, we focus on five cis-regulatory elements responding to abiotic stresses, which included ARE, a cis-acting regulatory element essential for the anaerobic induction, LTR, cis-acting element involved in low-temperature responsiveness, MBS, MYB binding site involved in drought inducibility, HD-Zip 1, motif involved in differentiation of the palisade mesophyll cells, WUN-motif, wound-responsive element (Fig. S7). GhPIP genes possessed at least one stress-response-related cis-element. We discovered that most of the GhPIP genes had an ABRE element that participated in ABA-responsive signaling pathways in response to salt stress (Fig. S3).
Silencing of GhPIP2,7 decreased salt tolerance in cotton
All plants-silenced GhChlI showed typical photobleaching phenotype in newly developed leaves, which indicated that the VIGS system was applied successfully in GX100-2 (Fig. S8). We also examined the expression levels of three genes (GhPIP2,2, GhPIP2,3, GhPIP2,7) in gene-silenced plants by qRT-PCR. The gene expression was significantly decreased in gene-silenced plants than that in mock (plants transformed in empty vector) (Fig. S8).
To further evaluate the phenotype of the gene-silenced plants, we treated with 400 mM NaCl for 12 days in the pot experiment. We observed a significant decrease in the biomass and leaf area of plants under salt stress treatment than in normal conditions. The salt injury symptoms of GhPIP2,7-silenced plants were more severe than in GhPIP2,2- and GhPIP2,3-silenced plants (Fig. 6A). Similar results were observed in 2-week-old gene-silenced and mock plants when they were subjected to 400 mM NaCl treatment. However, leaves of gene-silenced plants showed different degrees of chlorosis after 400 mM NaCl treatment for four days (Fig. 6B). We observed a significant decrease in chlorophyll content of GhPIP2,7-silenced plants after 400 mM NaCl treatment than in GhPIP2,2- and GhPIP2,3-silenced plants (Fig. 6C). MDA (Malondialdehyde) is the product of peroxidation reaction, which indicates the degree of peroxidation of the cell membrane and the strength of the stress reaction. To further understand the effect of salt stress on gene-silenced plants treated in 400 mM NaCl, we tested the MDA content, POD, and SOD activities of the leaves at the three-leaf stage of cotton (Fig. 6D-F). The salt treatment (400 mM NaCl) was dramatically increased in MDA content and POD activities in leaves, while decreased the SOD activities except for GhPIP2,2-silenced plants. The MDA content was higher in gene-silenced plants than in mock, while the trend was reversed after treating with 400 mM NaCl except for GhPIP2,7-silenced plants (Fig. 6D). The POD activity of gene-silenced plants showed no significant difference compared to the mock before salt stress but decreased significantly in mock plants after 400 mM NaCl treatment for 12 days. The POD activity in the leaves of GhPIP2,7-silenced plants was extremely and significantly down-regulated than in mock (Fig. 6E). The SOD activity of GhPIPs-silenced plants showed significant or extremely significant differences compared with mock without salt treatment, among which GhPIP2,7-silenced plants showed extremely significance difference than mock plants (Fig. 6F). The SOD activity of GhPIP2,2-silenced plants was increased after treating with 400 mM salt stress, while decreased in GhPIP2,3- and GhPIP2,7-silenced plants (Fig. 6F). The results indicated that silencing of GhPIP2,7 significantly decreased resistance to salt stress in cotton.
Overexpression of GhTIP2,1 enhanced drought tolerance in Arabidopsis
To evaluate the function of GhTIP2,1 in response to salt and osmotic stresses, one-week-old GhTIP2,1 overexpressed lines (OE1 and OE2) and wild-type (WT) seedlings were transferred to 1/2 MS medium containing 15% PEG6000 and 150 mM NaCl, and root length were measured after one week. Under control conditions, both OE and WT seedlings grew normally and had similar root lengths. Root growth was inhibited in the two OE lines and WT after NaCl or PEG treatment, but the two OE lines were less sensitive to PEG than the WT (Figure 8A and B). Additionally, we generated the transformed Arabidopsis root cells transient expressing GhTIP2,1:GFP, and the signals in cells were detected in the vacuole (Fig. 8C). The expression pattern of GhTIP2,1 was determined by analyzing transgenic plants harboring the GhTIP2,1 promoter driving the expression of the GUS reporter gene. GUS activity was strongly expressed in cotyledons, rosette leaves, and roots in the control (Figure 8D). After 10% PEG6000 treatments and 200 mM NaCl, the GUS signal was weak in the cotyledons and rosette leaves compared with the control seedlings. GUS staining was inconspicuous in the leaves when treated with 20% PEG6000. These results indicated that GhTIP2,1 was highly expressed in roots, but down-regulated in leaves after salt stress and osmotic stress.
In addition, we further evaluate the role of GhTIP2,1 under salt and osmotic stresses, we analyze the antioxidant enzyme activity in transgenic plants and WT (Fig. 8E-G). To evaluate the role of GhTIP2,1 in the oxidative stress pathway, proline, MDA, and H2O2 contents was measured in GhTIP2,1 transgenic plants and WT. The results showed that transgenic plants accumulated less MDA and H2O2 than in WT under both salt and osmotic stresses. Also, the proline content was significantly higher in transgenic plants compared to WT under salt and osmotic stresses. Together, these results showed that GhTIP2,1 transgenic plant enhanced osmotic tolerance compared to WT cotton.
To study the molecular mechanism by which GhTIP2,1 contributes to abiotic response, the expression pattern of stress-responsive genes was determined using qRT-PCR (Figure 8H-J). Under control conditions, the transcription levels of stress-responsive genes, including AtNHX, AtLEA, and AtP5CS, showed no significant difference among OE lines and WT. However, the transcription of these genes in WT and OE lines was induced after 200 mM NaCl treatment. We observed that the transcription levels of these genes were substantially higher in the OE lines than in WT, indicating that GhTIP2,1 increases the tolerance to salt and osmotic stresses by upregulating the expression of stress-responsive genes.