Identification of PGs in sweetpotato and analysis of the physicochemical properties of the proteins
Using the A. thaliana PG proteins as query sequences, a total of 103 PGs were identified in the sweetpotato genome and named IbPG001-IbPG103 according to their positions on the chromosomes (Additional file 1: Figure S1 and Additional file 3: Table S1-S2). The 103 IbPGs encoded proteins ranging from 152 (IbPG061) to 1344 (IbPG042) amino acids in length. The molecular weights ranged from 16.7 kDa to 151.4 kDa, with an average of 44.7 kDa. The isoelectric point varied widely, from 4.66 (IbPG064) to 10.61 (IbPG056); 53% of the proteins were basic and 47% were acidic. Approximately 32% of the IbPGs were predicted to be unstable, with instability index values greater than 40, whereas the remaining members were predicted to be stable. The grand average of hydropathicity (GRAVY) values varied from -0.599 (IbPG007) to 0.142 (IbPG091), and 88% of the members were hydrophilic proteins (GRAVY < 0). According to the predicted subcellular localization, IbPG042 was located in the chloroplast, IbPG021, IbPG076 and IbPG090 in the cell wall, and all of the others in the cell membrane (Additional file 3: Table S3).
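As an illustration of how the hydropathicity values above are interpreted, the following minimal sketch computes GRAVY as the mean Kyte-Doolittle hydropathy value per residue and applies the two thresholds mentioned in the text (GRAVY below 0 for hydrophilic proteins, index above 40 for unstable proteins). The function names and the toy peptide are hypothetical, and the sketch is not the tool used in this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Return the grand average of hydropathicity of a protein sequence.

    Raises KeyError for non-standard residue codes (e.g. X or *).
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KD[aa] for aa in seq) / len(seq)


def is_hydrophilic(seq):
    """A protein is classed as hydrophilic when its GRAVY value is below 0."""
    return gravy(seq) < 0


def is_unstable(index_value):
    """Threshold from the text: index values greater than 40 mean unstable."""
    return index_value > 40


# Hypothetical toy peptide, not an IbPG sequence.
example = "MKRSDE"
print(round(gravy(example), 3), is_hydrophilic(example))
```

The index value itself is taken as an input here, because its calculation depends on a dipeptide weight table that is not reproduced in this sketch.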
Phylogenetic analysis and classification of IbPGs
To explore the evolutionary relationships of the PG gene family in I. batatas and A. thaliana, an unrooted phylogenetic tree was constructed from the protein sequences of the 103 identified IbPGs and 68 A. thaliana PGs using the neighbor-joining (NJ) method (Fig. 1). The PGs of A. thaliana have been reported to fall into seven subclasses, Clade A-G[29]. Clustering of the A. thaliana and I. batatas PGs showed that the IbPG gene family was split into six subclasses, Clade A-F, with no IbPGs classified into Clade G of the A. thaliana PG gene family. Specifically, Clade D (39 IbPGs) contained the largest number of I. batatas PG family members. Clade A and Clade B each comprised eight members, while the other clades contained 23, 16 and 9 PGs, respectively.
Analysis of gene structure and conserved motifs
To understand the structural diversity of the IbPG genes, the exon-intron structure of each identified PG member was examined (Fig. 2). The number of exons in the IbPG genes ranged from 2 (IbPG044, IbPG061, IbPG077) to 19 (IbPG032). Most IbPGs in Clade A had 6-7 exons, except IbPG021 with 16 exons. All IbPGs in Clade B possessed 9 or 10 exons, except IbPG082 (11 exons). The members of Clade C had 4-6 exons, while the majority of genes in Clade E contained 6-8 exons, except IbPG032 (2 exons). The exon-intron structure varied between clades, but the positions and lengths of exons and introns were relatively conserved within the same clade[13]. The gene pairs IbPG010 and IbPG012, IbPG015 and IbPG065, IbPG059 and IbPG060, IbPG069 and IbPG072, IbPG070 and IbPG071, and IbPG081 and IbPG084 each shared the same gene structure (Fig. 2c).
Investigation of the PG protein sequences identified four conserved domains: SPNTDG (I), GDDC (II), CGPGHG (III) and RIK (IV). The MEME website was used to identify and analyse the conserved motifs of the 103 IbPG proteins, and 10 conserved motifs were obtained and named motif 1-10 (Fig. 2b and Additional file 2: Figure S2). Motifs 5, 6, 2 and 3 corresponded to domains I, II, III and IV, respectively. In detail, none of the four typical conserved domains were present in IbPG002, IbPG029, IbPG031, IbPG051 and IbPG006, while 59 PG proteins possessed all of these domains. Eleven PG proteins contained domains I, II and IV, whereas 6 PG proteins had only domain IV. Fourteen PG proteins possessed domains I and II, domains III and IV were absent in 3 PG proteins, and 5 PG proteins contained domain IV. Consistent with a previous report, none of the Clade E members had conserved domain III[24]. In conclusion, the IbPGs in the same clade had similar gene structures and conserved motif compositions, which supported the reliability of the phylogenetic analysis used to classify the subfamilies.
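The presence or absence of the four conserved domains can be tabulated with a simple scan for their signature peptides. The sketch below is a simplification that assumes exact matches to SPNTDG, GDDC, CGPGHG and RIK; the motif analysis in this study was done with MEME, which tolerates sequence variation, and a three-residue signature such as RIK can also occur by chance. The function names and toy sequences are hypothetical.

```python
from collections import Counter

# Signature peptides of the four conserved PG domains named in the text.
DOMAINS = {"I": "SPNTDG", "II": "GDDC", "III": "CGPGHG", "IV": "RIK"}


def domains_present(protein_seq):
    """Return the domain labels whose exact signature occurs in the sequence."""
    seq = protein_seq.upper()
    return [label for label, signature in DOMAINS.items() if signature in seq]


def summarize(proteins):
    """Count proteins by the combination of domains they carry.

    `proteins` maps a protein name to its amino acid sequence.
    """
    counts = Counter()
    for seq in proteins.values():
        combo = "+".join(domains_present(seq)) or "none"
        counts[combo] += 1
    return counts


# Hypothetical toy sequences, not real IbPG proteins.
toy = {
    "toyPG1": "MASPNTDGLLGDDCVVCGPGHGAARIKTT",
    "toyPG2": "MASPNTDGLLGDDCVV",
    "toyPG3": "MAAAAAAA",
}
print(summarize(toy))
```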
Chromosomal location, collinearity analysis, gene duplication and Ka/Ks analysis
To examine the mechanisms of expansion and evolution of the IbPG gene family, chromosomal localization, collinearity, gene duplication and Ka/Ks analyses of the IbPGs were carried out. Based on the chromosomal location information, the 103 IbPGs were mapped to 14 chromosomes (Chr2-Chr15) (Additional file 1: Figure S1). A single PG gene was found on each of Chr4 (IbPG013) and Chr15 (IbPG103), 2 PGs were located on Chr6 (IbPG020, IbPG021), and 23 PGs were found on Chr14. Furthermore, Chr7 carried 4 gene clusters (IbPG022-IbPG025, IbPG026-IbPG028, IbPG029-IbPG030, IbPG031-IbPG033), followed by Chr9 (IbPG043-IbPG046, IbPG047-IbPG051), Chr12 (IbPG062-IbPG063, IbPG066-IbPG073) and Chr14 (IbPG080-IbPG098, IbPG101-IbPG102).
In addition, collinearity analysis of the IbPG gene family revealed that a total of 21 IbPGs were involved in 11 duplication events, all of which were segmental duplications (Fig. 3a). Examples include IbPG003 on Chr2 and IbPG008 on Chr3, IbPG007 on Chr2 and IbPG036 on Chr8, and IbPG010 and IbPG012 on Chr3. Segmental duplication events therefore played a crucial role in the expansion of the IbPGs.
Furthermore, a series of collinearity maps were constructed comparing sweetpotato with four other plants: A. thaliana (Fig. 3b), S. lycopersicum (Fig. 3c), M. domestica (Fig. 3d) and Z. jujuba (Fig. 3e). In total, 110 collinear gene pairs were detected, and the numbers of orthologous genes identified in S. lycopersicum, A. thaliana, M. domestica and Z. jujuba were 34, 27, 31 and 18, respectively. The largest number of collinear relationships was found between the I. batatas and S. lycopersicum PGs, probably because both species are annual herbaceous plants.
Then, the One Step MCScanX-Super Fast tool in TBtools was used to calculate the Ka values, Ks values and Ka/Ks ratios of the identified collinear gene pairs (Additional file 3: Table S4-S5). Ka/Ks is the ratio of the non-synonymous substitution rate (Ka) to the synonymous substitution rate (Ks) between two protein-coding genes, and its magnitude indicates whether selective pressure is acting on the protein-coding gene[30].
Thus, the ratios reflect the evolutionary selection of the species. The results showed that the Ks values of some collinear gene pairs were ‘NaN’, so the corresponding Ka/Ks values were also ‘NaN’. The presence of ‘NaN’ indicated that most synonymous sites in the genes had undergone substitution, meaning that the sequence divergence was too great and the evolutionary distance too long[31]. The Ka/Ks values of all collinear gene pairs were less than 1, which suggested that the IbPG family was mainly subject to strong purifying selection during evolution.
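The interpretation of Ka/Ks ratios described above can be summarized in a short sketch. It assumes the conventional reading of the ratio (below 1 for purifying selection, equal to 1 for neutral evolution, above 1 for positive selection) and treats pairs with a ‘NaN’ or zero Ks value as undetermined. The function name and the example values are hypothetical and are not taken from Table S4-S5.

```python
import math


def classify_selection(ka, ks):
    """Classify the selection acting on a gene pair from its Ka and Ks values.

    Pairs with a missing, NaN or zero value cannot be given a meaningful
    ratio and are reported as 'undetermined'.
    """
    if ka is None or ks is None or math.isnan(ka) or math.isnan(ks) or ks == 0:
        return "undetermined"
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# Hypothetical (Ka, Ks) values for three gene pairs.
pairs = {
    "pair_1": (0.12, 0.85),
    "pair_2": (0.30, float("nan")),
    "pair_3": (0.40, 0.20),
}
for name, (ka, ks) in pairs.items():
    print(name, classify_selection(ka, ks))
```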
Analysis of cis‑acting elements in IbPGs
To explore the potential functions and regulatory mechanisms of the IbPGs, a total of 26 cis-acting elements associated with different stress and hormone responses were identified in the 2 kb promoter regions upstream of the IbPGs using the PlantCARE online website (Fig. 4 and Additional file 3: Table S6). Ninety-seven IbPGs contained the light-responsive element and 34 IbPGs contained the defense and stress response element (TC-rich repeats). Twenty-nine IbPGs possessed cis-acting elements involved in low-temperature responsiveness (LTR). Notably, 6 IbPGs contained the wound-responsive element (WUN-motif). Five types of hormone response elements were found in the promoters of the IbPGs: the auxin-responsive element (IAA), the abscisic acid response element (ABA), the gibberellin response element (GA), the MeJA response element and the salicylic acid response element (SA). In addition, elements related to alpha-amylase promoters (RY-element) and seed-specific regulation (A-box) were present in 4 and 13 IbPGs, respectively.
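Extracting the 2 kb upstream region requires taking gene orientation into account. The following sketch returns the coordinates of the upstream flank for a gene on either strand. It assumes 1-based inclusive coordinates and measures the flank from the annotated gene boundary; whether the study measured from the transcription start site or the start codon is not stated here, and the function name and example coordinates are hypothetical.

```python
def upstream_region(gene_start, gene_end, strand, chrom_length, flank=2000):
    """Return the 1-based inclusive (start, end) of the region upstream of a gene.

    For '+' strand genes the flank lies before gene_start; for '-' strand
    genes it lies after gene_end, and the extracted sequence must then be
    reverse-complemented before motif scanning. The region is clipped to the
    chromosome, and None is returned if no upstream sequence is available.
    """
    if strand == "+":
        start, end = max(1, gene_start - flank), gene_start - 1
    elif strand == "-":
        start, end = gene_end + 1, min(chrom_length, gene_end + flank)
    else:
        raise ValueError("strand must be '+' or '-'")
    return (start, end) if start <= end else None


# Hypothetical gene coordinates on a 100 kb chromosome.
print(upstream_region(5001, 8000, "+", 100000))
print(upstream_region(5001, 8000, "-", 100000))
```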
Gene expression analysis in different tissues
The expression profiles of the 103 IbPGs were compared among 8 tissues of the sweetpotato cultivar ‘Xushu 18’: leaf, stem, proximal end (PE), distal end (DE), root body (RB), root stock (RS), initiative storage root (ISR) and fiber root (FR). The expression levels of the 103 genes are shown as a heatmap in Fig. 5a and listed in Additional file 3: Table S7. Only one IbPG gene (IbPG034) was expressed in all tissues, whereas approximately 65% of the IbPGs were not expressed in any tissue of sweetpotato. The remaining 36 IbPGs were expressed in at least one organ, and most of them were expressed in the root tissues (PE, DE, RB, RS, ISR and FR). IbPG007 showed the highest expression in the leaf, at a level 3 times higher than that in the stem. IbPG079 showed the most specific expression in the stem, followed by IbPG103, IbPG034 and IbPG006. Additionally, IbPG022 transcripts accumulated markedly in the FR, with an FPKM (Fragments Per Kilobase of exon model per Million mapped fragments) value larger than 10. Notably, IbPG008 was not expressed in the leaf or FR; in contrast to the other genes, it was most highly expressed in the RS, DE, ISR, RB and PE, with FPKM values 15, 64, 19, 60 and 45 times higher than that in the stem, respectively. These results demonstrated that the IbPGs displayed diverse expression patterns and that genes within the same subfamily were also expressed differently.
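For reference, the FPKM unit used above normalizes the fragment count of a gene by both its exon length and the sequencing depth. The sketch below applies the standard definition; the function name and the numbers in the example are hypothetical and do not come from Table S7.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of exon model per Million mapped fragments.

    fragments: fragments mapped to the gene
    exon_length_bp: summed exon length of the gene, in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("lengths and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million).
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2,000 bp of exon, 20 million mapped fragments.
print(fpkm(500, 2000, 20_000_000))
```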
Gene expression analysis under different abiotic stress
To elucidate the possible functions of the IbPGs, published transcriptome data from the FR of ‘Xushu 18’ under drought, salt, cold, SA, MeJA and ABA treatments were used to investigate the expression patterns of differentially expressed genes (DEGs) under diverse abiotic stress conditions (Fig. 5b and Additional file 3: Table S8). Overall, the expression levels of most IbPGs changed in response to the various abiotic stresses. Under the salt and SA treatments, IbPG006, IbPG038 and IbPG039 from Clade D were significantly down-regulated. IbPG038 and IbPG039 were also down-regulated under the MeJA treatment. When subjected to drought, SA and MeJA treatments, IbPG022 from Clade F was substantially down-regulated. Meanwhile, IbPG034 of Clade E was down-regulated by the SA treatment but up-regulated by the drought and salt treatments. In contrast, the transcriptome data showed that 2 IbPGs (IbPG008 and IbPG099) were up-regulated in response to drought treatment.
Gene expression analysis under drought and salt stress
qRT-PCR was performed to investigate the expression dynamics of IbPGs over time under PEG6000-induced drought stress and NaCl-induced salt stress (Fig. 6 and Additional file 3: Table S10). IbPG006 was significantly down-regulated at all time points after the drought and salt treatments, and its lowest expression levels corresponded to decreases of 374-fold and 114-fold, respectively. Under the drought treatment, the relative expression levels of IbPG034 and IbPG099 generally increased first and then decreased. IbPG034 showed its lowest expression, with a 3-fold induction, at 3 h, was up-regulated at 12 h and 24 h, and was down-regulated at 48 h. IbPG099 was up-regulated at 3 h and 12 h and gradually down-regulated by 48 h after drought treatment. Under the salt treatment, IbPG034 was up-regulated at all time points, increasing immediately at 3 h and reaching a high level at 48 h. IbPG099 was up-regulated at all time points except 3 h and was most strongly expressed at 48 h. Collectively, these findings showed that different IbPGs responded to drought and salt treatment at different times and might be expressed differently under abiotic stresses.
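Fold changes such as those reported above are commonly derived from qRT-PCR cycle threshold (Ct) values. The sketch below assumes the widely used 2^-ΔΔCt calculation with a single reference gene; this section does not state which quantification method was applied, so the method, the function names and the Ct values are illustrative assumptions only.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^-ddCt calculation.

    Each target Ct is first normalized to the reference gene in the same
    sample (dCt), and the treated sample is then compared with the control.
    """
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    return 2.0 ** -(dct_treated - dct_control)


def describe_fold_change(rel):
    """Express a relative expression value as a direction and a fold change."""
    if rel <= 0:
        raise ValueError("relative expression must be positive")
    if rel >= 1:
        return ("up", rel)
    return ("down", 1.0 / rel)


# Hypothetical Ct values: the target needs 3 more cycles after treatment.
rel = relative_expression(25.0, 18.0, 22.0, 18.0)
print(rel, describe_fold_change(rel))
```

Under this convention, a 374-fold decrease corresponds to a relative expression value of 1/374 compared with the control.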