Identification of the PEBP gene family in potato
Fifteen PEBP genes (StPEBP1-StPEBP15; Table 1) were identified in the recently updated version (DM v6.1) of the potato reference genome [23] by using the hidden Markov model (HMM) method [24] to search for the PBP domain of PEBP family members (Pfam accession ‘PF01161’ [25]). The 15 StPEBP genes were named according to their chromosomal locations, starting from chromosome 1 (Figure 1). They were located on chromosomes 1 (three PEBP genes), 3 (three), 5 (three), 6 (one), 9 (three) and 11 (two). The protein sequences of these 15 StPEBP genes were validated with the SMART (http://smart.embl.de/) and NCBI CDD (https://www.ncbi.nlm.nih.gov/cdd/) tools. SMART confirmed that StPEBP3, StPEBP5, StPEBP6, StPEBP9, StPEBP14 and StPEBP15 had the PBP domain, and NCBI CDD assigned all 15 StPEBP proteins from DM v6.1 to the PEBP superfamily. The theoretical isoelectric point (pI) was lowest in PEBP-like proteins (5.21 for StPEBP13; 5.53 for StPEBP12) and FT proteins (from 5.74 for StPEBP7 to 6.9 for StPEBP15), and varied widely among TFL proteins (from 5.73 for StPEBP1 to 9.07 for StPEBP3) (Table 1). PEBP-like proteins had the smallest molecular weight, whereas considerable variation in molecular weight was observed among TFL proteins (Table 1). Table 1 also shows how the potato PEBP genes identified in our study correspond to the potato PEBP gene names published in the literature.
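The naming rule described above can be illustrated with a short sketch. It assumes, for illustration only, that the numeric part of each DM v6.1 protein ID (for example 01G006970 in Soltu.DM.01G006970.1) encodes the chromosome and the gene order along it; the helper name chromosome_and_order is hypothetical and is not part of the original analysis.

```python
import re
from collections import Counter

# DM v6.1 protein IDs of the 15 StPEBP genes (Table 1), deliberately unsorted.
PROTEIN_IDS = [
    "Soltu.DM.11G004050.1", "Soltu.DM.09G008890.1", "Soltu.DM.01G007030.1",
    "Soltu.DM.05G024040.1", "Soltu.DM.03G011110.1", "Soltu.DM.06G029780.1",
    "Soltu.DM.01G006970.1", "Soltu.DM.09G003550.1", "Soltu.DM.05G026370.1",
    "Soltu.DM.03G033490.1", "Soltu.DM.11G004040.1", "Soltu.DM.01G006990.1",
    "Soltu.DM.05G024030.1", "Soltu.DM.09G008810.1", "Soltu.DM.03G017110.1",
]

ID_PATTERN = re.compile(r"Soltu\.DM\.(\d{2})G(\d+)\.\d+")


def chromosome_and_order(protein_id):
    """Parse (chromosome number, gene index) from an ID (assumed ID layout)."""
    match = ID_PATTERN.fullmatch(protein_id)
    if match is None:
        raise ValueError(f"unexpected ID format: {protein_id}")
    return int(match.group(1)), int(match.group(2))


# Sort by chromosome, then by position index, and number the genes from 1.
ordered_ids = sorted(PROTEIN_IDS, key=chromosome_and_order)
names = {pid: f"StPEBP{rank}" for rank, pid in enumerate(ordered_ids, start=1)}

# Count how many PEBP genes fall on each chromosome.
genes_per_chromosome = Counter(chromosome_and_order(pid)[0] for pid in PROTEIN_IDS)

for pid in ordered_ids:
    print(names[pid], pid)
print(dict(sorted(genes_per_chromosome.items())))
```

Under this assumption the sorted order reproduces the StPEBP1-StPEBP15 numbering of Table 1 and the per-chromosome counts given in the text.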
Table 1. Fifteen PEBP genes identified in potato
| Name | Protein ID in DM v6.1 (this study) | DMP ID from PGSC_DM_v3.4_pep_representative.fasta (PGSC0003+) | DMT ID from PGSC_DM_v3.4_pep_representative.fasta (PGSC0003+) | DMG ID from DM v3.4 GFF (PGSC0003+) | ID in Abelenda et al. [12] by BLASTn | Subfamily | pI | MW | Function and former name | Motifs |
| StPEBP1 | Soltu.DM.01G006970.1 | | | | Not found (but 74% coverage to Sotub01g016190 CEN1) | TFL | 5.73 | 22849 | CEN1 | 1, 3, 4, 5, 7, 10 |
| StPEBP2 | Soltu.DM.01G006990.1 | | | | Sotub01g016180 CEN1 | TFL | 8.71 | 19586 | CEN1 | 1, 2, 3, 4, 5, 7 |
| StPEBP3 | Soltu.DM.01G007030.1 | DMP400020755 | DMT400030575 | DMG400011707 | Sotub01g016150 CEN1 | TFL | 9.07 | 19612 | CEN1 | 1, 2, 3, 4, 5, 7 |
| StPEBP4 | Soltu.DM.03G011110.1 | | | | Sotub03g010860 SP3D | FT | 6.74 | 20090 | Flowering locus T (StSP3D) | 1, 2, 3, 4, 5 |
| StPEBP5 | Soltu.DM.03G017110.1 | DMP400025227 | DMT400037143 | DMG400014322 | Sotub03g014740 CEN1 | TFL | 8.95 | 19402 | CEN1 (StCEN1) [20] | 1, 2, 3, 4, 5 |
| StPEBP6 | Soltu.DM.03G033490.1 | DMP400009953 | DMT400014409 | DMG400005654 | Sotub03g032570 MFT | MFT | 8.48 | 19040 | MFT | 1, 2, 3, 4, 5, 10 |
| StPEBP7 | Soltu.DM.05G024030.1 | | | | Sotub05g026730 SP5G-A | FT | 5.74 | 19665 | Flowering locus T (StSP5G-A) | 1, 2, 3, 4, 5 |
| StPEBP8 | Soltu.DM.05G024040.1 | | | | Sotub05g026750 SP5G-B | FT | 5.76 | 19824 | Flowering locus T (StSP5G-B) | 1, 2, 3, 4, 5 |
| StPEBP9 | Soltu.DM.05G026370.1 | DMP400040404 | DMT400060057 | DMG400023365 | Sotub05g028860 SP6A | FT | 6.9 | 19606 | Flowering locus T (StSP6A) [40] | 1, 2, 3, 4, 5 |
| StPEBP10 | Soltu.DM.06G029780.1 | DMP400012606 | DMT400018307 | DMG400007111 | Sotub06g031280 SP/TFL1 | TFL | 8.39 | 19433 | Protein SELF-PRUNING | 1, 2, 3, 4, 5 |
| StPEBP11 | Soltu.DM.09G003550.1 | DMP400062201 | DMT400090526 | DMG400040097 | Sotub09g010600 SP9D | TFL | 7.97 | 16657 | SP9D | 1, 2, 4, 5 |
| StPEBP12 | Soltu.DM.09G008810.1 | DMP400011109 | DMT400016032 | DMG400006267 | Not found | PEBP-like | 5.53 | 18047 | PEBP-like | 6, 8, 9 |
| StPEBP13 | Soltu.DM.09G008890.1 | | | | Not found | PEBP-like | 5.21 | 10122 | PEBP-like | 6, 8, 9 |
| StPEBP14 | Soltu.DM.11G004040.1 | DMP400028268 | DMT400041725 | DMG400016179 | Sotub11g010050 SP5G-like | FT | 6.83 | 19837 | Flowering locus T (StSP5G-like) | 1, 2, 3, 4, 5 |
| StPEBP15 | Soltu.DM.11G004050.1 | DMP400028269 | DMT400041726 | DMG400016180 | Not found | FT | 6.9 | 20011 | Flowering locus T (StSP5G-like) [40] | 1, 2, 3, 4, 5 |
Note: The 15 StPEBP genes were identified by sequence analysis and annotation of the DM v6.1 reference genome (http://solanaceae.plantbiology.msu.edu/dm_v6_1_download.shtml) and were named according to their chromosomal locations. DMT and DMP IDs are from the PGSC protein sequences in PGSC_DM_v3.4_pep_representative.fasta (the DM v3.4 version of the potato reference genome). DM_1-3_516_R44_potato.v6.1.hc_gene_models.pep.fa was also analyzed, but the results were identical to those for the representative proteins, except that the full protein sequence database contained some duplicates. Gene IDs starting with ‘Sotub’ follow Abelenda et al. 2014 [12]; the correspondence between the StPEBP genes and the ‘Sotub’ IDs was identified using NCBI BLASTn. ‘pI’: theoretical isoelectric point; ‘MW’: molecular weight [27]. ‘Function’: annotation information, confirmed by both UniProt and STRING analysis of the protein sequences. ‘GO annotation’ and ‘KEGG annotation’: annotation information from KOBAS [30]. ‘PEBP-like’ in Function: phosphatidylethanolamine-binding protein. According to the BLASTn comparison, StPEBP12, StPEBP13 and StPEBP15 of the present study were not in the 15-gene list of Abelenda et al. 2014 [12] (no significant similarity).
Comparison between the 15 PEBP genes identified in the present study and the 15 PEBP genes annotated and labelled as PEBP genes in the potato DM v6.1 reference genome
We searched the DM v6.1 potato reference genome at http://solanaceae.plantbiology.msu.edu/ with the keyword “PEBP”. Among the 15 items returned (Table S1), Soltu.DM.11G004050.2 was on the list, but it is an alternative splicing variant of Soltu.DM.11G004050 and therefore belongs to the same gene, PGSC0003DMG400016180 (the StPEBP15 gene). StPEBP10 and StPEBP11 were not annotated as PEBP genes in the DM v6.1 reference genome but were labelled as centroradialis (CEN genes in our list), even though they have the PBP domain. These two genes should therefore be classified as PEBP genes, as was done in the present study. Conversely, the gene “Soltu.DM.01G007000.1” was annotated and labelled as a PEBP gene in the DM v6.1 potato genome and was retrieved by the keyword search. However, we did not classify it as a PEBP gene because it does not contain a PBP domain (Table S1).
#: The DM v6.1 reference genome was searched at http://solanaceae.plantbiology.msu.edu/ with the keyword “PEBP”. Soltu.DM.11G004050.2 was among the genes returned by the search, but it is an alternative splicing variant of Soltu.DM.11G004050, from the same gene (PGSC0003DMG400016180; the StPEBP15 gene).
Comparison between the PEBP genes identified by us from the DM v6.1 reference genome and from the DM v3.4 reference genome
From the analysis of the protein sequences in the PGSC_DM_v3.4_pep_representative.fasta file, we identified nine PEBP proteins, including PGSC0003DMP400020755 for StPEBP3 and PGSC0003DMP400028269 for StPEBP15 (see the full list in Table 1). These nine proteins corresponded to the StPEBP3, StPEBP5, StPEBP6, StPEBP9, StPEBP10, StPEBP11, StPEBP12, StPEBP14 and StPEBP15 genes of the present study. Six StPEBP genes (StPEBP1, StPEBP2, StPEBP4, StPEBP7, StPEBP8 and StPEBP13) were found in the DM v6.1 gene/protein sequence database but not in the DM v3.4 gene/protein sequence database. We also analyzed the PGSC_DM_v3.4_pep.fasta database and classified 10 transcripts/proteins as PEBP, but two of the transcripts (PGSC0003DMT400037142 and PGSC0003DMT400037143), and the corresponding proteins (PGSC0003DMP400025226 and PGSC0003DMP400025227), came from the same gene (PGSC0003DMG400014322). Therefore, only nine PEBP genes were annotated in the DM v3.4 reference genome, and six of the PEBP genes identified here were not annotated in that earlier version.
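The counting argument above (10 transcripts collapsing to nine genes, leaving six genes specific to DM v6.1) can be sketched as follows. The IDs are copied from Table 1 and the text with the PGSC0003 prefix omitted; the variable names are hypothetical and the snippet is only an illustration, not the pipeline used in the study.

```python
# DM v3.4 transcript -> gene IDs, as listed in Table 1 plus the second
# transcript of DMG400014322 mentioned in the text.
TRANSCRIPT_TO_GENE_V34 = {
    "DMT400030575": "DMG400011707",
    "DMT400037142": "DMG400014322",  # alternative transcript of the same gene
    "DMT400037143": "DMG400014322",
    "DMT400014409": "DMG400005654",
    "DMT400060057": "DMG400023365",
    "DMT400018307": "DMG400007111",
    "DMT400090526": "DMG400040097",
    "DMT400016032": "DMG400006267",
    "DMT400041725": "DMG400016179",
    "DMT400041726": "DMG400016180",
}

# DM v3.4 gene -> StPEBP name used in the present study (Table 1).
GENE_TO_STPEBP = {
    "DMG400011707": "StPEBP3", "DMG400014322": "StPEBP5",
    "DMG400005654": "StPEBP6", "DMG400023365": "StPEBP9",
    "DMG400007111": "StPEBP10", "DMG400040097": "StPEBP11",
    "DMG400006267": "StPEBP12", "DMG400016179": "StPEBP14",
    "DMG400016180": "StPEBP15",
}

# Collapse transcripts to distinct genes.
genes_v34 = set(TRANSCRIPT_TO_GENE_V34.values())
found_in_v34 = {GENE_TO_STPEBP[gene] for gene in genes_v34}

# Genes of the present study that have no DM v3.4 counterpart.
all_stpebp = [f"StPEBP{i}" for i in range(1, 16)]
only_in_v61 = [name for name in all_stpebp if name not in found_in_v34]

print(len(TRANSCRIPT_TO_GENE_V34), "transcripts ->", len(genes_v34), "genes")
print("DM v6.1 only:", only_in_v61)
```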
Comparison between the PEBP genes identified in the present study and the 15 PEBP genes in the literature
Four of the 15 StPEBP genes identified in the present study (StPEBP1, StPEBP12, StPEBP13 and StPEBP15; Table 1) were novel relative to the genes reported in the previous study [12]. In the BLASTn search for these four genes, StPEBP1 had only 74% coverage with Sotub01g016190 (CEN1) [12]; StPEBP12 and StPEBP13 did not correspond directly to any of the 15 previously reported PEBP genes [12]; and StPEBP15 had only 79.82% identity with the previously reported gene Sotub11g010050 [12] (Table 1).
KEGG and gene ontology (GO) analysis of the 15 StPEBP genes identified in the present study
The KEGG analysis indicated that StPEBP4, StPEBP7, StPEBP8, StPEBP9, StPEBP14 and StPEBP15 are related to the circadian rhythm of plants (sot04712, circadian rhythm - plant). GO annotation identified StPEBP3 and StPEBP5 as regulators of plant flowering (GO:0009908, flower development; GO:0010228, vegetative to reproductive phase transition of meristem) and StPEBP6 as a regulator of the ABA response and seed germination (GO:0009737, response to abscisic acid; GO:0010030, positive regulation of seed germination). StPEBP11 was annotated with nucleus (GO:0005634), cytoplasm (GO:0005737), negative regulation of flower development (GO:0009910), and vegetative to reproductive phase transition of meristem (GO:0010228).
Phylogenetic analysis and motif detection of the PEBP family genes in potato
The neighbor-joining phylogenetic tree of 59 PEBP genes from potato, tomato, Arabidopsis, rice, and apple, constructed using MEGA7, showed how the potato PEBP proteins relate to those of other model plants (Figure 2). The results suggested that the 15 StPEBP genes could be classified into four subfamilies (Figure 2). StPEBP4, StPEBP7, StPEBP8, StPEBP9, StPEBP14 and StPEBP15 were classified in the FT subfamily. StPEBP1, StPEBP2, StPEBP3, StPEBP5, StPEBP10 and StPEBP11 were closely related to AtTFL1 (TFL subfamily). StPEBP6 was closely related to AtMFT and could be classified in the MFT subfamily. StPEBP12 and StPEBP13 were distant from the other StPEBP genes in the phylogenetic tree and were classified into the PEBP-like subfamily (Figure 2). This classification is consistent with the annotation results from UniProt and STRING.
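The subfamily assignments above can be cross-checked against the pI values listed in Table 1. The following minimal sketch groups the 15 genes by subfamily and reports the size and pI range of each group; the data are copied from Table 1, while the variable names are illustrative assumptions.

```python
from collections import defaultdict

# (subfamily, theoretical pI) for each gene, copied from Table 1.
TABLE1 = {
    "StPEBP1": ("TFL", 5.73), "StPEBP2": ("TFL", 8.71), "StPEBP3": ("TFL", 9.07),
    "StPEBP4": ("FT", 6.74), "StPEBP5": ("TFL", 8.95), "StPEBP6": ("MFT", 8.48),
    "StPEBP7": ("FT", 5.74), "StPEBP8": ("FT", 5.76), "StPEBP9": ("FT", 6.9),
    "StPEBP10": ("TFL", 8.39), "StPEBP11": ("TFL", 7.97),
    "StPEBP12": ("PEBP-like", 5.53), "StPEBP13": ("PEBP-like", 5.21),
    "StPEBP14": ("FT", 6.83), "StPEBP15": ("FT", 6.9),
}

# Collect the pI values of each subfamily.
pi_by_subfamily = defaultdict(list)
for gene, (subfamily, pi) in TABLE1.items():
    pi_by_subfamily[subfamily].append(pi)

# Summarise each subfamily by member count and (min, max) pI.
members = {sub: len(values) for sub, values in pi_by_subfamily.items()}
pi_range = {sub: (min(values), max(values)) for sub, values in pi_by_subfamily.items()}

for sub in sorted(pi_range):
    low, high = pi_range[sub]
    print(f"{sub}: n={members[sub]}, pI {low}-{high}")
```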
The phylogenetic tree constructed from 25 StPEBP protein sequences from potato, 15 from the DM v6.1 data and 10 from the SolTub_3.0.44 data (Figure 3), clarified the relationships between the genes and gene names of the present study and those of the previous version, SolTub_3.0.44. StPEBP1, StPEBP3 and StPEBP5 were found to be homologous to PGSC0003DMT400030582, PGSC0003DMT400030575 and PGSC0003DMT400037143, respectively. These three genes were annotated as StCEN1. StPEBP10 and StPEBP11 were found to be close to PGSC0003DMT400018307 and StSP9D (T90526), respectively. StPEBP2 was partly homologous to PGSC0003DMT400030575. All six of these genes (StPEBP1, StPEBP2, StPEBP3, StPEBP5, StPEBP10 and StPEBP11) are members of the TFL subfamily. StPEBP4, StPEBP7, StPEBP8, StPEBP9, StPEBP14 and StPEBP15 belonged to the FT subfamily, and their closest homologues were the FLOWERING LOCUS T protein (PGSC0003DMT400060057, StSP6A) and FLOWERING LOCUS T (PGSC0003DMT400041725, PGSC0003DMT400041726). StPEBP6 of the MFT subfamily was found to be homologous to StMFT (PGSC0003DMT400014409). StPEBP12 and StPEBP13 were closely related to the phosphatidylethanolamine-binding protein (PGSC0003DMT400016032).
Motif detection of the potato PEBP family genes
Motif analysis of 34 PEBP proteins from Arabidopsis thaliana, tomato and potato using the MEME online tools (https://meme-suite.org/) identified a total of 10 motifs (numbered 1-10; Table S2). Each PEBP gene contained 3-6 motifs. Eight StPEBP genes (StPEBP4, StPEBP5, StPEBP7, StPEBP8, StPEBP9, StPEBP10, StPEBP14 and StPEBP15) had five motifs: motifs 1, 2, 3, 4 and 5. StPEBP2 and StPEBP3 had motifs 1-5 and motif 7, and StPEBP6 consisted of motifs 1-5 and motif 10. StPEBP1 had motifs 1, 3, 4, 5, 7 and 10. StPEBP11 had motifs 1, 2, 4 and 5 (Table 1).
Note: The sequences of the 34 PEBP proteins were submitted to the MEME website (https://meme-suite.org/) for motif identification. Ten motifs were conserved in the PEBP genes of potato, tomato, and Arabidopsis thaliana.
All six Arabidopsis thaliana PEBP proteins (AtPEBP) are composed of motifs 1, 2, 3, 4, and 5. The structure of eight tomato PEBP proteins was similar to that of the AtPEBP proteins. Two tomato CEN1 genes (Solyc01009580 and Solyc01009560) contained motifs 1-5 and motif 7, while three other FT-like tomato PEBP genes (Solyc05055660, Solyc11008650 and Solyc11008660) contained three or four motifs (Figure 4).
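The motif compositions reported above can be summarised by grouping genes that share the same motif set. The sketch below uses the motif lists from Table 1; it is an illustrative tabulation under that assumption, not the MEME analysis itself, and the variable names are hypothetical.

```python
from collections import defaultdict

# Motif numbers per gene, copied from the Motif column of Table 1.
MOTIFS = {
    "StPEBP1": (1, 3, 4, 5, 7, 10), "StPEBP2": (1, 2, 3, 4, 5, 7),
    "StPEBP3": (1, 2, 3, 4, 5, 7), "StPEBP4": (1, 2, 3, 4, 5),
    "StPEBP5": (1, 2, 3, 4, 5), "StPEBP6": (1, 2, 3, 4, 5, 10),
    "StPEBP7": (1, 2, 3, 4, 5), "StPEBP8": (1, 2, 3, 4, 5),
    "StPEBP9": (1, 2, 3, 4, 5), "StPEBP10": (1, 2, 3, 4, 5),
    "StPEBP11": (1, 2, 4, 5), "StPEBP12": (6, 8, 9),
    "StPEBP13": (6, 8, 9), "StPEBP14": (1, 2, 3, 4, 5),
    "StPEBP15": (1, 2, 3, 4, 5),
}

# Group genes by identical motif composition (insertion order is preserved).
genes_by_motif_set = defaultdict(list)
for gene, motifs in MOTIFS.items():
    genes_by_motif_set[motifs].append(gene)

# Smallest and largest number of motifs found in any gene.
motif_counts = [len(motifs) for motifs in MOTIFS.values()]
fewest, most = min(motif_counts), max(motif_counts)

for motifs, genes in genes_by_motif_set.items():
    print(motifs, "->", ", ".join(genes))
print(f"motifs per gene: {fewest}-{most}")
```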
Expression patterns of StPEBP genes in heat-stressed leaves
Real-time quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) analysis of StPEBP gene expression in leaves, using gene-specific primers (Table S3), identified the relative activities (expression levels) of the genes and the genes that were differentially expressed between Day 0 (control) and treated plants (Table 2). The cDNA sequences were too similar to design separate specific primers for StPEBP2 and StPEBP3 (both TFL genes), for StPEBP7 and StPEBP8 (both FT genes), and for StPEBP12 and StPEBP13 (both PEBP-like genes). None of these three groups of similar genes showed high activity in the qRT-PCR analysis of leaves (Table 2). At Day 0, StPEBP15 was the most active of the 15 StPEBP genes in leaves, followed by StPEBP14. The expression of StPEBP11 and of StPEBP12 (together with StPEBP13) was too low to be detected (Ct > 35) in Day 0 control leaves and in Day 1 and Day 4 heat-stressed leaves (Table 2).
Note: UBI-F and UBI-R are the primers for the housekeeping gene used as the internal control [41]. No specific qRT-PCR primers were designed for StPEBP1, StPEBP2, StPEBP6, StPEBP7 and StPEBP13 because qPrimerDB [32] could not generate specific qRT-PCR regions from the specific regions of their cDNA sequences. The primer pair StPEBP3-F and StPEBP3-R could amplify both StPEBP2 and StPEBP3 (both are TFL genes). The primers for StPEBP8 could also amplify StPEBP7 (both are FT genes). The primers for StPEBP12 could also amplify StPEBP13 (both are PEBP-like genes).
Table 2. The expression levels of PEBP genes in heat-stressed mature leaves and roots of potato
| Gene name | Leaves, Day 0 | Leaves, Day 1 | Leaves, Day 4 | Leaves, Day 7 | Roots, Day 0 | Roots, Day 1 | Roots, Day 4 | Roots, Day 7 |
| StPEBP3 | 1.2 f | # | # | 14.7 f ** | 23.5 dc | 12.2 dc | 16.1 dc | 35.8 d-b |
| StPEBP4 | 3.9 f | # | # | 5.03 f | 9.4 dc | 5.0 dc | 5.7 dc * | 8.4 dc |
| StPEBP5 | 59.2 f | 240.8 f | # | 95.1 f | 7.3 dc | 17.9 dc | 22.8 dc | 36.5 d-b |
| StPEBP8 | 88.3 f | 76.4 f | 142.3 f | 191.9 f * | 18.5 dc | 39.5 d-b | 19 dc | 39.7 d-b |
| StPEBP9 | 5.4 f | # | # | 4.8 f | 5.5 dc | # | 6.8 dc | 10.1 dc |
| StPEBP10 | 11.2 f | 27.7 f | # | 75.5 f * | 14.7 dc | 22 dc | 72.4 b * | 293.1 a * |
| StPEBP11 | # | # | # | 1.96 f | 35.8 d-b | 5.3 dc | 18.4 dc | 47.9 cb |
| StPEBP12 | # | # | # | 1.89 f | 4.5 dc | 4.9 dc | 3.7 dc | 10.4 dc |
| StPEBP14 | 1571.1 e | 145.1 f ** | 50.6 f ** | 405.3 f ** | 10.8 dc | 16 dc | 7.1 dc | 17.1 dc |
| StPEBP15 | 8793.9 a | 3517.6 d ** | 4863.5 c ** | 7708.6 b | 7.1 dc | 2.3 d | 5.7 dc | 13.5 dc |
Note: ‘#’ means that the gene activity was too low to be detected (qRT-PCR Ct value greater than 35). The “StPEBP3” primers may amplify transcripts of StPEBP1, StPEBP2 and StPEBP3 because these three sequences are highly similar. A similar situation exists between StPEBP7 and StPEBP8, and between StPEBP12 and StPEBP13. ‘Expression’ is the mean of 2^(-ΔCt) × 100,000 over three biological replicates. One-way ANOVA was followed by Duncan’s multiple range test at the 5% significance level; expression in leaves and in roots was analyzed separately. *: The expression level was significantly different from that at Day 0 for the same gene in the same organ. Values followed by different letters in a row for a gene are significantly different (P < 0.05).
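The expression measure defined in the note can be written out as a short calculation. This is a minimal sketch under stated assumptions: ΔCt is taken as the target-gene Ct minus the internal-control Ct, the value 2^(-ΔCt) × 100,000 is computed per replicate and then averaged, and a gene is treated as undetected when any replicate has Ct above 35. The function name and the Ct values are hypothetical and are not data from the study.

```python
from statistics import mean

CT_DETECTION_LIMIT = 35.0  # Ct above this is reported as '#' in Table 2
SCALE = 100_000            # scaling factor stated in the table note


def relative_expression(ct_target, ct_reference):
    """Mean of 2^(-dCt) * SCALE over replicates, or None if undetected.

    dCt = Ct(target gene) - Ct(internal control), paired per replicate.
    Treating any replicate above the limit as 'undetected' is an assumption.
    """
    if any(ct > CT_DETECTION_LIMIT for ct in ct_target):
        return None
    values = [
        (2.0 ** -(target - reference)) * SCALE
        for target, reference in zip(ct_target, ct_reference)
    ]
    return mean(values)


# Hypothetical Ct values for three biological replicates.
detected = relative_expression([24.0, 25.0, 26.0], [20.0, 20.0, 20.0])
undetected = relative_expression([36.2, 35.8, 36.0], [20.0, 20.0, 20.0])

print(detected)    # mean of 6250.0, 3125.0 and 1562.5
print(undetected)  # None, i.e. '#' in the table
```

Because the scale is exponential, a one-cycle difference in ΔCt halves or doubles the reported value, which is why expression levels in Table 2 span several orders of magnitude.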
Under heat stress, both StPEBP14 and StPEBP15 were down-regulated on Day 1, Day 4 and Day 7, whereas StPEBP3 (StPEBP2), StPEBP7/StPEBP8 (formerly StSP5G genes) and StPEBP10 were up-regulated in potato leaves on Day 7 (Table 2). StPEBP15 remained the most active gene in leaves under heat stress even though its activity was down-regulated (Table 2). In our transcriptome-wide analysis of leaves of ‘Russet Burbank’ plants using second-generation (Illumina) sequencing, heat stress decreased the expression of StPEBP5, StPEBP9, StPEBP14 and StPEBP15 (Table S4).
Note: ‘#’ means that the difference in expression level is not significant. ‘RB1’ refers to our previous study of expression levels in ‘Russet Burbank’ leaves after three days of heat [33]; ‘Agria’ refers to gene expression in leaves of whole plants under heat stress [42]; ‘Desiree’ refers to gene expression levels in leaves [21]; ‘RB2’ refers to our previous study of heat-stressed ‘Russet Burbank’ tubers (the Illumina read sequences are available under NCBI BioProject ID PRJNA578671; under publication elsewhere).
Expression patterns of StPEBPs in heat-stressed roots
In the control plants, the activities of all 10 genes or groups of genes for which specific primers were designed were detectable by qRT-PCR in the roots (Table 2). StPEBP11, whose expression was not detectable in leaves, was the most active StPEBP gene in roots (Table 2).
Under heat stress, the expression of StPEBP10 in roots was increasingly up-regulated as the stress was prolonged, and it peaked on Day 7 of the heat treatment. StPEBP4 was down-regulated on Day 4 after the start of the heat treatment. The expression of the other StPEBP genes remained detectable in heat-stressed roots, but heat stress did not induce significant changes (Table 2).
Expression patterns of StPEBP genes in heat-stressed tubers
In the heat-stressed tuber transcriptome of the potato cultivar ‘Russet Burbank’, DESeq analysis showed that the expression of StPEBP9 (PGSC0003DMT400060057, formerly StSP6A) appeared to be mildly down-regulated in response to heat stress (log2FC = -1.5) (Table S4). We did not find any other StPEBP gene to be differentially expressed in response to heat stress in the whole transcriptome.
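To put the reported log2 fold change on a linear scale, the value can be converted as sketched below. The sketch assumes that the fold change is expressed as heat-stressed relative to control, which is the usual reading of a negative value described as down-regulation; the function name is hypothetical.

```python
def ratio_from_log2fc(log2fc):
    """Convert a log2 fold change to a linear treated/control ratio."""
    return 2.0 ** log2fc


ratio = ratio_from_log2fc(-1.5)   # heat-stressed / control (assumed direction)
fold_decrease = 1.0 / ratio       # how many times lower than the control

print(f"ratio = {ratio:.3f}, about {fold_decrease:.2f}-fold lower")
```

A log2FC of -1.5 therefore corresponds to roughly 35% of the control expression level.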