Genome-wide identification of PEBP gene family members in potato, their phylogenetic relationships, and expression patterns under heat stress

The phosphatidylethanolamine-binding protein (PEBP) gene family is involved in regulating many plant traits. Genome-wide identification of PEBP members and knowledge of their responses to heat stress may assist genetic improvement of potato (Solanum tuberosum). We identified PEBP gene family members from both the recently updated, long-reads-based reference genome (DM v6.1) and the previous short-reads-based annotation (PGSC DM v3.4) of the potato reference genome and characterized their heat-induced gene expression using RT-PCR and RNA-Seq. Fifteen PEBP family genes were identified from DM v6.1, named StPEBP1 to StPEBP15 based on their locations on six chromosomes, and classified into FT, TFL, MFT, and PEBP-like subfamilies. Most of the StPEBP genes were found to have conserved motifs 1 to 5. Tandem or segmental duplications were found between StPEBP genes in seven pairs. Heat stress induced opposite expression patterns of certain FT and TFL members, but different members were involved in leaves, roots, and tubers. The long-reads-based genome assembly and annotation provides a better genomic resource for identification of PEBP family genes. Heat stress tends to decrease FT gene activities and increase TFL gene activities, but this opposite expression involves different FT/TFL pairs in leaves, roots, and tubers. This tissue-specific expression pattern of PEBP members may partly explain why different potato organs differ in their sensitivities to heat stress. Our study provides candidate PEBP family genes and relevant information for genetic improvement of heat tolerance in potato and may help understand heat-induced responses in other plants.


Introduction
Climate change has increased the frequency and extent of heat stress conditions for plants and animals, including crop plants. Potato (Solanum tuberosum) plants are very susceptible to heat stress, which can induce many changes in plant growth, especially tuber growth [1,2]. Plant morphological structure, flowering, and various other biological processes, including seasonal gene expression patterns, are known to be regulated by phosphatidylethanolamine-binding protein (PEBP) gene family members [3,4,5,6]. Therefore, identification and characterization of PEBP gene family members is necessary for understanding functional plant biology and the effects of abiotic stress on plants.
The PEBP gene family, generally divided into three subfamilies (FT for flowering locus T, TFL for terminal flower, and MFT for mother of FT and TFL) [7,8], has been identified in various plants, including Arabidopsis thaliana [9], maize (Zea mays) [10], rice (Oryza sativa) [11], tomato (Solanum lycopersicum) [12,13], wheat (Triticum spp.) [14], Chrysanthemum [15] and blueberry (Vaccinium sp.) [16]. Some studies also identified a PEBP-like subfamily in plants [14,17]. StSP6A, a PEBP family gene in potato, is an FT-like gene that likely plays a key role in the initiation of tuber formation and tuber expansion in potato [18]. StCEN1, which belongs to the TFL subfamily, is associated with the signaling of abscisic acid and cytokinin, and regulates the bud growth rate after the release of tuber dormancy in potato [19]. Overexpression of StCEN1 can also inhibit tuber formation [20]. According to quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis of certain PEBP family genes, heat-caused changes in PEBP family gene expression appeared to delay flowering in chrysanthemum [15] and are likely responsible, to a certain degree, for the inhibition of potato tuberization [21]. Therefore, identification and characterization of PEBP gene family members and their expression patterns in potato, especially in response to heat stress, is important for understanding potato's response to climate change conditions. It is still unclear which members of the PEBP gene family are most active in the major organs (such as leaves, roots, and tubers) of potato plants. The root system is very important for potato plant performance under heat stress, but very little information is available on the comparative expression of PEBP family genes in roots under either ambient or heat stress conditions.
The phylogenetic relationships of PEBP family genes among various plant species, including potato, have been reviewed [12], but the potato genes in that review did not have an annotated potato gene number (PGSC, for Potato Genome Sequencing Consortium) and apparently were from a preliminary version of the potato reference genome [22], and the review did not report the method used to identify these genes [12]. Furthermore, the previous version of the potato genome was assembled using short reads and represented only 86% of the 844 Mb genome of the homozygous doubled-monoploid (DM) potato clone [22]. An updated version (DM v6.1) of the potato genome of the same doubled-monoploid clone DM1-3 516 R44 became available in 2020; it is based on Oxford Nanopore Technologies long reads coupled with proximity-by-ligation scaffolding, yielding a chromosome-scale assembly [23]. Thus, genome-wide analysis of this new version of the potato genome is required to identify all of the PEBP family genes and to compare them with the PEBP family genes reported previously.
In the present study, we analyzed the protein database (proteins for all coding sequences and representative proteins) of the new genome version (DM v6.1) and the previous version (DM v3.4) and compared the PEBP family genes identified from the gene annotation in both the genome versions as well as with the potato PEBP family genes reported in the literature. The purpose was to identify the members of the PEBP gene family in potato from the updated potato reference genome [23], characterize their phylogenetic relationships/classification and conserved motifs, and analyze their gene expression patterns in different tissues in response to heat stress.

Plant material
Plantlets of the potato cultivar 'Atlantic' were used in the qRT-PCR analysis at the Gansu Agricultural University. The cultivar 'Russet Burbank' was used for the whole transcriptome analysis of both leaves and tubers of potato plants under heat stress at the Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, Canada.

Identification of PEBP family genes in potato
We used the hidden Markov model (HMM) method [24] to identify PEBP gene family members from DM_1-3_516_R44_potato.v6.1.hc_gene_models.pep.fa of the DM v6.1 potato genome. The target protein domain was PBP, and the Pfam database number was 'PF01161' [25]. In order to improve the accuracy of the identification, we used the new annotation data DM v6.1 of the potato genome (http://solanaceae.plantbiology.msu.edu) [23], and searched for potential family members from the potato protein sequence files using the hmmsearch tool. The sequences of potential PEBP proteins were submitted to SMART [26] and NCBI CDD (https://www.ncbi.nlm.nih.gov/cdd/) to search for the PBP domains. PEBP family members with the Pfam PBP domain were used for further analysis.
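The domain search described above can be post-processed with a few lines of Python. The following is a minimal sketch, assuming hmmsearch was run with its `--tblout` option (for example `hmmsearch --tblout hits.tbl PF01161.hmm proteins.fa`, where the file names are placeholders). The E-value cutoff of 1e-5 and the two example hits are illustrative assumptions; the cutoff actually used in this study is not stated.

```python
from typing import Iterable, List, Tuple


def parse_tblout(lines: Iterable[str],
                 max_evalue: float = 1e-5) -> List[Tuple[str, float, float]]:
    """Return (target, full-sequence E-value, bit score) for hits passing the cutoff.

    In the hmmsearch --tblout format, whitespace-separated column 1 is the
    target name, column 5 the full-sequence E-value and column 6 the bit score.
    The default cutoff is an assumption, not a value taken from the paper.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue, score = fields[0], float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, evalue, score))
    # Most significant hits first
    return sorted(hits, key=lambda h: h[1])


# Hypothetical table rows (protein IDs and values are invented for illustration)
example = """\
# target name  accession  query name  accession  E-value  score  bias
protA.1  -  PBP  PF01161.21  1.2e-30  105.3  0.1  1.5e-30  105.0  0.1  1.0  1  0  0  1  1  1  1  -
protB.1  -  PBP  PF01161.21  0.02  12.1  0.0  0.03  11.5  0.0  1.0  1  0  0  1  1  1  1  -
""".splitlines()

hits = parse_tblout(example)
for target, evalue, score in hits:
    print(target, evalue, score)
```

Candidates retained by such a filter would still need the SMART and NCBI CDD domain checks described above.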
The annotation information from the SolTub_3.0.44 version of the genome annotation has been used for a long time, and many previous studies and open databases are based on this version. Therefore, we also analyzed PGSC_DM_v3.4_pep_representative.fasta (amino acid sequences corresponding to the representative CDS file) and PGSC_DM_v3.4_pep.fasta (amino acid sequences corresponding to all gene coding sequences) of the DM v3.4 annotation version of the DM v4.04 potato reference genome. We identified the PEBP gene family members in the PGSC gene database of the SolTub_3.0.44 genome annotation by using the same method as outlined above.
All the PEBP family genes identified from the annotation information of the potato genome version DM v6.1 were named based on their chromosome locations and given the prefix "St" to represent Solanum tuberosum. The protein sequences of the identified members of the PEBP gene family were submitted to the Expasy website (https://web.expasy.org/) to calculate the theoretical pI and molecular weight [27] of the proteins encoded by these StPEBP genes. The UniProt [28] and STRING [29] online tools were used to annotate the identified StPEBP genes by protein sequence alignment, and the KOBAS website [30] was used to perform GO and KEGG annotations.
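The position-based naming can be illustrated with a short Python sketch. It assumes that genes are ordered first by chromosome number and then by start coordinate; the tie-breaking rule within a chromosome is an assumption, and the gene IDs and coordinates below are hypothetical.

```python
from typing import Dict, List, Tuple


def name_by_position(genes: List[Tuple[str, int, int]],
                     prefix: str = "StPEBP") -> Dict[str, str]:
    """Map gene IDs to names numbered by (chromosome, start position).

    genes: list of (gene_id, chromosome_number, start_bp) tuples.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    return {gene_id: f"{prefix}{i}"
            for i, (gene_id, _, _) in enumerate(ordered, start=1)}


# Hypothetical input; real IDs and coordinates would come from the DM v6.1 annotation
genes = [("gene_c", 3, 500), ("gene_a", 1, 900), ("gene_b", 1, 100)]
names = name_by_position(genes)
print(names)
```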

Chromosomal location, phylogenetic analysis and motif detection of the PEBP family genes in potato
The location of the PEBP gene family members on potato chromosomes was extracted from the DM v6.1 annotation of the potato genome [23]. Then MapChart software was used to conduct mapping of these genes on individual chromosomes.
To study the divergence and evolutionary relationships of the potato PEBP family genes with one another and with those from other plant species, we constructed a phylogenetic tree from the protein sequences of these StPEBP genes and the PEBP family genes of other species, including Arabidopsis thaliana [9], rice (Oryza sativa ssp. japonica) [11], apple (Malus × domestica Borkh.) [31] and tomato (Solanum lycopersicum) [12]. The PEBP protein sequences of the five species, including potato, were aligned by the ClustalW method (default parameters) in MEGA7. The Neighbor-Joining (NJ) method was used to construct the phylogenetic tree with the following settings: 'test of phylogeny' as 'bootstrap method 1000', 'model / method' as 'Poisson model', 'rates along sites' as 'uniform rates' and 'gaps / missing data treatment' as 'partial deletion'.
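The distance underlying a Neighbor-Joining tree built with the Poisson model is the Poisson-corrected distance, d = -ln(1 - p), where p is the proportion of differing amino acid sites between two aligned sequences. The Python sketch below illustrates this calculation together with a simple partial-deletion filter. It is not MEGA7's implementation: the 95% site-coverage cutoff and the toy sequences are assumptions made for illustration.

```python
import math
from typing import List

GAP = "-"


def filter_sites(seqs: List[str], min_coverage: float = 0.95) -> List[str]:
    """Keep alignment columns in which at least min_coverage of sequences are not gaps."""
    n = len(seqs)
    keep = [j for j in range(len(seqs[0]))
            if sum(s[j] != GAP for s in seqs) / n >= min_coverage]
    return ["".join(s[j] for j in keep) for s in seqs]


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned sequences.

    Positions where either sequence has a gap are ignored for that pair.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf  # distance is undefined when every site differs
    return -math.log(1.0 - p)


# Toy alignment (invented sequences)
aligned = filter_sites(["MK-LA", "MKVLA", "MK-IA"])
d = poisson_distance(aligned[0], aligned[2])
print(aligned, round(d, 4))
```

A full matrix of such pairwise distances is what the Neighbor-Joining algorithm takes as input.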
Highly similar genes were classified into the same group/subfamily based on the phylogenetic tree grouping of the PEBP family genes identified from the DM v6.1 database and the PGSC (DM v3.044) database. Each of the PEBP family genes in DM v6.1 was used to design primers for the analysis of the expression patterns of PEBP family genes. The sequences of PEBP proteins from Arabidopsis thaliana, tomato and potato were submitted to the MEME website (https://meme-suite.org/) for motif analysis, with 'select the number of motifs' set to 10 and other parameters left at their defaults. The result was imported into TBtools to generate the motif distribution map.

Multiple collinearity analysis for duplication of StPEBP genes and comparison with Arabidopsis thaliana PEBP family genes
MCScanX [32] was used to elucidate the segmental duplication events and synteny of the PEBP family genes between potato and Arabidopsis. MEGA7 [33] and KaKs Calculator 2.0 [34] were used for the estimation of Ka (non-synonymous)/Ks (synonymous) ratios among the StPEBP genes. The Ka/Ks values and divergence between AtPEBP and StPEBP genes were calculated using the same method to further estimate the evolutionary relationship between PEBP family genes of the two species. The criteria for inferring tandem duplications between PEBP family genes followed Liu et al. [35]: (a) the length of the shorter aligned sequence covered 70% or more of the longer sequence, (b) the similarity of the two aligned sequences was 70% or higher, and (c) the two genes were located in the same chromosomal fragment, with a maximum distance of 100 kb and separated by five or fewer genes. Criteria (a) and (b) originate from Yang et al. [36], and criterion (c) from Wang et al. [37].
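The three tandem-duplication criteria can be written as a single check. The Python sketch below is one possible reading of criteria (a) to (c): coverage is taken as the aligned length divided by the length of the longer sequence, and "same chromosomal fragment" is simplified to "same chromosome". The class, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GenePair:
    aligned_len: int       # aligned length of the shorter sequence
    longer_len: int        # length of the longer sequence
    similarity_pct: float  # similarity of the two aligned sequences, in percent
    chrom_a: str
    chrom_b: str
    distance_bp: int       # physical distance between the two genes
    genes_between: int     # number of genes separating the pair


def is_tandem_duplicate(pair: GenePair) -> bool:
    """Apply criteria (a) coverage >= 70%, (b) similarity >= 70%, (c) proximity."""
    coverage_ok = pair.aligned_len / pair.longer_len >= 0.70
    similarity_ok = pair.similarity_pct >= 70.0
    proximity_ok = (pair.chrom_a == pair.chrom_b
                    and pair.distance_bp <= 100_000
                    and pair.genes_between <= 5)
    return coverage_ok and similarity_ok and proximity_ok


# Invented example pairs
close_pair = GenePair(160, 175, 88.0, "chr05", "chr05", 12_000, 1)
distant_pair = GenePair(160, 175, 88.0, "chr05", "chr11", 0, 0)
print(is_tandem_duplicate(close_pair), is_tandem_duplicate(distant_pair))
```

Pairs that meet the similarity criteria but fail the proximity test, such as the second example, would be candidates for segmental rather than tandem duplication.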

Expression analysis of PEBP family genes under control and heat stress conditions
We used leaves and roots of plantlets of cv. Atlantic grown in hydroponic conditions to study the expression pattern of StPEBP gene family members under heat stress. First, the plantlets of 'Atlantic' were transferred into glass bottles (6 cm × 15 cm) and cultured on solid MS medium (3% sucrose). The environmental parameters were set to 22 °C and 16 h of light per day with a light intensity of 10,000 lux. After 30 days of culture, the aseptically grown seedlings were transferred to a hydroponic tank containing Hoagland culture medium. In the first week of hydroponic culture, the plants were covered with a transparent water cup to maintain the air humidity around the plants and avoid wilting due to strong leaf transpiration. After this week-long acclimation, the water cup was removed, and the culture conditions were set to 23 °C with a light intensity of 10,000 lux for 16 h per day. Each hydroponic tank contained eight liters of Hoagland medium and 12 potato plants. The medium was changed every three days.
After 45 days of hydroponic culture, the seedlings were treated with high temperature. The heat stress treatment was 35 °C during the day (light) and 28 °C at night (dark). The hydroponic solution was also changed on the third day of the heat stress treatment. Samples were taken at Day 0, Day 1, Day 4, and Day 7 of the heat stress treatment; the Day 0 samples were taken before the beginning of the heat stress treatment and served as the control without heat stress. The samples included fully opened functional leaves and roots. All samples comprised three biological replicates, i.e., from three different potato plants. The sampled leaves were the fifth to seventh fully expanded leaves counted from the top, and 'root' refers to the relatively new growing part of the root, accounting for approximately ½ of the root length. The tissues were immediately wrapped in tin foil and frozen in liquid nitrogen for ten minutes. The tissue samples were then stored in a -80 °C freezer for RNA analysis.

The TransZol Up kit (TransGen Biotech) was used for total RNA extraction. The TransScript One-Step gDNA Removal and cDNA Synthesis SuperMix kit was used for cDNA synthesis. Real-time quantitative PCR was used to examine the expression levels of StPEBP genes before (Day 0, control) and after 1, 4 and 7 days of heat treatment. Real-time quantitative PCR primers were designed using NCBI Primer-BLAST and the qPrimerDB website [38].
Real-time quantitative PCR (qRT-PCR) was performed using TransStart Top Green qPCR SuperMix and the Roche LightCycler 96 system. The three-step procedure was as follows: denaturation at 95 °C for 10 s, annealing at 57 °C for 20 s, and extension at 72 °C for 30 s, for 50 cycles. The ΔCT between each StPEBP gene and the internal reference gene was calculated for each of the three biological replicates and standardized as 2^(−ΔCT) × 100,000, and the average value of the three biological replicates was used to represent the final expression level of the gene.
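The standardization above can be expressed in a few lines of Python. This is a minimal sketch in which ΔCT is taken as the CT of the target gene minus the CT of the internal reference gene; the function names and CT values are hypothetical and are not data from this study.

```python
from typing import List, Tuple


def relative_expression(ct_target: float, ct_reference: float,
                        scale: float = 100_000) -> float:
    """Standardized expression 2^(-dCT) * scale, with dCT = CT(target) - CT(reference)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct) * scale


def mean_expression(replicates: List[Tuple[float, float]],
                    scale: float = 100_000) -> float:
    """Average the standardized expression over biological replicates.

    replicates: list of (CT of target gene, CT of reference gene) pairs.
    """
    values = [relative_expression(t, r, scale) for t, r in replicates]
    return sum(values) / len(values)


# Invented CT values for three biological replicates
single = relative_expression(25.0, 20.0)
average = mean_expression([(25.0, 20.0), (24.0, 20.0), (26.0, 20.0)])
print(single, average)
```

Because the scale is exponential, a one-cycle difference in ΔCT corresponds to a two-fold difference in the standardized value.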
IBM SPSS Statistics 22 was used to run one-way ANOVA and Duncan's test on the qRT-PCR data from each organ (leaves and roots) for all StPEBP genes to identify which StPEBP genes were the most active. The significance level was set at P < 0.05. Student's t-test (P < 0.05, two-tailed) was also used to determine whether the expression level of each gene was significantly increased or decreased in comparison with the expression level of the same gene in the same organ sampled from the Day 0 plants (immediately prior to the start of the heat treatment).

Transcriptome-wide differential expression analysis of StPEBP genes in "Russet Burbank" leaves and tubers in response to heat stress
We used the Illumina whole transcriptome sequences of the control and heat-stressed plants and tubers of cultivar Russet Burbank to examine the expression patterns of the identified StPEBP genes in response to heat stress. The methods used for RNA extraction, library construction, Illumina sequencing, read cleaning, mapping and the DESeq test of both control plants and heat-treated plants were described previously [39]. RNA sequencing was performed using the Illumina HiSeq4000 PE-100 bp platform. Those studies were conducted to identify and characterize transcriptome-wide differentially expressed genes in response to heat stress. In the present study, we used that transcriptome resource to evaluate the expression patterns of the StPEBP genes specifically. The unimap pipeline of TBSPG [40] was used to trim off the primers, remove poor-quality reads and map the reads against the PGSC_DM_v4.03_transcriptupdate representative.fasta database. The differential gene expression analysis for the StPEBP genes was performed using DESeq [41]. The Illumina sequencing and analysis of tuber RNA have been described in detail previously [42], and the read sequences of tubers are available in the short read database of NCBI (BioProject ID: PRJNA578671).

Identification of the PEBP gene family in potato
Fifteen PEBP family genes (StPEBP1-StPEBP15; Table 1 Top panel) were identified in the recently updated version DM v6.1 of the potato reference genome [23] using the hidden Markov model (HMM) method [24] to search for the PBP domain of PEBP family members with the Pfam database number 'PF01161' [25]. The 15 StPEBP genes were named according to their chromosomal locations, starting from chromosome 1 (Fig. 1A). They were located on chromosomes 1 (three genes), 3 (three genes), 5 (three genes), 6 (one gene), 9 (three genes) and 11 (two genes) (Fig. 1). The protein sequences of these 15 StPEBP genes were validated with the SMART (http://smart.embl.de/) and NCBI CDD (https://www.ncbi.nlm.nih.gov/cdd/) tools. SMART confirmed that StPEBP3, StPEBP5, StPEBP6, StPEBP9, StPEBP14 and StPEBP15 had the PBP domain, and NCBI CDD assigned all 15 StPEBP genes from DM v6.1 to the PEBP superfamily. The theoretical isoelectric point (pI) was lowest in PEBP-like proteins (5.21 for StPEBP13; 5.53 for StPEBP12) and FT proteins (from 5.74 for StPEBP7 to 6.9 for StPEBP15), and varied greatly in TFL proteins (from 5.73 for StPEBP1 to 9.07 for StPEBP3) (Table 1 Top panel). PEBP-like proteins had the lowest molecular weight, but considerable variation in molecular weight was observed among the TFL proteins (Table 1 Top panel). The correspondence between the potato PEBP family genes identified in our study and the names of potato PEBP family genes published in the literature is presented in Table 1 Top panel.

Gene duplication and synteny
Duplication events for seven pairs of the PEBP family genes were identified using MCScanX (Table 1 Bottom panel). Two pairs of StPEBP genes (StPEBP2 and StPEBP3; StPEBP7 and StPEBP8) were identified as tandemly duplicated (Table 1 Bottom panel). The remaining five pairs were identified as segmental duplication events (Table 1 Bottom panel). These non-tandem duplications were not on the same chromosomes (Table 1 Bottom panel, Fig. 1B). The genes involved in these five non-tandem duplications were not located in the highly replicated regions (Fig S1). The Ka/Ks ratio was 0.9 for the pair StPEBP8 and StPEBP7, suggesting nearly neutral selection, but all other six pairs of genes had Ka/Ks ratios less than 0.2, suggesting the existence of stabilizing selection against change (...) (Table S1).
Table 1 notes: ... [22]. Similarly for v3.4 mRNA transcripts and proteins: P400020755 is the abbreviation of PGSC0003DMP400020755 (from PGSC_DM_v3.4_pep_representative.fasta), and T400030575 is the abbreviation of PGSC0003DMT400030575 (from PGSC_DM_v3.4_pep_representative.fasta). DM_1-3_516_R44_potato.v6.1.hc_gene_models.pep.fa was also analyzed, but the results were identical to the analysis of the representative proteins except that the all-protein sequence database had some duplicates. The gene IDs of BLASTn-detected homologs with 'Sotub' were according to Abelenda et al. [12]. The correspondence between the StPEBP 'Gene ID in SolTub ID' was identified using NCBI-BLASTn. 'pI' means theoretical isoelectric point and 'MW' means molecular weight [27]. 'PEBP-like' in Function: 'Phosphatidylethanolamine binding protein'. StPEBP12, StPEBP13 and StPEBP15 of the present study were not in the 15-gene list of Abelenda et al. [12].
We also compared the PEBP family genes identified from the DM v6.1 reference genome and from the DM v3.4 reference genome. From the analysis of the ... StPEBP2, StPEBP4, StPEBP7, StPEBP8, StPEBP13) were found in the DM v6.1 gene/protein sequence database but not in the DM v3.4 gene/protein sequence database. We also analyzed the PGSC_DM_v3.4_pep.fasta database and classified 10 transcripts/proteins as PEBP, but two of the transcripts (PGSC0003DMT400037142 and PGSC0003DMT400037143) or proteins (PGSC0003DMP400025226 and PGSC0003DMP400025227) were from the same gene (PGSC0003DMG400014322). Therefore, only 9 genes were annotated as PEBP family genes in the DM v3.4 version of the reference genome, and 6 of the 15 PEBP family genes we identified were not annotated in that former version.

Comparison between the PEBP family genes identified in the present study and the 15 PEBP family genes in the literature
Four (StPEBP1, StPEBP12, StPEBP13, and StPEBP15; Table 1 Top panel) of the 15 StPEBP genes identified in the present study were novel compared to the genes in the previous study [12]. In the BLASTn search for these four genes, StPEBP1 had only 74% coverage with Sotub016190 CEN1 [12]; StPEBP12 and StPEBP13 did not correspond directly to any of the former 15 PEBP family genes [12]; and StPEBP15 had only 79.82% identity with the previous gene Sotub11g010050 [12] (Table 1 Top panel).

KEGG and gene ontology (GO) analysis of the 15 StPEBP genes identified in the present study
The KEGG analysis indicated that the StPEBP4, StPEBP7, StPEBP8, StPEBP9, StPEBP14 and StPEBP15 genes are related to the circadian rhythm of plants (sot04712, circadian rhythm-plant). GO annotation identified StPEBP3 and StPEBP5 as regulators of plant flowering (GO: 0009908, flower development; GO: 0010228, vegetative to reproductive phase transition of meristem), and StPEBP6 as a regulator of ABA response and seed germination (GO: 0009737, response to abscisic acid; GO: 0010030, positive regulation of seed germination). StPEBP11 was annotated with nucleus (GO: 0005634), cytoplasm (GO: 0005737), negative regulation of flower development (GO: 0009910), and vegetative to reproductive phase transition of meristem (GO: 0010228).

Phylogenetic analysis and motif detection of the PEBP family genes in potato
The neighbor-joining phylogenetic tree of 59 PEBP family genes from potato, tomato, Arabidopsis, rice, and apple, constructed using the MEGA7 software, revealed the relationships of potato PEBP proteins to those in other model plants (Fig. 2A). The results suggested that the 15 StPEBP genes could be classified into four subfamilies (Fig. 2A). StPEBP4, StPEBP7, StPEBP8, StPEBP9, StPEBP14 and StPEBP15 were classified in the FT subfamily. StPEBP1, StPEBP2, StPEBP3, StPEBP5, StPEBP10, and StPEBP11 were closely related to AtTFL1 (TFL subfamily). StPEBP6 was closely related to AtMFT and could be classified in the MFT subfamily. StPEBP12 and StPEBP13 were far away from the other StPEBP genes in the phylogenetic tree, and they were classified into the PEBP-like subfamily (Fig. 2A). This was consistent with the annotation results of UniProt and STRING.
The phylogenetic tree constructed from 25 StPEBP protein sequences from potato, 15 from the DM v6.1 data and 10 from the SolTub_3.0.44 data (Fig. 2B), clarified the relationships between the genes/gene names in the present study and those in the previous version SolTub_3.0.44. StPEBP1, StPEBP3, and StPEBP5 were found to be homologous to PGSC0003DMT400030582, PGSC0003DMT400030575, and PGSC0003DMT400037143, respectively. These three genes were annotated as StCEN1. StPEBP10 and StPEBP11 were found to be close to PGSC0003DMT400018307 and StSP9D (PGSC0003DMT400090526), respectively. StPEBP2 was partly homologous to PGSC0003DMT400030575. All six of these genes (StPEBP1, StPEBP5, StPEBP2, StPEBP3, StPEBP10, and StPEBP11) are members of the TFL subfamily. The StPEBP4, StPEBP9, StPEBP7, StPEBP8, StPEBP14, and StPEBP15 genes were found to belong to the FT subfamily, and their closest homologues were FLOWERING LOCUS T protein (PGSC0003DMT400060057, StSP6A) and FLOWERING LOCUS T (PGSC0003DMT400041725, PGSC0003DMT400041726). StPEBP6 of the MFT subfamily was found to be homologous to StMFT (PGSC0003DMT400014409). StPEBP12 and StPEBP13 were closely related to phosphatidylethanolamine binding protein (PGSC0003DMT400016032).
Ortholog analysis of PEBP family genes between Arabidopsis and potato detected that the potato genes StPEBP5, StPEBP6, and StPEBP10 shared homologous relationships with the Arabidopsis thaliana genes AtBFT, AtMFT, and AtATC, respectively (Fig. 2C).

Expression patterns of StPEBP genes in heat-stressed leaves
Real-time quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) analysis of the expression levels of StPEBP genes in leaves, using their specific primers (Table S3), identified the relative activities (expression levels) of the genes in leaves and the genes that were differentially expressed between Day 0 (control) and treated plants (Fig. 4A). The cDNA sequences were too similar to have separate specific primers between StPEBP2 and StPEBP3 (both were TFL genes), between StPEBP7 and StPEBP8 (both were FT genes), and between StPEBP12 and StPEBP13 (both were PEBP-like genes).
Fig. 2 A Neighbor-joining phylogenetic tree of 59 PEBP proteins from Arabidopsis, tomato, apple, rice and potato. Purple, blue, red, yellow and green represent genes from tomato, potato, Arabidopsis, rice, and apple, respectively. B Neighbor-joining phylogenetic tree of 25 potato PEBP proteins (15 from our current study and 10 identified earlier) to examine the correspondence and relationships between the 15 PEBP family genes from the DM v6.1 genome and the 10 PGSC genes from the SolTub_3.0.44 genome assemblies. Red and blue represent genes identified in the DM v6.1 data and the SolTub_3.0.44 data, respectively. C Comparative orthologous relationships of PEBP family genes from Arabidopsis thaliana and potato
... Table S2 for these ten motifs
Fig. 4 Expression levels of 10 StPEBP genes in response to heat stress treatment in potato leaves and roots as revealed by qRT-PCR analysis. A: Leaves. B: Roots. Note: "StPEBP3" primers may amplify transcripts of the StPEBP1, StPEBP2 and StPEBP3 genes because of the high similarity among these three sequences. A similar situation existed between StPEBP7 and StPEBP8 and also between StPEBP12 and StPEBP13. 'Expression' represents the average value of 2^(−ΔCT) × 100,000 of three biological replicates. * and **: The expression level was significantly different (P < 0.05 and P < 0.01, respectively) from Day 0 of the same gene in the same organ
None of these three groups of similar genes showed high activities in the qRT-PCR analysis in leaves (Fig. 4A). At Day 0, StPEBP15 was the most active (most highly expressed) of the 15 StPEBP genes in leaves. StPEBP14 was the second most active gene in leaves. The expression levels of StPEBP11 and StPEBP12 (together with StPEBP13) were too low (CT > 35) to be detected both in Day 0 control leaves and in Day 1 and Day 4 heat-stressed leaves (Fig. 4A).
Under heat stress, both StPEBP14 and StPEBP15 were down-regulated on Day 1, Day 4, and Day 7, but StPEBP3 (StPEBP2), StPEBP7/StPEBP8 (formerly StSP5G genes) and StPEBP10 were up-regulated in potato leaves on Day 7 (Fig. 4A). StPEBP15 was still the most active gene in leaves among all genes under heat stress even though its activity was down-regulated (Fig. 4A). In our transcriptome-wide analysis of gene expression in leaves of the 'Russet Burbank' plants using second-generation sequencing (Illumina), heat stress decreased the expression levels of StPEBP5, StPEBP9, StPEBP14, and StPEBP15 (Table S4).

Expression patterns of StPEBP genes in heat-stressed roots
In the control plants, activities of all 10 genes or groups of genes that had specific primers designed were detectable by qRT-PCR in the roots (Fig. 4B). The gene StPEBP11, whose expression was not detectable in leaves, was the most active StPEBP gene in roots (Fig. 4B).
Under heat stress conditions in roots, the expression of StPEBP10 was increasingly upregulated as the stress was prolonged, and it reached its peak on Day 7 of the heat treatment. The gene StPEBP4 was found to be downregulated on Day 4 after the start of the heat treatment. The expression of other StPEBP genes was still detectable in heat-stressed roots, but heat stress did not induce significant changes (Fig. 4B).

Expression patterns of StPEBP genes in heat-stressed tubers
In the heat-stressed tuber transcriptome of the potato cultivar 'Russet Burbank', DESeq analysis showed that the expression of StPEBP9 (PGSC0003DMT400060057, formerly StSP6A) appeared to be mildly downregulated in response to heat stress (log2fc = -1.5) (Table S4). We did not find other StPEBP genes to be differentially expressed in response to heat stress in the whole transcriptome.
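For readers less used to log2 fold changes, the value reported above can be converted to a linear expression ratio. The sketch below assumes that log2fc denotes log2(heat-stressed/control), which is the usual convention but is not stated explicitly here; the function name is illustrative.

```python
def fold_change(log2fc: float) -> float:
    """Convert a log2 fold change into a linear expression ratio."""
    return 2.0 ** log2fc


ratio = fold_change(-1.5)          # about 0.354 of the control level
percent_decrease = (1.0 - ratio) * 100.0
print(round(ratio, 3), round(percent_decrease, 1))
```

Under that assumption, a log2fc of -1.5 corresponds to expression in heat-stressed tubers at roughly one third of the control level.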

Discussion
Identification of StPEBP genes revealed considerable differences between two versions (DM v3.4 and DM v6.1) of the potato genome annotations for the PEBP family genes. We have identified 15 StPEBP genes in potato based on the latest potato genome annotation, DM v6.1 (http://solanaceae.plantbiology.msu.edu/dm_v6_1_download.shtml, visited on April 20, 2021). Only 9 of these 15 genes were found in the PGSC annotations (PGSC transcripts, DMT, or protein DMP or genes DMG). Furthermore, only 11 of the 15 PEBP family genes we identified in the latest version of the potato genome were found previously in potato based on the literature review. It is unknown how the 15 potato PEBP genes were identified previously [12], but only seven of them were officially annotated in the DMT and DMP databases of annotation version 3.4 of the potato reference genome DM v3.04 and DM v4.04 (http://solanaceae.plantbiology.msu.edu/pgsc_download.shtml) (Table 1 Top panel). Our results demonstrate that compared to DM v6.1, approximately 33% (5/15) of the PEBP family genes were absent in the DM v3.4 annotation of the reference genome of the same potato doubled monoploid DM genotype. Our comparative genomic analysis of the PEBP gene family provides a surprising example that third-generation long-read sequencing apparently is more powerful than the combined approach of BAC library and Illumina second-generation sequencing for genome decoding in potato.
The set of StPEBP genes identified in the present study is more complete because we identified several PEBP family genes not found in the previous genome versions, removed false PEBP family genes (no PBP domain) (Table S1), and removed duplicated transcripts from the same genome. Therefore, the 15 StPEBP genes that we identified likely represent the most accurate repertoire of StPEBP genes to date. It is a coincidence that the DMP (DM potato proteins annotated by PGSC) number was also 15 from the following three search or analysis approaches: (a) searching with the key word "PEBP", (b) the present study from the analysis of DM v6.1, and the previous reports [12], and (c) the present study from the analysis of the all-cds proteins database of the DM v3.4 annotation (Table 1 Top panel). It appears that the long-reads-based genome assembly and annotation provides a better genomic resource for identification of PEBP and perhaps other genes. We believe that this inconsistency between genome versions is due to differences in genome assembly and annotation, because two different sequencing technologies and platforms were used for the two versions of the potato genome [12]. The reference genome (DM v6.1) [23] published in 2020 is based on long reads and has a more complete and comprehensive annotation of the potato genome. Therefore, the 15 StPEBP genes we identified from the DM v6.1 annotation are likely more representative of the actual PEBP family genes in the DM potato clone. These genes provide an excellent genomic resource for various research and applications, especially for understanding how the PEBP family genes influence various traits in potato.
The phylogenetic analysis of 59 PEBP family genes from potato, tomato, Arabidopsis, rice and apple, together with the evolutionary relationships between StPEBPs and Arabidopsis PEBP family genes, suggests that the 15 StPEBP genes can be grouped into four subfamilies: FT, TFL, MFT and PEBP-like. We suggest that StPEBP4, StPEBP7, StPEBP8, StPEBP9, StPEBP14 and StPEBP15 belong to the FT subfamily. StPEBP7, StPEBP8, StPEBP14 and StPEBP15 were found to be close to the SP5G and SP5G-like genes of tomato and can be annotated as SP5G-like genes of potato. It is generally believed that StSP5G can inhibit tuber formation by regulating the expression of StSP6A [12]. On the other hand, StPEBP4 is close to tomato SP3D; the SP3D protein (or SFT protein) of tomato leaves can be transported through the phloem to the shoot apical meristem (SAM), where it forms the florigen activation complex (FAC) with 14-3-3 and SPGB proteins [12]. It is also known that StSP3D has a similar flowering function [18]. Our study indicates that StPEBP9 is highly similar to SlSP6A and StSP6A (PGSC0003DMT400060057) (Fig. 2A, B). Our results suggest that StPEBP1, StPEBP2, StPEBP3, StPEBP5, StPEBP10 and StPEBP11 belong to the TFL subfamily and that StPEBP6 belongs to the MFT subfamily; StPEBP6 was phylogenetically close to AtMFT. StPEBP12 and StPEBP13 are distant from the other StPEBP genes in the phylogenetic tree; thus, we assigned these two genes to the PEBP-like subfamily (Fig. 2A). PEBP-like genes may perform functions different from those of other PEBP family genes and are likely involved in photoperiod response and regulation of flowering in different cotton genotypes [17]. Little is known about the PEBP-like genes in potato. We also found that the conserved motifs of the PEBP-like proteins are very different from those of other PEBP family proteins, and these differences may provide some clue to the function of PEBP-like genes. Nevertheless, our study probably provides the first evidence for the existence of PEBP-like genes in potato.
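The subfamily assignments described above can be captured in a small lookup table, which makes it easy to check that each of the 15 genes is assigned exactly once. The Python sketch below is only a bookkeeping aid. The names SUBFAMILY_MEMBERS and subfamily_of are hypothetical and are not part of the original analysis.

```python
# Subfamily assignments as stated in the text.
# Integers are gene numbers, i.e. 4 stands for StPEBP4.
SUBFAMILY_MEMBERS = {
    "FT": [4, 7, 8, 9, 14, 15],
    "TFL": [1, 2, 3, 5, 10, 11],
    "MFT": [6],
    "PEBP-like": [12, 13],
}


def subfamily_of(gene_number: int) -> str:
    """Return the subfamily of StPEBP<gene_number> (hypothetical helper)."""
    for subfamily, members in SUBFAMILY_MEMBERS.items():
        if gene_number in members:
            return subfamily
    raise KeyError(f"StPEBP{gene_number} is not in the table")


# Consistency check: StPEBP1 to StPEBP15 should each appear exactly once.
all_members = sorted(n for members in SUBFAMILY_MEMBERS.values() for n in members)
assert all_members == list(range(1, 16))

# Number of genes per subfamily.
subfamily_sizes = {name: len(members) for name, members in SUBFAMILY_MEMBERS.items()}
print(subfamily_sizes)
```

For example, subfamily_of(9) returns "FT" for StPEBP9, the gene that is highly similar to StSP6A.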
The syntenic relationship of StPEBP5, StPEBP6, and StPEBP10 with three Arabidopsis PEBP family members suggests that these gene pairs may be orthologous (Fig. 2C). The phylogenetic tree also supports this suggestion, as these three genes were placed in clusters with Arabidopsis PEBP family genes (Fig. 2A). More noticeably, StPEBP6 and StPEBP10 clustered with PEBPs of both Arabidopsis and rice, whereas StPEBP7, StPEBP8, StPEBP14, and StPEBP15 were grouped in a separate cluster that consists of only potato and tomato PEBP family genes (Fig. 2A). These results suggest that StPEBP6 and StPEBP10 may have emerged earlier than StPEBP7, StPEBP8, StPEBP14, and StPEBP15 (Fig. 2A), a suggestion that merits further investigation. Our results also suggest that tandem duplication occurred in two pairs of StPEBP genes and segmental duplication in five pairs (Table 1, bottom panel).
Motif analysis in our study shows that five motifs (motifs 1-5) are conserved in all but two PEBP family genes in potato. The same motifs are also found in most of the PEBP genes in other plants. All six PEBP family genes in Arabidopsis thaliana have motifs 1-5 (Fig. 3), and more than 50% of tomato PEBP proteins also have them (Fig. 3). AtFT, SlSP3D and StPEBP4 (the potato homolog of SlSP3D), which are part of the florigen activation complex (FAC) [12], also have motifs 1-5. This indicates that motifs 1-5 are the most commonly conserved sequences of the PEBP proteins and perhaps the structural basis for the PEBP proteins to exert their biological functions, especially the regulation of flowering. StSP6A (PGSC0003DMT400060057) is an important gene for inducing tuber formation [12], and StPEBP9 is homologous to StSP6A (Fig. 2B). StPEBP9 contains all of the conserved motifs 1-5, whereas tomato SlSP6A (Solyc5g055660) was found to have only motifs 1, 3, 4, and 5 and lacks motif 2. This suggests that motif 2 may play a key role in tuber formation. Motif 7 was found only in StPEBP1, StPEBP2, and StPEBP3, which are annotated as StCEN1, and in two tomato CEN1 genes (Solyc01g009580, Solyc01g009560). This result suggests that motif 7 may be a core functional component of StCEN1 proteins in repressing tuberization.
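The comparison between StPEBP9 and SlSP6A amounts to a set difference over motif presence. The minimal Python sketch below uses only the two motif sets stated above, restricted to motifs 1-5. The variable and function names are illustrative, and in practice the motif calls would be taken from the motif analysis shown in Fig. 3.

```python
# Conserved motifs referred to as motifs 1-5 in the text.
CORE_MOTIFS = {1, 2, 3, 4, 5}

# Presence of motifs 1-5 as stated in the text for the two SP6A homologs.
# Motifs outside the 1-5 range are deliberately not encoded here.
motifs_present = {
    "StPEBP9": {1, 2, 3, 4, 5},  # potato
    "SlSP6A": {1, 3, 4, 5},      # tomato
}


def missing_core_motifs(gene: str) -> set:
    """Return the core motifs (1-5) that are absent from the given gene."""
    return CORE_MOTIFS - motifs_present[gene]


for gene in motifs_present:
    print(gene, "lacks core motifs:", sorted(missing_core_motifs(gene)))
```

The same set difference can be applied to any other family member once its motif composition is read from the motif analysis output.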
Gene expression analysis in the present study found that StPEBP14 and StPEBP15, two FT-like genes, were the most highly expressed StPEBP genes in leaves of young potato plants under control conditions, with expression about a thousand-fold higher than that of StPEBP9 (former name: StSP6A) (Fig. 4A). These results suggest that StPEBP14 and StPEBP15 may play an important role in receiving environmental signals and in regulating developmental stages of the potato plant. Transgenic and/or gene-editing studies may help determine the biological functions of these two genes.
Heat stress considerably changed the expression patterns of PEBP family genes in leaves, and we detected opposite expression patterns for some FT and TFL genes: two FT genes (StPEBP14 and StPEBP15) were downregulated, whereas two TFL genes (StPEBP3 and StPEBP10) were upregulated (Fig. 4A). These results are consistent with our previous transcriptome analysis of leaves of cv. Russet Burbank under heat stress [39]. In tomato, SlSP5G and SlSP5G-like are floral inhibitors, and their expression is controlled by photoperiod [27]. The potato FT genes StPEBP14 and StPEBP15 are homologs of SlSP5G and SlSP5G-like (Fig. 2A), whereas StPEBP10 (a TFL gene) is likely a negative regulator of flowering according to GO analysis (GO:0009910: negative regulation of flower development). StPEBP10 was also found to be a homolog of SlTFL1 (Solyc06g074350, SP/TFL1), which acts as a floral inhibitor and has a role in sympodial shoot architecture [12]. Arabidopsis FT is a flowering promoter, whereas AtTFL1 acts as a flowering repressor [18]. StCEN1 (PGSC0003DMT400037143) can repress tuber formation, whereas StSP6A (PGSC0003DMT400060057) is the core inducer of tuberization [20]. Within the FT group, the downregulation of StPEBP14 and StPEBP15 was accompanied by upregulation of StPEBP8, which suggests that there might be an activity switch among FT gene members under heat stress (Fig. 4A). Further research is required to investigate what roles these opposite expression patterns (StPEBP14 and StPEBP15 versus StPEBP3, StPEBP8, and StPEBP10) play in flower induction and tuberization under heat stress.
Our gene expression analysis in potato roots found that heat stress also induced opposite expression patterns between FT and TFL genes, but the gene members involved differed from those in leaves. In leaves, the opposite FT versus TFL pairs were StPEBP14 and StPEBP15 versus StPEBP3 and StPEBP10. In roots, heat stress downregulated the FT gene StPEBP4 and upregulated the TFL gene StPEBP10 on Day 4 of heat treatment (Fig. 4B). Unlike leaves, which had very high expression of StPEBP14 and StPEBP15, roots had detectable expression of 10 StPEBP genes under both non-stressed and heat-stressed conditions, but none of them was highly expressed (Fig. 4B). Heat stress is known to have negative effects on root growth and root activity [43,44]. The gene expression pattern observed in the present study agrees with published information that FT-like and TFL-like genes usually have antagonistic functions in plant growth and development [45]. Because the TFL gene StPEBP10 was significantly upregulated in potato roots, we infer that StPEBP10 may play a role in regulating root development under heat stress. Further research is required to understand the roles of these opposite expression patterns between TFL and FT genes in root growth and heat tolerance of potato plants.
Tuber gene expression under heat stress also showed decreased activity of an FT gene (StPEBP9) and increased activity of a TFL gene (StPEBP3) (Table S4). Interestingly, heat stress downregulated different FT gene members in different organs: StPEBP14 and StPEBP15 in leaves, StPEBP4 in roots, and StPEBP9 in tubers. Our whole-transcriptome expression profiling of control and heat-stressed tubers of cv. Russet Burbank suggests mild downregulation of StPEBP9 in response to heat stress. Microarray-based analysis of heat-stressed tubers of the cultivar 'Desiree' by Hancock et al. [21] also found that heat stress caused downregulation of this gene (fold change of -2.6) (Table S4). Considering the transcriptome data from these two independent experiments, we tentatively conclude that most of the StPEBP genes do not dramatically change their expression levels in potato roots and tubers in response to heat stress.
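The organ-specific FT/TFL pairs discussed in the preceding paragraphs can be tabulated with a short script. The Python sketch below encodes only the direction of change reported in the text for the genes named there (no magnitudes and no statistical testing). The dictionary layout and the function opposite_pairs are hypothetical and serve only to summarize the pattern.

```python
# Subfamily of each gene named in the discussion of heat responses.
SUBFAMILY = {
    "StPEBP3": "TFL", "StPEBP10": "TFL",
    "StPEBP4": "FT", "StPEBP8": "FT", "StPEBP9": "FT",
    "StPEBP14": "FT", "StPEBP15": "FT",
}

# Direction of the heat-induced change as described in the text:
# -1 = downregulated, +1 = upregulated. Magnitudes are not encoded.
HEAT_RESPONSE = {
    "leaves": {"StPEBP14": -1, "StPEBP15": -1, "StPEBP8": +1,
               "StPEBP3": +1, "StPEBP10": +1},
    "roots": {"StPEBP4": -1, "StPEBP10": +1},
    "tubers": {"StPEBP9": -1, "StPEBP3": +1},
}


def opposite_pairs(organ: str):
    """Return (downregulated FT genes, upregulated TFL genes) for one organ."""
    calls = HEAT_RESPONSE[organ]
    ft_down = sorted(g for g, d in calls.items() if SUBFAMILY[g] == "FT" and d < 0)
    tfl_up = sorted(g for g, d in calls.items() if SUBFAMILY[g] == "TFL" and d > 0)
    return ft_down, tfl_up


for organ in HEAT_RESPONSE:
    ft_down, tfl_up = opposite_pairs(organ)
    print(f"{organ}: FT down = {ft_down}; TFL up = {tfl_up}")
```

Each organ yields at least one downregulated FT gene and one upregulated TFL gene, but the gene identities differ between leaves, roots, and tubers, which is the tissue-specific pattern described above.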

Conclusion
We have identified 15 StPEBP genes in potato, six of which are novel relative to the previous genome version (PGSC v3.4) and four of which are novel relative to previously published information. Our study of the PEBP gene family supports the view that the long-reads-based genome assembly and annotation provides a better genomic resource for identification of PEBP genes and perhaps other genes. The 15 StPEBP genes belong to four subfamilies (FT, TFL, MFT and PEBP-like) and are located on six chromosomes. Our results suggest tandem duplication in two pairs of StPEBP genes and segmental duplication in five pairs. Most of the StPEBP genes have conserved motifs 1 to 5, as in Arabidopsis and other plants. However, the conserved motifs of the potato PEBP-like proteins were quite different from those of other PEBP family members, suggesting potentially distinct roles of the PEBP-like genes in potato. We identified differentially expressed StPEBP genes in response to heat stress in potato leaves, roots, and tubers. In all three organs, we detected gene pairs in which downregulation of an FT gene was associated with upregulation of a TFL gene in response to heat stress. Our results on the PEBP gene family, including motif identification, phylogenetic classification and expression patterns under heat stress, enhance knowledge of potato PEBP genes and provide candidate genes for genomics studies in potato, including genetic engineering of potato plants for heat tolerance.