Illumina and PacBio sequencing data analysis
To construct a comprehensive regulation network in response to drought stress, control and treated samples were performed SMRT sequencing on the PacBio platform and next-generation sequencing (NGS) on the Illumina Hi-Seq platform. High-quality RNA was extracted from above tissues of “Desiree” after PEG-6000 treatments (0 h, 1 h, 3 h, 6 h, 12 h, 24 h and 48 h) for Illumina Hi-Seq and pool together for construction of Iso-Seq librariy. After removing low-quality reads and adapter sequences, 38,347,790-46,349,352 reads were obtained and more than 80% sequences were mapped the reference genome (Table 1). The Pearson’s correlation coefficient showed that all correlation values between the three replicates ranged from 0.734 to 0.925, indicating a perfect positive correlation and that the sequencing results could be used for further analysis (Fig. 1). For Iso-Seq, the PacBio platform generated 33.24 Gb clean data. Among these data, 464,332 circular consensus (CCS) read were obtained using Iso-seq pipeline with min full pass >3 and min predicted accuracy > 0.9. Then, 375,387 full length non chimeric (FLNC) sequences were determined by searching for the polyA tail signal and the 5’ and 3’ cDNA primers and 121,434 high-quality consensus sequences were obtained by clustering the FLNC sequences. SMRT sequencing can offer better transcript integrity (up to 50 kb in length) than NGS and has greatly contributed in maize (Wang, Tseng et al. 2016), populous (Chao, Gao et al. 2019) oilseed rape (Yao, Liang et al. 2020) and sorghum (Mace, Tai et al. 2013, Abdel-Ghany, Hamilton et al. 2016), but no literature has been reported in potato. Significantly longer isoforms were detected by SMRT than by NGS assembly, demonstrating better isoform integrity (Fig. 2a). In addition, the overall expression levels of transcripts detected by SMRT were higher than those detected by NGS, indicating better quantification accuracy (Fig. 2b).
Analysis of gene expression and transcriptomic changes in response to drought treatment
DESeq was used for differential expression analysis between biologically duplicated sample groups. False Discovery Rate (FDR) was used as key indicators for differential expression gene screening. Fold Change≥2 and FDR<0.01 were used as screening criteria. In total, 31,637 genes were detected in the library of six samples drought treated at different times, with common genes and unique genes between six libraries and the number of DEGs in D24h samples was the highest (Fig. 3a and Supplementary Table S4). Moreover, almost all of DEGs in the libraries showed moderate expression levels with only a small percentage of the genes expressed in high levels (Fig. 3b). As shown in Fig. 3c, more than 88% of the DEGs had a -3＜fold change ≤3. Transcription factors (TFs) play important roles in plant abiotic stress responses (Jian, Ma et al. 2018). To further analyze the function of the DEGs, we used iTAK software to predict transcription factors in DEGs. In total, 664 DEGs belonging to 28 TF families were identified. The TF family members were unevenly distributed. Seven TF families, bHLH (69), MYB (60), ERF (53), bZIP (46), NAC (43) WRKY (35) and HD-ZIP (33) accounted for more than 51% (339/664) of all TFs (Fig. 3d). Plant hormones, such abscisic acid (ABA), auxin (IAA), and cytokinin (CK), play vital roles in abiotic stresses (Verma, Ravindran et al. 2016). In total, 919 DEGs belonging to eight classifications of plant hormones, and ABA, IAA and CK-related genes accounted for the majority, accounting for about 94% of all hormone-related genes (Fig. 3e). To combine the hormone profiling data with the expression profiles of genes involved in hormone metabolism and signaling, we analyzed 231 genes involved in plant hormone metabolism and visualized them as heat maps after functional classification (Fig. 4). Heat map results showed that DEGs were most involved in plant hormone signaling.
Gene profiling in response to drought stress in potato
To explore drought stress response mechanisms in potato, the expression patterns were calculated (Fig. 5a). Moreover, 98,624 genes were conducted using the K-means algorithm. Fourteen clusters were obtained and sixty-three pathways were enriched, such as carbon metabolism, carbon fixation in photosynthetic organisms, porphyrin and chlorophyll metabolism, protein processing in endoplasmic reticulum and oxidative phosphorylation (Fig. 5b and Fig. 5c). Furthermore, WGCNA was also performed to screen genes response to drought stress by detecting molecular modules. Thirteen modules were generated and more than one module was significantly correlated with each sample. Blue, darkred, midnightblue, orange, cyan, lightgreen, and darkgreen were significantly correlated with DCK, D1h, D3h, D6h, D12h, D24h and D48h, respectively (Fig. 5d and Fig. 5e). Then, KEGG enrichment analysis were conducted to explore the enriched pathways in each module correlated to each sample (Fig. 5f). Two pathways, galactose metabolism (ko00052) and fatty acid metabolism (ko01212), were enriched in more than two treated samples. Plant-pathogen interaction (ko04626) and glutathione metabolism (ko00480) were enriched in early stages after drought stress. In addition, through the comparative analysis of the control group and the other six samples processed in different periods, a total of 12,798 differentially expressed genes after deduplication were found (Fig. S1, Supplementary Table S2 and S3). The functional enrichment of these DEGs during the six drought treatment period was conducted (Fig. 6a) and twenty-nine DEGs were selected to confirm the transcriptome sequencing data using the real-time (RT)-PCR (Fig. 6b).
Putative roles of AS for drought stress regulation in potato
AS plays critical roles in abiotic stress response by increasing genetic diversity in plants (Ling, Alshareef et al. 2017, Laloum, Martin et al. 2018). However, little information about AS involved in drought stress response has been obtained in potato. In this study, 21,756 AS events were categorized into five patterns, including 12,453 retained intron (RI), 2,493 skipping exon (SE), 236 mutually exclusive exons (MX), 4,481 alternative 3′ splicing sites (A3), and 2,093 alternative 5′ splicing sites (A5) using Isoform sequencing technology, which yields long reads without assembly (Fig. 7a). Among these AS events, IR events predominated, accounting for 57.2%, followed by A3, SE, A5 and MX (Fig. 7b), which was consistent with findings of previous reports in other plant species (Yan, Gao et al. 2019). Four random selected genes with AS events were successfully validated by reverse transcription (RT)-PCR (Fig. 7c).
To identify the co-expression network of genes involved in AS events, the WGCNA package was used and three significant modules were identified (brown, blue and turquoise). The CK sample correlated to the brown module (r=0.79, P=2e-5), whereas the turquoise (r=0.71, P=3e-4) and blue (r=0.77, P=5e-5) modules were significantly correlated with samples after 6 and 24 h treatments, respectively (Fig. 7d). KEGG enrichment of the significant modules was also performed. Six, eleven and four pathways were significantly enriched in brown, turquoise and blue modules. The galactose metabolism (ko00052) was significantly enriched in turquoise and blue modules, which were treated with PEG-6000 after 6 and 24 h, respectively (Fig. 7e).
Putative roles of lncRNA involved in drought stress regulation in potato
LncRNA is a major component in transcriptome sequencing, and plays important roles in the regulation process after transcription (Rinn, Chang et al. 2012, Kwenda, Birch et al. 2016, Muret, Klopp et al. 2017, Li, Liu et al. 2021). However, few information about lncRNA in potato has been obtained. In this study, 3,345 lncRNAs with various lengths were identified using composite computational algorithm based on the PacBio sequencing data (Fig. 8a, b). More than 54% of lncRNAs were located in the intergenic region while less than 7% were located in the antisense region according to reference annotations (Fig. 8c). To identify the co-expression network of target genes of lncRNAs, the WGCNA package was used and yellow and brown modules were significantly enriched (Fig. 8d). KEGG analysis indicated that the spliceosome (ko03040) and ubiquitin mediated proteolysis (ko04120) pathway were most enriched in CK and drought stress after 6 h samples, respectively.
Alternative polyadenylation analysis
In eukaryotes, polyadenylation is an important transcriptional modification in mRNAs (Elkon et al., 2013; Bentley, 2014). Alternative polyadenylation can easily increase the transcripts variety and regulate gene expression by producing different transcripts (Shen et al. 2011; Sherstnev et al., 2012; Fu et al., 2016). The iso-seq data allowed us to detect polyadenylation sites in potato using the program TAPIS (Abdel-Ghany, Hamilton et al. 2016). In this study, 26,153 poly (A) sites from 13,010 genes detected in the Iso-Seq data were identified (Supplementary Table S6). On average, 2.01 poly (A) sites per gene were found and 856 genes had at least five poly (A) sites (Fig. 9a). In addition, a clear nucleotide bias of the flanking region of all poly (A) sites was found. To confirm this phenomenon, we investigated the motifs of 50 nucleotides upstream of predominant sites using MEME analysis. Furthermore, three conserved polyadenylation signals, i.e. TTTTGT, GCCCCC and GGGGCG were overrepresented in upstream of poly (A) cleavage sites (Fig. 9b).