Transcriptome sequencing and whole genome expression profiling of purple-fleshed sweetpotato under salt stress

Abstract Background: Purple-fleshed sweetpotato (PFSP) is one of the most important crops in the word which helps to bridge the food gap and contribute to solve the malnutrition problem especially in developing countries. Salt stress is seriously limiting its production and distribution. Due to lacking of reference genome, transcriptome sequencing is offering a rapid approach for crop improvement with promising agronomic traits and stress adaptability. Results: Five cDNA libraries were prepared from the third true leaf of PFSP at seedlings stage (Xuzi-8 cultivar) treated with 200 mM NaCl for 0, 1, 6, 12, 48 hours. Using second and third generation technology, Illumina sequencing generated 170,344,392 clean high-quality long reads that were assembled into 15,998 unigenes with an average length 2178 base pair and 96.55% of these unigenes were functionally annotated in the NR protein database. A number of 537 unigenes failed to hit any homologs which may be considered as novel genes. The current results indicated that sweetpotato plants behavior during the first hour of salt stress was different than the other three time points. Furthermore, expression profiling analysis identified 4, 479, 281, 508 significantly expressed unigenes in salt stress treated samples at the different time points including 1, 6, 12, 48 hours, respectively as compared to control. COG analysis revealed that the highest number of genes was categorized as “post-translational modification, protein turnover, chaperones” followed by “signal transduction mechanisms”. Gene ontology (GO) enrichment analysis showed that salt stress caused changes in metabolic process (Secondary and amino acids), hormone response, cellular processes and single organism processes. These findings suggest that salt stress tolerance in PFSP plants may be mainly depended on three factors including the balance between protein biosynthesis and degradation, preventing or correcting damage caused by miss-folding due to salt stress and transforming the certain stimulus induced by salt

stress into a biochemical signal which activates more genes specifically involved in salt stress tolerance.

Introduction
Sweetpotato (Ipomoea batatas (L.) Lam.), the only crop plant belongs to Convolvulaceae family with starchy storage roots. Purple-fleshed sweetpotato (PFSP) considered to be an important source for anthocyanin which displays strong antioxidant properties [1]. It is also considered as an important staple source of calories and proteins which consumed by all age groups. In terms of agricultural production sweetpotato considered as the seventh most important food crop in the world [2].
Salinity is a global problem caused vast area of lands remaining uncultivated. Exposure of sweetpotato plants to salt stress resulting in problems such as ion imbalance, mineral deficiency, osmotic stress, ion toxicity and oxidative stress [3]. Ultimately, these conditions interact with several cellular components including DNA, protein, lipids and pigments. That's in rule impeding plant development and affect sweetpotato production [4]. Therefore, introducing of salt tolerant sweetpotato cultivar became necessary.
With the fact of environmental stress and climate change there is an urgent need to accelerate crops breeding with higher production and stress tolerance traits [5]. In sweetpotato transcriptome sequencing offers a rapid approach for crop improvement with promising agronomic traits and stress adaptability. Several transcriptome sequencing studies have been conducted on hexaploidy sweetpotato genome [6][7][8]. However, having a complex genome structures (2n=6x=90), sweetpotato still didn't achieve a reference genome which covered a few percent of genome, so still a long way from the reference genome [9].
In the present study, second and third generation sequencing technology were used to establish a useful database of transcriptomes sequencing as well as differentially expressed genes in PFSP leaves under salt stress conditions. In total 102,845,433 high quality reads were assembled into 16,856 transcripts giving 15,998 unigenes. Our results provide novel insights into PFSP response to salt stress and identified numerous specific genes involved in salt stress defense mechanisms. That's in role can be used to guide future efforts towards breeding of sweetpotato salt resistant cultivars.

Plant materials
Xuzi-8, a high quality and early mature cultivar with purple flesh storage roots, was used in this experiment. Its storage roots contain 6% soluble sugar and more than 80 mg anthocyanin /100 g fresh weight. This cultivar was obtained from Xuzhou Institute of Agricultural Sciences in Jiangsu Xuhuai District, China.

Treatments and experimental design
Apical stem cuttings (15-20 cm) of Xuzi-8 cultivar were growing in hydroponics culture using Hoagland nutrient medium for four weeks. Seedlings were exposed to salt stress (200 mM NaCl were added to Hoagland solution). 0.1 g sample was taken from the upper third leaf after 0, 1, 6, 12, 48 hour from salt treated and untreated plants for the next generation sequencing. While, for the third-generation sequencing, one sample was collected by mixing the upper third leaf of sweetpotato seedlings after 1, 6, 12, 48 hour of salt stress and one control samples. All collected samples were immediately frozen in liquid nitrogen and stored at -80 º C until processing for RNA extraction and all treatments were done in triplicate.

RNA extraction
Total RNA was extracted from collected samples with the TRIZOL method (Life technologies, Carlsbad, CA) according to the manufacturer's protocol. RNA degradation and contamination were monitored on 1% agarose gels. RNA purity was checked using the Nanophotometer® spectrophotometer (IMPLEN, CA, USA). RNA concentration was measured using Qubit® RNA Assay Kit in Qubit® 2.0 Fluorometer (Life Technologies, CA, USA). RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA).

Library Preparation for Transcriptome Analysis
A total amount of 3 μg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) following manufacturer's protocol and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer 5X . First strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase RNase H-. Second strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of 3' ends of DNA fragments, NEBNext Adaptor with hairpin loop structure were ligated to prepare for hybridization. In order to select cDNA fragments of preferentially 150~200 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter, Beverly, USA). Then 3 μl USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37°C for 15 min followed by 5 min at 95 °C before PCR. Then PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer.
At last, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system.

Clustering and sequencing
The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumia) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina platform and paired-end reads were generated.

Data analysis Quality control
Raw data (raw reads) of fastq format were firstly processed through in-house perl scripts.
In this step, clean data (clean reads) were obtained by removing reads containing adapter, reads containing ploy-N and low-quality reads from raw data. At the same time, Q20, Q30, GC-content and sequence duplication level of the clean data were calculated. All the downstream analyses were based on clean data with high quality.

Transcriptome assembly
The data from all libraries/samples were pooled into one big left fq file (read1 files) and right fq files (read2 files) into one big right [10]. Transcriptome assembly was accomplished based on the left and right fq using Trinity [11] with min_kmer_cov set to 2 by default and all other parameters set default.

Gene functional annotation
Gene function was annotated based on the following databases: Nr (NCBI non-redundant protein sequences, https://www.ncbi.nlm.nih.gov/) with E-value cut-off of le-5 Nt (NCBI non-redundant nucleotide sequences https://www.ncbi.nlm.nih.gov/) with E-value cut-off of Based on the NR and Pfam annotations, Blast2GO (v2.5) was used to obtain GO (Gene Ontology) annotations (http://www.geneontlology.org ) according to the molecular functions, biological processes and cellular component ontologies [12]. GO enrichment analysis of the differentially expressed genes (DEGs) was implemented by the GO-seq R packages based Wallenius non-central hyper-geometric distribution [13] which can adjust for gene length bias in DEGs.
KEGG [14] is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecularlevel information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www.genome.jp/kegg/ ). KOBAS software were used [15] to test the statistical enrichment of DEGs in KEGG. SSR of the transcriptome were identified using MISA (http://pgrc.ipkgatersleben.de/misa/misa.html ), and primer for each SSR was designed using Primer3 (http://primer3.sourceforge.net/releases.php ).

Quantification of gene expression levels and differential expression analysis
Gene expression levels were estimated by RSEM [16] for each sample. Clean data were mapped back onto the assembled transcriptome. Read count for each gene was obtained from the mapping results.
Differential expression analysis of two conditions/groups was performed using the DESeq2R package DESeq2 rovide statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P values were adjusted using the Benjamini and Hochberg's approach for controlling the false discovery rate. Genes with an adjusted P-value ≤0.001 found by DESeq2 were assigned as differentially expressed [17]. Q-value≤0.001& |log2FC foldchange |≥1 was set as the threshold for significantly differential expression.

Sequencing and de novo assembly of sweetpotato transcriptome under salt stress conditions
Five cDNA libraries were prepared from the third true leaf of PFSP seedlings (Xuzi-8 cultivar) treated with 200 mM NaCl for 0, 1, 6, 12, 48 hours. These libraries were sequenced using Illumina high-throughput sequencing platform. After removing the lowquality reads and all possible contamination, a total of 170,344,392 clean reads with Q20 >96.73% and a GC percentage between 45.07% and 46.50% (Table 1). Each library was represented by over than 30 million high-quality reads, with number ranging from 32,830,183 to 35,663,873. Due to the lake of a reference genome, the clean reads resulted from the transcriptome sequences were aligned and assembled using Trinity software. After further clustering and assembly, a total of 21,497,466; 20,272,643; 21,954,725; 19,121,890 and 19,998,709 mapped reads were obtained with percentage 60.26%, 61.79%, 61.62%, 59.33% and 58.87% of total reads at different time points (0, 1, 6, 12, 48 hours), respectively. The quality of these assemblies and unigenes length distribution are shown in Table 1. Statistics on unigenes and transcripts length resulted from mixed second and third generations sequencing were performed using PacBio's officially recommended cogent software (Table1 and 2, Figure 1). In addition, the total number of CDS was 30,615 of which 23,245 CDS mapped to the protein database.

Functional annotation
To annotate the obtained unigenes, a BlastX search against the NR NCBI protein database with cut-off E-value of 10 -5 based on sequence similarity. In total, 15,461 unigenes were detected ( Table 3)

Expression patterns of PFSP unigenes in response to salt stress
The results in Figure 6 showed that the highest number of DEGs was induced at 48 hours  3.6. Detection of salt-induced genes related to salt tolerance stressed samples using the rigorous algorithm (FDR≤0.001, log2 FC-ratio ≥1). As shown in  respectively. Sequence showed homology with known sequences were classified as "hits" and sequence showed no homology with known sequences were classified as "no hits".

SSR and SNP identification
For further application of sweetpotato SSRs and SNPs were discovered using assembled transcriptomes. A total of 24,559 SSRs was identified in transcriptomes present in 15,976 sequences. Furthermore, the number of sequence containing more than one SSR was 5,762 and the number of SSRs present in compound formation was 2,215. In addition, the major types of the identified SSRs were mono-nucleotide (19,380), di-nucleotide (2,791), tri-nucleotide (2,199), tetra-nucleotide (123), penta-nucleotide (33) and hexa-nucleotide

Discussion
Sweetpotato is a hexaploid non-model crop with a complex genome lacking high quality reference genome [18]. To date the current understanding of the complex physiological and molecular mechanisms of salt tolerance in sweetpotato remains limited [19,20]. High throughput RNA sequencing is required for identification of candidate genes involved in salt stress tolerance. In addition, it will be helpful for better understanding of salts stress tolerance mechanisms in sweetpotato [5].
In the present study, deep sequencing analyses of PFSP under different durations of salt stress were characterized. Using second and third generation technology, Illumina sequencing generated 170,344,392 clean high-quality long reads that were assembled into 15,998 unigenes with an average length 2178 base pair and 96.55% of these unigenes were functionally annotated in the NR protein database. A number of 537 unigenes failed to hit any homologs which may be considered as novel genes.
The current results revealed that the obtained sequences highly matched with top seven species with percent ranging from 4.58 to 21.3% including Nicotiana sylvestris, Nicotiana tomentosiformis, Solanum tuberosum, Sesamum indicum, Coffea canephora, Solanum lycopersicum and Solanum pennelli. In addition, the online available sequence of sweetpotato genome gave only 3.07% matches coming in the 8 th rank with the current experiment obtained sequence. That's mean that 96.93% of our data are not present in the sweetpotato online data. Therefore, the current annotation results will improve sweetpotato genome annotation and facilitate the discovery of genetic resources that are responsible for salt stress in hexaploid sweetpotato.
The results in Figure 3 showed that the most dominant Go-terms which were identified during the different durations of salt stress included metabolic, cellular, single organism and response to stimulants. These results are in agree with [21] in diploid halophytic sweetpotato, [22] in Ipomoea imperati [23]. The present COG results indicated that there were 16.89% belongs to "general function prediction only" which mean that these genes are poorly characterized, in addition, 4.34% genes with unknown function. The highest number of genes was categorized as "post-translational modification, protein turnover, chaperones" in our study. Which mean that, sweetpotato plants under salt stress conditions, starts the key mechanisms of chemical modifications including regulating of enzymes activity, localization and interaction with other cellular molecules such as proteins, nucleic acid, lipids and co-factors [24]. Furthermore, protein turnover related genes control the balance between protein biosynthesis and degradation, which is very important for determining the resistant or sensitivity of the plant to salt stress. That's may be due to if the protein biosynthesis is more than breakdown indicates an anabolic state, in this case the plant tolerance to stress will be enriched [25][26][27][28]. Moreover, Protein chaperones' major function is to prevent or correct damage caused by miss-folding due to salt stress [29][30][31][32]. The second COG group of genes involved in "signal transduction mechanisms" which is mainly transforming the certain stimulus induced by salt stress into a biochemical signal which activates more genes specifically involved in salt stress tolerance [33].
In the present study, to identify significant gene expression changes under salt stress, differentially expressed unigenes were analyzed using the rigorous algorism equation (FDR≤0.001, Log2FC-ratio ≥ 1) depending on RPKM reads.
The current results indicated sweetpotato plants behavior during the first hour of salt stress was different than the other three time points. During the first hour of salt stress a number of four unigenes only were expressed with significant difference (unigene11357, unigene15171, unigene16391 and unigene7424). The first unigene (11357) belongs to SBP gene family which have important role in leaf development, vegetative phase change and may be have a role in stress response [34]. The second (unigene-15171) belongs to HSP70 which is essential regulator to maintain internal cell stability and prevent aggregation under physical or chemical pressure [35]. The third unigene (16391) is a plant invertase/ pectin methyl-esterase inhibitor. The invertase affects growth and development consisting with its activities such as stress response. In addition, pectin methyl-esterase determines the solidity of cell wall including root development and permeability. The fourth unigene (7424) belongs to uncharacterized protein family and unknown function [36]. According to the current results presenting of only four significantly expressed unigenes in leaf tissues during the first hour of salt stress may be due to salt stress signals slightly received or didn't received by leaves during the first hour [37] in Xuzi-8 sweetpotato cultivar.
At six hours of salt stress, there were 529 up-regulate unigenes giving 479 significantly expressed unigenes. A number of ten genes were the highest (four folds higher than control) among all significantly expressed genes including (unigene128, 16087, 16131, 2278, 4002, 55, 5655, 9231, and 9447). Furthermore, it was noted that these superior unigenes are three different functions including protein detoxification, energy production and conservation, and dehydrin. In addition, out of these 479 unigenes there were six unigenes involved in the defense mechanisms against salt stress including (Unigene128, 13856, 13920, 14004, 396, and 6144). Except for the unigene128 and 396 all the other unigenes belong to ABC-transporter G-family which is involved in growth, development and response to abiotic stress. In addition, the unigene396 belongs to pentatricopeptiderepeat proteins (PPR-family) which play important roles in organelle RNA metabolism, plant development and involved in abiotic stress response which have been previously reported in rice [38,39]. There were 5 genes specifically involved in salt stress response including (unigene243, 3644, 7343, 7826 and 9718). The unigene243 and 3644 belong to ABA insensitive-like protein which involved in ABA and salt stress responses and act as a positive component in glucose signal transduction [40]. The unigene7343 belongs to EIDlike F-box protein family which is a known drought response regulation. Unigene7826 were found to be belonged to WCOR 413-like protein which is responsible for cryoprotection of plasma membrane against dehydration stress which has been discovered earlier in sweetpotato genome, these findings are in agree with [41,42] in wheat. The unigene 9718 was detected as a putative salt stress responsive protein isoform-1 which has been discovered before in sweetpotato genome.
At twelve hours of salt stress a number of 341 unigenes were up-regulated with 281 significantly expressed unigenes. Out of the 281 genes there were three unigenes (unigene3153, 3480 and 3682) expressed with four folds more than control, all these four genes belonging to Ahpc-TSA family with intracellular signal transduction biological function which constitutes an enzymatic defense against salt stress [43]. There were 10 unigenes among the 281 unigenes, are specifically involved in defense mechanisms against salt stress (unigene10470, 11483, 12478, 12588, 12916, 13856, 14004, 2507, 4295 and 6144) which were included under ABC-transporter family [44]. In addition, there were 5 unigenes with a biological function as response to salt stress (unigene13368, 13566, 234, 3644 and 7343) which belongs to three different families including BEL1-like homeodomain protein-1 (BLH-1), ABA insensitive like protein (bZIP transcription factor) and EID-like F-box protein-3 [45].
At 48 hours of salt stress, there were 663 up-regulated unigenes with 508 significantly expresses. A number of 9 unigenes were the highest significantly expressed among the 508 unigenes. Out of these 9 unigenes, four unigenes (unigene16087, 16288, 2278 and 9017) are responsible for energy production and conservation; two unigenes (unigene2618, 2863) were responsible for signal transduction mechanism, one unigene (unigene657) as a reverse transcriptase, one unigene (unigene10431) as protein import into nucleus, and one (unigene7343) was involved in posttranslational modification, protein turnover, chaperones. In addition, there were 6 unigenes are specifically involved in defense mechanisms to salt stress including unigene10470, 12916, 2507, 4295, 5014 and 6144. All these six unigenes belong to ABC-transporter family which is involved in growth, development and stress response [44]. Moreover, a number of 5 unigenes  Note: Nt, total number of clean nucleotides; The GC percentage is the proportion of guanidine and cytosine nucleotides among total nucleotides; The Q20 and Q30 percentage is the proportion of nucleotides with a quality value >20 and 30, respectively; The N percentage is the proportion of unknown nucleotides in clean reads.     Clusters of orthologous groups (COG) classification in Xuzi-8 sweetpotato cultivar.
Genes from the same Orthologous have the same function, so that direct functional annotations to other members of the same KOG cluster.