Transcriptomic basis for salt tolerance and disease resistance of silverleaf sunflower revealed by Iso-seq and RNA-seq

Background Silverleaf sunflower, Helianthus argophyllus, is one of the most important wild species used for the improvement of cultivated sunflower. Although a reference genome is now available for the cultivated species, H. annuus, its usefulness for understanding the mechanisms underlying the traits of H. argophyllus is limited by the substantial genomic divergence between these two species. Results In this study, we generated a high-quality reference transcriptome of H. argophyllus using the Iso-seq strategy. This assembly contains 50,153 unique genes and covers more than 91% of the core embryophyte genes assessed with BUSCO. Among them, we found 205 genes that are absent in the cultivated species and 475 fusion genes containing components of coding or non-coding sequences from the genome of H. annuus. Interestingly, in line with the strong disease resistance observed for H. argophyllus, these H. argophyllus-specific genes are predominantly related to resistance functions. We also profiled gene expression in leaf and root under normal and salt-stressed conditions and found distinct transcriptomic responses to salt stress in the two tissues. In particular, genes involved in several critical processes, including the synthesis and metabolism of glutamate and carbohydrate transport, are regulated in opposite directions in leaf and root. Conclusions Overall, this study provides insights into the genomic mechanisms underlying the disease resistance and salt tolerance of silverleaf sunflower. The transcriptome assembly and the genes identified here can serve as a complementary data resource for future research and breeding programs of sunflowers.


Background
Soil salinity is one of the most important environmental stresses limiting crop productivity worldwide, particularly in arid and semi-arid areas. A large amount of dissolved salt in soil decreases the ability of a plant to take up sufficient water and results in the accumulation of Na+ and Cl- [1], which are toxic to plant growth because they decrease photosynthetic efficiency and impair several crucial metabolic and signaling pathways [2]. Although the mechanisms of salt tolerance have been extensively studied in many model plants, such as Arabidopsis and rice (reviewed in [3]), they are poorly understood in sunflower. Salt tolerance in sunflower is achieved by preferential uptake of Ca2+ coupled with the exclusion of Na+, B3+, Mn2+ and Mg2+ [4].
The identification of genes underlying salt tolerance would be very valuable for sunflower research and breeding programs. However, only a few genes involved in these processes have been identified. They include genes encoding membrane G-protein-coupled receptor, cyclic nucleotide and calmodulin-regulated ion channel protein and Ca-pump protein [5]. The understanding of this important trait should be substantially accelerated by the release of the reference genome assembly of cultivated sunflower, H. annuus [6]. However, as in many other crop plants, genetic variation has been strongly reduced in cultivated sunflower compared to its wild relatives [7], due to artificial selection and genetic bottlenecks during domestication [8]. The reference genome can therefore represent only a part of the gene resources. A pan-genome derived from 287 cultivated lines, 17 landraces and 189 wild accessions suggests that more than a quarter of the genes vary across genotypes and that around 10% of the genes in cultivated sunflower were derived from wild species [9]. Chromosome structure also varies remarkably across species: only 64% and 83% of the genomes of H. niveus and H. argophyllus, respectively, are syntenic with that of H. annuus [7]. Many wild species have been widely used in breeding programs of cultivated sunflower as donors of beneficial alleles. Therefore, a reference genome or transcriptome assembly of these wild germplasms is also needed for rapid gene identification and trait improvement.
Among them, silverleaf sunflower (H. argophyllus), a sister species of H. annuus, is one of the most important donors. For example, it has been used as a donor of alleles for disease resistance [10][11][12][13][14][15][16][17] and drought tolerance [18,19]. Overall, it is the largest donor of genes to cultivated sunflower, with approximately 5% of the genes in cultivated sunflower having been introgressed from H. argophyllus. Compared to H. annuus, 10 inverted and 8 translocated segments were found in the genome of H. argophyllus. Furthermore, H. argophyllus also exhibits tolerance to saline soils [20] and can be used as a donor of salt tolerance alleles [21]. Although several genomic regions associated with salt tolerance have been identified in H. argophyllus, the genes underlying this desirable trait are poorly known.
In this study, we first evaluated the salt tolerance of H. argophyllus and compared it with that of H. annuus. To better understand the genomic basis of the salt tolerance, we built a reference transcriptome of H. argophyllus consisting of 50,153 unique genes, 205 of which are absent in H. annuus, using a long-read technology, Iso-Seq. We also found 475 fusion genes composed of coding or non-coding sequences. In line with the strong disease resistance observed for silverleaf sunflower, the H. argophyllus-specific and 'fusion' genes were mainly related to disease resistance. We further investigated the transcriptomic responses to salt stress and found 3,930 and 1,885 genes that were significantly regulated in root and leaf, respectively. The responses to salt stress in root and leaf are distinct, and the root is the more sensitive tissue. Our work revealed that the disease resistance of silverleaf sunflower is facilitated by genetic innovation, whereas its salt tolerance depends more on the sensitive regulation of genes.
Overall, we presented a high-quality reference transcriptome for H. argophyllus, providing insights into its disease resistance and salt tolerance. This reference transcriptome can serve as a complementary data resource for future research and breeding programs of sunflowers before the reference genome is available for H.

Although the heights of both lines decreased to a similar degree (fold change = 0.75) (Figure 1a), significant differences were observed in many other physiological phenotypes (Figure 1). We measured the water loss rate (WLR) of plants every 30 min after the salt treatment and a lower WLR at every time point (Figure 1b

Reference transcriptome assembly
To better understand the genomic mechanisms underlying the salt tolerance of H. argophyllus, we first constructed a high-coverage reference transcriptome assembly.
We collected RNA from several tissues, including leaf and root, and pooled it for Iso-seq, which generated 290,182 reads with an average length of 1,874 bp (Supplementary Table 1). A poly-A sequence was detected in 280,695 (96.73%) of the reads (poly-A reads), among which 270,267 (96.28%) were full-length with both the 5' and 3' primers (FL reads) and 264,113 (94.09%) were full-length non-chimeric ROIs (FLNC reads). We then evaluated the similarity between the H. argophyllus transcriptome and the reference genome of the cultivated species [6] and found very limited similarity between them. Even after very relaxed filtering (identity ≥ 30%, coverage ≥ 30%) and cleaning, only 72.16% of the reads mapped to the reference genome. This result is consistent with the reported variation between cultivated and wild sunflowers [7], and raises the caveat that the reference genome of H. annuus may not be the best choice for analyses in H. argophyllus even though the two species are closely related. Therefore, we assembled the reads into an assembly consisting of 50,153 unique genes, with a mean length of 1,999.73 bp.
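The read statistics above can be re-derived from the reported counts. The following is a minimal sketch, assuming (as the percentages in the text suggest) that the poly-A share is taken over all reads while the FL and FLNC shares are taken over the poly-A reads; the variable and function names are illustrative only.

```python
# Read counts as reported in the text (Supplementary Table 1 is not reproduced here).
total_reads = 290_182
poly_a_reads = 280_695
fl_reads = 270_267
flnc_reads = 264_113


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Assumption: poly-A reads are expressed relative to all reads,
# FL and FLNC reads relative to the poly-A reads.
poly_a_pct = percent(poly_a_reads, total_reads)
fl_pct = percent(fl_reads, poly_a_reads)
flnc_pct = percent(flnc_reads, poly_a_reads)

print(poly_a_pct, fl_pct, flnc_pct)
```

Under these assumed denominators the three values agree with the percentages quoted in the paragraph.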
We assessed the completeness of this transcriptome by querying it against the database of Benchmarking Universal Single-Copy Orthologs (BUSCO, version 3.0) [27]. Our results indicate that 91.4% of the core embryophyta genes (CEGs) were found in this assembly, with 52.5% being single copies and 38.9% being duplicates (Supplementary Table 2), comparable to the percentage reported for the reference genome [6], suggesting a high coverage of this assembly. Because of presence and absence variations, the SNP dataset cannot represent the whole difference between these two species. Therefore, we further investigated their differences in gene content. By clustering their genes into families, we  Table 3). This explains the strong and wide disease resistance of H. argophyllus and the successful practice of using H. argophyllus to enhance the disease resistance of cultivated sunflower [10][11][12][13][14][15][16][17]. Surprisingly, many genes specific to H. annuus (10.25%, 12 of 117 genes) are also involved in disease resistance (Supplementary Table 4). This suggests a high diversity of resistance genes, which could be due to the activity of retrotransposons, such as long terminal repeat (LTR) retrotransposons, which often results in the expansion of resistance genes in plants [29,30], after the divergence between H. argophyllus and H. annuus. Furthermore, in concert with the strong tolerance to salt mentioned above, we also found many transporters among the H. argophyllus-specific genes, including Aquaporin-like protein (Supplementary Table 3). To explore which biological processes the H. argophyllus-specific genes are involved in, we performed gene ontology (GO) enrichment analysis and found that these genes were involved in 11 biological processes and were enriched in three of them: 'mRNA processing', 'defense response' and 'transport' (Supplementary Table 5). The GO term 'transporter activity' was also overrepresented in this gene set, implying an outstanding tolerance to environmental stresses. Together, these results suggest that H. argophyllus is armed with an extra gene reservoir enabling wide resistance or tolerance to various biotic (disease) and abiotic (drought, salt) stresses.
By comparing the sequences of the H. argophyllus unique genes with the genes in the H. annuus genome, we found 475 genes composed of two parts that are distant from each other in the genome of H. annuus (Table 1). Genes of this kind have been termed 'fusion' genes in many other studies [31,32]. Nevertheless, 'fusion' genes can also be 'split' genes, depending on their origins and evolutionary trajectories. These genes ranged from 206 to 7,440 bp in length, with a mean of 1,917.56 bp, comparable to that of the whole set of unique genes, suggesting that they are not exactly fused genes. On the other hand, the length of their components found in the H. annuus genome varied from 156 to 9,444 bp, with a mean of 1,318.91 bp, which was also comparable to that of other genes, suggesting that they are not split genes either (Figure 2). Hereafter, for ease of description, we refer to these genes as 'fusion' genes, following the terminology used in other studies [33].
Among the 475 'fusion' genes, 171 were composed of sequences from two protein-coding genes, 55 were composed of protein-coding and non-coding genes, and 246 were from coding/non-coding genes and intergenic regions (Supplementary Table 6). The components derived from non-coding genes and intergenic sequences probably resulted from the degradation of open reading frames or pseudogenization after the split of their parent genes.
We inspected the functions of these 'fusion' genes in detail and found that they are involved in much more diverse biological processes (Supplementary Table 6). The 'fusion' genes were categorized into 606 biological processes and enriched in 277 of them (Supplementary Table 7). For example, the enriched GO terms include 'response to abiotic stimulus', 'pectin catabolic process', 'response to temperature stimulus', 'response to oxygen-containing compound', 'response to phosphate starvation' and 'cell wall modification', to name only a few (Supplementary Table 7). We particularly noticed that several terms, such as 'pectinesterase activity', 'response to abscisic acid' and 'cell wall modification', are strongly associated with responses to abiotic stresses [34][35][36][37].

We first compared the gene expression levels before and after salt treatment using the DESeq2 package [38], and genes with fold change ≥ 1.5 and Q-value ≤ 0.01 were defined as significantly differentially expressed genes (DEGs). As a result, we found 1,885 DEGs in leaf (Figure 3; Supplementary Table 8 Table 9). The DEGs identified in leaf also included 36 'fusion' genes, of which 31 and 5 were up- and down-regulated, respectively (Supplementary Table 10). A hypergeometric test suggests that the H. argophyllus-specific genes tend to be down-regulated (P = 0.02) but not up-regulated (P = 0.82) more often than expected by chance; the opposite pattern was found for 'fusion' genes, with P values of 8.28 × 10^-12 and 0.76 for the up- and down-regulated genes, respectively. These results suggest critical but different roles played by H. argophyllus-specific and 'fusion' genes in the response to salt stress.
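The over-representation test mentioned above can be illustrated with a small standard-library sketch. It computes the upper-tail hypergeometric probability P(X ≥ k), which corresponds to `phyper(k - 1, n, M - n, N, lower.tail = FALSE)` in R. The function name and the toy numbers are hypothetical; the actual counts of specific or 'fusion' genes among the DEGs are not given in this section, so none are assumed here.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) for a hypergeometric draw.

    M: total number of genes in the background
    n: number of background genes in the category of interest
    N: number of genes drawn (e.g. up-regulated DEGs)
    k: number of drawn genes that belong to the category
    """
    lo = max(k, 0)
    hi = min(n, N)
    total = comb(M, N)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(lo, hi + 1))
    return favourable / total


# Toy example (hypothetical numbers): 10 genes, 4 in the category,
# 3 drawn, at least 2 of them in the category.
p = hypergeom_upper_tail(2, 10, 4, 3)
print(p)
```

Exact integer arithmetic is used for the binomial coefficients, so the sketch stays accurate for large backgrounds, at the cost of speed compared with a dedicated statistics library.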
As reported in many other plants, genes in pathways involved in salt tolerance were significantly regulated. For example, non-specific lipid-transfer protein (nsLTP) in the lipid metabolic pathway was significantly up-regulated (Supplementary Table 8). The results of GO enrichment analysis also confirmed that processes and activities related to resistance to abiotic and biotic stresses, such as response to osmotic stress, response to water and defense response to fungus, were over-represented in the up-regulated genes (Supplementary Table 11), whereas those related to 'cell wall organization' and 'cell wall biogenesis' were over-represented in the down-regulated genes (Supplementary Table 12). Genes involved in the removal of superoxide radicals, such as CPN20, were significantly up-regulated under salt-stressed conditions (Supplementary Table 11), consistent with the increased activities of POD and SOD (Figure 1f and 1g Table 13). On the other hand, this pattern was not shown for 'fusion' genes. For 'fusion' genes, 10 and 5 genes were up- and down-regulated, respectively (Supplementary Table 14). Interestingly, the regulation of genes in leaf and root is distinct, with only a few genes commonly up- or down-regulated in both tissues (Figure 3). We therefore categorized the DEGs into eight groups,  Table 15 and Supplementary Table 16 showed several interesting functions. For example, the genes up-regulated in leaf but down-regulated in root were enriched in GO terms related to the biosynthesis and metabolism of glutamate, which serves as a signal triggering long-distance defense [39] and mediating responses to abiotic stresses [40]; the genes up-regulated in root but down-regulated in leaf showed functional enrichment for carbohydrate transport and DNA methylation, which may have played roles in the regulation of genes (Supplementary Figure 5).
We further compared the fold changes of the commonly regulated genes and classified them into three groups (see Methods): (1) higher in leaf, (2) higher in root and (3) similar. We found 1,521, 3,717 and 190 genes belonging to these groups, respectively, suggesting that the two tissues respond to salt stress to different degrees, with stronger regulation observed in root. Together, these results indicate distinct patterns of transcriptomic response to salt stress in leaf and root.

Discussion
The reference genomes of most, if not all, crops were generated from a cultivated species, which can represent only a part of the gene pool due to the genetic bottlenecks during domestication [8]. A pan-genome analysis of sunflower also revealed remarkable variation across different genotypes [9]. For H. argophyllus, 83% of the chromosomes show collinearity with the reference genome, and the percentage is even lower for H. niveus (64%) [7]. In concert with the genetic variation, phenotypes such as seed dormancy, flowering time, oil composition and content, as well as resistance to abiotic and biotic stresses, also vary substantially across genotypes [19,41,42]. Therefore, to better understand the mechanisms and genomic resources underlying the desirable traits observed in the wild species, a reference genome, or at least a reference transcriptome, is needed for each valuable genotype. In this study, we generated a high-quality reference transcriptome for H. argophyllus, one of the most important donors of beneficial alleles to cultivated sunflower [9], using a long-read technique, Iso-seq, which is powerful in capturing full-length transcripts [43][44][45][46]. There is no doubt that a reference genome of H. argophyllus is urgently needed, and some groups may already be working on it. Until the genome sequence becomes available, the reference transcriptome provided in our work can serve as an alternative and be used for gene mining, and the SNPs identified from transcripts can potentially be used as markers in breeding programs.

The divergence between H. annuus and H. argophyllus
The divergence between wild species and cultivated sunflower has been repeatedly described in earlier works [7,9,21]. In this study, to assess the divergence between silverleaf sunflower and the cultivated species, we sequenced the transcriptomes of H. argophyllus leaves and roots under normal or salt-stressed conditions and identified substantial numbers of SNPs and indels. It should be noted that although RNA-seq has proved efficient in identifying molecular markers associated with environmental tolerance by comparing the transcriptomes of sensitive and tolerant cultivars [47], the dataset of SNPs and indels obtained in this study can represent only a part of the total variation between these two species because a reference genome of H. argophyllus is lacking.
Structural variations between H. argophyllus and H. annuus have also been previously reported [7,21]. Studies based on molecular markers revealed at least 10 inversion  Table 6). The genomic divergence between these two species was also reflected by the presence and absence variation of genes. We found 205 H. argophyllus-specific and 117 H. annuus-specific genes. The former were highly enriched in processes related to disease resistance, consistent with the fact that H. argophyllus has been used as a valuable gene resource for improving the disease resistance of cultivated sunflower.

Distinct salt stress responses in leaf and root
Both leaf and root showed strong responses to salt stress. As expected, root is more sensitive to the stress, with 3,930 genes significantly regulated, whereas only 1,885 genes were regulated in leaf (Supplementary Table 8).
We found that the regulated genes were distinct between these two tissues. We sorted the DEGs in leaf or root according to their fold changes and found that the most strongly up-regulated genes in leaf are those related to lipid synthesis and transport and amino acid synthesis. For example, non-specific lipid-transfer protein (nsLTP) and methylsterol monooxygenase 1 (MSMO1) were frequently seen among the top-ranked up-regulated DEGs (Supplementary Table 8). Lipids are known to play critical roles in sensing salt stress [48] and mediating salt tolerance in various plants and algae [49]; nsLTP has been reported to be involved in several biological processes, including salt tolerance in maize [50], and a null mutant of LTP in Arabidopsis resulted in hypersensitivity to salt stress [51].
The most up-regulated genes in root are dehydrin-like proteins, followed by small heat shock proteins (HSPs) and their cognates (Supplementary Table 8). The up-regulation of genes encoding dehydrin-like proteins is in line with conclusions drawn from studies in other crops. For example, dehydrin-like protein accumulated in cereals in response to cold or drought stress [52]. HSPs are widely engaged in resistance or tolerance to various environmental stresses, such as cold, heat, drought and salt [53][54][55], and the over-expression of HSP can significantly enhance tolerance to drought and salt stresses [55]. The mechanism by which HSPs mediate salt stress tolerance appears to be conserved, as similar responses were also observed in bacteria, such as Clostridium botulinum [56]. It is reasonable to believe that sunflower uses the same pathway to cope with salt stress.
On the other hand, the most down-regulated genes in root are non-symbiotic hemoglobin genes (nsHb) (Supplementary Table 8), which are involved in responses to several abiotic stresses, including salt stress in maize [57,58]. A study in Arabidopsis showed that the expression level of nsHb is negatively correlated with salt tolerance: the over-expression of spinach nsHb in transgenic Arabidopsis resulted in lower tolerance to NaCl [59]. We speculate that the decreased expression of nsHb in H. argophyllus after salt treatment helps the plant to cope with the stress. Most of the top-ranked down-regulated DEGs in leaf are hypothetical proteins without any annotated function. Interestingly, besides these hypothetical proteins, the expression of many resistance genes dropped dramatically under salt-stressed conditions, which may result in reduced resistance to diseases.
GO analysis provided a more general view of the processes affected by salt stress. As expected, the processes of response to abiotic stimuli, including water, heat and inorganic substances as well as oxidative stress, were generally activated in root by NaCl treatment. Several other related processes, such as transport and lipid and proline metabolic processes, were also up-regulated under salt-stressed conditions. The processes up-regulated in leaf were more concentrated on fundamental biological processes, such as translation, mitosis, cell wall organization and carbon fixation (Supplementary Table 11). In line with the increased activities of POD and SOD (Figure 1), several processes involved in the positive regulation of oxidoreductase activity were also enhanced in leaf (Supplementary Table 11). These results are consistent with earlier observations in other crops. For example, the elongation factor had a higher abundance in cotton under salt-stressed conditions [24]. We noticed that several GO terms were over-represented among the up-regulated DEGs of both leaf and root. They include 'proline biosynthetic process' and 'lipid biosynthetic process' (Supplementary Table 15).
Proline is one of the most important osmolytes conferring salt tolerance and can also function as a signaling molecule modulating the expression of genes involved in the recovery of cells [60]. Impairment of the proline biosynthesis pathway can lead to hypersensitivity to salt stress in Arabidopsis [61]. Furthermore, we also found decreased expression of proline dehydrogenase genes under salt-stressed conditions (Figure 1). Together, these results explain the increased accumulation of proline in leaves after salt treatment.
The processes down-regulated in leaf are mainly involved in the transport of carbohydrates, such as hexose transport, glucose transport and monosaccharide transport. This does not look like a response to salt stress, but rather like a consequence of the impairment caused by salt stress. The processes of cell wall biogenesis, such as 'cell wall organization' and 'cell wall polysaccharide metabolic process', were also down-regulated (Supplementary Table 12). As a result, the metabolic processes of polysaccharides, such as sucrose, glucan and xyloglucan, in root were subsequently affected (Supplementary Table 18; Supplementary Table 19).

Conclusions
Overall, we presented a high-quality reference transcriptome of H. argophyllus, which enabled us to identify a set of genes that are absent in cultivated sunflower. These genes are related to disease resistance and salt or drought tolerance, representing beneficial gene resources that can be utilized in breeding programs. The salt tolerance of H. argophyllus was also measured and its transcriptomic responses to salt stress were profiled. Distinct responses were found between leaf and root. We believe that the data provided in this study can serve as a complementary data resource for future research and breeding programs of sunflowers.

Plant materials and treatments
The silverleaf sunflower H. argophyllus (ARG1807), obtained from the USDA North Central Regional Plant Introduction Station (NCPIS), was used in this study. The seeds were sterilized and germinated, and the seedlings were transferred to pots filled with garden soil at the two-leaf stage. Salt treatment was applied to plants at the four-leaf stage by supplementing with NaCl solution (25 mM). After 7 days of salt treatment, leaves from the middle of the plants under salt-stressed or normal conditions were collected and frozen immediately. For the physiological assay, a cultivated line (sk02R) maintained in our lab was also planted and treated. Three biological replicates were performed for each treatment.

Physiological assay
To compare the salt tolerance of silverleaf sunflower (ARG1807) and cultivated sunflower (sk02R), we examined plant height; the contents of chlorophyll, Na+, soluble proline and MDA; the activities of POD and SOD; stomatal conductance (SC); and the rates of photosynthesis and water loss. At the time of sampling (7 days after salt treatment), the heights of the plants of both lines were measured, and the other traits were measured using the frozen leaves. The activities of POD and SOD were measured as previously described [24,62]. To measure the content of Na+, the leaves were incinerated and the ash was dissolved in 0.5 M HCl solution before the concentration of Na+ was determined as described in [63]. The contents of MDA and proline were also measured according to previously described protocols [22,64].

RNA extraction and sequencing
Total RNA was extracted using TRIzol reagent following the manufacturer's instructions (Thermo Fisher Scientific, MA, USA) and quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, MA, USA). Two strategies, Iso-seq and RNA-seq, were applied in this study: the former was used to generate a reference transcriptome and the latter to quantify gene expression levels under different conditions. For Iso-seq, total RNA of leaf and root with or without salt treatment was pooled and reverse-transcribed to cDNA using the Clontech SMARTer PCR cDNA Synthesis Kit; the cDNA was then circularized using the SMRTbell Template Prep Kit and sequenced on a Sequel platform (PacBio, CA, USA).
Second-generation sequencing was also performed on RNA extracted from leaf or root under normal and salt-stressed conditions. The libraries were constructed using a standard Illumina protocol as described in many other studies [29,65] and sequenced on a HiSeq platform (Illumina, CA, USA).

Data processing
The full-length reads of insert (ROIs) were first clustered using the iterative clustering for error correction (ICE) algorithm to generate consensus isoforms, to which the non-full-length and chimeric ROIs were then mapped for further correction. Finally, consensus isoforms with accuracy higher than 99% were retained for further analyses. To quantify the gene expression levels of leaves or roots under normal or salt-stressed conditions, the RNA-seq reads were mapped to the reference transcriptome generated from the Iso-seq ROIs as described above using bowtie2 [66] with default parameters. The number of reads mapped to each transcript was counted and subjected to the identification of DEGs using DESeq2 [38]. Genes with a fold change higher than 1.5 or lower than 0.66 and an adjusted P-value lower than 0.01 were defined as significantly up- and down-regulated genes, respectively. To compare the degree of regulation of DEGs, the fold changes were log2-transformed and the differences between leaf and root were calculated. For genes with P-values higher than 0.01, the fold change was manually set to 1 before transformation, which yields a value of 0, indicating no change. DEGs with a difference (leaf - root) higher than 2 or lower than -2 were classified as 'higher in leaf' or 'higher in root', respectively, and the remaining genes were classified as 'similar'. To detect the variation between H. argophyllus and cultivated sunflower, H. annuus, the RNA-seq reads were aligned to the reference genome of H. annuus [6] and SNPs were called using Varscan [67] with the criteria Q > 20, depth > 8 and P-value < 0.01.
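The DEG thresholds and the leaf-versus-root comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the inequalities are taken as strict, the "P-values" used for resetting fold changes are assumed to be the adjusted P-values, and the example values are made up rather than taken from the data.

```python
from math import log2

# Thresholds taken from the text; strictness of the inequalities is an assumption.
FC_UP = 1.5
FC_DOWN = 0.66
PADJ_MAX = 0.01
DIFF_CUTOFF = 2.0


def call_deg(fold_change, padj):
    """Label one gene as 'up', 'down' or 'ns' (not significant)."""
    if padj >= PADJ_MAX:
        return "ns"
    if fold_change > FC_UP:
        return "up"
    if fold_change < FC_DOWN:
        return "down"
    return "ns"


def effective_log2fc(fold_change, padj):
    """log2 fold change; genes above the P-value cutoff are treated as unchanged."""
    if padj > PADJ_MAX:
        fold_change = 1.0  # log2(1) = 0, i.e. no change
    return log2(fold_change)


def compare_tissues(leaf_fc, leaf_padj, root_fc, root_padj):
    """Classify a gene by the difference of log2 fold changes (leaf - root)."""
    diff = effective_log2fc(leaf_fc, leaf_padj) - effective_log2fc(root_fc, root_padj)
    if diff > DIFF_CUTOFF:
        return "higher in leaf"
    if diff < -DIFF_CUTOFF:
        return "higher in root"
    return "similar"


# Hypothetical genes: (leaf fold change, leaf padj, root fold change, root padj)
print(compare_tissues(8.0, 0.001, 1.6, 0.001))
print(compare_tissues(2.0, 0.5, 8.0, 0.001))
print(compare_tissues(2.0, 0.001, 2.0, 0.001))
```

Note that the classification follows the signed difference literally, as worded in the text; comparing absolute log2 fold changes would be a different design.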

Annotation of transcriptome
The functions of unique genes in this reference transcriptome were annotated by homology searches in public databases including SwissProt [68], TrEMBL [68], KEGG [69], InterPro [70] and the NCBI non-redundant protein database [71]. The sequences of unique genes were queried against these databases using BLAST [71] and the best hit for each unique gene was retained. All-against-all comparisons between the coding sequences of H. annuus and the unique genes of H. argophyllus were performed using Blastn (version 2.2.31) [71], with an e-value cutoff of ≤1e−5. OrthoMCL [72] with default parameters was used to identify gene families. The unique genes of H. argophyllus in H. argophyllus-specific gene families were selected and defined as H. argophyllus-specific genes.
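Retaining one best hit per query below an e-value cutoff, as described in this section, might look like the sketch below. It assumes BLAST tabular output with the standard 12 columns (as produced by `-outfmt 6`), defines "best" as the highest bit score, and combines the best-hit rule with the 1e-5 cutoff purely for illustration; the text does not specify these details, and the gene and subject identifiers are made up.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} keeping the top-scoring hit.

    Expects tab-separated lines with the 12 standard tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit does not pass the e-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical records, for illustration only.
example = [
    "geneA\tsubj1\t98.5\t500\t7\t0\t1\t500\t1\t500\t1e-150\t900",
    "geneA\tsubj2\t90.0\t480\t48\t0\t1\t480\t1\t480\t1e-100\t600",
    "geneB\tsubj3\t75.0\t60\t15\t0\t1\t60\t1\t60\t0.002\t40",
]
hits = best_hits(example)
print(hits)
```

In this toy input, geneB is dropped because its only hit exceeds the cutoff, and geneA keeps the hit with the higher bit score.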

GO analysis
The GO entries of unique genes were extracted from the InterPro search results and the terms were obtained from the GO database according to the entries. GO enrichment analysis was performed for different gene sets using a hypergeometric test implemented in the R function 'phyper'.

Length comparison between H. argophyllus transcriptome, the coding sequences of H. annuu