Integrated Single-Molecule Long-Read Sequencing and Illumina sequencing reveals the resistance mechanism in response to barley yellow dwarf virus-GAV in Psathyrostachys huashanica

Background: Even though Psathyrostachys huashanica has great potential and value, there is not yet a reference genome available. To date, most of the studies of P. huashanica have focused on the creation of translocation lines and addition lines as well as the development of molecular markers. Therefore, there is a great lack of research at the transcriptional level. Results: The full-length transcriptome of P. huashanica was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from roots, stems, buds and leaves in equal amounts to explore potential full-length transcript isoforms. A total of 112,596 unique transcript isoforms were obtained, with a total length of 114,957,868 base pairs (bp). Illumina sequencing reads were used to correct and trim PacBio isoforms. In total, 103,875 unigenes were annotated with at least one functional database. In addition, a mass of DEGs involved in BYDV-GAV defence were identified; most of these resistance related genes showed gene ontology (GO), and KEGG enrichment analysis showed that these DEGs were mostly involved in the response to plant-pathogen interaction, plant hormone signal transduction and the MAPK signalling pathway. Then, twenty resistance-related upregulated genes, including MAPKs, CRPKs, CDPKs, PR proteins, WRKYs and disease resistance proteins were selected for RNA-seq data validation using quantitative real-time PCR, which showed consistencies between these two sets of data. Besides, the results also indicated that a series of defense-related genes were induced during BYDV-GAV infection at different stages. Conclusions: The full-length transcriptome dataset should contribute to improved P. huashanica stress resistance gene utilisation, and will serve as a reference database for the analysis of transcript expression in P. huashanica . In addition, we identified genes in our transcriptome dataset involved in the regulation of the BYDV-GAV infection response in P. huashanica. sequence; GO:Gene KOG:Cluster of Eukaryotic Orthologous Groups; KEGG:Kyoto Encyclopaedia of Genes and Genomes; DEGs:Differentially expressed CRPK:cysteine-rich receptor-like protein MAPK:mitogen-activated CDPK:calcium-dependent

and Agriculture Ministry 1999). It is distributed throughout central Asia, from Eastern Turkey to central China and Mongolia, and grows only on the residual soil of the mountainous braes and rocky slopes of Huashan Pass in the Qinling Mountains of Shaanxi Province, central China [3].
The development of alien addition lines is important for transferring useful genes from exotic species into common wheat [12]. In 1991, scientists began to create hybrid materials between common wheat and P. huashanica, which is regarded as an effective and economic method for introducing agronomically desirable characters into available wheat cultivars [6]. Chromosome engineering techniques for the targeted introgression of resistance genes from wild wheat relatives to hexaploid wheat have been widely used for many years [13,14]. By using embryo culture, backcrossing, and chromosome doubling, scientists crossed common wheat cv. 7182 and P. huashanica to generate heptaploid hybrid H8911 (2n = 49, AABBDDNs) [6]. In order to study its resistance to stripe rust, Giemsa C-banding, genomic in situ hybridization (GISH) and EST-SSR techniques were used to characterise a Triticum aestivum-P. huashanica Keng monosomic addition line PW11 with superior resistance to stripe rust. This subsequently proved that a small terminal segment of the 3Ns genome carried new stripe rust resistance genes that showed high levels of disease resistance [15]. In the same way, wheat x P. huashanica Keng 2Ns, 4Ns, 6Ns and 7Ns disomic addition lines were generated and showed enhanced tiller numbers and disease resistance [16][17][18][19]. In addition, a recent study showed that P. huashanica exhibited high resistance to barley yellow dwarf virus-GAV (BYDV-GAV) and disrupted the movement of the viral particles from inoculated leaves to new leaves [11].
Barley yellow dwarf viruses (BYDVs) are a class of single-stranded RNA viruses belonging to the genus Luteovirus, which are phloem-limited and obligately transmitted by aphids in a persistent or circulative manner [20]. Based on the specificity of their aphid vectors and serological properties of the virus particles, BYDVs have been classified into four distinct isolates, including BYDV-PAV, -GPV, -PAV, and -RMV, among of which, BYDV-GAV is the species most often found in China in recent years [21][22][23]. In addition, BYDVs can infect some cereal crops and grass species belonging to the family Gramineae (Poaceae), such as wheat (Triticum aestivum), barley (Hordeum vulgare), and oat (Avena sativa) [24,25]. In the field, infection of BYDVs results in a number of wheat cultivars showing symptoms of leaf yellowing and plant dwarfism, but severe cases can cause yield loss [26].
Transcriptomics is the study of gene structure, expression and regulation. Short-read secondgeneration sequencing (SGS) has become a powerful tool for quantitative gene expression levels and exploring gene transcriptional regulation [27]. However, SGS has its limitations, for example short read lengths, which makes it poorly suited for the assembly of complex genomes and transcriptomes as well as full-length isoforms and methylation detection [28,29]. Single-molecule real-time (SMRT) sequencing, a form of third-generation sequencing, was developed by Pacific BioSciences (PacBio) and offers an alternative approach to overcome these limitations; this enhances our understanding of the complexity of the transcriptome [28,29].
Full-length cDNA sequencing is fundamental to studies of structural and functional genomics, and is beneficial for genome annotation, the identification of novel genes and isoforms as well as the characterization of long non-coding RNAs (lncRNAs), especially those without a reference genome [30,31]. Besides, SMRT sequencing has been applied to the assembly of plant genomes based on its long reads, such as Triticum aestivum [32], loblolly pine [33], Zea mays [34] and rubber tree [35]. On the other hand, full-length cDNA sequencing has been used to characterise the complexity of transcriptomes combined with Illumina-based RNA-seq datasets in Sorghum bicolor [36], Zea mays [37], cotton [38], sugarcane [39], coffee bean [40] and sweet potato [41]. Moreover, SMRT sequencing has also been widely used to construct plant transcriptomes without a reference genome, for example Camellia sinensis [42], Astragalus membranaceus [43] and Halogeton glomeratus [44], but it had not previously been used in P. huashanica.
In this study, we constructed full-length cDNA libraries from P. huashanica and performed SMRT sequencing to generate large-scale full-length transcripts. This is the first study to perform SMRT sequencing of the full-length transcriptome of P. huashanica. Then, we used full-length transcripts as a reference to profile the differential gene expression in response to BYDV-GAV infection, and found plenty of defence-related genes that had been induced by the viral invasion. The results of our study provide novel insights into the resistance response to BYDV-GAV infection in P. huashanica and could potentially contribute to the utilization of resistance resources of P. huashanica in the future.

Results
Construction of a long-read reference transcriptome for P. huashanica In order to obtain desirable transcript isoforms, high-quality RNA was extracted from four tissues of P.  Table S1 and Figure S1). After quality control, 3,677,921,244 bp CCS (reads of insert, 1,688,359 reads) were obtained, including 1,140,528 in the 1-5 kb library and 547,831 in the 5-10 kb library (Table. 1). The ROI length distributions of each library are shown in the Additional File: Figure   S2). Of these reads of insert, 507,104 (~ 30.04%) were classified as full-length and non-chimeric, and 902,746 (~ 53.47%) were classified as non-full-length reads based on whether 5' and 3' ends and poly A tails were detected. Only reads with two ends and a poly A tail were classified as full-length (Additional file: Figure S3). Only full-length non-chimeric reads and non-full-length reads were used in further analysis, while short reads with a length of < 300 bp and chimeric reads were discarded from the subsequent analysis.

Functional annotation of coding genes
Full-length non-chimeric reads were clustered into consensus groups. For each cluster, if there was sufficient FL and non-FL coverage, then Quiver was run to polish the consensus. Only high QV consensus sequences were used for downstream analysis. High quality consensus isoforms of each library were merged into the final results, and redundancy was removed. After clustering and polishing these three libraries, we obtained 112,596 isoforms from 177,882 isoforms (Table 2), and the length distribution of final consensus isoforms is shown in (Fig. 1).To improve PacBio transcript quality, RNA-Seq reads were used to trim all final consensus isoforms, and mapping identity, substitution, deletion, insertion and mapping coverage were obtained (Additional file: Table S2). Then, we used BLAST, BLAST2GO and InterProScan5 to perform functional annotation terms from the nonredundant (NR), NCBI nucleotide sequence (NT), Gene Ontology (GO), Cluster of Eukaryotic Orthologous Groups (KOG), Kyoto Encyclopaedia of Genes and Genomes (KEGG), SwissProt, and Interpro protein databases with transcripts (Additional file: Table S3). As a result, 103,875 (92.25%) unigenes were annotated with at least one functional database (Table 3). Also, with KOG, GO and KEGG annotation, functional distribution was statistically assessed and the results are shown in (Fig. 2). In addition, we obtained 36,283 annotated transcripts among NR, KEGG, Swissprot and Interpro (Fig. 3). Table 1 Summary of reads of insert classification, the size fraction of F01 and G01 libraries were 1-5 kb, and the size fraction of H01 library was 4.5-10 kb  Illumina sequencing and assessment of sequencing quality To determine the appropriate time points for RNA-seq analysis in P. huashanica, time course virus titers were quantified using qRT-PCR at different time points after BYDV-GAV inoculation. As shown in ( Fig. 4), BYDV-GAV titer increased dramatically at 3 dpi and then peaked at 10 dpi and then declined gradually until 14 dpi when the virus was barely detectable. Then, mock-infected and BYDV-GAVinfected leaves (3 dpi, 7 dpi and 14 dpi) were selected to conduct RNA-seq analysis.
Since the genome sequence of P. huashanica was not available, we used our PacBio sequencing dataset as a reference to perform RNA-SEq. High quality RNA was extracted from the four samples; each had three repetitions to generate 12 libraries. The libraries were sequenced on an Illumina HiSeq2000 platform. More than 16 GB raw reads and 115 million clean reads were generated from each library using Illumina paired-end sequencing (Additional file: Table S4). Thereafter, clean reads were aligned to the PacBio dataset and a total of 76,313 transcripts were detected, including 16,827 lncRNA transcripts and 47,196 mRNA transcripts which were used for FPKM analysis. The assessment of transcript coding quality was determined by three types of prediction software, and the results are shown in (Fig. 5).

Identification of differentially expressed genes (DEGs)
Differentially expressed genes (DEGs) were obtained by comparison of BYDV-GAV-inoculated samples with mock-inoculated samples at each time point. DEGs were identified with the criteria of |fold change| ≥ 2 and q-value ≤ 0.001 at particular time points (Fig. 6). In this way, 11,282 genes were identified to be DEGs at 3 dpi, of which 5991 genes were upregulated and 5291 were downregulated.
Interestingly, there were 4689 genes downregulated, and there were more than 3864 upregulated genes at 7 dpi. At the later time points, the upregulated genes increased to 4729, which again exceeded the 4528 downregulated genes. In addition, we used volcano plots to visualize DEGs in a more intuitive way (Fig. 7). The results of the distribution of upregulated and downregulated genes were consistent with the accumulation of virus in P. huashanica. All statistically significant DEGs were further characterized by functional annotation and enrichment analysis.

Short time series expression miner (STEM) analysis
To explore the major expression trends of DEGs from the sequencing dataset over the four time points, we performed a clustering and visualization analysis using STEM software and found that all DEGs were assigned to 46 clusters that displayed similar expression patterns. We found that 15 clusters were statistically significant (P-value ≤ 0.05) (Fig. 8a) 7dpi, but showed downregulated expression levels in the following period. Those upregulated clusters were induced by the BYDV-GAV infection to resist viral replication and movement.

GO and KEGG enrichment analysis of DEGs
To investigate the biological function of viral-induced genes in P. huashanica, GO and KEGG pathway analysis were performed on DEGs at different time points. The GO enrichment results for DEGs between 3 dpi and mock-infected indicated that the cellular process and metabolic process were the most significantly represented groups in the biological process category, while cells and cell parts were the most common enrichment terms in the cellular component. Within the molecular functional category, catalytic activity and binding were the predominant enrichment groups (Fig. 9a). The above participate in fighting BYDV-GAV infection.

Validation of gene expression profiles using qRT-PCR
The expression trends of 20 representative genes were verified by qRT-PCR in a separate experiment.
The genes selected that were related to the defence response were divided into three categories. The first class included protein kinases, such as cysteine-rich receptor-like protein kinase (CRPK), mitogenactivated protein kinase (MAPK), calcium-dependent protein kinase related genes (CDPK) and wallassociated receptor kinase (WARK). The second class included genes involved in defence-related proteins, such as pathogenesis-related protein (PR), heat shock protein 70 (HSP70) and WRKY transcription factor (WRKY). The last class selected included resistance genes, such as those coding for disease resistance proteins. The results of all 20 genes selected using RT-qPCR were consistent with the results obtained from RNA-Seq analysis (Fig. 11).

Discussion
Although many studies related to P. huashanica have been performed, and numerous important germplasm resources and resistant materials have been obtained, there was still no reference genome available for of P. huashanica, with little progresses achieved in functional genomics.
Therefore, it was necessary to urgently produce a reference genome or transcripts of P. huashanica for further study.
Iso-Seq sequencing was developed and applied to the characterisation of transcriptomes of different species including plants and animals. It has a great many advantages, for example, creating long reads, the lack of a need for assembly and so forth, making it especially suitable for non-model organisms that lack genomic sequences [30,31].
The long-standing and widespread use of P. huashanica in current resistance gene mining and molecular genetic breeding, along with its significant value as a germplasm resource, has led to intense interest in uncovering the mechanism of resistance to numerous pathogens [5,11]. Because of the lack of a reference genome and reference transcripts, much of this interest has focused on the creation of translocation and addition lines. To tackle this problem, we applied long-read SMRT sequencing of four distinct tissues (i.e. the root, stem, leaf and bud), from which we were able to generate a meritorious complete transcriptome. Finally, we generated an initial collection of 112,596 unique P. huashanica transcript isoforms in our experiment. This transcriptome dataset offers a broad range of possibilities to study the multifaceted characteristics of P. huashanica.
Using the PacBio full-length isoform sequencing to obtain transcripts without assembly overcomes the difficulty of short-reads posed by next generation sequencing [45]. Even though PacBio offers longer reads than other current platforms, it has a higher error rate [46]. The short-reads from the Illumina platform have been widely used for RNA-Seq differential gene expression analysis, since it provides sufficient depth and a lower error rate compared to reads generated from PacBio [27]. Therefore, we used clean reads created from short-read next-generation sequencing (NGS) technology to correct the PacBio transcript isoforms [47].
P. huashanica serves as a resistant germplasm resource library contains a plethora of resistance resources; it not only confers resistance to biotic stress, but also confers resistance to abiotic stress.
By sequencing, we found that P. huashanica contains a huge transcription factor and resistance gene library. In previous research, using wild relative species to create high-quality and durable resistant materials has been proved an extraordinary way to recover genetic diversity of wheat germplasm [48][49][50]. Starting twenty years ago, research groups introgressed BYDV (Barley Yellow Dwarf Virus) resistance genes, which were exploited in more than ten wild relative species belonging to Thinopyrum, Agropyron, Elymus, Leymus, Roegneria and Psathyrostachy genera into wheat [51].
In this study, Illumina sequencing was performed to profile DEGs between BYDV-GAV-infected and mock-infected P. huashanica at different time points. A total of 11,282, 8,553 and 9,257 genes were identified as DEGs at 3 dpi, 7 dpi and 14 dpi, respectively, most of which were enriched in plant-pathogen interaction, plant hormone signal transduction and MAPK signalling pathways. In the long-term interaction between plants and pathogens in nature, plants have evolved a complicated immune system to recognize pathogens and activate defence responses.
Pattern recognition receptors (PRRs) localized on the plasma membrane belong to the receptor-like kinase (RLK) or receptor-like protein (RLP) family, which combine conserved molecular features derived from pathogens to trigger activation of PAMP-triggered immunity (PTI) [52,53]. In addition to PTI, plants also recognize phytopathogen-synthesized effector proteins known as resistance (R) proteins to promote their infection of the host plants [54,55]. In our experiment, we found a number of MAPKs (mitogen-activated protein kinases), CRPKs (cysteine-rich receptor-like protein kinases), CDPKs (calcium-dependent protein kinases), WAKs (wall-associated kinase-like proteins) and serine/threonine-protein kinases were significantly upregulated during viral infection, which might be associated with the innate immunity of the plant [56][57][58][59]. Additionally, plant proteins belonging to the nucleotide-binding site-leucine-rich repeat (NBS-LRR) family, one of the largest gene families known in plants, were clearly correlated with resistance responses and pathogen detection, especially the effector molecules of pathogens responsible for virulence [60][61][62]. Our sequencing results identified many NLR resistance proteins that were induced to high expression to defend against viral invasion at different time points. The results were suggestive of a defence response involving both ETI and PTI.
Transcription factors (TFs) play a central role in the response to multifaceted biotic stresses, such as insect attack and pathogen infection [63]. In order to respond to such stresses, many TF families have been reported to be differentially expressed in numerous plants as a positive response to bacterial, fungal and viral infections [64].
In summary, the full-length transcript sequencing of P. huashanica, which is one of the wild relatives of wheat, provided an immense opportunity to identify high-quality germplasm resources for crop breeding. In addition, a diversity of resistance genes induced by BYDV-GAV was obtained with Illumina sequencing, which could contribute to the antiviral breeding in wild relative of wheat. involved in viral resistance will extend our understanding of the complex molecular mechanisms of the interactions between BYDV-GAV and P. huashanica.

Conclusion
In summary, the total set of transcripts in the study could be used to improve P. huashanica stress resistance gene utilization and provide a foundation for future studies on molecular breeding for resistance to wheat disease.

Materials And Methods
Sample preparation and RNA extraction P. huashanica was provided by Professor Jinxue Jing from North West Agriculture and Forestry University, China.
To obtain a good representation of the P. huashanica transcriptome, all four tissues, including roots, stems, buds and leaves, including inoculated and uninoculated leaves, were harvested from different development stages of five independent plants. Total RNA was extracted from each sample using the protocol described in [70], employing a Trizol kit (Invitrogen), followed by a Qiagen RNeasy Plant minikit (Qiagen) and subsequently pooled in equal amounts.
For illumina RNA-Seq, P. huashanica were inoculated with viruliferous aphids carrying BYDV-GAV, and control plants (mock-infected) were inoculated with non-viruliferous aphids, 5 aphids per leaf. The viruliferous and nonviruliferous aphids were raised on susceptible wheat line 7182. For the experimental groups, at the three-to-four leaf stage, the samples of RNA-Seq were collected at day 0 (mock-infected), day 3 (dpi), day 7 (dpi) and day 14 (dpi). Then, all these samples were immediately frozen in liquid nitrogen for more than ten minutes and stored at -80 °C for further use. The concentration of each RNA sample was measured using a QubitFluorometer (Life Technologies, USA), optical density was checked using the NanoDrop (Thermo Fisher Scientific, MA, USA), and RNA integrity was assessed with an Agilent 2100 bioanalyzer (Agilent, USA). Only those with an RIN value of more than eight were retained for further use.

Library preparation and PacBio sequencing
The preparation of cDNA was strictly according to the PacBio Iso-seq protocol. Total RNA was used to prepare the first-and second-strand cDNA using the SMARTer PCR cDNA Synthesis Kit (Clontech) and Phusion High-Fidelity DNA Polymerase (NEB). Then, PCR amplified fragments of dsDNA were used to construct and sequence SMRTbell libraries. The cDNA fractions were generated with the BluePippin size selection system (Sage Science). We constructed two libraries, 1-5 kb and 4.5-10 kb, of which the former fragment library ran two cells and the latter ran only one. The size selection, PCR amplification and sequencing of the three SMRT cells were conducted on the PacBio Sequel platform by BGI (BGI-Shenzhen, China).

Data processing and read correction
Raw data from libraries produced by the Pacific Biosciences Sequel were processed following the SMRT analysis pipeline. Consistent sequences were obtained after removing redundant subreads, which were used to identify insert fragments; the sequences achieved in this step were called reads of insert (ROI). The ROI sequences were divided into four categories: full-length non-chimeric, chimeric, non-full-length and short reads. Only those sequences that had 5' adapters, 3' adapters and a poly-A tails could be defined as full-length reads [71].
Therefore, full-length non-chimeric sequences and non-full-length sequences were retained for further analysis.
These classified ROI were used to predict de novo consensus isoforms by performing an Iterative Clustering for Error Correction (ICE) algorithm. Then, predicted consensus isoforms were polished using the Quiver algorithm to finally obtain the full-length polished consensus sequences. Depending on the Quiver output (QV), which indicated how confident the consensus calls were, the algorithm classified the polished isoforms into high QV (expected accuracy ≥ 99%) or low QV isoforms; only high QV isoforms were used for further analysis.
Because there was no available reference genome for P. huashanica, we used CD-HIT-EST version 4.6.5 (-c 0.98 - were cleaned by removing adapter sequences, reads with over 5% unknown reads, and reads with low quality scores (Q ≤ 15).
Subsequently, the paired-end (PE) reads generated from Illumina Hiseq X-ten platform was used to correct PacBio isoforms. Then, clean reads from next-generation sequencing were delivered to proofread version 2.14.0 with the default parameters to correct single-bases and trim the regions of chimeras and low quality [73].

Identification of coding RNAs
We used three analysis tools: CPC [74], txCDsPredict [75] and CNCI [76], as well as a protein database, pfam [77], to distinguish between coding and non-coding sequences based on their coding potentials. Three analysis tools marked the code capacity of isoforms and set a threshold value to distinguish lncRNAs from mRNAs.
Isoforms that could align to the Pfam database were identified as mRNA; others were considered lncRNAs. We confirmed that the transcript was mRNA or lncRNA on the basis of at least three of the four judgment results being consistent, long non-coding RNAs having a nucleotide sequence of at least 300 bp and an ORF of no more than 100 amino acids.

Functional annotation
Final full-length isoforms were searched against the NCBI non-redundant (NR), NCBI nucleotide sequence (NT), Twenty genes that were found to be differentially expressed were selected based on the RNA-seq results (i.e. >2fold change, p < 0.05) and analysed using real-time quantitative PCR. The gene-specific primer pairs for each gene were designed using Primer3web version 4.0.0 (http://primer3plus.com/primer3web/primer3web_input.htm).
P. huashanica housekeeping gene 18S rRNA was used as the control to normalize all data (for a list of these genes primer sequences, see Additional file 3: Table S6). The data of the relative expression level of DEGs were normalized using a 2 −△△Ct method [87]. All qRT-PCR reactions were repeated in three biological and three technical replications, and the means and corresponding standard errors were calculated.        Figure 1 Length distribution of final consensus isoforms from merged libraries.   The titer of BYDV-GAV infected in P. huashanica at mock, 3 dpi, 7 dpi and 14 dpi.  Statistic of DEGs of BYDV-GAV infected in P. huashanica at different time points.

Figure 7
Volcano plots of DEGs of BYDV-GAV infected in P. huashanica at 3dpi, 7dpi and 14dpi compare to mock, respectively.

Figure 8
Cluster and STEM analyses of DEGs and all detective genes. A. Forty clusters were obtained using STEM Figure 9 GO pathway enrichment of DEGs of BYDV-GAV infected in P. huashanica at 3 dpi, 7 dpi and 14 dpi compare to mock, respectively.

Figure 10
KEGG pathway enrichment of DEGs of BYDV-GAV infected in P. huashanica at 3 dpi, 7 dpi and 14 dpi compare to mock, respectively.

Figure 11
Validation of gene expression using qRT-PCR at mock, 3 dpi, 7 dpi and 14 dpi.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.   Table S6.xlsx