Integrated analysis of full-length transcriptome and secondary metabolism reveals novel regulation in Paulownia fortunei infected with phytoplasma

Background: Paulownia witches' broom (PaWB) is a devastating disease caused by phytoplasma, which can lead to considerable economic losses. Previous studies have shown that 60 mg•L-1 methyl methane sulfonate (MMS) can restore infected Paulownia seedlings to healthy one. To improve the understanding of PaWB, we used single molecule real time (SMRT) sequencing, which can provide long sequences that can be effectively used to identify gene alternative splicing, obtain novel genes, improve gene annotations and comprehensive analysis of the metabolome. Results: Here, the first integrated analysis SMRT sequencing, the Illumina platforms, and high-performance liquid chromatography-mass spectrometry (HPLC-MS) to analyze changes in transcripts and metabolites in Paulownia fortunei. A total of 140,528 non redundant transcripts were identified, which supported 6,209 of novel gene loci that can enrich 70,223 annotation information of Paulownia fortunei P. fortuneigenome. A total of 5,606 transcripts, 46 lncRNAs, 83 fusion transcripts, 597 transcription factors, and 57 metabolites were related indirectly or directly to PaWB. Subsequent analysis of the transcriptome and metabolome found trans-cinnamate 4-monooxygenase, caffeoyl-CoA-O methyltransferase, ferulic acid and peroxidase were up-regulated in Paulownia fortunei infected with phytoplasma. They were involved in flavonoid, phenylpropane and salicylic acid metabolism, which might be related to Paulownia-phytoplasma interaction. Conclusions: Our results will enrich the understanding of molecular mechanisms of PaWB. This study provides a foundation for better understanding the changes in gene expression, metabolites, and morphology of Paulownia fortunei with PaWB. To further elucidate the mechanism of PaWB in P. fortunei , we performed SMRT sequencing, sequencing on an Illumina platform, and high-performance liquid chromatography-mass spectrometry (HPLC-MS) to analyze changes in the transcripts and metabolites of healthy seedlings (PF), phytoplasma-infected seedlings (PFI), healthy seedlings treated with 60 mg•L -1 methyl methane sulfonate (MMS) (PF-60), and phytoplasma-infected seedlings treated with 60 mg mg•L -1 MMS (PFI-60). The first integrated analysis of the full-length transcriptome and secondary metabolism revealed transcripts involved in PaWB. This study will help in better understanding the changes in genes, metabolites, and morphology in phytoplasma-infected P. fortunei , and the results may be applied to other plants.

transcriptome and metabolome found trans-cinnamate 4-monooxygenase, caffeoyl-CoA-O methyltransferase, ferulic acid and peroxidase were up-regulated in Paulownia fortunei infected with phytoplasma. They were involved in flavonoid, phenylpropane and salicylic acid metabolism, which might be related to Paulownia-phytoplasma interaction. Conclusions: Our results will enrich the understanding of molecular mechanisms of PaWB. This study provides a foundation for better understanding the changes in gene expression, metabolites, and morphology of Paulownia fortunei with PaWB.

Background
Paulownia trees have been planted worldwide and play important roles in safeguarding food security (relay intercropping), the production of craft products (musical instruments and paper making), and in improving the ecological environment [ 1,2]. However, Paulownia is easily infected with phytoplasma that causes Paulownia witches' broom (PaWB), leading to slow growing of big trees and death of small trees, which can lead to 3 considerable economic losses [ 3].
PaWB has been studied for decades, with focus on morphological, physiological, and molecular changes [ 4]. It has been revealed that, in phytoplasma-infected seedlings, the leaves atrophy shrinking, axillary bud proliferation, shortened internodes, and phyllody, compared with their content in uninfected seedlings [ 5,6]. From the physiological aspects, we obtained auxin, and amino acids decrease [ 7,8], whereas cystine catalase, auxin oxidase, and vitamin C oxidase increase [ 8,9]. In addition, we identified genes, proteins, and metabolites that responded to phytoplasma infection [ [10][11][12][13][14][15][16]. However, short-read sequence needs to be assembled into full-length transcripts, which lose a lot of alternative splicing (AS). Despite all these studies, a full understanding of PaWB is still lacking. To improve the understanding of PaWB, we used SMRT sequencing, which can provide long sequences of up to 20 kb that can be used effectively to identify gene alternative splicing, obtain novel genes, and improve gene annotations.
Studying metabolites can help to reveal changes in their complex interactions in response to biotic and abiotic stresses and establish direct contact with biological events [ 17,18]. Comprehensive data from SMRT sequencing have been applied to study the functions of transcripts in regulating metabolism and environmental responses in plants, including Camellia sinensis (tea) [ 19], and Archilochus colubris (hummingbird) [ 20]. Hence, a comprehensive analysis of the metabolome and transcriptome is a useful 4 approach for identifying transcripts involved in PaWB.
To further elucidate the mechanism of PaWB in P. fortunei, we performed SMRT sequencing, sequencing on an Illumina platform, and high-performance liquid chromatography-mass spectrometry (HPLC-MS) to analyze changes in the transcripts and metabolites of healthy seedlings (PF), phytoplasma-infected seedlings (PFI), healthy seedlings treated with 60 mg•L -1 methyl methane sulfonate (MMS) (PF-60), and phytoplasma-infected seedlings treated with 60 mg mg•L -1 MMS (PFI-60). The first integrated analysis of the full-length transcriptome and secondary metabolism revealed transcripts involved in PaWB. This study will help in better understanding the changes in genes, metabolites, and morphology in phytoplasma-infected P. fortunei, and the results may be applied to other plants.
To further validate the accuracy of the full-length transcriptome, six transcripts (selected from Table   S1) of PaWB were selected randomly for quantitative real-time polymerase chain reaction (qRT-PCR) analysis. The expression patterns of qRT-PCR showed similar trends to the expression patterns 5 obtained by analyzing the SMRT sequencing data ( Figure. 1d). QRT-PCR results can support the correctness of SMRT sequencing.

Alternative splicing analysis
By aligning the nonredundant isoforms to the reference Paulownia genome (http://paulownia.genomics.cn), we found more exons from PacBio Iso-Seq than from the reference database ( Figure. 2b). A total of 28,709 (PF), 38,309 (PFI), 35,713 (PF-60) and 37,797 (PFI-60) isoforms were identified, which were classified into three groups (A, B, and C, respectively) by performing alternative splices. Group A contained 13,082 known isoforms (detected and mapped to the gene set); group B contained 70,223 new isoforms (newly annotated genes); and group C contained 108,962 new isoforms from 3,386 likely novels genetic, which do not find any annotated gene ( Figure. 2c). These novel gene loci improved 70,223 novel transcript annotations, among which five main AS types were identified. The numbers of AS events in PFI, PF-60, and PFI-60 were more than in PF. The most frequent AS event in the four samples was intron retention (IR) ( Figure.  Three generations of full-length transcriptome data compared with two generations of the transcriptome, more alternatively spliced transcripts were found, such as growth-regulating factor 2like (Paulownia_LG9G000183.1, 3 isoforms) and 14-3-3 protein 1-like (Paulownia_LG5G000267.1, 2 isoforms). To further validate the accuracy of the detected isoforms, we randomly selected six genes, and verified the isoforms by real-time polymerase chain reaction (RT-PCR) (Table S2, Figure. 2d). The results showed that six transcripts each had two isoforms.

Fusion transcript analysis
Fusion transcript is chimeric RNA encoded, which stem from transcription read-through, genomic structural rearrangements, or trans-splicing RNA [ 24,25]. Gene fusion has a common feature by somatic chromosomal rearrangement in humans. However, gene fusion has a few reports in plants. In this study, we identified 93 fusion transcripts ( 81 in PF, 90 in PFI, 80 in PF-60 and 93 in PFI-60 (Table S4, Figure 4). The Venn diagram showed the highest number of fusion transcripts was found in all four samples, and 83 fusion transcripts were considered as tissues specific fusion transcripts in PFI ( Figure S1).

Genes analysis
To recognize expression changes in response to phytoplasma infection in P. fortunei, initially, the fragments per kilobase of transcript per Million fragments mapped (FPKMs) of the selected sample genes were standardized and normalized. Then, the standardized FPKM values were analyzed by kmeans clustering. We identified 30,437 genes, which were divided into 10 clusters (K1-K10, Figure. 5a). In K2 and K6, the genes were candidate PaWB-related genes (Table S5). To obtain more biological information about the differentially expressed genes, we conducted a GO enrichment analysis ( Figure. 4b). In K2, we found that genes involved in the membrane were enriched in PFI. In K6, we found that gens involved in the membrane and cellular processes were enriched in PFI. Such as, homeobox gene WUSCHEL (WUS) can regulate axillary bud formation or branching [ 26,27]. In this study, we found that Paulownia_LG5G000525 encoding protein WUS-like (PfWUS) were up-expressed in the PFI. This report is in line with the results in soybean [ 26], which may regulate witches' broom to promote axillary meristem and initiation the cytokinin (CK) signaling pathway [ 28].

Transcription factor analysis
Transcription factors (TFs) are proteins that bind specific nucleotide sequences on the upstream of the gene, which regulates RNA polymerase binding to DNA templates, thus regulating gene transcription. Previous studies demonstrate that many TFs related directly or indirectly to PaWB have been identified (Li et al., 2018). Hence, we analyzed the expression patterns of TFs in PF, PFI, PF-60, and PFI-60. A total of 12,162 TFs transcripts were obtained in different gradient ( Figure 6a). Subsequent, weighted correlation network analysis (WGCNA) was used to identify correlation networks of the TFs. Among them, yoyalblue module of highly expressed were identified in PFI ( Figure   S2). We identified 597 (Table S6) PaWB-associated transcription factors. Based on the GO enrichment analysis, we found that transcripts involved in cellular process, catalytic activity, binding, and metabolic process were enriched may be important in PaWB ( Figure S3).

Metabolites analysis
A total of 645 metabolites were detected, which mainly include plant hormones, sugars, and flavonoids (Table S7, Figure.  were selected as candidate PaWB-related metabolites, these metabolites were mainly related to flavonoid, phytohormone, and phenols (Table S8). When plants infected with phytoplasma, plant active oxygen to activate the plant's defense system, while too much reactive oxygen species can cause damage to the plant cells and the structure of genes. Flavonoids, as a kind of antioxidants, play an important role in this process. At the same time, the contents of IAA and its chelates were significantly different in phytoplasma-infected seedlings. This is consistent with our previous research results[ 29], speculating that the PaWB may be related to the change of auxin content. These results showed that phytoplasma infection activates the plant′s defensive response and disrupts plant metabolism, further inducing an increase in antioxidant content and imbalance in plant hormones.

Cross-talk analysis between full-length transcript and metabolome
To explore the impact of phytoplasma infection in P. fortunei, we conducted a correlation analysis of the transcriptome and metabolome. In particular, ferulic acid (PT0417) correlated with peroxidase (PB.5202) in the phenylpropanoid biosynthesis pathway ( Figure. 8d). Peroxidase is the last key enzyme in lignin synthesis, which plays an important role in plant biotic and abiotic stresses, and can regulate cell lignification. Similarly, ferulic acid (PT0417) correlated with peroxidase (PB.5202) in the phenylpropanoid biosynthesis pathway. In addition, in the phenylpropanoid biosynthesis pathway, we found that the gene (Paulownia_LG12G000996.1) encoding trans-cinnamate 4-monooxygenase, a key enzyme in flavonoids biosynthesis, was up-regulated in PFI and that changes in its expression levels directly affected downstream compound synthesis, which is consistent with previous results [ 29]. Full-length transcriptome and secondary metabolism analysis revealed a novel regulation mechanism that will help enrich the understanding of Paulownia-phytoplasma interactions.

Discussion
Next-generation sequencing approaches are highly useful in quantifying gene expression [ 30] and have been used widely in transcriptome studies. However, short reads can lead to

SMRT sequencing improves paulownia annotation
To improve the understanding of increases the accuracy of transcript, we used SMRT sequencing, which can provide long sequences that can be used effectively to identify gene alternative splicing, obtain novel genes, and improve gene annotations. We sequenced the transcriptomes of twelve mRNA samples from PF, PFI, PF-60, and PFI-60 samples. A total of 1,274,765,160 clean reads were mapped to paulownia reference genome, which 1,121,793,341 reads (87.81%) mapped in paulownia reference genome (Table S4). We quantified 286,907 consensus isoforms from twelve samples. The data of biological repeats were clustered together by clustering analysis (Table S5) Cross-talk between ferulic acid and peroxidase associated with PaWB Plant responses to stress signals do not occur in isolation, rather different signaling pathways crosstalk to combat and tolerate the stress, and a signal can lead to alterations in genes and gene products. A transcriptome is considered to represent all genes in a cell at a certain time, whereas a metabolome represents the endpoint of gene expression, so differences in metabolites can be linked directly to biological events [ 17]. Integrative transcriptomic and metabolomic studies can help to identify key genes and metabolites associated with biotic and abiotic stresses and have been proved to be important in revealing metabolic regulation and identifying key candidate genes. In this study, several key genes and metabolites were identified, and the correlation between ferulic acid (PT0417) and peroxidase (PB.5202) was revealed in the phenylpropanoid biosynthesis pathway. Peroxidase catalyzes the last step in lignin biosynthesis, and its activity and isozymes are related closely to disease resistance of plants [ 36]. When plants are infected by pathogenic bacteria, they will make lots of thick cell walls such as lignin [ 37], which constitute an important structural barrier. Similarly, ferulic acid cross-linked mainly with cell wall lignin and polysaccharides to form part of the cell wall, which solidifies the plant wall in the defense against pathogen intrusion [ 38]. In addition, ferulic acid is a good antioxidant that has strong scavenging effects on hydrogen peroxide, superoxide radicals, hydroxyl radicals, and nitroso peroxide, to protect normal metabolism [ 39]. Ferulic acid, caffeoyl-CoA-O-methyltransferase and peroxidase were up-regulated in PFI. These were beneficial to lignin production, which phenylpropane metabolism was enhanced to resist phytoplasma-infection Paulownia.

Flavonoids responsive to phytoplasma infection in P. fortunei
Flavonoids play important roles in pathogen-infected plants because they have antioxidant properties that prevent the breakdown of lipids by peroxidase [ 37]. Flavonoid compounds through the phenylpropanoid metabolic pathway, which uses phenylalanine and tyrosine as raw materials [ 40,41]. These amino acids enter downstream branching pathways and the flavonoids are synthesized by phenylalanine ammonia-lyase, trans-cinnamate 4-monooxygenase, and cinnamate-4hydroxylase catalyzes. In our full-length transcriptome study, we found that the gene (Paulownia_LG12G000996.1) encoding trans-cinnamate 4-monooxygenase, a key enzyme in flavonoids biosynthesis, was up-regulated in PFI and its changes in its expression levels directly affected the downstream compound synthesis, which is consistent with previous result [ 29]. These was beneficial to flavonoid production, which phenylpropane metabolism was enhanced to resist phytoplasma-infection Paulownia.
Paulownia may produce a large amount of active oxygen and activate the defense system after phytoplasma infection [ 9]. Flavonoids and other substances are used mainly as antioxidants to eliminate reactive oxygen species produced by plants in response to biological stress [ 42]. In P. fortunei, the flavonoids may be used as antioxidants to eliminate the damage caused by excess reactive oxygen, which may play a role in balancing plant defense responses.
Pfwus responded to PaWB to regulate morphological variations When invaded by a pathogen, host plants initiate pathogen-associated molecular pattern-triggered immunity and effector-triggered immunity responses, and gene expression patterns of the host plant change [ 43]. Previous studies showed that squamosa promoter binding protein-like 9 SPL9 is the major target of miR156b, and SPL9 by physically interacting with the WUS may regulate axillary bud formation and branching [ 26]. Notably, the WUS can directly regulate axillary bud formation and branching [ 23,26], and the cytokinin (CK) signaling pathway [ 28]. WUSs have been reported in many model plants. Arabidopsis ( Figure. S4). In our study, we found Pfwus genes were up-regulated and twocomponent response regulator ARR-like (Paulownia_TIG00016041G000111.1) were down-regulated in the PFI. On the other hand, we results shown that zein O-glycosyltransferase-like (PB.21605.1) and gibberellin 2-oxidase-like (Paulownia_LG10G000034.1) in PFI were significantly higher than in PF, while auxin response factor, genes involved in auxin signal transduction, were down-regulated in PFI.
Upregulation of CK/IAA causes imbalance of hormones, which may lead to PaWB symptoms.

Conclusions
The first integrated analysis that included SMRT sequencing, sequencing on an Illumina platform, and HPLC-MS was performed to investigate changes in transcripts and metabolites in phytoplasmainfected P. fortunei. We obtained 5,606 transcripts, 46 lncRNAs, 83 fusion transcripts, 597 TFs, and 107 metabolites. Subsequent analysis of the transcriptome and metabolome found trans-cinnamate 4-monooxygenase, caffeoyl-CoA-O-methyltransferase, ferulic acid, and peroxidase were up-regulated in PFI. These were beneficial to flavonoid and lignin production, which phenylpropane metabolism was enhanced to resist phytoplasma-infection Paulownia. Full-length transcriptome and secondary metabolism analysis revealed a novel regulation mechanism, which will enrich the understanding of Paulownia-phytoplasma interactions. This study will help in better understanding the changes in genes, metabolites, and morphology in Paulownia with PaWB disease. Illumina Hiseq X-ten library construction and sequencing

Methods
The Illumina Hiseq X-ten library was constructed using TruSeq RNA kit (Invitrogen, Carlsbad, CA, USA) and sequenced using Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) and ABI Step Subsequently, using EMBOSS software, Open Reading Frame (ORF) prediction was performed on the identified lncRNAs and the amino acid sequence length of more than 100 was filtered out to obtain the final lncRNAs prediction.

Identification of fusion transcripts
Fusion transcripts obtained from RS-IsoSeq analysis were predicted. The methods were referred to Chao et al [ 30]. In addition, fusion transcripts were validated by Illumina short read of the HiSeqX Ten platform.

Transcript annotation
All the transcripts were mapped to the reference genome, searched against the NCBI non-redundant protein sequence database (Nr) (http://www.ncbi.nlm.nih.gov/), the GO with Blast2 GO program, To distinguish different alternative splicing events, PacBio-seq isoform sequences were designed Isoform-specific primers (Table S3) The extraction method of metabolites of plant leaf were referred to Cao [ 29], the metabolite was extracted from the samples using LC-ESI-MS/MS (HPLC, Shim-pack UPLC SHIMADZU CBM20A system). Applied Biosystems 4000 Q TRAP system was performed, for data analysis. And the dates were processed with software Analyst 1.6.1 (AB SCIEX, CA, USA).

Screening differential metabolites related to PaWB
Principal Component Analysis (PCA) and Partial Least Squares-Discriminant Analysis (PLS-DA) were used to analyze the difference metabolites between different samples. PCA was used to determine the separation trend among sample groups. Different varieties or tissue metabolites were screened firstly by multivariate analysis with PLS-DA VIP parameter values of the model. Then numerical fold changes were used to further screening. The difference multiple is the ratio of the metabolite content in the treatment group divided by the metabolite content in the control group. The resulting difference metabolites should meet the following conditions: fold change is greater than or equal to 2 or less than 0.5, and the VIP is greater than or equal to 1.

Correlation analysis between metabolites and transcriptome
In this study, pearson correlation coefficients (> 0.8) were designed for metabolome and transcriptome data integration. A core program from R was used to determine the correlation between the number of enzymes corresponding to the enzyme in KEGG pathway and the screening criteria for the Correlation results.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.