Gene co-expression analysis and tissue-specific gene identification in Liriodendron chinense via hybrid sequencing


 Background

Liriodendron chinense (Hemsl.) Sarg. is an economically and ecologically important deciduous tree species that has been studied for many years. Although the complete L. chinense genome has been sequenced, the gene co-expression modules and tissue-specific genes of L. chinense remain unknown.

Results

Here, we used the bracts, petals, sepals, stamens, pistils, leaves, and the shoot apex of L. chinense as materials and analysed their gene co-expression modules and tissue-specific genes via hybrid sequencing. We identified 3,032 DEGs between the floral and vegetative tissues and 2,126 tissue-specific genes. By using WGCNA analysis, we identified 13 gene co-expression modules, and KEGG pathway enrichment analysis revealed that tissue-specific genes and genes from different modules were enriched in different pathways. Genes associated with plant defence were highly expressed in the bracts, genes participating in plant hormone signal transduction were highly expressed in the shoot apex, and genes participating in photosynthesis were highly expressed in the leaves, petals and sepals. Moreover, we identified 10 MIKC-type MADS-box genes that were classified as member of the AP3/PI, SVP, SEP, AG/SHP/STK, AGL12, SOC1 and TM8 subfamily. Phylogenetic analysis showed that the expression profiles of these ten genes were consistent with those reported in Arabidopsis and Populus , indicating that these genes are highly conserved evolutionarily and related to floral and vegetative tissue development. The small number of MIKC-type MADS-box genes in L. chinense was probably owing to its incomplete genome annotation.

Conclusions

In this work, we provided a reference transcriptome for L. chinense research by using hybrid sequencing. We identified 2,126 tissue-specific genes and 3,032 DEGs that contributed greatly to the functional differences between vegetative organs and floral organs. By using WGCNA analysis, 13 gene co-expression modules and 52 hub genes from six co-expression modules of interest were identified. Moreover, we identified 10 MIKC-type MADS-box genes that might be related to the development and growth regulation of floral and vegetative organs. These findings will improve our understanding of gene co-expression, tissue specific genes and flower development model of L. chinense .

To the best of our knowledge, biological traits and phenotypic characteristics are the result of the coexpression of multiple genes [37][38][39][40]. Co-expression network analysis, a powerful tool to discover connections between genes effectively and accurately, has been widely applied in plant research. For example, researchers found that genes encoding beta-glucosidases and sorbitol dehydrogenases in Pyrus bretschneideri were hub genes involved in a co-expression network [37]. By combining co-expression network analysis and functional enrichment tools, researchers identi ed 1,896 functional modules associated with moso bamboo development [38]. In maize, co-expression analysis was applied at the pathway level and enabled researchers to identify genes that potentially regulate the accumulation of gene transcripts associated with glossiness [39]. However, we have no knowledge of gene co-expression modules in L. chinense; thus, it is necessary to study gene co-expression in L. chinense to improve our understanding of gene expression patterns.
Gene co-expression and tissue-speci c genes in uence biological traits and phenotypic characteristics.
Moreover, tissue-speci c genes play fundamental roles in tissue differentiation and maintenance [41]. Researchers have found that the divergence of tissue-speci c gene expression patterns modulates ion maintenance in tissues and improves salinity tolerance in Populus euphratica [42]. In pineapple (Ananas comosus), researchers detected 273 tissue-speci c genes, which provided a basis for the study of the genetic mechanisms that regulate fruit phenotypes [43]. Although studies of L. chinense tissues (leaves and petals) have been reported, tissue-speci c genes in L. chinense remain unknown, and this lack of knowledge impedes our understanding of tissue traits of this species [5,6].
Researchers have studied the transcriptome of L. chinense but have focused on only one or two tissues [5,6]; a high-quality reference transcriptome of L. chinense is still needed. More importantly, research on MIKC-type MADS-box genes in L. chinense has not been reported. To understand the transcriptome of L. chinense comprehensively and bridge the gap in knowledge concerning tissue-speci c genes, gene coexpression modules and MIKC-type MADS-box genes in L. chinense, we used hybrid sequencing to study the transcriptomes of multiple tissues (bracts, sepal, petals, pistils, stamens, leaves, and the shoot apex) in L. chinense. By using hybrid sequencing, we identi ed TFs, lncRNAs, differentially expressed genes (DEGs), and fusion transcripts and investigated AS and APA in L. chinense. Importantly, gene coexpression modules and tissue-speci c genes were discovered via hybrid sequencing. We analysed the phylogenetic relationship of MIKC-type MADS-box genes in A. thaliana, P. trichocarpa and L. chinense. These ndings will provide a basis for the study of the tissue characteristics and the establishment of a ower development model of L. chinense in the future.

Results
Global view of the PacBio sequencing data Using PacBio SMRT, we obtained 611,876 polymerase reads from 21 pooled samples (Additional le 9: Table S1). We then extracted 10,437,029 subreads with an average length of 2,177 bp from the polymerase reads (Additional le 9: Table S1). After processing them further, we obtained 498,059 circular consensus sequences (CCSs), each of which contained a poly(A) tail and 5′ and 3′ adaptors, with an average length of 2,755 bp (Additional le 9: Table S1). Among the CCSs, we detected 430,554 fulllength non-chimeric (FLNC) reads, with an average length of 2,568 bp (Additional le 9: Table S1). After polishing the reads with both iterative clustering for error correction (ICE) and Arrow, we obtained 227,276 polished consensus reads whose average length was 2,697 bp (Additional le 9: Table S1). By mapping the short reads generated from Illumina sequencing via LoRDEC, we corrected up to 100% of the sequencing errors [44]. We ultimately obtained 227,276 high-quality polished consensus reads with an average length of 2,696 bp for further analysis (Additional le 10: Table S2).

Isoform detection and characterization
Using TAPIS software, we classi ed and characterized the full-length isoforms [23]. In total, we obtained 89,640 isoforms, including 4,210 (4.70%) isoforms of known genes, 65,448 (73.01%) novel isoforms of known genes, and 19,982 (22.29%) isoforms of novel genes (Fig. 1a). We compared the lengths of the transcripts between the PacBio data and reference genome annotation, and found that the transcripts described in the reference genome annotation were shorter than those detected by PacBio Iso-Seq ( Fig.  1b), which helps to reveal the complexity of the L. chinense transcriptome. In addition, we compared the numbers of exons in the transcripts between the PacBio data and reference genome annotation. We found that single-exon genes represented a large proportion (30.37%) of the PacBio data, while in the reference genome, single-exon genes accounted for only 7.29% (Fig. 1c). The proportion of multiple-exon genes (≥ 2 exons) in the reference genome annotation was greater than that in the PacBio data (Fig. 1c).

Read mapping and identi cation of novel genes and TFs
Using GMAP software, we aligned the high-quality polished consensus reads to the reference genome and found that 1.13% (2,572/227,276) of the reads were unmapped [45].  Table S3).
We de ned reads that mapped to unannotated regions of the reference genome as reads from novel genes, and a total of 13,139 novel genes were detected. To better understand the novel genes, they were functionally annotated. In total, 9,343 (NR), 5,833 (NT), 6,494 (Pfam), 5,478 (KOG), 5,883 (Swiss-Prot), 9,177 (KEGG) and 6,494 (GO) novel genes were annotated in the 7 databases (Fig. 1g). In addition, 1,945 novel genes were annotated across all seven databases, and 10,905 novel genes were annotated in only one database (Fig. 1g). Furthermore, 2,234 novel genes were unannotated in all 7 databases, which may indicate that these genes have little coding ability and thus may represent lncRNAs [52].
We then used iTAK software to predict TFs [53]. We obtained a total of 5,532 TFs from 95 families (Additional le 12: Table S4 Table S4).
Identi cation of lncRNAs and fusion transcripts.
A fusion gene consists of multiple coding regions connected end to end and controlled by a set of regulatory sequences (including promoters, enhancers, and ribosome-binding sequences). A total of 887 fusion transcripts were identi ed in this study (Additional le 13: Table S5). By analysing the distribution of fusion transcripts on the scaffolds (because the reference genome has not yet been assembled at the chromosomal level), we found that these fusion events tended to occur between scaffolds (866) rather than within a scaffold (21) (Additional le 13: Table S5).

AS and APA analyses
AS is an important mechanism for regulating gene expression and proteome diversity, and it is also an important cause of the large differences between the number of genes and the number of proteins in eukaryotes [58,59]. By using the SUPPA tool, we identi ed 8,503 alternatively spliced genes, including 3,748 genes with skipped exons (SEs), 4,269 genes with retained introns (RIs), 404 genes with mutually exclusive exons (MXs), 3,625 genes with alternative 5′ splice sites (A5s), 1,521 genes with alternative rst exons (AFs), 4,195 genes with alternative 3′ splice sites (A3s), and 632 genes with alternative last exons (ALs) (Fig. 3a). Among the 7 basic alternatively spliced genes, RI genes were the most common, while MX genes were the least frequent (Fig. 3a).
Most genes in eukaryotes can generate a variety mRNA 3′ ends through APA, which greatly increases the complexity of the transcriptome [60,61]. By using the TAPIS pipeline [23], we found that 11,108 genes contained at least one poly(A) site, among which 4,387 genes contained a single poly(A) site and among which 6,271 genes contained more than one poly(A) sites (Fig. 3b). To identify the potential cis-element involved in polyadenylation, a motif enrichment analysis was performed to analyse the 50 nucleotides upstream from the poly(A) sites [23]. We discovered a conserved motif (AUAAA) upstream of the poly(A) cleavage site (Fig. 3c), and this motif also was present in maize, red clover (Trifolium pratense L.) and sorghum (Sorghum bicolor) [22][23][24]. Moreover, to investigate the preferential nucleotides at the poly(A) cleavage sites, we analysed the nucleotide composition of 50 downstream and 50 upstream nucleotides at all APA cleavage sites. We found that uracil (U) was enriched upstream of the APA cleavage sites while adenine (A) was enriched downstream of the cleavage sites (Fig. 3d). The same ndings were reported in red clover and sorghum [23,24].
Identi cation of DEGs and tissue-speci c genes in L. chinense To investigate gene expression patterns in the seven evaluated tissues of L. chinense, we used fragments per kilobase of transcript sequence per million mapped reads (FPKM) values to normalize the reads from Illumina sequencing. In total, we identi ed 3,032 DEGs between oral tissues (bracts, sepals, petals, stamens, and pistils) and vegetative tissues (leaves and the shoot apex), of which the expression of 878 DEGs was upregulated, whereas that of the other 2,154 was downregulated (Fig. 4a). We performed GO enrichment analysis to relate these 3,032 DEGs to their products, and the results showed that the top three terms containing categorized genes were 'catalytic activity', 'single-organism metabolic process', and 'transferase activity' (Additional le 1: Fig. S1a). Furthermore, KEGG pathway enrichment analysis revealed that the DEGs participated in 107 pathways, and the top 3 signi cantly enriched pathways were associated with biosynthesis of avonoids, glycosphingolipids, stilbenoids, diarylheptanoids and gingerol (Additional le 1: Fig. S1b).
In the different tissues, the euclidean distance method was used to perform a clustering analysis of all the genes to identify their clustering patterns [62]. The results showed that these genes exhibited different expression patterns; for example, a number of genes tended to be highly expressed in vegetative tissues, and many genes tended to be highly expressed in oral tissues (Fig. 4c). The 21,944 genes (with different FPKM values in different tissues) were then grouped into six subclusters via the H-cluster algorithm. The results showed that these genes presented different expression patterns (Additional le 2: Fig. S2). Interestingly, the expression pattern of the genes in subcluster 1 (10,124 genes), whose expression was upregulated in the stamens, pistils, shoot apex, and owers (the bracts, sepals, petals, stamens, and pistils combined) but downregulated in the leaves, was opposite that in subcluster 2 (2,289) (Additional le 2: Fig. S2). Interestingly, genes in subcluster 6 (296 genes) showed a strong tissue preference, exhibiting patterns of which their expression highly upregulated in the stamens and owers (Additional le 2: Fig. S2).
To better understand tissue-speci c genes in L. chinense, we investigated the distribution of tissuespeci c genes in L. chinense and the pathways with which these genes were associated. Genes with less than 1 FPKM in any tissue, were considered not expressed in that tissue [63]. In our study, we detected a total of 2,126 tissue-speci c genes in 7 tissues. Stamen tissue contained the highest number of tissuespeci c genes (819, 38.71%), followed by the leaf (372, 17.58%), shoot apex (343, 16.21%), bract (263, 12.43%), pistil (225, 10.63%) and petal (52, 2.46%) tissues, while the sepal tissue contained the minimum number of tissue-speci c genes (42, 1.98%) (Fig. 4b). The expression patterns of the 2,126 genes were tissue speci c (Fig. 4d), and KEGG pathway enrichment analysis also revealed that these tissue-speci c genes may play roles in different pathways. For example, bract-speci c genes were signi cantly enriched in plant-pathogen interactions (Additional le 3: Fig. S3a). Interestingly, the tissue-speci c genes in the shoot apex were enriched not only in plant-pathogen interactions, but also in monoterpenoid biosynthesis (Additional le 3: Fig. S3b). The petal-speci c genes were associated with thiamine metabolism (Additional le 3: Fig. S3c), while the stamen-speci c genes were signi cantly enriched in glycosphingolipid biosynthesis and starch and sucrose metabolism (Additional le 3: Fig. S3d). The pistil-speci c genes were enriched in avonoid biosynthesis (Additional le 3: Fig. S3e). Interestingly, the sepal-speci c genes were also enriched in monoterpenoid biosynthesis (Additional le 3: Fig. S3f).

Gene co-expression analysis
Weighted correlation network analysis (WGCNA) was used for gene co-expression analysis [64]. In total, 13 co-expression modules were obtained, with the number of genes involved ranging from 173 (tan module) to 4,503 (turquoise module) (Fig. 5a, Additional le 14: Table S6). In the shoot apex, genes from the black module were highly expressed, and KEGG enrichment analysis revealed that these genes were associated mainly with plant-pathogen interactions and plant hormone signal transduction (Fig. 5b, Additional le 4: Fig. S4). Genes from the green-yellow module tended to be expressed in the bracts and shoot apex, and these genes also participated in plant-pathogen interactions (Additional le 4: Fig. S4,  Fig. 5c). Genes from the magenta module, which were associated with glycolysis and the tricarboxylic acid (TCA) cycle, tended to be expressed in the petals and sepals (Fig. 5d, Additional le 4: Fig. S4), while genes from the pink module were highly expressed in the leaves, petals, and sepals and were involved in porphyrin and chlorophyll metabolism and photosynthesis (Fig. 5e, Additional le 4: Fig. S4). Genes belonging to the purple module were preferentially expressed in the petals and stamens (Additional le 4: Fig. S4). By using KEGG pathway enrichment analysis, we found that these genes played a role in oxidative phosphorylation and the TCA cycle (Fig. 5f). In the pistils, the expression of tan module genes was very high, and these genes were related to indole alkaloid biosynthesis (Fig. 5g, Additional le 4: Fig.  S4).
To determine the hub genes in the black, green-yellow, magenta, pink, purple, and tan modules, six algorithms, degree, edge percolated component (EPC), maximal clique centrality (MCC), closeness, radiality, and stress, in the CytoHubba package of Cytoscape software were used [65]. To improve reliability, only the genes ranked in the top 20 by all seven algorithms were considered hub genes. We ultimately identi ed 9, 8, 9, 5, 9, and 12 hub genes in the black, green-yellow, magenta, pink, purple, and tan modules, respectively (Additional le 5: Fig. S5, Additional le 15: Table S7). These hub genes play important roles in different pathways. For example, Lchi02285 (which encodes the MADS-box protein JOINTLESS, black module) played a role in regulating transcription (Additional le 15: Table S7) [66]. Novelgene6677 (which encodes the disease resistance protein RPM1, green-yellow module) had a substantial role in plant resistance (Additional le 15: Table S7) [67]. Lchi18683 (which encodes plastocyanin, pink module) was indispensable for photosynthesis (Additional le 15: Table S7).

MIKC-type MADS-box genes in L. chinense
Using HMMER software and the BLASTP tool, we identi ed 10 MIKC-type MADS-box genes (7 known genes and 3 novel genes) in the L. chinense genome [68]. By comparing MIKC-type MADS-box genes in A. thaliana and P. trichocarpa with those in L. chinense, we determined the phylogenetic relationship among these 105 genes and conducted a neighbour-joining phylogenetic tree (Fig. 6a). Among these genes, Lchi23168, Lchi01744 and Novelgene12125 were classi ed as member of the AP3/PI subfamily, Lchi02285 was classi ed as member of the SVP subfamily, Lchi20361 were classi ed as member of the SEP subfamily, Novelgene10568 were classi ed as member of the AGL12 subfamily, Lchi01587 and Lchi04024 were classi ed as member of the AG/SHP/STK subfamily, Lchi17876 was classi ed as member of the SOC1 subfamily, and Novelgene0718 was classi ed as member of the TM8 subfamily (Fig. 6a). We found that the number of MIKC-type MADS-box genes in L. chinense (red circle) was less than that in A. thaliana (yellow circle) and P. trichocarpa (blue circle), and no L. chinense MIKC-type MADS-box gene was classi ed as member of the AGL6, AGL17, AGL15, FLC or AP1/FUL subfamily (Fig.  6a). We believed that the poor quality of the genome annotation may limit identi cation of additional MIKC-type MADS-box genes in L. chinense, as three of the ten MIKC-type MADS-box genes were identi ed via hybrid sequencing, which indicated that the information of L. chinense genome annotation was incomplete.
By using hybrid sequencing, we determined the expression pro les of ten MIKC-type MADS-box L. chinense genes. Lchi23168 and Lchi01744 were B-type genes that were related to the determination of stamen and petal identity and were highly expressed in the stamens and petals (Fig. 6b). As an E-type gene involved in the determination of the identify of all oral organs, Lchi20361 was highly expressed in the pistils, petals and stamens (Fig. 6b). Furthermore, Lchi04024 was highly expressed in the pistils and stamens, Lchi01587 was highly expressed in the pistils, and these two genes were C/D-type genes related to the determination of carpel and stamen identity (Fig. 6b). Interestingly, we found that the expression levels of four MIKC-type MADS-box genes, Lchi17876 (SOC1 subfamily), Lchi02285 (SVP subfamily), Novelgene0718 (TM8 subfamily) and Novelgene10568 (AGL12 subfamily), were greater in vegetative tissues than in oral tissues (Fig. 6b). Notably, not all of the MIKC-type MADS-box genes were highly expressed only in oral organs, as some of them were also highly expressed in vegetative organs. For example, AGL12 is highly expressed in the A. thaliana root meristem, and the expression levels of SVP-like genes in Prunus mume and SOC1-like genes in P. bretschneideri are greater in vegetative organs than in oral organs [69][70][71].

RT-qPCR validation
To validate the accuracy of the gene expression levels detected by RNA sequencing, we performed RT-qPCR of twenty randomly selected DEGs. As shown in Additional le 6: Fig. S6 and Additional le 7: Fig.  S7, the expression levels of the ten DEGs measured by Illumina sequencing and RT-qPCR were correlated (R 2 = 0.947, p-value < 0.01) despite some differences in transcript abundance. These results indicated that the gene expression levels detected by Illumina sequencing were reliable.

Discussion
L. chinense, an economically and ecologically important tree species is continually being studied. From the study of the relationship between the geographical distribution of L. chinense and climate to the study of the characteristics and development of its different tissues, research on L. chinense is intensifying [1,5,6]. Especially since the genome of L. chinense was published, the understanding of the position of L. chinense in evolutionary history has improved 7 . However, the quality of the genome sequencing data is unsatisfactory, the L. chinense samples studied have been limited to only a few tissues, and no full-length transcriptome of L. chinense has been reported. Recently, SMRT sequencing combined with NGS has been used to improve the quality of genomes, discover fusion genes and lncRNAs, and determine the full-length transcriptome of many species, such as moso bamboo, S. miltiorrhiza and Populus spp. [25,29,72]. For these purposes, we used hybrid sequencing to obtain a reference transcriptome of L. chinense from 21 samples. We detected 13,139 novel genes, 5,532 TFs from 95 families, 7,527 lncRNAs and 887 fusion genes (Fig. 1g, Additional le 12: Table S4, Additional le 13: Table S5). Moreover, we also found that the lncRNAs were shorter and smaller than the mRNAs in terms of length and exon number (Fig. 2c and d). This phenomenon has also been observed in humans, Sus scrofa, Populus and Gossypium hirsutum [30,34,[72][73][74]. In addition, in the study of maize and red clover, fusion events tended to arise inter-chromosomally rather than intra-chromosomally [22,24]. In L. chinense, it is di cult to determine the regularity of the occurrence of fusion events at the chromosomal level because the genome has not yet been assembled at this level. However, these results still improve our understanding of the transcriptome of L. chinense.
Notably, our transcriptome analysis not only included the discovery of lncRNAs, TFs, and fusion genes but also included the discovery of AS and APA. AS and APA are important post-transcriptional regulatory mechanisms that can increase the complexity and exibility of the proteome and transcriptome [75]. In maize and sorghum, AS events occur in 45% and 38.5% of the genes, respectively [32]. In our study, 17.6% (8,503/48,408) of alternatively spliced genes were detected, and RI genes were the most frequent alternatively spliced genes (Fig. 3a). This result was consistent with the ndings in sorghum, red clover, and maize [22][23][24]. We suggest that these alternatively spliced genes may play important roles in maintaining homeostasis between transcripts and proteins in L. chinense. Because AS can cause alternatively spliced genes to produce different AS variants, some AS variants can encode novel proteins, some of which are degraded in different ways before they encode proteins [59,76]. Moreover, APA also plays an important regulatory role in maintaining RNA stability, ensuring accurate RNA localization and translation, and in plant development especially owering [60,77,78]. In A. thaliana, approximately 60% of the genes have multiple poly(A) sites [60]. In L. chinense, 13.0% (6,271/48,408) of the genes had more than one poly(A) site (Fig. 3b). As far as we know, the number and source of sequencing samples, sequencing depth, and analysis methods have an important impact on the analysis results of AS and APA, so it was not uncommon for the proportion of alternatively spliced genes and APA genes in L. chinense to be low [60,61,79]. Moreover, we found that the distribution of nucleotides upstream and downstream of the APA cleavage sites was consistent with the reports of previous poly(A) analyses in sorghum and red clover (Fig. 3d) [23,24]. We also found that a motif (AUAAA) upstream of the poly(A) cleavage sites was present in maize, red clover, sorghum and other plant species (Fig. 3c) [22][23][24]. This indicated that the motif was conserved. These ndings indicated that AS and APA made great contributions to complexity and exibility of the L. chinense transcriptome.
The discovery of DEGs was important for the study of plant development, metabolic pathways and the stress response. Researchers identi ed 3,118 DEGs that were related to L. chinense leaf shape development [6]. In addition, 962 DEGs were discovered to be associated with the alkaline stress response in rice [19]. By analysing the DEGs in the root, researchers revealed the pathway of tanshinone biosynthesis in S. miltiorrhiza [29]. Tissue-speci c genes, play important roles in plant defence, stress response, plant development and material metabolism [41,42,[80][81][82][83]. In Ferula asafetida, genes involved in terpenoid and phenylpropanoid metabolism tended to be expressed in the owers [83]. In the study of tomato (Solanum pennellii), researchers concluded that tissue-speci c genes play important regulatory roles during fruit development [41]. In addition, tissue-speci c genes can enhance Petunia hybrida tolerance under salt conditions [81]. In our study, we identi ed 3,032 DEGs between oral tissues and vegetative tissues and detected 2,116 tissue-speci c genes in L. chinense ( Fig. 4a and b). Interestingly, we found that bract-speci c genes were enriched in plant-pathogen interactions (Additional le 3: Fig. S3a).
According to a previous study, bracts have multiple functions, such as rain protection, pollinator attraction, and photosynthetic surfaces [84]. However, there are few reports on plant-pathogen interactions in bracts. We also found that the stamen-speci c genes were enriched in starch and sucrose metabolism (Additional le 3: Fig. S3d); it is easy to understand that stamen development is accompanied by starch accumulation and degradation [85]. Shoot-apex-speci c genes are involved in plant hormone signal transduction (Additional le 3: Fig. S3b). The shoot apex contains different primordia, and phytohormones participate in the regulation of primordial development. These results showed that tissue-speci c genes and DEGs may play roles in the functional diversity of tissues in L. chinense. Overall, these ndings greatly increase our understanding of tissue-speci c genes and DEGs in L. chinense.
We were interested not only in DEGs and tissue-speci c genes but also in co-expressed genes. Gene coexpression has an important impact on biological traits and phenotypic characteristics and has basic functions in plant growth, development, and environmental adaptability [86][87][88]. Co-expressed genes play important roles in the growth and development of moso bamboo [38]. In addition, co-expression of salt overly sensitive 1 (SOS1) and PM-localized H + -ATPase (AHA1) from Sesuvium portulacastrum in transgenic A. thaliana can enhance salinity resistance in transgenic plants [88]. By analyzing the coexpression of genes, we obtained additional data and provided new insight into the functions of organs in L. chinense. In our study, we identi ed 13 co-expression modules, and the genes in these modules participated in different pathways (Fig. 5, Additional le 4: Fig. S4). KEGG pathway enrichment analysis revealed that the genes from the green-yellow module were highly expressed in the shoot apex and bracts and were involved in plant-pathogen interactions (Fig. 5c, Additional le 4: Fig. S4). In pineapple, genes co-expressed in bracts participate in plant-pathogen interactions and these genes may paly roles in plant defence [43]. Therefore, we can infer that the bract and shoot apex may function in plant defence. We also found that leaves, petals, and sepals were the main organs for photosynthesis in L. chinense because the genes from the pink module were highly expressed in these organs, and those genes participated mainly in photosynthesis (Fig. 5e, Additional le 4: Fig. S4). These ndings indicated that the function of different L. chinense. tissues was the result of gene co-expression.
To the best of our knowledge, MIKC-type MADS-box genes include most ABCDE-type genes that play pivotal roles in ower development regulation and the determination of oral organ identity [89,90]. Many MIKC-type MADS-box genes have been identi ed in different species [10,[91][92][93]. In this study, we identi ed 10 MIKC-type MADS-box genes from 7 subfamilies (Fig. 6a). Phylogenetic analysis revealed that these ten genes were highly conserved evolutionarily. However, no AP1/FUL (A-type genes) subfamily genes were identi ed in this study, and the number of MIKC-type MADS-box genes was lower in L. chinense less than in A. thaliana, P. trichocarpa and Amborella trichopoda [94]. In actuality, L. chinense originated earlier than A. thaliana and P. trichocarpa but later than A. trichopoda, and in angiosperms, the number of ABCE-type genes has increased [7,94]. Theoretically, the number of A-type genes in L. chinense should be greater than zero. We believed that the poor quality of the L. chinense genome annotation prevented us from nding more MIKC-type MADS-box genes. Moreover, MIKC-type MADS-box genes function not only f in reproductive organ development and owering regulation but also in vegetative organ growth and the stress response. For example, OsMADS26 (AGL12 subfamily) is related to stress response, AGL12 regulates the proliferation of root meristem cells in A. thaliana, and SVP-like genes control dormancy and budbreak in apple [69,95,96]. Similarly, among the 10 MIKC-type MADS-box genes, 6 genes were highly expressed in reproductive tissues, and 4 genes were highly expressed in vegetative tissues (Fig. 6b). These ndings indicated that L. chinense MIKC-type MADS-box genes might be related to the development and growth regulation of oral and vegetative organs.

Conclusions
In this work, 13,139 novel genes, 5,532 TFs, 7,527 lncRNAs, 887 fusion transcripts, 8,503 alternatively spliced genes, and 11,108 genes with APA sites were identi ed via hybrid sequencing. On the basis of hybrid sequencing data, we used WGCNA to identify 2,126 tissue-speci c genes and 13 gene coexpression modules. KEGG pathway enrichment analysis further revealed that tissue-speci c genes and co-expressed genes functioned in different pathways. We also identi ed 52 hub genes from six coexpression modules of interest, and these genes played important roles in speci c pathways. Moreover, we identi ed 10 MIKC-type MADS-box genes that might be related to the development and growth regulation of oral and vegetative organs. These ndings will support future research on the differentiation of tissue functions in L. chinense.

PacBio library construction and sequencing
Twenty-one samples were pooled together with equimolar ratios. One microgram of total RNA was used to synthesize cDNA via SMARTer TM PCR cDNA Synthesis Kit (TaKaRa, Japan). After PCR ampli cation, we used a portion of cDNA for size fractionation via a BluePippin TM Size Selection System (Sage Science, MA) to enrich fragments that were longer than 4 kb. Two complete SMRT bell libraries (≤ 4 kb and > 4 kb) were then constructed after full-length cDNA ampli cation, damage repair, end repair, ligation of SMRT linkers, exonuclease digestion, application of binding primers, and application of binding DNA polymerase. Sequencing was then performed on the PacBio Sequel platform.
Illumina sequencing and quality control Twenty-one samples from 7 tissues (bracts, sepals, petals, stamens, pistils, leaves and the shoot apex) were sequenced on the Illumina HiSeq 2500 platform. For each sample, three micrograms of total RNA were used for library construction with a NEBNext® Ultra TM RNA Library Prep Kit. In-house Perl scripts were used to process the raw reads. We then obtained clean data after removing the reads with adaptors, reads containing poly(-N) sequences, and low-quality reads from the raw data.
PacBio data processing SMRT Link 6.0 software was used to produce effective subreads (length ≥ 200, read score ≥ 0.75). The CCSs were then selected from the subreads (passes ≥ 1, predicted accuracy ≥ 0.8). The pbclassify.py script was subsequently used to classify the CCSs into full-length and non-full-length reads (ignoring false poly(A) results, sequencing length < 200). Each CCS had a poly(A) tail and 5′ and 3′ adaptors and was considered an FLNC read. The full-length and non-full-length reads were then used to perform ICE, followed by a nal polishing by Arrow. Polished consensus reads were ultimately produced. The Illumina sequencing data were used to correct additional nucleotide errors in the consensus reads with LoRDEC software [44].
PacBio Iso-Seq data analysis GMAP software was applied to align the polished consensus reads (long reads) to the reference genome [45]. The polished reads were classi ed into ve categories: (a) unmapped, (b) multiply mapped, (c) uniquely mapped, (d) mapped to '+' sequences (sense sequences of the genome), and (e) mapped to '-' sequences (antisense sequences of the genome). According to the mapping results, reads that were aligned to the unannotated regions of the reference genome were de ned as novel genes. Seven databases, the NT, GO, Pfam, KEGG, NR, KOG, and Swiss-Prot databases, were used to annotate the functions of unmapped transcripts and novel genes [46][47][48][49][50][51]. Moreover, we investigated the number of fusion transcripts in the PacBio data. The fusion transcripts had to meet the following requirements: (a) a full-length transcript mapped to two or more gene loci in the reference genome; (b) each locus must cover 10% of the transcript; (c) the total coverage of the transcript with respect to the reference genome must be more than 99%; (d) each locus must be separated by more than 100 kb in the reference genome; and (e) the gene loci must be supported by at least two NGS reads.
We used iTAK software to predict TFs and used four additional tools (PLEK, CNCI, CPC, and the Pfam database) to predict lncRNAs, with the default parameters [47,[53][54][55][56]. Transcripts with no potential coding sequence were retained as our set of candidate lncRNAs.
SUPPA software (https://bitbucket.org/regulatorygenomicsupf/suppa) was used for AS analysis with the default parameters. SUPPA classi es AS events into 7 categories: SEs, MXs, A5s, A3s, RIs, AFs, ALs. We then used the TAPIS pipeline to predict APA sites [23], and we used MEME to analyse the nucleotide composition of the sequences upstream (-50 nt) and downstream (+ 50 nt) of all the APA sites for nucleotide bias [97].

Differential expression and GO/KEGG enrichment analyses
The number of reads that mapped to each gene was calculated by Cuffdiff software [98]. The FPKM value of each gene was calculated on the basis of the read count mapped to the gene and the length of the gene. Gene co-expression network analysis and identi cation of hub genes Gene co-expression analysis was performed via WGCNA [64]. First, on the basis of the expression level of each gene in the different samples, the correlation between every pair genes was calculated. The calculated results were then used to construct a gene expression network, and the gene expression network map was subsequently pruned to obtain different modules. The CytoHubba package in Cytoscape software (http://www.cytoscape.org/download.html) was used to identify hub genes in the different modules [65]. We used the degree, EPC, MCC, closeness, radiality, and stress algorithms to select genes ranked in the top 20 by all algorithms as hub genes.

Identi cation and phylogenetic analysis of MIKC-type MADS-box genes
We used the amino acid sequences of A. thaliana MIKC-type MADS-box genes as queries to against the L. chinense genome database via the BLASTP algorithm [7,10]. We then used HMMER software to search genes with SRF-TF (PF00319) and K-box (PF01486) domains on the basis of hidden Markov model (HMM) [68]. The genes identi ed in the results of the above two analyses were further identi ed via the Pfam database online tool (http://pfam.xfam.org/).
Proteins of MIKC-type MADS-box genes from A. thaliana, P. trichocarpa and L. chinense were used to construct phylogenetic tree [10,93]. We used MEGA 6 software and EVOLVIEW online software (https://evolgenius.info//evolview-v2/) to construct phylogenetic tree based on neighbour-joining algorithm, and the number of bootstrap replications was set to 1,000 [99]. Declarations Ethics approval and consent to participate: We con rm that the material collection presented here were conducted in accordance with the wild plant care regulations and natural reserves regulations set forth by the Decree of the state council of the People's Republic of China.
Consent for publication: Not applicable.
Availability of data and materials: All the raw data in this study have been uploaded to the public database of National Center for Biotechnology Information under PRJNA559687 (Release date:2019-08-12). The experimental materials were collected from a trial plantation (belonging to Nanjing Forestry University) in Xiashu, Jurong County, Jiangsu Province (119°13′E,32°7′N).
Competing interests: The authors declare that they have no competing interests.
Funding: The design of the study, sample collection, NGS and PacBio sequencing, data analysis and interpretation were supported by the National Natural Science Foundation of China (31770718, Figure 1 Statistics of Iso-Seq isoforms. a Classi cation of transcripts on the basis of the reference genome. b Comparison of isoform lengths between the PacBio data and reference genome annotation. c