Full-length transcriptome sequencing of the identified fruit shape-related genes of the Olecranon honey peach

Background: The olecranon honey peach belongs to the peach class. In terms of the fruit type, the Olecranon honey peach, shaped like an eagle’s beak, is larger than others. In this study, Full-length transcriptome sequencing of the Olecranon honey peach and the bud mutant variant of the peach was performed by Oxford Nanopore Technologies to comparatively analyze the differentially expressed genes and the transcriptome, to identify important fruit shape-related genes. Results:Full-length transcriptome sequencing was performed to analyze the two peaches. Consensus isoform was obtained and compared to the reference genome for a de-redundancy analysis. As a result, a final set of 58,596 transcript sequences was obtained. A total of 21,745 simple sequence repeats were obtained, and 18,322 alternative splicing (AS) events were identified. The comparative analysis of the non-redundant transcript revealed 2530 new gene loci and 37,364 novel transcripts. A total of 457 genes were differentially expressed in the two groups, including 169 up-regulated genes and 288 down-regulated genes. A total of 1519 transcripts were differentially expressed, of which 552 were up-regulated and 997 were down-regulated. Conclusions: In the case of the plant hormone signal transduction pathway, identified by the KEGG annotation, our analyses revealed that differential expression of genes 9229, 26004, 22504, 2822, 2826, 2824, ONT.1953, ONT.1950, and ONT.1953 was related to the shape of the peach and may regulate the production of large fruits, via endogenous hormones, secondary metabolites, and signal transduction. This study provided useful information on the shape-related genes and transcripts in the Olecranon honey peach. the transcriptome of the Olecranon honey peach with its bud variant but also predicted genes related to the fruit type and shape.

of peaches and a comprehensive genome-wide association study of the fruit shape. The results suggest that the PpCAD1 gene (ppa003772m, designated as PpCAD1) plays an important role in the peach fruit shape [3]. Fruit size has been predominantly selected for both domestication and improvement of peach farming [4]. ONT sequencing is a next-generation technology for nanopore sequencing [5]. DNA/RNA strands bind to the biological membrane-embedded nanopore protein under the control of motor proteins, and then the strands unwind. Based on the voltage difference across the biological membrane, the DNA/RNA strand passes through the nanopore protein at a specific rate.
Because of differences in the chemical properties of different bases on the DNA/RNA chain, when a single base or DNA molecule passes through the nanopore channel, changes in the electrical signals occur [6]. By detecting and responding to these signals, the corresponding base can be deciphered, enabling real-time sequencing [7]. Nanopore-based interpretation is based on the current signal, making the decision process complex. The nanopore interprets bases using the sophisticated "Recurrent Neural Network" algorithm according to the magnitude of the current [8].
Bud mutation is a reportedly important tool to establish new cultivars, particularly for vegetative propagation of fruits, and has been recognized as a critical resource in functional genomics studies of model plant species [9,10]. In Lianping, in the northern Guangdong province, China, bud mutation was found on a tree of the Olecranon honey peach. Lianping is a city located at the longitude 114°14′-114°56′, the latitude 24°06′-24°36′, and has a subtropical monsoon climate. Its annual average air temperature is 18.5-20.7℃, precipitation is 1700 mm-1800 mm, and sunshine duration is 1668 h. It is well-known that high protein levels or enzyme activities are not necessarily determined by a high gene transcript level [11]. In this study, we sequenced the transcriptome of the peach by third-generation sequencing using the Oxford Nanopore Technologies (ONT), which is based on nanopore sequencing and single-molecule real-time (SMRT) sequencing technologies [5]. We not only compared the transcriptome of the Olecranon honey peach with its bud variant but also predicted genes related to the fruit type and shape.

Comparison of two peach fruit quality indices
In this study, a marked difference in appearance between the Olecranon honey peach and its bud variant was observed ( Fig. 1A and 1B). The tip of the Olecranon honey peach is shaped like an eagle's beak is large, and warped. The two types of peaches also differ significantly in some quality indicators, such as weight, longitudinal diameter, Ttransverse diameter, hardness, out-of-juice rate, color and luster, nutrients, and mineral elements ( Table 1). The quality indicators, such as cluster analysis and NMDS analysis, of the two kinds of peaches, also differ ( Fig. 5A and 5B). Notably, the weight of the Olecranon honey peach is 205.50 g ± 0.98 g, whereas the weight of the budding variant of the peach is 116.26 g ± 1.23 g. The fruit shape index was calculated according to the ratio of longitudinal diameter to transverse diameter of the fruit. Their fruit shape indices did not differ at 1.02 and 1.06, respectively. This implies that the shape of both the Olecranon honey peach and the bud mutant variant are almost circular. However, there were large differences in the longitudinal and transverse diameters of the metabolites in these peaches. Table 1 Comparison of the quality indices of the two peach fruits.

Project
Bud mutant variant of the Transcriptomic Analysis From Ont Sequencing Young fruits were sequenced because of the fruit shape, the texture properties, and the cutting content depend on events that occur early during fruit development [12]. To compare the characteristics based on the fruit type, the gene expression dynamics and the full-length transcripts of the young fruit of the two types of peaches were determined by ONT, and total Clean Data in the fastq format were further filtered to remove short reads and low-quality reads. After quality filtering, the BeadNum ranged from 5,687,141 to 7,035,534 with N50 ranging from 1263 to 1423; the mean qscore was Q9 in each library (Table 2). Through cDNA sequencing, reads can be identified as full-length sequences when primers have recognized both ends of the reads.   Table 3 Statistics of the full-length sequence data. In cDNA sequencing, the primer identified at both ends of the reads was considered as a full-length sequence.  (Fig. 1D), the perfect dinucleotide (p2) was found to have the highest density, followed by the perfect mononucleotide (p1) and the perfect tri-nucleotide (p3), the perfect pentanucleotide (p5), and the hybrid SSR (p6) were found to have the lowest density. Ultimately, 34,665 transcripts were annotated in the eight databases (Table 4). Alternative Splicing Alternative splicing (AS), the process of removing introns from pre-mRNA and the process of rearrangement of exons to produce several types of mature transcripts, occurs before protein synthesis [13]. In this study, 18,322 AS events were identified in the transcripts by AStalavista [14].
The main types of AS included exon skipping (ES), alternative 3′ splice site (A3SS), Mutually exclusive exon (MEE), alternative 5′ splice site (A5SS), and Intron retention (IR) (Fig. 1E Table 2). In the two groups of samples, the percentages of A5SS and ES in the Olecranon honey peach were significantly lower than those in the bud peach. However, IR showed the opposite results (Fig. 1C).

Differential Expression Analysis Of Gene And Transcripts
Gene and transcript expression are time and spatial-specific. Under different conditions, the expression levels of genes and transcripts significantly differ. There were 457 Differential expression genes (DEGs) in the two groups which the Olecranon honey peach and the bud peach, including 169 up-regulated genes and 288 down-regulated genes ( Fig. 2A). Up-regulated genes showed higher expression levels in the bud peach than those in the Olecranon honey peach, whereas the downregulated genes showed the opposite results. Hierarchical clustering analysis was performed for the screened DEGs, and genes with the same or similar expression patterns were clustered. The genes were divided into two subclasses (Fig. 2B). The 457 DEGs were annotated using the best BLAST hit from eight protein databases. A total of 390 genes were annotated in the eight databases, and 226 genes were annotated using the GO database. These 226 genes were annotated DEGs and divided into three main GO functional categories (biological process, cellular component, and molecular function) and 36 subcategories (Fig. 2C). In the cellular component (CC) classification, the major subcategories were "membrane," "membrane part," "cell part," and "cell." For the molecular function (MF) classification, "catalytic activity" and "binding" were the most dominant subgroups. "Metabolic process," "cellular process," and "single-organism process" were the major subcategories in biological process (BP) classification. The GO annotation classifications of all genes were divided into 50 subcategories. The major subcategories of all genes were consistent with those of the DEGs.
Ninety-nine DEGs were annotated in the COG database, and "Carbohydrate transport and metabolism," "Transcription," and "Secondary metabolites biosynthesis, transport and catabolism" were the three significant subcategories in COG functional classification analysis of the DEGs 256 DEGs were annotated in Swiss-Prot; and 325 DEGs were annotated in eggNOG.
Among the two groups of the Olecranon honey peach and the bud mutant variant of the peach, 1519 transcripts were differentially expressed, of which 552 were up-regulated, and 997 were downregulated and were divided into four subclasses (Fig. 3A). When the FDR was found to be more than 2, the transcripts were considered to be different. The expression of some transcripts in the two groups showed a fold-change greater than 10 ( Fig. 3B). A total of 1332 differentially expressed transcripts (DETs) were annotated, with 796 annotated by GO. These 796 DETs were assigned into three main GO functional categories (BP, CC, and MF) and 41 subcategories (Fig. 3C). The major subcategories were "membrane," "cell part," and "cell" in CC classification. As observed for the DEGs, for MF classification, "catalytic activity" and "binding" were the dominant major subgroups.
"Metabolic process," "cellular process," and "single-organism process" were the major subcategories in BP classification. GO annotation classifications of all transcripts were divided into 52 subcategories.
A total of 470 of the 1332 DETs were annotated by KEGG and matched 91 pathways (S- Table 3); most annotated pathways in the transcriptome were related to carbon metabolism and amino acid biosynthesis in a total of 17 DETs. Additionally, 349 DETs were annotated in the COG database; 611 DETs were annotated in the KOG database; 1320 DETs were annotated in NR; 721 DETs were annotated in Pfam; and 805 of DETs were annotated in Swiss-Prot. Further, 1125 DETs were annotated in eggNOG, and most DETS were related to the post-translational modification, protein turnover, chaperones classes, and unknown functions (Fig. 3D). Based on these results, the number of annotated DEGs was found to be significantly lower than the number of DETs.
Differential expression of the genes for the fruit shape in two types of peaches Fruit shape is an important trait affecting the appearance of the peach [3], including the Olecranon honey peach. In this budding sport, we found that both the peach diameter and the part resembling an eagle's beak became smaller. Studies have shown that plant hormones affect plant growth and are closely related to fruit development, including the shape of the fruit. For example, auxin and cytokinin are important plant hormones that regulate fruit development [19]. The concentration of auxin in the center of the fruit is higher than that of the peel, and gibberellin is related to cell division [20] and is responsible for swelling in the tomato fruit development [21]. We identified 18 DEGs related to plant hormones (S-  Table 5). According to the GO annotation, the genes were related to biological regulation, reproductive processes, development processes, and growth (S- Table 5 rna4080. Both ONT.5631.5 and ONT.5631.9 were annotated as AUX1/LAX as well as auxin influx carriers (AUX1/ LAX family) in the ko04075 database. Rna13157 was annotated as a SAUR family protein that has been known to promote cell enlargement and plant growth (Fig. 4). ONT.942.6 was annotated as AHP, which is a histidine-containing phosphotransfer protein (Fig. 4). This protein is important in promoting cell division. ONT.4473.2 was annotated as EIN3 (Fig. 4) and was defined as an ethylene-insensitive protein 3. This protein facilitates fruit ripening and senescence. Rna4079 and rna4080 were annotated as TGA and defined as the transcription factor TGA by the KO annotation.
TGA affects the resistance of plants to diseases [25]. In the analysis of the tryptophan metabolism pathway by the KEGG annotation, we identified 3 DETs: rna36192, ONT.10241.19, and ONT.10241.4.

Discussion
Bud mutations (bud sports), a consequence of somatic genetic variation can lead to the occurrence of phenotypic alteration in plants [26]. Based on the quality indicators, the bud variant of the peach remarkably differs from the Olecranon honey peach, particularly in the fruit type. These differences indicate that although buds and the Olecranon honey peaches are grown on a similar tree, they do not belong to the same variety. In this study, we evaluated the fruit type-related indicators, including weight, horizontal and vertical diameters, fruit type index, and the part of the fruit that resembles the size of an eagle's beak. The bud mutant variant of the peach was typically smaller than the Olecranon honey peach based on these indicators. The Olecranon honey peach is a new variety of peach with high consumer value in China [2]. However, not all buds have a regulatory effect on the fruit type, such as the Beni Shogun apple mutation. The Beni Shogun mutation reportedly affect traits like skin coloration, fruit softening, and starch hydrolysis, whereas the loss of acidity, sugar accumulation, and weight are seemingly unaffected by the mutation [27].
To confirm that these two groups of peaches do not belong to the same variety, they were sequenced using the ONT SMRT long read isoform platform. This method overcomes the technical hurdle inherent with an accurate prediction of the full-length splicing subtypes in second-generation sequencing short-read data [28]. The ONT platform is a new approach commonly used to evaluate plants and animals, such as the polar bear [12], Hiptage benghalensis [29], zebrafish [30], and Sus scrofa [31]. Fruit ripening is a developmental process, and the fruit shape depends on events occurring early in fruit development [12,32,33]. The first stage post-anthesis, that is, the development process of the whole fruit development after fertilization, exhibits a rapid increase of cell division and cell proliferation [34]. This stage comprises differing cell division rates and durations in the fruit tissues, that significantly impacts the final fruit shape [12]. Fruit shape formation initiates a complex responsive network, ranging from a transcriptional control to a post-translational control. In the signal transduction pathway for plant hormones, the down-regulated gene 9229 and the pair transcriptome rna13157 were shown to affect auxin processing, promoting cell enlargement and plant growth with ONT.5631.5 and ONT.5631.9. The down-regulated ONT.942.6 was shown to act on the processing of cytokinin to promote cell division and shoot initiation. In a study of apple fruit type, fruit size was related to the ability of cellular expansion [35] and division [36]. In the tryptophan metabolism pathway, the gene 26004 showed a higher expression in the Olecranon honey peaches than that in the bud mutant variant of the peach. Noteworthy, the gene 26004 codes for flavin monooxygenase, which was first identified as a key auxin biosynthesis enzyme [23]. Overexpression of YUC in Arabidopsis leads to an auxin overproduction [37]. Brassinolide is an active brassinosteroid that

Plant materials
The cultivars used in this study were Olecranon honey peach and bud into chick peach, which were provided by the lianping shangping chick peach base in guangdong. Young fruits were immediately frozen in liquid nitrogen and stored at -80℃ for Full-length transcriptome sequencing. Ripe fruit were selected by experienced fruit growers at maturity (fruit hanging time of approximately 75 days) and quickly transported to a well-ventilated, sterile laboratory for fruit quality index was determined.

Determination Of The Fruit Quality Index
Three free ripe fruits were randomly selected from each variety to determine the fruit quality index.

Determination Of Soluble Solids
The soluble solid contents of fruits were estimated by refractometry (LYT-390; Linyu Trading Co., Ltd., Shanghai, China). The fruits were cut into small pieces and putted in the juicer(SJ21-150; Supor Co.,Ltd., zhengjiang, China) to obtain a homogenous juice. Two to three drops of the juice were placed on the surface of the prism, the sweeper was positioned to align with the light source, and the achromatic knob was adjusted to divide the field of vision into two.

Hardness Assay
Fruit hardness was estimated using a durometer (GY-4; Zhejiang Tuopu Instrument Company, Zhejiang, China) according to previously published methods [40]. Briefly, the durometer was preheated for 10 min and set to zero, and hardness was subsequently measured after recording the peak value . Three points were chosen and placed parallel to each other on the measuring table, after which the handle was pressed at a constant speed to read the peak value. The resulting readings were recorded as the hardness values in kg/cm 2 .

Color Variation
Three fruits from each group were selected prior to treatment to monitor color variation throughout the study according to the methods of Poel et al.

Titratable Acid
Refer to the method of Sandie et al. [42], Titratable acidity was measured by blending 50 g of diced flesh with 50 ml of deionised water for 2 min. The puree was titrated with 0.1 M NaOH to pH 8.1 using an autotitrator. Results were expressed as mg/g of malic acid equivalent (MAE). The measurements for each treatment were repeated 3 times and the results were averaged.

Full-length Transcriptome Sequencing
The experimental procedure was accomplished using the standard protocol by ONT, including sample quality testing, library construction, library quality testing and library sequencing. ONT's library construction currently mainly uses the method of connection sequencing kit (SQK-LSK109), which is directly constructed by end repair and adding sequencing adapters, using PromethlON sequencer. The database building process is as follows. First, RNA was extracted and detected the purity, Oxford Nanopore Technologies Long Read Processing Raw reads were first filtered with minimum average read quality score = 7 and minimum read length = 500 bp. Ribosomal RNA were discarded after mapping to rRNA database. Next, full-length, nonchemiric (FLNC) transcripts were determined by searching for primer at both ends of reads. Clusters of FLNC transcripts were obtained after mapping to reference genome with mimimap2, and consensus isoforms were obtained after polishing within each cluster by pinfish.

Remove Redundant
Consensus sequences were mapped to reference genome using minimap2 [43]. Mapped reads were further collapsed by cDNA Cupcake package with min-coverage = 85% and min-identity = 90%. 5' difference was not considered when collapsing redundant transcripts.

Find Fusion Transcript
The criteria for fusion candidates is that a single transcript must: must map to 2 or more loci; minimum coverage for each loci is 5% and minimum coverage in bp is ≧ 1 bp; total coverage is ≧ 95%; distance between the loci is at least 10 kb

Structure Analysis
Transcripts were validated against known reference transcript annotations with gffcompare. AS events including IR, ES, AD, AA and MEE were identified by the AStalavista tool. SSR of the transcriptome were identified using MISA. APA analysis was conducted with TAPIS [28]. CDS were predicted by TransDecoder.

Transcription Factors Prediction
Plant transcription factors were identified with iTAK [44].

Lncrna Analysis
Four computational approaches include CPC/CNCI/CPAT/Pfam/ were combined to sort non-protein coding RNA candidates from putative protein-coding RNAs in the transcripts.Putative protein-coding RNAs were filtered out using a minimum length and exon number threshold. Transcripts with lengths more than 200 nt and have more than two exons were selected as lncRNA candidates and further screened using CPC/CNCI/CPAT/Pfam that have the power to distinguish the protein-coding genes from the non-coding genes.

Gene Functional Annotation
Gene function was annotated based on the following databases: NR (NCBI non-redundant protein sequences); Pfam; KOG/COG/eggNOG (Clusters of Orthologous Groups of proteins); Swiss-Prot (A manually annotated and reviewed protein sequence database); KEGG; GO.

Quantification Of Gene/transcript Expression Levels And Differential Expression Analysis
Full length reads were mapped to the reference transcriptome sequence. Reads with match quality above 5 were further used to quantify. Expression levels were estimated by reads per gene/transcript per 10,000 reads mapped.

For The Samples With Biological Replicates
Differential expression analysis of two conditions/groups was performed using the DESeq R package (1.10.1). DESeq provide statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P values were adjusted using the Benjamini and Hochberg's approach for controlling the false discovery rate.
Genes with a FDR < 0.01 and foldchange ≥ 2 found by DESeq were assigned as differentially expressed.

For The Samples Without Biological Replicates
Prior to differential gene expression analysis, for each sequenced library, the read counts were adjusted by edgeR program package through one scaling normalized factor. Differential expression analysis of two samples was performed using the EBSeq R package. The resulting FDR (false discovery rate) were adjusted using the PPDE(posterior probability of being DE).The FDR < 0.01 and fold change ≥ 2 was set as the threshold for significantly differential expression.

Functional Enrichment Analysis
GO enrichment analysis of the DEGs was implemented by the GO seq R packages based Wallenius non-central hyper-geometric distribution [45], which can adjust for gene length bias in DEGs. KEGG pathway enrichment analysis: KEGG [46] is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www.genome.jp/kegg/). We used KOBAS [47] software to test the statistical enrichment of differential expression genes in KEGG pathways.

Protein Protein Interaction
The sequences of the DEGs was blast (blastx) to the genome of a related species (the protein protein interaction of which exists in the STRING database: http://string-db.org/) to get the predicted PPI of these DEGs. Then the PPI of these DEGs were visualized in Cytoscape [48].

Statistical Analyses
Data are expressed as the mean ± standard deviation (SD) of three replicates. Significant differences between the means of different parameters were calculated using Duncan's multiple-range test using SPSS 17.0 software (SPSS Inc., Chicago, IL). p < 0.05 was considered to indicate statistical significance.
Preprocessed GeoChip data were further analyzed with different statistical methods: (i) hierarchical clustering for peach-population phenotype and transcriptome; (ii) detrended correspondence analysis combined with analysis of similarities, nonparametric multivariate analysis of variance (Adonis) and multi-response permutation procedure for determining the overall functional changes in the peaches; (iii) significant Pearson's linear correlation (r) analysis, as well as analysis of variance; (iv) used DataViz visual data analysis software to further analyze the relationship between differential transcriptome and fruit shape.  Table S1. Fusion transcript of each sample in the GFF file table.
Additional files 4: Table S4. Gene enrichment in the selected samples.
Additional files 5: Table S5. GO annotation classification for the selected genes.