Transcriptomic Analysis From Ont Sequencing
Young fruits were sequenced because of the fruit shape, the texture properties, and the cutting content depend on events that occur early during fruit development [12]. To compare the characteristics based on the fruit type, the gene expression dynamics and the full-length transcripts of the young fruit of the two types of peaches were determined by ONT, and total Clean Data in the fastq format were further filtered to remove short reads and low-quality reads. After quality filtering, the BeadNum ranged from 5,687,141 to 7,035,534 with N50 ranging from 1263 to 1423; the mean q-score was Q9 in each library (Table 2). Through cDNA sequencing, reads can be identified as full-length sequences when primers have recognized both ends of the reads. The number of full-length reads in the six libraries was 5,195,328, 481,897, 4,795,588, 5,346,222, 4,312,262, and 4,847,747, respectively. The full-length percentages (FL%) were found to be 77.57%, 77.81%, 77.20%, 77.86%, 77.68%, and 76.97%, respectively (Table 3). The consensus isoform was obtained by polishing the full-length reads. Fusion transcript prediction was performed on obtained consistent transcripts, and the fusion transcripts from 7 to 12 were obtained for each sample. (S-Table 1).
Table 2
Clean data statistical table. After the original fastq data were further filtered for short reads and low-quality reads, the total Clean data were obtained. A-1; A-2; A-3: The Olecranon honey peach; B-1; B-2; B-3: Bud mutant variant of the peach.
Sample Name | Read Num | Base Num | N50 | Mean Length | Max Length | Mean Q-score |
A-1 | 6,855,766 | 8,140,421,280 | 1,347 | 1,187 | 10,170 | Q9 |
A-2 | 6,344,028 | 7,573,280,925 | 1,339 | 1,193 | 11,367 | Q9 |
A-3 | 6,366,073 | 8,018,115,098 | 1,448 | 1,259 | 10,014 | Q9 |
B-1 | 7,035,534 | 8,059,133,822 | 1,263 | 1,145 | 21,540 | Q9 |
B-2 | 5,687,141 | 7,102,833,899 | 1,423 | 1,248 | 13,080 | Q9 |
B-3 | 6,458,809 | 8,229,558,235 | 1,410 | 1,274 | 21,190 | Q9 |
Table 3
Statistics of the full-length sequence data. In cDNA sequencing, the primer identified at both ends of the reads was considered as a full-length sequence.
Sample ID | Number of clean reads (except rRNA) | Number of full-length reads | Full-length percentage (FL%) |
A-1 | 6,698,079 | 5,195,382 | 77.57% |
A-2 | 6,193,058 | 4,818,907 | 77.81% |
A-3 | 6,211,916 | 4,795,588 | 77.20% |
B-1 | 6,866,714 | 5,346,222 | 77.86% |
B-2 | 5,551,215 | 4,312,262 | 77.68% |
B-3 | 6,298,495 | 4,847,747 | 76.97% |
Using the minimap2 software (Li 2018), all consensus isoforms were aligned to the reference genome for de-redundancy analysis, after obtaining 58,596 consensus isoforms. Transcripts above 500 base pairs were screened from the de-redundant transcripts, and the MISA software was used for SSR analysis. A total of 21,745 SSRs were obtained, including 6372 mononucleotide, 10,271 di-nucleotide, 4718 tri-nucleotide, 235 tetra-nucleotide, 48 pentanucleotide, and 101 hexanucleotide repeats. The density distribution of the different SSR types was statistically analyzed (Fig. 1D), the perfect di-nucleotide (p2) was found to have the highest density, followed by the perfect mononucleotide (p1) and the perfect tri-nucleotide (p3), the perfect pentanucleotide (p5), and the hybrid SSR (p6) were found to have the lowest density. The de-redundant transcripts of all the samples were compared with the known annotations of the reference genome, revealing 2530 new gene loci and 37,364 new transcripts. Sequence analysis of the newly discovered transcripts revealed 24,205 complete open reading frame (ORF) sequences, 2603 transcription factors, and 1005 lncRNAs, and 34,665 new transcripts were functionally annotated by comparison with eight protein databases. In total, 8502 transcripts were annotated in the Cluster of Orthologous Groups of proteins (COG )database; 20467 were annotated in the Gene Ontology Consortium (GO) database; 15,074 were annotated in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database; 19,471 were annotated in the euKaryotic Ortholog Groups (KOG) database; 14,296 were annotated in protein family (Pfam); 20,851 were annotated in Swiss-Prot; 31,324 were annotated in eggnog; and 34,595 were annotated in NR. Ultimately, 34,665 transcripts were annotated in the eight databases (Table 4).
Table 4
Statistics on the number of new transcripts annotated. Functional annotation was performed on new transcripts obtained from an alternative splicing analysis
Annotated database | New isoform number |
COG | 8502 |
GO | 20,467 |
KEGG | 15,074 |
KOG | 19,471 |
Pfam | 14,296 |
Swiss-Prot | 20,851 |
eggNOG | 31,324 |
nr | 34,595 |
All | 34,665 |
Alternative Splicing
Alternative splicing (AS), the process of removing introns from pre-mRNA and the process of rearrangement of exons to produce several types of mature transcripts, occurs before protein synthesis [13]. In this study, 18,322 AS events were identified in the transcripts by AStalavista [14]. The main types of AS included exon skipping (ES), alternative 3′ splice site (A3SS), Mutually exclusive exon (MEE), alternative 5′ splice site (A5SS), and Intron retention (IR) (Fig. 1E). Among the five main types of alternatively spliced transcripts, IR was found to be predominant, accounting for 36.18% of the AS transcripts, followed by A3SS (32.58%), A5SS (15.97%), ES (14.16%), and MEE (1.11%) (S-Table 2). This finding differs from those of the previous studies in various plants, such as tea, maize, cotton, rice, moso bamboo, Arabidopsis, and Populus [15–18], which were IR-predominant, followed by ES, A3SS, A5SS, and MEE. AS events were significantly more frequent in the Olecranon honey peach (from 3581 to 3860, an average of 3731) than those in the bud peach from 2310 to 2446, an average of 2376) (S-Table 2). In the two groups of samples, the percentages of A5SS and ES in the Olecranon honey peach were significantly lower than those in the bud peach. However, IR showed the opposite results (Fig. 1C).
Differential Expression Analysis Of Gene And Transcripts
Gene and transcript expression are time and spatial-specific. Under different conditions, the expression levels of genes and transcripts significantly differ. There were 457 Differential expression genes (DEGs) in the two groups which the Olecranon honey peach and the bud peach, including 169 up-regulated genes and 288 down-regulated genes (Fig. 2A). Up-regulated genes showed higher expression levels in the bud peach than those in the Olecranon honey peach, whereas the down-regulated genes showed the opposite results. Hierarchical clustering analysis was performed for the screened DEGs, and genes with the same or similar expression patterns were clustered. The genes were divided into two subclasses (Fig. 2B). The 457 DEGs were annotated using the best BLAST hit from eight protein databases. A total of 390 genes were annotated in the eight databases, and 226 genes were annotated using the GO database. These 226 genes were annotated DEGs and divided into three main GO functional categories (biological process, cellular component, and molecular function) and 36 subcategories (Fig. 2C). In the cellular component (CC) classification, the major subcategories were “membrane,” “membrane part,” “cell part,” and “cell.” For the molecular function (MF) classification, “catalytic activity” and “binding” were the most dominant subgroups. “Metabolic process,” “cellular process,” and “single-organism process” were the major subcategories in biological process (BP) classification. The GO annotation classifications of all genes were divided into 50 subcategories. The major subcategories of all genes were consistent with those of the DEGs. Ninety-nine DEGs were annotated in the COG database, and “Carbohydrate transport and metabolism,” “Transcription,” and “Secondary metabolites biosynthesis, transport and catabolism” were the three significant subcategories in COG functional classification analysis of the DEGs (Fig. 2D). To better explain the function of these DEGs, the DEGs were searched in the KEGG pathway database. Among the genes, 118 DEGs were annotated in the KEGG database, and 55 KEGG pathways were identified. Most were metabolism-related pathways, such as the brassinosteroid biosynthesis (ko00905), starch and sucrose metabolism (ko00500), and cysteine and methionine metabolism (ko00270), and other secondary metabolic pathways of peach plants. Additionally, there were 10 DEGs involved in genetic information processing, and 2 DEGs involved in cellular processes, environmental information processing, organismal systems (Fig. 2E). Moreover, 164 DEGs were annotated in the KOG database; 385 DEGs were annotated in NR; 242 DEGs were annotated in Pfam; 256 DEGs were annotated in Swiss-Prot; and 325 DEGs were annotated in eggNOG.
Among the two groups of the Olecranon honey peach and the bud mutant variant of the peach, 1519 transcripts were differentially expressed, of which 552 were up-regulated, and 997 were down-regulated and were divided into four subclasses (Fig. 3A). When the FDR was found to be more than 2, the transcripts were considered to be different. The expression of some transcripts in the two groups showed a fold-change greater than 10 (Fig. 3B). A total of 1332 differentially expressed transcripts (DETs) were annotated, with 796 annotated by GO. These 796 DETs were assigned into three main GO functional categories (BP, CC, and MF) and 41 subcategories (Fig. 3C). The major subcategories were “membrane,” “cell part,” and “cell” in CC classification. As observed for the DEGs, for MF classification, “catalytic activity” and “binding” were the dominant major subgroups. “Metabolic process,” “cellular process,” and “single-organism process” were the major subcategories in BP classification. GO annotation classifications of all transcripts were divided into 52 subcategories. A total of 470 of the 1332 DETs were annotated by KEGG and matched 91 pathways (S-Table 3); most annotated pathways in the transcriptome were related to carbon metabolism and amino acid biosynthesis in a total of 17 DETs. Additionally, 349 DETs were annotated in the COG database; 611 DETs were annotated in the KOG database; 1320 DETs were annotated in NR; 721 DETs were annotated in Pfam; and 805 of DETs were annotated in Swiss-Prot. Further, 1125 DETs were annotated in eggNOG, and most DETS were related to the post-translational modification, protein turnover, chaperones classes, and unknown functions (Fig. 3D). Based on these results, the number of annotated DEGs was found to be significantly lower than the number of DETs.
Differential expression of the genes for the fruit shape in two types of peaches
Fruit shape is an important trait affecting the appearance of the peach [3], including the Olecranon honey peach. In this budding sport, we found that both the peach diameter and the part resembling an eagle's beak became smaller. Studies have shown that plant hormones affect plant growth and are closely related to fruit development, including the shape of the fruit. For example, auxin and cytokinin are important plant hormones that regulate fruit development [19]. The concentration of auxin in the center of the fruit is higher than that of the peel, and gibberellin is related to cell division [20] and is responsible for swelling in the tomato fruit development [21]. We identified 18 DEGs related to plant hormones (S-Table 4), which were divided into up-regulated genes (ONT.7681, gene 5670, gene 25091, ONT.9923) and down-regulated genes (ONT.16582, gene 2826, ONT.13765, gene 22504, ONT.16845, gene 6609, gene 2822, gene 2460, ONT.1953, gene2824, ONT.1950, gene 9229, gene 2966, gene 26004); these genes were divided into three subgroups (S-Table 5). According to the GO annotation, the genes were related to biological regulation, reproductive processes, development processes, and growth (S-Table 5). The SAUR family protein (SAUR) indicates gene 9229 regulation in the signal transduction of plant hormones (ko04075) according to the KEGG annotation. It may play some role in an auxin signal transduction pathway involving calcium and calmodulin [22]. Gene 26004 regulates indole-3-pyruvate monooxygenase (YUCCA) in the tryptophan metabolism pathway (ko00380). This gene can promote the production of production factors because it plays a role as a key auxin biosynthesis enzyme [23]. Gene 22504 codes 3-epi-6-deoxocathasterone 23-monooxygenase (CYP90C1/ROT3) in the brassinosteroid biosynthesis pathway (ko00905). CYP90C1/ROT3 indicates the redundant brassinosteroid C-23 hydroxylase [24]. Gene ONT.1950, ONT.1953, gene 2822, gene 2824, and gene 2826 regulate the PHYB activation-tagged suppressor named as CYP73A1/BAS1 in ko00905, based on the KEGG annotation. Gene ONT.1950, ONT.1953, gene 2822, gene 2824, and gene 2826 and were shown to be involved in the brassinosteroid biosynthesis pathway with gene 22504. CYP90C1 and CYP73A1 play a key role in brassinolide biosynthesis.
Differential expression of the transcripts for the fruit shape in two types of peaches
In the analysis of the signal transduction pathways for plant hormones, we found seven DETs by the KEGG annotation: ONT.5631.5, ONT.5631.9, rna13157, ONT.942.6, ONT.4473.2, rna4079, and rna4080. Both ONT.5631.5 and ONT.5631.9 were annotated as AUX1/LAX as well as auxin influx carriers (AUX1/ LAX family) in the ko04075 database. Rna13157 was annotated as a SAUR family protein that has been known to promote cell enlargement and plant growth (Fig. 4). ONT.942.6 was annotated as AHP, which is a histidine-containing phosphotransfer protein (Fig. 4). This protein is important in promoting cell division. ONT.4473.2 was annotated as EIN3 (Fig. 4) and was defined as an ethylene-insensitive protein 3. This protein facilitates fruit ripening and senescence. Rna4079 and rna4080 were annotated as TGA and defined as the transcription factor TGA by the KO annotation. TGA affects the resistance of plants to diseases [25]. In the analysis of the tryptophan metabolism pathway by the KEGG annotation, we identified 3 DETs: rna36192, ONT.10241.19, and ONT.10241.4. Rna36192 annotated as YUCCA, an indole-3- pyruvate monooxygenase (Fig. 4), and ONT.10241.19 and ONT.10241.4 were annotated as katE/CAT, belonging to the catalase family. In the brassinosteroid biosynthesis pathway (ko00905), there were 11 DETs which were divided into 2 classes: CYP734A1 or BAS1, including ONT.1922.5, ONT.1923.2, ONT.1923.3, ONT.1949.1, ONT.1950.1, ONT.1953.1, rna3897, rna3899, and rna3900 (Fig. 4) and CYP90C1/ROT3 including ONT.15392.3 and rna31428. The former is defined as PHYB activation-tagged suppressor 1, whereas the latter is a 3-epi-6-deoxocathasterone 23-monooxygenase (Fig. 4). Using the DataViz visual data analysis software to further analyze the relationship between differential transcriptome and fruit shape, we discovered that the fruit shape index was related to ONT.4473.2, ONT.5631.5, rna13157, rna4080, ONT.10241.19, ONT.10241.4, rna36192, ONT.15392.3, ONT.1923.2, ONT.1923.3, ONT.1949.1, ONT.1950.1, ONT.1953.1, rna3897, rna3899, and rna3900, but not to ONT.5631.9, ONT.942.6, rna4079, ONT.1922.5. Moreover, the transverse diameter was related to all transcriptomes, expect for ONT.5631.9. In contrast, the longitudinal diameter did not correlate with any features of the transcriptome (Fig. 5C).