Chieh-qua development and sampling.
Throughout the growth season, chieh-qua fruits are sampled at three stages: 7 days post pollination (A), 3 days post pollination (B) and before pollination (C) (Fig. 1a). Fruit development is monitored by measuring fruit weight and soluble solids content. After pollination, fruit weight increases slowly during the first three days and then rises sharply between 3 and 7 days (Fig. 1b). In contrast, the soluble solids content decreases as the fruit grows (Fig. 1c). Given our interest in the transcriptional changes that may regulate early fruit enlargement, we select unpollinated fruits and fruits at 3 and 7 days post pollination (dpp) for RNA-seq.
Sequencing and de novo transcriptome assembly.
To obtain the expression profile of the chieh-qua fruit development transcriptome, one non-normalized library is constructed from fruit tissue at developmental stages from 0 dpp to 7 dpp. The Illumina sequencing data from chieh-qua fruits are deposited in the NCBI SRA database under accession number PRJNA970527. In total, 536,977,732 Illumina paired-end (PE) raw reads are generated (Table 1). After removing adaptor sequences, ambiguous nucleotides and low-quality sequences, 533,845,180 clean reads (about 533.85 million) remain. Assembly of the clean reads results in 104,747 unigenes ranging from 201 to 14,209 bp, with an N50 length of 2,119 bp (Table 2).
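The N50 statistic quoted above is the length L such that sequences of length L or longer contain at least half of the total assembled bases. The following is a minimal sketch of that calculation, not the assembler's own implementation; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the chieh-qua assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2            # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This sequence is the one that pushes the running sum past 50%.
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical unigene lengths, for illustration only.
toy_lengths = [200, 300, 400, 500, 600, 1000]
toy_n50 = n50(toy_lengths)
print(toy_n50)  # 600: 1000 + 600 = 1600 is at least half of 3000
```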
Table 1
Summary of sequencing data.
Sample | Raw reads | Raw bases | Clean reads | Clean bases | Error rate(%) | Q20(%) | Q30(%) | GC content(%) |
A1 | 57582732 | 8.69G | 57255182 | 8.56G | 0.0232 | 98.86 | 95.95 | 44.32 |
A2 | 63237646 | 9.55G | 62845702 | 9.39G | 0.0235 | 98.73 | 95.56 | 44.54 |
A3 | 57047340 | 8.61G | 56679826 | 8.47G | 0.0231 | 98.86 | 95.97 | 44.6 |
B1 | 58379262 | 8.82G | 58016168 | 8.68G | 0.0237 | 98.66 | 95.37 | 44.52 |
B2 | 58940310 | 8.9G | 58543660 | 8.73G | 0.0235 | 98.72 | 95.61 | 44.57 |
B3 | 61404706 | 9.27G | 61052804 | 9.12G | 0.0235 | 98.71 | 95.52 | 44.1 |
C1 | 63277140 | 9.55G | 62964486 | 9.42G | 0.0232 | 98.84 | 95.88 | 44.6 |
C2 | 59538590 | 8.99G | 59241530 | 8.86G | 0.0232 | 98.85 | 95.91 | 44.72 |
C3 | 57570006 | 8.69G | 57245822 | 8.55G | 0.0233 | 98.82 | 95.84 | 44.71 |
Note: A1, A2, A3: 7 dpp fruit tissue. B1, B2, B3: 3 dpp fruit tissue. C1, C2, C3: 0 dpp fruit tissue. Q20: the percentage of bases with a Phred value > 20. Q30: the percentage of bases with a Phred value > 30.
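As a consistency check, the per-sample counts in Table 1 can be summed and compared with the totals reported in the text. The sketch below copies the read counts from Table 1; the variable names are illustrative.

```python
# Raw and clean read counts per sample, copied from Table 1.
raw_reads = {
    "A1": 57582732, "A2": 63237646, "A3": 57047340,
    "B1": 58379262, "B2": 58940310, "B3": 61404706,
    "C1": 63277140, "C2": 59538590, "C3": 57570006,
}
clean_reads = {
    "A1": 57255182, "A2": 62845702, "A3": 56679826,
    "B1": 58016168, "B2": 58543660, "B3": 61052804,
    "C1": 62964486, "C2": 59241530, "C3": 57245822,
}

total_raw = sum(raw_reads.values())
total_clean = sum(clean_reads.values())
retained = total_clean / total_raw  # fraction of reads surviving filtering

print(total_raw)    # 536977732
print(total_clean)  # 533845180
print(f"{retained:.2%} of raw reads retained")
```

Both totals agree with the figures reported in the text.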
Sequence annotation.
By similarity searching against six public databases, 161,282 transcripts and 104,747 unigenes are annotated. Analyses show that 44,845 unigenes (42.81%) have significant matches in the Swiss-Prot database and 11,483 (10.96%) in the COG database. Our results also show that 43,773 (41.79%) of the non-redundant unigenes are similar to known genes in the NR database. In total, 56,807 unigenes (54.23%) are successfully annotated in at least one of the NR, Swiss-Prot, Pfam, COG, GO and KEGG databases (Table 2).
In addition, all unigenes are searched against the COG database for functional prediction and classification. In total, 11,483 non-redundant unigenes (Table 2) are assigned to 24 COG categories (Fig. 2). Among them, ‘Translation, ribosomal structure and biogenesis’ (946) is the largest group, followed by ‘Posttranslational modification, protein turnover, chaperones’ (715), ‘General function prediction only’ (556), ‘Energy production and conversion’ (371) and ‘Signal transduction mechanisms’ (353). Only a few unigenes are assigned to ‘Cell motility’ (12) and ‘Nuclear structure’ (5).
Table 2
BLAST analysis of transcripts and unigenes against public databases.
Database | Transcript number (proportion) | Unigene number (proportion) |
NR | 90618 (0.5619) | 43773 (0.4179) |
Swiss-Prot | 82407 (0.5109) | 44845 (0.4281) |
Pfam | 67718 (0.4199) | 34580 (0.3301) |
COG | 24772 (0.1536) | 11483 (0.1096) |
GO | 51119 (0.3170) | 21944 (0.2095) |
KEGG | 53955 (0.3345) | 31076 (0.2967) |
Total_anno | 104528 (0.6481) | 56807 (0.5423) |
Total | 161282 (1) | 104747 (1) |
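The proportions in Table 2 are each database's annotated count divided by the total number of unigenes. A short sketch, using the unigene counts from Table 2 and illustrative variable names, recomputes them.

```python
# Unigene counts per database, copied from Table 2.
total_unigenes = 104747
annotated_unigenes = {
    "NR": 43773,
    "Swiss-Prot": 44845,
    "Pfam": 34580,
    "COG": 11483,
    "GO": 21944,
    "KEGG": 31076,
    "Total_anno": 56807,
}

# Proportion of all unigenes annotated in each database, to four decimals.
proportions = {
    db: round(count / total_unigenes, 4)
    for db, count in annotated_unigenes.items()
}

for db, prop in proportions.items():
    print(f"{db}: {prop:.4f} ({prop:.2%})")
```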
According to the Gene Ontology (GO) database, an internationally standardized gene function classification system, 21,944 non-redundant unigenes are classified into three major functional ontologies (biological process, cellular component and molecular function) (Fig. 3). For biological process (BP), the dominant subcategories are ‘metabolic process’ (11,676), ‘cellular process’ (10,866) and ‘single-organism process’ (5,599). In the category of cellular component (CC), ‘cell’ (7,756), ‘cell part’ (7,645) and ‘membrane’ (6,345) are highly represented. Among molecular function (MF) terms, ‘binding’ (11,234) and ‘catalytic activity’ (9,934) are the most represented. In contrast, few genes are assigned to the subcategories ‘rhythmic process’ (BP), ‘nucleoid’ (CC) and ‘metallochaperone activity’ (MF).
KEGG is a large-scale knowledge base for the systematic analysis of gene function that links genomic information with functional information. According to KEGG, 31,076 unigenes (Table 2) are assigned to 6 major metabolic pathways (Fig. 4). The pathway involving the largest number of unique transcripts is ‘translation’ (2,712), followed by ‘folding, sorting and degradation’ (1,992), whereas ‘drug resistance: antimicrobial’ (2) is the smallest group.
Differential expression analysis during early development of chieh-qua fruits.
To better understand the biological mechanisms underlying early development of chieh-qua fruits after pollination, it is important to identify the differentially expressed genes (DEGs) among the three developmental stages. To improve the accuracy of the measured expression levels for further analysis, the data from the three biological replicates are merged, and the FPKM (Fragments Per Kilobase of transcript per Million mapped reads) value is calculated from the merged data set. In the PCA analysis, the three biological replicates of sample A and of sample C cluster together, while two biological replicates of sample B cluster together (Fig. 5). Based on the quantified expression levels, genes differentially expressed between each pair of groups are identified with DESeq2, using the screening threshold |log2FC| ≥ 1 and adjusted p < 0.05.
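For reference, FPKM normalizes a fragment count by transcript length (in kilobases) and by sequencing depth (in millions of mapped fragments). The sketch below shows the standard formula only; the function name and example numbers are hypothetical and are not taken from this data set or from the quantification software actually used.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    fragments: fragments assigned to the transcript
    length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp unigene
# in a library of 50 million mapped fragments.
example_value = fpkm(500, 2000, 50_000_000)
print(example_value)  # 5.0
```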
Table 3
Number of differentially expressed genes (DEGs) in each pairwise comparison.
Comparison | Total DEGs | Up-regulated | Down-regulated |
C_vs_B | 12982 | 6035 | 6947 |
C_vs_A | 6541 | 2479 | 4062 |
B_vs_A | 14314 | 6873 | 7441 |
Subsequently, the DEGs among the three early developmental stages are analyzed (Table 3, Fig. 6). Compared with 0 dpp (C), 12,982 genes are differentially expressed in fruit tissue at 3 dpp (B), of which 6,035 are up-regulated and 6,947 are down-regulated (Fig. 6a). Compared with 0 dpp (C), 6,541 genes are differentially expressed in fruit tissue at 7 dpp (A), of which 2,479 are up-regulated and 4,062 are down-regulated (Fig. 6b). Compared with 3 dpp (B), 14,314 genes are differentially expressed in fruit tissue at 7 dpp (A), of which 6,873 are up-regulated and 7,441 are down-regulated (Fig. 6c).
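The up- and down-regulated counts in Table 3 follow from applying the screening threshold (|log2FC| ≥ 1 and adjusted p < 0.05) to each gene's result. The following is a minimal sketch of that filtering step applied to an already exported result table; it does not call DESeq2 itself, and the gene identifiers and values are hypothetical.

```python
from collections import Counter


def classify(log2fc, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label one gene as 'up', 'down' or 'not_significant'."""
    # Genes without an adjusted p-value (None) are treated as not significant.
    if padj is None or padj >= padj_cutoff or abs(log2fc) < lfc_cutoff:
        return "not_significant"
    return "up" if log2fc > 0 else "down"


# Hypothetical rows: (gene id, log2 fold change, adjusted p-value).
toy_results = [
    ("gene_a", 2.3, 0.001),   # up
    ("gene_b", -1.0, 0.04),   # down (|log2FC| exactly at the cutoff)
    ("gene_c", 0.8, 0.0001),  # fold change too small
    ("gene_d", -3.1, 0.20),   # adjusted p-value too large
    ("gene_e", 1.5, None),    # no adjusted p-value available
]

counts = Counter(classify(lfc, padj) for _, lfc, padj in toy_results)
print(counts["up"], counts["down"], counts["not_significant"])  # 1 1 3
```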
Analysis of transcription factors.
A transcription factor (TF) is a protein that binds to specific DNA sequences. Through a specific functional domain, it recognizes and binds cis-acting elements in the upstream regulatory region of a gene, thereby activating or repressing its expression. Transcription factor analysis is carried out on the assembled genes/transcripts to infer whether each one is a transcription factor and, if so, to which family it belongs. According to this analysis, the top 20 transcription factor families include MYB_superfamily, C2HA and C2C2 (Fig. 7a). Among them, MYB_superfamily contains the largest number of unigenes (213). During the early development of chieh-qua fruit, 94 MYB_superfamily unigenes are differentially expressed among the three stages: 34 between B and C, 56 between C and A, and 45 between B and A, with four unigenes differentially expressed in all three comparisons (Fig. 7b). The expression analysis of these differentially expressed MYB_superfamily unigenes is described in detail in the supplementary materials. RT-qPCR is used to verify these expression patterns. For all eight selected MYB_superfamily unigenes, the expression patterns across the three early stages of chieh-qua fruit are consistent with the transcriptome sequencing results (Fig. 7c,7d). Of these, myb59 and myb18 show higher expression in fruits at 7 dpp (Fig. 7c,7d), whereas myb4, myb-GT3b, myb108 and myb306 show their highest expression in fruits at 3 dpp (Fig. 7c,7d). Additionally, myb340 and myb-Bhlh13 show higher expression at the unpollinated stage (Fig. 7c,7d).
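The overlap counts reported for the three comparisons (the union of DEGs and the genes shared by all comparisons) correspond to simple set operations on the per-comparison DEG lists. The sketch below illustrates this with hypothetical gene identifiers; it does not reproduce the actual MYB_superfamily lists, which are given in the supplementary materials.

```python
# Hypothetical DEG sets for the three pairwise comparisons.
b_vs_c = {"g1", "g2", "g3", "g4"}
c_vs_a = {"g3", "g4", "g5", "g6", "g7"}
b_vs_a = {"g4", "g7", "g8"}

# Genes differentially expressed in at least one comparison.
union_all = b_vs_c | c_vs_a | b_vs_a
# Genes differentially expressed in all three comparisons (Venn centre).
shared_all = b_vs_c & c_vs_a & b_vs_a
# Genes specific to a single comparison.
only_b_vs_c = b_vs_c - c_vs_a - b_vs_a

print(len(union_all))       # 8
print(sorted(shared_all))   # ['g4']
print(sorted(only_b_vs_c))  # ['g1', 'g2']
```

Because a gene can appear in more than one comparison, the per-comparison counts (34, 56 and 45 in the text) sum to more than the 94 distinct unigenes.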