We have developed the first de novo reference transcriptome assembly of guava, performed gene annotations, compared different fruit development stages to understand molecular pathway in fruit ripening (Fig. 1 - methods & Fig. 2) and compared 3 different genotypes with variable coloration in pulp and fruit skin/peel to understand the fruit colour development pathway in guava. AS is the widely grown table purpose guava genotype of India and has green foliage. Figure 1A shows that the flower buds at all the growth stages of AS are green in color, with white coloured flower petals. The immature and mature fruits both have white pulp and green skin. During ripening fruit skin turns yellow within 3 days after harvesting and stays yellow thereafter. PP has darker green foliage (Fig. 1B) and flower buds compared to AS. Although pulp colour in PP is white in immature fruit but turns pink in mature fruit (Fig. 1B). AC has green foliage (Fig. 1C), green flower buds, white flowers and white pulp of immature and mature fruits but the skin of fruit changes its colour from green to crimson red (apple color) at maturation within 3–5 days in winter season (Fig. 1C). We have compared the RNA-Seq (methods) of Allahabad Safeda LSt, MFb and MFr and fruit growth stages ImF, 0DF, 3DF, 7DF to understand the landscape of molecular changes in fruit development of guava. To identify inducible genes resulting into apple color development in colored genotypes, we compared red vs green skin of AC and 0DF of AS vs PP.
RNA-Seq data generation, de novo transcriptome assembly and annotation
The pair end libraries from different tissue types of AS, AC and PP were sequenced and 137.3, 20.24 and 20 million raw reads of 100 bp each were generated, respectively (Table 1). The low quality sequences were filtered at quality score ≥ 30 and 120 million high quality reads of AS belonging to 13 libraries were used for generating de novo reference transcriptome using Trinity assembler [21, 22]. A total of 2,79,792 transcript belonging to 84,206 components/genes with N50 of 3603 bp were obtained (Table 2). Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis [23] with eudicots identified 93.7% (1987/2121) complete BUSCO genes, 4.5% (95) fragmented orthologs and 1.8% (39) orthologs were missing ( Figure S1). Blast search against the nr protein database identified homologs for 219,924 transcripts. Protein family search identified 140,061 protein family domains. Gene ontology assessment with Blast2GO assigned gene ontology terms to 116,629 transcripts (Table 2; Data S1), where biological process consists of 87,954 transcripts, cellular components consist of 82,820 and molecular function consist of 96,308 transcripts (Fig. 3, Figure S2).
Table 1
Description of RNA-Seq paired-end data through Illumina high-throughput sequencing
Genotype | Tissue type | Raw reads (Millions) | Total Reads (Millions) |
| | Replicate 1 | Replicate 2 | Replicate 3 | |
Allahabad Safeda | LSt | 11.92 | 13.93 | 11.57 | 137.3 |
| MFb | 9.05 | 8.47 | 9.60 |
| MFr | 8.95 | 10.88 | 8.86 |
| ImF | 12.0 | | |
| 0DF | 10.34 | | |
| 3DF | 10.23 | | |
| 7DF | 11.50 | | |
Apple Colour | Green Peel | 10.35 | | | 20.24 |
| Red Peel | 9.89 | | |
Punjab Pink | ImF | 9.66 | | | 20.0 |
| 0DF | 10.34 | | |
Table 2
Allahabad Safeda transcriptome assembly statistics
Transcriptome Assembly |
Contigs/Transcripts | 279,792 |
Components/Genes | 84,206 |
% GC content | 43.08 |
Contig N50 | 3,603 |
Assembly length (MB) | 647.4 |
Functional annotation |
Transcripts with homologs | 219,924 |
Match with predicted protein | 9,958 |
Match with hypothetical protein | 7,790 |
Protein Family annotation |
Transcripts with Pfam domains | 140,061 |
Gene Ontology Annotation |
Transcripts with assigned GO terms | 116,629 |
Biological Processes | 87,954 |
Cellular Component | 82,820 |
Molecular Function | 96,308 |
Differential Expression analysis of Leaf, Flower and Fruit of Allahabad Safeda
In AS, 2,777 transcripts representing 2,139 genes were found differentially expressed in mixed fruit vs mixed flower buds, mixed fruit vs leaf & shoot tip and mixed flower bud vs leaf & shoot tip (Data S2). Clustering analysis shows a high correlation among the replicated samples > 0.96 for LSt, > 0.93 for MFb and > 0.97 for MFr (Table S1; Fig. 4A).
We identified 2,125 differentially expressed transcripts (DETs) in MFr compared to LSt, with 971 being up and 1,154 down regulated. In MFr and MFb comparison, 1445 DETs were found, of which 719 were up-regulated and 726 down regulated. A least number of 660 DETs were identified between MFb and LSt, with 447 up and 213 down-regulated (Data S2). Only 33 transcripts among the DETs were common among the three tissues types (Fig. 4B). In order to identify genes involved in fruit development, top 20 up-regulated transcripts were selected from MFr comparison to LSt and/ or MFb and 7 genes were found common with > 10 Log2FC. Interestingly, putting together transcripts of these two comparisons none of the gene was found down regulated (Table S2).
The most up-regulated gene (comp27411_c1) represented by six transcripts Hydroxycinnamoyl CoA shikimate (quinate hydroxycinnamoyltransferase, HCT) belonging to BAHD family of acyl-CoA-dependent acyl-transferases controls lignin [24, 25] and cutin biosynthesis [26]. Cinnamyl alcohol dehydrogenase (CAD) important for lignin biosynthesis [27, 28], expansins involved in cell wall loosening [29], ABC transporter encoding ATP dependent channels [30], Palmitoyl transferase involved in fatty acid oxidation [31], 1-aminocyclopropane-1-carboxylate oxidase (ACO) an ethylene biosynthesis gene [32], Subtilisin-like protease with a role in plant-pathogen interactions [33], 9-cis-epoxycarotenoid dioxygenase (NCED) a major ABA biosynthesis gene [34, 35] and Rbcx, a Rubisco assembly chaperon [36] are the top protein families represented by up-regulated transcripts in guava fruit (Table S2).
Metabolic Pathway Analysis of Fruit Tissue in Comparison to Leaf and Flower
The metabolic and regulatory pathway analysis of fruit, the major sink in comparison to strongest source, the leaf was performed with MAPMAN software [37] (http://mapman.gabipd.org) with all DETs at FDR < 0.001. A total of 2125 transcripts were found significantly regulated in fruit compared to leaf (Fig. 5 and Data S2). General metabolism shows that transcripts involved in light reactions, C3 cycle, photosynthesis, tetrapyrole pathway (controlling chlorophyll biosynthesis), starch synthesis, amino acid biosynthesis except phenylalanine (input for secondary metabolism), lipid degradation, Raffinose biosynthesis, cell wall associated leucin rich repeat and arabinogalactan-proteins are down-regulated in fruit. Importantly sucrose biosynthesis, gluconeogenesis, conversion of starch to reducing sugars like glucose and fructose, wax biosynthesis, phenylalanine generation, glycolipid synthesis (for generating mono and di galactosyl diacylglycerol for food reserve storage in seeds), cellulose synthesis, trehalose biosynthesis, mitochondrial electron transport chain, cell wall degradation pectate lyases (PeLs) and PGs are up-regulated in fruit. Discreet furcation of these pathways in fruit tissue are in general concordance with its biological role of alluring birds for seed dispersal [16]. However, pectin esterases involved in plant cell wall modification and subsequent breakdown and long chain fatty acid biosynthesis genes catalyzing the cutin synthesis exhibit a mixed response (Fig. 5A). Regulation overview (Fig. 5B) with MAPMAN shows that most of the transcripts mapping to ABA, ethylene, cytokinin, gibberellin and salicylic acid signal transduction pathways were up-regulated whereas Jasmonate, Auxin, and Brassinosteroid were down-regulated with few transcripts showing up-regulation.
In ethylene pathway the ethylene biosynthesis genes, 1-aminocyclopropane-1-carboxylate synthase (ACS), ACC oxidase 3, Ethylene receptor 2 (ETR2) homolog of Arabidopsis, ethylene response factor ERF-1, basic helix-loop-helix (bHLH) TF and pyridoxine biosynthesis gene PDX1.2 were found up-regulated. ABA biosynthesis and signaling factors including NCED and ABA binding factor 4 (ABF4), B3 domain containing high-level expression of sugar-inducible gene 2 (HSI2), highly ABA-induced 1 (HAI1), hypostatin resistance 1 (HYR1), a UDP glycosyltransferase (UGT) and GRAM domain family protein were highly up-regulated, only ABA-responsive TB2/DP1 (HVA22 family protein) showed down-regulation. Auxin leucine-rich repeats (LRR), F-box TIR receptor, TCP family, ARF and AUX/IAA transcription factors were found down regulated indicating the auxin signaling down-regulation in guava fruit development. Interestingly, Brassinosteroid insensitive (BRI) encoding receptor kinase is up-regulated indicating overall up-regulation of BR signal transduction and responses.
We identified forty transcription factor families with multiple transcripts belonging to MYB, MADS, HB, WRKY, ARF, bHLH, AP/EREBP, bZIP, NAC, AUX/IAA, B3, Jumonji, Polycomb. These families showed both up and down regulation, indicating their importance in modulation of fruit development. Sucrose cytosolic invertase 2 (CINV2), responsible for conversion of sucrose to monosaccharides like fructose and glucose showed up-regulation and is in line with increase in sucrose catabolism in developing fruits.
Cellular response (Fig. 5C) depicts down regulation of transcripts belonging to biotic stress and is in line with fruits being more prone to pathogen and insect damage in comparison to leaves. Phytoene synthase (PSY) and lycopene beta cyclase (lcy-b) responsible for accumulation of α and β- carotene shows over-expression indicating up-regulation of carotenoid biosynthesis pathway. Ubiquitin and autophagy dependent degradation pathways (Fig. 5D) shows up-regulation of 44 transcripts, indicating increased protein turnover process. Near similar results were obtained in a comparison of fruit vs flower transcripts (Data S2).
Up-regulation Of Secondary Metabolites During Fruit Ripening
We compared RNA-Seq at different fruit maturity and ripening stages in AS. Comparison of mature fruit 0DF to ImF identified 220 differentially regulated transcripts, with 75 showing up-regulation and 145 showing down regulation (Data S3). However, at ripening 3DF vs 0DF 366 transcripts were differentially regulated with 232 up-regulated and 144 down-regulated (Data S4). Interestingly, during over-ripening 7DF vs 3DF only 11 transcripts showed differential regulation with only one down regulated (Data S5). The major up-regulated genes in 0DF vs ImF (Data S3; Figure S3) include Alpha-Expansin, cellulose synthase, phospho-enol-pyruvate carboxylase kinase, β-amylase, PSY, CAD and COMT family of lignin biosynthesis genes and other o-methyl transferases. However, flavonoid pathway genes other than lignin biosynthesis, pectin methyesterases, light reactions, calvin cycle and photorespiration were down-regulated.
In 3D vs 0D (Data S4, Figure S4 A) there is upregulation of transcripts for cellulose synthase, expansins, increased fatty acid synthesis and elongation, PSY, phenylalanine biosynthesis genes arogenate dehydratase, flavonoid biosynthesis related transcripts like UGT - Hypostatin Resistance 1 (HYR1), Flavonoid 3',5'-hydroxylase 2 (F3′5′H), Phenylalanine ammonia-lyase 3 (PAL 3) and 4-coumarate–CoA ligase 2 (4CL2). All the transcripts belonging to ABA, brassinosteroid, ethylene, cytokinin, and salicylic acid were up-regulated, while AUX and TCP transcripts related to auxin and gibberelic acid Insensitive (GAI) were down-regulated (Data S4, Figure S4 B). Transcripts corresponding to TF families of WRKY, AP2, bHLH, PHOR1 (ubiquitin ligase activity), MYB and C2C2.CO like were up-regulated. Ubiquitin and autophagy dependent protein turnover pathway were found up-regulated (Data S4, Figure S4 C).
Apple color in fruit skin is derived from up-regulation of secondary metabolism
Apple colour skin of AC develops in a short time period of ~ 3–5 days during winter season during fruit maturation. Comparison of FPKM value of transcript belonging to red vs green skin, identified only 52 DETs indicating very specific pathways involved in fruit colour development. Interestingly, all the transcripts of phenylpropanoid and lignin pathway showed up-regulation indicating the colour development in the skin of guava is result of over expression of phenypropanoid and lignin biosynthesis pathway (Figure S5; Data S6). Comparison of PP mature 0DF with AS 0DF identified 19 DETs with 9 up-regulated and 10 down-regulated (Data S7). However, only omega-hydroxypalmitate O-feruloyl transferase-like indicated a footprint of secondary metabolism pathway.
Fruit colour development in coloured guava genotypes is concomitant with ripening process
We hypothesized that pink to reddish fruit skin or pulp colour development may share the fruit ripening related genes as the colour development in apple colour skin or pulp is concomitant with ripening process. The cluster analysis of DETs in AS ImF vs 0DF, 3DF vs 0DF, 7DF vs 3DF, PP ImF vs 0DF, and AC_RP vs AC_GP was carried out. AS mature fruit, PP mature fruit, AC green peel and AC red peel cluster in single group of mature fruit stages. AS 3DF and AS 7DF made another group with mostly similar expressions and 11 DETs. Third cluster consisted of AS ImF and PP ImF, again a result of similar fruit stage (Figure S6; Data S8).
qRT-PCR for two well-known candidates for coloration in fruits and vegetables PAL and PSY 2 was carried out to see the reproducibility of differential FPKM expression values. Expression of PAL in AS fruit was maximum at 3DF green to yellow peel colour turning stage (~ 9.5X compared to ImF_AS; Fig. 6A). However, expression in 0DF_ PP (pink pulp) is ~ 3X higher compared to ImF_AS. Interestingly, expression in AC_RP was found the highest (~ 1.5 X compared to 3DFr_AS). These results indicated that the expression of PAL increases with ripening, but is also genotype dependent and might have contribution towards red coloration in peel of apple colour genotypes. However, PSY 2 expression (Fig. 6B) shows a general trend of increase in expression with maturity and ripening in all the three genotypes. These results also indicated that observations recorded in our comparative transcriptomic data are in line with qRT-PCR analysis.