Tissue-specific de novo transcriptome analysis in kiwifruit [Actinidia deliciosa (A Chev) Liang et Ferguson]

Background Kiwifruit [Actinidia deliciosa (A Liang et Ferguson] is a sub-tropical vine from the Actinidiaceae family native from China. This specie has an allohexaploid genome (from a diploid and autotetraploid parents) contained in 174 chromosomes producing a climacteric and fleshy fruit called kiwifruit. Currently there's no too much genomic and transcriptomic information about this species. In this low molecular knowledge context, the main goal of this work is to construct a tissue-specific de novo transcriptome assembly generating a differential expression analysis among these specific tissues to obtain new useful database for a better knowledge of vegetative, floral and fruit growth in different phenological states of Actinidia deliciosa cv. ‘Hayward’. Results In the present study we have analyzed different whole transcriptomes from shoot, leaf, flower bud, flower and fruit at 4 development stages (7,50,120 and 160 days after flowering; DAF) in kiwifruit by using RNA-seq. We sequenced twenty-four libraries, obtaining 604,735,364 reads which were assembled using Trinity software. The first version of Actinidia deliciosa de novo transcriptome contained 142,025 contigs (x̅=1,044bp, N50=1,133bp). CEGMA and BUSCO were used for assembly quality assessment, obtaining close to 90.0% (35.1% partial) and over 85.0% (18.3% partial) of the ultra-conserved genes for eukaryote and plants, respectively. Annotation was performed with BLASTx against TAIR10 protein database and we found an annotation proportion of 35.6% (50,508), leaving 64.4% (91,517) of the contigs assembly without annotation. Conclusions These results represent a reference transcriptome for allohexaploid kiwifruit generating a database of Actinidia deliciosa genes related to leaf, flower


Abstract Background Kiwifruit [Actinidia deliciosa (A Chev) Liang et Ferguson] is a sub-
tropical vine from the Actinidiaceae family native from China. This specie has an allohexaploid genome (from a diploid and autotetraploid parents) contained in 174 chromosomes producing a climacteric and fleshy fruit called kiwifruit. Currently there's no too much genomic and transcriptomic information about this species. In this low molecular knowledge context, the main goal of this work is to construct a tissue-specific de novo transcriptome assembly generating a differential expression analysis among these specific tissues to obtain new useful database for a better knowledge of vegetative, floral and fruit growth in different phenological states of Actinidia deliciosa cv. 'Hayward'.

Results
In the present study we have analyzed different whole transcriptomes from shoot, leaf, flower bud, flower and fruit at 4 development stages (7,50,120 and 160 days after flowering; DAF) in kiwifruit by using RNA-seq. We sequenced twenty-four libraries, obtaining 604,735,364 reads which were assembled using Trinity software.
Annotation was performed with BLASTx against TAIR10 protein database and we found an annotation proportion of 35.6% (50,508), leaving 64.4% (91,517) of the contigs assembly without annotation.
Conclusions These results represent a reference transcriptome for allohexaploid kiwifruit generating a database of Actinidia deliciosa genes related to leaf, flower and fruit development. Thus, the present study provides a high valuable information, identifying over 20,000 exclusive genes including all tissue comparisons, which are associated with the proteins involved in different biological processes and molecular functions. Transcriptome assembly and refining as well as the assembly metric assessment, has implied an enough quality to be a putative database of this specie and high number of ultra-conserved proteins were found.
With respect to transcriptome close to 65% of contigs did not match with any protein. Therefore, future functional annotation will be required in order to obtain a better knowledge of the tissue-specific development.

Background
Actinidia deliciosa or Actinidia chinensis deliciosa is an allohexaploid specie (2n = 6x) with an estimated genome size of 4.4 Gbp and 174 chromosomes [1,2,3]. It is a species originated in southwestern China and belongs to the genus Actinidia which covers at least 76 species [4]. It has an important economic role due to his edible fruit known as kiwifruit, being also a species of nutritional importance due to his high level of vitamin C [5]. The kiwifruit worldwide production in 2017 reached about 4.38 million tons, being China the most important producer, followed by Italy, New Zealand, Iran and Chile. However, most of the Chinese kiwifruit are locally consumed, with less than 0.2% of the total being exported (http://faostat.fao.org).
Actinidia genus is mainly composed of dioecious plants. The flowers are axial with five petals usually white colored and sepals that can be fused with the base or not.
The fruit is a berry and has black seeds embed within the flesh where colors and morphology vary depending on the cultivar. Actinidia deliciosa cv. 'Hayward' is one of the most widely distributed cultivars, bears fruits with green flesh and a fuzzy appearance. The characteristic flavor of the kiwifruit is one of the important aspects for consumer acceptance and commercial potential. The balance of sweetness and acidity is given by the sugars such as glucose, fructose, sucrose and myo-inositol in small quantities (relative to other trees) and the acids found in mature fruit such as quinic acid, citric acid and in less proportion, malic acid [6].
As far as we know there is scarce information about genomic and transcriptomic resources in this species mainly taking in account the ploidy level. At genomic level, the whole genome sequencing of heterozygous 'Hongyang' cultivar [7] and the 'Red5' genotype [8] has been reported. In addition, transcriptomic analysis is relatively recent in kiwi species with a great potentiality regarding its physiology, fruit ripening studies or resistance to diseases. These transcriptomic studies included the sequencing of male and female transcriptomes of A. arguta flower buds [9], the study of fruit development in the cultivar "Hongyang" based on long noncoding RNA expression and alternative splicing [10], the premature budbreak dormancy analysis [11], the genome-wide gene expression profiles in male and female plants using high-throughput RNA sequencing (RNA-Seq) [12], and the study of infection by Pseudomonas syringae pv. actinidiae (Psa) [13,14]. Finally, from a metabolomics point of view, due to the economic and nutritional importance of redfleshed kiwifruit, some studies have reported new findings on anthocyanin biosynthesis in A. chinensis [15,16] and A. arguta [17].
All the transcriptomic studies above mentioned were focused on specific areas that deal with the most important problems of kiwifruit and that suppose important agronomic and economic impact. In addition, these transcriptomic approaches gain an especial interest in referenced commercial varieties as 'Hayward' which is the most cultivated green fleshy kiwifruit variety in the world. Therefore, it would be necessary to complete his genomic and transcriptomic information with new gene database available for breeders, nutritionists and plant biologists from the transcriptomic analysis of the different tissues.
The aim of this work is to construct a de novo transcriptome in green fleshed kiwifruit 'Hayward' (Actinidia deliciosa) using RNA-Seq analysis from leaf, flower and fruit tissues in different development phases including a differential gene expression analysis among vegetative and reproductive tissues to generate a database of Actinidia deliciosa genes related to leaf, flower and fruit development.

Refined de novo transcriptome assembly
In order to obtain a multiple-tissue de novo transcriptome of genes from Actinidia deliciosa, RNA libraries were constructed and sequenced from shoots, leaves, flower buds, flowers and fruits at 7, 50, 120 and 160 DAFB with a total of 604,735,364 million pair-end reads (150bp x 2) (Supplementary Table S1). All raw reads in FASTQ format including paired-end and replicates are available from the NCBI Short Read Archive (SRA) database under bioproject number PRJNA564374 (https://www.ncbi.nlm.nih.gov/sra/PRJNA564374). These reads were processed and trimmed to remove low quality reads and then used as the input to assemble the transcriptome with Trinity software. The assembled and refined transcriptome contains 142,025 total contigs where 106,603 contigs are trinity "genes" unigenes.
Moreover, the average size for contigs was 1,044, reaching the largest one 36,186 bp while the shortest is just 501 bp long (Table 1). In general, the N50 of contigs were checked alongside the length distribution of contigs and percentage of input reads that maps back to the assembled transcriptome, but in order to conclude about the quality of an assembled transcriptome it is needed more analyses and assessments [18].
To assess the quality for our de novo transcriptome we performed a biological based analysis approach. The biological analysis needs the use of the ultra-conserved proteins encoded in the assembled transcriptome, searching them and reporting how many are completed, fragmented and missing. Therefore, we used CEGMA and BUSCO to evaluate the completeness of ultra-conserved protein coding genes for eukaryote and plant species. CEGMA found 140 (56.8%) complete, 87 (35.1%) fragmented and 20 (8.1%) missing genes from its 248 eukaryotic ultra-conserved proteins data set. However, BUSCO found 1,007 (69.9%) complete, 263 (18.3%) fragmented and 170 (11.8%) missing genes from the ultra-conserved plant database (1,440 total plant-specific proteins) (Table 1). Finally, an additional quality control of the samples and biological replicates was performed using Trinity auxiliary scripts ( Fig. 2) showing high similarities between biological replicates and variable differences among different tissues samples.
In order to perform a general comparison of the number of gene expression differences between tissues, a differential expression analysis was performed. When comparing all the replicates and tissues (using a normalized factor based on foldchange between each expression and the mean expression across samples) with filters of false discovery rate (FDR) ≥ 0.005 and FC ≥ 4 we found around 49,000 differential expressed genes ( Fig. 2; Supplementary Fig. S1). In addition, to obtain a better understanding about tissues comparation, differential gene expression was evaluated by edgeR [19], grouping different tissues as follow: shoot vs leaf, flower bud vs flower, fruit 7d vs fruit 50d, fruit 7d vs fruit 120d and fruit 7d vs fruit 160d

Gene annotation and enrichment analysis
To explore the main gene functions involved in the vegetative growth leaves, floral and fruit development from the biological processes and molecular function point of view we performed a Singular Enrichment Analysis (SEA) by Panther G0 slim [20] and AgriGo v2 [21], using the top hit specie Vitis vinifera as reference ( As for GO terms annotations by AgriGO v2 (Supplementary Table S4) only eight GO terms were found for shoot vs leaf according to p-value, thus we can highlight the response to stimulus (1.3e-04; GO:0050896) and nucleic acid binding (1.8e-05; GO:0003676) for biological processes and molecular function respectively.
According to floral and fruit development we found 119 and 56 GO terms respectively. A total of 69 GO terms were exclusives for flower bud vs flower while only six were exclusives for fruit7d vs 50d-120d-160d (Supplementary Table S4). discussion According to the assembled and refined transcriptome, a total of 142,025 contigs (x̅ = 1,044bp, N50 = 1,133bp) were obtained which 106,603 contigs correspond to trinity "genes" unigenes. Within assembled contigs the size ranged from 501 bp to 36,186 bp long. In other studies, as Actinidia eriantha [22], the assembly of filtered reads reached a total of 69,396 unigenes obtaining sizes between 201 and 9,602 bp, which are indicating a larger number of contigs and size in this study.
As for the completeness of ultra-conserved protein evaluation by CEGMA and BUSCO, the scores suggest that the assembled transcriptome contains an important number of ultra-conserved proteins complete or fragmented, and a minor proportion of them missing, meaning that the transcriptome reached a high-quality standard in terms of completeness. However, CEGMA and BUSCO complete scores over 95% have been reported for twelve plant genomes including the model plant Arabidopsis thaliana and the fruit tree Pyrus communis L. var 'Bartlett' [23].
Thus, this biological approach is indicating the highest quality score specially for BUSCO which reach almost 70% of complete genes found from the dataset. This 70% from BUSCO score seem to be lower against to quality scores reported above in different plant genomes but the construction of de novo transcriptome from three different plant tissues may be explain this difference. Anywise, the sum of the values of complete and fragmented genes is close to 90%, which is quite high considering the construction of a de novo transcriptome. In addition, despite the fact that we are talking about different species, in a specific tissue de novo transcriptome assembly of Ilex paraguariensis [24] obtained around 73% of complete genes reaching close to 85% if we include the fragmented genes being this quality score value similar to our transcriptome assembly.
At this moment, the high throughput sequencing (HTS) it is becoming more and more accessible and friendly, as this happens, more species are getting sequenced for multiple purposes including gene expression profiling, epigenomics, genomics, and transcriptomics approaches [25].
In spite of everything, there is still species with economic relevance that do not have this omics data resources yet. This is the reason why assembling a highquality transcriptome becomes an important task when comes to studying those species. However, assembling a high quality de novo transcriptome depends mainly in the quality, quantity and software parameters, and even if the transcriptome was assembled, it represents a challenge in terms of determining how good it is assembled.
As a result of annotation process against the TAIR10 protein database, only 35.6 % of contigs were annotated, therefore a 64.4% remained without score. This low match could be explained because there is a lack of information in protein databases, which implies a lower knowledge in the Actinidia genus at protein level. Other studies in kiwifruit Actinidia deliciosa var 'Jinkui' [26] obtained 140,187 unigenes of which 56,912 were functionally annotated while [9] [28]. As for proteins network related to response to abiotic stimulus (GO:0009628), it is related to flower development which may be is conditioned by abiotic stresses as it was been recently reported in flower buds of transgenic blueberry [29].
As for fruit development (fruit7d vs 50d-120d-160d), some protein IDs were associated to protein metabolic process (4.1e-3; GO:0019538), DNA polymerase activity (3.9e-10; GO:0034061), transferase activity (6.1e-06; GO:0016740) or catalytic activity (1.6e-03; GO:0003824). In other plant species as cucumber [30] some of the main proteins linked to fruit development are involved in the processes with a wide ploidy level variability. The measured metrics and scores of the transcriptome assembly proved to have enough quality to be a putative database of this species transcripts. In addition, the biological approach for the transcriptome assembly ensure that our transcriptome have a high quality containing and important number of ultra-conserved proteins. As for the transcriptome annotations, around 35% of contigs were annotated, however close to 65% of contigs did not match with any protein, which is indicating a large number of sequences not associated to functional annotations. Therefore, this work provides valuable information identifying 22,962 exclusive genes for all tissue comparison allowing the association of different protein ID with different biological process and molecular function. Finally, we have to highlight that more information would be necessary in order to deepen in the processes related to vegetative growth of leaves or the reproductive development of the flower and fruit. In addition, the information provided in this study regarding genes not associated with proteins will require the future functional annotation in new studies. methods Plant material and RNA sequencing Sample tissues of Actinidia deliciosa cv ´Hayward' were collected from Germán Greve Silva experimental station in Rinconada de Maipú (University of Chile). RNA was extracted using three biological replicates including shoots, leaves, flower buds, flowers and fruits at 7, 50, 120 and 160 after full bloom (DAFB) (Fig. 1)  Transcriptome assembly and refining Total reads were analyzed (pre and post trimming) and trimmed with FASTQC software and FLEXBAR software (https://github.com/seqan/flexbar) in order to filter low quality reads (phred score less than 25) and remove reads either short or with Ns on the sequence. Remaining reads after trimming were used as input in the transcriptome assembly using Trinity software (https://github.com/trinityrnaseq).
The refinement of the transcriptome was made using CD-HIT (http://weizhonglilab.org/cd-hit/) to remove duplicate contigs with a 0.9 identity setting and Corset

Differential expression analysis
Raw counts of each contig were obtained with RSEM (https://deweylab.github.io/RSEM/) using Bowtie2 aligner (http://bowtiebio.sourceforge.net/bowtie2/index.shtml) to align all the input reads to the de novo assembled transcriptome, then with all counts per tissue and replicate, it was merged into a raw counts matrix. From the raw counts matrix was calculated a TPM-TMM normalized matrix used for the study alongside the raw counts matrix.
Differentially expressed genes (DEGs) were used for further comparisons and downstream analyses. N10, N30 and N50: minimum contig length needed to cover 10, 30 and 50% of the genoma respectively.  Figure 1 Tissues assayed in kiwifruit cv 'Hayward' including shoots leaves unfolded, leaves completel Venny diagram for differential expression genes between different plant tissues: shoot vs lea Salazar et al. Figure S6.tif Salazar et al. Supplementary Table S2.xlsx Salazar et al. Figure S4.tif Salazar et al. Figure S3.tif Salazar et al. Figure S2.tif Salazar et al. Figure S7.tif Salazar et al. Figure S1.pdf