RNA sequencing and de novo transcriptome assembly
RNA sequencing of the leaf transcriptome of J. adhatoda generated 51,639,915 raw reads. Pre-processing of the raw data to remove adapter sequences and low-quality reads left 39,134,610 reads with a GC content of 41.69%. A total of 171,064 transcripts were assembled; the maximum and minimum unigene lengths were 15,630 bp and 201 bp, respectively, with an N50 value of 2,065 bp.
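The N50 reported above is the length L at which unigenes of length L or longer together account for at least half of all assembled bases. A minimal sketch of that calculation follows, assuming only a plain list of unigene lengths; the function name `n50` and the toy lengths are illustrative and are not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk from the longest sequence downwards, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not data from this study).
toy_lengths = [201, 500, 1000, 2000, 4000]
print(n50(toy_lengths))
```

Because N50 is weighted by sequence length, a few long unigenes can raise it considerably even when most unigenes are short.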
Assessment of gene completeness
The gene completeness assessment of J. adhatoda identified 30857 (28%) full-length unigenes, 14071 (13.2%) quasi full-length unigenes, and 22737 (21.3%) partial unigenes, while 39221 (36%) unigenes did not match any protein in the PLAZA 4.5 dicots plant database.
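The share of each completeness category can be recomputed from the counts given in the text. The sketch below does this under the assumption that the four categories together make up the full set of assessed unigenes; the variable names are illustrative, and the recomputed shares may differ slightly from the percentages quoted in the text, depending on how those were rounded.

```python
# Counts of unigenes per completeness category, as given in the text.
completeness_counts = {
    "full-length": 30857,
    "quasi full-length": 14071,
    "partial": 22737,
    "no match": 39221,
}

# Assumption: the four categories are exhaustive and non-overlapping.
total_unigenes = sum(completeness_counts.values())

# Percentage share of each category relative to the total.
completeness_shares = {
    category: 100.0 * count / total_unigenes
    for category, count in completeness_counts.items()
}

for category, share in completeness_shares.items():
    print(f"{category}: {completeness_counts[category]} ({share:.1f}%)")
print(f"total: {total_unigenes}")
```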
Functional annotation of unigenes
The assembled unigenes of J. adhatoda were annotated by sequence similarity search using BLASTX against the NCBI non-redundant protein database with an E-value cutoff of 1E-5. The search yielded 105572 annotated unigenes and 40288 non-annotated unigenes, of which 4772 unigenes were predicted due to inadequate genomic information in public databases. The unigenes showed the highest similarity to Erythranthe guttata (55%), followed by Coffea canephora (6.3%) and Utricularia gibba (3.4%), among others. These results indicate that J. adhatoda is most closely associated with Erythranthe guttata.
Functional classification of unigenes
GO classification was performed using the BLAST2GO tool to classify the genes, based on their annotation, into three categories (molecular function, cellular component, and biological process) and 47 sub-categories. In the biological process category, a total of 143277 genes were assigned to 20 classes, including organic substance metabolic process (26732 genes), primary metabolic process (26713 genes), nitrogen compound metabolic process (21100 genes), cellular metabolic process (19714 genes), and biosynthetic process (11476 genes). In molecular function, 93438 genes were classified into 14 sub-categories, including heterocyclic compound binding (21276 genes), organic cyclic compound binding (21276 genes), transferase activity (16105 genes), small-molecule binding (13563 genes), and hydrolase activity (12904 genes). Additionally, 95725 genes were assigned to cellular component GO terms in 13 sub-categories, where the intracellular term had the highest number of genes (23055), followed by intracellular part (21946 genes), intracellular organelle (18296 genes), and membrane-bound organelle (16066 genes).
Biological pathway analysis
The biochemical pathways of J. adhatoda were identified by mapping the unigenes to the KEGG pathway database using the KAAS and BLAST2GO software, with 5981 unigenes annotated to 144 biochemical pathways. A total of 6409 unigenes were assigned to metabolic pathways, including nucleotide metabolism, carbohydrate metabolism, and amino acid metabolism with 2539, 347, and 477 unigenes, respectively (Fig. 3). The KEGG pathway analysis of secondary metabolite biosynthesis was divided into 12 sub-categories, where the highest number of unigenes was found in sesquiterpenoid and triterpenoid biosynthesis (57 unigenes), followed by ubiquinone and other terpenoid-quinone biosynthesis (26 unigenes), terpenoid backbone biosynthesis (22 unigenes), and phenylpropanoid biosynthesis (11 unigenes) (Fig. 4).
The major genes involved in the tryptophan biosynthesis pathway from the KEGG database are presented in Table 2, and the genes identified in this pathway from the transcriptome data are shown in Table 4.
Identification of simple sequence repeats (SSRs)
A total of 25978 SSRs were identified from 106886 sequences comprising 138976200 bp. Of these sequences, 4374 contained more than one SSR, and 1951 SSRs were present in compound formation. The SSR loci comprised 8530 di-nucleotide repeats, 6802 tri-nucleotide repeats, 665 tetra-nucleotide repeats, 96 penta-nucleotide repeats, and 61 hexa-nucleotide repeats (Table 2). SSRs with five tandem repeats (3984) were the most common in J. adhatoda, followed by six tandem repeats (3902), seven tandem repeats (2289), nine tandem repeats (2087), eight tandem repeats (1622), and ten tandem repeats (589). Among di-nucleotide repeats, AT/AT was the most frequent motif with 2968 repeats, followed by AG/CT with 2294 repeats. Among tri-nucleotide repeats, AAG/CTT had the highest frequency (1737), followed by AAT/ATT (1669), with the other motifs distributed uniformly (Fig. 5).
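To illustrate how perfect SSRs of the kind counted above can be located in a nucleotide sequence, a minimal regular-expression sketch is given below. It is not the tool or parameter set used in this study: the function name `find_ssrs` and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen for illustration, and compound SSRs are not handled.

```python
import re

# Assumed minimum repeat counts per motif length (hypothetical thresholds;
# the study does not state the ones it used).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least
        # (min_rep - 1) further exact copies of the same motif.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT"
            # is really the di-nucleotide "AT" and "AA" is a single base.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)


# Toy sequence (hypothetical): an (AT)7 repeat and an (AAG)5 repeat.
toy_sequence = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "TT"
print(find_ssrs(toy_sequence))
```

Start positions are 0-based. Counting the hits by motif length or by repeat number gives tallies of the same kind as those summarised in the text.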
The expression levels of the de novo assembled unigenes of the J. adhatoda leaf transcriptome were calculated as TPM values using the Salmon tool. In the tryptophan biosynthesis pathway, the TPM values of the key enzymes were 6.0903, 33.6854, 11.527, 1.6959, and 8.1662 for anthranilate synthase alpha, anthranilate synthase beta, arogenate/prephenate dehydratase, chorismate synthase, and chorismate mutase, respectively (Table 3). The top 10 most abundant unigenes in the J. adhatoda leaf transcriptome are presented in Table 4.
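TPM (transcripts per million) normalises read counts first by transcript length and then by the total across all transcripts, so that the values in one sample sum to one million. The sketch below shows the standard formula only. It is not Salmon's internal procedure, which estimates read assignments and effective lengths itself; the function name `tpm` and the toy inputs are illustrative.

```python
def tpm(read_counts, effective_lengths):
    """Convert per-transcript read counts to TPM values.

    read_counts and effective_lengths are parallel sequences; the
    lengths are in bases and must be positive.
    """
    if len(read_counts) != len(effective_lengths):
        raise ValueError("inputs must have the same length")
    # Reads per base for each transcript (length normalisation).
    rates = [count / length
             for count, length in zip(read_counts, effective_lengths)]
    total_rate = sum(rates)
    if total_rate == 0:
        raise ValueError("at least one transcript must have reads")
    # Scale so that all values in the sample sum to one million.
    return [rate / total_rate * 1e6 for rate in rates]


# Toy example (hypothetical counts and lengths, not data from this study).
toy_tpm = tpm([10, 20, 30], [1000, 1000, 1000])
print(toy_tpm)
```

Because of the final scaling step, a TPM value expresses relative abundance within one sample rather than an absolute transcript count.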
Gene expression analysis of J.adhatoda leaf tissue
qRT-PCR analysis was performed to examine the expression pattern of selected tryptophan biosynthesis genes and to validate the transcriptome data. The genes selected were anthranilate synthase alpha (EC: 18.104.22.168), anthranilate synthase beta (EC: 22.214.171.124), arogenate/prephenate dehydratase (EC: 126.96.36.199), chorismate synthase (EC: 188.8.131.52), and chorismate mutase (EC: 184.108.40.206). Anthranilate synthase alpha, anthranilate synthase beta, chorismate synthase, and chorismate mutase showed significant up-regulation in mature leaf compared with young leaf and root, whereas arogenate/prephenate dehydratase was down-regulated in mature leaf tissue. Actin, elongation factor 1, and glyceraldehyde-3-phosphate dehydrogenase were used as housekeeping genes for the gene expression analysis in this study. Anthranilate synthase beta subunit and arogenate/prephenate dehydratase, which had low TPM values, were also chosen for qRT-PCR analysis, as the transcriptome data indicated that they are involved in the tryptophan biosynthesis pathway.