General features of the Melia dubia chloroplast (cp) genome
The complete cp genome of Melia dubia was assembled de novo with the GetOrganelle toolkit from whole-genome sequencing (WGS) data (SRA accession: SRR5576232) and submitted to the Third Party Annotation (TPA) section of GenBank under the accession number TPA: BK059531.
The cp genome of M. dubia is a circular, double-stranded molecule of 171,956 bp. It has the quadripartite structure typical of angiosperms: a pair of inverted repeats (IRA and IRB) of 38,604 bp each, together accounting for 45% of the genome, separated by a large single-copy (LSC) region of 76,055 bp (44%) and a small single-copy (SSC) region of 18,693 bp (11%). The genome contains 53,837 A, 54,192 T, 32,596 C and 31,331 G (Supplementary Figure 1). The G+C content of land plant cp genomes plays a significant role in their structural evolution. The overall GC content of the M. dubia cp genome is 37.18% and is unevenly distributed between the single-copy and inverted repeat regions: it is lowest in the SSC region (31.19%), intermediate in the LSC region (35.23%) and highest in the IRs (40.55%) (Table 1).
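The region sizes, coverage percentages and GC content reported above are internally consistent, which a few lines of Python can confirm. This is a minimal sketch that hard-codes the reported values; it assumes the four base counts are listed in the order A, T, C, G, the order that reproduces the Total row of Table 2. The function name is hypothetical.

```python
# Reported region lengths (bp) for the M. dubia cp genome.
LSC, SSC, IR = 76_055, 18_693, 38_604
# Whole-genome base counts; the A, T, C, G order is an assumption inferred from Table 2.
BASES = {"A": 53_837, "T": 54_192, "C": 32_596, "G": 31_331}


def genome_summary(lsc: int, ssc: int, ir: int, bases: dict[str, int]) -> dict:
    """Check the quadripartite total and derive coverage and GC percentages."""
    total = lsc + ssc + 2 * ir  # LSC + SSC + IRA + IRB
    assert total == sum(bases.values()), "region lengths and base counts disagree"
    return {
        "length": total,
        "LSC_pct": round(100 * lsc / total, 2),
        "SSC_pct": round(100 * ssc / total, 2),
        "IRs_pct": round(100 * 2 * ir / total, 2),  # both IR copies together
        "GC_pct": round(100 * (bases["G"] + bases["C"]) / total, 2),
    }


print(genome_summary(LSC, SSC, IR, BASES))
```

Running it gives 44.23%, 10.87% and 44.9% for the LSC, SSC and the two IRs (the 44%, 11% and 45% quoted in the text) and a GC content of 37.18%.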
Table 1. Summary of the Md cp genome characteristics
Characteristics | Length (bp) | Genome coverage (%) | GC content (%)
Whole cp genome | 171,956 | 100 | 37.18
LSC region | 76,055 | 44 | 35.23
SSC region | 18,693 | 11 | 31.19
IR regions (IRA, IRB) | 38,604 each | 45 (both IRs) | 40.55
GC content was 40.55% in each IR region, compared with 35.23% in the LSC and 31.19% in the SSC region, so the IRs are the most GC-rich part of the genome (Table 2).
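Table 2. Nucleotide composition and AT, GC content (%) in the four regions of the M. dubia cp genome
Region | A (%) | T (U) (%) | C (%) | G (%) | C + G (%) | A + T (%)
LSC | 45.41 | 45.79 | 42.61 | 41.19 | 35.23 | 64.77
SSC | 11.96 | 11.85 | 9.37 | 8.86 | 31.19 | 68.81
IRA | 22.07 | 20.43 | 24.75 | 24.21 | 40.55 | 59.45
IRB | 20.56 | 21.93 | 23.27 | 25.75 | 40.55 | 59.45
Total | 31.31 | 31.52 | 18.96 | 18.22 | 37.18 | 62.82
Note: for LSC, SSC, IRA and IRB, the A, T, C and G values give the percentage of the genome-wide count of that base located in the region (each column sums to approximately 100%); the Total row gives whole-genome base composition. The C + G and A + T columns are within-region percentages.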
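The two kinds of percentages in Table 2 can be made explicit with a short Python sketch. Because each base column sums to about 100%, it is assumed here that the per-region A, T, C and G values are each region's share of that base's genome-wide count, while C + G and A + T are within-region percentages. The function name and the region sequences in the example are hypothetical toy inputs, not the M. dubia sequence.

```python
from collections import Counter


def region_composition(regions: dict[str, str]) -> dict[str, dict[str, float]]:
    """Per region: share (%) of each base's genome-wide count, plus within-region GC/AT (%)."""
    counts = {name: Counter(seq.upper()) for name, seq in regions.items()}
    totals = Counter()
    for c in counts.values():
        totals.update(c)  # genome-wide count of each base
    table = {}
    for name, c in counts.items():
        length = sum(c[b] for b in "ACGT")  # unambiguous bases in this region
        row = {b: round(100 * c[b] / totals[b], 2) for b in "ATCG"}
        row["GC"] = round(100 * (c["G"] + c["C"]) / length, 2)
        row["AT"] = round(100 * (c["A"] + c["T"]) / length, 2)
        table[name] = row
    return table


toy = {"LSC": "AATTGC", "SSC": "ATAT", "IRA": "GGCC", "IRB": "GGCC"}
for region, row in region_composition(toy).items():
    print(region, row)
```

Summing any base column over the regions returns 100%, as in Table 2, whereas the GC and AT values are computed within each region.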
Chloroplast genome annotation and contents
A total of 146 genes were annotated with CPGAVAS2, a web-based integrated plastome annotator and analyzer, and validated with BLASTX. Of these, 112 are unique: 79 protein-coding sequences (CDS), 29 tRNAs and 4 rRNAs. A further 34 genes are duplicated (22 CDS, 8 tRNAs and 4 rRNAs), giving a total of 146 genes: 101 CDS, 37 tRNAs and 8 rRNAs (Figure 2).
The protein-coding genes of the M. dubia cp genome fall into three main categories:
- Genes for photosynthesis (52 genes), encoding subunits of ATP synthase, photosystem I, photosystem II, NADH dehydrogenase and the cytochrome b/f complex, and the large subunit of RuBisCO (rbcL).
- Genes for self-replication (38 genes), encoding large and small ribosomal subunits and DNA-dependent RNA polymerase.
- Other genes, encoding a subunit of acetyl-CoA carboxylase (accD), a c-type cytochrome synthesis protein (ccsA), an envelope membrane protein (cemA), a protease (clpP) and a maturase (matK), together with ycf genes, which are conserved open reading frames of unknown function (Table 3).
Table 3. List of protein-coding genes present in the M. dubia cp genome
Category of genes | Group of genes | Name of genes
Genes for photosynthesis | Subunits of ATP synthase | atpA, atpB, atpE, atpF, atpH, atpI
Genes for photosynthesis | Subunits of photosystem II | psbA, psbB*, psbC, psbD, psbE, psbF, psbH*, psbI, psbJ, psbK, psbL, psbM, psbN*, psbT*, psbZ, ycf3
Genes for photosynthesis | Subunits of NADH dehydrogenase | ndhA, ndhB*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Genes for photosynthesis | Subunits of cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN
Genes for photosynthesis | Subunits of photosystem I | psaA, psaB, psaC, psaI, psaJ
Genes for photosynthesis | Subunit of RuBisCO | rbcL
Self-replication | Large subunit of ribosome | rpl14*, rpl16*, rpl2*, rpl20, rpl22*, rpl23*, rpl32, rpl33, rpl36*
Self-replication | DNA-dependent RNA polymerase | rpoA*, rpoB, rpoC1, rpoC2
Self-replication | Small subunit of ribosome | rps11*, rps12*#, rps14, rps15, rps16, rps18, rps19*, rps2, rps3*, rps4, rps7*, rps8*
Other genes | Subunit of acetyl-CoA carboxylase | accD
Other genes | c-type cytochrome synthesis gene | ccsA
Other genes | Envelope membrane protein | cemA
Other genes | Protease | clpP
Other genes | Maturase | matK
Unknown function | Conserved open reading frames | ycf1, ycf15*, ycf2*, ycf4
* Gene present in two copies; # trans-spliced gene
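The gene counts given in the text (52 photosynthesis genes, 38 self-replication genes, and 79 unique plus 22 duplicated CDS for 101 in total) can be cross-checked against Table 3 with a small Python tally. The gene lists below are transcribed from Table 3, and the dictionary keys and function name are hypothetical; a "*" is counted as one extra copy.

```python
# Gene lists transcribed from Table 3; '*' = two copies, '#' = trans-spliced.
TABLE3 = {
    "photosynthesis": (
        "atpA, atpB, atpE, atpF, atpH, atpI, "
        "psbA, psbB*, psbC, psbD, psbE, psbF, psbH*, psbI, psbJ, psbK, "
        "psbL, psbM, psbN*, psbT*, psbZ, ycf3, "
        "ndhA, ndhB*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK, "
        "petA, petB*, petD*, petG, petL, petN, "
        "psaA, psaB, psaC, psaI, psaJ, rbcL"
    ),
    "self-replication": (
        "rpl14*, rpl16*, rpl2*, rpl20, rpl22*, rpl23*, rpl32, rpl33, rpl36*, "
        "rpoA*, rpoB, rpoC1, rpoC2, "
        "rps11*, rps12*#, rps14, rps15, rps16, rps18, rps19*, rps2, rps3*, rps4, rps7*, rps8*"
    ),
    "other": "accD, ccsA, cemA, clpP, matK",
    "unknown": "ycf1, ycf15*, ycf2*, ycf4",
}


def tally(gene_list: str) -> tuple[int, int, int]:
    """Return (unique genes, duplicated genes, total gene copies) for a comma-separated list."""
    names = [g.strip() for g in gene_list.split(",") if g.strip()]
    duplicated = sum("*" in g for g in names)
    return len(names), duplicated, len(names) + duplicated


for category, genes in TABLE3.items():
    print(category, tally(genes))
```

The photosynthesis and self-replication rows give 52 and 38 gene copies, and the column sums over all four categories give 79 unique, 22 duplicated and 101 total CDS.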
Sixteen protein-coding genes were duplicated, with copies located in the inverted repeat regions. As in other land plants, rps12 in M. dubia is a trans-spliced gene, with its 5′ end located in the LSC region and two copies of its 3′ end in the IRs. Fifty-one protein-coding genes were single-copy and the remaining 16 were split (intron-containing) genes. Twenty-four genes in the M. dubia cp genome contain introns; two of them (ycf3 and clpP) contain a second intron. Intron size ranged from 527 bp (trnL-UAA) to 2,544 bp (trnK-UUU) (Supplementary Table 2). Among the 37 tRNA genes, 6 were duplicated, 17 were single-copy and 8 were split genes. The 8 rRNA genes (four rRNAs, each present in two copies) were located in the IR regions.
Codon usage patterns
The M. dubia cp genome uses all 64 codons, which fall into 21 coding categories (the 20 amino acids and the stop signal). Codon counts ranged from 84 to 2,257. Methionine and tryptophan are each encoded by a single codon, whereas the remaining amino acids are encoded by two to six synonymous codons (Figure 3).
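Codon usage tallies of the kind summarized above can be sketched in Python as below. The function names are hypothetical, the input is assumed to be a list of in-frame CDS strings, and the codon table is the standard genetic code, whose amino-acid assignments the plastid code (NCBI translation table 11) shares. Grouping the 64 codons by what they encode yields 21 classes: 20 amino acids plus stop.

```python
from collections import Counter, defaultdict
from itertools import product

# Codon -> amino acid ('*' = stop), first codon position varying slowest over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_counts(cds_seqs: list[str]) -> Counter:
    """Count in-frame codons across CDS, skipping codons that contain ambiguous bases."""
    counts = Counter()
    for seq in cds_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def synonymous_families() -> dict[str, list[str]]:
    """Group the 64 codons by the amino acid (or stop signal) they encode."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    return dict(families)


def usage_by_amino_acid(counts: Counter) -> dict[str, int]:
    """Sum codon counts per amino acid (or stop)."""
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return dict(totals)


families = synonymous_families()
print(len(families), {aa: len(codons) for aa, codons in sorted(families.items())})
```

Here Met and Trp each map to one codon, and every other class has two to six codons (Leu, Ser and Arg have six).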
Analysis of repeat sequences in Melia dubia chloroplast
Developing molecular markers that capture species diversity can facilitate population genetics, species identification and polymorphism studies in M. dubia. Using MISA, we detected 81 simple sequence repeat (SSR) loci in the M. dubia cp genome, of which 97.5% were mononucleotide and 2.5% dinucleotide repeats (Supplementary Figure 2). The predominant SSR motifs were T and A, at frequencies of 51% and 44%, respectively. This abundance of A/T repeats contributes to the AT bias of the cp genome. All dinucleotide repeats were composed of AT or TA motifs; no other dinucleotide motifs were present in M. dubia.
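MISA-style SSR detection amounts to finding perfect runs of a short motif that meet a minimum number of copies for each unit size. The Python sketch below illustrates the idea with regular-expression backreferences. The thresholds are an assumption (commonly used MISA-style values; the ones used for M. dubia are not stated), the function names are hypothetical, and unlike MISA it does not merge compound SSRs.

```python
import re
from collections import Counter

# Unit size -> minimum number of copies. ASSUMPTION: illustrative MISA-style thresholds.
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str, min_copies: dict[int, int] = MIN_COPIES) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect SSRs, one regex pass per unit size."""
    seq = seq.upper()
    hits = []
    for k, n in min_copies.items():
        # A k-base unit followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units built from a shorter unit (e.g. 'AA', 'ATAT'); with the thresholds
            # above, such loci are also found by the pass for the shorter unit.
            if any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def motif_class_percent(hits: list[tuple[int, str, int]]) -> dict[int, float]:
    """Percentage of SSR loci by unit size (1 = mononucleotide, 2 = dinucleotide, ...)."""
    sizes = Counter(len(unit) for _, unit, _ in hits)
    return {k: 100 * v / len(hits) for k, v in sorted(sizes.items())}


toy = "GC" + "A" * 12 + "GCG" + "AT" * 7 + "GGC"
hits = find_ssrs(toy)
print(hits, motif_class_percent(hits))
```

For the 81 loci reported here, the 97.5% and 2.5% figures correspond to 79 mononucleotide and 2 dinucleotide loci.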
Repeat structure and inversions
Repeat motifs play a crucial role in genome rearrangement and phylogenetic analysis. REPuter, run on the 171,956 bp cp genome with a maximum Hamming distance of 3 and a minimum repeat size of 30 bp, identified three categories of repeats: forward (23), reverse (1) and palindromic (26). Tandem Repeats Finder (TRF) identified 65 long tandem repeats with repeat units longer than 6 bp (Supplementary Tables 3 and 4).
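REPuter allows mismatches and also reports reverse and complement repeats. The Python sketch below is a simplified, exact-match stand-in that finds only forward and palindromic (reverse-complement) repeat pairs of at least `min_len` bases. It seeds on shared k-mers of length `min_len` and extends the seeds along their diagonals. Function names are hypothetical. On a real plastome, IRA and IRB appear as one long palindromic pair, because each IR is the reverse complement of the other.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def exact_repeats(seq: str, min_len: int = 30) -> list[tuple[str, int, int, int]]:
    """Return maximal exact repeat pairs as (type, start1, start2, length).

    'F' = forward repeat; 'P' = palindromic repeat (second copy is the reverse
    complement of the first). No mismatches are allowed, unlike REPuter.
    """
    seq = seq.upper()
    index = defaultdict(list)  # k-mer of length min_len -> start positions
    for i in range(len(seq) - min_len + 1):
        index[seq[i:i + min_len]].append(i)

    forward, palin = set(), set()  # seed pairs (x, y) with x < y
    for kmer, starts in index.items():
        partners = index.get(revcomp(kmer), [])
        for x in starts:
            forward.update((x, y) for y in starts if y > x)
            palin.update((x, y) for y in partners if y > x)

    hits = []
    for x, y in sorted(forward):
        if (x - 1, y - 1) in forward:
            continue  # not the left end of a maximal forward repeat
        t = 0
        while (x + t + 1, y + t + 1) in forward:
            t += 1
        hits.append(("F", x, y, min_len + t))
    for x, y in sorted(palin):
        if (x - 1, y + 1) in palin:
            continue  # part of a longer palindromic pair
        t = 0
        while (x + t + 1, y - t - 1) in palin:
            t += 1
        hits.append(("P", x, y - t, min_len + t))
    return hits


demo = "GATTACAG" + "CCCC" + "GATTACAG" + "TTTT" + "CTGTAATC"
print(exact_repeats(demo, min_len=6))
```

The toy sequence contains one 8 bp forward pair and two 8 bp palindromic pairs.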
Phylogenetic relationship analysis
The complete cp genomes of 10 Meliaceae species and of the outgroup Melianthus villosus (order Geraniales) were downloaded from the NCBI database and aligned with the newly assembled cp genome of M. dubia. The cp genome phylogeny was well resolved, with 100% bootstrap support at almost all nodes (Figure 3). M. dubia and M. azedarach were recovered as sister taxa sharing an internal node, indicating that they share a common ancestor and are more closely related to each other than to A. odorata, A. indica, A. polystachya, K. madagascariensis, S. mahagoni, C. odorata, C. guianensis and T. ciliata; the outgroup M. villosus was the most distantly related.
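Bootstrap support, as reported for the nodes above, comes from resampling alignment columns with replacement and re-inferring the tree on each replicate. The text does not state which tree-building method was used, so the Python sketch below does not reproduce the analysis; it only illustrates the resampling step. It swaps in p-distance and a hypothetical "unique closest pair" criterion as a crude substitute for inferring a full tree per replicate, and all function names are hypothetical.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def bootstrap_replicate(aln: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Nonparametric bootstrap: resample alignment columns with replacement."""
    n = len(next(iter(aln.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}


def closest_pair_support(aln: dict[str, str], pair: tuple[str, str],
                         reps: int = 100, seed: int = 1) -> float:
    """Fraction of replicates in which `pair` is the unique closest pair by p-distance.

    A crude substitute for clade support; real analyses infer a tree per replicate.
    """
    rng = random.Random(seed)
    target = frozenset(pair)
    hits = 0
    for _ in range(reps):
        rep = bootstrap_replicate(aln, rng)
        dists = {frozenset(p): p_distance(rep[p[0]], rep[p[1]]) for p in combinations(rep, 2)}
        best = min(dists.values())
        hits += [p for p, d in dists.items() if d == best] == [target]
    return hits / reps


toy = {"A": "A" * 10, "B": "A" * 10, "C": "C" * 10, "D": "G" * 10}
print(closest_pair_support(toy, ("A", "B"), reps=20))
```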
Comparative genomics analysis
The cp genomes of M. dubia, M. azedarach and A. indica were compared using mVISTA. Exons were more highly conserved than conserved non-coding sequences (CNS) and introns. The sequence similarity was 72%. The trans-spliced gene rps12 was conserved in all three plastomes. The trnG-UCC gene in the LSC region was more divergent between M. dubia and M. azedarach, whereas it was completely conserved between M. dubia and A. indica. The intergenic regions rpoC2-rpoC1, petL-petG, psbN-psbH, rpl23-trnI-CAU and psbH-petB were highly conserved among the three cp genomes (Figure 4).
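mVISTA-style plots report percent identity between aligned sequences in a sliding window and flag windows above an identity cutoff as conserved. The Python sketch below computes such a profile for two already aligned sequences. The 100 bp window and 70% cutoff are assumed defaults for illustration, not parameters stated in the text. The function names are hypothetical, and gap columns are counted as mismatches.

```python
def window_identity(a: str, b: str, window: int = 100, step: int = 1) -> list[tuple[int, float]]:
    """Percent identity of two aligned sequences in sliding windows, as (start, identity%)."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    profile = []
    for start in range(0, len(a) - window + 1, step):
        pairs = zip(a[start:start + window], b[start:start + window])
        same = sum(x == y and x != "-" for x, y in pairs)  # gaps never count as identical
        profile.append((start, 100 * same / window))
    return profile


def conserved_starts(profile: list[tuple[int, float]], cutoff: float = 70.0) -> list[int]:
    """Start positions of windows at or above the identity cutoff (assumed 70%)."""
    return [start for start, identity in profile if identity >= cutoff]


profile = window_identity("A" * 10, "A" * 5 + "C" * 5, window=4, step=2)
print(profile, conserved_starts(profile))
```

On the toy alignment, identity falls from 100% to 0% as the window moves into the mismatched half, and only the first two windows pass the 70% cutoff.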