Complete plastid genome of A. indica
The complete plastid genome of A. indica has a typical quadripartite structure and is 86,212 bp in length, comprising an LSC region of 22,301 bp, an SSC region of 529 bp, and two IR regions of 31,691 bp each (Fig. 1). The AT content of this plastid genome is 65.64%. Based on the DOGMA and GeSeq annotations, the plastid genome of A. indica contains 54 putative intact genes and three pseudogenes. The intact genes comprise 24 tRNA genes, 4 rRNA genes, 8 rpl genes, 12 rps genes and 6 other genes, namely ycf1, ycf2, accD, matK, infA and clpP (Table 1). The three pseudogenes are ψatpA, ψatpI and ψndhB. In the LSC region, ψatpA is truncated at the 88th codon and ψatpI carries a premature stop codon at the 32nd codon. ψndhB, in the IR region, became a pseudogene because of an internal stop codon at the 53rd codon.
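The region lengths reported above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the numbers are copied from the text, while the names `REGIONS_BP`, `TOTAL_BP` and `region_summary` are hypothetical helpers introduced here and are not part of the authors' pipeline.

```python
# Region lengths (bp) as reported in the text; IRa/IRb are the two inverted repeats.
REGIONS_BP = {"LSC": 22_301, "SSC": 529, "IRa": 31_691, "IRb": 31_691}
TOTAL_BP = 86_212      # reported plastome length
AT_PERCENT = 65.64     # reported AT content


def region_summary(regions, total):
    """Return the summed region length and each region's share of the total (%)."""
    assembled = sum(regions.values())
    shares = {name: 100 * length / total for name, length in regions.items()}
    return assembled, shares


assembled, shares = region_summary(REGIONS_BP, TOTAL_BP)
ir_share = shares["IRa"] + shares["IRb"]      # combined share of both IR copies
gc_percent = round(100 - AT_PERCENT, 2)       # GC content implied by the AT content

print(assembled == TOTAL_BP, round(ir_share, 1), gc_percent)
```

The four regions sum to the reported 86,212 bp, and the two IR copies together account for about 73.5% of the plastome, which is consistent with the IR expansion and SSC reduction described in this section.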
Table 1
Summary of genes in the Aeginetia indica plastome
Function | Genes |
Ribosomal proteins large subunit | rpl2, rpl14, rpl16, rpl20, rpl22, rpl23, rpl33, rpl36 |
Ribosomal proteins small subunit | rps2, rps3, rps4, rps7, rps8, rps11, rps12, rps14, rps15, rps16, rps18, rps19 |
Transfer RNA genes | trnH-GUG, trnQ-UUG, trnS-GCU, trnC-GCA, trnD-GUC, trnY-GUA, trnE-UUC, trnS-UGA, trnG-UCC, trnM-CAU, trnS-GGA, trnL-UAA, trnA-UGC, trnF-GAA, trnW-CCA, trnL-UAG, trnN-GUU, trnL-CAA, trnfM-CAU, trnI-CAU, trnV-GAC, trnI-GAU, trnT-GGU, trnP-UGG |
Ribosomal RNA genes | rrn4.5, rrn5, rrn16, rrn23 |
Other protein-coding genes | ycf1, ycf2, accD, clpP, matK, infA |
Pseudogenes | ψndhB, ψatpA, ψatpI |
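The pseudogenes in Table 1 are defined by truncation or by an in-frame stop codon before the normal end of the coding sequence. As a minimal sketch of how such a stop codon can be located, the function below scans a coding sequence codon by codon. It assumes the three standard stop codons (TAA, TAG, TGA), which also apply to the plastid genetic code; the function name `first_internal_stop` and the toy sequences are hypothetical and are not taken from the A. indica plastome or from the DOGMA or GeSeq annotations.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stops, also used in plastids


def first_internal_stop(cds):
    """Return the 1-based index of the first in-frame stop codon that occurs
    before the final codon, or None if the reading frame is uninterrupted."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is expected to be the genuine terminator, so skip it.
    for index, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            return index
    return None


# Toy sequences (invented): the first has a stop at codon 3, the second is intact.
interrupted = "ATGAAATAAGGGTGA"
intact = "ATGAAAGGGTGA"
print(first_internal_stop(interrupted), first_internal_stop(intact))
```

A real annotation workflow would also compare the reading frame against a reference ortholog to detect truncations, which this sketch does not attempt.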
Supplementary information
Additional file 1: Figure S1. Maximum likelihood tree of seven species in Orobanchaceae based on sequences of 20 plastid genes shared among them. Numbers at the nodes are bootstrap values. The scale is in substitutions per site.
Additional file 2: Figure S2. The expression of genes in the photosynthesis pathway observed in the Aeginetia indica transcriptome. Genes with detected expression are shown in red boxes. With courtesy of © www.genome.jp/kegg/kegg1.html.
Additional file 3: Figure S3. The expression of genes in the porphyrin and chlorophyll metabolism pathway detected in the Aeginetia indica transcriptome. Genes with detected expression are shown in red boxes. With courtesy of © www.genome.jp/kegg/kegg1.html.
Additional file 4: Table S1. Relaxation of purifying selection in parasitic plants of Orobanchaceae based on branch model analysis of 20 protein-coding genes shared by seven species of Orobanchaceae. The likelihood ratio test was used to compare the three models (M0: one-ratio model; M2: two-ratio model; M3: three-ratio model). P-values are in bold when they are less than 0.05.
Additional file 5: Table S2. Expression level of unigenes of Aeginetia indica in the photosynthesis pathway based on transcriptome analysis.
The SSC region of the A. indica plastome is severely reduced in size, and only two genes, rps15 and trnL-UAG, were found in this region (Fig. 1). The two IR regions have expanded towards both the LSC and SSC regions. In L. philippensis and other autotrophic plants, an intact ycf1 gene usually spans the IR and SSC regions, and the rps8, rpl14, rpl16, rps3, rpl22 and rps19 genes are located in the LSC region. In A. indica, by contrast, there is an intact ycf1 gene in each of the IR regions, and rps8, rpl14, rpl16, rps3, rpl22 and rps19 have all shifted into the IR regions.
Gene loss in the A. indica plastid genome
Compared with L. philippensis, the A. indica plastid genome has lost a substantial number of genes. Ten ndh genes (ndhA, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ and ndhK), which encode subunits of the NADH dehydrogenase complex, were lost, and ndhB became a pseudogene. All five psa genes (psaA, psaB, psaC, psaI and psaJ) and all 15 psb genes (psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT and psbZ), which are involved in photosystem I and photosystem II, respectively, were lost. All six pet genes (petA, petB, petE, petF, petH and petI), which encode cytochrome b6/f complex subunits that function in photosynthetic electron transport, were also missing. In addition, four atp genes (atpB, atpE, atpF and atpH) encoding F-type ATPase subunits, four genes encoding DNA-dependent RNA polymerase (rpoA, rpoB, rpoC1 and rpoC2), and genes encoding the envelope membrane protein (cemA), the large subunit of Rubisco (rbcL), a haem attachment factor (ccsA), and photosystem assembly factors (ycf3 and ycf4) were lost as well.
Plastid genome rearrangements in A. indica
The plastomes of A. indica and L. philippensis were aligned with Mauve 2.4.0 (Fig. 2). We identified four locally collinear blocks (LCBs) between the two species, and the A. indica plastid genome has undergone two major inversions relative to L. philippensis. One is a 1,452 bp inversion in the LSC region that contains an intact accD gene. The other is a large inversion of 60,255 bp that contains an intact infA gene at the boundary of the LSC and IRB regions, the complete SSC and IRB regions, and most of the IRA region.
Relaxed purifying selection of A. indica plastid genes
A total of 20 protein-coding genes shared among the seven species of Orobanchaceae, comprising 10 rps genes, 7 rpl genes, accD, infA and matK, were used for phylogenetic analysis. The maximum likelihood tree was strongly supported, with bootstrap values of 100 on all branches (Figure S1). The three Striga species clustered into one clade, and Buchnera americana was sister to them. Aeginetia indica was sister to the clade consisting of these four species.
The ratio of non-synonymous (dN) to synonymous (dS) substitution rates (ω) can be used as an indicator of selection pressure. The two-ratio model (M2) was first compared with the one-ratio model (M0). For all genes except rpl20 and rps18, ω values on the parasitic plant branch were larger than those on the nonparasitic plant branch (Table S1), and the likelihood ratio test showed that M2 fit significantly better than M0 for nine genes, i.e. accD, infA, rpl22, rps11, rps14, rps19, rps2, rps3 and rps7, suggesting that these genes were under relaxed purifying selection in parasitic plants. Using the three-ratio branch model (M3), we found that hemiparasitic species had higher or much higher ω than holoparasitic species at 13 of 18 genes (ω values for the remaining two genes are not available), while holoparasitic species had slightly higher ω than hemiparasitic species at only five genes (Table S1). This suggests that the protein-coding genes retained in the plastome of A. indica still play important functional roles and are not under more relaxed selective pressure than those of hemiparasitic species.
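The model comparisons above rest on the likelihood ratio test for nested models: twice the difference in log-likelihood is compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters. The sketch below assumes one degree of freedom, which is the usual case when two branch models differ by a single ω parameter (for example M0 versus M2). The function name `lrt_df1` and the log-likelihood values are invented placeholders; they are not values from Table S1, and the software the authors used for this test is not stated in this section.

```python
import math


def lrt_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value
    under a chi-square distribution with 1 degree of freedom.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Placeholder log-likelihoods for a one-ratio (null) and two-ratio (alternative) fit.
stat, p_value = lrt_df1(lnl_null=-1000.0, lnl_alt=-997.0)
print(f"2*dlnL = {stat:.2f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

With these placeholder values the statistic is 6.0 and the test is significant at the 0.05 level. Comparisons between models that differ by more than one parameter would need the corresponding chi-square distribution instead of this closed form.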
Transcriptome analysis for A. indica
We obtained 21.05, 19.04, 18.34 and 18.02 Gb of clean reads for four tissues, i.e. flower, sepal, fruit and stem, respectively. De novo assembly of the read data from the four tissues yielded a total of 205,380 transcripts, of which 153,986 were extracted as unigenes. The average length and N50 of these unigenes were 623.18 bp and 880 bp, respectively. TransDecoder predicted 47,480 ORFs (open reading frames) from all unigenes, and 42,007 of them could be annotated in the Swiss-Prot database. Among these 42,007 Swiss-Prot annotations, 8,466 could be assigned to 131 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.
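For readers unfamiliar with the assembly statistics quoted above, the following sketch computes the mean length and N50 from a list of sequence lengths. It uses the conventional definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases). The tool the authors used to compute these statistics is not stated here, and the function names and toy lengths are hypothetical.

```python
def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half
    of the total assembled bases (conventional N50 definition)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one sequence length")


# Toy unigene lengths (invented), total 32 bp: the two longest already cover half.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(mean_length(toy_lengths), n50(toy_lengths))
```

Because N50 is weighted towards long sequences, it is typically larger than the mean length, as is also the case for the unigene set reported here (880 bp versus 623.18 bp).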
The photosynthesis pathway (ko00195) in the KEGG pathway database contains 63 genes (30 plastid genes and 33 nuclear genes). In the A. indica plastome, genes involved in photosystem I and II, the cytochrome b6f complex, and photosynthetic electron transport are completely lost. The only two F-type ATPase-related genes in its plastome (atpA and atpI) are pseudogenes. Based on the transcriptome analysis, only 14 unigenes in the photosynthesis pathway were expressed (Table S2). These 14 genes include one gene encoding the PSII 6.1 kDa protein, seven involved in photosynthetic electron transport and six encoding components of F-type ATPase (Figure S2). Expression of the other genes in this pathway was not detected, indicating that these genes were either lost or not expressed. Together, the plastome and transcriptome analyses indicate that the photosynthesis pathway in A. indica has been completely lost.
The porphyrin and chlorophyll metabolism pathway (ko00860) is complex in plants. Porphyrins are intermediates in the synthesis of heme and chlorophyll, and heme is required for chlorophyll biosynthesis [20]. In the pathway from alanine to protoporphyrin IX, expression of genes encoding the enzymes of the intermediate steps, including HemA, HemB, HemC, HemD, HemE, HemF, HemL and HemY, was observed in the transcriptome of A. indica (Figure S3). However, because expression of divinyl chlorophyllide a 8-vinyl-reductase [EC:1.3.1.75] was not detected, the chlorophyll synthesis pathway appears to end at divinyl protochlorophyllide production in A. indica (Figure S3). The later stage of the chlorophyll synthesis pathway is therefore incomplete, and chlorophyll cannot be synthesized in A. indica.