Overall organization and features of the four plastomes
The plastomes size of these four taxa ranged from 133,146 bp (S. monacanthus) to 134,450 bp (S. validus). They were typical quadripartite structure, consisting of a large single-copy region, (LSC, 68,076 − 68,877 bp), a small single-copy region, (SSC, 21,716 − 22,023 bp), and a pair of inverted repeat region (IRs, 21,674 − 21,775 bp). Figure 1 showed the plastid genome map. In addition to the differences in length, the GC content of these conserved plastomes also showed slight changes. According to the analysis, the GC content in the four plastomes ranged from 36.29% to 36.43 %, and the GC content in SSC region (39.39% − 39.69 %) was significantly higher than that in LSC region (36.22% − 36.36 %) and IR region (34.83% − 34.98 %) (Table 1).
Table 1
Plastome features of the four Selenicereus species.
Species
|
S. monacanthus
|
S. anthonyanus
|
S. grandiflorus
|
S. validus
|
Accession number
|
MW553055
|
MW553068
|
MW553069
|
MW553070
|
Length (bp)
|
Total
|
133,146
|
133,317
|
134,211
|
134,450
|
LSC
|
68,076
|
68,203
|
68,839
|
68,877
|
SSC
|
21,716
|
21,766
|
22,014
|
22,023
|
IR
|
21,677
|
21,674
|
21,679
|
21,775
|
GC content (%)
|
Total
|
36.40
|
36.43
|
36.34
|
36.29
|
LSC
|
36.25
|
36.36
|
36.24
|
36.22
|
SSC
|
39.69
|
39.54
|
39.40
|
39.39
|
IR
|
34.98
|
34.98
|
34.95
|
34.83
|
Gene numbers
|
Total
|
104
|
104
|
104
|
104
|
tRNA
|
30
|
30
|
30
|
30
|
rRNA
|
4
|
4
|
4
|
4
|
Protein-coding
|
70
|
70
|
70
|
70
|
Similar to previous reports in cacti plastomes, the 11 ndh genes in the analyzed plastomes were partially lost, including ndhA, ndhC, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ and ndhK. The second exon of ndhB gene was also lost, and only the first exon remained. By contrast, only the ndhD gene was intact. On the whole, the four plastomes were all composed of 104 unique genes, including 30 unique tRNA genes, 4 unique rRNA genes and 70 unique protein-coding genes. Moreover, we observed the loss of the first exon of the clpP gene, which might be pseudogenes with the same to gene ndhB (Table 2).
Table 2
Gene composition in the plastomes of Selenicereus.
Category of Genes
|
Group of Genes
|
Name of Genes
|
Ribosomal RNA
|
rRNA
|
rrn16S, rrn23S, rrn5S, rrn4.5S
|
Transfer RNA
|
tRNA
|
30 unique trna genes
|
Photosynthesis
|
Subunits of ATP synthase
|
atpA (x2), atpA, atpB, atpE, atpF* (x2), atpH, atpI
|
Subunits of photosystem II
|
psbA (x2), psbB, psbC, psbD, psbE, psbF, psbI (x2), psbJ, psbK (x2), psbM, psbN, psbT, psbZ
|
Subunits of NADH-dehydrogenase
|
ndhBψ, ndhD
|
Subunits of cytochrome b/f complex
|
petA, petB*, petD*, petG, petL, petN
|
Subunits of photosystem I
|
psaA, psaB, psaC, psaI, psaJ
|
Subunit of rubisco
|
rbcL
|
Self-replication
|
Large subunit of ribosome
|
rpl14, rpl16*, rpl2, rpl20, rpl22, rpl32, rpl33, rpl36
|
DNA dependent RNA polymerase
|
rpoA, rpoB, rpoC1, rpoC2
|
Small subunit of ribosome
|
rps11, rps12*, rps14, rps15, rps16, rps16*, rps18, rps19, rps2, rps3, rps4, rps7, rps8
|
Other genes
|
Subunit of Acetyl-CoA-carboxylase
|
accD
|
c-type cytochrom synthesis gene
|
ccsA
|
Envelop membrane protein
|
cemA
|
Protease
|
clpPψ (x2)
|
Translational initiation factor
|
infA
|
Maturase
|
matK (x2)
|
Unknown
|
Conserves open reading frames
|
ycf1, ycf1ψ, ycf3**, ycf2 (x2), ycf4
|
Note. (x2) indicates that the gene located in the IRs and thus had two complete copies, * and ** indicate that genes containing one/two introns. ‘ψ’ indicates that it is a pseudogene. |
Furthermore, we also clearly observed the loss of intron in two genes: rpl2 and rpoC1. Due to the loss of some genes, exons and introns, the number of intron-containing genes in the plastomes of Selenicereus species were significantly reduced compared to most other non-cactus species. Except for the trans-splicing gene, rps12, there were only 5 protein-coding genes (petB, petD, rpl16, rps16 and atpF,) containing one intron, and only one gene (ycf3) containing two introns. Moreover, 5 tRNA genes containing one intron (trnL-UAA, trnT-CGU, trnK-UUU, trnA-UGC and trnE-UUC).
In the four Selenicereus plastomes, there were 10 protein-coding genes (atpF, atpA, clpP, psbI, psbK, rps16, matK, psbA, ycf2, ycf1) and 8 tRNA genes (trnR-UCU, trnT-CGU, trnS-GCU, trnQ-UUG, trnK-UUU, trnH-GUG, trnM-CAU and trnL-CAA) were found located in the IR regions, they all have two copies. Among the protein-coding genes, two genes (ycf1 and atpF) are partially located in IR region. All rRNA is located in the SSC region.
Repeat And Ssrs Analysis
Microsatellites (simple repeat sequences, SSRs) are usually 6 bp tandem sequences in eukaryotic genomes [26]. Their high polymorphism and codominant inheritance make them popular molecular markers [27, 28], which play an important role in the identification of species and the evaluation of evolutionary relationships [29]. Among the four plastomes, S. monacanthus had the largest number of SSRs (74), followed by S. anthonyanus and S. validus (both were 61), and S. grandiflorus (55). Most of these SSRs were homopolymers of A/T mononucleotide, and on average, they accounted for 64.34% of the total SSRs. Dinucleotides (18.85%), tetranucleotides (8.61%), and trinucleotides (3.69%) were followed. Pentanucleotide and hexanucleotide repeats were rare in Selenicereus plastomes, accounting for 1.23% and 1.64% of all SSRs, respectively (Fig. 2 and Additional file 1: Table S1).
We detected a large number of dispersed repeats in the four plastomes. A total of 807 dispersed repeats were identified, including 618 forward repeats (with length ranged from 30 to 415 bp), 146 palindromic repeats (30 to 415 bp), 39 reverse repeats (30 to 41 bp), and four complementary repeats of 30 bp in length (Additional file 1: Table S2). The detailed distribution of these dispersed repeats in each plastome was shown in Fig. 2. Notably, the number of forward repeats in S. grandiflorus and S. validus was significantly greater than that in the other two taxa. The dispersed repeats not only serve as potential markers for rearrangement, but were also crucial for inducing mutations [30].
Genomic Divergence
Sequence identity analysis based on mVISTA [31]was performed among the 4 plastomes, with the reference being the plastome of S. validus. We found that the plastome sequences of the four species were quite conservative, in general, IR regions were more conserved than LSC and SSC regions, and the hypervariable regions were mainly found in non-coding sequences. Nevertheless, several coding-regions showed significant differences in the sequences (Fig. 3), such as accD, clpP, ycf1 and ccsA. The accD gene, in particular showed unusual sequence divergence. In addition, there were significant differences among several non-coding regions: trnF-rbcL, trnM-accD and trnN-trnR.
According to the results of DNA sequence polymorphism obtained by DnaSP (v6.0) [32], we identified six hypervariable regions, there were trnF-GAA-rbcL (Pi = 0.05567), ycf1 (Pi = 0.059), clpP-trnS-GCU (Pi = 0.03067 ), clpP-trnT-CGU (Pi = 0.03167), rpl22-rps19 (Pi = 0.02067), and the highest Pi value of accD gene, including the intergenic region trnM-accD, with Pi value ranging 0.00667 to 0.167 (Fig. 4). The maximum Pi value for this region was given in parentheses. The results were similar to those based on mVISTA, suggesting that these regions could be used as a potential DNA barcodes.
We analyzed 67 orthologous genes in the protein-coding regions of the four plastomes. In our study, a total of 19 genes (atpA, matK, petD, petG, petN, psaC, psaI, psaJ, psbA, psbE, psbF, psbH, psbI, psbK, psbM, psbN, psbT, psbZ, rps16) in the four species were completely conserved, and 27 genes had a mutation rate of less than 1.0%. However, we also found that some protein-coding genes had a high level of mutation (Fig. 5 and Additional file 1: Table S3), for example, the mutation rates of 2 genes were more than 2%, and the mutation rates of 3 genes (rpl36, ycf1 and rpl22) were more than 3%. The highest mutation rates were observed in three genes: rpl32 (12.34%), accD (10.05%) and clpP (7.44%).
Contraction And Expansion Of Irs
We analyzed the IR/SC boundaries and their adjacent genes in the four plastomes, and compared them to previously published related plastomes. It is not difficult to find that the IR/SC border and the adjacent genes of Selenicereus plastomes were very similar in structural characteristics except for small differences in gene position. However, we found that the IR lengths and IR boundaries of the four plastomes newly reported here were varied greatly from those previously reported in cacti and related species. The length of IR regions was observed more than 20,000 bp in Opuntia quimilo and all other reported non-cactus species in the order Caryophyllales. However, it was only 8,530 bp in Rhipsalis baccifera, and less than 2,000 bp in most cacti genera, such as Mammillaria, Carnegiea, Lophocereus et al. Here, in our four sequenced Selenicereus plastomes, the IR lengths were ranged from 21,674 to 21,775 bp, indicating that the cacti had undergone a drastic expansion/contraction event in the IR region.
Furthermore, we also analyzed the IR boundaries of plastomes in the IR region that over 2,000 bp. As shown in Fig. 6, in two non-cactus species, the rps19 gene span the LSC/IRb border, and an rps19 pseudogene was duplicated in the IRa. The same, the ycf1 gene span the SSC/IRa border, and an ycf1 pseudogene was duplicated in the IRb.
In O. quimilo, the two LSC/IR boundaries were ycf15-trnV and trnV-trnH, and the two SSC/IR boundaries were ndhG-trnL and ndhG-ndhE, respectively. By contrast, in R. baccifera, the two LSC/IR boundaries were rpl23-trnI and trnI-trnH, and the two SSC/IR boundaries were inside of ycf1. Due to the dynamic changes of IRs, the IR boundaries were also changed in the four Selenicereus plastomes. Although the two SSC/IR boundaries were similar to R. baccifera, the second exon of atpF gained access to the IR regions, the first exon of atpF was not. Thus, a previously unreported LSC/IR boundaries at the intron region of atpF was formed. This result suggested that the IR boundaries in cacti plastomes were extremely unstable compared with other Caryophyllales plastomes.
Phylogenetic Analysis Based On Conserved Protein-coding Genes
In this study, we constructed phylogenetic trees by using the 56 shared plastidial genes as datasets. The tree reconstruction based on Maximum Likelihood (ML) method and Bayesian Inference (BI) method had a highly consistent topology. The stable topological structure and high bootstrap/posterior probability support values of each node indicated the reliability of phylogenetic tree (Fig. 7).
The phylogenetic analysis involved 15 species of subfamily Cereoideae and two outgroups (Pereskia sacharosa and O. quimilo). In our tree, the four Selenicereus species form a monophyletic clade supported by strong support values. The red-fleshed pitaya (S. monacanthus) was most closely related to S. anthonyanus compared to other two Selenicereus species. This result also suggested that Hylocereus and Selenicereus were paraphyletic groups, the genus Hylocereus belonged to Selenicereus based on the 56 conserved plastidial genes.