In our study, the complete CP genomes of V. montana and V. fordii were sequenced, assembled, and annotated; they were 160,906 bp and 161,494 bp in length, respectively. The V. montana genome was smaller than that of V. fordii but contained the same genes. As shown in Figures 2 and 3, both CP genomes have the typical circular structure of four regions: a pair of IRs totaling 53,568 bp in V. montana and 53,638 bp in V. fordii, separated by an SSC region of 18,732 bp and 18,760 bp, respectively, and an LSC region of 88,606 bp and 89,098 bp, respectively (Table 1). In both species, 135 genes were annotated in these regions, including 81 protein-coding genes, 29 tRNA genes, 4 rRNA genes, and 21 genes duplicated in the IRs (Table 1). In total, 2,280 SNPs and 257 indels were identified between the V. montana and V. fordii genomes. The CP genome sequences of the two species were 98.4% similar, suggesting a close relationship between V. montana and V. fordii that was also supported by the phylogenetic analysis. These comparative analyses therefore clearly show the high similarity of the CP genomes of the two Vernicia species.
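Figures such as the SNP count, the indel count, and the percent similarity depend on how variants are tallied from a whole-genome alignment. The Python sketch below shows one common convention, assuming a gapped pairwise alignment of the two CP genomes is already available. The function name compare_alignment, the rule that each contiguous run of gap columns counts as one indel event, and the definition of similarity as identical columns divided by alignment length are illustrative assumptions, not necessarily the procedure used in this study.

```python
def compare_alignment(seq_a: str, seq_b: str) -> dict:
    """Tally SNPs, indel events, and percent identity for two aligned sequences.

    Illustrative assumptions: '-' marks a gap, each contiguous run of gap
    columns is one indel event, and identity = identical columns divided by
    total alignment length. Ambiguous bases (e.g. 'N') are not treated
    specially and would be counted as SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    snps = indels = matches = 0
    in_gap = False  # True while we are inside a run of gap columns
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            if not in_gap:
                indels += 1  # first gap column of a new run = new indel event
            in_gap = True
            continue
        in_gap = False
        if a == b:
            matches += 1
        else:
            snps += 1  # aligned, non-gap column with different bases
    identity = 100.0 * matches / len(seq_a)
    return {"snps": snps, "indels": indels, "identity": identity}


if __name__ == "__main__":
    # Toy alignment, not real Vernicia data.
    print(compare_alignment("ACGTACGTAC", "ACGAAC--AC"))
    # -> {'snps': 1, 'indels': 1, 'identity': 70.0}
```

Under this rule, gaps that sit next to each other in the two sequences are merged into a single event; dedicated variant-calling tools treat such cases and ambiguous bases more carefully.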
The CP genomes of most land plants contain two identical IR regions, which have been reported to have lower nucleotide substitution rates and fewer indels than the LSC and SSC regions (Li et al 2017; Kim and Lee 2004). Consistent with this, 30 indels and 380 SNPs were identified in the IR regions of the Vernicia CP genomes (Table 2), which contain more duplicated genes (21 in Vernicia) than those of other Euphorbiaceae species. In contrast, the IGSs and introns had more indels than the protein-coding genes and thus appear to have evolved more quickly. Nucleotide substitutions and indels in CP genomes have traditionally been used as DNA markers and barcodes in phylogenetic analyses of many land plants (Clegg et al 1994; Morton and Clegg 1995; Katayama and Uematsu 2005). Hence, validated indel site information can be an important resource for future studies.
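The contrast between the IRs and the single-copy regions becomes clearer when variant counts are normalized by region length. The Python sketch below applies this arithmetic to the totals reported above, under two assumptions: all counts are expressed on V. montana coordinates, so the IRs total 53,568 bp and LSC + SSC total 160,906 - 53,568 = 107,338 bp; and the 380 SNPs and 30 indels in the IRs are included in the genome-wide totals of 2,280 and 257. The function names per_kb and ir_vs_single_copy are hypothetical.

```python
def per_kb(count: int, length_bp: int) -> float:
    """Convert an event count into events per kilobase of sequence."""
    return 1000.0 * count / length_bp


def ir_vs_single_copy(genome_bp: int, ir_bp: int,
                      snps_total: int, snps_ir: int,
                      indels_total: int, indels_ir: int) -> dict:
    """Compare SNP and indel densities in the IRs versus LSC + SSC.

    Assumes the IR counts are a subset of the genome-wide totals.
    """
    sc_bp = genome_bp - ir_bp  # combined LSC + SSC length
    return {
        "snp_ir": per_kb(snps_ir, ir_bp),
        "snp_sc": per_kb(snps_total - snps_ir, sc_bp),
        "indel_ir": per_kb(indels_ir, ir_bp),
        "indel_sc": per_kb(indels_total - indels_ir, sc_bp),
    }


if __name__ == "__main__":
    # Totals taken from the text; V. montana region lengths assumed.
    densities = ir_vs_single_copy(160_906, 53_568, 2_280, 380, 257, 30)
    for name, value in densities.items():
        print(f"{name}: {value:.2f} per kb")
```

Under these assumptions the IRs carry about 7.1 SNPs and 0.56 indels per kb, compared with about 17.7 SNPs and 2.1 indels per kb in LSC + SSC. These figures are illustrative only and change if the assumptions above do not hold.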
Repeat sequences are useful for studying genome rearrangements and serve as important molecular markers in plant population genetics as well as in evolutionary and ecological studies (Cavalier-Smith 2002). In the CP genome of V. montana, 94 repeat loci were identified, including 51 SSRs, 18 long repeat sequences, and 25 compound microsatellites (Supplemental tables S1-3). Meanwhile, 126 repeat loci were detected in the V. fordii CP genome, including 70 SSRs, 25 long repeat sequences, and 31 compound microsatellites (Supplemental tables S4-6). The repeats ranged from 10 to 233 bp in length; many were located in IGS regions, and the IRs contained the majority of them. In both Vernicia CP genomes, the ycf2 gene contained multiple repeats, including two forward repeats and four palindromic repeats. Overall, most of the repeats were found in non-coding regions of the tung tree CP genome. Hence, the non-coding regions of CP genomes may serve as important molecular markers for future phylogenetic studies (Small et al 1998).
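SSR detection of the kind summarized above is usually done with dedicated tools, and the parameters used in this study are not stated here. As an illustration only, the Python sketch below finds perfect SSRs with regular expressions and groups nearby SSRs into compound microsatellites. The minimum copy numbers per unit length (10, 5, 4, 3, 3, and 3 for mono- to hexanucleotide units) and the 100 bp maximum spacing for compound SSRs are assumed MISA-style settings. The names find_ssrs, compound_ssrs, and is_primitive are hypothetical. Long forward and palindromic repeats would require a separate search that is not shown.

```python
import re
from typing import List, Tuple

# Assumed minimum copy numbers for unit lengths 1-6 bp (MISA-style values,
# not taken from this study).
MIN_COPIES = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

SSR = Tuple[int, int, str, int]  # (start, end, unit, copies); 0-based, end exclusive


def is_primitive(unit: str) -> bool:
    """True unless the unit is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(unit)
    return not any(k % d == 0 and unit[:d] * (k // d) == unit for d in range(1, k))


def find_ssrs(seq: str) -> List[SSR]:
    """Find perfect SSRs. Non-primitive units are skipped because, with these
    thresholds, the same stretch also qualifies under its shorter unit."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        # e.g. k=2, min_copies=5 -> r"(([ACGT]{2})\2{4,})"
        pattern = re.compile(rf"(([ACGT]{{{k}}})\2{{{min_copies - 1},}})")
        for m in pattern.finditer(seq):
            unit = m.group(2)
            if is_primitive(unit):
                hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // k))
    return sorted(hits)


def compound_ssrs(hits: List[SSR], max_gap: int = 100) -> List[List[SSR]]:
    """Group SSRs separated by at most max_gap bp (assumed threshold)."""
    groups: List[List[SSR]] = []
    for hit in sorted(hits):
        if groups and hit[0] - max(h[1] for h in groups[-1]) <= max_gap:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return [g for g in groups if len(g) > 1]


if __name__ == "__main__":
    toy = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GG"  # toy sequence, not real data
    ssrs = find_ssrs(toy)
    print(ssrs)  # [(2, 12, 'A', 10), (14, 24, 'AT', 5)]
    print(compound_ssrs(ssrs))
```

The regular expression uses a backreference (\2) so that only exact tandem copies of the captured unit are matched. Imperfect SSRs would need an approximate-matching approach instead.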
Organelle genome sequencing is becoming an important approach for phylogenetic and taxonomic studies at low taxonomic levels (Yang et al 2013). With advances in sequencing technology, thousands of organelle genomes have been sequenced, which can greatly reduce the reliance of phylogenetic research on relatively short sequences (Parks et al 2009). Whole CP genome sequences can provide more informative data for phylogenetic and population-based studies and improve the efficiency of species discrimination (Yang et al 2013). Indeed, phylogenomic studies have gained popularity, and the use of organelle-scale “barcodes” has recently been widely considered and applied (Parks et al 2009; Kuang et al 2011; Yang et al 2013). CP genomes contain moderate variation that can provide sufficient phylogenetic information for resolving evolutionary relationships (Yang et al 2013). In the Euphorbiaceae family, several studies have analyzed phylogenetic relationships based on CP DNA sequences (Daniell et al 2008; Tangphatsornruang et al 2011; Li et al 2017). However, to the best of our knowledge, no study has focused on the phylogenetic relationship between V. montana and V. fordii. Here, we sequenced the CP genomes of these two Vernicia species using Illumina sequencing technology and performed a phylogenomic analysis. As expected, V. montana and V. fordii formed a sister group with strong support (PP = 100). In addition, Vernicia and the four other Euphorbiaceae species clustered into a monophyletic group with high bootstrap values, which is consistent with previous studies (Li et al 2017). Finally, Vernicia appeared to be more closely related to Deutzianthus than to the other taxa.
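To illustrate how clade support values arise, the Python sketch below builds UPGMA trees from pairwise p-distances and estimates nonparametric bootstrap support for a clade by resampling alignment columns with replacement. This is a deliberately simple, swapped-in method. CP phylogenomic studies normally use maximum-likelihood or Bayesian inference (PP conventionally denotes a Bayesian posterior probability). The function names, the taxon labels A to D, and the toy sequences are all hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def upgma_clades(aln: Dict[str, str]) -> List[FrozenSet[str]]:
    """Cluster taxa with UPGMA; return clades in the order they are merged."""
    clusters = [frozenset([name]) for name in aln]
    dist = {}
    for a, b in combinations(clusters, 2):
        d = p_distance(aln[next(iter(a))], aln[next(iter(b))])
        dist[(a, b)] = dist[(b, a)] = d
    clades = []
    while len(clusters) > 1:
        # Merge the closest pair (ties go to the first pair in list order).
        a, b = min(combinations(clusters, 2), key=lambda pair: dist[pair])
        merged = a | b
        for c in clusters:
            if c not in (a, b):
                # UPGMA update: size-weighted mean of distances to the merged clusters.
                d = (dist[(a, c)] * len(a) + dist[(b, c)] * len(b)) / len(merged)
                dist[(merged, c)] = dist[(c, merged)] = d
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        clades.append(merged)
    return clades


def bootstrap_support(aln: Dict[str, str], clade: Iterable[str],
                      replicates: int = 100, seed: int = 1) -> float:
    """Percentage of bootstrap replicates whose UPGMA tree contains `clade`."""
    rng = random.Random(seed)
    length = len(next(iter(aln.values())))
    target = frozenset(clade)
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]  # resample with replacement
        replicate = {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}
        if target in upgma_clades(replicate):
            hits += 1
    return 100.0 * hits / replicates


if __name__ == "__main__":
    # Toy alignment with hypothetical taxa A-D (not real CP sequences).
    toy = {
        "A": "ACGTACGTACGTACGTACGT",
        "B": "ACGTACGTACGTACGTACGT",
        "C": "ACGTACGTACGTTCGTACGA",
        "D": "TCGAACCTACGTTGGTACCA",
    }
    print(upgma_clades(toy))
    print(bootstrap_support(toy, {"A", "B", "C"}))
```

Because A and B are identical in this toy alignment, the {A, B} clade is recovered in every replicate. Support for {A, B, C} varies with how often the resampled columns place C closer to A and B than to D.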