Chloroplast genome assembly and annotation
We completed 15 new plastid genomes in this study (listed in Table 1), using 9 to 21 million raw reads per species (Fig. S1, Table S1). A total of 16 plastid genomes, including that of Belosynapsis ciliata, exhibit the typical quadripartite structure, with large single-copy (LSC) and small single-copy (SSC) regions separated by two inverted repeats (IRs) (Fig. 1). The plastid genomes of Murdannia edulis and Belosynapsis ciliata are over 170 kb in length, whereas that of Commelina communis is 160,116 bp (Table 1). In addition, Murdannia edulis and Belosynapsis ciliata have the lowest GC content (34.5 %), whereas Palisota barteri has the highest (36.2 %) (Table 1). The largest length difference was observed in the LSC region (about 8,801 bp, between Belosynapsis ciliata and Commelina communis), and the largest GC content difference was observed in the SSC region (about 3.4 %, between Dichorisandra thyrsiflora and Murdannia edulis) (Table 1).
Plastid genomes of Commelinoideae have 131 genes, of which 111 are unique and 20 are duplicated in the IR regions (Table 2); the exception is rpl22, which is not duplicated in tribe Tradescantieae. There are 77 protein-coding genes (CDS), 30 transfer RNA (tRNA) genes, and 4 ribosomal RNA (rRNA) genes in the examined Commelinoideae taxa (Table 2). Among these genes, three CDS (rps12, clpP, and ycf3) have two introns, while nine CDS (atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, rpoC1, and rps16) and six tRNA genes (trnK-UUU, trnG-UCC, trnL-UAA, trnV-UAC, trnI-GAU, and trnA-UGC) have one intron (Table 2). The rps12 gene is trans-spliced, with its 5’ exon in the LSC region and its 3’ exon and intron in the IR regions. Three pseudogenes (accD, rpoA, and ycf15) were identified in all Commelinoideae species, one of which (ycf15) is duplicated in the IR regions (Table 2). These three genes contain several internal stop codons caused by insertions and deletions and were therefore identified as pseudogenes. We also identified ndhB as a pseudogene in two species (Pollia japonica and Rhopalephora scaberrima) as a consequence of a point mutation.
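The gene tallies and GC percentages reported above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the `gc_content` helper and the toy sequence are assumptions introduced here, not part of the annotation pipeline used in the study, and the helper assumes that ambiguous bases are excluded from the denominator.

```python
def gc_content(seq: str) -> float:
    """Return GC content in percent, counting only unambiguous A/C/G/T bases.

    Assumption: ambiguous symbols (N, gaps, etc.) are ignored entirely.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / total


# Gene tallies exactly as reported in the text (Table 2).
unique_genes = {"CDS": 77, "tRNA": 30, "rRNA": 4}
duplicated_in_ir = 20

n_unique = sum(unique_genes.values())      # 77 + 30 + 4
n_total = n_unique + duplicated_in_ir      # unique genes plus IR duplicates

# Intron-containing genes as listed in the text.
two_intron_cds = ["rps12", "clpP", "ycf3"]
one_intron_cds = ["atpF", "ndhA", "ndhB", "petB", "petD",
                  "rpl2", "rpl16", "rpoC1", "rps16"]
one_intron_trna = ["trnK-UUU", "trnG-UCC", "trnL-UAA",
                   "trnV-UAC", "trnI-GAU", "trnA-UGC"]
n_intron_genes = len(two_intron_cds) + len(one_intron_cds) + len(one_intron_trna)

print(n_unique, n_total, n_intron_genes)

# Hypothetical toy sequence, only to show how the helper is called.
print(gc_content("ATGCGCATTA"))
```

In practice the helper would be applied to the full genome sequence and to the LSC, SSC, and IR partitions separately to reproduce the per-region values in Table 1.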
Comparative chloroplast genome structure and nucleotide diversity
The alignment of whole plastid genomes showed high similarity in coding regions and high variation in non-coding regions (Fig. 3). We found several structural variations among the plastid genomes of Commelinoideae species. Murdannia edulis and Streptolirion volubile each had one inversion, from rbcL to psaI intergenic spacer (approximately 3 kb) and from petN to trnE-UUC (approximately 2.8 kb), respectively. Amischotolype hispida and Belosynapsis ciliata had two large inversions, from trnV-UAC to rbcL (approximately 5 kb) and from psbJ to petD (approximately 16 kb). The IR-SSC boundary was similar among species of Commelinoideae (Fig. 4). All plastid genomes have an incompletely duplicated ycf1 gene at the IRB-SSC junction. We also found an expansion of the IR regions in Commelineae species, which resulted in duplication of the rpl22 gene (Fig. 4).
We analysed nucleotide divergence of CDS, tRNA, and rRNA genes to characterise variation among the 16 Commelinoideae plastid genomes (Fig. 2, Table S3). Nucleotide diversity (Pi) for individual CDS ranges from 0.00427 (psbL) to 0.09543 (ycf1), with an average of 0.03473. Nine CDS (rps3, ndhG, ndhD, ccsA, rps15, rpl32, ndhF, matK, and ycf1) have remarkably high values (Pi > 0.05) and seven CDS (psbL, rpl23, rps19, ndhB, rpl2, rps7, and rps12) have low values (Pi < 0.01; Fig. 2). Compared with Tradescantieae, Commelineae have relatively higher values in almost all CDS (Fig. 2). In Tradescantieae, however, the rpl22 gene has a higher value (Pi = 0.04655) than in Commelineae. In tRNA and rRNA genes, Pi values range from 0 (trnT-UGU, trnH-GUG, trnV-GAC, and trnI-GAU) to 0.02697 (trnQ-UUG), with an average of 0.006. Commelineae have their highest value in trnL-UAA (Pi = 0.02941), whereas Tradescantieae show no variation in this gene. To identify potentially phylogenetically informative genes for Commelinoideae, we examined individual CDS with high values (Pi > 0.045) and lengths over 500 bp. Ten CDS (ndhH, rpoC2, ndhA, rps3, ndhG, ndhD, ccsA, ndhF, matK, and ycf1) were each analysed separately with ML, and the positions of the 16 genera of Commelinoideae were compared with those in Fig. 5. In total, four CDS (ndhH, rpoC2, matK, and ycf1) yielded a similar topology within Commelinoideae, although relationships among the other monocot groups were unclear.
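For readers unfamiliar with Pi, the following minimal sketch shows one common way to compute it: the average number of pairwise nucleotide differences per site across an alignment. The text does not state which software or gap treatment was used, so the complete-deletion handling of gaps, the function names, and the toy data below are all assumptions made for illustration. The second function applies the screening thresholds described above (Pi > 0.045 and length over 500 bp) to hypothetical per-gene statistics.

```python
from itertools import combinations

# Assumption: columns containing any of these symbols are dropped entirely
# (complete deletion); other tools or settings may treat gaps differently.
GAP_CHARS = set("-?N")


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (Pi) for aligned sequences."""
    seqs = [s.upper() for s in seqs]
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # Keep only alignment columns with no gap or ambiguity in any sequence.
    columns = [col for col in zip(*seqs) if not GAP_CHARS.intersection(col)]
    if not columns:
        raise ValueError("no comparable sites after removing gapped columns")
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def candidate_genes(stats, min_pi=0.045, min_len=500):
    """Return gene names with Pi > min_pi and aligned length > min_len.

    `stats` maps gene name -> (Pi, aligned length in bp).
    """
    return sorted(name for name, (pi_value, length) in stats.items()
                  if pi_value > min_pi and length > min_len)


# Toy alignment (hypothetical): 4 pairwise differences over 3 pairs x 8 sites.
toy = ["ATGCATGC", "ATGCATGA", "ATGAATGC"]
print(nucleotide_diversity(toy))

# Hypothetical per-gene statistics, not values from Table S3.
toy_stats = {"geneA": (0.060, 1500), "geneB": (0.050, 300), "geneC": (0.010, 2000)}
print(candidate_genes(toy_stats))
```

Only `geneA` passes both thresholds in the toy data: `geneB` is variable but too short, and `geneC` is long but too conserved, which mirrors the reasoning behind requiring both criteria.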
Phylogenetic analysis
The alignment of 77 chloroplast protein-coding genes had 65,481 characters, of which 16,380 were parsimony informative. The MP analysis produced a single most-parsimonious tree (tree length = 72,586, CI = 0.488, RI = 0.626). The tree topologies from the MP, ML, and BI analyses were congruent with each other, with 100% bootstrap values (PBP, MBP) and Bayesian posterior probabilities (PP) of 1.00 at almost all nodes, except for Palisota, which was unresolved in the MP analysis (not shown) (Fig. 5). The results suggest that Palisota is sister to the group consisting of the rest of Commelinoideae (Fig. 5). In Tradescantieae, Streptoliriinae was positioned at the basal node. Dichorisandrinae was then divided into two clades ((Dichorisandra, Siderasis), (Cochliostema, Geogenanthus)) with relatively low support values in both the MP and ML analyses (PBP = 74, MBP = 84, PP = 1) (Fig. 5). The remaining four subtribes formed two clades ((Coleotrypinae, Cyanotinae), (Tradescantiinae, Thyrsantheminae)), each with high support values (PBP = 100, MBP = 100, PP = 1) (Fig. 5).
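As a reminder of what "parsimony informative" means, a site counts as informative when at least two different character states each occur in at least two sequences. The sketch below counts such sites in an alignment; the function name, the decision to ignore gaps and ambiguity codes, and the toy alignment are assumptions for illustration and do not reproduce the software used for the MP analysis. The final line simply expresses the reported counts as a proportion.

```python
from collections import Counter


def count_parsimony_informative(seqs):
    """Count alignment columns with >= 2 states that each occur >= 2 times.

    Assumption: only unambiguous A/C/G/T are treated as character states;
    gaps and ambiguity codes are ignored.
    """
    seqs = [s.upper() for s in seqs]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    informative = 0
    for col in zip(*seqs):
        counts = Counter(base for base in col if base in "ACGT")
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative += 1
    return informative


# Toy alignment (hypothetical): columns 2 and 4 are informative,
# column 1 is constant, and column 3 has a singleton state only.
toy = ["AAGT", "AAGT", "ACGA", "ACCA"]
print(count_parsimony_informative(toy))

# Proportion of informative characters, using the counts reported in the text.
proportion = 16380 / 65481
print(f"{proportion:.1%}")
```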