In this study, the complete chloroplast genomes of three species of subfamily Aroideae were assembled using Illumina sequencing technology followed by a comparative analysis, all methods were carried out in accordance with relevant guidelines and regulations. A good level of similarity was observed among three genomes in terms of genome structure, gene content and gene arrangements, however the chloroplast genome of C. gigantea showed differences with C. bicolor and X. sagittifolium in SSC/IR boundary, and C. bicolor and X. sagittifolium in terms of the expansion of IRs to merge part of ycf1 (Fig. 3). Similar structural variation was found in 14 species of Aroideae, including Pistia stratiotes, Xanthosoma helleborifolium, Zamioculcas zamiifolia and Zomicarpella amazonica. Notably, the complete ycf1 region was included in the IR of Carlephyton glaucophyllum and Typhonium blumei, and a significant correlation between CPG size and IR size was observed. These results indicated that most of the variations in chloroplast genome structure occur due to the contraction and expansion of IR region .
The comparison of the chloroplast genome sequences obtained from sequence divergence analysis showed us clear differences between species at the molecular level. The intron region showed the highest variable rate, followed by the SSC, LSC, protein-coding regions, and IR region with the having the smallest rate. Our results are consistent with the previous studies on the chloroplast genomes of many land plants [13, 19, 26]. The nucleotide diversity of noncoding regions was higher than that in coding regions, suggesting suitability of the noncoding regions in Aroideae for the molecular marker identification, this is consistent with previous research in angiosperm chloroplast genomes, Thirteen intergenic regions (specifically trnS-trnG) with highest-level of divergences (Pi > 0.17) could be developed as specific molecular markers for species identification . Similarly, psaC-ndhE, trnN-ndhF, ccsA-ndhD, rps15-ycf1, petD-rpoA, atpH-atpI, rpl32-trnL, rps19-rpl2, trnL-ccsA have been reported for the discrimination of potential molecular markers and DNA barcodes [13, 26, 28]. The six highly variable regions (atpH-atpI, psaC-ndhE, trnN-trnF, trnS-trnG, ndhG-ndhI, rps15-ycf1) contained at least three SSRs in C. gigantea, C. bicolor or X. sagittifolium (Table S2). Previously, highly variable regions have been compared for whole-genome sequences in Rosaceae and indicated as hotspots in positive correlation with the distribution of SSRs . These results would improve our understanding of cp genome of Aroideae by the repeats identification and nucleotide diversity analysis.
Analysis of the adaptive evolution of genes has an important reference value in examining the change of gene structure and functional mutations. The KA/KS ratio may reveal the constraints of natural selection on organisms, and the estimation of these mutations contribute greatly in understanding the dynamics of molecular evolution [25, 26, 29]. In the present study, there were seven genes (accD, ndhF, ndhK, rbcL, rpoC1, rpoC2, matK) under positive selection with significant selective sites. Among these, the accD gene encodes the β-carboxyl transferase subunit of acetyl-CoA carboxylase , which is an important regulatory enzyme for fatty acid synthesis. The accD has been reported as an essential gene required for leaf development , and as a contributor in leaf longevity . Considering the fact that Aroideae species commonly have large leaf area, the finding of the accD under positive selection might indicate that it is an essential factor for leaf development. Similarly, rpoC1 and rpoC2 encodes the RNA polymerase β, which might play an important role in the regulation of pollination and sex differentiation . The matK encodes an intron maturase (maturase K) which is involved in the cutting/splicing of Group II RNA transcriptional introns . Furthermore, three other genes (ndhF, ndhK, and rbcL) under positive selection showed photosynthesis linked roles, indicating their role in photosynthesis and carbon fixation in Ariodeae. These genes (accD, rbcL, ndhK) to have been reported to undergo positive selection in the Monsteroideae (Araceae) . Most of the species in Aroideae family are distributed in creeks, streamside, wetlands, and moist mountains. Therefore, chloroplast functional genes, involved in energy metabolism and plant development, might play key roles during the adaptation and development of the Aroideae species to their respective ecological niches.
Based on similar morphological characteristics and the size of nuclear genome, defining the phylogenetic relationships in Aroideae is an important and difficult goal to reach . Complete chloroplast genome sequence is a great molecular resource for exploring phylogenetic relationships compared to whole nuclear genome in Aroideae [1, 15]. Phylogenetic analysis using the chloroplast genome sequence has been applied to evaluate evolutionary relationships of species [13, 26, 34]. Phylogenetic tree constructed in this study based on complete chloroplast genome, CDS, LSC, SSC, IR, and intergenic regions, showed results in consistence with the traditional classification system [2, 3], indicating the rational of the classification of Aroideae. Furthermore, our phylogenetic analysis improves traditional classification by differentiating Colocasia and Xanthosoma with a remote molecular level link, even the shape and size of leaf and petiole of C. gigantea are very similar to X. sagittifolium (Fig. 1). Despite the markable differences of C. bicolor with X. sagittifolium, a closer relationship was observed in the phylogenetic tree. Moreover, presence of the S. colocasiifolia in the Colocasia’s clade, indicates the reliability of genetic information to better understand the phylogenetic relationships in Aroideae.
Accurate discrimination of germplasm is very important for its utility, breeding new cultivars and evolutionary relationships . Discrimination based on only morphological traits in Aroideae would not provide the complete picture of the family unless combined with the DNA markers. Previously, researchers focused on mutational and evolutionary dynamics in chloroplast genome of Aroideae [1, 22, 23], however, development and application of DNA barcodes have been rarely reported. DNA barcodes are defined as the DNA sequences with a high mutation rate to identify a species within a family . Plastid (chloroplast) genome have such hotspot regions to be used as DNA barcodes for identification purposes in closely related species [16, 36]. Here, three candidate DNA (highly variable regions) barcodes such as atpH-atpI, psaC-ndhE, trnS-trnG were detected (Fig. 8, Table 3), in order to validate the discrimination effect of these molecular markers, the combined DNA barcodes of atpH-atpI + psaC-ndhE + trnS-trnG were manually extracted from other 13 published chloroplast genomes of Ariodeae spesies , the phylogenetic tree contained 30 Aroideae species and Alisma plantago-aquatica was analysed (Fig. S3), and the relationships among these species in the phylogenetic tree were almost consistent with the previous taxonomic structure . As our results showed, most of the candidate DNA regions are in LSC region and these regions can discriminate Ariodeae species successfully when used in combination forms. Similar results were reported for chloroplast genomes of Oryza , Cucurbitaceae  and Rosaceae . Therefore, these variable regions could be employed as specific DNA barcodes for identification purposes and genetic diversity studies in subfamily Aroideae.