Comparative And Phylogenetic Analyses of Six Kenya Polystachya (Orchidaceae) Species Based On The Complete Chloroplast Genome Sequences

DOI: https://doi.org/10.21203/rs.3.rs-1070116/v1

Abstract

Background: Polystachya Hook. is a large pantropical orchid genus (c. 240 species) distributed in Africa, southern Asia and the Americas, with the centre of diversity in Africa. Chloroplast (cp) genomes of plants are highly conserved and can provide much more informative DNA sites and generate much better resolution for plant phylogenies. However, for Polystachya, the whole cp genome including its structure features are yet unknown and its phylogenetic placement of the genus within the Orchidaceae is still unclear.

Results: In this study, the complete cp genomes of six Polystachya species were assembled based on genome skimming. We subjected them to comparative genomic analyses and reconstructed their phylogenetic relationships. The results exhibited that the cp genomes had a typical quadripartite structure with conserved genome arrangement and moderate divergence. The cp genomes of the six Polystachya species ranged from 145,484 bp to 149,274 bp in length and had almost similar GC content of 36.9%-37.0%. Gene annotation revealed 113 unique genes. In additions, 19 genes are duplicated in the inverted regions, and 17 gene possessed intron. Comparative analysis of the overall sequence identity among six complete cp genomes confirmed that for both coding and non-coding regions in Polystachya, SC regions exhibit higher sequence variation than IRs. Furthermore, there were various amplifications in the IR region among the six Polystachya species. Most of the protein-coding genes of these species had a high degree of codon preference. We screened out specific SSR and found seven relatively highly variable loci. Moreover, 13 genes were discovered with significant positive selection. Phylogenetic analysis suggested that the six Polystachya species formed a monophyletic clade and had more closely related to tribe Vandeae. Phylogenetic relationships of the family Orchidaceae inferred from the 85 cp genome sequences were generally consistent with previous studies and robust.

Conclusions: Our study reported the complete cp genomes of the six Polystachya species, and provided detailed structural analysis and comparative analysis results, which can contribute to the development of DNA markers for use in the study of genetic variability and evolutionary studies in Polystachya. In addition, the present results further demonstrate the phylogenetic position of Polystachya.

Background

The genus Polystachya Hook (1824: 103; Orchidaceae) comprises approximately 240 species, of which most species are distributed in Africa, a few ones extend to tropical and subtropical Americas, and only three species in Asia [1-6]. It is difficult to give exact number of species of the genus for while some species are widespread, others are narrow endemics. Furthermore, the taxonomic nomenclature of the genus is also complex since there are about 500 names available associated with the currently recognized species [2, 7]. Representatives of the genus are usually epiphytic, occasionally lithophytic or terrestrial perennial herbs [8, 9] (http://powo.science.kew.org/taxon/325981-2;http://www.theplantlist.org/tpl1.1/search?q=Polystachya). Most species of Polystachya are compact plants that take up little space with long-lasting and scented flowers, worth cultivating [8]. The ploidy levels of the genus range from diploids (2n=2x=40) to hexaploid (2n=6x=120), where it is generally believed that polyploidy recently formed in the Neotropics, Madagascar and Reunion [10-12]. Previous studies indicated that there exists a close relationship between Polystachyinae and the tribe Vandeae. Based on morphological data analysis, Polystachyinae was placed within a larger Vandeae [13-15]. However, Carlsward et al. (2006b) maintained a stricter definition of Vandeae in their studies, citing several morphological characters of Vandeae, including monopodial habit, loss of mucilage and tilosomes, and the presence of spherical silica bodies in leaf sclerenchyma, to differentiate it from Polystachyinae [16] . A study by Li et al. (2019), using only one sample of Polystachya found that this genus belongs to Vandeae based on plastid genome sequence analysis. Nevertheless, the results of mitochondrial genome sequence analysis indicated that it may belong to Malaxideae, also supported by their shared morphological features such as distinctive pseudobulbs, terminal inflorescences, floral mentum and waxy pollinia [17].

The monophyly of the genus Polystachya has been reported in several studies, which is sharp contrast to the poor of monophyly observed in the taxonomic sections described within the genus [6, 18]. The genus was usually divided into 15 sections based on morphological characteristics, but these natural groupings were not fully supported in currently available molecular studies [6, 19, 20]. In these molecular studies, all the sections either are polyphyletic or paraphyletic, except sect. Isochiloides (Russell et al., 2010b). A well-resolved phylogenetic hypothesis could help clarify the infrageneric classification of the genus and might be used to redefine sections as a step towards a much-needed generic revision. The latest taxonomic work to attempt an account of the entire genus was A Monograph of the Subtribe Polystachyinae Schltr. (Orchidaceae) by edited Mytnik-Ejsmont (2011) [18]. The genus Polystachya species are reclassified into 13 sections in the book. The classification system divided Elasticae, Affines (only Polystachya affinis), Isochiloides, Aporoidea, Dendrobianthe (including Polystachya dendrobiiflora) and Polystachya longiscarpa (originally in Dendrobianthe) into separate genera. Kermesina and Polystachate were assigned to subsection level. At present, Polystachya dendrobiiflora is an acceptable name, while Dendrobianthe dendrobiiflora is treated as its synonym, thus Polystachya dendrobiiflora is still used in this study. Apart from sect. Polystachya which is pantropical, all sections are confined within Africa and Madagascar with two being endemic to Madagascar and the Indian Ocean islands. Although DNA sequence analysis has been informative in studies of the genus as a whole, relationships between members of this pantropical species group. However, the genus has a notoriously complicated taxonomy, with several sections that are widely used but probably not monophyletic [6, 12, 18, 19]. In addition, individuals from different geographical regions with duplicate samples may also not be monophyletic groups [21]. Although some work on the genus, rather little of their results support the currently recognized Mytnik-Ejsmont’s all sections, and even contradictory. Additionally, taxonomical and phylogenetic uncertainties remain in some sections or subsections or species because of poor internal resolutions, low bootstrap support, and inconsistent/incongruent plastid and nuclear gene phylogenies. It is generally believed that hybridization, polyploidization, ambiguous species definition, low sequence divergence level, reticulate evolution, incomplete lineage sorting, and plastid capture may complicate the issue for phylogenetic reconstruction of genus Polystachya [6, 12, 22]. Moreover, naming in Polystachya species is also complicated due to the influence of factors such as erroneous species identification, multiple synonyms, and highly variable morphological characters. Therefore, the development of more effective genetic resources based on increasing samples is necessary for further phylogenetic studies of Polystachya. In recent years, an increasing number of researchers have focused on the cp genome to develop genetic markers for phylogenies. The cp genome sequences have been successfully used to evaluate relationships of different taxonomic units and obtained better results in handling the phylogenetic relations of many difficult groups [17, 23-31]. Accordingly, it is considered to be an informative and valuable resource for phylogenetic analysis in plants at multiple taxonomic levels.

Chloroplast is an important semi-autonomous organelle in plant cells with a complete genetic system and its genetic information is called chloroplast genome. The cp genomes in general are inherited uniparentally, and maternally in most angiosperms species at a slower evolutionary rate of change compared to nuclear genomes [32]. The typical cp genomes in angiosperms have a highly conserved quadripartite circular structure comprised of a pair of inverted repeat regions (IRs, about 20-28 kb) and two single copy regions (large single copy region, LSC, 80-90 kb; small single copy region, SSC, 16-27 kb) [33]. These genomes usually encode 120-130 genes with sizes in the range of 120-160 kb and overall GC content is typically on the order of 30-40% [34-36]. Variation in genome size can be mainly attributed to IR expansion/contraction or even loss. The structure, length, GC content, gene type, gene content and order of cp genomes are generally conserved. When studying plant origin and phylogenetic relationships, the plant genome is made up of three parts: nuclear, mitochondrial and chloroplast genomes. Compared with nuclear and mitochondrial genomes, the chloroplast genomes are small, less prone to recombination, and have low rates of nucleotide substitutions, hence can provide distinct genetic information for phylogenetic studies [37, 38]. Furthermore, the phylogenetic reconstruction based on complete cp genome sequences may reduce errors and uncertainty resulting from insufficient sampling of DNA sequence [39].

With the rapid development of next-generation sequencing (NGS) technology, it is now more convenient and cheaper to obtain cp genome sequences, feasible to address various phylogenetic questions at the different taxonomic levels. Currently, over 6000 cp genomes of plants are available in the National Center for Biotechnology Information (NCBI) organelle genome database (https://www.ncbi.nlm.nih.gov), among which about 500 complete cp genome sequences (c. 370 species) of Orchidaceae have been released by NCBI (2021/5/20). Up to now, there is only one report on cp genome sequences of the genus Polystachya [21]. However, a reassembly of the original data from the report revealed that the cp genome sequences reported in that paper were both incomplete.

In this study, we newly sequenced complete cp genomes of six Polystachya species and conducted comparative genomic analyses. Another 79 published cp genome sequences (77 from Orchidaceae; Allium cepa from Allioideae and Iris gatesii from Iridoideae were chosen as outgroups) downloaded from NCBI database were used to construct phylogenetic trees. Our sampling Orchidaceae materials covered 5/5 subfamilies, 16/22 tribes and 20/49 subtribes of Orchidaceae. The objectives of this study are: (1) to perform genome annotation and comparative genomes analysis to understand the structure of plastomes within Polystachya and to provide genetic resources for future research in the genus, (2) to reconstruct a more comprehensive and better-resolved phylogenetic tree for exploring the phylogenetic position of Polystachya in Orchidaceae and (3) to screen potential DNA markers in cp genomes that can be used for phylogenetic analysis and classification of Polystachya.

Results

Chloroplast Genomes Structure and Features

We obtained the six complete chloroplast genomes of Polystachya species, these cp genomes size ranged from 145,484 bp (P. tenuissima) to 149,274 bp (P. dendrolliflora). Like most angiosperms, all newly sequenced Polystachya plastomes displayed a typical quadripartite structure consisting of a pair of inverted repeats IR regions (IRA and IRB; 25,049-25,716 bp) separated by one large single copy region (LSC; 82,104-83,848 bp) and one small single copy region (SSC; 11,894-14,822 bp) (Figure 1, Table 1). The six cp genomes were all AT-rich, overall GC content ranged from 36.9% to 37.0%, and the GC content of IR region (43.2-43.3%) was always higher than that of LSC and SSC regions (34.3-34.5% and 28.8-29.5%, respectively) (Table 1). This high GC percentage in the IR regions could be due to the presence of eight ribosomal RNA (rRNA) sequences in these regions. Previous studies have also shown similar results, with the high GC content in IR regions being related to the presence of all rRNA genes in this region [40]. By comparing all sequenced Polystachya cp genomes generated in this study, we found that they had highly conserved gene content, gene number, orientation and intron number.

A total of 105-109 unique genes were identified in the six cp genomes, including 68 protein-coding genes (PCGs), 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes (Table 1TableS2). The difference in gene number was due to the variation in number of in ψndh among the six species. The gene distribution in these six cp genomes was exactly the same: the LSC regions encoded 63 protein-coding genes and 22 tRNA genes, and the SSC regions contained 12 protein-coding genes and one tRNA gene. Moreover, 19 genes were duplicated in IR regions, including 6 PCGs (rpl2, rpl23, ycf2, rps19, rps7 and rps12), 8 tRNA genes (trnH-CUG, trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG and trnN-GUU), 4 rRNA genes (rrn16, rrn23, rrn4.5 and rrn5) and one pseudogene (ψndhB) (Table S2). The remaining non-genic regions including introns, intergenic spacers (IGS), and pseudogene (ψ). Sixteen genes possessed introns: 13 genes (7 PCGs: rps16, atpF, rpoC1, petB, petD, rpl16, rpl2; 6 tRNAs: trnK-UUU, trnG-GCC, trnL-UAA, trnV-UAC, trnI-GAU and trnA-UGC) have only one intron, while another three PCGs (rps12, ycf3 and clpP) contain two introns (Table S2 and S3). Among the 16 intron-containing genes, 13 were present in LSC region and three were duplicated in the IR regions. At the same time, we found that the exon length was almost the same in the above 16 intron-containing genes, but the length of introns changed in all these genes. Interestingly, among the 11 plastid genes encoding the subunits of the NAD(P)H dehydrogenase complex (ndh genes), some genes were lost and some were pseudogenized. The trnK-UUU gene contain the longest intron, matK gene is located within trnK-UUU intron. Rps12 was a special trans-splicing gene, whose first exon is located in the LSC region, while the second and third exons reside in IR regions.

Table 1. Chloroplast genome characteristics of six Polystachya species

Characteristics

P.dendrobiiflora

P.adansoniae

P.tenuissima

P.bennettiana

P. modesta 

P.concreta

Genbank numbers

OK930071

OK930072

OK930073

OK930074

OK930075

OK930076

Voucher

HGW-M229

HGW-M230

HGW-M231

HGW-M232

HGW-M233

HGW-M234

Total length (bp)

149274

147680

145484

145995

148853

146717

LSC (bp)

83848

83450

82104

82669

83475

82718

SSC (bp)

14822

14132

12192

11894

14622

13179

IRs (bp)

25302

25049

25594

25716

25378

25410

Total number of genes

125 (106)

128 (109)

124 (105)

124 (105)

125 (106)

124 (105)

PCGs

74 (68)

74 (68)

74 (68)

74 (68)

74 (68)

74 (68)

tRNA

38 (30)

38 (30)

38 (30)

38 (30)

38 (30)

38 (30)

rRNA

8 (4)

8 (4)

8 (4)

8 (4)

8 (4)

8 (4)

Pseudogenes

5 (4)

8 (7)

4 (3)

4 (3)

5 (4)

4 (3)

GC content (%)

37.0

36.9

36.9

37.0

36.9

37.0

LSC (%)

34.4

34.3

34.3

34.3

34.4

34.5

SSC (%)

29.5

29.5

28.8

29.1

29.4

29.3

IR (%)

43.3

43.3

43.2

43.2

43.2

43.2

LSC: large single copy region; SSC: small single copy region; IR: inverted repeat; tRNA: transfer RNA; rRNA: ribosomal RNA. GC: guanine-cytosine. The numbers in parenthesis represent unique genes in cp genom

Codon Usage Analyses

Codon usage frequency and relative synonymous codon usage (RSCU) were calculated based on protein-coding genes. All the 74 protein-coding genes were composed of 19,308-19,373 codons and encoded 20 amino acids in the chloroplast genomes of the six Polystachya species. (Figure 2, Table S4). Among these amino acids, leucine (Leu; 1941-1955, 10.05-10.11%) is the most frequently used, cysteine (Cys; 209-216, 1.08-1.12%) is the least universal amino acid in the cp genomes of these species. The RSCU value analysis showed that almost all amino acids are encoded by 2-6 synonymous codons, except methionine (Met) and tryptophan (Trp), this strategy could protect protein mutations in biosynthesis. Relative synonymous codon usage is 1 for methionine (Met) and tryptophan (Trp). About half of the codons have RSCU >1 (30/64), and most (29/30, 96.7%) end with the base A or T. Similarly, about half of the codons have RSCU < 1 (32/64), with majority (29/32, 90.6%) ending with the base C or G. In nearly all of the protein-coding genes in Polystachya species had the standard ATG/CAT start codon (RSCU = 1), but rpl2 started with ATA/TAT. ATA/TAT as an initiation codon has been reported in other cp genomes [41, 42]. All three stop codons were present, with TAA being the most frequently used among the six plastomes (Table S4).

Repeat Sequences Analysis

Simple sequence repeats (SSRs), also known as microsatellite repeats, are shorter tandem repeats consisting of 1-6 bp repeat units, which are widely distributed in various regions of chloroplast genome. In this study, SSRs were done using Phobos and SSR Hunter software to identify the potential repeats based on the cp genome sequences of six Polystachya species. A total of 58 (P. dendrolliflora)-73 (P. adansoniae) SSRs were detected in the six cp genomes, including 38-42 mononucleotides (mono-), 8-13 dinucleotides (di-), 4-9 trinucleotides (tri-), 3-7 tetranucleotides (tetra-), 0-7 pentanucleotide (penta-), and 0-2 hexanucleotide (hex) (Figures 6A and 7, Table S6). Among them, two hexonucleotides ATATTA/TAATAT distributed in IR regions only existed in P. concreta (Figure 3). Statistical analysis of the locations of all identified SSRs showed that the number of SSRs located in the LSC region, SSC region and IR regions were 44-52, 8-12 and 4-14, respectively. Furthermore, we found that these SSRs were mainly distributed in IGS (38-54), and some in the CDS region (9-12) and introns (8-12). A/T (no C/G) is the only mononucleotide SSRs type in the six species (Figures 6A and 7), and the repeat units of the other five SSRs were also mainly composed of A or T (Figures 6A and 7). The tandem (T), forward (F), reverse (R), palindromic (P) and complement (C) repeat sequences in the six Polystachya cp genomes were conducted by tandem repeat finder and REPuter. We identified 11 (P. dendrolliflora)-37 (P. modesta) long repeat elements (Figure 3, Table S5), which 9-29 are tandem repeats. The lengths of these long repeat sequences were variable with range 10-126 bp, with the longest repeat (126 bp) was presented in P. adansoniae. Most tandem repeats were located in the IGS region of LSC, while the tandem repeats of the coding region were mainly located in exon of ycf1, ycf2, accD and rpoA (Figure 3, Table S5).

IR Expansion and Contraction

The contraction and expansion of IR borders are common evolutionary events and are the major reason for size differences between chloroplast genomes [43, 44]. We conducted a comparative analysis to investigate the expansion/contraction of IR among the six species, and found that the cp genomes of Polystachya species are greatly conserved. However, some structural variations were present in the four boundaries (LSC/IRB, IRB/SSC, SSC/IRA, IRA/LSC) (Figure 4). Our results showed that the genes rpl22-rps19-psbA and trnN-rpl32-ycf1 were located in the junctions of the LSC/IR and SSC/IR regions. In all of the six plastomes, the rpl22 gene spans the LSC/IRB junction region and extends to the IRA region for 23-66 bp. Rps19 is duplicated in the IR regions, the distance between the rps19 gene located in the IRB region and the rpl22 gene is 138-149 bp, and the distance between the rps19 gene located in the IRA region and the IRA/LSC boundary is 167-212 bp. IRB/SSC region is situated in the intergenic regions between trnN-GUU and rpl32, the distance from trnN-GUU to IRB/SSC region is 328-681 bp. In addition, the distance of ycf1 gene varies greatly on the IRB/LSC border, with a length of 5-359 bp in the IR region and 5023-5509 bp in the SSC region. IRA/LSC region is situated in the intergenic regions between rps19 and psbA, the distance from the psbA to the IRB /SSC boundary ranged from 129-186 bp.

Structure Comparison and Divergence Hotspot Identification Analysis

Mauve comparison found a reversal of about 1 kb in the IRB/SSC boundary region of the cp genome of P. modesta, in which the rpl32 gene was contained, resulting in the reversal of the rpl32 gene (Figure 5). Sequence identity plots of the six Polystachya species were generated, with the annotation of P. dendrobiiflora chloroplast genome as a reference (Figure 6). LSC and SSC regions were more divergent than IRs regions. Whereas, the coding regions were more conserved than the non-coding regions, the highly divergent non-coding regions among the six cp genomes appeared in the intergenic regions (IGS), such as trnS-GCU-trnG-GCC, atpH-atpI, petA-psbJ, trnN-GUU-rpl32, rpl32-trnL-UAG and psaC-rps15. The trnI-GAU intron was also relatively divergent. On the other hand, all the rRNA genes were highly conserved and were similar to other plants’ cp genomes [31]. For further understanding of the DNA polymorphism (Pi), the nucleotide variability value of 113 coding genes and 112 IGS regions were calculated among these cp genomes of the six Polystachya species (Figure 7). The results are the same as previous reports: the IR regions more conserved than LSC and SSC regions and almost all divergent regions are presented in non-coding regions. The Pi values of the LSC and SSC regions were mostly greater than the largest Pi value (0.01149) of IR regions. There were seven variable regions with high Pi values (≥0.05), all which located in IGS regions, including rps19-psbA (0.06227), trnS-GCU-trnG-GCC (0.12038), trnG-GCC-trnR-UCU (0.06667), atpH-atpI (0.06479), psbT-psbN (0.08280), rpoA-rps11 (0.08235) and trnL-UAG-ccsA (0.05503). These hotspot regions could be developed as molecular markers and used for DNA barcoding for future phylogenetic analyses and species identification of Polystachya. The Pi values received from coding regions ranged from 0.00000 to 0.01708 (ycf1) and the average value is 0.00556. However, IGS regions showed remarkably higher Pi values, the largest value was 0.12038 (trnS-GCU-trnG-GCC), with an average of 0.02158, which was 3.88-fold higher than that in coding genes.

Positive Selection Analysis

The ratio of non-synonymous (dN) to synonymous substitutions (dS), dN/dS, has been widely used to evaluate the natural selection pressure and evolution rates of nucleotides in genes [45, 46]. The dN/dS ratio > 1 specifies positive selection (adaptive evolution), while dN/dS ratio < 1 signifies negative selection (purifying evolution). We compared the ratio of non-synonymous (dN) and synonymous (dS) substitution for 68 protein-coding genes among six Polystachya species. Likelihood ratio tests (M1a vs. M2a, M7 vs. M8) supported the presence of positively selected codon sites (p < 0.05) (Table 2). According to the M8 models, a total of thirteen genes are under significant positive selection in the BEB method, in which eight genes (atpH, clpP, psbA, rbcL, rpl14, rpoA, rps3 and ycf1) harbored one significant positive selection site, three genes (ccsA, matK and rpl16) possessed two significant positive selection sites, while the rpoC2 gene had 5 sites under positive selection. In addition, it was found that the ycf2 gene located in the IR region had the highest number of positive selection sites, including 13 significant positive selection and one extremely significant positive selection sites.

Table 2. Positive selected sites detected in the cp genome of the six Polystachya species.

M8

 

Region

Gene Name

Selected Sites

Pr (w>1)

Region

Gene Name

Selected Sites

Pr (w>1)

LSC

atpH

1774G

0.955*

LSC

rps3

13885I

0.953*

SSC

ccsA

17025G

0.966*

SSC/IRA

ycf1

18225L

0.957*

 


17026C

0.971*

IR

ycf2

15032C

0.952*

LSC

clpP

11668L

0.960*

 

 

15064A

0.951*

LSC

matK

441T

0.959*

 

 

15339E

0.954*

 


469Q

0.957*

 

 

15823S

0.957*

LSC

psbA

287R

0.961*

 

 

15839Q

0.956*

LSC

rbcL

9003I

0.958*

 

 

15948L

0.957*

LSC

rpl14

13618D

0.952*

 

 

16192L

0.951*

LSC

rpl16

13651F

0.950*

 

 

16216I

0.992**

 


13739S

0.951*

 

 

16224K

0.954*

LSC

rpoA

13093Y

0.960*

 

 

16356R

0.958*

LSC

rpoC2

2429L

0.953*

 

 

16395R

0.960*

 

 

2512M

0.957*

 

 

16402R

0.952*

 

 

2529E

0.957*

 

 

16424Y

0.953*

 

 

3003I

0.951*

 

 

16634I

0.950*

 

 

3376Y

0.959*

 


 

 

* p > 95%; ** p < 99%. 

Phylogenetic Analyses

To investigate the phylogenetic position of Polystachya in the Orchidaceae family and the relationship among the Polystachya species, 85 complete chloroplast genome sequences were used in this study. The phylogenetic analysis of the genus Polystachya within the Orchidaceae produced similar phylogenetic tree topology basing on both the Maximum Likelihood (ML) and Bayesian Inference (BI). Therefore, the ML topology was selected for discussion, and ML bootstrap (BS) and posterior probabilities (PP) values are given above branches (Figure 8). In all sampled species, Polystachya species formed a monophyletic clade with a strong support (Figure 8, BS, PP =100%, 1.00, respectively), where P. dendrolliflora is located at the base of this clade (Figure 8). Furthermore, the six Polystachya species and other all species of tribe Vandeae clustered together forming a monophyletic clade with strongly supported values (100%, 1.00). The phylogenetic relationships of the sampled Orchidaceae species are basically consistent with the results of Jin's (2019) study [17], except that the phylogenetic positions among the tribes of Vandeae, Epidendrea and Cymbidiinae are slightly different. The plastomes phylogeny of the 85 orchid species confirmed the relationship of (Apostasioideae (Vanilloideae (Cypripedioideae (Orchidoideae, Epidendroideae)))) at the subfamily level, and the phylogenetic relationships of the 16 tribes were also established.

Discussion

Sequence Variation

In this study, we collected six species of Polystachya and obtained their complete chloroplast genome sequences. As with most angiosperm, their cp genomes are generally inherited as haplotype, lack of recombination, within species cp genome structure was highly conserved [32, 47]. As expected, we found that the structure, gene content and gene order of the six Polystachya species were highly conserved, and ranged in size from 145,484 bp (P. tenuissima) to 149,274 bp (P. dendrolliflora), containing 79 PCGs, four rRNA genes, and 30 tRNA genes (Table 1, Figure 1). The length of the LSC, SSC, and IRs varied in the range of 84,046-89,021 bp, 16,914-18,821 bp, and 23,902-25,914 bp, respectively. Chloroplast genome size variation among different species, or even within different individuals of the same species, which has been reported in other species, such as Camptotheca acuminate [48, 49], Eucommia ulmoides [50], Rosa rugosa [51-53] and Calanthe davidii [54]. This, besides polyploidy of plant material, has been shown to be due to the expansion/contraction of IR [55]. The GC content in six Polystachya species was almost similar (36.9%-37.0%). Although Polystachya cp genomes were AT rich, the higher GC content in IRs region, is most likely due to the presence of rrn4.5, rrn5, rrn16, and rrn23 [56-58], which is consistent with the previously published Orchidaceae cp genomes [59, 60]. Non-coding region, especially introns may have accumulated mutations more rapidly than coding region, hence have an influence at gene expression level [61]. In this study, 16 genes possess introns among the six Polystachya species, and the lengths of their exons were almost the same, whereas the lengths of introns varied in all these genes. These changes in length may affect the size of the cp genome and gene expression. Moreover, trnK-UUU gene containing the longest intron, and rps12 are considered as a trans-splicing gene (Table S2), this identical phenomenon were consistent with those of most other members of Orchidaceae species [54, 62].

Inverted repeat (IR) expansion/contraction and sequence inversion represent important mechanisms of chloroplast genome rearrangement and show the diversity of cp genome structure in plants [63]. A detailed comparison on four IR/SC junctions of the six Polystachya plastomes showed that the border structures were highly similar with one another (Figure 4). Although the boundary regions of the cp genome sequences of the Polystachya species were relatively stable, we found expansion in IR regions, where rpl22 gene in the LSC region expanded by 23-66 bp to the IRB region and ycf1 gene in the SSC region expanded by 2-359 bp to the IRA region. Ycf1 gene was fully located in SSC region in P. adansoniae with 2 bp away from the SSC/IRA border. Jansen et al. (2011) reported that rpl22 had been transferred to the nucleus in Fagaceae [64], whether the rpl22 gene was also transferred into the nucleus in Polystachya species remains to be investigated. The pseudogenization of ycf1 gene is common in dicots, since the IR/SC boundary is within the ycf1 gene [26, 65]. An inversion of approximately 1kb, within the IRB/SSC boundary region, was observed in the cp genome of P. modesta. However, the region corresponding to the inversion region was located in the SSC region in the remaining five species. This inversion caused the rpl32 gene of P. modesta to reverse direction.

As a result of high of mutation, the potential SSR makers were detected for species authentication and reveal DNA sequence variation. Moreover, repeat sequences were also considered as one of the important reasons affecting gene duplication, expansion, and genome rearrangement [66-68]. A total of 58-73 SSRs and 11-37 long repeat sequences were identified, which were vastly distributed in the IGS region of LSC. The repeats in the coding regions were mainly located in exon of ycf1, ycf2, accD, ccsA, clpP, rpoA and rpoC2. Most of the SSRs types are mononucleotide repeats, and A/T (no C/G) is the only mononucleotide SSRs type in the six Polystachya species. A previous research suggested that polyA and polyT have a more stable framework compared to polyC and polyG [69]. The large variation of long repeats in close species may reflect a certain degree of evolutionary flexibility [70]. The mVISTA percent identity plot and sliding window analysis showed that the most divergent regions were located in the trnS-GCU-trnG-GCC, atpH-atpI, petA-psbJ, trnI-GAU intron, trnN-GUU-rpl32, rpl32-trnL-UAG and psaC-rps15 regions in the Polystachya plastomes. A comparative analysis of the six Polystachya species revealed that IR regions were mostly conserved and non-coding regions were more highly divergent than coding regions.

NDH complex coding genes lost or pseudogenized

The chloroplast NAD(P)H-dehydrogenase-like (NDH) complex is located in the thylakoid membrane and plays an important role in mediating photosystem I cyclic electron transport (PSI-CET) and facilitating chlororespiration. A chloroplast genome usually contain a total of 113 genes, comprising 6 atp, 11 ndh, 6 pet, 9 rpl, 4 rpo, 12 rps, 4 rrn, 5 psa, 15 psb, 30 trn and 11 ungrouped genes. However, we found that 11 cp-ndh genes were lost or pseudogenized in the six Polystachya cp genomes (Figure 9). Generally, ndh genes loss or pseudogenization is a common phenomenon in cp genomes of Orchidaceae, which is widespread in subfamilies Epidendroideae and Vanilloideae, and also occurs in subfamilies Cypripedioideae and Apostasioideae, but rare in the subfamily Orchidoideae. In addition, ndh genes are more frequently lost or pseudogenized in epiphytes than in terrestrial orchid plants [28, 71]. However, given that most orchid species in which ndh gene non-functionalization has occurred retain photosynthetic capacity, it is inferred that the function of this gene class may be affected by the nuclear or mitochondrial genomes. Sometimes, the gene function may be maintained by RNA editing after pseudogenization, but the possibility is low because most pseudogenization entails not only base changes but also indels. However, given that ndh genes were lost in all non-photosynthetic species, the loss of the ndh class gene is considered to be a precondition for the development of mycoheterotrophs. Furthermore, the ndh gene loss events also occur in other plants, such as some species of Apodanthaceae, Circaeasteraceae, Corydalis, Geraniaceae, Pinales [72-78], where have been reported to lose almost the entire set of cp-ndh genes [79]. The loss of plastome genes may be due to the transfer into the nucleus, substitution of a nuclear encoded mitochondrial targeted gene or substitution of a nuclear gene for a plastid gene. Translocation of ndh genes to the chondriome in Cymbidium has been reported, and the different levels of ndh gene degradation among even closely related species in Cymbidium may be due to multiple bidirectional intracellular gene transfers between two organellar genomes [80]. Similarly, Picea have also reported ndh translocations from the plastid to the nuclear genome during the course of evolution [73]. In Erycina pusilla, at least 10 truncated ndh gene fragments were transferred to the mitochondrial (mt) genome [81]. Using comparative genome analyses, Lin et al. found that nuclear NDH-related genes have also been lost in orchids without cp-ndh genes [81, 82].

Divergent hotspots and adaptive evolution

The divergent regions as molecular markers could provide abundant valuable information for DNA barcoding and phylogenetic studies, and numerous phylogenetic reconstructions researches using divergent hotspots [67, 83]. We found that the nucleotide sequence diversity was higher in the non-coding regions than the coding regions, which is generally consistent with most previous studies on the chloroplast genomes of angiosperms [84]. In our study, we compared interspecific chloroplast diversity in six Polystachya species, which indicated that IR regions were mostly conserved and non-coding regions in the cp genome had most the variation compared to protein-coding regions. This finding is generally consistent with most previous studies of the cp genomes of angiosperms [84]. Among the seven IGS regions (rps19-psbA, trnS-GCU-trnG-GCC, trnG-GCC-trnR-UCU, atpH-atpI, psbT-psbN, rpoA-rps11 and trnL-UAG-ccsA) have high nucleotide diversity values (Pi > 0.05) which were most divergent regions identified (Figure 7). No mutation existed in IRs regions. Based on the calculated Pi values for each gene region, we have conducted a comprehensive comparative analysis of all published chloroplast genomic data in Polystachya, and observed that IR region had a lower diversity than LSC and SSC regions, trnS-GCU-trnG-GCC with the highest variability values. These regions could potentially serve as universal candidate DNA barcodes for Polystachya species marker identification studies. It is noteworthy that the chosen of divergent regions and the value of Pi are related to species selected, different collections were used, and different molecular markers will be selected out consequently. Thus, more suitable and accurate barcodes need to be further explored according to different sample categories.

The dN/dS ratios have been widely used to infer the evolutionary dynamics and identify adaptive signatures among species [85]. Some studies indicates that the variation rates of the chloroplast genomes can be influenced under different environmental pressures, which may be the main reason for the differences in the number of genes in the cp genomes of some genera or species [86]. In our study, a total of 107 positively selected sites (including 32 significant and 1 extremely significant sites) were detected in the BEB method, and distributed in atpH, ccsA, clpP, matK, psbA, rbcL, rpl14, rpl16, rpoA, rpoC2, rps3, ycf1 and ycf2 genes (Table 3). Of these loci, ycf2 gene has the largest number of significant positively selected sites, and although it is the longest plastid gene reported in angiosperms and its function is yet largely unknown [87]. Despite this, ycf2 provides a low-cost alternative to comprehensive multigene or genome datasets for investigating angiosperm relationships [88, 89]. Positive selection has also been found in ycf2 in Bulbophyllum [90] and Malvaceae [91]. In addition, the genes ccsA, clpP, matK, rbcL, rpl14, rpl16, rpoA, rpoC2, ycf1 and ycf2 identified in this study are also the targets of positive selection in many other flowering plants [91-93].

Phylogenetic relationship

The chloroplast genomes have been widely employed as an important data to resolve lineages within species phylogenetic analysis. Especially, the whole cp genomes or protein coding regions have been considered more effective and accurate than the use of single gene sequences or several fragments [94, 95]. Consensus trees from BI and ML were almost similar, but the BI tree had higher resolution. Polystachya species formed a monophyly with strong support (BS, PP =100%, 1.00, respectively) based on the protein coding sequences of cp genomes. The monophyly of the Polystachya has been similarly inferred in previous studies [6]. P. tenuissima (sect. Cultriformes) was sister to P. bennettiana (sect. Caulescentes), whereas P. modesta (sect. Polystachya) and P. concreta (sect. Polystachya) were sister taxa. These two clades are more closely related to each other than to P. adansoniae (sect. Polychaete). P. dendrolliflora represents the earliest extant lineage to diverge from the rest of the genus. Mytnik-Ejsmont (2011) proposed to divide P. dendrolliflora into an independent genus Dendrobianthe. Chase (2015) argued that there were no convincing arguments to split up the monophyletic and easily recognized genus Polystachya into smaller genera. Thus, Dendrobianthe dendrobiiflora is still treated as a synonym of P. dendrobiiflora (in TPL, POWO, IPNI). P. modesta is morphologically the most similar to pantropical tetraploid (including P. concreta) accessions and could be one of the parent species [12]. Morphological homology among species may be caused by introgression, plastid capture, hybridization and other factors, which makes it difficult for species delimitations in the genus Polystachya. Furthermore, the six Polystachya species and other all species of tribe Vandeae clustered together forming a monophyletic clade with strongly supported values (BS, PP=100%, 1.00). The phylogenetic relationships of all the sampled Orchidaceae species above subtribes are basically consistent with the results of Jin's (2019) study based on cpCDS, except that the phylogenetic positions among the tribes of Vandeae, Epidendrea and Cymbidiinae are slightly different [17]. However, the results for Orchidaceae phylogenetic analysis of this study based on the mtCDS are not completely consistent with those based on the cpCDS. Due to only one Polystachya sampling in his study, the phylogenetic position of the genus Polystachya in Orchidaceae (Vandeae or Malaxideae?) remains to be further explored. We recommend the combination of our finding with either mitochondrial genome data, ITS or low-copy nuclear genes to explore the exact phylogenetic position of Polystachya in Orchidaceae. Although all phylogenetic relationships cannot be resolved by using only the complete cp genomes, our results suggest that plastome-wide analysis will provide higher a resolution for some disputed taxonomic relationships. In addition, the classification of the genus has not been well solved up to now due to species delimitation difficult, hybridization, introgression, low resolution, incomplete lineage sorting, chloroplast capture and reticulation evolution [6, 12]. Thus, it is necessary to select high-resolution sequences on the basis of wide sampling to reconstruct the phylogenetic tree of the genus Polystachya for further elucidate the phylogeny and evolution within the genus.

Conclusions

The complete chloroplast genomes of Polystachya adansoniae, P. bennettiana, P. dendrobiiflora, P. tenuissima, P. modesta and P. concreta of the genus Polystachya were reported in this study. The cp genomes length and gene content of the six Polystachya species were comparatively conserved. Although genomic structure and size were highly conserved, the IR/SC boundary regions were variable among these six cp genomes of Polystachya, and about 1kb sequence reversal occurred in the IRB/SSC region of the P. modesta. The different types of the codon preference, which had the vital roles in studying species’ evolution, were established. We also identified SSRs and TFPRC repeat sequences, 13 positive selection genes and 16 variable regions, which provide a reference for developing tools to further studies of Polystachya species. Furthermore, the phylogenetic constructions of the 85 cp genomes illustrated robust and consistent relationships with high support. Phylogenetic analysis indicated that the genus Polystachya was more closely related to tribe Vandeae. The cp genomes will contribute to the development of genetic resources in resolving the phylogenetic analysis and species authentication of Polystachya, and in facilitating the exploration their structural differences. Nevertheless, only limited species were implored in this study, and thus, we believe that further studies such as multiple molecular data (chloroplast genome, mitochondrial genome and transcriptome, etc.) analysis based on more extensive sampling of Polystachya species, are needed to clarify the molecular evolution and phylogenetic relationship of Polystachya

Materials And Methods

Samplings and DNA Extraction

Plant materials of the six Polystachya species: Polystachya dendrobiiflora Rchb.f. (voucher number: HGW-M229), Polystachya adansoniae Rchb.f. (voucher number: HGW-M230), Polystachya tenuissima Kraenzl. (voucher number: HGW-M231), Polystachya bennettiana Rchb.f. (voucher Number: HGW-M232), Polystachya modesta Rchb.f. (voucher number: HGW-M233) and Polystachya concreta (Jacq.) Garay & H.R.Sweet (voucher number: HGW-M234) were collected from Kenya in the joint field investigations performed by the National Museums of Kenya (NMK) and Sino-Africa Joint Research Center, CAS (SAJOREC) during 2015 to 2018. Young and fresh leaves were sampled and immediately preserved using silica gel [96]. The voucher specimens were deposited at the East African Herbarium (EA) in National Museums of Kenya and the Herbarium of Wuhan Botanical Garden, CAS (HIB). Total genomic DNA was extracted using a modified cetyltrimethylammonium bromide (CTAB) method [97]. DNA integrity was examined by electrophoresis in 1% (w/v) agarose gel, their purity was determined using a NanoDrop spectrophotometer 2000 (Thermo Scientific; Waltham, MA, USA) at 260 and 280 nm, and precisely quantify DNA concentration with Qubit 2.0 (Life Technologies, CA, USA).

High Throughput Sequencing, Genome Assembly and Annotation

The purified high-quality genomic DNA was used to construct paired-end (PE) libraries by shearing the genomic DNA into short fragments of approximately 350 bp before sequencing in 150 bp paired-end mode was implemented on an Illumina HiSeq 2500 platform (Illumina, Inc., San Diego, CA, USA) at Novogene Company (Tianjin, China). Genomes assembly were performed using GetOrganelle v1.7.1 [98] and CLC Genomic Workbench v10 (CLC Bio., Aarhus, Denmark) with the default parameters. The quality of the newly assembled genomes was evaluated on read level basis by aligning the trimmed raw reads to the de novo assemblies using Geneious mapper, Geneious v8.0.2 with medium- to low-sensitivity option and iteration up to five times [99, 100]. The resulting complete chloroplast genomes were automatically annotated using the Perl script Plastid Genome Annotator (PGA) and the Annotation of Organellar Genomes (GeSeq) (https://chlorobox.mpimp-golm.mpg.de/geseq.html) [101, 102]. Further annotation confirmation was performed with published genomes the cp genomes of the tribe Vandeae, Cymbidieae and Epidendreae sampled species in this study were used as the reference sequences. According to the annotation results of the two softwares, manual corrections of start/stop codons and intron/exon boundaries were performed in Geneious v8.0.2. All transfer RNA (tRNA) genes were proofread with the web server tRNAscan-SE v2.0 (http://lowelab.ucsc.edu/tRNAscan-SE/) [103]. A gene was classified as a pseudogene if its reading frame was truncated (incl. due to a premature stop codon) or frameshifted compared with Orchidaceae species [104]. Physical maps of the circular plastomes were visualized using the Organelle Genome DRAW (OGDRAW) (http://ogdraw.mpimp-golm.mpg.de/) [105]. All annotated complete cp genome sequences were deposited into GenBank under the accession numbers listed in Table 1 and Table S1 .

Repeat Sequences Analysis

Simple sequence repeat (SSR) markers were identified in these plastome sequences using Phobos v3.3.12 [106] and SSR Hunter v1.3 [107] with minimum repeat thresholds of 10 for mononucleotide (mono-) repeats, 5 for dinucleotide (di-) repeats, 4 for trinucleotide (tri-) repeats, and 3 for tetranucleotide (tetra-), pentanucleotide (penta-), and hexanucleotide (hexa-) repeats. The size and location of larger repeat sequences including forward (F), palindromic (P), and reverse (R) repeats were searched using online program REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer) [108] with the set as: Hamming distance of 3 and minimum repeat size of 30 bp. Tandem Repeats Finder v4.07 (https://tandem.bu.edu/trf/trf.html) [109] was employed to discover tandem repeats (T) using default settings.

Comparison and Divergence Hotspot Identification Analysis

To investigate the structure difference exists, the expansion/contraction of the IR regions was assessed by comparing the positions of SC/IR junctions and their adjacent genes using IRscope [110], complete chloroplast genomes were aligned with MAFFT v7. The differentiation regions for the cp genomes of the ten newly sequenced species were performed and plotted by the mVISTA online program (http://genome.lbl.gov/vista/mvista/submit.shtml) [111] in Shuffle-LAGAN model, and with Polystachya dendrobiiflora as the reference. Genome rearrangement, inversions and gene synteny was detected using MAUVE [112] with default settings. All protein-coding genes of each genome sequence were extracted in Geneious v8.0.2 to examine the relative synonymous codon usage (RSCU) using MEGA v7.0 [113]. The codon usage frequency and relative synonymous codon usage (RSCU) of the six species were conducted based on 78 PCGs using MEGA v7.0. The nucleotide diversity (Pi) of each plastome was implemented to evaluate by DnaSP v6 [114].

Positive Selection Analysis

The positive selection sites were evaluated in 68 protein-coding sequences (CDS) of the chloroplast genomes of six Polystachya species using the CodeML program in the PAML package implemented in EasyCodeML software [115]. The software was used for calculating the ratios of nonsynonymous (dN) and synonymous substitution (dS) (ω = dN/dS). In preset mode, we used site model based on the default four pairs of site-specific models (M0 vs M3, M1a vs. M2a, M7 vs. M8, and M8a vs. M8) to elucidate the signatures of adaptation across cp genomes with likelihood ratio test (LRT). The p value of Chi square (χ2) less than 0.05 was considered to be significant. In addition, the EasyCodeML required inputs for analysing selection are the aligned sequences in PAML format and a tree file in Newick format (without branch length and numerical value). Here, each single-copy CDS sequences were aligned using codons in the MAFFT v7 program [116] and then concatenated using the program Concatenate Sequence. The analyses of selective pressures were conducted along the ML tree in Newick format, which based on the whole CDS region was used to determine the phylogenetic relationships of these species. Comparing the four pairs of site-specific models, M7 vs. M8 was calculated to identify positive selection sites based on both ω and LRTs values. The Bayes empirical Bayes (BEB) method was used to statistically identify sites under positive selection with posterior probabilities ≥ 0.95 [90].

Phylogenetic Analysis

Phylogenetic relationship analysis was done based on the 79 CDS of 85 species chloroplast genomes, of which six Polystachya species were newly sequenced in this study and other 77 cp genome sequences of Orchidaceae species downloaded from NCBI database for exploring the phylogenetic position of Polystachya in Orchidaceae. These sampling Orchidaceae materials represented 5/5 subfamilies, 16/22 tribes and 20/49 subtribes of Orchidaceae. According to previous studies [17, 28], Allium cepa (MK335926, Allioideae) and Iris gatesii (KM014691, Iridoideae) were chosen as outgroups. GenBanK accession numbers and detailed information of all samples used in this study are listed in Table S1. Multiple sequence alignment was performed using MAFFT v7 plugin integrated into PhyloSuite v1.1.15 [117]. Phylogenetic analyses were performed using two different algorithms: Maximum Likelihood (ML) and Bayesian Inference (BI). ML phylogenies were conducted using RAxML v8.2.12 [118] with 1,000 bootstrap replicates and the GTRGAMMA model. BI phylogenies were inferred using MrBayes 3.2.6 [119] with the best-fitting substitution model by ModelFinder [120]. The Markov Chain Monte Carlo (MCMC) algorithm was run for 2,000,000 generations, sampling every 1000 generations, in which the initial 25% of sampled data are discarded as burn-in. Samples were combined and convergence of chains was checked in Tracer v1.7.2 [121]. Figtree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/) was used to visualize and annotate trees.

Abbreviations

NGS: Next-generation sequencing; cp: Chloroplast; LSC: Large single copy; SSC: Small single copy; SC: Single copy; IR: Inverted repeat; CDS: Coding sequence; RSCU: Relative synonymous codon usage; SSR: Simple sequence repeats; Pi: DNA polymorphism; ML: Maximum likelihood; BI: Bayesian Inference.

Declarations

Acknowledgments: We are grateful to the National Museums of Kenya (NMK) for the acquisition of all legal permits required and allowing us to use the facilities of the EA Herbarium, the National Commission of Science, Technology and Innovation (NACOSTI) for granting research permit, the Kenya Forest Service (KFS) and Kenya Wildlife Service (KWS) for granting fieldwork permits, the KFS and KWS rangers for providing security during the field investigations. We also wish to thank National Wild Plant Germplasm Resource Center for all kinds of support. We acknowledged Peninah Cheptoo Rono, Fredrick Munyao Mutie and Vincent Okelo Wanga for giving comments on the manuscript paper. We thank the editor and the constructive comments of the four anonymous reviewers who helped us to improve the manuscript.

Author Contributions: HGW, ZZX and GM collected and identified the plant materials; WQF, HGW and ZCF designed and supervised the research; JH performed the experiment, analyzed the data and drafted the manuscript; HGW, ZCF, TJ, YJX, DX and ZZX repeatedly proof-read the manuscript. All authors have read and approved the final manuscript.

Funding: This work was supported by the International Partnership Program of Chinese Academy of Sciences (151853KYSB20190027), the National Natural Science Foundation of China (31970211) and Sino-Africa Joint Research Center, CAS (SAJC202101).

Availability of data and materials

All the newly sequenced sequences in this study are available from the National Center for Biotechnology Information (NCBI) (accession numbers: OK930071-OK930076; see Table 1 and Additional Table 1: Table S1). Information for other samples used for phylogenetic analysis download from Genbank can be found in Additional Table 1: Table S1.

Ethics approval and consent to participate

The relevant permits for this research were granted by National Commission of Science, Technology and Innovation (NACOSTI) of Kenya (NACOSTI/P/19/20003/28091), Kenya Wildlife Service (KWS) and Kenya Forest Service (KFS). No materials from animal or human were used in this research.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Author details

1 CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China. 2 Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China. 3 University of Chinese Academy of Sciences, Beijing, 100049, China. 4 East African Herbarium, National Museums of Kenya, P.O. Box 45166, 00100 Nairobi, Kenya.

References

1. Dressler RL. Phylogeny and classification of the orchid family. Cambridge: Cambridge University Press; 1993.

2. Peraza-Flores LN, Fernández-Concha GC, Romero-González GA. Taxonomic notes in American Polystachya (Orchidaceae): the identity of P. foliosa (Hook.) Rchb.f. and the reestablishment of P. caracasana Rchb.f. J Torrey Bot Soc. 2011;138(4):366-380.

3. Mytnik-Ejsmont J, Szlachetko DL, Baranow P, Górniak M. A phylogenetic and morphological study of Polystachya sect. Superpositae (Orchidaceae) with description of a new species from Cameroon. Plant Syst Evol. 2013;300(1):19-28.

4. Mytnik-Ejsmont J, Baranow P. Taxonomic study of Polystachya Hook. (Orchidaceae) from Asia. Plant Syst Evol. 2010;290(1-4):57-63.

5. Mytnik-Ejsmont J, Szlachetko DL, Górniak M. Chelystachya, a new genus of the subtribe Polystachyinae (Orchidaceae). Biodiv Res Conserv. 2011;23(1):15-27.

6. Russell A, Samuel R, Rupp B, Barfuss MHJ, Šafran M, Besendorfer V, et al. Phylogenetics and cytology of a pantropical orchid genus Polystachya (Polystachyinae, Vandeae, Orchidaceae): Evidence from plastid DNA sequence data. Taxon. 2010a;59(2):389-404.

7. Govaerts, R., Campacci, M. A., Baptista, D. H., Cribb, P. J., George, A., Kreuz, K., and Wood, J. World checklist of Orchidaceae. Royal Botanic Gardens. 2021.

8. La Croix I. African orchids in the wild and in cultivation. Portland: Timber Press; 1997.

9. Mytnik-Ejsmont J, Kevin DL, Magdalena N, Dorota Ł, Joanna K, Dariusz SL. Labellum and gynostemium micromorphology in Polystachya (Orchidaceae). Plant Syst Evol. 2020;307(1).

10. Šafran M. Ploidity level and heterochromatin distribution in species from genus Polystachya (Orhidaceae). University of Zagreb. Faculty of Science. Department of Biology; 2012.

11. Rupp B, Samuel R, Russell A, Temsch EM, Chase MW, Leitch IJ. Genome size in Polystachya (Orchidaceae) and its relationships to epidermal characters. Bot J Linn Soc. 2010;163(2):223-233.

12. Russell A, Samuel R, Klejna V, Barfuss MHJ, Rupp B, Chase MW. Reticulate evolution in diploid and tetraploid species of Polystachya (Orchidaceae) as shown by plastid DNA sequences and low-copy nuclear genes. Ann Bot. 2010b;106(1):37-56.

13. Freudenstein JV, Rasmussen FN. What does morphology tell us about orchid relationships?—A cladistic analysis. Am J Bot. 1999;86(2):225-248.

14. Carlsward BS, Whitten WM, Williams NH, Bytebier B. Molecular phylogenetics of Vandeae (Orchidaceae) and the evolution of leaflessness. Am J Bot. 2006a;93(5):770-786.

15. van den Berg C, Goldman DH, Freudenstein JV, Pridgeon AM, Cameron KM, Chase MW. An overview of the phylogenetic relationships within Epidendroideae inferred from multiple DNA regions and recircumscription of Epidendreae and Arethuseae (Orchidaceae). Am J Bot. 2005;92(4):613-624.

16. Carlsward BS, Stern W, Bytebier B. Comparative vegetative anatomy and systematics of the angraecoids (Vandeae, Orchidaceae) with an emphasis on the leafless habit. Bot J Linn Soc. 2006b;151(2):165-218.

17. Li YX, Li ZH, Schuiteman A, Chase MW, Li JW, Huang WC, et al. Phylogenomics of Orchidaceae based on plastid and mitochondrial genomes. Mol Phylogenet Evol. 2019;139:106540.

18. Mytnik-Ejsmont J. A monograph of the subtribe Polystachyinae Schltr. (Orchidaceae): Fundacja Rozwoju Uniwersytetu Gdanskiego; 2011.

19. Russell A, Samuel R, Bogarin D, Fernando S, Wijesundera S, Klejna V, et al. Genetic variation and phylogenetic relationships of a pantropical species group in Polystachya (Orchidaceae). Bot J Linn Soc. 2011;165(3):235-250.

20. Cribb PJ. Studies in the genus Polystachya (Orchidaceae) in Africa. Kew Bull. 1978;32(4):743-766.

21. de Abreu NL, Alves RJV, Cardoso SRS, Bertrand YJK, Sousa F, Hall CF, et al. The use of chloroplast genome sequences to solve phylogenetic incongruences in Polystachya Hook (Orchidaceae Juss). PeerJ. 2018;6:e4916.

22. Rieseberg LH, Soltis DE. Phylogenetic consequences of cytoplasmic gene flow in plants. Evol Trends Plants. 1991;5(1):65-84.

23. Moore MJ, Bell CD, Soltis PS, Soltis DE. Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. P Natl A Sci. 2007;104(49):19363-19368.

24. Jansen RK, Cai ZQ, Raubeson LA, Daniell H, dePamphilis CW, Leebens-Mack J, et al. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. P Natl A Sci. 2007;104:19369-19374.

25. Barrett CF, Davis JI, Leebens-Mack J, Conran JG, Stevenson DW. Plastid genomes and deep relationships among the commelinid monocot angiosperms. Cladistics. 2013;29(1):65-87.

26. Liu DK, Tu XD, Zhao Z, Zeng MY, Zhang S, Ma L, et al. Plastid phylogenomic data yield new and robust insights into the phylogeny of Cleisostoma-Gastrochilus clades (Orchidaceae, Aeridinae). Mol Phylogenet Evol. 2020;145:106729.

27. Kim YK, Jo S, Cheon SH, Kwak M, Kim YD, Kim KJ. Plastome evolution and phylogeny of subtribe Aeridinae (Vandeae, Orchidaceae). Mol Phylogenet Evol. 2020;144:106721.

28. Kim YK, Jo S, Cheon SH, Joo MJ, Hong JR, Kwak M, et al. Plastome evolution and phylogeny of Orchidaceae, with 24 new sequences. Front Plant Sci. 2020;11:22.

29. Xiang XG, Jin WT, Li DZ, Schuiteman A, Huang WC, Li JW, et al. Phylogenetics of tribe Collabieae (Orchidaceae, Epidendroideae) based on four chloroplast genes with morphological appraisal. PLoS One. 2014;9(1):e87625.

30. Luo J, Hou BW, Niu ZT, Liu W, Xue QY, Ding XY. Comparative chloroplast genomes of photosynthetic orchids: insights into evolution of the Orchidaceae and development of molecular markers for phylogenetic applications. PLoS One. 2014;9(6):e99016.

31. Zhai W, Duan XS, Zhang R, Guo C, Li L, Xu GX, et al. Chloroplast genomic data provide new and robust insights into the phylogeny and evolution of the Ranunculaceae. Mol Phylogenet Evol. 2019;135:12-21.

32. Birky CW. Uniparental inheritance of mitochondrial and chloroplast genes: mechanisms and evolution. Proc Natl Acad Sci USA. 1995;92(25):11331.

33. Jansen RK, Raubeson LA, Boore JL, dePamphilis CW, Chumley TW, Haberle RC, et al. Methods for obtaining and analyzing whole chloroplast genome sequences. Methods Enzymol. 2005;395:348-384.

34. Ruhlman TA, Jansen RK. The plastid genomes of flowering plants. In: Chloroplast Biotechnology: Methods and Protocols. Edited by Maliga P. Totowa, NJ: Humana Press; 2014: 3-38.

35. Daniell H, Lin CS, Yu M, Chang WJ. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016;17(1):134.

36. Palmer JD. Comparative organization of chloroplast genomes. Annu Rev Genet. 1985;19(1):325-354.

37. Zheng XM, Wang JR, Feng L, Liu S, Pang HB, Qi L, et al. Inferring the evolutionary mechanism of the chloroplast genome size by comparing whole-chloroplast genome sequences in seed plants. Sci Rep. 2017;7(1):1555.

38. Ravi V, Khurana JP, Tyagi AK, Khurana P. An update on chloroplast genomes. Plant Syst Evol. 2007;271(1-2):101-122.

39. Parks M, Cronn R, Liston A. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biol. 2009;7:84.

40. Asaf S, Khan AL, Khan AR, Waqas M, Kang SM, Khan MA, et al. Complete chloroplast genome of Nicotiana otophora and its comparison with related species. Front Plant Sci. 2016;7:843.

41. An WL, Li J, Yang ZR, Huang YY, Huang S, Zheng XS. Characteristics analysis of the complete Wurfbainia villosa chloroplast genome. Physiol Mol Biol Plants. 2020;26(4):747-758.

42. Wu ML, Li Q, Hu ZG, Li XW, Chen SL. The complete Amomum kravanh chloroplast genome sequence and phylogenetic analysis of the commelinids. Molecules. 2017;22(11):1875.

43. Zong D, Zhou AP, Zhang Y, Zou XL, Li D, Duan A, et al. Characterization of the complete chloroplast genomes of five Populus species from the western Sichuan plateau, southwest China: comparative and phylogenetic analyses. PeerJ. 2019;7:e6386-e6386.

44. Wang RJ, Cheng CL, Chang CC, Wu CL, Su TM, Chaw SM. Dynamics and evolution of the inverted repeat-large single copy junctions in the chloroplast genomes of monocots. BMC Evol Biol. 2008;8:36.

45. Li YT, Zhang J, Li LF, Gao LJ, Xu JT, Yang MS. Structural and comparative analysis of the complete chloroplast genome of Pyrus hopeiensis-“wild plants with a tiny population”-and three other Pyrus species. Int J Mol Sci. 2018;19(10):3262.

46. Yang ZH, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000;17(1):32-43.

47. Wicke S, Schneeweiss GM. Next-generation organellar genomics: potentials and pitfalls of high-throughput technologies for molecular evolutionary studies and plant systematics. 2015;158.

48. Yang ZY, Ji YH. Comparative and phylogenetic analyses of the complete chloroplast genomes of three Arcto-Tertiary Relicts: Camptotheca acuminata, Davidia involucrata, and Nyssa sinensis. Front Plant Sci. 2017;8:1536.

49. Wang WW, Liu H, He Q, Yang WL, Chen ZY, Wang MC, et al. Characterization of the complete chloroplast genome of Camptotheca acuminata. Conserv Genet Resour. 2016;9(2):241-243.

50. Wang WC, Chen SY, Zhang XZ. Whole-Genome Comparison Reveals Heterogeneous Divergence and Mutation Hotspots in Chloroplast Genome of Eucommia ulmoides Oliver. Int J Mol Sci. 2018;19(4):1037.

51. Yin XM, Liao BS, Guo S, Liang CL, Pei J, Xu J, et al. The chloroplasts genomic analyses of Rosa laevigata, R. rugosa and R. canina. Chin Med. 2020;15:18.

52. Kim YS, Heo KI, Nam S, Xi H, Lee S, Park J. The complete chloroplast genome of candidate new species from Rosa rugosa in Korea (Rosaceae). Mitochondrial DNA B. 2019;4(2):2433-2435.

53. Jiang H, He J, Meng J. Characterization of the complete plastid genome of a chinese endangered species Rosa rugosa Thunb. Mitochondrial DNA B. 2019;4(1):1679-1680.

54. Chen YQ, Zhong H, Zhu YT, Huang YZ, Wu SS, Liu ZJ, et al. Plastome structure and adaptive evolution of Calanthe s.l. species. PeerJ. 2020;8:e10051.

55. Dodsworth S, Leitch AR, Leitch IJ. Genome size diversity in angiosperms and its influence on gene space. Curr Opin Genet Dev. 2015;35:73-78.

56. Curci PL, De Paola D, Danzi D, Vendramin GG, Sonnante G. Complete chloroplast genome of the multifunctional crop globe artichoke and comparison with other Asteraceae. PLoS One. 2015;10(3):e0120589.

57. Wang ML, Wang X, Sun JH, Wang YH, Ge Y, Dong WP, et al. Phylogenomic and evolutionary dynamics of inverted repeats across Angelica plastomes. BMC plant biology. 2021;21(1):26.

58. Yang XM, Zhou TT, Su XY, Wang GB, Zhang XH, Guo QR, et al. Structural characterization and comparative analysis of the chloroplast genome of Ginkgo biloba and other gymnosperms. J For Res. 2020;32(2):765-778.

59. Dong WL, Wang RN, Zhang NY, Fan WB, Fang MF, Li ZH. Molecular evolution of chloroplast genomes of orchid species: Insights into phylogenetic relationship and adaptive evolution. Int J Mol Sci. 2018;19(3):716.

60. Vu HT, Tran N, Nguyen TD, Vu QL, Bui MH, Le MT, et al. Complete chloroplast genome of Paphiopedilum delenatii and phylogenetic relationships among Orchidaceae. Plants. 2020;9(1):61.

61. Xu JW, Feng DJ, Song GS, Wei XL, Chen L, Wu XL, et al. The first intron of rice EPSP synthase enhances expression of foreign gene. Sci China Ser C. 2003;46(6):561-569.

62. Jiang K, Miao LY, Wang ZW, Ni ZY, Hu C, Zeng XH, et al. Chloroplast genome analysis of two medicinal Coelogyne spp. (Orchidaceae) shed light on the genetic Information, comparative genomics, and species identification. Plants. 2020;9(10).

63. Wicke S, Schneeweiss GM, dePamphilis CW, Muller KF, Quandt D. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol Biol. 2011;76(3-5):273-297.

64. Jansen RK, Saski C, Lee SB, Hansen AK, Daniell H. Complete plastid genome sequences of three Rosids (Castanea, Prunus, Theobroma): evidence for at least two independent transfers of rpl22 to the nucleus. Mol Biol Evol. 2011;28(1):835-847.

65. Zavala-Páez M, Vieira LDN, de Baura VA, Balsanelli E, de Souza EM, Cevallos MC, et al. Comparative plastid genomics of neotropical Bulbophyllum (Orchidaceae; Epidendroideae). Front Plant Sci. 2020;11:799.

66. Echt CS, DeVerno LL, Anzidei M, Vendramin GG. Chloroplast microsatellites reveal population genetic diversity in red pine, Pinus resinosa Ait. Mol Ecol Resour. 1998;7:307-316.

67. Nie XJ, Lv SZ, Zhang YX, Du XH, Wang L, Biradar SS, et al. Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora). PLoS One. 2012;7(5):e36869.

68. Nishikawa T, Vaughan DA, Kadowaki KI. Phylogenetic analysis of Oryza species, based on simple sequence repeats and their flanking nucleotide sequences from the mitochondrial and chloroplast genomes. Theor Appl Genet. 2005;110(4):696-705.

69. Gragg H, Harfe BD, Jinks-Robertson S. Base composition of mononucleotide runs affects DNA polymerase slippage and removal of frameshift intermediates by mismatch repair in Saccharomyces cerevisiae. Mol Cell Biol. 2002;22(24):8756-8762.

70. King DG, Soller M, Kashi Y. Evolutionary tuning knobs. Endeavour. 1997;21(1):36-40.

71. Kim HT, Kim JS, Moore MJ, Neubig KM, Williams NH, Whitten WM, et al. Seven new complete plastome sequences reveal rampant independent loss of the ndh Gene family across orchids and associated instability of the inverted repeat/small single-copy region boundaries. PLoS One. 2015;10(11):e0142215.

72. Xu X, Wang D. Comparative chloroplast genomics of Corydalis species (Papaveraceae): Evolutionary perspectives on their unusual large scale rearrangements. Front Plant Sci. 2020;11:600354.

73. Ranade SS, García-Gil MR, Rosselló JA. Non-functional plastid ndh gene fragments are present in the nuclear genome of Norway spruce (Picea abies L. Karsch): insights from in silico analysis of nuclear and organellar genomes. Mol Genet Genomics. 2016;291(2):935-941.

74. Bellot S, Renner SS. The plastomes of two species in the endoparasite genus Pilostyles (Apodanthaceae) each retain just five or six possibly functional genes. Genome Biol Evol. 2015;8(1):189-201.

75. Wakasugi T, Tsudzuki J, Ito S, Nakashima K, Tsudzuki T, Sugiura M. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc Nati Acad Sci USA. 1994;91(21):9794-9798.

76. Sun YX, Moore MJ, Lin N, Adelalu KF, Meng AP, Jian SG, et al. Complete plastome sequencing of both living species of Circaeasteraceae (Ranunculales) reveals unusual rearrangements and the loss of the ndh gene family. BMC Genomics. 2017;18(1):592.

77. Ni ZX, Ye YJ, Bai TD, Xu M, Xu LA. Complete chloroplast genome of Pinus massoniana (Pinaceae): gene rearrangements, loss of ndh genes, and short inverted repeats contraction, expansion. Molecules. 2017;22(9):1528.

78. Chris Blazier J, Guisinger MM, Jansen RK. Recent loss of plastid-encoded ndh genes within Erodium (Geraniaceae). Plant Mol Biol. 2011;76(3-5):263-272.

79. Braukmann TW, Kuzmina M, Stefanović S. Loss of all plastid ndh genes in Gnetales and conifers: extent and evolutionary significance for the seed plant phylogeny. Curr Genet. 2009;55(3):323-337.

80. Kim HT, Chase MW. Independent degradation in genes of the plastid ndh gene family in species of the orchid genus Cymbidium (Orchidaceae; Epidendroideae). PLoS One. 2017;12(11):e0187318.

81. Lin CS, Chen JJ, Huang YT, Chan MT, Daniell H, Chang WJ, et al. The location and translocation of ndh genes of chloroplast origin in the Orchidaceae family. Sci Rep. 2015;5:9040.

82. Lin CS, Chen JJW, Chiu CC, Hsiao HCW, Yang CJ, Jin XH, et al. Concomitant loss of NDH complex-related genes within chloroplast and nuclear genomes in some orchids. Plant J. 2017;90(5):994-1006.

83. Menezes APA, Resende-Moreira LC, Buzatti RSO, Nazareno AG, Carlsen M, Lobo FP, et al. Chloroplast genomes of Byrsonima species (Malpighiaceae): comparative analysis and screening of high divergence sequences. Sci Rep. 2018;8(1):2210.

84. Clegg MT, Gaut BS, Learn GH, Morton BR. Rates and patterns of chloroplast DNA evolution. Proc Nati Acad Sci USA. 1994;91(15):6795-6801.

85. Yang ZH, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol. 2000;15(12):496-503.

86. Zuo LH, Shang AQ, Zhang S, Yu XY, Ren YC, Yang MS, et al. The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis. PLoS One. 2017;12(2):e0171264.

87. Drescher A, Ruf S, Calsa Jr T, Carrer H, Bock R. The two largest chloroplast genome-encoded open reading frames of higher plants are essential genes. Plant J. 2000;22(2):97-104.

88. Huang JL, Sun GL, Zhang DM. Molecular evolution and phylogeny of the angiosperm ycf2 gene. J Syst Evol. 2010;48(4):240-248.

89. Zhong QW, Yang SP, Sun XM, Wang LH, Li Y. The complete chloroplast genome of the Jerusalem artichoke (Helianthus tuberosus L.) and an adaptive evolutionary analysis of the ycf2 gene. PeerJ. 2019;7:e7596.

90. Tang HQ, Tang L, Shao SC, Peng YL, Li L, Luo Y. Chloroplast genomic diversity in Bulbophyllum section Macrocaulia (Bl.) Aver. (Orchidaceae, Epidendroideae, Malaxideae): Insights into species divergence and adaptive evolution. Plant Diversity. 2021.

91. Wang JH, Moore MJ, Wang HX, Zhu ZX, Wang HF. Plastome evolution and phylogenetic relationships among Malvaceae subfamilies. Gene. 2021;765:145103.

92. Carbonell-Caballero J, Alonso R, Ibañez V, Terol J, Talon M, Dopazo J. A phylogenetic analysis of 34 chloroplast genomes elucidates the relationships between wild and domestic species within the genus Citrus. Mol Biol Evol. 2015;32(8):2015-2035.

93. Hu SL, Sablok G, Wang B, Qu D, Barbaro E, Viola R, et al. Plastome organization and evolution of chloroplast genes in Cardamine species adapted to contrasting habitats. BMC Genomics. 2015;16:306.

94. Firetti F, Zuntini AR, Gaiarsa JW, Oliveira RS, Lohmann LG, Van Sluys MA. Complete chloroplast genome sequences contribute to plant species delimitation: A case study of the Anemopaegma species complex. Am J Bot. 2017;104(10):1493-1509.

95. Yu XQ, Drew BT, Yang JB, Gao LM, Li DZ. Comparative chloroplast genomes of eleven Schima (Theaceae) species: Insights into DNA barcoding and phylogeny. PLoS One. 2017;12(6):e0178026.

96. Chase MW, Hills HH. Silica gel: an ideal material for field preservation of leaf samples for DNA studies. Taxon. 1991;40(2):215-220.

97. Doyle JJ, Doyle JL. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull. 1987;19(1):11-15.

98. Jin JJ, Yu WB, Yang JB, Song Y, dePamphilis CW, Yi TS, et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020;21(1):241.

99. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647-1649.

100. Ripma LA, Simpson MG, Hasenstab-Lehman K. Geneious! Simplified genome skimming methods for phylogenetic systematic studies: a case study in Oreocarya (Boraginaceae). Appl Plant Sci. 2014;2(12):1400062.

101. Qu XJ, Moore MJ, Li DZ, Yi TS. PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods. 2019;15(1):50.

102. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, et al. GeSeq-versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45(W1):W6-W11.

103. Lowe TM, Chan PP. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 2016;44(W1):W54-57.

104. Wicke S, Naumann J. Molecular evolution of plastid genomes in parasitic flowering plants. Adv Bot Res. 2018;85:315-347.

105. Lohse M, Drechsel O, Kahlau S, Bock R. OrganellarGenomeDRAW-a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res. 2013;41(Web Server issue):W575-W581.

106. Leese F, Mayer C, Held C. Isolation of microsatellites from unknown genomes using known genomes as enrichment templates. Limnol Oceanogr Meth. 2008;6(9):412-426.

107. Li Q, Wan JM. SSRHunter: development of a local searching software for SSR sites. Hereditas. 2005;25(5):808-810.

108. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29(22):4633-4642.

109. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573-580.

110. Amiryousefi A, Hyvönen J, Poczai P. IRscope: An online program to visualize the junction sites of chloroplast genomes. Bioinformatics. 2018;34(17):3030–3031.

111. Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 2004;32(Web Server issue):W273-W279.

112. Darling ACE, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14(7):1394-1403.

113. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870-1874.

114. Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, Ramos-Onsins SE, et al. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol Biol Evol. 2017;34(12):3299-3302.

115. Gao FL, Chen CJ, Arab DA, Du ZG, He YH, Ho SYW. EasyCodeML: a visual tool for analysis of selection using CodeML. Ecol Evol. 2019;9(7):3891-3898.

116. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772-780.

117. Zhang D, Gao F, Jakovlic I, Zhou H, Zhang J, Li WX, et al. PhyloSuite: an integrated and scalable desktop platform for streamlined molecular sequence data management and evolutionary phylogenetics studies. Mol Ecol Resour. 2019.

118. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312-1313.

119. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61(3):539-542.

120. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14(6):587-589.

121. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst Biol. 2018;67(5):901-904.