Chloroplast phylogenomic analyses resolve multiple origins of the Kengyilia species via independent polyploidization events

Background Kengyilia is a group of allohexaploid species that arose from two hybridization events followed by genome doubling of three ancestral diploid species with different genomes (St, Y, and P) in the wheat tribe. Resolving the phylogenetic relationships of the maternal lineages has been difficult, owing to the extremely low rate of sequence divergence. Here, phylogenetic reconstructions based on plastome sequences were used to explore the role of maternal progenitors in the establishment of Kengyilia polyploid species. Results The plastome sequences of 11 Kengyilia species were analyzed together with 11 tetraploid species (PP, StP, and StY) and 33 diploid taxa representing 20 basic genomes in the Triticeae. Phylogenomic analysis and genetic divergence patterns suggested that (1) Kengyilia is closely related to Roegneria, Pseudoroegneria, Agropyron, Lophopyrum, Thinopyrum, and Dasypyrum; (2) both the StY genome Roegneria tetraploids and the PP genome Agropyron tetraploids served as maternal donors during the speciation of Kengyilia species; (3) different Kengyilia species derived their StY genome from different Roegneria species.

sequence data. Moreover, analysis of CoxII suggested that some species of Kengyilia (e.g., K. batalinii), Agropyron, and Pseudoroegneria formed a paraphyletic grade with zero-length branches [16]. While these studies added to our understanding of the phylogenetic relationships of Kengyilia, the molecular phylogenies based on published chloroplast DNA (trnL-F, matK, rbcL, trnH-psbA) and mitochondrial sequences have not settled the maternal lineages of Kengyilia species, owing either to unresolved gene trees with polytomies or to incongruence among cytoplasmic gene data [14][15][16]. Moreover, the processes that have driven polyploid diversification and speciation remain unclear, especially with regard to which tetraploid and diploid species were involved as maternal progenitors in hexaploid evolution in Kengyilia. Thus, to better understand the maternal contribution to the species of Kengyilia, a thorough genome-wide comparative study of the chloroplast genomes of Kengyilia and its relatives, covering nearly all of the genomic combinations in Triticeae, is essential.
Here, integrating 38 newly sequenced and 18 previously sequenced plastomes representing the StYP genome and its related tetraploid and diploid genomic types in Triticeae, this study applies phylogenetic reconstruction methods in combination with estimates of genetic distance among coding regions to clarify maternal lineage relationships. Our objectives are to establish a phylogenomic framework for identifying the maternal donors of Kengyilia polyploids and to explore the role of maternal progenitors in the establishment of Kengyilia polyploids.

Characteristics of chloroplast genomes and genes
All sequenced genomes are very similar to published cp genomes of Triticeae [17,18] and are highly conserved in genome structure and gene content. Genome size ranges from 134,985 bp in R. ciliaris to 135,489 bp in A. cristatum (ZY09064). All plastomes exhibit a typical quadripartite structure, with a pair of inverted repeats (IRs) separated by a large single copy region (LSC) and a small single copy region (SSC), and contain a total of 109 genes (76 protein-coding genes, 29 tRNA genes, and 4 rRNA genes). Assemblies in the genus Kengyilia averaged 135,113 bp, with an estimated 0.064% insertion data (compared to the Pseudoroegneria libanotica reference); assemblies in the genus Roegneria averaged just under 135,079 bp (0.039% estimated insertion data, compared to the Pse. libanotica reference).
Co-linearity was analyzed for two diploid taxa representing the St and P genomes, one tetraploid species with the StP genome, one tetraploid species with the StY genome, and three hexaploid species with the StYP genome (Fig. 1). Despite the high degree of co-linearity among these genomes, which reflects the conserved structure and gene content of the chloroplast genome, five large indels (at positions 17819-18278 bp, 56172-56963 bp, 62664-63130 bp, 83590-84338 bp, and 130804-131592 bp) were detected between the St- and P-containing lineages, indicating high genetic divergence between them.
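To make the scale of these indels concrete, the following minimal sketch computes the length of each reported interval. It assumes the coordinates are 1-based and inclusive, which the text does not state; the names `indel_spans` and `span_length` are illustrative and not taken from the study.

```python
# Indel intervals as reported in the text (start, end), in bp.
# Assumption: coordinates are 1-based and inclusive.
indel_spans = [
    (17819, 18278),
    (56172, 56963),
    (62664, 63130),
    (83590, 84338),
    (130804, 131592),
]


def span_length(start, end):
    """Length of an inclusive interval in base pairs."""
    return end - start + 1


lengths = [span_length(start, end) for start, end in indel_spans]
for (start, end), length in zip(indel_spans, lengths):
    print(f"{start}-{end}: {length} bp")
print(f"total: {sum(lengths)} bp")
```

Under that assumption, each interval spans roughly 460 to 800 bp.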
The features of each of the 76 protein-coding genes in the diploid-polyploid plastome data are summarized in Table S1. Gene lengths ranged from 90 bp (petN) to 4,440 bp (rpoC2). The proportion of variable sites (variable sites/total sites, V/T) varied from 0 (e.g., petG) to 3.36% (rpl32).
The ratio of parsimony-informative characters per total aligned characters was greatest for petL (2.08%) and lowest for petG, psbF, and rpl23 (0).
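The two per-gene ratios above can be illustrated with a short sketch that counts variable and parsimony-informative columns in an alignment. This is not the authors' pipeline: the function name `site_stats`, the toy alignment, and the choice to ignore gaps and ambiguous bases are all assumptions made for illustration. A column is treated as parsimony-informative when at least two states each occur in at least two sequences.

```python
from collections import Counter


def site_stats(alignment):
    """Return (total, variable, parsimony_informative) column counts.

    Assumption: only A, C, G, T are scored; gaps and ambiguity codes
    are ignored when classifying a column.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    valid = set("ACGT")
    total = len(alignment[0])
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(b.upper() for b in column if b.upper() in valid)
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each seen in >= 2 sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return total, variable, informative


# Hypothetical toy alignment (not real data).
toy = [
    "ATGCCGTA",
    "ATGCCGTA",
    "ATGTCGAA",
    "ATGTCGTA",
]
total, variable, informative = site_stats(toy)
print(f"V/T = {variable / total:.2%}, PI/T = {informative / total:.2%}")
```

In the toy alignment, column 4 (C, C, T, T) is both variable and parsimony-informative, while column 7 (T, T, A, T) is variable only, which is why the two ratios can differ for the same gene.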

Phylogenetic analyses
Bayesian phylogenetic reconstruction of the plastome data under the GTR + G + I model resulted in a tree with high posterior probability support across most clades. ML analyses in IQ-Tree under the TVM + F + R3 model recovered the same topology as the Bayesian analyses. The tree illustrated in Fig. 2 is the BI tree, with statistical support values (UFboot, SH-aLRT, and PP) shown above the branches. The phylogenetic tree showed that the plastome sequences of Kengyilia were split into two major clades (Clade I and II) with consistent statistical support (100% UFboot and SH-aLRT; 1.0 PP). The Clade I included

Statistics of the K2-P distance matrix
A distance matrix including 1,664 genetic values was generated to investigate the relationship between the plastomes of Kengyilia and those of its close relatives (Table S2). The Hopkins statistic was found to be 0.2057, indicating that the data are highly clusterable.
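For readers unfamiliar with the K2-P (Kimura two-parameter) distance named in this section's heading, the sketch below computes it for one pair of aligned sequences using the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The software and settings the authors used are not given in this excerpt, so the function name `k2p_distance` and the decision to skip gaps and ambiguous bases are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Assumption: sites with a gap or ambiguity code in either sequence
    are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log / math.sqrt raise ValueError if the sequences are too
    # divergent for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical example: one transition (A<->G) in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 5))
```

Applying such a function to every pair of sequences gives a symmetric matrix of pairwise distances of the kind summarized in Table S2.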

Discussion
The cpDNA-based (trnL-F, matK, rbcL, and trnH-psbA) phylogeny of the genus Kengyilia, especially with regard to the origin of the maternal donor during hexaploid polyploidization events, has remained largely unresolved because of many polytomies and incongruence among published gene trees [14,15]. Ma et al. [19] pointed out that, despite missing samples, phylogenetic analysis of plastome sequences can offer the greatest phylogenetic resolution. In this study, a resolved tree with high statistical support was inferred from the plastome sequences of Kengyilia and its relatives in Triticeae, allowing the relationships among the maternal lineages of Kengyilia to be clarified.
In the phylogenomic tree, ten species of Kengyilia (K. alatavica, K. hirsuta, K. laxiflora, K. batalinii, K. kokonorica, K. thoroldiana, K. grandiglumis, K. mutica, K. stenachyra, and K. rigidula), Roegneria, and Pseudoroegneria formed one group with consistent support, indicating that Pseudoroegneria is likely to be the maternal donor of these ten StYP genome Kengyilia species and of the sampled StY genome Roegneria species. Since Kengyilia species arose from two hybridization events followed by genome doubling of three ancestral diploid species (with the St, Y, and P genomes), the first generating the StY genome Roegneria and the second forming the StYP genome Kengyilia [11,13], Roegneria served as the maternal donor during the speciation of the ten Kengyilia species.
Analysis of trnL-F suggested that four species of Kengyilia (K. kokonorica, K. melanthera, K. mutica, and K. thoroldiana) were closely related to species of Agropyron [14]. A similar deep-level relationship among maternal lineages was also presented by Luo et al. [15], although additional molecular characters (including matK, rbcL, and trnH-psbA) and more taxa were sampled from Kengyilia. In this study, only K. melanthera grouped with the species of Agropyron, and the remaining three species (K. kokonorica, K. mutica, and K. thoroldiana) were placed in the clade containing the St-containing species. Moreover, the plastome sequences of K. melanthera and Agropyron are clearly distinct from those of the St-containing species. Thus, the molecular phylogenies based on published cpDNA fragments and the present plastome sequence data give apparently contradictory placements for K. kokonorica, K. mutica, and K. thoroldiana.
Discordance among phylogenetic trees can result from methodological artifacts (e.g., sampling error and/or insufficient molecular characters) and from the complex dynamics of evolutionary processes (e.g., hybridization and/or ancestral polymorphisms) [6,20]. Sampling error is a likely cause of the current incongruence, because our sampling for comparison with Kengyilia species included nearly all of the monogenomic genera accepted in genome-based classifications of the Triticeae, whereas most monogenomic genera were not covered in previous studies [14,15]. It is well known that molecular characters can affect the accuracy of phylogenetic estimates [19], so the incongruence could also result from a lack of molecular characters. The small number of molecular characters in individual cpDNA regions, as indicated by our estimates of the variable features of each chloroplast protein-coding gene (Table S1), together with the slow evolutionary rate of the chloroplast genome, would provide little variable information for accurate phylogenetic reconstruction and would result in polytomies in the phylogenetic tree. By contrast, the plastome data offer enough molecular characters for accurate phylogenetic estimates with a well-supported topology.
Both hybridization and ancestral polymorphisms, acting alone or in concert, can generate discordance and are therefore the principal processes invoked to explain phylogenetic incongruence in Triticeae species [6,21]. Analysis of the genetic distance matrix based on the 52 protein-coding genes suggested that Lophopyrum and Thinopyrum are closely related to the St-containing species. In the phylogenomic tree inferred from complete chloroplast genomes, Lophopyrum, Thinopyrum, Dasypyrum, and two species of Pseudoroegneria (Pse. stipifolia and Pse. cognata) form a monophyletic group. These results indicate that Lophopyrum, Thinopyrum, Dasypyrum, and Pseudoroegneria (most likely Pse. stipifolia and Pse. cognata) share ancestral polymorphisms due to incomplete diversification from a common maternal ancestor. Such ancestral polymorphisms could be transmitted to some polyploid species (e.g., StP, StY, StYP) via hybridization between Pseudoroegneria as the female parent and donors of the Y and/or P genomes.
Hybridization is also a likely candidate to explain the conflict, because different polyploid species with the same genome constitution could derive from different parental donors via independent hybridization events, generating a diverse array of polyploid genotypes in Triticeae [5,22]. The present plastome data also support the independent origin of some polyploid species, as shown by the grouping of different Kengyilia species with different Roegneria species in the phylogenetic tree. For example, in Clade I of the phylogenomic tree, three Kengyilia species (K. hirsuta, K. laxiflora, and K. batalinii) clustered with R. grandis with strong statistical support (100% UFboot, 100% SH-aLRT, and 1.0 PP), and five Kengyilia species (K. thoroldiana, K. grandiglumis, K. mutica, K. stenachyra, and K. rigidula) grouped with R. longearistata (100% UFboot, 100% SH-aLRT, and 1.0 PP). Analysis of genetic distances based on 52 protein-coding sequences gave similar results. The sympatric distribution of R. grandis, R. longearistata, and Agropyron species has provided the physical proximity needed for hybridization events. It is thus suggested that different Kengyilia species derived their StY genome from different Roegneria species. Our data also indicate that Agropyron species served as the maternal donor during the speciation of K. melanthera, providing additional support for the independent origin of different Kengyilia species. However, it seems unlikely that the maternal Agropyron lineage in K. melanthera resulted from hybridization between high-ploidy Roegneria species with StY genomes (serving as the paternal donor) and diploid P genome Agropyron species. One possible explanation is that the P genome of K. melanthera originated from a tetraploid Agropyron lineage acting as the female parent. Given the present data, multiple origins of polyploid species result in maternal haplotype polymorphism and could explain the rich diversity and wide adaptation of polyploid species in the genus Kengyilia [11].

Conclusions
The present analysis of phylogenetic relationships in Kengyilia based on plastome sequences revealed that both Roegneria and Agropyron tetraploid species served as maternal donors during the speciation of Kengyilia species, and that different Roegneria species contributed their StY genome to different Kengyilia species. This indicates independent origins of different Kengyilia species and sheds new light on the maternal lineages, polyploidization events, and speciation process of Kengyilia.

Methods
The complete chloroplast sequences were aligned with MAFFT v. 7 [24] using the default settings. All alignments were visually inspected in MEGA 6.0 [25] and manually adjusted where needed. We also conducted a co-linearity analysis using the software LASTZ, and the results were visualized using AliTV.

Phylogenetic analysis
Because complete chloroplast genome sequences offer the greatest phylogenetic resolution [19], phylogenomic trees were generated from all sampled complete chloroplast genomes. Phylogenomic analyses were conducted using maximum likelihood (ML) and Bayesian inference (BI). ML analysis was performed using the IQ-Tree software (http://www.cibiv.at/software/iqtree/).

Declarations
are listed as endangered species on the red list. We declare that we did not collect the three endangered species in the field; we used only chloroplast genome sequences from the public NCBI website, where no permission is needed for download.

Consent for publication
Not applicable.

Supplementary Information
Table S1. Features of each of the 76 protein-coding genes in the analysis of polyploids and their diploid relatives.
