Characteristics of chloroplast genomes and genes
All sequenced genomes are very similar to published cp genomes of Triticeae [17, 18] and are conserved in genome structure and gene content. Genome size ranges from 134,985 bp in R. ciliaris to 135,489 bp in A. cristatum (ZY09064). All plastomes exhibited a typical quadripartite structure comprising a pair of inverted repeats (IRs) separated by a large single copy region (LSC) and a small single copy region (SSC), and contained a total of 109 genes (76 protein-coding genes, 29 tRNA genes, and 4 rRNA genes). Assemblies in the genus Kengyilia averaged 135,113 bp, with an estimated 0.064% insertion data (compared to the Pseudoroegneria libanotica reference); assemblies in the genus Roegneria averaged just under 135,079 bp (0.039% estimated insertion data, compared to the Pse. libanotica reference).
Co-linearity was analyzed for two diploid taxa representing the St and P genomes, one tetraploid species with the StP genome, one tetraploid species with the StY genome, and three hexaploid species with the StYP genome (Fig. 1). Despite a high degree of co-linearity among these genomes, reflecting the conserved chloroplast genome structure and gene content, five large indels (at positions 17819–18278 bp, 56172–56963 bp, 62664–63130 bp, 83590–84338 bp, and 130804–131592 bp, respectively) were detected between the St- and P-containing lineages, which is indicative of high genetic divergence between them.
The features of each of the 76 protein-coding genes in the diploid-polyploid plastome data are summarized in Table S1. Gene lengths ranged from 90 bp (petN) to 4,440 bp (rpoC2). The proportion of variable sites (variable sites/total sites, V/T) varied from 0 (e.g. petG) to 3.36% (rpl32). The ratio of parsimony-informative characters to total aligned characters was greatest for petL (2.08%) and lowest for petG, psbF, and rpl23 (0).
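The per-gene statistics above (V/T and the proportion of parsimony-informative characters) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `site_stats` and the toy alignment are hypothetical, and the sketch assumes that gaps and ambiguous bases are simply ignored within each column.

```python
from collections import Counter


def site_stats(alignment):
    """Return (total, variable, parsimony_informative) site counts.

    `alignment` is a list of aligned sequences of equal length.
    Assumption: only A, C, G, T are counted; gaps and ambiguity
    codes are skipped column by column.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    total = len(alignment[0])
    variable = 0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        # Variable site: at least two different bases are observed.
        if len(counts) >= 2:
            variable += 1
        # Parsimony-informative site: at least two different bases,
        # each present in at least two sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return total, variable, informative


# Toy alignment (hypothetical data, not taken from Table S1).
toy = ["ACGTA", "ACGTA", "ATGCA", "ATGAA"]
total, variable, informative = site_stats(toy)
print(f"V/T = {variable / total:.2%}, PI/T = {informative / total:.2%}")
```

Dividing the second and third counts by the first gives the two proportions reported per gene; a gene such as petG, with no variable sites, would yield 0 for both.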
Phylogenetic analyses
Bayesian phylogenetic reconstruction of the plastome data under the GTR + G + I model resulted in a tree with high posterior probability support across most clades. ML analyses in IQ-Tree under the TVM + F + R3 model recovered the same topology as the Bayesian analyses. The tree illustrated in Fig. 2 is the BI tree with statistical support values (UFboot, SH-aLRT, and PP) above the branches. The phylogenetic tree showed that the plastome sequences of Kengyilia were split into two major clades (Clade I and Clade II) with consistent statistical support (100% UFboot and SH-aLRT; 1.0 PP). Clade I included Thinopyrum (Eb), Lophopyrum (Ee), Dasypyrum (V), Pseudoroegneria (St), and all the sampled St-containing polyploid species (Douglasdeweya, StP; Roegneria, StY; Kengyilia, StYP) except for Kengyilia melanthera, with consistent statistical support (100% UFboot and SH-aLRT; 1.0 PP). In this clade, Thinopyrum, Lophopyrum, Dasypyrum, Douglasdeweya, four species of Pseudoroegneria (Pse. stipifolia, Pse. cognata, Pse. libanotica, and Pse. tauri), two species of Roegneria (R. grandis and R. ciliaris), and four species of Kengyilia (K. alatavica, K. hirsuta, K. laxiflora, and K. batalinii) were in one subclade (99.8% UFboot, 100% SH-aLRT, and 1.0 PP). Kengyilia alatavica from central Asia formed a paraphyletic grade with Thinopyrum, Lophopyrum, Dasypyrum, Douglasdeweya, and two species of Pseudoroegneria (Pse. stipifolia and Pse. cognata). Two Kengyilia species from central Asia (K. hirsuta and K. batalinii) and one Kengyilia species from the Qinghai-Tibetan Plateau (K. laxiflora) were clustered with two species of Roegneria (R. grandis and R. ciliaris) (95% UFboot, 95% SH-aLRT, and 1.0 PP). Kengyilia kokonorica from the Qinghai-Tibetan Plateau and two species of Pseudoroegneria (Pse. libanotica and Pse. tauri) formed a paraphyletic grade in the subclade. Five species of Kengyilia from the Qinghai-Tibetan Plateau (K. thoroldiana, K. grandiglumis, K. mutica, K. stenachyra, and K. rigidula) were grouped with one species of Roegneria (R. longearistata) (100% UFboot, 100% SH-aLRT, and 1.0 PP), and this group was sister to two accessions of Pseudoroegneria spicata (99.6% UFboot, 100% SH-aLRT, and 1.0 PP). Clade II contained all sampled Agropyron species (A. cristatum and A. mongolicum) and K. melanthera from the Qinghai-Tibetan Plateau (100% UFboot, 100% SH-aLRT, and 1.0 PP).
Statistics of the K2P distance matrix
A distance matrix including 1,664 genetic values was generated to investigate the relationship between the plastomes of Kengyilia and those of its close relatives (Table S2). The Hopkins statistic was 0.2057, indicating that the data are highly clusterable (Fig. 3A). Both the bivariate cluster plot and the clustering dendrogram, based on hierarchical agglomerative clustering, show four major clusters (Fig. 3B and 3C), which correspond to the four genomic types (P/StYP, Ee/Eb, StY, and St/StP/StY/StYP). This is also congruent with the groupings in the phylogenomic tree inferred from the plastome data including all sampled Triticeae plants. The first cluster included all sampled Agropyron species (A. cristatum, ACR; A. mongolicum, AMO) and K. melanthera (KME). The second cluster contained one species of Roegneria (R. longearistata, RLO) and five species of Kengyilia (K. stenachyra, KST; K. rigidula, KRI; K. grandiglumis, KGR; K. mutica, KMU; K. thoroldiana, KTH). The third cluster consisted of Thinopyrum (TBE) and Lophopyrum (LEL). The fourth cluster comprised all the sampled Pseudoroegneria (Pse. spicata, PSP; Pse. stipifolia, PST; Pse. cognata, PCO; Pse. libanotica, PLI; Pse. tauri, PTA), Douglasdeweya (DDE), two species of Roegneria (R. grandis, RGR; R. ciliaris, RCI), and four species of Kengyilia (K. hirsuta, KHI; K. laxiflora, KLA; K. alatavica, KAL; K. batalinii, KBA).
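For readers unfamiliar with the distance underlying this matrix, the following is a minimal sketch of the standard Kimura 2-parameter (K2P) distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The function name `k2p_distance` and the toy sequences are hypothetical, and skipping sites with gaps or ambiguous bases is an assumption; the software and settings actually used to build Table S2 are not specified here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Assumption: sites where either sequence has a gap or an
    ambiguous base are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    term = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term)


# Toy sequences (hypothetical): a single A<->G transition in 10 sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Applying such a function to every pair of plastome sequences would fill a symmetric matrix of the kind summarized above, which can then be passed to a clustering-tendency test and to hierarchical agglomerative clustering.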