Complete Chloroplast Genome Sequence of a Major Warm-Season Turfgrass Species, Centipedegrass [Eremochloa ophiuroides (Munro) Hack.]: Genome Characterization, Comparative and Phylogenetic Analysis

Background: Chloroplast (cp) genome sequence data can provide valuable information for molecular taxonomy and phylogenetic reconstruction among plant species and individuals. However, although centipedegrass (Eremochloa ophiuroides) is one of the most important warm-season turfgrasses widely used in the USA and China, its cp genome characteristics and phylogenetic position have been poorly understood. Results: In this study, we determined the complete chloroplast genome sequence of E. ophiuroides using high-throughput Illumina sequencing technology. The circular pseudomolecule of the E. ophiuroides cp genome is 139,107 bp in length and has a typical quadripartite structure consisting of a pair of inverted repeat (IR) regions of 22,230 bp each, separated by a large single-copy (LSC) region of 82,081 bp and a small single-copy (SSC) region of 12,566 bp. The nucleotide composition of the E. ophiuroides cp genome is asymmetric, with an overall A + T content of 61.60%. It encodes a total of 131 genes, composed of 20 duplicated genes within the IR regions and 111 unique genes, including 77 protein-coding genes (PCGs), 30 transfer RNA (tRNA) genes and four ribosomal RNA (rRNA) genes. Analysis of the repetitive sequences revealed that the E. ophiuroides cp genome contains 51 repeat sequences, including 29 forward, 20 palindromic and 2 reverse repeats, and 197 simple sequence repeats (SSRs), which are mainly composed of adenine (A) and thymine (T) bases. Comparison of the E. ophiuroides complete cp genome with the genomes of seven other Gramineae species showed a high degree of collinearity among Gramineae plants. Phylogenetic analysis showed that E. ophiuroides is closely related to E. ciliaris and E. eriopoda, and is placed in a clade with these two Eremochloa species and Mnesithea helferi within the subtribe Rottboelliinae, which clarifies the evolutionary status of E. ophiuroides in the tribe Andropogoneae and also supports the current taxonomy of the tribe.
Conclusions: The present study the structure

the natural populations of E. ophiuroides have been experiencing a sharp decline. It is therefore urgent to establish scientific strategies to protect and conserve the resources of E. ophiuroides in its main distribution areas.
The chloroplast (cp) is the key organelle responsible for photosynthesis and carbon fixation in green plants, and is involved in the biosynthesis of amino acids, fatty acids, hormones, vitamins, nucleotides, pigments and other secondary metabolites [7,8]. Its genome contains both highly conserved genes fundamental to plant life and more variable regions that are informative over broad time scales. Therefore, cp genome sequence data can provide valuable information for molecular taxonomy and phylogenetic reconstruction among plant species and individuals [9,10], which could contribute greatly to plant breeding and conservation strategies. In addition, cp DNA is non-recombinant, has low mutation rates and is uniparentally inherited, which makes it significant for gaining insights into plant evolution and developing applications for biotechnological breeding [11,12]. In recent years, with the rapid development of next-generation sequencing (NGS) technology, it has become more convenient and relatively inexpensive to obtain cp genome sequences and implement whole genome-based phylogenomics [13]. In contrast to previous approaches based on a single or a few cp loci, the complete cp genome sequence now provides a unique opportunity to investigate the evolution of related species based on whole-genome comparison [14,15].
The turf quality of E. ophiuroides depends to a great extent on its biomass, color and green color retention, which are highly correlated with photosynthetic efficiency. Like most other turfgrass species, E. ophiuroides prefers to grow in open, sunny places. Therefore, the study of photosynthesis-related chloroplast genes is expected to provide a basis for genetic breeding of this turfgrass [16].
Analyzing and characterizing the cp genome of a turfgrass would provide essential information for improving turf quality and facilitate the development of a plastid transformation system in the plant [17]. However, although E. ophiuroides is one of the most important warm-season turfgrasses and was introduced into the USA by Frank Meyer a century ago [18], research on its cp genome is lagging behind. Although a set of cp genome sequence data of E. ophiuroides was submitted to NCBI GenBank in 2017, no report has interpreted it in detail, which has hindered understanding of and progress in E. ophiuroides evolution, species identification, germplasm conservation, genetic engineering, and other related research.
In the present study, we sequenced the E. ophiuroides cp genome using Illumina technology, assembled the complete cp genome sequence, and performed detailed phylogenetic analyses on the basis of complete cp genome sequence information. We also analyzed the fully assembled cp genome of E. ophiuroides and compared it to those of seven related species of Gramineae. The main purposes of this study were to investigate the complete structure of the E. ophiuroides cp genome, to explore the phylogenetic position of E. ophiuroides in the tribe Andropogoneae, and to provide basic data for further molecular studies related to grass taxa identification, phylogenetic resolution, population structure and biodiversity, novel gene discovery and functional genomics in the genus Eremochloa.

Results
Genome assembly and structure analysis
A total of 19,101,863 clean reads (approximately 5.73 Gb) were obtained from the E. ophiuroides leaf library. After de novo and reference-guided assembly with minor modifications, these reads were finally integrated into a complete circular pseudomolecule 139,107 bp in length (GenBank accession: MT806102). Thus, the sequencing depth of the E. ophiuroides cp genome was expected to be more than 41,000 × (Fig. 1, Table 1). Since 738,323 reads were mapped to the assembled cp genome, the average coverage of the E. ophiuroides cp genome exceeded 1,500 ×.
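The two depth figures above follow from simple arithmetic on the reported totals. The following is a minimal sketch, assuming that each counted read contributes about 300 bp (a value inferred here from 5.73 Gb divided by 19,101,863 reads, not stated in the text); the variable and function names are illustrative only.

```python
GENOME_LENGTH_BP = 139_107   # assembled cp genome length
TOTAL_BASES = 5.73e9         # approximate clean data volume (5.73 Gb)
TOTAL_READS = 19_101_863     # clean reads
MAPPED_READS = 738_323       # reads mapped to the assembled cp genome


def depth(bases: float, genome_length: int) -> float:
    """Average depth = sequenced bases / genome length."""
    return bases / genome_length


# Assumption: bases contributed per counted read, inferred from the totals.
bases_per_read = TOTAL_BASES / TOTAL_READS

# Upper bound: depth if every sequenced base came from the cp genome.
expected_depth = depth(TOTAL_BASES, GENOME_LENGTH_BP)

# Depth contributed by the reads that actually mapped to the assembly.
mapped_depth = depth(MAPPED_READS * bases_per_read, GENOME_LENGTH_BP)

print(f"bases per read: {bases_per_read:.0f}")
print(f"expected depth: {expected_depth:.0f}x")
print(f"mapped depth:   {mapped_depth:.0f}x")
```

Under these assumptions the first figure lands just above 41,000 × and the second above 1,500 ×, in line with the values reported in the text.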
The cp genome of E. ophiuroides exhibited a typical quadripartite structure, consisting of a pair of inverted repeat (IR) regions of 22,230 bp each with 56.00% AT, separated by a large single-copy (LSC) region of 82,081 bp with 63.76% AT and a small single-copy (SSC) region of 12,566 bp with 67.32% AT (Fig. 1). The global AT content in the E. ophiuroides cp genome was 61.60% (Table 1).
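As a consistency check on the quadripartite structure, the region lengths should sum to the genome length, and the length-weighted AT fractions of the regions should reproduce the global AT content. The sketch below uses only the numbers reported above; the dictionary layout and variable names are illustrative.

```python
# Region lengths (bp) and AT fractions as reported in the text.
regions = {
    "LSC": (82_081, 0.6376),
    "SSC": (12_566, 0.6732),
    "IRa": (22_230, 0.5600),
    "IRb": (22_230, 0.5600),
}

# Total genome length is the sum of the four regions.
total_length = sum(length for length, _ in regions.values())

# Global AT content is the length-weighted mean of the regional AT fractions.
at_bases = sum(length * at for length, at in regions.values())
overall_at = 100 * at_bases / total_length

print(total_length)          # 139107
print(f"{overall_at:.2f}%")  # 61.60%
```

Both values agree with the reported genome length of 139,107 bp and the global AT content of 61.60%.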

Gene annotation
A total of 131 genes were annotated in the sequenced E. ophiuroides cp genome, of which 20 genes are duplicated in the IR regions and 111 are unique, including four rRNA genes, 30 tRNA genes and 77 protein-coding genes (Fig. 1, Table 2). Most of the unique genes contained no introns, but one intron was found in each of six tRNA genes and eight protein-coding genes, and two introns were found in each of two protein-coding genes. In the gene function analysis, the 111 unique genes were classified into four categories: genes associated with photosynthesis (44 genes), self-replication (59 genes), other functions (5 genes), and genes of unknown function (3 genes) (Table 2).

Codon preference analysis
There were 3 termination codons and 63 codons encoding 20 amino acids in the cp protein-coding genes of E. ophiuroides (Fig. 2).

Repeat structure and SSR analysis
Repeat analysis revealed 29 forward repeats, 20 palindromic repeats, and two reverse repeats in the E. ophiuroides cp genome (Table S2). The forward repeat units were 30-242 bp long, and almost all the forward repeats were located in the LSC region, except for four located in the IR regions and one in the SSC region.
Similar to the forward repeats, the majority of palindromic repeat units were 30-242 bp in length and distributed in the LSC region, with the exception of one that was 22,230 bp long. The two reverse repeats were each less than 35 bp in length and were detected in the LSC region.

IR contraction and expansion and genome collinearity
The exact IR boundary positions and their adjacent genes in E. ophiuroides and eight other species from the families Gramineae and Cruciferae were compared (Fig. 3). In the cp genomes of the three Eremochloa species, the IR boundary positions and their adjacent genes were exactly the same. The IRa/SSC and IRb/SSC junctions were found within the genes ndhH and ndhF, respectively; correspondingly, an ndhH pseudogene (1 bp) was detected at the IRa/SSC boundary and an ndhF pseudogene (29 bp) at the IRb/SSC boundary, whereas no pseudogene was observed at the IRa/LSC and IRb/LSC boundaries. Sorghum and Zea shared exactly the same IR boundary positions and adjacent genes, and their IR arrangement was almost identical to that of Eremochloa except for the positions of the genes rpl22 (57 bp from IRb) and psbA (88 bp from IRa) in the LSC. Setaria italica had IR boundary positions similar to those of Sorghum and Zea. The IR boundary positions and adjacent genes of Oryza sativa and Brachypodium distachyon were generally consistent with those of the Eremochloa species, although slight differences were observed, such as the position of the ndhF gene adjacent to the IRb/SSC junction and the length of the ndhH pseudogene detected in the IRa regions of the two species. Unlike in the Gramineae species, the IRb/LSC and IRa/SSC junctions of Arabidopsis thaliana were within the genes rps19 and ycf1, respectively, while the IRb/SSC junction was within both the ycf1 gene and the ndhF gene.
The Mauve alignment of the eight species revealed that all the cp genomes formed locally collinear blocks (LCBs). In particular, the gene order of the three Eremochloa cp genomes was highly conserved compared with that of the other plant species (Fig. 4). Eremochloa and Sorghum had the highest chloroplast genome homology, while the order of cp gene loci was highly consistent among the other Gramineae genomes. This demonstrates that the chloroplast genome is highly homologous among Gramineae plants.

Phylogenetic relationship
Phylogenetic relationships of species in the tribe Andropogoneae, and the taxonomic status of E. ophiuroides and other species in the same tribe, were assessed through maximum likelihood (ML) analysis of the newly sequenced and published complete cp sequences. Forty-nine published or available complete cp genome sequences and the newly sequenced E. ophiuroides cp sequence were combined in this study. Thus, we reconstructed a phylogenetic tree of the tribe Andropogoneae using a total of 50 complete cp genome sequences, which were selected from 46 different species in Andropogoneae (three species with two cp sequences each) and one species (Arundinella deppeana) in Arundinelleae used as an outgroup (Table S4). RAxML analysis produced a phylogenetic tree that fully supported E. ophiuroides as closely related to Eremochloa ciliaris and Eremochloa eriopoda, with 100% bootstrap values, and the three Eremochloa species, together with Mnesithea helferi, formed one monophyletic group corresponding to subtribe Rottboelliinae (Fig. 5).
It is noteworthy that many species traditionally classified in the same subtribe do not form a group. In the phylogenetic analysis, only four monophyletic groups, corresponding to the subtribes Saccharinae, Sorghinae, Andropogoninae and Rottboelliinae, can be retrieved from the tree, while quite a few non-monophyletic groupings were formed. A typical instance is the placement of Germainia capitata (Germainiinae) as sister to Pogonatherum paniceum (Incertae sedis) with the same branch length. Similar cases are the placement of Dimeria ornithopoda (Dimeriinae) as sister to Eulaliopsis binata (Saccharinae), and the placement of Rottboellia cochinchinensis (Rottboelliinae) as sister to Coix lacryma-jobi (Coicinae). In addition, Heteropogon triticeus and Cymbopogon flexuosus, two species in Anthistiriinae, were clustered in a sister clade of Andropogoninae; Kerriochloa siamensis, one species of Ischaeminae, was placed as sister to Incertae sedis.

Discussion
Genome size and gene identification
The size of the E. ophiuroides cp genome was found to be 139,107 bp, similar to that of cp genomes in the Panicoideae subfamily, which range from 138 kb in Setaria viridis [19] to 141 kb in Saccharum officinarum [20], but larger than that of other sequenced cp genomes in the Chloridoideae, Pooideae, and Oryzoideae subfamilies, which are no more than 137 kb in length (Table S5). Since the average size of publicly available Poaceae cp genomes is 137,091 bp [21], the E. ophiuroides cp genome is of average size within Panicoideae and of large size within Poaceae (Gramineae). The cp DNA of E. ophiuroides, like that of most angiosperms, is circular with a typical quadripartite structure containing a pair of IRs separated by LSC and SSC regions. The overall AT content of the E. ophiuroides cp genome was 61.6%, which is similar to that of most Gramineae plants (~61%, Table S5).
The gene and intron contents of the E. ophiuroides cp DNA are essentially identical to those of rice [22,23], wheat [24], maize [25], sorghum [26] and other grasses [21,27,28,29], with 77 protein-coding genes, 30 tRNA genes and four rRNA genes. Among the 111 unique genes, 14 contain one intron (six tRNA and eight protein-coding genes) and two (rps12 and ycf3) possess two introns. Of all identified genes, 59 are related to self-replication and 44 are associated with photosynthesis. Of the 44 photosynthesis-related genes, five encode photosystem I components (psaA, B, C, I, J), 15 are related to photosystem II, six (atpA, B, E, F, H, I) are responsible for ATP synthase, and the other 18 encode electron transport chain components. A similar pattern of protein-coding genes is also present in Oryza sativa [30], Oryza glaberrima [31] and Oryza minuta [23].

Repeat sequence
The nucleotide sequences of most genomes contain many different types of repetitive sequences, such as short tandem repeats, interspersed repeats or spaced repeats. These repeat elements are either dispersed throughout the genome or confined to a short region of the genome [32]. Slipped-strand mispairing and inappropriate recombination of repetitive sequences may lead to sequence variation and DNA rearrangement [33,34]. Interspersed repetitive sequences (IRSs) are repeats scattered throughout genomic DNA and are a potential resource for revealing gene rearrangements and losses during evolution [35,36]. They usually include forward, palindromic, reverse and complement repeats. In the present study, many forward and palindromic repeats and a few reverse repeats were detected in the E. ophiuroides cp genome sequence, and most of them were distributed in the LSC region of the genome. Similar findings have been reported in other plant species, such as Swertia mussotii [37] and Oryza minuta [23]. This reflects the common characteristics of IRSs in most plant cp genomes.
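The four repeat classes named above differ only in how the second copy relates to the first: identical (forward), read backwards (reverse), base-complemented (complement), or reverse-complemented (palindromic). The following minimal sketch illustrates the distinction for exact copies only; the function name `classify_repeat` is hypothetical, and mismatch-tolerant matching as used by repeat-finding programs is not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(unit: str, copy: str) -> str:
    """Name the relationship between a repeat unit and its second copy."""
    unit, copy = unit.upper(), copy.upper()
    if copy == unit:
        return "forward"        # same sequence, same orientation
    if copy == unit[::-1]:
        return "reverse"        # same strand, read backwards
    if copy == unit.translate(COMPLEMENT):
        return "complement"     # each base replaced by its complement
    if copy == unit[::-1].translate(COMPLEMENT):
        return "palindromic"    # reverse complement of the unit
    return "none"


for second_copy in ("AACGT", "TGCAA", "TTGCA", "ACGTT"):
    print(second_copy, classify_repeat("AACGT", second_copy))
```

For the unit `AACGT`, the four copies in the loop are classified as forward, reverse, complement and palindromic, respectively.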
SSRs, also called microsatellites, are known to be highly informative and are abundant and evenly distributed in angiosperm plastomes [38]. Because of their abundance, high rate of polymorphism, ubiquitous distribution throughout the genome, and high extent of allelic diversity, SSRs have been extensively used as versatile DNA-based markers in plant genetic and genomic research [39]. The motif type, length and abundance of SSRs are the main characteristics of microsatellites [40]. Besides complex SSRs, five types of perfect SSRs (mono-/di-/tri-/tetra-/penta-nucleotide repeats) were detected in the E. ophiuroides cp genome sequence. The most abundant SSR motifs were mononucleotide repeats, followed by trinucleotide and tetranucleotide repeats. This result is not completely consistent with other findings showing that mono- and di-nucleotides are the most frequent SSR types in plant cp genomes [41][42][43], but is consistent with reports in Lythraceae [44] and Magnolia polytepala [12], and also accords with the results of SSR mining from E. ophiuroides RNA-seq data, although mononucleotide repeats were omitted in that study [45]. Whether mononucleotide or polynucleotide, most of the SSRs detected in the present study were rich in A/T. This is consistent with existing chloroplast SSR reports [46][47][48].
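To make the notion of a perfect SSR concrete, the sketch below scans a sequence for uninterrupted repeats of 1-6 bp motifs using the minimum repeat counts given in the methods (eight, five, three, three, three and three units). It is a simplified regular-expression illustration written for this purpose, not the MISA script itself; the names `MIN_REPEATS`, `is_compound_motif` and `find_ssrs` are hypothetical, compound SSRs are not handled, and the toy sequence is invented.

```python
import re

# Minimum repeat counts per motif length, as given in the methods.
MIN_REPEATS = {1: 8, 2: 5, 3: 3, 4: 3, 5: 3, 6: 3}


def is_compound_motif(motif: str) -> bool:
    """True if the motif is itself a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    return any(
        len(motif) % k == 0 and motif == motif[:k] * (len(motif) // k)
        for k in range(1, len(motif))
    )


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound_motif(motif):
                continue  # already counted under the shorter motif
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


toy = "GC" + "A" * 10 + "GCGG" + "AT" * 6 + "CCG" + "TTC" * 3 + "G"
print(find_ssrs(toy))
# [(2, 'A', 10), (16, 'AT', 6), (31, 'TTC', 3)]
```

Skipping motifs that are repetitions of a shorter unit prevents a run such as A10 from also being reported as a dinucleotide (AA) repeat.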

IR contraction and expansion
IRs are a prominent feature of most angiosperm cp genomes. Expansion and contraction of the IR region boundaries is the main reason for size variation in the cp genome and plays an important role in species evolution [49]. In the present study, the four junctions between the two IRs (IRa and IRb) and the two single-copy regions (LSC and SSC) (Fig. 3), i.e., JLA (junction line between LSC and IRa), JLB (junction line between LSC and IRb), JSA (junction line between SSC and IRa) and JSB (junction line between SSC and IRb), were compared in detail between E. ophiuroides and E. eriopoda, E. ciliaris, S. bicolor, Z. mays, S. italica, O. sativa and B. distachyon by carefully analyzing the exact IR border positions and adjacent genes. The IR region of E. ophiuroides was 22,230 bp in length, intermediate among the nine compared species, whose IRs ranged from 20,804 bp to 22,783 bp. This implies that some IR expansion and contraction may have occurred in the E. ophiuroides cp genome. JLA is located between rps19 and rpl22, and JLB between rps19 and psbA, in all eight Gramineae species. The distances between rps19 and JLA and between rps19 and JLB are both 35 bp in all three Eremochloa species, S. bicolor and Z. mays, which is shorter than in the other three Gramineae species; the distance between rpl22 and JLA in the three Eremochloa species is shorter than that in S. bicolor and Z. mays but longer than that in the other species, while the distance between psbA and JLB in the three Eremochloa species is longer than that in the other Gramineae species. The ndhF gene traverses the SSC and IRa regions, with 29 bp located in the IRa region for all the C4 plants, including the three Eremochloa species, S. bicolor, Z. mays and S. italica, but it differs in the C3 plants O. sativa and B. distachyon. This accords with most reported findings in Gramineae plants [23], and hints that variation in the JSA border caused by IR expansion or contraction might contribute to the difference between C3 and C4 plant cp genomes.
Our results also demonstrate that size variation of cp genomes resulting from IR contraction and expansion is a common feature in the evolution of Gramineae plants, although the structural organization and gene order of Gramineae cp genomes are highly conserved [50].

Phylogenetic analysis
The tribe Andropogoneae includes over 1,200 species in ca. 90 genera, and is a primary component of grasslands and savannahs that dominate tropical and subtropical regions throughout the world [51,52].
Recently, a number of phylogenetic and evolutionary studies of the tribe Andropogoneae have been carried out using complete chloroplast genomes [52][53][54][55]. Although E. ophiuroides is an important member of the genus Eremochloa in the tribe Andropogoneae, it has not been included in these studies, which limits understanding of its evolutionary relationships to other Andropogoneae species. Our molecular phylogenetic tree based on complete cp genome sequences revealed that E. ophiuroides is closely related to E. ciliaris and E. eriopoda, and their placement in a clade with Mnesithea helferi within the subtribe Rottboelliinae is highly supported, with bootstrap values of 100% (Fig. 5). This is congruent with the traditional morphology-based circumscription of Rottboelliinae, indicating that the classification of subtribe Rottboelliinae is generally reasonable.
In addition, in our results Rottboelliinae, Saccharinae, Sorghinae and Andropogoninae are typically monophyletic groups, which reflects agreement between the molecular phylogeny and traditional morphology-based taxonomy. However, some subtribes were recovered as non-monophyletic in the current molecular phylogeny. In the present study, Germainia capitata (Germainiinae) was placed as sister to Pogonatherum paniceum (Incertae sedis), Dimeria ornithopoda (Dimeriinae) as sister to Eulaliopsis binata (Saccharinae), and Rottboellia cochinchinensis (Rottboelliinae) as sister to Coix lacryma-jobi (Coicinae), which is congruent with previous results for these species [52][53][54][55]. Another typical non-monophyletic area in the tree is the placement of Heteropogon triticeus and Cymbopogon flexuosus (two species in Anthistiriinae) in a clade with Andropogoninae species, a result that has been reported previously [52]. However, it is worth mentioning that Sorghastrum nutans and Eulalia aurea were not clustered as sister clades in the current study, which is incongruent with previously reported results [53][54][55]. This is mainly due to the broader sampling of the tribe Andropogoneae (50 complete cp genome sequences from 47 different species) used for phylogenetic analysis in the present study.

Conclusion
We present the complete cp genome sequence of E. ophiuroides (139,107 bp) in this study. The E. ophiuroides cp genome has a circular, quadripartite structure that is well conserved and similar to previously reported cp genomes from the Gramineae family. A total of 131 genes were annotated in the sequenced E. ophiuroides cp genome, of which 44 are associated with photosynthesis. Most E. ophiuroides codons encoding amino acids show codon preferences. The location and distribution of repeat sequences were determined, and 197 SSR loci and 51 repeat sequences were identified in the E. ophiuroides cp genome. Comparative genomic analysis revealed that E. ophiuroides has a high level of collinearity with the other Gramineae cp genomes. Phylogenetic analyses showed that E. ophiuroides is most closely related to E. ciliaris and E. eriopoda, and the three Eremochloa species together with Mnesithea helferi were placed in a monophyletic group corresponding to subtribe Rottboelliinae, which is fully in accord with the traditional morphology-based circumscription of Rottboelliinae in the tribe Andropogoneae. The cp genome information of E. ophiuroides is a useful genetic resource that can be used for conservation genetics, species identification, taxonomic clarification, phylogenetic reconstruction, and molecular breeding in further studies on Eremochloa.

Plant material, DNA extraction and sequencing
Fresh leaves were sampled from E. ophiuroides accession E039, which was collected from Lushan, Jiangxi province and is now deposited in the nursery of the Institute of Botany, Jiangsu Province and Chinese Academy of Sciences. Total genomic DNA was extracted using the EZgene™ SuperFast Plant Leaves DNA Kit (Biomiga, San Diego, CA, USA) following the manufacturer's protocol. Quality and integrity of the DNA were determined using spectrophotometry and agarose gel electrophoresis, respectively. A paired-end (PE) library with an average size of 350 bp was prepared using the Illumina TruSeq DNA Sample Prep kit (Illumina Inc.) and was then sequenced on an Illumina NovaSeq 6000 platform.

Data assembly, gene annotation and codon preference analysis
Raw reads were filtered using the base quality control software fastp (version 0.20.0) to obtain high-quality reads. A BLAST analysis was then performed between the high-quality reads and the reference cp genome (KY432809.1) to extract cp-like reads. The resulting high-quality cp-like reads were assembled into contigs with the de novo assembler SPAdes v3.9.0 [56]. All contigs were sorted and joined into a single draft sequence using NOVOPlasty with the reference genome (KY432809.1) as a template. Circularization of the cp genome and determination of the initiation site were done manually. Prodigal [57] and hmmer [58] were used to annotate protein-coding genes and ribosomal RNAs, respectively, while transfer RNAs were predicted with Aragorn [59]. The annotation results were verified using the CpGAVAS pipeline and then manually corrected. Finally, the cp genome map was generated using the OrganellarGenomeDRAW tool (OGDRAW) [60].
Codon preference was analyzed using R. The degree of codon preference was evaluated by the relative synonymous codon usage (RSCU), calculated as the ratio between the observed frequency of a particular codon and its expected frequency. According to RSCU theory [61,62], synonymous codon preference was divided into four classes: no preference (RSCU ≤ 1.0),

Tandem repeats were evaluated using the Tandem Repeats Finder program [63] with default settings. Forward, palindromic, reverse and complement repeats were identified using vmatch v2.3.0 (http://www.vmatch.de/), with the minimal repeat size set to 30 bp and a Hamming distance of 3. Microsatellites, or simple sequence repeats (SSRs), of one to six nucleotides were detected using the Perl script MISA v1.0 [64], with thresholds of eight, five, three, three, three, and three repeat units set for mono-, di-, tri-, tetra-, penta- and hexa-nucleotide SSRs, respectively.
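Returning to the codon preference analysis in this section, the RSCU ratio can be illustrated with a short calculation: for each amino acid, the expected count of a codon is the total count for that amino acid divided by its number of synonymous codons. The sketch below is in Python rather than R, covers only a small subset of the genetic code for brevity, and uses an invented coding sequence; the names `CODON_TO_AA` and `rscu` are hypothetical and do not reflect the authors' script.

```python
from collections import Counter, defaultdict

# Illustrative subset of the genetic code (three amino acid families only).
CODON_TO_AA = {
    "TTT": "Phe", "TTC": "Phe",
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "ATG": "Met",
}


def rscu(cds: str) -> dict:
    """RSCU = observed codon count / mean count over that amino acid's synonymous codons."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TO_AA)

    # Group synonymous codons by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent, RSCU undefined
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


example = "ATG" + "TTT" * 3 + "TTC" + "GCT" * 2 + "GCA" * 2
result = rscu(example)
print(result["TTT"], result["TTC"])  # 1.5 0.5
print(result["GCT"], result["GCC"])  # 2.0 0.0
```

A value above 1.0 means the codon is used more often than expected under equal usage of its synonyms, and a value of 1.0 (as for the single Met codon) means no preference.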

Phylogenetic analysis
The phylogenetic analysis was based on two E. ophiuroides cp genomes, namely the one newly sequenced in the present study (MT806102) and another submitted to NCBI GenBank (KY432809.1) by Gallaher et al. in 2017, together with the cp genomes of 48 closely related Andropogoneae species downloaded from GenBank; Arundinella deppeana (NC030620) was selected as the outgroup (Table S4). MAFFT [65] and trimAl [66] were used for genome sequence alignment and alignment trimming, respectively. The best substitution model, GTR+G, was chosen with jModelTest v2.1.7 [67], and the Randomized Axelerated Maximum Likelihood (RAxML) method was used to infer the phylogenetic relationships with 1000 bootstrap replicates in MEGA 6.0.

Availability of data and materials
Sequence information of the newly sequenced E. ophiuroides cp genome is available in the NCBI database (www.ncbi.nlm.nih.gov/) under the accession number MT806102. The reference cp genome of E. ophiuroides and the whole cp genome sequences of 48 species analysed in this study were all downloaded from the NCBI database (www.ncbi.nlm.nih.gov/) with their accession numbers listed in Table S4.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Phylogenetic tree of the Andropogoneae species based on the complete cp genome data by maximum likelihood (ML). A total of 50 species were used to reconstruct a phylogenetic tree using MEGA6 software, and Arundinella deppeana was used as the outgroup. Subtribes and higher taxonomic groupings are indicated.