Application of chloroplast genome to resolve the taxonomy and phylogenetic relationships of invasive dioecious weeds in Amaranthus (Amaranthaceae)

Background: Amaranthus palmeri, A. tuberculatus and A. arenicola are alien invasive dioecious amaranths that originated in North America; they are morphologically similar and have complex taxonomic relationships with their relatives. To find effective molecular methods and accurate species boundaries for detecting these alien invasive species, we sequenced the whole chloroplast genomes of 6 amaranth species, of which those of A. palmeri, A. arenicola, A. retroflexus and A. dubius are reported for the first time.
Results: The complete chloroplast genomes of the 6 species have a circular structure of 150,454 to 150,939 bp in length with a GC content of 36.6% and contain a total of 134 genes, including 89 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. There are a total of 802 parsimony-informative (PI) sites within genes and intergenic spacers. The rpl22-rps19, ndhG-I, rpl32-trnL-UAG, trnP-UGG-psaJ and ccsA-ndhD regions are the variation hotspots in the genus. The 1,601 bp fragment from rpl32 to psaC contains the most variants, with 82 PI sites. A. arenicola differs from A. tuberculatus at 19 PI sites located in 14 genes and spacers. The regions that differentiate A. dubius, A. hypochondriacus and A. caudatus of the Hybrid complex are confined to 2 coding genes and 5 intergenic spacers. The patristic distances (0.00001-0.00005) among the three species are close to the distance (0.00005) between individuals of A. tuberculatus. Consistent with the distinction between dioecious and monoecious species but in contrast to previous phylogenetic studies, A. palmeri clustered with A. arenicola and A. tuberculatus and formed a stable clade of subgen. Acnida.
Conclusion: The chloroplast genome provides enough information for species discrimination and for resolving phylogenetic relationships within Amaranthus subgen. Acnida. The most valuable regions of chloroplast genome in Amaranthus are intergenic spacers and could


Background
The genus Amaranthus comprises at least 60 species, most of which are annual weeds distributed throughout the world's temperate and tropical regions (Mosyakin & Robertson, 2003). Several amaranths are economic crops, such as A. hypochondriacus L. and A. caudatus L. (Sauer, 1950), and A. tricolor L. is a well-known vegetable and horticultural plant (Advisory Committee on Technology Innovation, 1984). In contrast to this, A. palmeri S. Watson, A. tuberculatus (Moq.) J.D. Sauer  In addition, the genus Amaranthus is currently divided into 3 subgenera: Amaranthus subgen. Acnida (L.) Aellen ex K. R. Robertson, Amaranthus subgen. Amaranthus and Amaranthus subgen. Albersia (Kunth) Gren. & Godr., according to whether the plants are dioecious or monoecious and to the number of tepals or stamens (Mosyakin & Robertson, 1996). A. palmeri, A. tuberculatus, and A. arenicola are dioecious, belong to Amaranthus subgen. Acnida (L.) Aellen ex K. R. Robertson, and are often misidentified because of their similar morphology (Sauer, 1955, 1972). These three dioecious amaranths all originated in North America and have spread widely in agricultural fields and other disturbed areas (Sauer, 1955, 1972). Palmer amaranth belongs to a distinct subgroup of dioecious species within Amaranthus in which hybridization among different species has been widely reported (Trucco et al., 2007; Steckel, 2007). In previous studies, high ITS sequence similarity was found between Palmer amaranth and spiny amaranth, and A. palmeri was separated from subgen. Acnida and merged into subgen. Amaranthus (Xu et al., 2017a; Kirkpatrick, 1995). This conclusion contradicts the traditional taxonomic view. Whether the taxonomic status of subgen. Acnida is tenable requires further study. In this paper, we focus on finding regions of the chloroplast genome that are suitable for identifying amaranths. In addition, we seek a new perspective to resolve our earlier doubts about the status of subgen. Acnida.

Genomic features
The complete chloroplast genomes of the six Amaranthus spp. have a circular structure of 150,454 to 150,939 bp in length with a GC content of 36.6%. Each has a large single copy (LSC) region of 83,747 to 84,340 bp and a small single copy (SSC) region of 17,898 to 18,044 bp, separated by a pair of identical inverted repeat regions (IRs) of 24,519 to 24,582 bp each (Table S1). The chloroplast genome contains a total of 134 genes, including 89 protein-coding genes, 37 tRNA genes, and 8 rRNA genes, 19 of which are duplicated in the inverted repeat regions (Table S2). The rps12 gene not only lies in the LSC but is also duplicated in the IRs. Fourteen genes (rps12, rps16, atpF, rpoC1, ndhB, petB, petD, ndhA, trnK-UUU, trnI-GAU, trnA-UGC, trnG-UCC, trnL-UAA and trnV-UAC) contain a single intron, while ycf3 and clpP each harbor two introns (Table S3).
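The summary statistics in this paragraph (GC content and the total length of a quadripartite genome) can be illustrated with a minimal Python sketch. The function names and the toy inputs are hypothetical and are not taken from the study; the sketch only assumes that GC content is computed over unambiguous bases and that the circular genome consists of the LSC, the SSC and two IR copies.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Toy inputs (hypothetical, for illustration only).
print(gc_content("ATGCGCATTA"))
print(quadripartite_length(lsc=84000, ssc=18000, ir=24500))
```

Applied to one accession, the three region lengths reported in Table S1 should add up to that accession's total genome length in this way.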

Comparative genomic analysis and hotspot regions for identification
After alignment, sequence variability was due solely to single nucleotide polymorphisms (SNPs) and indels. No genome rearrangements or differences in gene content were observed.

There are a total of 802 PI sites among 134 genes and 132 intergenic spacers (Table 1 and S3).
Analysis of the distribution of genetic variability within the chloroplast genome of Amaranthus revealed that the most variable region is the small single copy (SSC) region, with 1.12 percent nucleotide variation (Table 1). The variation frequency of the intergenic spacers (1.05 percent) is much higher than that of the gene regions (0.32 percent) (Table 1).
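For readers unfamiliar with the term, a site is usually called parsimony-informative when at least two different bases each occur in at least two of the aligned sequences. The following Python sketch counts such sites in a toy alignment and expresses them as a percentage of the aligned length. It assumes, without confirmation from the source, that the variation percentages are the number of PI sites divided by the aligned region length; the function names and the toy alignment are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column: str) -> bool:
    """True if at least two bases each occur in at least two sequences.

    Gaps and ambiguous characters are ignored when counting.
    """
    counts = Counter(base for base in column.upper() if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_pi_sites(alignment: list[str]) -> int:
    """Count parsimony-informative columns in an alignment of equal-length rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return sum(is_parsimony_informative("".join(col)) for col in zip(*alignment))


def variation_percent(pi_sites: int, aligned_length: int) -> float:
    """PI sites as a percentage of the aligned length (assumed definition)."""
    return 100.0 * pi_sites / aligned_length


# Toy alignment (hypothetical): columns 2 and 4 are PI, column 5 is a singleton.
toy = ["ACGTA", "ACGTA", "ATGCA", "ATGCT"]
n_pi = count_pi_sites(toy)
print(n_pi, variation_percent(n_pi, len(toy[0])))
```

The last column of the toy alignment shows why a singleton change is not counted: a base present in only one sequence cannot favor one tree topology over another.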

Phylogenetic tree
The two phylograms, constructed from the whole chloroplast genome and from the 802 PI sites, are largely consistent (Figure 1 and S1  Amaranthus subgen. Albersia, and not involved in three subgenera, and the hotspots will be limited.
Even with expanded taxon sampling, the informative sites remain scattered across different genic and intergenic regions. For the problematic taxa, separating A. arenicola from A. tuberculatus, or distinguishing the species of the Hybrid complex, requires at least 6 genic and 13 intergenic regions. Efficient molecular markers based on these SNPs should be developed for rapid identification.
More individuals of each species from different geographic origins should also be used to verify the usefulness of these variable sites.

Taxonomy of Amaranthus subgen. Acnida
Since the earliest classification systems of Amaranthus, many scholars have proposed revisions. The primary taxonomic system for Amaranthus was revised by Mosyakin & Robertson (1996), and Amaranthus subgen. Acnida was classified by Sauer (1955). However, A. palmeri is genetically more distant from subgen. Amaranthus and much closer to A. arenicola and A. tuberculatus. The phylogenetic status of subgen. Acnida therefore seems to be stable, and the subgenus forms an independent clade based on the plastid genome.

Phylogenetic status of Amaranthus subgen. Amaranthus
Another group is the Hybrid complex, which is the most studied group (Sauer, 1967). Erika Viljoen et al.

Conclusions
The results above show that the genus is a taxonomically troublesome group. The chloroplast genome provides enough information for species discrimination and for resolving phylogenetic relationships within Amaranthus, especially Amaranthus subgen. Acnida. This is the first report on subgen. Acnida based on the plastid genome. More Amaranthus species should be sequenced and analyzed in the future to complement these results.

Plant samples, DNA extraction, and sequencing
Fresh leaves of each individual of A. palmeri, A. arenicola, A. retroflexus, A. dubius and two accessions of A. tuberculatus var. tuberculatus were sampled at import borders in China and dried with silica gel. Voucher specimens (Table S1)

Genome assembly and annotation
The paired-end sequencing data (2 × 150 bp) were used to assemble the complete chloroplast genomes. Sequencing adapters and barcodes were trimmed, and low-quality reads with Q value ≤ 30 were removed. Trimmed paired-end reads were mapped to the chloroplast sequence of A. hypochondriacus (MG836505) with default parameters. The reads were assembled using Geneious Prime v. (Biomatters, Auckland, New Zealand). The consensus chloroplast sequence of four Amaranthus spp. was retrieved separately and used as a reference for a second round of mapping of each species' own reads in order to validate its consensus chloroplast sequence. All trimmed and quality-filtered sequence reads have been deposited in GenBank at NCBI. Non-mapped reads, which are assumed to be of non-plastid origin, were excluded from further analysis. The complete chloroplast genome sequences were annotated using Geneious Prime v. 2019.1.3 (Biomatters, Auckland, New Zealand) by comparison with the genome of A. hypochondriacus (GenBank accession No: MG836505). The assembled and annotated Amaranthus spp. chloroplast genome sequences were deposited at NCBI (Table S1). and (ii) mutation site with variant frequency above 20%. The parsimony-informative (PI) sites are more reliable than single SNPs for the discrimination of difficult groups and for phylogenetic analyses.
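The read-quality filter mentioned in this section (removal of reads with Q value ≤ 30) can be sketched as follows. This is an illustration only and not the trimming tool actually used in the study: it assumes the common Phred+33 FASTQ encoding and assumes that Q refers to the mean quality of a read, neither of which is stated in the text. The function names are hypothetical.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read."""
    scores = phred_scores(quality, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores)


def keep_read(quality: str, threshold: int = 30) -> bool:
    """Keep a read only if its mean quality is strictly above the threshold.

    Assumption: reads with mean Q <= threshold are removed.
    """
    return mean_quality(quality) > threshold


# Toy quality strings (hypothetical): 'I' encodes Q40, '?' encodes Q30.
print(keep_read("IIIIIIII"))  # mean Q40, kept
print(keep_read("????????"))  # mean Q30, removed
```

A Phred score of 30 corresponds to an expected base-calling error rate of 1 in 1,000, so under these assumptions only reads with a lower expected error rate than that would be retained.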

Whole-plastid genome tree and parsimony-informative (PI) sites-based tree
Chloroplast genomic phylogenetic analyses were performed based on 12 sequences of 11 species in Amaranthaceae. These sequences were aligned using the Geneious Prime v.2019. replicates. In addition, we extracted all PI sites, excluding indels, from each gene and intergenic spacer of the whole chloroplast genome and compiled them into a FASTA file, which was loaded into MEGA v6.06 (Tamura et al., 2011) to construct a phylogram based on the PI sites.
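Distance-based tree methods such as neighbor-joining start from a matrix of pairwise distances between the aligned sequences. The sketch below computes a simple p-distance matrix (the proportion of differing sites) with gap positions skipped for each pair. The distance model used in the study is not stated in this section, so the p-distance, the pairwise treatment of gaps, the function names and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations

# Characters treated as missing data and skipped in pairwise comparisons.
MISSING = set("-?N")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x not in MISSING and y not in MISSING
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of sequence names."""
    return {
        (m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)
    }


# Toy alignment (hypothetical names and sequences).
toy = {"sp1": "ACGT-A", "sp2": "ACGTTA", "sp3": "ATGCTA"}
for pair, dist in distance_matrix(toy).items():
    print(pair, round(dist, 3))
```

In the toy example, sp1 and sp2 differ only by an indel, which is skipped, so their distance is zero. This mirrors the exclusion of indels from the PI-site data set described above.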

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated and/or analysed during the current study are available in the GenBank repository, https://www.ncbi.nlm.nih.gov/.

Competing interests
The authors declare that they have no conflict of interest.

Funding
This work was supported by the National Agricultural Standardization Project NBFW-14-2018 and the Basic Scientific Research Program of CAIQ (2014JK010, 2017JK038). The work in this study is part of the above projects on pest identification techniques.

Authors' contributions
XH designed the experiments and wrote the paper.

Table 1 Variations within genes and intergenic regions among 10 whole chloroplast genomes of 9 Amaranthus spp.*
Region Coding regions Non-cod
* Note: The region length of genes and intergenic spacers includes gaps and is derived from the consensus sequence of the alignment of 10 whole chloroplast genomes of 9 Amaranthus spp.

Figure 1 The Neighbor-Joining (NJ) tree based on the 12 representative chloroplast genomes of family Amaranthaceae. The bootstrap value based on 1000 replicates is shown on each node. *The species of the Hybrid complex.
