Genomic features
The chloroplast genomes of the 22 samples from 17 Amaranthus species have a quadripartite structure consisting of a large single-copy region (LSC, 83,382–84,062 bp), a small single-copy region (SSC, 17,937–18,124 bp), and a pair of inverted repeat regions (IRs, 23,964–24,357 bp). The full length of the 22 cp genomes ranges from 149,949 bp in A. polygonoides to 150,756 bp in A. albus (Table 1). The chloroplast genome sequences were deposited in GenBank (Table 1).
The total GC content was 36.5% to 36.6%; only A. albus, A. blitoides and A. polygonoides had a GC content of 36.5% (Table 1). The chloroplast genome contains a total of 133 genes, including 88 protein-coding genes, 37 tRNA genes, and 8 rRNA genes, 18 of which are duplicated in the inverted repeat regions (Table S2). The gene rps12 is trans-spliced: the 5′-end exon is located in the LSC region, whereas the 3′-end intron and exon are duplicated and located in the inverted repeat regions. The partial duplicates of the rps19 and ycf1 genes appear as pseudogenes because they have lost their protein-coding ability. Sixteen genes have introns.
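As a minimal sketch of how a GC percentage such as those in Table 1 can be computed from a sequence, the following function counts G and C among the unambiguous bases. The function name and the toy sequence are illustrative assumptions, and ignoring ambiguous symbols such as N is a choice made here, not something stated in the text.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Assumption: only A, C, G and T are counted; ambiguous symbols
    (N, gaps, IUPAC codes) are ignored in both numerator and denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


# Toy sequence for illustration only (not Amaranthus data).
toy = "ATGCGCATTA"
print(f"GC content: {gc_content(toy):.1f}%")
```

In practice the sequence would be read from the GenBank or FASTA record of each assembled genome before being passed to the function.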
Variants of cp genomes
Comparison of the chloroplast genomes of the 22 individuals from 17 species showed that the length of the SSC region was conserved within each subgenus. The SSC of A. palmeri, A. tuberculatus and A. arenicola in subgen. Acnida was 18,027–18,042 bp long (average 18,038.5 ± 5.3151 bp), the SSC of the 5 species of subgen. Amaranthus was 17,937–17,948 bp (average 17,941 ± 3.3665 bp), and the SSC of the 8 species of subgen. Albersia was 18,057–18,124 bp (average 18,076.3 ± 22.6806 bp) (Table 1; Figure 1). In the SSC, InDels of about 77 bp in ndhE-G and about 180 bp in ndhG-I accounted for the variation in SSC length among subgenera (Figure 2). The frequencies of SNPs and InDels in the chloroplast genomes of the 17 species were 1.79% and 2.86%, respectively (Table 2). The frequencies of SNPs and InDels were 1.22% and 1.14% in genes, and 3.25% and 7.32% in intergenic spacers, respectively (Table 2). In general, variation occurred mainly in the intergenic spacers, and InDels occurred mainly in non-coding regions (Table 2). The longest InDel was 387 bp, located in ycf2, followed by a 384 bp InDel in psbM-trnD.
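The "average ± value" summaries of SSC length can be reproduced from per-sample lengths with a few lines of code. The sketch below is an assumption-laden illustration: the text does not say whether the value after ± is a sample or a population standard deviation, so both are reported, and the input lengths are hypothetical placeholders rather than values from Table 1.

```python
from statistics import mean, pstdev, stdev


def summarize(lengths):
    """Return (mean, sample SD, population SD) for a list of region lengths in bp."""
    return mean(lengths), stdev(lengths), pstdev(lengths)


# Hypothetical SSC lengths in bp, chosen for illustration only.
ssc_lengths = [18030, 18040, 18050]
avg, sample_sd, population_sd = summarize(ssc_lengths)
print(f"{avg:.1f} +/- {sample_sd:.4f} (sample SD)")
print(f"{avg:.1f} +/- {population_sd:.4f} (population SD)")
```

Comparing both forms against the reported figures would show which definition applies to Table 1.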
Repeat and SSR analyses
Each species has 28 to 38 repeats, distributed across 30 locations, including 11 to 14 forward repeats, 11 to 17 palindromic repeats, and 6 to 8 reverse repeats, ranging from 30 to 64 bp in length. There were 19 common repeat locations, of which 11 showed no variation and 8 varied in length. R3, R8, R11 and R13 showed the most variation (Figure 3). R12 (forward and reverse repeats) was distributed in the LSC, IRa, SSC and IRb. The R12 copy in the SSC lies almost opposite the R12 copy in the LSC, dividing the circular genome into two parts of nearly equal length. The repeats in the LSC were mainly concentrated near R12 (loci 29572–46282), loci 8166–8327, locus 29572 and locus 75230. The repeats in the IRs were constant within the genus. There were two common repeats in the SSC, one of which was a palindromic sequence shared by subgen. Acnida, subgen. Amaranthus, and A. albus.
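The repeat categories named above follow the usual convention for dispersed repeats: a forward repeat has two copies in the same orientation, a reverse repeat has a second copy that is the reversed first copy, and a palindromic repeat has a second copy that is the reverse complement of the first. The sketch below classifies a pair of exact copies under that convention. The function name and toy sequences are assumptions, and real repeat-finding tools also allow mismatches, which this sketch does not.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(copy1: str, copy2: str) -> str:
    """Classify an exact repeat pair as forward, reverse, palindromic or complement.

    Checks run in a fixed order, so a pair that satisfies several
    definitions (e.g. a homopolymer) is reported under the first match.
    """
    a, b = copy1.upper(), copy2.upper()
    if a == b:
        return "forward"
    if a[::-1] == b:
        return "reverse"
    if a.translate(_COMPLEMENT)[::-1] == b:
        return "palindromic"
    if a.translate(_COMPLEMENT) == b:
        return "complement"
    return "none"


# Toy copies for illustration only.
print(classify_repeat("AAGCT", "AAGCT"))  # same orientation
print(classify_repeat("AAGCT", "TCGAA"))  # reversed copy
print(classify_repeat("AAGCT", "AGCTT"))  # reverse complement
```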
MISA analysis showed that each Amaranthus cp genome contained 29–39 SSRs (Table 3). On average, the SSR types ranked from most to least abundant were mono-, tetra-, di-, tri-, penta- and hexa-nucleotides (Table 3). About 55.56% of the SSRs were composed of A or T bases. Most SSR loci were located in the LSC (77.78%) and in intergenic spacers (IGS; 71.91%). About 12 repeat motifs were shared by all species in the genus, while the remaining motifs were species-specific or subgenus-specific (Table 3). Different combinations of SSR markers could distinguish all species except A. standleyanus from A. crispus and A. dubius from A. spinosus (Table 3).
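To illustrate what an SSR scan does, here is a minimal regular-expression sketch that finds perfect mono- to hexa-nucleotide repeats. It is not MISA and does not reproduce its options (for example, compound SSRs are not handled). The minimum repeat counts are assumed thresholds, since the settings used are not given in this section, and the toy sequence is invented.

```python
import re

# Assumed minimum repeat counts per motif length; not taken from the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, n_repeats) tuples for perfect SSRs.

    Coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for size, n in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least n-1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT",
            # so that one locus is not reported under several motif lengths.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // size))
    return sorted(hits)


def at_only_share(ssrs):
    """Percentage of SSRs whose motif consists only of A and/or T."""
    if not ssrs:
        return 0.0
    return 100.0 * sum(set(motif) <= {"A", "T"} for _, _, motif, _ in ssrs) / len(ssrs)


# Toy sequence for illustration only.
toy = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "GG"
ssrs = find_ssrs(toy)
print(ssrs)
print(at_only_share(ssrs))
```

Counting the hits by motif length and by genomic region (LSC, SSC, IR; genic or intergenic) would give summaries of the kind reported in Table 3.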
Phylogenetic trees of whole chloroplast genomes
Despite the limited sampling, the results of this study were largely consistent with previous studies based on chloroplast gene sequences. A. palmeri, A. arenicola and A. tuberculatus in subgen. Acnida clustered together (BS/PP=100/1) (Figure 4). A. hybridus, A. hypochondriacus, A. dubius, A. spinosus and A. retroflexus clustered together (BS/PP=100/1) (Figure 4). These two clades were closely related (BS/PP=100/1) (Figure 4). A. albus and A. blitoides clustered together (BS/PP=35/0.84), separate from subgen. Albersia, and were closely related to subgen. Amaranthus and subgen. Acnida (BS/PP=58/0.99) (Figure 4). A. polygonoides formed a single basal branch. The remaining members of subgen. Albersia clustered into one branch (BS/PP=100/1) (Figure 4).
Hotspots for Amaranthus
Some of the qualifying fragments found by the exhaustive search overlapped, and overlapping fragments were merged into a single hotspot region. In total, 16 hotspot fragments with lengths of 737 to 2818 bp were obtained, with SNP frequencies ranging from 0.78% to 1.49% (Table S3). The trees constructed from the alignments of these hotspot fragments, and those constructed from the alignments of each gene and intergenic spacer, were consistent in topology with the whole chloroplast genome tree. The hotspots that supported the subgen. Amaranthus, subgen. Acnida and subgen. Albersia branches (excluding A. albus, A. polygonoides, and A. blitoides) with bootstrap values above 90% were ndhF-rpl32, ycf1 and rpoC2 (Figure S1).
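The step of combining overlapping candidate fragments into hotspot regions is a standard interval merge. The sketch below shows one way to do it. The function name, the 1-based inclusive coordinate convention and the example windows are assumptions made for illustration; the criteria used to select qualifying fragments are not described in this section and are not modelled here.

```python
def merge_overlapping(windows):
    """Merge overlapping (start, end) windows into non-overlapping regions.

    Assumption: coordinates are 1-based and inclusive, so two windows
    overlap when the next start is <= the current end.
    """
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Extend the current region if this window reaches further.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


# Hypothetical candidate windows (alignment coordinates), for illustration only.
candidates = [(400, 900), (100, 600), (2000, 2500), (850, 1200)]
for start, end in merge_overlapping(candidates):
    print(start, end, end - start + 1)  # region bounds and length in bp
```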
Among closely similar taxa, there were 25 InDels and 11 SNPs between A. tunetanus and A. standleyanus, whereas A. crispus and A. standleyanus showed no differences. There were 46 SNPs and 144 InDels between A. arenicola and A. tuberculatus. Sequence alignment and variation analysis showed that trnK-UUU-atpF, trnT-UGU-atpB, psbE-clpP, rpl14-rps19 and ndhF-D could be used to distinguish A. tunetanus from A. standleyanus and A. crispus, and A. arenicola from A. tuberculatus.
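Pairwise SNP and InDel counts of this kind can be obtained from two rows of an alignment. The following is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length with "-" as the gap symbol, a run of consecutive gap columns on the same sequence counts as one InDel event, and any mismatch between non-gap characters counts as a SNP. How the original counts were defined is not stated in the text, so totals from this sketch may differ.

```python
def count_variants(aligned_a: str, aligned_b: str):
    """Return (n_snps, n_indel_events) for two aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    snps = 0
    indels = 0
    gap_side = None  # which sequence carries the currently open gap run
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column gapped in both rows (from a multiple alignment)
        side = "a" if x == "-" else "b" if y == "-" else None
        if side is not None:
            if side != gap_side:
                indels += 1  # a new gap run starts here
        elif x != y:
            snps += 1
        gap_side = side
    return snps, indels


# Toy alignment for illustration only.
seq_a = "ATGC--TTAGC"
seq_b = "ATGAGGTT-GC"
print(count_variants(seq_a, seq_b))
```

Restricting the input to the columns of a single region, such as one intergenic spacer, gives per-locus counts that can be compared between species pairs.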