Complete Chloroplast Genome of Calligonum mongolicum: Genome Organization, Codon Usage Pattern, Phylogenetic Relationships, Comparative Structure and Adaptive Evolution Analysis

Calligonum mongolicum, a perennial shrub, is the dominant native species of the genus Calligonum and has the largest and most widespread geographic distribution in the arid deserts of northern China. Understanding the phylogenetic relationships between C. mongolicum and closely related species will guide the classification and identification of species and their varieties. The chloroplast (cp) genome is an optimal model for deciphering phylogenetic relationships and genome evolution in related plant families. In the present study, the complete cp genome of C. mongolicum was sequenced and characterized, and its genomic structure was compared with those of three other Polygonaceae species.

the chloroplast genome (cp genome), the mitochondrial genome, and the nuclear genome [11,12]. The complete cp genome, which is characterized by small size, a simple and highly conserved structure, uniparental inheritance, and haploid nature, is widely applied in species identification, phylogenetic analysis, and adaptive evolution analysis [13]. Huang et al. [14] sequenced the complete cp genomes of five Dicliptera species and compared the interspecies relationships among them. Xue et al. [12] compared the cp genomes of three economically important Prunus trees, identified differences in genomic structure, revealed candidate molecular markers, and inferred their phylogenetic relationships. To date, cp genome sequences have been studied in many plant species, for example watercress, yellow mustard, and Echinacanthus attenuatus [15][16][17]. Thus, the cp genome is an optimal model for deciphering phylogenetic relationships and genome evolution in related plant families.
The chloroplast is a photosynthetic organelle in plant cells that plays crucial roles in photosynthesis and in the biosynthesis of key metabolites, for example amino acids, starch, fatty acids, and pigments [18]. In general, the cp genome ranges from 120 to 160 kb in length, and the variation is mainly due to expansion, contraction, or loss of the inverted repeat (IR) regions [19]. The cp genome encodes unique genes and has a conserved quadripartite organization comprising a pair of IR regions (IRa and IRb), a large single-copy (LSC) region, and a small single-copy (SSC) region [20]. In most higher plants, the cp genome is highly conserved in gene number, gene order, and gene function [21]. However, gene losses and genome rearrangements have been found in some leguminous plants and conifer algae [22,23]. Since the first cp genome, that of tobacco, was published in 1986, more and more plant cp genomes have been completely sequenced, yet no complete cp genome has been reported for any species of Calligonum L. [24].
In the current study, the cp genome of C. mongolicum was first constructed using Illumina sequencing, combining de novo and reference-guided assembly. The characteristics of the whole cp genome were then described, and the synonymous codon usage (SCU) pattern, simple sequence repeats (SSRs), and long repeats were analyzed. In addition, we compared the cp genome of C. mongolicum with the published cp genomes of three related species in terms of genome structure, IR contraction and expansion, and selective pressure. This study may provide useful clues for adaptive evolution analysis of Calligonum species.

Materials
Wild seedlings of C. mongolicum were transplanted from the Minqin desert to the Lanzhou Scientific Observation and Experiment Field Station of the Ministry of Agriculture for Ecological System in the Loess Plateau Area (36°01′N 103°45′E, altitude 1700 m), Gansu, China, on 15 May 2019. The station belongs to the Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, and we were authorized to plant and cultivate plant species there. C. mongolicum is a common wild plant in the Minqin desert, so collecting or transplanting it for scientific research is not restricted in Gansu Province. Fresh leaf samples of C. mongolicum were collected on 22 July 2019. The sample collection procedure complied with our institutional guidelines. The voucher specimen was formally identified by an expert on plant taxonomy from the Gansu Grass Variety Committee and deposited in the Herbarium of the Lanzhou Institute of Husbandry and Pharmaceutical Science (CYSLS-CmZhang20190722). Samples were quickly frozen in liquid nitrogen and stored at -80 °C for subsequent analysis.

DNA Extraction, Illumina Sequencing, Assembly and Annotation
Genomic DNA was isolated using the Plant Genomic DNA Rapid Extraction Kit (Biomed Gene Technology) with the modified CTAB method [25]. DNA integrity and quality were checked by 1% agarose gel electrophoresis and a Qubit Fluorometer (Invitrogen). One library (350 bp) was constructed from the purified DNA with the NEBNext® Ultra™ DNA Library Prep Kit for Illumina®. The library was sequenced on an Illumina NovaSeq platform (Beganen Tech Solution CO., Ltd, Wuhan, China), generating 150 bp paired-end reads. Unknown and low-quality reads were filtered using SOAPnuke software (version: 1.3.0). Chloroplast-like reads were identified by BLAST (E-value ≤ 1e-5) against related species.
Finally, the chloroplast-like reads were assembled into a circular genome using NOVOPlasty (version: 32) with a k-mer size of 39. The sequences were annotated using GeSeq, which mainly covered prediction of protein-coding genes and annotation of non-coding RNAs (rRNA and tRNA) [26]. The circular genome map of C. mongolicum was drawn with the OGDRAW v1.2 program [27].

Synonymous Codon Usage Bias Analysis
Several codon usage indicators were calculated with the program CodonW version 1.3 (https://sourceforge.net/projects/codonw/), including the relative synonymous codon usage value (RSCU), the effective number of codons (ENC), the G + C content of the gene (GC), the frequency of G + C at the 3rd position of synonymous codons (GC3s), and the base compositions at that position (A3s, T3s, G3s, and C3s) [28]. The RSCU and ENC values were used together to describe codon usage patterns [29]. The G + C contents at the 1st, 2nd, and 3rd codon positions (GC1, GC2, GC3) and the average GC content of the 1st and 2nd positions (GC12) were calculated with the cusp function of EMBOSS (http://imed.med.ucm.es/EMBOSS/) [30]. Synonymous codons with RSCU values > 1.3 were identified as high-frequency codons. The optimal codon of a gene was taken to be the codon with both the highest RSCU value and the largest ΔRSCU [31]. A parity rule 2 (PR2) plot was constructed to show the relationship between A3/(A3 + T3) and G3/(G3 + C3), with the data distributed over four quadrants of a scatter diagram [32].
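As a rough illustration of the RSCU measure described above, the following minimal sketch computes RSCU values from a list of coding sequences and flags codons above the 1.3 threshold. It is not CodonW: the function names `rscu` and `high_frequency_codons` are hypothetical, and the standard codon assignments are assumed (plastid translation table 11 uses the same assignments).

```python
from collections import Counter
from itertools import product

# Standard codon assignments in TCAG order (assumed; same as plastid table 11).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for all sense codons observed in cds_list.

    RSCU = observed codon count / mean count of the codons that
    encode the same amino acid.
    """
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:          # skips codons with ambiguous bases
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        if aa == "*":                         # stop codons are excluded
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:                        # amino acid absent from the data
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def high_frequency_codons(values, threshold=1.3):
    """Codons whose RSCU exceeds the threshold (1.3 in the text)."""
    return sorted(c for c, v in values.items() if v > threshold)
```

For a toy input such as `rscu(["TTTTTTTTC"])`, TTT is used twice and TTC once, so TTT receives an RSCU of 4/3 and would be flagged as high frequency.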
ENC-plot analysis was performed to show the relationship between the ENC values and the GC3s values [33]. Neutrality plot analysis was used to examine the relationship between the GC12 and GC3 values of all the genes [34].
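The quantities behind the ENC-plot and PR2-plot can be sketched briefly. The expected curve below uses the commonly cited formula of Wright, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), with s = GC3s; this is assumed to be the standard curve referred to in the text. The helper `pr2_point` is hypothetical and counts every third codon position, which is a simplification that the original analysis may not share.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage depends only on GC3s (Wright's curve)."""
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def pr2_point(cds):
    """Return (A3/(A3+T3), G3/(G3+C3)) for one coding sequence.

    Assumption: all third codon positions are counted, not only those
    of selected codon families.
    """
    third = cds.upper()[2::3]                 # bases at codon position 3
    a, t, g, c = (third.count(b) for b in "ATGC")
    at_bias = a / (a + t) if a + t else float("nan")
    gc_bias = g / (g + c) if g + c else float("nan")
    return at_bias, gc_bias


if __name__ == "__main__":
    # Genes under mutation pressure alone should lie close to this curve.
    for s in (0.1, 0.3, 0.5):
        print(s, round(expected_enc(s), 2))
```

A gene whose observed ENC lies well below `expected_enc` at its GC3s value has stronger codon bias than base composition alone would predict, and a PR2 point below 0.5 on the A3/(A3 + T3) axis indicates that T is used more often than A at the third position.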

Long repeat sequences and SSRs analysis
Long repeats, including forward and reverse repeats, were identified with REPuter (http://bibiserv.techfak.uni-bielefeld.de/reputer) [30]. The Hamming distance was set to 0 and the minimal repeat size to 20 bp. SSRs were identified with the Perl script MISA (version: 1.0), using the following parameters: the threshold for mononucleotide SSRs was ten repeats, and the thresholds for dinucleotide and hexanucleotide SSRs were five repeats [35].
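The threshold-based SSR search can be illustrated with a short regular-expression scan. This is only a sketch in the spirit of MISA, not its implementation: `find_ssrs` is a hypothetical function, and the thresholds are supplied by the caller because only the mono-, di-, and hexanucleotide values are stated in the text.

```python
import re

# Minimum repeat counts stated in the text: motif length -> repeats.
# Thresholds for other motif lengths are not given and must be supplied.
STATED_THRESHOLDS = {1: 10, 2: 5, 6: 5}


def find_ssrs(seq, min_repeats):
    """Return sorted (1-based start, motif, repeat count) tuples.

    min_repeats maps motif length to the minimum number of tandem copies.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif followed by at least (min_rep - 1) further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" is a mononucleotide repeat, not a dinucleotide one.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start() + 1, motif, len(m.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GG" + "A" * 12 + "CC" + "AT" * 6 + "GG"   # toy sequence
    print(find_ssrs(demo, {1: 10, 2: 5}))
```

The toy sequence contains one A(12) run and one (AT)6 run. Compound SSRs and the maximum distance between adjacent SSRs, which MISA can also report, are not handled here.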

Phylogenetic Analysis
A phylogenetic tree was constructed by neighbor-joining (NJ) analysis with 1000 bootstrap replicates using TreeBeST (Version: 1.9.2, http://www.mybiosoftware.com/treebest-1-9-2-softwaresphylogenetic-trees.html). The cp genomes of C. mongolicum and 36 other species were used to investigate the evolution of C. mongolicum. The cp genome data of the 36 plant species were downloaded from the NCBI database.

Genome Structure Comparison
Based on the results of the phylogenetic analysis, the complete cp genome of C. mongolicum was compared with those of three closely related species, Rumex acetosa (NC_042390.1), Rheum palmatum (NC_027728.1), and Fagopyrum esculentum (NC_010776.1), using the mVISTA program in Shuffle-LAGAN mode [36]. The annotation of C. mongolicum was used as the reference. The IRscope tool was used to visualize the genes at the junction sites of these four closely related species [37].

Selective pressure analysis
Orthologous genes of the Polygonaceae family were identified by OrthoMCL [38]. The sequences of each orthologous gene were aligned using MAFFT [39]. The non-synonymous substitution rate (KA) and the synonymous substitution rate (KS) were calculated by PAML [40]. The ω value was defined as the ratio KA/KS.
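The interpretation of ω can be written down as a small helper. This sketch assumes that KA and KS have already been estimated (for example by PAML); the function names are hypothetical, and treating ω = 1 as neutral follows the usual convention rather than anything stated in the text.

```python
def omega(ka, ks):
    """Return KA/KS, or None when KS is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def selection_type(w):
    """Classify a gene by its omega value."""
    if w is None:
        return "undefined"
    if w > 1:
        return "positive"
    if w == 1:
        return "neutral"      # conventional reading, assumed here
    return "purifying"


if __name__ == "__main__":
    print(selection_type(1.0556))            # value reported for psbK
    print(selection_type(omega(0.0, 0.02)))  # hypothetical rates giving omega = 0
```

Genes with no non-synonymous substitutions have ω = 0 and are classified as being under purifying selection, while genes with KS = 0 are left undefined instead of being assigned an arbitrary value.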

Features of C. mongolicum Cp Genome
The cp genome of C. mongolicum was 162,124 bp in length and comprised a pair of IR regions (IRa and IRb; 30,512 bp each), a large single-copy (LSC) region of 87,718 bp, and a small single-copy (SSC) region of 13,382 bp (Fig. 1). The cp genome was enriched in A/T nucleotides: its A + T content was 62.5%, markedly higher than its overall G + C content. The A + T content of the IR regions was 58.66%, clearly lower than that of the LSC and SSC regions (64.42% and 67.51%, respectively) (Table 1). Weak base composition asymmetry (A-T, C-G) was found in the C. mongolicum cp genome.
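The reported region sizes and A + T contents can be cross-checked with simple arithmetic. The sketch below assumes that the 30,512 bp figure applies to each IR copy and that both copies share the reported IR A + T content; the function names are hypothetical.

```python
# Region lengths (bp) and A+T contents (%) as reported in the text.
REGIONS = {
    "LSC": (87_718, 64.42),
    "SSC": (13_382, 67.51),
    "IRa": (30_512, 58.66),   # assumed: the IR figures apply to each copy
    "IRb": (30_512, 58.66),
}


def genome_length(regions):
    """Total length as the sum of the four regions."""
    return sum(length for length, _ in regions.values())


def weighted_at_content(regions):
    """Genome-wide A+T content as a length-weighted mean of the regions."""
    total = genome_length(regions)
    return sum(length * at for length, at in regions.values()) / total


if __name__ == "__main__":
    print(genome_length(REGIONS))                  # 162124
    print(round(weighted_at_content(REGIONS), 1))  # 62.5
```

Both values agree with the genome length of 162,124 bp and the overall A + T content of 62.5% given above.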
The positions of the 131 functional genes annotated in the C. mongolicum cp genome are shown in Fig. 1. Of these, 78 were protein-coding genes, accounting for more than half (59.5%) of the total. The remaining genes comprised 45 tRNA genes and 8 rRNA genes. According to their functions, the annotated genes were classified into four classes: photosynthesis, self-replication, biosynthesis, and unknown function (Table S1). Seventeen genes were duplicated in the IR regions, including 6 protein-coding genes, 4 rRNA genes, and 6 tRNA genes.
In the C. mongolicum cp genome, 12 genes (5 tRNA genes and 7 protein-coding genes) had a single intron and two exons, whereas the protein-coding genes ycf3 and clpP had two introns and three exons (Table 2). Among the intron-containing genes, trnK-UUU had the largest intron (2511 bp) and trnL-UAA the smallest (520 bp).

Synonymous Codon Usage Analysis
A total of 51 coding sequences (CDSs) longer than 300 bp were selected for synonymous codon usage (SCU) bias analysis. In general, the four nucleotides were unevenly represented in the 51 CDSs. Adenine (A) and thymine (T) were the most represented (43.3% and 46.4%, respectively), whereas cytosine (C) and guanine (G) were the least represented (16.7% and 16.9%, respectively). The average GC content of the CDSs was 38.7%. Excluding stop codons, 61 codons were analyzed; of these, 18 codons with RSCU values greater than 1.3 were identified as high-frequency codons, and 29 codons with ΔRSCU values greater than 0.08 were identified as highly expressed codons (Table 3 and Table S2). Seven codons that were both high-frequency and highly expressed (TTT, GGA, CAT, AAA, TTA, AAT, and CCT) were identified as the optimal codons.
To further analyze the SCU pattern in the C. mongolicum cp genome, PR2 plot, ENC-plot, and neutrality plot analyses were conducted together. The PR2 plot showed that the genes were distributed unevenly across the four quadrants centered on 0.5, with most points located below the horizontal center line (A3/(A3 + T3) < 0.5) (Fig. 2a). The ENC-plot was used to analyze codon usage variation among the 51 CDSs (Fig. 2b). Most points lay away from the expected curve and were relatively concentrated, whereas a few (rpl16, ycf2, ycf3, and others) lay on the curve. In addition, we performed neutrality plot analysis to reveal the relationship between GC12 and GC3 (Fig. 2c). Only ycf2 lay on the expected curve, whereas the remaining genes lay above the standard curve.

Long-Repeat Sequences and Simple Sequences Repeats (SSRs) Analysis
Long repeat sequences in the C. mongolicum cp genome were searched with REPuter. A total of 50 long repeats were detected, of which 44 were forward and 6 were reverse repeats (Table 4). The largest share of long repeats (47%) was located only in intergenic spacers (IGSs), 39% were located in genes, and the remaining 14% spanned both IGSs and genes. Notably, all six reverse repeats were located in IGSs. In addition, 17, 1, 12, and 10 repeats were confined to the LSC, SSC, IRa, and IRb regions, respectively, and another 10 repeats were detected in two regions simultaneously. The ycf1 CDS possessed the highest number of long repeats (14) and the longest repeat (45 bp).
A total of 244 SSRs were found in the C. mongolicum cp genome using the MISA Perl script. Among the identified SSRs, 67.2% were located in the LSC region, and 23.0% and 9.8% were found in the IR and SSC regions, respectively (Fig. 3a). In total, 158 SSRs were located in IGSs, 80 in coding regions, and only 6 in introns (Fig. 3b). The numbers of mono-, di-, tri-, and tetranucleotide repeats were 147, 43, 4, and 7, respectively (Fig. 3c). Mononucleotide repeats were the most frequent, accounting for 60.2% of all repeats, while dinucleotide repeats accounted for 17.6%, and other SSRs were less common. Among all the identified SSRs, 20 belonged to the G/C type and the remainder to the A/T type.

Phylogenetic Analysis
The phylogenetic tree was constructed from a multiple alignment of the nucleotide sequences of complete cp genomes from 37 plant species (Fig. 4). Drosera rotundifolia was used as the outgroup. The species of the Polygonaceae family formed a clade, in which C. mongolicum clustered closely with R. acetosa, R. palmatum, Oxyria sinensis, and F. esculentum. Furthermore, R. acetosa was the species most closely related to C. mongolicum.

Comparative Analysis of Genomic Structure
The cp genome of C. mongolicum was compared with those of its close relatives R. acetosa, R. palmatum, and F. esculentum (Table 5). C. mongolicum had the largest cp genome, the largest SSC region, and the most tRNA genes. F. esculentum had the smallest cp genome and the largest LSC region. To further examine genome divergence among these four species, sequence identity was compared using mVISTA with C. mongolicum as the reference (Fig. 5). In general, the IR regions were relatively conserved, while the LSC and SSC regions were more divergent. Non-coding regions were more divergent than coding regions, for example the IGS regions between rps16 and trnQ-UUG and between ycf3 and trnS-GGA. In addition, significant differences were found in coding genes (petD and ndhA) and in a non-coding RNA gene (trnI-GAU).

IR Contraction and Expansion
The LSC/IR and SSC/IR boundaries of the cp genomes of C. mongolicum and the three related species were compared (Fig. 6). Six different genes were located at the junctions of the LSC/IRb (rps19 and rpl2), IRb/SSC (ndhF), SSC/IRa (rps15 and ycf1), and IRa/LSC (rpl2 and trnH) borders. The ndhF gene crossed the IRb/SSC border, with 62-95 bp located within the IRb region. Compared with the other Polygonaceae species, the IRb/SSC and SSC/IRa borders of C. mongolicum differed greatly. The LSC/IRb and IRa/LSC borders were relatively conserved in C. mongolicum, R. palmatum, and F. esculentum; however, the rps19 gene at the LSC/IRb border and the trnH gene at the IRa/LSC border in R. acetosa differed from those of the other three species.

Selective pressure events
A total of 75 orthologous protein-coding genes were found in the Polygonaceae family. The ω values of most genes were lower than 1, except for the psbK gene in the LSC region, which had an ω value of 1.0556 (Fig. 7). The ω values of some genes were 0, such as psbI, petN, ycf3, psbE, petG, rps12, and ndhE.

Discussion
Features of the C. mongolicum Cp Genome
Cp genomes of land plants are mostly conserved in structure, gene content, and gene organization [41]. Generally, the cp genome has a typical quadripartite circular structure composed of two IR regions, a large single-copy (LSC) region, and a small single-copy (SSC) region [42]. However, Tao et al. [21] reported that alfalfa has an unusual cp genome structure with only one IR region. In addition, linear cp genomes exist, in contrast to the single circular molecule typical of plant cp genomes [43]. In this study, the complete cp genome of C. mongolicum showed a typical circular, quadripartite structure, consistent with the relatively conserved cp genome of land plants. Although the structure of cp genomes is conserved overall across plant species, their size varies from 107 kb to 218 kb [19]. The cp genome of C. mongolicum was ~162 kb, somewhat longer than those of the closely related species R. acetosa (~160 kb), R. palmatum (~161 kb), and F. esculentum (~159 kb). We also found that the A + T content of the IR regions was significantly lower than that of the other regions, similar to observations in Prunus species, Quercus acutissima, Phleum pratense, and others [14,42,44].
In land plant cp genomes, gene and intron content are highly conserved, although losses have been found in many angiosperms [11]. Funk et al. [45] found that ndh genes were lost in Cuscuta reflexa and speculated that these genes had either been transferred to the nucleus or did not take part in critical developmental processes. Here, we analyzed the gene and intron content of the C. mongolicum cp genome. The cp genome of C. mongolicum exhibited a complete set of genes (131), suggesting that these genes might be critical to its development. Our results were similar to the finding of 130 genes in the Nelumbo nucifera cp genome [46]. The cp genomes of the earliest diverging angiosperms contain the complete repertoire of 18 intron-containing genes [11]. Additionally, intron loss within protein-coding genes often occurs in plants, for example in Cicer arietinum, Mahonia bealei, and Hordeum vulgare [47][48][49]. The proteins encoded by genes with intron loss have diverse functions. In the C. mongolicum cp genome, we found a total of 14 different intron-containing genes and 4 intron losses, in ycf1, rpoC2, rps19, and ndhF. The genes with intron losses might endow C. mongolicum with diverse functions related to RNA polymerase, ribosomal proteins, and NADH dehydrogenase.

Synonymous Codon Usage Pattern
SCU bias reflects the uneven usage of synonymous codons encoding the same amino acid, and it differs among species and genes [50]. The possible causes of SCU bias have been investigated in the genomes of numerous organisms, for example Zea mays, cotton, and Arabidopsis [51][52][53]. In this study, 51 CDSs of the C. mongolicum cp genome were selected to analyze SCU bias. AT/GC nucleotide usage differed among the three codon positions, and the genes showed a preference for AT-ending codons, revealing an imbalance between A/T and G/C at the third base position. This was further confirmed by PR2 analysis. Similar observations have been reported in Elaeagnus angustifolia, Porphyra umbilicalis, and others [17,54].
Under random mutation or directional mutation pressure alone, base content should be similar at the three positions of each codon [31]. The observed preference for A/T-ending codons therefore suggests that natural selection acts alongside mutation pressure. To analyze the roles of these two major evolutionary factors in codon usage in C. mongolicum, we performed ENC-plot and neutrality plot analyses. ENC-plot analysis reflects the relationship between the ENC value and GC3s, thereby detecting SCU variation among genes [33]. Deviation of genes from the standard curve indicates that factors other than mutation pressure are involved [33]. In our study, a few genes lay on the curve, which can be attributed to strong mutation pressure. However, most points lay well below the curve, suggesting that the SCU bias of most genes in the C. mongolicum cp genome is shaped by other factors, for example natural selection. This hypothesis was largely supported by the neutrality plot analysis, which is useful for comparing the impacts of selective constraint and mutation on SCU [55]. A low correlation between GC12 and GC3 shows that base composition differs among the three codon positions and that the GC content of the cp genome is highly conserved, indicating that natural selection is the most important determinant of codon usage patterns [56]. In the present neutrality plot, no correlation was found between GC3 and GC12, indicating a strong difference between positions and suggesting that natural selection is crucial for SCU bias in the C. mongolicum cp genome.
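The neutrality plot argument rests on the regression of GC12 on GC3. The sketch below shows how the slope and Pearson correlation could be computed from per-gene values; `neutrality_fit` is a hypothetical name, and the numbers in the usage example are invented for illustration and are not data from this study.

```python
from math import sqrt


def neutrality_fit(gc3, gc12):
    """Return (slope, Pearson r) for the regression of GC12 on GC3."""
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long series with at least 2 genes")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("slope or correlation undefined for a constant series")
    return sxy / sxx, sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented example values, not taken from the paper.
    slope, r = neutrality_fit([0.20, 0.30, 0.40], [0.41, 0.42, 0.43])
    print(round(slope, 3), round(r, 3))
```

Under this reading, a slope close to 1 with a strong correlation would point to mutation pressure acting equally on all codon positions, whereas a slope near 0 or a weak correlation, as described in the text, is taken as evidence that selection dominates.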

Molecular Markers
Long repeat sequences are associated with sequence divergence and rearrangement of cp genomes through illegitimate recombination and slipped-strand mispairing [57]. In the C. mongolicum cp genome, more long repeats were found in the LSC region than in the IR and SSC regions, confirming the uneven distribution of long repeats in cp genomes. Many studies indicate that sequence divergence is higher in the LSC and SSC regions than in the IR regions, and higher in IGSs than in coding regions [58,59]. Thus, the long repeats found in the LSC and SSC regions, for example in ycf3, ndhA, psaA, and psaB, might help reduce sequence mutations in these regions. Because of its abundant variable sites, ycf1 has been reported as a DNA barcode [60]. Huang et al. [14] recommended ycf1 as a candidate molecular marker for Dicliptera species. As in Dicliptera species, ycf1 harbored the highest number of long repeats and the longest repeats in the C. mongolicum cp genome, so we speculate that ycf1 might serve as an important molecular marker for phylogenetic studies in Calligonum species.
SSRs, also known as microsatellites, take many forms in the cp genome, and SSR copy numbers vary between plant species [61,62]. SSRs are commonly used as genetic markers in plant population genetics, polymorphism investigations, and evolutionary research [16,63]. Unlike nuclear SSRs, cp SSR variation is particularly valuable in evolutionary studies because it is sensitive to population genetic effects and allows maternal gene flow among populations to be traced [64]. In the present study, the SSRs identified in the C. mongolicum cp genome were mostly of the A/T type, similar to observations in three Prunus species and Nasturtium officinale [12,15], supporting the supposition that cp SSRs are generally composed of poly-adenine/thymine (polyA/T) repeats [42]. As with long repeats, the larger number of SSRs in the LSC region than in the IR and SSC regions implied an uneven distribution of repeats in the cp genome. In addition, SSRs are indicators of important hotspots for genome recombination [14]. SSRs in the C. mongolicum cp genome were mainly located in IGSs, which are highly variable in the cp genome, suggesting that these regions could be treated as mutational hotspots. The long repeats and SSRs identified in this study might provide additional candidate genetic markers for species identification and phylogenetic research in Calligonum species.

Phylogenetic Analysis and Genomic Structure Comparison
The cp genome carries extensive phylogenetic information and is commonly used for phylogenetic reconstruction and population studies [65,66]. To determine the phylogenetic position of C. mongolicum and to clarify evolutionary relationships within Polygonaceae, we performed phylogenetic analysis based on the cp genome of C. mongolicum and the cp genomes of 36 other plant species. In the phylogenetic tree, all 5 Polygonaceae species studied clustered together, and C. mongolicum was placed as the neighboring clade to the R. acetosa branch with a 100% bootstrap value, fully confirming the genus Calligonum as a member of Polygonaceae. Thus, the constructed phylogenetic tree helps confirm the phylogenetic position of C. mongolicum and will aid future work on the phylogenetic relationships among more Polygonaceae species.
Although cp genome structure is highly conserved in most plants, the sizes of the four regions vary [49,67]. The genomic structure comparison of four Polygonaceae species including C. mongolicum showed that the four regions differed in size, mainly because the species belong to different genera. Among the four Polygonaceae cp genomes, more divergence was found in the LSC region, and non-coding regions were more divergent than coding regions. Our results were similar to reports in other angiosperms, for example Kaempferia galanga [68] and four Echinacanthus species [17], implying that the lower divergence in the IR regions and coding regions might be attributable to copy correction during gene conversion. Moreover, although the IRs are the most conserved regions, expansions and contractions frequently occur at the SSC/IR and LSC/IR boundaries; these reflect relationships among taxa and are therefore regarded as evolutionary signals [69]. Compared with the C. mongolicum cp genome, the IR regions of the R. acetosa cp genome exhibited a slight contraction, whereas those of F. esculentum and R. palmatum exhibited a slight expansion. Thus, contractions and expansions at these boundaries contribute to the size variation among the cp genomes of the four Polygonaceae species.

Adaptive evolution analysis
The improvement of a species' adaptability in the course of evolution is reflected by adaptive evolution [14]. Adaptive evolution, driven by evolutionary factors such as natural selection, leads to pressures and diversity at different levels of biological organization [70]. Makalowski and Boguski [71] reported that the ω value has been widely applied to identify evolutionary dynamics and explore adaptive characteristics among species. A gene is under positive selection when its ω value is greater than 1; otherwise, the gene is negatively selected. Positively selected genes play important roles in adaptation to diverse environments [17]. In the present study, the ω value of psbK was greater than 1, suggesting that psbK was under positive selection. Similar to our result, psbK was also detected under positive selection in karst topography [17]. Thus, we speculate that psbK could play an important role in the adaptive evolution of Polygonaceae species to diverse environments, and its specific function needs to be validated in further studies.

Conclusions
In this study, the complete cp genome of C. mongolicum was comprehensively described for the first time, including genome features, SCU bias, identification of long repeat sequences and SSRs, and adaptive evolution analysis. The cp genome of C. mongolicum had a typical quadripartite structure, and 131 functional genes were annotated, while 4 intron losses were found, in ycf1, rpoC2, rps19, and ndhF. Codons in the C. mongolicum cp genome showed a preference for A/T endings, possibly caused mainly by natural selection constraints. The phylogenetic analysis of 37 species revealed that C. mongolicum was closely related to R. acetosa. In addition, comparative analysis of the genomic structure of C. mongolicum and three other Polygonaceae species revealed more divergence in the LSC and SSC regions than in the IR regions, and more divergence in IGSs than in coding regions; these divergent regions could be treated as mutational hotspots. Expansions and contractions at the SSC/IR and LSC/IR junctions were also analyzed. A total of 50 long repeats and 244 SSRs were identified. The adaptive evolution analysis showed that only the psbK gene was positively selected; thus psbK might play a crucial role in the adaptation of Polygonaceae species. In summary, our results lay a solid foundation for further studies on molecular marker development, phylogenetic signatures, and population genetics in Calligonum species.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Availability of data and materials
The raw sequence data of C. mongolicum