Population genetic structure and gene flow of rare and endangered Tetraena mongolica Maxim revealed by reduced representation sequencing

DOI: https://doi.org/10.21203/rs.3.rs-19749/v1



Studying population genetic structure and gene flow of plant populations and their influence factors is crucial in field of conservation biology, especially rare and endangered plants. Tetraena mongolica Maxim (TM), belong to Zygophyllaceae family, a rare and endangered plant with narrow distribution. Due to excessive logging, urban expansion, industrial development and development of the scenic spot in the last decades, has caused habitat fragments and decline.


In this study, the genetic diversity, the population genetic structure and gene flow of TM populations were evaluated by reduced representation sequencing technology, a total of more than 133.45 GB high-quality clean reads and 38,097 high-quality SNPs were generated. Analysis based on multiple methods, we found existing TM populations have moderate levels of genetic diversity, very low genetic differentiation and high levels of gene flow between populations. Population structure and principal coordinates analysis showed that 8 TM populations can be divided into two groups, Mantel test detected no significant correlation between geographical distances and genetic distance for the whole sampling. The migration model indicated that the gene flow is more of an north to south migration pattern in history.


Our study demonstrate that the present genetic structure is mainly due to habitat fragmentation caused by urban sprawl, industrial development and coal mining. For recommendations of conservation management, all 8 populations should be protected as a whole population, rather than just those in the core area of TM nature reserve, especially the populations near the edge of TM distribution in cities and industrial areas deserve our special protection.


The population genetic structure is the result of the interaction between ecological and genetic processes, the study of genetic structure and gene flow are an important part of understanding the genetic characteristics and the dynamics of population, especially, for rare and endangered species with a limited geographical range, because their populations are more vulnerable to damage, gene flow is considered one of the most important determinants of the genetic structure of plant populations [1], in addition to the influence of plant characteristics, historical events of the population, ecological factors, natural selection factors, interference of human activities, historical events and other factors will also lead to the formation of genetic structure in the population. Habitat fragmentation by human activity are serious threats to the viability of plant populations that may causes the population to shrink or disappear [2], the primary driver of population decline is that continuous population and large habitats were divided into isolated patches and smaller pieces, some research have shown that urban development does restrict gene flow among populations or individuals even within areas that comprise largely natural habitat [3], and further causes habitat degradation and fragmentation [4].

Genetic diversity, population genetic structure and gene flow studies have early relied on some traditional molecular markers (Random Amplified Polymorphism DNA, Simple Sequence Repeats, Amplified Fragment Length Polymorphism, Restriction Fragment Length Polymorphism and so on) and thus provided a limited overview, however, these markers have their limitations in resolution the geographical differentiation of population, especially when using few genetic markers, in addition, developing traditional molecular markers takes time and effort, complicated steps and costly. At present, whole genome sequencing is still cost, especially sequencing of hundreds samples, with the rapid development of high-throughput sequencing technology, so far, scientists have developed various reduced-representation genome sequencing by restriction enzyme digestion of genomic DNA [5-7], genotyping-by-sequencing (GBS) methods are increasingly applied to SGS and gene flow analysis [8-11]. GBS method can obtain a large number of SNPS to allow resolving patterns of genetic diversity and spatial structure at very fine scale [12], compared to other molecular markers, GBS approaches have well suited to unravel genetic diversity, genotyping and genetic structure in nonmodel species without reference genome [13]. Employing large GBS datasets to biogeographic studies can vastly increase their accuracy and resolution relative to traditional molecular markers, especially subtle changes in the genetic structure of the population.

Tetraena mongolica Maxim(TM) belongs to Zygophyllaceae family, is an endangered species, it is only distributed in the Alashan Ordos area of the Inner Mongolia in china, its habitat belongs to ecologically fragile area by extremely low annual rainfall, TM is able to survive in these places by depending on its very developed roots, and has been got a lot of attention for its extremely resistant to drought and cold, Windbreak and sand-fixation. However, due to excessive logging (as a firewood) and overgrazing, urban expansion, industrial development and coal and gas mining, habitat fragmentation amd the numbers of TM have declined significantly in recent decades, the area of TM decreased by 225.2 km2 (from 629.5 km2 in 1990 to 404.3 km2 in 2010) declined by up to 35.8%, and the reduced area was mainly transformed into mining area, sandy land, farmland and urban area [14](Wang, 2012). It has been listed as first degree national protective plant at present. Previous studies have been mainly focusing on reproductive characteristics [15,16], anatomy, vegetative cuttings [17], chemical components [18,19], eco-physiological adaptation [20], landscape pattern [14], influence of mining [21], salt-tolerance and drought-tolerance genes [22,23], genetic diversity analysis [24-26] and so on. In order to study the population structure and endangered mechanism of TM, it is urgent to further study TM from the perspective of molecular biology and ecology. Therefore, studying TM population genetic structure and gene flow is of great significance for the protection of this species and the ecosystem of this region. As a representative material for studying endangered species protection in desert ecosystems, TM has the following advantages: 1) A limited distribution. 2) The biological community structure in TM distribution area is relatively simple, TM is the dominant species. 3)  The history of urban construction in TM area is short and the interference of human activities is clear.

Understanding the geographic patterns of genetic variation, population structure and gene flow of this plant will be crucial for their effective conservation, we used reduced representation sequencing technology (SuperGBS method ) to evaluate patterns of genetic diversity, structure and gene flow of TM population across its entire geographic distribution range, based on a genome-wide SNPs analysis, to reveal new understanding of genetic structure and gene flow of TM under urbanization, apparent hydrographical and geological barriers (Such as the Yellow River, ZhuoZi mountain). We characterize biogeographic patterns of TM population and possible factors for its formation. we assess “isolation by distance” based on large-scale spatial genetic structure. According to these results, finally, we put forward some constructive suggestions for better protection and follow-up management.


SNPs Discovery, Genotype and Population Genetic Diversity

120 TM accessions was sequenced using Illumina Hiseq Xten, PE150 Platform, after the primary quality filtering step, a total of 981,376,270 reads, 133.45 Gb clean reads were generated, average of per accession were 1.11 Gb reads, according to the comparison results, the average sequencing depth of all samples was 104.17×, and the coverage ranged from 77.89% to 89.66%. With a minimal set of initial quality filters, we obtained 40,931 SNPs, after rigorously screening (standards: the missing rate < 0.2, MAF ≥ 0.01, DP > 4), 38,097 SNPs genotyping data were obtained, several values of genetic diversity were calculated, all 120 individuals as a whole, the Ne varied from 1.020 to 2 and average was 1.616, the levels of Ho and He ranged from 0.067 to 0.933 and 0.019 to 0.500 with an average of 0.446 and 0.348 respectively, the PIC values varied from 0.019 to 0.375 with an average of 0.274 (Table 1), a total of 35,414 (92.95%) SNPs had a MAF more than 0.2, a high degree of polymorphism (MAF > 0.30) was 18,800 and accounted for 49.34 % of the dataset, the nucleotide diversity (π) varied between 0.019 and 0.502 with an average of 0.349. For each population, HN population had the highest He, Ne, PIC and π among all eight populations (average 0.341, 1.608, 0.269 and 0.354 respectively), these four numbers was the lowest at SZS population (0.337, 1.599, 0.265 and 0.349 respectively). Above data indicated that TM had moderate level of genetic diversity on the whole. For different populations, HN population showed a higher genetic diversity than 7 other populations, according to the genetic diversity data and fine geographic scale analysis, HN population was middle ground of north-south intersection in TG distribution area, because north wind blows all year round and the Yellow River run from south to north in TM distribution area, we speculate that HN population most likely accepted large amounts pollen from other populations or gene flow mediated by seeds, in addition, HN population  located in the national nature reserve of TM, therefore, HN populations have higher levels of genetic diversity than other populations, this result was consistent with Ge et al. [25]. Conversely, SZS population located to the southernmost point in TM distribution area west of the Yellow River, it was adjacent to several big industrial park in HuiNong district and backfill zone of open-pit coal mine, coal mining and industrial development have caused fragmentation and pullution of SZS population local habitat, this result suggests that the SZS population may have been experiencing a slowly reduction in genetic diversity as it is affected by human activities.

Population structure and phylogenetic analysis

To understand the population genetic relationships among all TM individuals, ADMIXTURE (version 1.3.0) software was used to evaluate genetic structure, the K-value that represent the number of clusters of all 120 TM individuals during the analysis, the optimal K-value was selected according to CV, in this study, CV ranged from 0.436 to 0.575 (Figure 2A), we mainly focused on both cases (K-value = 2 and K-value = 3), because they most clearly showed the major structure of these 8 populations. When the CV was 0.436, the optimal K = 2, it indicates that 120 TM individuals from eight populations were divided into two groups with highest probability (Figure 2B), one group was composited of QLS, TST, WJM population’s all individuals and 7 individuals of HN population, another group contained WD, MHG, SDQZ, SZS population’s all individuals and 8 individuals of HN population. it was compared geographical locations to analysis, the two groups were divided by Wuhai urban area as the boundary, HN population distributed in near the Hainan district, Wuhai city, it was middle ground of north-south intersection in TM distribution area, it suggest that this population could be a pathway and bridge between the two groups, but to consider from a different perspective, urban expansion and human activities caused a certain degree of fragmentation of the TM population. At this scale, increasing values of K (K = 3), all 120 TM individuals from eight populations were split into three groups  (Figure 2B), first group was composited of all individuals of QLS, TST, WJM remained grouped a single cluster (blue in Figure 2B), second group contained four populations (WD, MHG, HN, 13 individuals of SDQZ), third group mainly composited of SZS and 2 individuals of SDQZ. In terms of geographical distribution pattern, individuals in first group came from the north of Wuhai urban area, four populations in second group distributed around the Wuhai urban area, third group composited of two populations perched on the south of Wuhai urban area. Although there was also a large number of genotype admixtures, a certain genetic structure was found between different populations, we determined finally that K=2 was the most suitable number of genetic cluster based on the experimental populations corresponded spatially with major geographical features.

To complement ADMIXTURE analyses and to better understand genetic structure, we PCA analysis based on obtained SNPs among all 120 TM individuals, a benefit of PCA is that it can detect fine scale genetic structure between populations, even there were extensive gene flow between these populations [27]. PCA analysis generally confirmed the pattern of genetic structure obtained with K = 2. Our results showed two principal components accounted for 2.19% (PC 1) and 1.67% (PC 2) of total variability, respectively (Figure 3), all 120 TM individuals can be divided into two distinguished groups at PC1, one was composited of QLS, TST, WJM population’s all individuals, another contained HN, WD, MHG, SDQZ and SZS population’s all individuals, in this way, the first group (QLS, TST, WJM) individuals showed low and negative values, another groups individuals displayed positive values. Under PC 2, 120 TM individuals roughly splited three clusters: one comprising all individuals of QLS, TST, WJM population, second composed only of individuals of SZS population, third contained WD, MHG, SDQZ, HN individuals. The result of PCA analyses showed genetic clusters be similar to the result of ADMIXTURE analyses, illustrating a clear divergence between groups. Moreover, compared to other populations, high discretization of individuals of SZS implied this population was more heterogeneity and may be seriously disturbed. During resource survey and sampling, we found that SZS was sandwiched between the Yellow River and the national highway and was close to the industrial park and backfill zone of open-pit coal mine in HuiNong district, judging by this, it is a small population separated from early urban sprawl and industrial development, combined with our PCA data, this result further supported our hypothesis that SZS population is most disturbed by human activities.

To better visualise sample distribution and relationships, a phylogenetic trees of 120 TM individuals were constructed (Figure 4), within the population, the individuals of each population get together except HN, but there are very clear genetic boundaries between different populations. 15 individuals of HN population were divided into two parts and grouped into different clades, 9 individuals of HN population and MHG population got together, the rest of 6 individuals of HN population clustered a clade alone, QLS, TST, WJM were close and grouped together into a clearly large cluster, WD and SDQZ clustered together (but a few individual position swaping, e. g. individual SDQZ8 and individual WD6), 15 individuals of SZS population clustered separately, SDQZ2 formed a branch alone, visibly, most genetic variation occurs primarily between populations of TM, in addition, HN population was assigned to different branches that suggested introgression exist.

Population structure, principal component and phylogenetic relationship indicated existing 8 TM population were divided into two groups with high genetic resolution, Zhi et al.[26] have evaluated the genetic structure in TM populations by only 12 microsatellites SSRs, their results showed no distinguishable genetic clusters were detected in populations. However, our study found the obvious genetic clusters and a sign that TM population is beginning to differentiation base on 38,097 SNPs genotyping data, We hypothesized that as the number of detected sites increased (38,097 SNPs VS 12 SSRs), more subtle genetic structures were discovered. Compared the PCA with fine geographic scale analysis, Wuhai urban area is the north-south boundary of TM population.

Genetic differentiation and gene flow

Although the phylogenetic relationship and population structure analyses showed that TM populations were divided into two groups clearly, surprisingly, we detected very low levels of genetic differentiation and high levels of gene flow (Nm) between populations, genetic differentiation coefficient (Fst) ranged from 0.010 to 0.026 among 8 populations and average 0.020 within 120 TG individuals (Table 2), Wright [28]believed that if the Fst value of the populations was between 0 and 0.05, it indicated that there was no genetic differentiation among populations, if Fst value is between 0.05 and 0.15, it is moderately differentiated, if Fst value is between 0.15 and 0.25, it is highly differentiated, in this case, 8 TM populations without differentiation should be understood as a whole in our experimental, our results on genetic differentiation are consistent with those of Ge et al. [24] and Zhi et al.[26]. Gene flow varied from 9.750 to 39.169 and average 12.25 between 8 populations in our study (Table 2), according to Wright [28], the Nm was divided into three grades: high (≥1.0), medium (0.250-0.99) and low (0.0-0.249) [29], when Nm > and 1 existed, there was certain gene flow between populations, our data suggested that there was frequent gene flow mediated by pollen or seeds between the 8 populations. In previous studies, by comparing the the atpB-rbcL noncoding spacer region of the chloroplast DNA of TM, the findings of Ge et al. showed high levels of genetic differentiation (Fst of 0.38-0.90) and medium levels of gene flow (Nm of 0.04-0.71) between TM populations [26]. On the contrar, we found strong gene flow and almost no genetic differentiation between individuals or populations based on 38,097 SNPs, high intensity gene flow between 8 populations indicated Yellow River, Menggu Mountains, Zhuozi Mountains were not barrier to pollen and seed dispersal.

The directions of the gene flow between different populations were estimated, the result showed “north to south migration model” was the highest probability among the five speculative models (Table 3, based on the Bayes factor and the marginal likelihood), although from the Numbers of gene flows shown there is a high degree of gene flow between different population, north to south migration model suggested the WJM population may have been an ancestral population and then spread southward with prevailing winds.

Correlation analysis between genetic distance and geographic distance

To understand the effects of geographic distance on population, we computed the Mantel test and did not detect significant correlation between geographical distances and genetic distance for the 8 populations ( r = 0.358, P (rxy-rand > = rxy-data) = 0.094) (Figure 5). Although the two most distant populations are more than 118 kilometers apart, some populations are separated by the Yellow River and Zhuozi mountain, “Isolation by distance (IBD)” model was not found in our study, according to the genetic distance of the fine geographic scale analysis, this result suggests that geography distance is not the key factor affecting the existing distribution pattern of TM. Habitat fragmentation usually leads to isolation by distance [30], but most plants pollen flow may be a strong cohesive force [31], it is possible that overall high gene flow attenuate this isolation effect.


Spatial genetic structure is considered to be a key factor in the short-term evolution of the population [32], it is the result of  interaction between plant characteristics and ecological factors and is also influenced by human interference, historical events of the population, natural selection and other factors. For species with narrow distribution range, limited gene flow is the main influencing factor of small-scale spatial genetic structure. Species with limited gene flow are prone to the phenomenon of individual aggregation, thus forming the spatial genetic structure of population. TM is a monotypic genus of the Zygophyllaceae and a typical narrowly distributed endemic species, it lives in a very hostile environment but it is the dominant species in its distribution area, the number and structure of TM population plays an important role in the overall biodiversity and ecological system in this regions. However, genetic structure and the main influencing factors underlying relatedness of TM populations  have not been addressed to date through fine-scale analyses of genomic data, previous studies conducted by Ge et al. [24,25] and Zhi et al. [26] provide conclusion of the diversity and preliminary genetic structure analyses among existing populations using only a few molecular markers. Here, we analyzed all of the TM population using reduced representation sequencing technology (the Super-GBS method), the average sequencing depth of the samples was 104.17×. Relevant research have verified if the sequencing depth is more than 5×, the accuracy of genetic structure and gene flow analysis can be guaranteed [33,34].

Through analyses of nuclear genomic data we found some evidence for level of genetic diversity within this species. Several genetic diversity parameters from our study indicated both as a whole and as a population showed only moderate levels of genetic diversity, not a high level from Zhi et al.[26], geographical distribution is one of  factors that affect the genetic diversity of plant species, species with wide range tend to have higher level of genetic diversity [35], our data show that the genetic diversity level of the population (HN) in the north-south intersection in the TM distribution area is higher than that of the marginal population (SZS), the HN population is an intermediate population for gene exchange between the north and south population, it received more gene flow mediated pollen and seed, on the contrary, the SZS population was a marginal population located at the southern tip of TM distribution region, the low level diversity of SZS population indicated that it was seriously disturbed by human activities (habitat destruction caused by coal mining, industrial pollution, excessive logging), individuals on the edge of the population are more likely to be killed by random events, resulting loss of genes carried by these individuals, thereby reducing the genetic diversity of the population, other populations of TM experienced relatively little disturbance due to low accessibilities. According to ecological theory, in general, small populations of narrowly distributed species are expected to exhibit low levels of genetic variation, but high levels of genetic differentiation among populations [25], therefore, in addition to the core areas of nature reserves, small populations outside nature reserves need more protection.

Evidence of population genetic structure and gene flow presented here may provide a different insights and previously undocumented subtle genetic structure for understanding current distribution pattern of TM populations. Population structure and PCA analysis showed that the existing 8 populations was divided into two groups based on large-scale sequencing data, the results were different from those of previous studies that have shown no apparent genetic structure [26], in general, the Yellow River is a natural barrier for many species distributed in its basin, surprisingly, the two groups are not bounded by the Yellow River which is the second longest river in China, on the contrary, in terms of geographic distribution, Wuhai city and surrounding industrial areas as the boundary between this two groups. In addition, individuals of HN population were assigned to different groups or clades in population structure and phylogenetic trees, it corresponds to the geographical distribution pattern, the HN population is located in the middle of long and narrow distribution zone, it revealed that the HN population is more genetic introgression and possesses more frequent communication with other population.

Fst values and gene flow of among 8 TM population were measured, the results showed no genetic differentiation and extremely high levels gene flow ( Nm varied from 9.750 to 39.169 ) among different populations, it indicated geological landforms such as Yellow River, Menggu Mountains, Zhuozi Mountains were not barrier to gene flow. Additionally, a lot of admixture was observed in all populations in population structure, some theories thought that admixture, reflecting allele sharing, can result from incomplete lineage sorting of historically contiguous populations [36], it suggested all TM individuals in history are likely a continuously distributed large population connected by extremely large gene flow. Then, in the case of such strong gene flow, how to understand the results of genetic structure and PCA analysis (The existing TM population were divided into two groups), we didn't think that's a contradiction, restricted gene flow is the main reason for the formation of  genetic structure in population, the more restricted the gene flow between populations or between individuals, the more prominent the population genetic structure [37], apparently, the development of Wuhai city and the increasing number of industrial areas nearby have weakened the gene exchange between the northern and southern populations to some extent, human activity interference was the most direct and major influence factors in recent decades.

The directions of the gene flow was evaluated by migration model, according to our data, north to south migration model was the highest probability according to the marginal likelihood and the Bayes factor among the five speculative models (Table 3), this suggests that the northern population (WJM population, during the resource survey, we found that WJM population had the largest distribution area and the largest number of individuals) may be ancestral populations and spread southward with prevailing winds in history, pollen-mediated gene flow would be limited by the migration distance of pollination insects, plus gravity, the winds also cannot spread the seeds over long distances, therefore, the main diffusion mode of gene flow is short distance step by step, although all eight populations can be considered as one large group according to average 12.25 of gene flows, this is the result of long-term gene exchange after population expansion into stable habitats. In addition, gene flow of short distance can make the population with narrow distribution reach a dynamic balance by long-term evolution in the absence of interference, but once interference is present, it is likely to accelerate the generation of genetic structures. In addition, the Yellow River has made frequent floods in its history and running to lower flood plain, may have had an effect on the fragmentation and diffusion of the population, but the effects of urban sprawl and industrialization on adjacent migration of gene flow may be more dramatic.

In order to explore the spatial genetic structure of different source populations of species, Mantel test is often used to verify whether the species conform to the model of IBD, and to analyze the correlation between their genetic distance and spatial distance [38], results of Mantel test showed no correlation between geographical distances and genetic distance for the 8 TM populations, this result was consistent with the high levels of gene flow that we detected, as an insect-pollinated plants with narrow distribution area, 8 TM populations are in the same or similar ecological environment, and the genetic differentiation between populations caused by environmental pressure is not obvious, and the correlation between internal genetic distance and spatial distance is not significant. Although the biological characters of TM, such as the high rate of ovule abortion and low seed setting percentage is considered the main threat in the past study [15,39], additional threats such as urban sprawl, industrial development and pollution, habitat fragmentation are expected to exacerbate the formation of genetic structures and the decline in biodiversity. Indeed, it is expected that fragmentation and perturbation can reduce the effective population size within patches and increase the genetic differentiation between the populations [40]. Urbanization can break up once-contiguous populations and leave the population patches scattered around the town [41]. Relevant literatures showed that Wuhai is an industrial city with a short history of urbanization (about 60 years), and its industry is mainly coal-based energy industry [42], on the contrary, as a “living fossil” plant, TM has existed in this area for thousands of years, other studies have shown that there is considerable overlap between the distribution of coal resources and TM in Wuhai area [21], Therefore, the development of the coal industry and the process of urbanization are gradually splitting the TM habitat for a long time,  habitat fragmentation induced by human disturbance is affecting the genetic structure of TM population, conservation biology agrees that larger populations survive longer, small populations that are separated from the whole can easily shrink or even extinction, which were all detected in this species.


Combining genetic variation, gene flow, population genetic structure and IBD analysis, we concluded that the existing genetic variation and spatial genetic structure of TM population were caused by rapid development of urbanization and industrialization. Although the TM population maintained a certain level of genetic diversity and a high level of gene flow, current populations has shown subtle spatial genetic structure, Wuhai city is in the middle of the north and south TM population, to some extent, the dense urban population can affects the routes of pollinators, insect-mediated gene flow over several generations that generated the current genetic structure, it's a signal that TM populations disturbed seriously by human activity was beginning to gradually form genetic divergence. Compared with previous studies, our results of genetic diversity and population structure present a new perspective, and can provide valid information for TM management and conservation. To avoid genetic degradation and to maintain the TM population, we offer the recommendations for protective measures: (1) Mining of opencast coal mines in TM distribution areas is strictly forbidden, the establishment of new industrial zones in the TM distribution area is prohibited. (2) The 8 populations should be protected as whole units, as a bridge and link between the north and south populations, the HN population should be continuously monitored and protected, in addition, because the populations has edge effects, those populations near urban and industrial areas also needs to be key protected. (3) Germplasm resources were collected, seed Banks were established, and transplanting experiments were conducted to expand TM population, furthermore, as an insect-pollinated plants, and those pollinators should also be protected.

Materials And Methods

Sample Collection and DNA Extraction

In this study, we investigated roundly geographic distribution of TM before sampling, 144 individuals of TM were collected from all eight extant populations (The abbreviation of place name for populations were showed in Table 1). The sample collection was approved by the West ordos national nature reserve administration, Inner Mongolia Province, China. 18 individuals were chosen randomly in each population and avoiding clonal individuals, samples information and populations location were showed in Figure 1. Healthy leaves from 144 individuals were collected and were dried directly with silica gel in the wild, then taken back to the lab and frozen in -80℃ until DNA extraction.

DNA extraction of each individuals was carried out using DNAsecure Plant Kit (Tiangen Biotech, Beijing, CHN)following the manufacturer’s introduction, DNA quality and concentration were tested by Agarose Gel and NanoDrop, we selected 15 samples with the highest DNA quality from each population as experimental samples.

Preparation of libraries and sequencing

we adopted modified GBS methods (SuperGBS, following Qi et al.[43]) to construct libraries, briefly, extracted DNA from each individual was digested with both PstI-HF and MspI (New England Biolabs, NEB) for 2 h at 37 °C and subsequent 2 h incubation at 75 °C. The barcoded adapters and common adapter were respectively ligated on the PstI cut site and the MspI cut site of all samples by T4 DNA ligase (NEB), ligation was run at 22 °C for 2 h. Fragments smaller than 300 bp were removed using recovery system of improved magnetic bead. The recovered fragment was amplified by PCR using high-fidelity enzymes, benfore libraries were sequenced, the concentration of PCR product was tested by Qubit2.0 and it should be greater than 5ng/ul. Final libraries were sequenced using Illumina Hiseq Xten, PE150 Platform.

Cleaning of the raw reads and SNP/indel calling

To ensure the accuracy of analysis, ‘FastQC’(v. 0.11.4) [44] was usted to test the raw read quality, the adaptor and barcode of raw reads sequence were cleaned by the process_radtags program in the STACKS (v2.1) [45] software package. The restriction site sequence, base quality < 20; and the last 5 bp of raw reads that were more likely to contain errors was removed using the FASTX_trimmer program package (main parameter -f-l) in the FASTX toolkit (v0.0.1 4) (http://hannonlab.cshl.edu/fastx_toolkit/). Through the steps above, clean reads were obtained. Because the public databases does not contain full genome information for TM, and even there is also no the genome data of relative species of zygophyllaceae family to reference, under these circumstances, de novo generation of a GBS reference was constructed following [43]. Clean Reads were aligned against the GBS reference genome using bowtie2 (v2.3.4.1) software (main parameter -maxins 300 -no-discordant -no-mixed) [46], then genotyping was performed applying Unified program in the GATK (v3.8-1) software (Main parameters -dcov 1000-GLM BOTH) [47] to predict SNP&INDEL sites in samples, the predicted results are screened using the SelectVariants program of the GATK software (Key parameters: - restricelesto biallelic-select "QD > 10.0"), in order to reduce the error rate of detected SNP&INDEL, we used the software vcftools (v0.1.13) (main parameter  -maf 0.01 -minDP 4 -max-missing 0.8) to filter the obtained SNP typing results. The filtering conditions were as follows:(1) reads support (total depth, DP) is no less than 4; (2) eliminate the sites with minor allele frequency (MAF) less than 0.01; (3) eliminate the sites where the missing rate of SNP typing is higher than 20%.

Genetic diversity analysis

Several population genetics general statistics to describe genetic variation were estimated by R package genepop (Version1.0.5) as described [48], mean number of alleles (Na), effective number of alleles (Ne), observed heterozygosities (HO), expected heterozygosities (HE), polymorphism information content (PIC) can be calculated according to the formula [49]. Vcftools software (Version 0.1.14) [50] was used to calculate the nucleotide diversity (π).

Population structure, Phylogenetic trees and principal components analysis

For further elucidation of the genetic structure of eight TM populations, we used the PLINK (version 1.9) software (main parameter -indep-pairwise 50 10 0.6, 0.6 is r2 threshold) to filter the obtained SNP, the SNP without close linkage was selected, then we used the ADMIXTURE (version 1.3.0) software [51] to estimate the genetic structure, the pre-defined genetic clusters (K) was set from 2 to 10, and 10 different models were selected for repeated analysis for 10 times, the optimal number of K was determined according to Cross Validation Error (CV) to in ADMIXTURE analysis.

To better understand and describe the genetic structure, the obtained SNP markers were analyzed by principal components analysis with plink2 software (version: 2.0) [52], the two eigenvectors with the greatest influence are obtained.

Phylogenetic trees are tree branch structures representing near or far relationships between individuals, for further elucidation of genetic distance between samples and clusters, as a first step, each individual SNP marker was end to end, and if the corresponding locus was missing, the - was used instead. Neighbor-joining method  was used to build phylogenetic tree of all 120 individuals, Treebest software (v1.9.2) [53] was used to calculate the distance matrix, the reliability of phylogenetic tree was tested by bootstrap method with 1000 replicates.

Genetic differentiation and Gene flow pattern inference

Genepop program (Version1.0.5) in R soft was used to analyze Fst of each SNP locus in all individuals. We also analyzed the gene flow level among eight populations and the parameter Nm was used evaluated using formula Nm=(1Fst)/4Fst. The directions of the gene flow between different populations were estimated using MIGRATE (https://peterbeerli.com/migrate-html5/index.html) according to Beerli & Palczewski [54] and Blanco-Bercial & Bucklin [55]. The estimated models of gene flow directions were chosen based on the hydrography and prevailing wind of the region. We assumed five possible models: (i) adjacent, migration occur between adjacent populations only; (ii) full migration of north 3 populations (WJM, TST, QLS), full migration of south 4 populations (SZS, SDQZ, MHG, WD), Wuhai city to the north and south as a boundary, the intermediate HN population is a bridge; (iii) north to south, the direction of the prevailing wind in this area (pollen and seeds are dispersed by the wind); (iv) south to north, the direction of the Yellow River in this area (seeds may be dispersed along Yellow River); (v) panmixia, all samples were considered as a whole population (the movement of pollinators is random). For each model, marginal likelihood and the Bayes factor was used estimated possible direction of migration and the specific probability, the number of recorded steps in chain was set to 2000,000.

Correlation analysis between genetic distance and geographic distance

In order to understand whether there is a correlation between genetic distribution and spatial location of TG from different populations, Mantel test is performed using GenAlEx v6.5 [56] with 10,000 permutations.


SNP: single nucleotide polymorphism; HO: Observed heterozygosity; Na: mean number of alleles; Ne: effective number of alleles; HE: expected heterozygosities; PCA: Principal component analyses; PIC: Polymorphism information content; MAF: minor allele frequency; GBS: genotyping-by-sequencing



We thank all the faculty of the school of biological science and technology of Beijing forestry university. We thank Dr. Li mingyu for his assistance in mapping.

Authors’ contributions

DSB designed research and collect experimental materials, CJ performed research and analysed data, KHX contributed reagents or analytical tools, DSB wrote the first draft and all authors contributed equally to the revisions.


This work was supported by the Fundamental Research Funds for the Central Universities (No. 2019ZY32), the National Key Research and Development Program of China (No. 2017YFC0504403) and  the National Nature Science Foundation of China (No. 31972948). The funders didn’t play a role in study design and collecting materials, analysis, and interpretation of data and in writing the manuscript.

Availability of data and materials

The plant materials were collected from natural population in geographic distribution of TM. The raw fastq reads files can be accessed on NCBI Sequence Read Archive (SRA), accession nos. BioProject: PRJNA601311 ; BioSample: SAMN13870224.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.


[1]. Robledo-Arnucio JJ, Klein EK, Muller-Landau HC, Santamaria L. Space, time and complexity in plant dispersal ecology. Mov Ecol. 2014;2(1):16.

[2]. Vranckx GUY, Jacquemyn H, Muys B, Honnay O. Meta-analysis of susceptibility of woody plants to loss of genetic diversity through habitat fragmentation. Conserv Biol. 2012;26(2):228-37.

[3]. Kozakiewicz CP, Burridge CP, Funk WC, Salerno PE, Trumbo DR, Gagne RB, at el. Urbanization reduces genetic connectivity in bobcats (Lynx rufus) at both intra-and interpopulation spatial scales. Mol Ecol. 2019;28(23):5068-85.

[4]. LaPoint S, Balkenhol N, Hale J, Sadler J, van der Ree R. Ecological connectivity research in urban areas. Funct. Ecol. 2015;29(7):868-78.

[5]. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, at el. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 2011;6(5):e19379.

[6]. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, at el. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008;3(10):e3376.

[7]. Altshuler D, Pollara VJ, Cowles CR, Van Etten WJ, Baldwin J, Linton L, at el. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature. 2000;407(6803):513-16.

[8]. Mondon A, Owens GL, Poverene M, Cantamutto M, Rieseberg LH. Gene flow in argentinian sunflowers as revealed by genotyping by sequencing data. Evol Appl. 2017; 11(2):193-204.

[9]. Massatti R, Doherty KD, Wood TE. Resolving neutral and deterministic contributions to genomic structure in Syntrichia ruralis (Bryophyta, Pottiaceae) informs propagule sourcing for dryland restoration. Conserv Genet. 2018;19(1):85-97.

[10]. Vidal MC, Quinn T W, Stireman JO, Tinghitella RM, Murphy SM. Geography is more important than host plant use for the population genetic structure of a generalist insect herbivore. Mol Ecol. 2019;28(18):4317-34.

[11]. Mizuno N, Yasui Y. Gene flow signature in the s-allele region of cultivated buckwheat. BMC Plant Biol. (2019);19 (1):125.

[12]. Dittberner H, Becker C, Jiao W, Schneeberger K, Hölzel N, Tellier A, at el. Strengths and potential pitfalls of hay‐transfer for ecological restoration revealed by RAD-seq analysis in floodplain arabis species. Mol Ecol. 2019;28(17):3887-901.

[13]. Shafer ABA, Peart CR, Tusso S, Maayan I, Brelsford A., Wheat, CW, at el. Bioinformatic processing of RAD‐seq data dramatically impacts downstream population genetic inference. Methods Ecol Evol. 2016;8(8):907-17.

[14]. Wang GM. The study on landscape pattern of Tetraena mongolica influenced by human disturbance in Wuhai (Unpublished doctoral dissertation). Inner Mongolia University. 2012.

[15]. Wang Y, Ma H, Zheng R. Studies on the reproductive characteristics of Tetraena mongolica Maxim. Acta Bot Boreal Occident Sin. 2000;20:661-5.

[16]. Xu Q, Jiang CQ, Liu SR, Guo QS. Study on pollination ecology of endangered plant Tetraena mongolica population. Forest Res. 2003;16(4):391-7.

[17]. Zhou ZG, Liu GH, Yang LX, Shi XG, Luo F. Study on rooting characteristics of hardwood cutting of tetraena mongolica maxim. J Desert Res. 2009;29(3):519-23.

[18]. Shi G, Ding L, Liu Q, Tang S, Duan H. Chemical constituents contained in tetraena mongolica. Zhongguo Zhong Yao Za Zhi. 2012;37(11):1579-80.

[19]. Wu Z, Wei W, Xu H, Zheng L, Ma C, Wang Y. Constituents from the leaves of Tetraena mongolica and their protective activity in HEK 293t cells damaged by CdCl2. J  Nat Prod. 2019;82(10):2707-12.

[20]. Shi S. Studies on eco-physiological adaptation mechanism and endangering mechanism of Tetraena mongolica Maxim. in different habitats (Unpublished doctoral dissertation). Inner Mongolia University. 2005.

[21]. Wang W, Hao W, Bian Z, Lei S, Wang X, Sang S, at el. Effect of coal mining activities on the environment of tetraena mongolica in wuhai, inner mongolia, china-a geochemical perspective. Int J Coal Geol. 2014;132:94-102.

[22]. Wang Y, Li M, Wu W, Wu, H, Xu Y. Cloning and Characterization of an AP2/EREBP Gene TmAP2-1 from Tetraena mongolica. Chinese Bull Bot. 2013;48(1):23-33.

[23]. Chen NM, Feng JC, Song B, Tang S, He JQ, Zhou YJ, at el. De novo transcriptome sequencing and identification of genes related to salt and PEG stress in Tetraena mongolica Maxim. Trees. 2019;33(6):1639-56.

[24]. Ge X, Yu Y, Zhao N, Chen H, Qi W. Genetic variation in the endangered Inner Mongolia endemic shrub Tetraena mongolica Maxim (Zygophyllaceae). Biol Conserv. 2003;111(3):427-34.

[25]. Ge X, Hwang C, Liu Z, Huang C, Huang W, Hung K, at el. Conservation genetics and phylogeography of endangered and endemic shrub tetraena mongolica (zygophyllaceae) in inner mongolia, china. BMC Genet. 2011;12:1.

[26]. Zhi Y, Sun Z, Sun P Zhao K, Guo Y, Zhang D, at el. How much genetic variation is stored in the endangered and fragmented shrub Tetraena mongolica Maxim? Peer J. 2018;21:6:e5645.

[27]. Jombart T, Devillard S, Balloux F. Discriminant analysis of principal components:a new method for the analysis of genetically structured populations. BMC Genet. 2010;11:94.

[28]. Wright S. Evolution and the genetic of population,Variability within and among natural populations. Chicago: University of Chicago Press. 1978;4:213-20.

[29]. Govindaraju D R. Relationship between dispersal ability and levels of gene flow in plants. Oikos, 1988; 52(1):31-5.

[30]. Zhao Y, Vrieling K, Liao H, Xiao M, Zhu Y, Rong J, at el. Are habitat fragmentation, local adaptation and isolation-by-distance driving population divergence in wild rice Oryza rufipogon. Mol Ecol. 2013;22(22):5531-47.

[31]. Browne L, Karubian J. Habitat loss and fragmentation reduce effective gene flow by disrupting seed dispersal in a neotropical palm. Mol Ecol. 2018;27(15):3055-069.

[32]. Ng KKS, Lee SL, Koh CL. Spatial structure and genetic diversity of two tropical tree species with contrasting breeding systems and different ploidy levels. Mol Ecol. 2004;13(3):657-69.

[33]. He Z, Zhai W, Wen H, Tang T, Wang Y, Lu X, at el. Two evolutionary histories in the genome of rice: the roles of domestication genes. PLoS Genet. 2011;7: e1002100.

[34]. Li Y, Zhou G, Ma J, Jiang W, Jin L, Zhang ZH, at el. De novo assembly of soybean wild relatives for pangenome analysis of diversity and agronomic traits. Nat Biotechnol. 2014;32(10):1045-52.

[35]. Karron J D. A comparison of levels of genetic polymorphism and self compatibility in geographically restricted and widespread plant congeners. Evol Ecol. 1987;l(1):47-58.

[36]. Huang CL, Chen JH, Tsang MH, Chung JD, Chang CT, Hwang SY. Influences of environmental and spatial factors on genetic and epigenetic variations in Rhododendron oldhamii (Ericaceae). Tree Genet Genome. 2015;11:823.

[37]. Hardy OJ, Maggia L, Bandou E, Breyne P, Caron H, Chevallier MH, at el. Fine-scale genetic structure and gene dispersal inferences in 10 neotropical tree species. Mol Ecol. 2010;15(2):559-71.

[38]. Moyle LC. Correlates of genetic differentiation and isolation by distance in 17 congeneric silene species. Mol Ecol. 2006;15(4):1067-81.

[39]. Liu GH, Zhou SQ, Thang L, Ren L. Study on the biological characteristics and the endangering factors of the Tetraena mongolica. J Inner Mongolia Forestry College. 1993;2:33-9.

[40]. Young A. The population genetic consequences of habitat fragmentation for plants. Trends Ecol Evol. 1996;11(10):413-8.

[41]. Lee JS, Ruell EW, Boydston EE, Lyren LM, Alonso RS, Troyer JL, at el. Gene flow and pathogen transmission among bobcats (Lynx rufus) in a fragmented urban landscape. Mol Ecol. 2012;21(7):1617-31.

[42]. Zhang Y. Population dynamics of endangered species, Tetraena mongolica in fragmentation habitats (Unpublished doctoral dissertation). Wuhan university. 2000.  

[43]. Qi P, Gimode D, Saha D, Schröder S, Chakraborty D, Wang X, at el. UGbS-Flex, a novel bioinformatics pipeline for imputation-free SNP discovery in polyploids without a reference genome: finger millet as a case study. BMC Plant Biol. 2018;18(1):117.

[44]. Andrews S. FastQC: A quallity control tool for high throughput sequence data. 2010; http://www.bioinformatics.babraham.ac.uk/projects/fastqc.

[45]. Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH. Stacks: Building and genotyping loci de novo from short-read sequences. G3 (Bethesda). 2011;1(3):171-82.

[46]. Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9(4):357-9.

[47]. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, at el. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297-303.

[48]. Rousse F. Genepop'007: A complete re-implementation of the genepop software for Windows and Linux. Mol Ecol Resour. 2008;8(1):103-6.

[49]. Botstein D, White RL, Skolnick M, Davis RW. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet. 1980 May;32(3):314-31.

[50]. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, Depristo MA, at el. The variant call format and vcftools. Bioinformatics. 2011;27(15):2156-8.

[51]. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655-64.

[52]. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bende D, at el. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559-75.

[53]. Ruan J, Li H, Chen Z, Coghlan A, Coin LJ, Guo Y, at el. TreeFam: 2008 Update. Nucleic Acids Res. 2008Jan;36(Database issue):D735-40.

[54]. Beerli P, Palczewski M. Unified framework to evaluate panmixia and migration direction among multiple sampling locations. Genetics. 2010;185(1):313-26.

[55]. Blanco-Bercial L, Bucklin A. New view of population genetics of zooplankton: RAD-seq analysis reveals population structure of the North Atlantic planktonic copepod, Centropages typicus. Mol Ecol. 2016;25(7):1566-80.

[56]. Peakall R, Smouse P.E. GenAlEx 6.5: Genetic analysis in Excel. Population genetic software for teaching and research-an update. Bioinformatics. 2012;28:2537-9.


Table 1 The number of individuals and genetic diversity statistics for each population, observed heterozygosity (Ho), expected heterozygosity (He), polymorphism information content (PIC), efficient allelic number (Ne), nucleotide diversity (π).

populations acronym







































































Table 2 Genetic differentiation coefficient and gene flow between different populations. The lower triangle is the interpopulation genetic differentiation coefficient (Fst), and the upper triangle is the interpopulation gene flow.



















































































Table 3 Log probability of the data given the model (marginal likelihood, based on the Bezier approximation score) and corresponding Bayes factors. The most probable model considers the north tosouth model.


Log(marginal likelihood)

Log (Bayes factor)


1: Adjacent




2: full migration




3: north to south




4: south to north




5: Panmixia