Genome-wide assessment of genetic diversity and population structure in Magnolia odoratissima based on SLAF-Seq

Magnolia odoratissima is a highly threatened species with small populations and scattered distribution due to habitat fragmentation and human activity. The species is recognized as a Plant Species with Extremely Small Populations (PSESP) and is endemic to China. In the current study, the population structure and levels of genetic diversity of M. odoratissima in the five remaining natural populations and three cultivated populations were evaluated using single nucleotide polymorphisms (SNPs) derived from Specific-Locus Amplified Fragment Sequencing (SLAF-seq). A total of 180,650 SNP loci were found in seventy M. odoratissima individuals. The genome-wide Nei ’ s and Shannon ’ s nucleotide diversity indexes of the total M. odoratissima population were 0.3035 and 0.4695, respectively. The observed heterozygosity ( H o ) and expected heterozygosity ( H e ) were 0.1122 and 0.3011. Our results suggest that M. odoratissima has relatively high genetic diversity at the genomic level. F ST and AMOVA indicated that high genetic differentiation existed among populations. A phylogenetic neighbor-joining tree, Bayesian model – based clustering and principal components analysis (PCA) all divided the studied M. odoratissima individuals into three distinct clusters. The Treemix analysis showed that there was low gene flow among the natural populations and a certain gene flow from the wild populations to the cultivated population (LS to KIB, and GN to JD). In addition, a total of 36 unique SNPs were detected as being significantly associated with environmental parameters (altitude, temperature and precipitation). These candidate SNPs were found to be involved in multiple pathways including several molecular functions and biological process, suggesting they may play key roles in environmental adaptation. Our results suggested that three distinct evolutionary significant units (ESUs) should be set up to conserve this critically endangered species.


Introduction
Global biodiversity has drastically decreased over the last several decades due to anthropogenic activities such as deforestation, habitat fragmentation and climate change (Attard et al. 2016). Thousands of species have declined in numbers or gone extinct (Khan et al. 2016). Plant Species with Extremely Small Populations (PSESP), is a concept referring to species with low number populations and a narrow geographical distribution that has been disturbed and stressed by external factors over a long time. The number of individuals in a PSESP are smaller than the minimum required to prevent extinction (Ren et al. 2012). Small remaining populations, restricted habitat, serious human disturbance and extremely high risk of extinction are the key characteristics of PSESP, and nearly half of all PSESP face a high risk of extinction (John 2013). Precise protection strategies for the rescuing of PSESP are considered essential for ecological, economic, and human health, and this topic has been a major focus in the field of international conservation biology and restoration ecology. A conservation action concept focusing on PSESP was first proposed in 2012.

Magnolia odoratissima Law et R. Z. Zhou is a perennial, tree in the family
Magnoliaceae, and is endemic to southwest China. The flower and leaves of M.
odoratissima have different fragrances, and the species is used in landscaping, and planted in urban and rural areas, scenic spots, high-grade residential areas and private villas. It is considered to be an excellent landscaping tree species  and its flower extract (pure oil) is a natural fragrance worthy of development (Li et al. 1996); M. odoratissima has also strong wind and pollution resistance, and can be used in environmental purification.
M. odoratissima is an extremely endangered plant, and is a national second-class key protected wild plant in China. The species has a narrow distribution and is restricted to a few localities in Yunnan Province, China. An exhaustive field survey across the whole of the natural M. odoratissima distribution range of was conducted between 2017 and 2019. Six remnant populations were found and all were severely fragmented. Population sizes in these populations currently range from 1 to approximately 200 individuals (He et al. 2018). The species is experiencing a rapid demographic decline, mainly due to increasing anthropogenic pressures (e.g. deforestation and urbanisation) in its natural habitat. M. odoratissima is currently ranked as having a high conservation priority in China and considered to be a PSESP (Li et al. 2003) and it is therefore vitally important to develop conservation strategies for this species.
The wild population of M. odoratissima are relatively isolated and far from nature reserves, and the protection of these areas is not strong. Protection of this species has not therefore been studied. To date, the wild growth, seed dormancy and germination of M. odoratissima resources have been studied by He et al (2020), with the results suggesting that there are substances in the seeds that inhibit germination, and that the inhibition effect is significant (He et al. 2020). Chai (2010Chai ( , 2013 investigated the soil structure, population ecological characteristics and population pattern distribution characteristics of wild M. odoratissima populations, and analyzed the threats facing M. odoratissima from the perspective of ecology (Chai et al. 2013;Dong et al. 2010;Qi et al. 2010). Jin et al (2014) analyzed the genetic diversity of four populations of M. odoratissima using the random amplified polymorphic DNA (RAPD) technique, with the results suggesting that M. odoratissima has higher genetic diversity than other endangered and endemic plants in the Magnoliaceae (Jin et al. 2014). Xu (2016) analyzed the height structure, age structure and spatial distribution pattern of M. odoratissima, compiled a specific time life table and drew a survival curve (Xu et al. 2016). Yang (2008Yang ( , 2009 preliminarily discussed seedling and cutting propagation techniques in M. odoratissima (Yang 2009;Yang 2008).
However, although the research to date has made progress in the field of M. odoratissima conservation, there is still a lack of comprehensive research into genetic diversity and population genetics in this species.
Genetic diversity is a major focus of conservation genetics. The population structure, which refers to patterns of distribution of genetic diversity in plant populations, and the levels of genetic diversity can be influenced by a variety of biotic and abiotic factors. These factors include plant breeding systems, genetic drift, pressure from natural selection, gene flow, genetic mutations and anthropogenic destruction (Chang et al. 2020), and are closely related to the adaptability and evolutionary potential of the species. Therefore, assessment of genetic variation and the spatial structure of a species is an important premise for the exploration of various evolutionary factors, especially for rare and endangered species and species impacted by direct human interference. It is difficult to develop reasonable conservation and rescue measures in the absence of basic genetic information.
However, genome-wide single nucleotide polymorphisms (SNP) are more effective than traditional molecular markers at revealing genetic diversity and distinguishing genetic relationships (Lu et al. 2020;Zhang et al. 2018). SNP markers have a wide variety of applications in different plant species, including identification of plant varieties and cultivars, QTL analysis, construction of high-density genetic maps, and genome-wide association analysis (Delourme et al. 2008). At present, next-generation sequencing (NGS) technology can generate huge amounts of sequence data, which make the identification of high-throughput SNP markers at the species level. In order to reduce the cost of library construction when dealing with large numbers of samples, several methods of reduced-representation genome sequencing have been developed, including genotype-by-genotype (GBS) (Elshire et al. 2011), restriction-site associated DNA (RAD) sequencing (Bus et al. 2012), reduced representation libraries (RRLs) (Hyten et al. 2010), and single locus amplified fragment sequencing (SLAF-seq). Specific-Locus Amplified Fragment sequencing (SLAF-seq) is a simplified genome deep sequencing technology based on high-throughput sequencing.
A large number of SNPs are developed at a genome-wide scale by constructing an SLAF-seq library and screening for specific amplified fragments. SNP markers have been widely used in genetic relationship analysis, genetic map construction, genome-wide association analysis and gene mapping of plant germplasm resources, with the advantages of high accuracy, high flux and low cost (Liu et al. 2004;Liu et al. 2019;Singh et al. 2020).
In this study, seventy individuals from the five known remaining natural populations and three cultivated populations of M. odoratissima were collected and analyzed using SNP data derived from SLAF-seq. Key aims of this study included: (1) to evaluate population genetic diversity of M. odoratissima at the genomic level and to determine the genetic differentiation and population structure in and between these eight populations, in order to provide scientific theoretical guidance for the protection of M. odoratissima; (2) to reveal potential correlation between the genetic data and climatic variables and then to identify potential genes responsible for local adaptation .

Materials and Methods
Study species and sampling procedure M. odoratissima is an understorey tree species with hermaphroditic flowers, of temperate, evergreen broadleaf forests, and usually occurring in limestone forests at altitudes of 1300 m or above. Mature trees of M. odoratissima may reach 8 m in height and 9 cm in diameter at breast height. The bisexual flowers of the population cultivated in Kumming bloom in May, but in their original native habitat perpetual flowering is usual, and the fruits mature in September to October. This species has serious reproductive disadvantages, including low seed productivity and low germination rates. Seeds of M. odoratissima lack long-distance dispersal structures are dispersed by gravity and hence the vast majority of the seeds remain within 3 m of the mother tree.
This study was conducted in Wenshan Prefecture, Yunnan Province, China, which has an annual rainfall of approximately 1294 mm and an annual mean temperature of 5.9 °C . The topography of the study site is planar. From 2017 to 2020, the study site was subject to intensive demographic monitoring. During this period, the status of resources, habitat condition, companion species, population survival and age grade were assessed in a field survey by sampling approach across the whole area of distribution of M. odoratissima in Wenshan.
Nearly 600 individuals occur in the six populations in the study location. Only a single individual was found in the population at Boli Village, Xichou county. This could not be used in our population genetic analyses, so that this population was excluded from our study. From the other five populations, we collected new leaves from individuals and dried them immediately in fine-grain silica gel. Because of its fragrant flowers, M. odoratissima is also planted in gardens, so we also investigated three cultivated populations (Jindian Garden, Kunming Institute of Botany, and Yunnan Agricultural University). Approximately ten individuals were randomly selected from each of the study populations. To ensure the sample uniformity and to minimize the relationships between the sampled plants, an average distance between individuals of greater than 10 m was ensured. Plants with normal growth, no serious defects, no obvious diseases, no insect pests and fresh leaves were chosen for sampling. A total of 70 samples were collected. Summaries of the sampling and localities are given in Table 1.

High-Throughput Sequencing and SNP calling
Genomic DNA was isolated from young leaves of M. odoratissima using the cetyltrimethylammonium bromide (CTAB) method (Allen et al. 2006 genome was selected as the reference genome for enzyme digestion prediction. To get > 100,000 SLAF tags that were evenly distributed in the genome, two restriction enzymes, RsaI and EcoRV-HF, were selected. According to the selected optimal digestion scheme, the genomic DNA from each qualifying sample was digested separately. The obtained SLAF tags (414-464bp) were treated with a single nucleotide (A) at the 3'end, connected to the Dual-Index sequencing joint (Kozich et al. 2013), amplified using PCR, purified, mixed and gelled to select the target fragment, and then, if they passed the library quality inspection, sequenced on an Illumina platform.
In order to evaluate the accuracy and effectiveness of enzyme digestion efficiency, Oryza sativum ssp. Japonica was selected as the control for sequencing (Lu et al. 2020).
The raw reads generated from the sequencing were first processed by removal of the cohesion subsequence contained in the original reads, and by removal of low quality reads (quality scores < 20), and empty reads (containing only the adapter sequence). Based on sequence similarity, BWA software was used to cluster high quality paired end reads (Kent 2002). Sequences with more than 90% similarity between different individuals were identified as SLAF loci (Sun et al. 2013). The Genome Analysis Toolkit (GATK) (McKenna, et al. 2010) and SamTools (Li et al. 2009) were used for SNP calling, as their intersection is considered to represent reliable SNPs. For the phylogenetic analysis, SNPs with a minor allele frequency (MAF) ≥ 0.05 and integrity ≥ 50% were retained. were estimated using Arlequin (Laurent et al. 2007).

Population structure and analyses
A neighbor-joining phylogenetic tree was constructed using MEGA X software (Sudhir et al. 2018) with the following parameters, Kimura 2-parameter model, and 1000 bootstrap replicates. The population structure was analyzed in Admixture (Alexander et al. 2009), on the basis of the maximum-likelihood method. A number of populations (K) between 1 to 10 was tested, and each individual tree was assigned to its respective population according to the maximum membership probability. Genetic relationships among the studied individuals were also assessed using principal coordinates analysis (PCoA) in EIGENSOFT (Price et al. 2006) based on the Euclidian distances between individual genotypes.

Gene Flow among populations
Gene flow (Nm) at the species level was estimated using the software Genepop v4.0, and the pairwise Nm values at the population level were measured using the formula Nm = (1-FST) / 4 FST (Wright 1950) based on the FST values. TreeMix v1.13 (Pickrell and Pritchard 2012) was used to construct maximum-likelihood trees that best described the historical relationships between these populations and to infer migration events (mixtures) between them. We set the SNP block size parameter to 10.
TreeMix was run iteratively for values of the migration parameter (-5).

Climatic Association Analysis
First, the chosen 19 climatic variables from the studied sites were extracted from the WorldClim data set (http://www.worldclim.org/) interpolated to 30-arcsec (ca. 1 km) resolution using ArcGIS. Then, Two programs, BayeScan (Matthieu and Oscar 2008) and LFMM (Frichot et al. 2013), were used to detect outlier loci that were possibly associated with climatic variables. Candidate loci under natural selection can be identified from genetic data with BayeScan, using inter-population allele frequency differences.
The other software used, LFMM, can also be used for gene-climate association analysis. LFMM estimates the hidden impact of population structure, and permits the presence of background levels of population structure (latent factors). The detected SNPs that exhibit an association with the climatic variables were determined according to the z-score. Bonferroni adjustment was used on the z-score values for multiple tests. Markers with calibration of p-values ≤0.01 based on chi-square tests were considered to be significant. Putative functions for the identified outlier loci were annotated using the NCBI and UniProt databases.

SNP Detection
High-throughput sequencing based on SLAF generated a total of 229.  (Table S1). We obtained a total of 843,082 high-quality SLAF tags for the 70 samples, with an average coverage of 11.78x for each SLAF (Table 2). Of the SLAF tags, 287,843 were polymorphic. These polymorphic SLAFs contained 2,998,167 SNPs in total, and following application of the filtering criteria, 180,650 of these were utilized in the further analyses.

Genetic Diversity and Genetic Differentiation
The observed allele number ( (Table 3). AMOVA revealed that the greatest diversity was found within individuals is 45.12%, while the diversity found among individuals within populations was 8.68%, and a total of 46.20% of the genetic variation occurred among populations (Table 5).

Population Structure and Phylogenetic Analysis
Population structure also known as population stratification, refers to the existence of subgroups with different gene frequencies in the studied populations, where the individuals within the same subgroup are closely related but the subgroups are more distantly related. Population genetic structure analysis can give an estimate of the number of ancestors of the studied population and infer the origin of each sample. The population structure of the 70 samples in our study was analyzed based on SNPs using the ADMIXTURE software (Alexander et al. 2009). The number of ancestors was determined using numbers of subgroups (K) between 1 and 10 ( Figure   1A), with the value of K with the lowest predicted number of ancestors being chosen as the most likely. To generate an alternative view of population stratification, we used the population clustering program ADMIXTURE. The cross-validation error (CV) showed the lowest peak at K＝3, which indicated that the optimal number of genetic clusters comprising the 70 M. odoratissima accessions was 3 ( Figure 1B).
Thus, the 70 accessions were classified into three major subgroups ( Figure 1C Figure 2B). The first and third principal components were also able to distinguish between CZ, XQ and the other six regions to a great extent ( Figure 2C).
The first, second, and third principal components explained 56.29%, 3.52%, and 1.82% of the genetic diversity respectively ( Figure 2D).

Gene Flow among Populations
The pairwise population Nm values calculated from Wright's analysis indicated that the levels of gene flow between populations was quite low, with the lowest value being -12.883 between the GN and YNAU populations, and the highest being 2.597 between the JD and DM populations (Table 6).
To further describe the historical relationships between these populations and to infer migration events (

Associations between SNP Markers and Environmental Variables
Analysis of the associations between SNPs and environmental variables was conducted in the Bayenv2 and LFMM programs. A total of 387 and 14,369 SNP markers were identified as being significantly correlated with climate variables by Bayenv2 and LFMM, respectively. 36 SNP markers showed a significant correlation with climate variables in both of the analyses (Table 7). The highest degree of correlation was reached with "isothermality", which could be applied to 18 markers in our study. The other climate variables, altitude and latitude, correlated with between 1 and 9 markers..
A Blast search revealed that 25 related SNP markers were annotated, with molecular functions listed as enabling DNA binding and RNA binding, and enhancing the activity of RNA-directed, DNA polymerase, and aspartic acid type endopeptidases.
These markers also seem to be involved in biological processes that involve RNA dependent, DNA biosynthesis, DNA recombination and proteolysis (Table S2). (2)* suggests that the SNP showed an association with that specific climate variable

Genetic diversity in M. odoratissima populations
Genetic diversity may occur as the result of long-term accumulation of neutral substitutions during the evolution of species, or may also be a result of diversifying selection from different environments. Genetic diversity is also linked to the long-term survival and development of populations and species. It is widely believed that endangered and endemic plants must have low genetic diversity (Soltis et al. 1992), however some studies have shown that, in contrast, certain rare and endangered plants may have high genetic diversity (Jin and Li 2007;Matthew and Pamela 2000). odoratissima is a perennial tree, and the generations therefore overlap. Under the effects of long term natural selection, the genes that confer an advantage and therefore higher rates of survival are continuously accumulated and retained, thus forming a broad genetic basis.

Genetic differentiation and gene flow
According to Wright (Wright 1950), if the Nm is less than 1, genetic drift and differentiation may occur. If the FST is 0.05 or less, differentiation is thought to be negligible; an FST range of between 0.15 and 0.25 indicates a moderate level of differentiation; while FST = 0.25 means subpopulations were highly differentiated.
The results of our genetic analysis showed that there is high genetic differentiation Inter-population differentiation is related to the existing barriers to gene flow, as well as selection from environmental factors. Some studies have shown that variations in ecological micro-environments can lead to significant differences in genetic structure among different populations (Douglas and Lonnie 1990;Turkington and Harper 1979). Therefore, the levels of genetic variation observed between different populations may indicate the extent to which an organism adapts to different environments. Analysis of the associations between SNPs and environmental variables conducted in the Bayenv2 and LFMM programs suggested that there was a correlation between genetic markers and environmental variables, and therefore, the variation in ecological micro environments in the separate M. odoratissima populations may lead to significant differences in genetic structure in different populations.

Implications for conservation
Maintenance of genetic variation is one of the major objectives for conserving endangered and threatened species (Wallis 1996). Knowledge of genetic variation between and within populations provides essential information on the formulation of appropriate management strategies directed towards their conservation (Milligan et al. 2010). From the results obtained in this study it is possible to draw inferences on the conservation of M. odoratissima. Given that M. odoratissima has moderate to high genetic diversity, high population differentiation and low gene flow suggest that genetic drift is currently of great concern for this species. During the past few decades, this tree has been over-exploited for timber and ornamental use because of its fragrant flowers and beautiful shape. Many of the trees in the remaining wild populations are located near villages in limestone mountains and are easily accessible, which may heighten the possibility of genetic drift. Furthermore, population fragmentation is associated with decreases in population size and increasing geographic separation of the populations, which reduces rates of reproductive success (Suzuki et al. 2013). We found that gene flow between populations was low, and that the effects of the population structure and the size of current M. odoratissima populations were high, meaning that the species is likely to be susceptible to the loss of genetic polymorphisms in the future because of random genetic drift. The medium-term goal is therefore to establish a new generation of individuals to improve gene flow among populations, and thereby reduce genetic drift. Field surveys have shown that natural regeneration in M. odoratissima populations is slow, because of low fruit set and low seed germination rates. We therefore suggest that appropriate anthropogenic measures, including artificial pollination between populations, should be carried out to promote the natural regeneration of the population, so that the populations of M. odoratissima are able to recover and gene flow between the populations can be rebuilt.
Considering that the main threats driving declines in the remaining six M.
odoratissima populations are deforestation, over-exploitation of exotics, and habitat destruction from nearby residents, the most urgent measures required in the near future are in-situ and ex-situ conservation. There are six wild populations left, according to our field investigation, and we found that the extent of the M.
odoratissima distribution is relatively concentrated, so in-situ protection is not difficult and natural reserves should be easy to established in these wild populations.
Population structure analyses showed that the individuals of M. odoratissima clustered naturally into three groups, so we can look them as three distinct evolutionarily significant units (ESUs), and in this case, one representative population from each ESUs could be selected and set up as a key nature reserve. Using the information both from our field work and genetic analyses, we suggest that However, wild populations CZ and XQ display a certain degree of genetic diversity that is not represented in the cultivated populations, and ex-situ conservation of representative individuals from these populations should be implemented as soon as possible.
On top of the protection of existing living plants, another strategy for propagating future generations is to collect seeds in order to cultivate young seedlings and establish breeding in an off-site location. Germplasm resources can also be propagated by grafting, and genetically dissimilar individuals can be crossed by artificial pollination to produce highly heterozygous seedlings. Furthermore, in view of the difficulty of colonization and reproduction in this species, future research should encompass somatic embryogenesis for artificial reproduction.

Conclusions
Information on the genetic diversity, genetic structure, and demographic history of endemic species can aid the development of appropriate conservation and management strategies. In the present study, the genetic variation, genetic

Conflicts of Interest:
The authors declare no conflicts of interest.