DNA barcode reference library construction and population genetic diversity and structure analysis of Amomum villosum Lour. in its Daodi production area

(1) Background: Inferior material and other varieties are sold as Amomum villosum in the market. To support the development and utilization of the genuine medicinal material A. villosum, this study aimed to identify A. villosum varieties and analyze their genetic diversity, to construct a DNA barcode database of genuine A. villosum in Guangdong Province, and to provide recommendations for population planting, all of which will be critical to further research on A. villosum. (2) Methods: A total of 141 samples of A. villosum were analyzed by DNA barcoding to construct a DNA barcode database. The genetic diversity of A. villosum sampled from 7 populations in Guangdong Province was assessed with ISSR molecular markers. (3) Results: The success rates of PCR amplification and sequencing of the five barcodes of A. villosum ranked rbcL > ITS > ITS2 > psbA-trnH > matK. The 141 samples of A. villosum from 7 populations in Guangdong Province were used to construct a reference DNA barcode database containing 531 sequences. The genetic diversity results were as follows: the number of alleles (Na) ranged from 1.2879 to 1.7121, the effective number of alleles (Ne) ranged from 1.1848 to 1.4240, the gene diversity index (H) ranged from 0.1117 to 0.2536, and the Shannon index (I) ranged from 0.1658 to 0.3816, which indicated that the genetic diversity of A. villosum was rich. The total genetic diversity among the 7 populations (Ht) was 0.3299, the genetic diversity within populations (Hs) was 0.1819, and the gene differentiation coefficient (Gst) was 0.4487. AMOVA showed that genetic variation within populations and between populations accounted for 68.74% (P<0.05) and 31.26% (P<0.05) of the total, respectively, indicating that the genetic variation of A. villosum was mainly within populations. The gene flow (Nm) was 0.6143. The genetic distances among the 7 populations ranged from 0.0844 to 0.3347, and


Introduction
Amomum villosum Lour. is a medicinal plant of the Zingiberaceae family grown mainly in southern China. Its ripe, dried fruit, Fructus Amomi, is a famous traditional Chinese medicine (TCM) that has been used for thousands of years to resolve dampness and stimulate the appetite, warm the spleen to stop diarrhea, regulate qi and calm the fetus. Modern pharmacological studies show that Fructus Amomi has strong anti-ulcer, anti-diarrheal, anti-inflammatory and antimicrobial activities [1]. In addition, Fructus Amomi is used in food, liquors and tea as a health product and condiment. Yangchun City, located in Guangdong Province, is considered the Daodi (genuine) production area of Fructus Amomi because of the high quality of its product. With the rapid development of the city and of the traditional Chinese medicine business, the habitat of A. villosum has frequently been destroyed, which seriously threatens its germplasm resources. In 2016, Fructus Amomi was selected as one of the eight legally protected TCM varieties in Guangdong Province.
The genetic diversity of a species is the basis for its survival and evolution, and it is of great significance to the analysis of evolutionary polymorphism and genetic relationships, the optimization of germplasm resources and the protection of populations [2]. Polymerase chain reaction (PCR)-based molecular markers have been widely used in the analysis of plant genetic diversity [3]. Among them, Inter-Simple Sequence Repeat (ISSR) is a fast and efficient marker characterized by high polymorphism, high reliability and low cost, and it does not require prior knowledge of the target sequence [4,5]. It is widely used in germplasm identification and genetic diversity analysis [6]. Another marker type, DNA barcodes, proposed in 2003, can be used not only in biological identification but also in genetic diversity analysis [7].
In the current study, ISSR and DNA barcoding markers were used to investigate the genetic diversity of seven populations of A. villosum in its Daodi production area, Yangchun City of Guangdong Province. Five barcodes (ITS2, psbA-trnH, ITS, matK and rbcL) were amplified and DNA barcode reference libraries were constructed. This study will provide insights into the identification, conservation, domestication and breeding of A. villosum.

2.1. DNA barcode reference library construction
We extracted genomic DNA from 141 samples of A. villosum. The OD260/280 ratio of the DNA samples was 1.76-1.98 and the concentration was 73.70-1294.80 ng/µL. Five DNA barcodes were amplified and sequenced for all samples. The PCR amplification and sequencing results are shown in Table 1. The sequencing success rate of the barcodes ranked rbcL (100.00%) > ITS (98.58%) > ITS2 (95.04%) > psbA-trnH (53.90%) > matK (29.08%), and the ranking of PCR amplification success was consistent with that of sequencing. These sequences constitute the DNA barcode reference library of A. villosum in Guangdong Province. We then analyzed the sequences obtained for each barcode; their characteristics are shown in Table 2. None of the five barcodes had any variable sites, showing strong conservation. ITS2 had the shortest sequence length and the highest GC content. The sequences of each barcode are shown in Fig. 1 to Fig. 5.

The total genetic diversity (Ht) of the 7 populations was 0.3299, while the within-population genetic diversity (Hs) was 0.1819. The gene differentiation coefficient (Gst) was 0.4487, indicating that 55.13% of the genetic variation existed within populations. This result was similar to that of the analysis of molecular variance (AMOVA), which showed that 68.74% (P = 0.001) of the genetic variation was within populations and 31.26% (P = 0.001) was between populations (Table 6). Additionally, the gene flow (Nm) among populations was 0.6143.
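The relationship between the reported Ht, Hs, Gst and Nm values can be checked with a short calculation. The sketch below assumes Nei's definition Gst = (Ht - Hs)/Ht and the estimator Nm = 0.5(1 - Gst)/Gst, which is the form commonly reported by POPGENE; the function names are illustrative and are not part of the original analysis.

```python
def gene_differentiation(ht: float, hs: float) -> float:
    """Nei's coefficient of gene differentiation, Gst = (Ht - Hs) / Ht."""
    if ht <= 0:
        raise ValueError("Ht must be positive")
    return (ht - hs) / ht


def gene_flow(gst: float) -> float:
    """Gene flow estimate Nm = 0.5 * (1 - Gst) / Gst (assumed estimator)."""
    if not 0 < gst <= 1:
        raise ValueError("Gst must lie in (0, 1]")
    return 0.5 * (1 - gst) / gst


# Values reported in the text (rounded to four decimals by the authors).
print(f"Gst from Ht and Hs:   {gene_differentiation(0.3299, 0.1819):.4f}")
print(f"Nm from reported Gst: {gene_flow(0.4487):.4f}")
```

Under these assumptions, a Gst of 0.4487 corresponds to an Nm of about 0.614, which agrees with the value of 0.6143 given in the Abstract and Discussion. Recomputing Gst from the rounded Ht and Hs gives a value within rounding error of 0.4487.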

2.4. Genetic distance, genetic identity and cluster analysis
Genetic distance and genetic identity are the main indicators of the degree of genetic differentiation and the relationships between groups [8]. The genetic distances among the 7 populations ranged from 0.0844 to 0.3347, and the genetic identities ranged from 0.7156 to 0.9191 (Table 7). The smallest genetic distance was between the ZJD and TK populations (0.0844), and the largest was between the XFC and YC populations (0.3347). A Mantel test carried out with NTSYS-pc 2.0 indicated that genetic distance and geographical distance were not significantly correlated (r = 0.02698, P = 0.5504) (Fig. 6). A UPGMA dendrogram of the populations, based on the genetic similarity coefficient, was constructed from Nei's genetic identity and genetic distance data for the 7 populations (Fig. 7). The 7 populations were divided into three groups at a similarity coefficient of 0.84. The three populations ZJD, TK and ZY formed one group, the three populations GY, MM and YC formed another group, and the population XFC formed a group of its own. The results of PCoA based on Nei's unbiased pairwise Фst matrix were consistent with the UPGMA cluster analysis (Fig. 8).
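The reported ranges of genetic identity and genetic distance are two views of the same quantity. The sketch below assumes Nei's standard relation D = -ln(I) between genetic identity I and genetic distance D; the function name is illustrative.

```python
import math


def nei_distance(identity: float) -> float:
    """Convert Nei's genetic identity I into genetic distance D = -ln(I)."""
    if not 0 < identity <= 1:
        raise ValueError("identity must lie in (0, 1]")
    return -math.log(identity)


# Extreme identity values reported in Table 7 of the text.
for identity in (0.9191, 0.7156):
    print(f"I = {identity:.4f} -> D = {nei_distance(identity):.4f}")
```

Under this assumption the largest identity (0.9191) maps to the smallest distance and the smallest identity (0.7156) to the largest distance, in agreement with the reported range of 0.0844 to 0.3347 up to rounding.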

Discussion
In this study, five DNA barcodes (ITS2, psbA-trnH, ITS, matK and rbcL) were amplified and sequenced from 141 individuals of 7 A. villosum populations, and 531 sequences were finally obtained. A local DNA barcode reference library of A. villosum in its Daodi production area was thus constructed. Many DNA barcode reference libraries have been constructed for more rapid and accurate species identification [9], and DNA barcoding has been used in A. villosum identification [10][11][12]. COI is an efficient species identification tool and is frequently used in genetic diversity analysis of animals [13][14][15][16][17]. In plants, however, the low substitution rate of mitochondrial DNA makes it unsuitable, and other barcoding regions, such as those used in the current study, have been sought as alternatives [18][19][20]. An ideal DNA barcode should be easily retrievable, bidirectionally sequenceable, and provide maximal discrimination among species [21]. Among the five barcodes, rbcL had the highest PCR amplification and sequencing success rate for A. villosum. A more comprehensive assessment of the discrimination power of these DNA barcodes requires further investigation that includes adulterants of A. villosum.
We aligned the sequences of each DNA barcode and found no variable sites in any of the five barcodes. Therefore, genetic diversity could not be analyzed with these DNA barcodes. Intra-specific divergence can occur at a very high rate within geographically isolated populations [22], but genetic diversity at the sub-specific level is best explored with a multi-locus approach such as fingerprinting techniques [23]. Consequently, we may say that the ITS method is more suitable for genetic diversity analyses of populations spread over wide geographic areas [24].
For species identification, the geographic structure and diversity of plant populations can be a problem for the barcoding approach, and these problems have to be dealt with at the library construction stage [25]. Here, no polymorphisms were detected in any DNA barcode sequence, indicating that these DNA barcodes are suitable for A. villosum identification. How much variation is actually needed to separate species is not known with certainty, because intra-specific sampling has generally been limited to narrow geographic locales [26].
Molecular markers have been used extensively to determine genetic diversity and genetic relationships in plant science [27,28]. Moradkhani et al. [29] reported that, among marker systems, ISSR is a desirable marker for detecting a wide range of genetic variation in various plants. With ISSR markers, most amplified fragments were between 200 and 1500 bp.
The richer the genetic diversity of a species, the stronger its ability to adapt to the natural environment. The level of genetic diversity in plants can be influenced by a variety of factors, including the breeding system, the mechanism of seed dispersal, geographical distribution and natural selection [30]. Species with high genetic variation can withstand the survival pressure caused by various environmental changes, whereas the loss of genetic diversity reduces the ability of a species to adapt to environmental change and affects its survival. Evaluating the genetic diversity of A. villosum populations therefore helps to assess the evolutionary potential of the species and provides a reference for the identification, preservation and utilization of its germplasm resources. It also provides a reference for increasing genetic diversity when planting A. villosum in different populations.
According to the ISSR markers, there was a high level of inter-population genetic variability and a relatively low level of genetic diversity within populations. We analyzed the genetic diversity and genetic structure of the A. villosum populations with ISSR markers, which sample the whole genome. The genetic diversity parameters indicated that the genetic diversity of the A. villosum germplasm in Yangchun was relatively rich (PPB = 47.19%, H = 0.1820, I = 0.2689). Both Gst (0.4487, that is, 55.13% of the variation within populations) and AMOVA (68.74%) showed that more of the genetic variation existed within populations than between them.
Our results show that ISSR markers can effectively reveal polymorphism among materials. Genetic diversity also differed somewhat among the 7 populations in this study. The ISSR markers indicated that population ZY had the highest genetic diversity and TK the lowest. This variation may be due to human activity, random genetic drift and/or inbreeding.
The higher the genetic differentiation index between populations, the more pronounced the differentiation and the greater the genetic difference between them. Wright et al. [31] proposed that a Gst value between 0 and 0.05 indicates weak genetic differentiation among populations, a value between 0.05 and 0.15 indicates moderate differentiation, a value between 0.15 and 0.25 indicates large differentiation, and a value higher than 0.25 indicates extremely large differentiation. According to Nei's analysis of genetic diversity, the Gst value among A. villosum populations was 0.4487, which is greater than 0.25, so the genetic differentiation between the populations was extremely large. AMOVA showed that genetic variation within populations of A. villosum accounted for 68.74% (P < 0.05) of the total genetic variation and genetic variation among populations accounted for 31.26% (P < 0.05), indicating that most of the genetic variation of A. villosum occurred within populations. Gene flow is the movement of genes within and between populations, and its intensity has an important effect on population differentiation. In this study, the gene flow (Nm) between populations was 0.6143. According to Slatkin [32], Nm > 1 between populations indicates that gene exchange is sufficient to counteract the influence of genetic drift, so that no obvious differentiation arises and population subdivision is prevented. Therefore, the Nm value of 0.6143, which is below 1, indicated that genetic drift was the main factor affecting the genetic variation between populations, that genetic exchange between populations was limited, and that high genetic variation was maintained within each population, so that each can be considered an independent population. This was also confirmed by the genetic differentiation index between populations.
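The interpretation rules quoted above can be written as a small decision function. This is a sketch of the thresholds as stated in the text; how values falling exactly on a boundary (0.05, 0.15, 0.25) are assigned is an assumption made here, and the function names are illustrative.

```python
def classify_differentiation(gst: float) -> str:
    """Label a Gst value using the ranges attributed to Wright in the text."""
    if not 0 <= gst <= 1:
        raise ValueError("Gst must lie in [0, 1]")
    if gst < 0.05:
        return "weak"
    if gst < 0.15:
        return "moderate"
    if gst <= 0.25:  # boundary handling is an assumption
        return "large"
    return "extremely large"


def drift_dominates(nm: float) -> bool:
    """Per the Nm > 1 criterion in the text: below 1, drift is expected to dominate."""
    return nm < 1


print(classify_differentiation(0.4487))  # Gst reported for A. villosum
print(drift_dominates(0.6143))           # Nm reported for A. villosum
```

Applied to the values reported in this study, the first call falls in the highest differentiation class and the second indicates gene flow below the Nm = 1 threshold.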
The similarity and genetic relationships between plant populations can be expressed through genetic distance. Some scholars believe that genetic distance and geographic distance are positively correlated [33], whereas others believe that the two are not significantly correlated [34].
The genetic distances among the 7 populations ranged from 0.0844 to 0.3347, and the genetic similarity coefficients ranged from 0.7156 to 0.9191, indicating that the populations were relatively closely related. If gene flow and seed dispersal through the mating system are the main causes of variation among populations, then the closer populations are geographically, the smaller the genetic differentiation between them should be. However, the Mantel test indicated that the distribution of genetic diversity among populations may not be explained by geographic distance, which may instead reflect the way gene flow is distributed geographically. We therefore interpreted the results through the groupings produced by the PCoA plot and the UPGMA dendrogram. The populations in this study clustered according to habitat similarity rather than geographic location, with populations from similar habitats clustering together first. The 7 populations were clearly divided into three major clusters: ZJD, TK and ZY clustered together, GY, MM and YC clustered together, and XFC formed a separate cluster. This pattern could be a result of introduction. Isolates in different groups had a similarity range of 78-92%; such a level of genetic diversity can arise through a series of evolutionary processes, including mutation, recombination and migration. The genetic level among populations was consistent with the results of the PCoA and UPGMA cluster analyses revealed by ISSR markers.
In this study, ISSR markers were used for a preliminary analysis of the genetic relationships among 7 populations, based on band polymorphism, population polymorphism, genetic distance and genetic identity between populations, clustering based on genetic distance, and PCoA. All of these analyses indicate that the 7 populations have a certain level of genetic diversity and ability to resist external disturbance, which is good news for the protection and cultivation of the germplasm. The study provides a theoretical basis for further research on the classification of A. villosum populations and lays a theoretical foundation for the protection and sustainable utilization of the germplasm resources of the Southern Medicine A. villosum.

Plant Material Sampling
A total of 141 samples of A. villosum were collected from 7 populations in Yangchun City, Guangdong Province, from August to November 2018. The sampled plants were identified by Huang Zhihai, chief Chinese pharmacist of the Second Clinical College of Guangzhou University of Chinese Medicine. Fresh, healthy leaves were removed from the plants, immediately dried and preserved in silica gel in the field, and then stored in an ultra-low temperature freezer (-80 ℃) on return to the laboratory. Detailed information and the geographic locations of the samples are given in Table 8 and Fig. 9. Total DNA was extracted using the Tiangen DP305 plant DNA kit. A NanoDrop 2000 ultra-micro ultraviolet spectrophotometer was used to determine DNA concentration and purity.
The PCR amplification reaction system contained 12.5 µL of 2× Taq PCR Mix, 1.0 µL of forward primer (2.5 µM), 1.0 µL of reverse primer (2.5 µM) and 2.0 µL of genomic DNA, made up to 25 µL with ddH2O. The primer sequences and amplification conditions for the different DNA barcodes are shown in Table 9. All amplification reactions were run on a ProFlex PCR instrument (Life Technologies, USA). PCR products were sent to Shanghai Meiji Biotechnology Company, Guangzhou Branch, for sequencing.
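The reaction recipe above implies a water volume and a final primer concentration that are not stated explicitly. The short calculation below derives them from the listed volumes; the helper names are illustrative, and the final concentration uses the usual dilution relation C_final = C_stock × V_added / V_total.

```python
def water_volume(total_ul: float, components_ul: dict) -> float:
    """Volume of ddH2O needed to bring the listed components to the total volume."""
    used = sum(components_ul.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return total_ul - used


def final_concentration(stock_um: float, added_ul: float, total_ul: float) -> float:
    """Final concentration after dilution into the reaction (same unit as stock)."""
    return stock_um * added_ul / total_ul


# Component volumes (µL) as listed in the text for a 25 µL reaction.
components = {
    "Taq PCR Mix": 12.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "genomic DNA": 2.0,
}
print(f"ddH2O per reaction: {water_volume(25.0, components):.1f} µL")
print(f"final primer concentration: {final_concentration(2.5, 1.0, 25.0):.2f} µM")
```

With the volumes given in the text, this works out to 8.5 µL of water per reaction and a final concentration of 0.1 µM for each primer.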

Data analysis
The bidirectional sequencing trace files of the DNA barcodes were evaluated and assembled with CodonCode Aligner v8.0.1. Low-quality regions at both ends of the assembled sequences were removed. ITS2 barcodes were annotated by trimming the conserved 5.8S and 28S motifs based on HMM [23] and the ITS2 database [24]. MEGA 6.0 was used to align the DNA barcode sequences and to calculate sequence statistics, including base composition, GC content, variable sites, conserved sites and parsimony-informative sites. Haplotype sequences for each barcode are presented as two-dimensional code pictures. In each picture, each vertical line represents a base, and the two-dimensional code on the right can be scanned to read the DNA sequence directly.
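Two of the sequence statistics mentioned above, GC content and variable sites, are simple to compute from an alignment. The sketch below illustrates the definitions only; it is not the MEGA implementation, the sequences are hypothetical, and gaps and ambiguity codes are simply ignored, which is an assumption.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def variable_sites(alignment: list) -> list:
    """0-based alignment columns in which more than one unambiguous base occurs."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    sites = []
    for index, column in enumerate(zip(*alignment)):
        observed = {b.upper() for b in column} & set("ACGT")
        if len(observed) > 1:
            sites.append(index)
    return sites


# Hypothetical aligned fragments; identical sequences yield no variable sites.
identical = ["ACGTGCCA", "ACGTGCCA", "ACGTGCCA"]
print(variable_sites(identical))
print(f"{gc_content(identical[0]):.1f}% GC")
```

An alignment of identical sequences returns an empty list, which is the situation reported for all five barcodes in this study.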
Reproducible ISSR-PCR bands were identified with GelPro32 software and manual correction. Clear bands were scored as present (1) or absent (0), generating an ISSR phenotype data matrix. The data matrix was imported into Popgene32 to analyze genetic diversity and genetic structure. The genetic diversity parameters calculated were the percentage of polymorphic bands (PPB), number of alleles (Na), effective number of alleles (Ne), Nei's gene diversity index (H) and Shannon's information index (I). The genetic structure parameters calculated were Nei's gene differentiation coefficient (Gst), total genetic diversity (Ht), within-population genetic diversity (Hs) and gene flow (Nm). GenAlEx 6.502 was used to estimate the components of genetic variance within and among populations by analysis of molecular variance (AMOVA) and to assess the correlation between population genetic distance and geographic distance by Mantel tests. Genetic distances and genetic similarity coefficients among populations were calculated and a UPGMA dendrogram was constructed using NTSYS 2.10e.
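To make the diversity parameters concrete, the sketch below computes them from a small 0/1 band matrix. It is an illustration under stated assumptions rather than a reproduction of the Popgene32 calculations: ISSR bands are treated as dominant, two-allele loci in Hardy-Weinberg equilibrium, so the null-allele frequency q is estimated as the square root of the fraction of individuals lacking the band. The matrix is hypothetical.

```python
import math


def locus_stats(scores):
    """Statistics for one dominant locus scored 1 (band) or 0 (no band)."""
    n = len(scores)
    q = math.sqrt(scores.count(0) / n)  # null-allele frequency (HWE assumed)
    p = 1.0 - q
    homozygosity = p * p + q * q
    polymorphic = 0 < sum(scores) < n   # band present in some, not all
    return {
        "polymorphic": polymorphic,
        "Na": 2.0 if polymorphic else 1.0,
        "Ne": 1.0 / homozygosity,
        "H": 1.0 - homozygosity,        # Nei's gene diversity
        "I": -sum(f * math.log(f) for f in (p, q) if f > 0),  # Shannon index
    }


def diversity(matrix):
    """Average per-locus statistics; rows are individuals, columns are loci."""
    loci = [locus_stats(list(column)) for column in zip(*matrix)]
    k = len(loci)
    summary = {key: sum(l[key] for l in loci) / k for key in ("Na", "Ne", "H", "I")}
    summary["PPB"] = 100.0 * sum(l["polymorphic"] for l in loci) / k
    return summary


# Hypothetical band matrix: 4 individuals x 3 loci, one polymorphic locus.
bands = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
for name, value in diversity(bands).items():
    print(f"{name}: {value:.4f}")
```

Because monomorphic loci contribute Na = 1 and polymorphic loci Na = 2, the mean Na always lies between 1 and 2, which matches the range of values reported for the seven populations.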

Conclusion
A total of 141 samples of A. villosum from 7 populations in Guangdong Province were used to construct a DNA barcode reference library containing 531 sequences. In addition, the 7 populations were clearly grouped in the cluster analysis, and the level of genetic diversity of the populations, from high to low, was as follows: ZY > ZJD > GY > MM > YC > XFC > TK. Based on these results, the following planting recommendations for A. villosum are proposed: priority should be given to the populations ZY, ZJD and GY, which have rich genetic diversity, in order to preserve as much genetic diversity as possible. At the same time, considering the significant genetic differentiation among A. villosum populations, in-situ conservation should be stepped up for every existing population, and the addition of a protected area in Yangchun City is recommended.
A high level of genetic diversity is very important for the long-term survival of a species. A. villosum, a member of the Zingiberaceae used as both medicine and food, has a very narrow distribution range because of its special requirements for the growth environment. This study found that the genetic diversity of A. villosum is relatively rich. It also confirms at the molecular level that the Yangchun area is the origin of A. villosum. Currently, however, A. villosum faces many problems, such as a high incidence of pests and diseases, unstable yield, lack of cultivation management and difficulty in breeding seedlings.
Based on the results of this research, we recommend establishing a germplasm resource nursery for A. villosum, collecting germplasm resources extensively, and carrying out research on the selection of excellent germplasm, seedling breeding, and high-yield, high-quality cultivation techniques. These measures are crucial to the protection and utilization of A. villosum resources.

Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication: The authors agree to publish this research.
Availability of data and materials: The extracted features datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. All the other data generated or analyzed during this study are included in this article.
Competing interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure caption: Correlation of geographic distance and genetic distance