Genetic Diversity and Population Structure of Dendrobium Nobile in Southwest of China Based On Genotyping-by-Sequencing

Dendrobium nobile Lindl. is one of the most important Orchid plants worldwide. The genotype-by-sequencing (GBS) method has now been widely used to access genetic diversity because of its high-throughput and cost-effective in molecular markers. The goal of this study was to employ the GBS technique for diversity evaluation of D. nobile and determine genetic differences between populations. A total of 129 accessions of D. nobile collected originally between 2019 and 2020 from 10 imitation-wild cultivated populations growing in Sichuan, Guizhou and Yunnan of southwestern China were sequenced, a total of 135G clean reads and a total of 836,786 SNPs of high quality data was yielded and were used for nal analysis of genetic diversity and population structure. The quality value 20(Q20) ≥ 92.61%, the quality value 30(Q30) ≥ 82.38%. The GC contents distributed between 37.58% and 38.82%. It was also found that more transitions than transversions, and the ratio of transition/transversion varied from 1.804 to 1.911. By the methods of STRUCTURE, the most appropriate number was found to be k=3, all accessions of D. nobile were classied into three groups, excepts for 14 accessions belonging to admixed group. Phylogenetic tree and principal component analysis (PCA) were consistent with the result. The rst two principal components explained a total of 23.25% of the variation by PCA. The genetic diversity of ML population showed the lower genetic diversity as indicated by the effective number of alleles (Ne) = 1.287, polymorphism information content (PIC) = 0.141, and Shannon's information index (I) = 0.205, while WT population showed slightly higher genetic diversity by the Ne =1.512, PIC =0.256, and I =0.360. ML population and other nine populations (FB, FM, FX, LJ, SJ, SP, WL, WT and XM) were the most divergent between them respectively owing to all pairwise Fst values above 0.25, while FM population and FX population were considered identical because the pairwise Fst value was 0.0 between the two populations. Correlation analysis showed that highly signicant correlation was observed between genetic distance and actual geographical distance (r = 0.854, P < 0.0001), indicating that the genetic differentiation of the 10 D.nobile populations conformed to the geographical isolation model. Analysis of molecular variance (AMOVA) revealed that the genetic variation was greater within populations (87.8%) than among populations (12.2%). This conrmed that intra-population variation was the main source of genetic variation in 10 D. nobile populations. The results also showed that Nm = 1.799 > 1, indicating that there was gene exchange between different populations. Analysis of unweighted pair-group method with arithmetic mean (UPGMA) suggested that the 10 populations were classied into three groups (Group I, Group II and Group III), Group III could be further divided into two subgroups (Group IIIa and Group IIIb). The results will not only provide valuable information for the level of genetic diversity of D.nobile growing in southwestern of China but also help for formulation of strategies for resource protection and utilization. Moreover, GBS appears as an ecient tool to detect intra-population variation. STRUCTURE analysis was used to study the population structure of D. nobile based on 836,786 high quality SNPs from the 129 accessions. STRUCTURE software was run and the cross validation error (CV error) was calculated with k ranging from 1 to 12 to determine the optimal k value and the number of groups (k) was plotted. The results showed that (Fig. when the k value is from 1 to 2, the CV error value decreased rapidly, at k = 3, a distinct inection point was observed, the CV error value decreased to the minimum value, and then gradually increased to 8, after a rapid, gentle and linear rising respectively, nally the CV error value reached the maximum value at k = 12. So the most appropriate number was found to be k=3, which produced the lowest CV error value compared to other k values.


Introduction
Dendrobium nobile Lindl., an epiphytic orchid, is found to be distributed in tropical and subtropical regions of Asia, including Thailand, Laos, Vietnam, Sikkim, Bhutan, Burma, and China. In China, it is found to be present in southern regions of the Tsinling Mountain, such as Yunnan, Guangxi, Guizhou, Sichuan, and Hubei provinces (Tsi et al. 1999). It is one of the most important and representative species in the genus Dendrobium with approximately 1500 species in the worldwide, and is considered as a favorite ornamental plants, known for unique type, bright color and long-lasting owers. In addition, D. nobile plants has been used as precious and traditional Chinese medicine, which has the special e cacy of enhancing immunity, treating neurological diseases, digestive system diseases, ophthalmic diseases and anti-tumor owing to containing a variety of bioactive substances such as dendrobine, polysaccharide, avonoids, and etc (Yang et al. 2014; . It is listed in Pharmacopoeia of the People's Republic of China (Chinese pharmacopoeia Committee 2020). Therefore, D. nobile has high economic value, ecological value and scienti c research value.
However, owing to the pressure of human activities, over-exploitation, natural habitat destruction and fragmentation, like other Dendrobium plants, D. nobile is facing the serious threat of endangered and is contained in Appendices I and II of Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) . In order to protect wild resources of D. nobile, the plants have been cultivated in China, particularly in southwestern regions of China for the market demand. Despite a long time of cultivation, few reports are yet available on genetic variation and population structure of D. nobile. Diversity in cultivated population is crucial for genetic improvement and for the related ornamental industry (Satya et al. 2015; Kang et al. 2015). It is necessary to study population structure and access the level of genetic diversity of the species. An analysis of genetic diversity using molecular marker is a trusted and reliable approach which could provide useful baseline information of various plants or population (Bhattacharyya et al. 2013).
Molecular markers have been developed and a variety of molecular markers have been used to detect genetic diversity and genetic relationships among species in the genus Dendrobium to date, including random-ampli ed polymorphic DNA (RAPD) ( (Feng et al. 2014), inter primer binding repeats site (iPBs) (Cui et al. 2020), target region ampli cation polymorphism (trap) (Feng et al. 2015), sequence-characterized ampli ed regions (scar) (Zheng et al. 2021), these markers are based on electrophoretic map. Other markers are based on sequence differences, including nuclear genome such as ITS, chloroplast genome, including matK, rbcL, psbA-trnH and mitochondrial genome (Ye et al. 2017;Wang et al. 2018;Nguyen et al. 2020). These studies showed that highly genetic differentiation and rich genetic diversity existed among different Dendrobium species. The revealed variabilities have allowed for better understanding of the extent of diversity present in the genus.
Efforts have also been made to assess the genetic diversity within species in the genus Dendrobium using different molecular markers. These studies revealed that genetic diversity and the variation within populations was greater than among populations in D. o cinale Zhang et al. 2019;Xie et al. 2020) and in D. moniliforme (Ye et al. 2015). While most of the genetic variation found to be existed among populations in D. mbriatum (Ma et al. 2009). Some studies aimed at understanding the genetic diversity of D. nobile using molecular markers for the analysis of population structure, and genetic variation had been reported in natural populations. The results showed that the genetic variation among natural populations from Northeast India was higher than within populations by Bhattacharyya et al. using SCoT (2013), RAPD (2015), ISSR and DAMD (2015) markers respectively. Other results also showed higher genetic differentiation among populations from Qinba mountain area in China using ISSR marker ). Whereas the genetic variation of D. nobile within populations from Yunnan Province, China was higher than among populations, using DALP marker (Zhang et al. 2013). In these studies, the natural population of D. nobile was mainly taken as the research object, and the sampling range were respectively located in Northeast India, Yunnan province or Qinba mountain area of China. However, the research on genetic variation of cultivated population was rarely reported.
In recent years, a helpful tool to enhance genetic analysis has emerged called genotyping-by-sequencing (GBS) , which is a new method based on next-generation-sequence (NGS) (Zhu et al. 2019) and massive sequencing which is able to discover a large number of single nucleotide polymorphisms (SNPs) (Wang et al. 2017). The main advantage of the method over previous approaches is that they can still work in the absence of a reference genome (Perea et al. 2016). It has dramatically reduced costs and time and provides a rapid, high-throughput and cost-effective tool for exploring plant genetic diversity on a genome-wide scale (Taranto et al. 2016). Therefore, GBS technology has been widely used for population structure and genetic diversity studies in various plant To our knowledge, no population structure, and genetic diversity studies were available of D. nobile in previous study using GBS. In this study, we gathered 129 D. nobile accessions from 10 cultivated populations of Yunnan, Guizhou, and Sichuan provinces in southwestern China. The objectives of our study were to apply GBS technology (1) to discover SNPs for D. nobile species, (2) to assess the level of genetic diversity in a set of 129 accessions, (3) to evaluate the utility of the GBS in the genetic diversity analysis of D. nobile, an endangered medicinal plant.

Plant Materials
A total of 129 accessions of D. nobile were analyzed in this study. All samples were collected originally between 2019 and 2020 from 10 imitation-wild cultivated populations growing in Sichuan, Guizhou and Yunnan of Southwestern China. The species collected was identi ed by authors. The GPS coordinates of the 10 populations were recorded. Young leaves which were fresh, healthy and diseasefree were cut off from randomly selected plants and immediately put into self-sealing bags containing dry Silica Gel, then they were taken back to the laboratory and stored in the refrigerator at -20 C prior to use. Provenances of the 10 populations used in this study were listed in Table 1.

Genotyping by Sequencing
Quali ed DNA was rst digested using the restriction enzymes (PstI, MspI), and barcoded adapters were ligated to the digested fragments, then PCR ampli cation was carried out, the PCR products were pooled, fragments between 350 and 600 bp were isolated from agarose gel and puri ed, the library was constructed. After the library was tested to be quali ed, paired-end sequencing was performed on the Illumina novaseq6000 platform at Huazhi Biotech Co., Ltd. in Changsha, China.
All the raw pair-end sequencing data in FASTQ format were deposited into the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) with BioProject accession PRJNA747120. The original data were split, low-quality reads were removed, and clean reads were nally generated, which was a total of 135 G. The quality of all sequencing was evaluated using fastqc version 0.11.8 software. The results showed that all samples had no quality problems and the sequencing quality was good. 10000 reads were randomly selected from the FASTQ le of sequencing data and aligned to sequencing in NCBI data using blast version 2.2.26 software (Altschul et al. 1990) to detect the pollution of sequencing.

Data Analysis
Based on the ltered SNPs, the population structure of 129 accessions was conducted using Admixture software, the number of subgroup varied from 1 to 12, the output of STRUCTURE was evaluated to determine the most appropriate k value, which will exhibit the low cross-validation error compared to other k values. Samples were considered to belong to one of the subgroup when the value of the corresponding membership coe cient (Q) was higher than 0.6, if not, the sample was considered as admixed at assigned k.
The genetic distance matrix was calculated using plink software and phylogenetic tree was constructed using Neighbor Joining algorithm method. The entire accessions were divided into subgroups based on the major nodes of the phylogenetic tree.
Principal component analysis (PCA) was performed to assess genetic distinctness and redundancy, and to assess the sample association plots of the rst two resulting principal components using GCTA version 1. The correlation coe cient between genetic distance and actual geographical distance was conducted using R version 3.6.3 software (Martin et al. 2014) to assess their correlation respectively.

Results
A total of 129 D. nobile accessions were sequenced using GBS technology in order to understand the genetic diversity and population structure of the 10 populations. After ltering and quality controlling, raw reads were split and nally generated a total of 135G clean reads. A total of 836,786 SNPs of high quality data was yielded and were used for nal analysis. More than 60% of sequencing aligned to the reference genome, the depth of sequencing was 1×-3×, and the coverage was basically more than 11%. Statistics on sequence data ( Table 2) further showed that the number of total reads ranged from 6,497,132 (XM population)   It was shown in Table 3, from the genomic level, the total number of SNP sites in the 10 populations of D. nobile varied from 170,167 (XM population) to 255,053 (LJ population); The range of homozygous loci was 63,748 (SJ population)-141,624 (ML population), and the percentage of that was between 28.54% (FB population) -75.03% (ML population); the range of heterozygous loci was 46,452 (ML population) -168,225 (FB population), and the percentage of that was 24.97% (ML population)-71.46% (FB population). It was also found that more transitions than transversions, and the ratio of transition/transversion varied from 1.804to 1.911. STRUCTURE analysis was used to study the population structure of D. nobile based on 836,786 high quality SNPs from the 129 accessions. STRUCTURE software was run and the cross validation error (CV error) was calculated with k ranging from 1 to 12 to determine the optimal k value and the number of groups (k) was plotted. The results showed that (Fig. 1): when the k value is from 1 to 2, the CV error value decreased rapidly, at k = 3, a distinct in ection point was observed, the CV error value decreased to the minimum value, and then gradually increased to 8, after a rapid, gentle and linear rising respectively, nally the CV error value reached the maximum value at k = 12. So the most appropriate number was found to be k=3, which produced the lowest CV error value compared to other k values.
As a consequence, a total of 129 D. nobile accessions were classi ed into three groups. To classify groups, accessions unequivocally were assigned to a group when the threshold of the membership coe cient (Q) was higher than 0.6, while those with the Q lower than 0.6 were assigned to the admixed group. It was shown in Fig. 2, the rst group, highlighted in red, consisted of 11 accessions, accounting for 8.53% of all accessions, with 8 accessions from LJ population, 1 accession from FM WT and XM population, respectively. The second group which was the largest group, highlighted in blue, accounting for 58.14% of all accessions, consisted of 75 accessions, including 15 accessions from FB population, 11 accessions from SP and WL population, respectively, 10 accessions from FX population, 9 accessions from FM and SJ population, respectively, 6 accessions from XM population and 4 from WT population. The third group, highlighted in green, consisted of 29 accessions, accounting for 22.48% of all accessions, including 9 accessions from ML population, 7 accessions from XM and WT population respectively, 3 accessions from WL population, and 1 accession from FM, FX and LJ population respectively. In addition, the 14 accessions, with Q value of each group was less than 0.6, accounting for 10.85% of all accessions, were in the admixed group, that could not be unequivocally assigned to any of the above three groups.
The genetic distance matrix of all D. nobile accessions from 10 populations was calculated by p-link software and cluster analysis was carried out to construct the phylogenetic tree, as shown in Fig. 3. The phylogenetic tree was further generated that classi ed 129 accessions into three major clusters, except for the remains of 14 accessions (gray in color) in admixed cluster also. By comparing STRUCTURE groups versus NJ-tree clusters, it was possible to observe that all the accessions in group I, II, III and admixed were distributed in the cluster I, II, III and admixed, respectively. Cluster I (red in color) was the smallest cluster consisting 11 accessions, most of them from LJ population, and others from FM WT and XM population, respectively. Cluster II (blue in color) was the largest cluster consisting of 75 accessions from 8 populations respectively, except for ML and LJ populations. Cluster III (green in color) was comparatively smaller cluster, consisting of 29 accessions from 7 populations, including ML, XM, WT, WL, FM, FX and LJ population respectively.
The clustering in phylogenetic tree revealed genetic relationship of 129 accessions from the 10 populations. The green cluster and the red cluster were close together, indicating that accessions in the two clusters had a close genetic relationship. The blue cluster was far away from the red and green cluster, indicating that the genetic relationship between them is also far away. Different color clusters had different degrees of accessions mixing of different populations, indicating that there was a certain phenomenon of gene exchange between different populations.
Further principal component analysis (PCA) was performed to identify patterns of genetic variation among individual accessions and the 10 populations. All accessions were divided into four groups in the PCA plot (Fig. 4a). "0(orange in color)", "1(dark green in color)", "2(cyan in color)" and "3(purple in color)" represented admixed group, group I, II, and III in the STRUCTURE group respectively. The rst principal component explained 15.02% of variation and the second explained 8.23%, both explained 23.25% of the total variation. All accessions in group II and a small amount of accessions in group III were placed in the positive (upper) portion of the plot of PC2 axis, while all accessions in group I, in admixed group and most accessions in group III were placed in the negative (lower) portion of the plot of PC2 axis. Only group I in the positive (right) portion along the plot of PC1 axis, the other three groups were placed in the negative (left) portion along the plot of PC1 axis. Moreover, it had been found that accessions in the group III showed a loose distribution, while in the group I and II showed centralized in the PCA plot indicating the larger genetic diversity present in group III.
In Fig. 4b, an initial analysis of population structure was performed using PCA. The 10 populations were represented by 10 different colors in the plot. It had been found that accessions from LJ and WT population respectively showed high dispersal, indicating high and higher genetic diversity in the two populations; while accessions from ML population clustered tightly together, indicating lower genetic diversity and it could be distinguished from other populations at the PCA plot. Moreover, different populations, except for ML population, appeared to overlap and became undistinguishable clearly with PCA, indicating that their genetic background was similar.
From the above, we concluded that the results of STRUCTURE analysis, PCA, and the phylogenetic tree showed a high degree of similarity. All accessions could be classi ed into three groups and one admixed group and there was a consistent one-to-one match among the three methods.
In order to understand the genetic diversity of the ten populations, the diversity parameters were calculated in Table 4  The genetic difference among the 10 populations was estimated by comparative analysis of genetic differentiation coe cient (Fst) ( Table 5  Furthermore, correlation between genetic distance and actual geographical distance of the 10 populations were also analyzed. Highly signi cant correlation was observed between genetic distance and actual geographical distance (r = 0.854, P < 0.0001), indicating that the genetic differentiation of the 10 D.nobile populations conformed to the geographical isolation model.
In order to de ne the patterns of genetic variation and to estimate the variation components among the populations, an analysis of molecular variance (AMOVA) was performed based on the pairwise genetic distances between the 10 D.nobile populations. The AMOVA revealed that much greater variation within populations (28.96%+58.84%=87.8%) than among populations (12.2%, p<0.001) ( Table 6). In other words, 28.96% and 58.84% of the variation was attributed to the differences among individuals within populations and within individuals respectively, whereas among the populations, account for 12.2% of the total variation. This con rmed that intra-population variation was the main source of genetic variation in the 10 D. nobile populations. The results also showed that Nm = 1.799 > 1, indicating that there was gene exchange between different populations.

Discussion
Phenotypic traits and molecular markers are two different tools for assessing genetic diversity. In previous study, the diversity of 18 phenotypic traits of the same 10 populations was analyzed, and the results showed that inter-population variation was the main source of phenotypic traits variation of D.nobile (He et al. 2021). In the present study, the AMOVA analysis of the same 10 populations based on GBS molecular markers revealed that the intra-population variation was the major genetic variation of D. nobile, indicating morphological traits were often in uenced by the environment.
At the same time, in previous study, the cluster analysis of the 10 populations of D. nobile based on 18 phenotypic traits showed that at the Euclidean distance of 6.5, the 10 populations could be divided into 3 groups, which generally showed the rule that populations with similar geographical location and latitude gathered together, and phenotypic traits showed the pattern of geographical variation. In this study, the same 10 populations of D. nobile were clustered by UPGMA method according to the genetic distance. At the genetic distance of 0.075, the 10 populations could also be divided into three groups. There were similarities and differences between clustering based on molecular marker and phenotypic traits. The similarities were as follows: in the two kinds of clusters, ML population, at the southernmost end of Yunnan Province was clustered alone. Differences were as follows: in phenotypic clustering, XM population and LJ population gathered together; while in molecular clustering, LJ population gathered solely, the XM population gathered with WT population rstly to form subgroup, and then with the other 6 populations clustering in another subgroup gathered to form a group, indicating that the genetic background of D. nobile resources from XM and WT population was similar, and accessions of XM population might be introduced from Wutong town in Hejiang county of Sichuan province. In addition, in phenotypic clustering, FX population was closer to WT population, but in molecular clustering, FX population was closer to FM population, indicating that the genetic background between the two populations was similar. The SP and SJ population clustered together in molecular clustering, which was grouped independently in phenotypic clustering. Similar reports had been reported that there were inconsistencies between molecular clustering and phenotypic clustering in D. o cinale and other plants species (Zhang et al. 2019;Yang et al. 2016).
The results of cluster analysis were well supported by the analysis of genetic differentiation among the 10 populations based on the genetic differentiation coe cient (Fst). The pairwise values of Fst between ML population and the other 9 populations were the largest, varied from 0.271 to 0.446, all above 0.25, indicating obvious genetic differences between them. It could explain why ML population is not clustered with the other 9 populations. The Fst value between FX population and FM population was 0, showing a high degree of genetic homogeneity, no genetic differentiation between the two populations.
In addition, the results were inconsistent with the report by Bhattacharyya et al. (2013, that a greater degree of variation existed among the populations compared to that within the populations of D. nobile in Northeast India using SCoT, ISSR, and DAMD markers. The other studies had reported the similar results of D. nobile (Zhang et al. 2013;. Similar reports in D.moniliforme, another species of the Dendrobium genus (Ye et al. 2017). The reasons were considered that there had been high genetic differentiation in the long-term evolution process owing to geographic isolation, fragmented distribution and lack of exchange between natural populations. In our study, the accessions of D. nobile were from cultivated populations, the Nm value for the 10 populations was 1.799, implicating that genetic exchange occurred among the D. nobile populations in the neighboring distributing regions.
A different level of genetic diversity of the 10 cultivated populations from southwestern China was observed by analysis of genetic parameters. For example, it was shown the higher genetic diversity in WT population, while the lower genetic diversity in ML population. Firstly, it could be related to its own breeding system which was asexual reproduction. In a separate population far away from other populations and within its own small population, the distribution area might have genetic drift due to geographical isolation and lack of gene exchange with other populations, thus reducing the level of genetic diversity of ML population. This result was consistent with the conclusion of previous report (Yan et al. 2015). Their study found that D. nobile from Hainan Island in China showed a low level of genetic diversity, and there was a high degree of genetic differentiation between D. nobile from mainland China. Due to geographical isolation, there were genetic differences and signi cant regional speci city. There was a relatively higher level of genetic diversity in WT population, the reason was that Wutong town located in Hejiang county of Sichuan province and at the junction with Chishui city of Guizhou province. It is convenient to communicate with Sichuan and Guizhou of two provinces so the genetic diversity in the population was high. Geographical factors also had an impact on genetic diversity. The 10 cultivated populations distributed in different regions, and the longitude and latitude span a large range. Different geographical differences also lead to different levels of genetic diversity to adapt to environment. The results were consistent with the reported that the latitude and altitude had a positive effect and average annual temperature had negative in uence on the population genetic diversity of Gymnocarpos przewalskii (Xu et al. 2018). As Mousadik et al. (1996) reported that the genetic variations had been affected by its own breeding system, distribution area, habitat change, individual cell mutation, founder effect, species legacy areas produced in the Pleistocene, etc.
Molecular clustering still showed obvious geographical characteristics, indicating that genetic differentiation conformed to geographical isolation mode. Through correlation analysis, there was a signi cant correlation between genetic distance and geographical distance. The reason might be that the accessions collected in this study distributed widely, spanning 3 Provinces, including Sichuan, Guizhou and Yunnan province which distributed in southwestern China. It showed different climatic and ecological conditions in different geographical regions. There existed a different degree of genetic differentiation between these populations could be related to their adaptability to different environments. Similar reports had been reported in D. o cinale (Yuan et al. 2013), as well as other families such as naked fruit wood (Xu et al. 2018) and cassava in Brazil (Carvalho and Schaal 2001).
In the current study, a combination of different three analysis methods was employed in studying the genetic diversity and population structure of D. nobile from the 10 populations in southwestern China. Interestingly, the three methods, including STRUCTURE analysis, phylogenetic tree analysis, and principal component analysis (PCA) in general showed the same results and supported each other. The all accessions of D. nobile could be divided into three groups using the three methods respectively, except for the 14 accessions which were in admixed group. Moreover, the accessions classi ed by the three methods correspond to each other one by one. In other words, the three methods play a mutually reinforcing role.
In this study, comprehensive analysis showed that GBS technique could be suitable for the genetic diversity and population structure studies of D.