Genetic diversity and SSR marker development of ancient Camellia sinensis in Sandu County of Guizhou Province

Background: The genetic richness of ancient tea germplasm has been preserved over a long evolutionary process and provides valuable resources for the protection and breed selection of ancient tea germplasm. However, few studies have addressed the genetic diversity of ancient tea germplasm, which restricts both protection and breed selection. Therefore, this study explored the genetic diversity of ancient tea germplasm in Sandu County of Guizhou Province.
Methods and Results: Genetic diversity was analyzed using phenotypes and SSR markers. The coefficients of variation of the six quantitative and seven qualitative characters ranged from 17.76% to 60.37% and from 18.58% to 50.64%, respectively. The diversity indices of the six quantitative and seven qualitative characters ranged from 1.72 to 2.74 and from 0.55 to 0.84, respectively. Ninety-six bands were amplified from the 145 samples using 15 pairs of SSR primers, and the average polymorphism information index was 0.66. The average values of Nei's genetic diversity index (H) and the Shannon information index (I) were 0.26 and 0.41, respectively. Further, at a genetic similarity coefficient of 0.734, the UPGMA dendrogram classified the 145 samples into four groups.
Conclusions: This study revealed rich phenotypic variation and high molecular genetic diversity in the ancient tea germplasm of Sandu, Guizhou Province, and showed that the genetic diversity of the arbor type is higher than that of the shrub type. Thus, this study not only provides a theoretical basis for protection and breed selection but also promotes further research on ancient tea germplasm.


Introduction
Camellia sinensis is a perennial cross-pollinating plant with high genetic diversity and good genetic differentiation (Karthigeyan et al. 2008). It is a shrub or small tree of the genus Camellia in the family Theaceae. It has high biodiversity, is an abundant source of germplasm, and is an essential strategic resource for the sustainable development of the tea industry (Danhui 2013; Lin et al. 2020). Ancient tea germplasm is a particularly valuable resource (Liu et al. 2010). Exploring the genetic diversity of this germplasm may facilitate the effective protection and utilization of these tea resources (Niu et al. 2019). However, due to man-made and natural causes, these valuable resources are gradually diminishing (Tripathi and Negi 2006).
Guizhou is one of the places of origin of tea. Sandu Shui Autonomous County lies on the Yunnan-Guizhou Plateau. Its mountainous terrain, fertile soil rich in mineral elements and organic matter, year-round cloud and mist, long frost-free period, abundant rainfall, and annual average temperature of up to 18 °C make it a natural habitat for tea forests (Zheng 2017). Sandu Shui Autonomous County is reported to be rich in ancient tea germplasm. Therefore, samples of ancient tea germplasm were collected from this area in order to gain an understanding of its biodiversity and to screen and develop molecular markers that can distinguish the ancient tea germplasm of this region. DNA molecular marker technology is the most effective detection method for plant variety identification (Jiao et al. 2018).
The advantages of this technology include good polymorphism, ease of operation, and standardization, as well as insensitivity to environmental factors, crop growth period, and other factors (Koebner et al. 2001; Lu et al. 2018). Among the various molecular marker methods, SSR markers offer codominance, simple operation, good universality, and high reproducibility, which has made them an essential agricultural industry standard in China for identifying varieties of wheat (Thungo et al. 2020), watermelon (Mujaju et al. 2011), pepper (Lu et al. 2011), rice (Raza et al. 2020), and other crops (Raza et al. 2020).
In this study, the genetic diversity of ancient tea germplasm in Sandu of Guizhou was analyzed using phenotypic characters and SSR markers combined with capillary electrophoresis, to enable the exploration of these high-quality resources. The study also provides a theoretical basis for the protection of ancient tea germplasm and promotes further research on it.

Materials
The materials for this study were collected in May 2019 from four plots of ancient tea germplasm in Sandu Shui Autonomous County of Guizhou Province and are listed in Table S1.

Phenotypic statistical method
The following phenotypic characters of the ancient tea germplasm resources were recorded immediately after sample collection: tree shape, leaf color, leaf length, leaf width, leaf tip, leaf tooth depth, leaf tooth density, leaf vein logarithm, leaf area, leaf shape, altitude, and height of individual trees (M et al. 2016). All data were recorded by a single person to minimize visual errors for certain phenotypic characters.

Standardized assignment of data of phenotypic characters
The phenotypic characters used in the study are categorized as quantitative or qualitative, depending on whether they are measured directly or described, respectively. Examples of quantitative characters are leaf length and leaf width, whereas examples of qualitative characters are leaf color, leaf shape, and leaf tip (Kim et al. 2012). The values of the quantitative characters were obtained by direct measurement of the collected samples. The 'Description Specification and Data Standard of Tea Germplasm Resources' was used to standardize the qualitative characters of the collected samples (Table S2) (Steel 1992).

Statistical analysis of phenotypic characters
Following data standardization, SPSS 19.0 was used to analyze the genetic diversity of the phenotypic characters (Inês et al. 2015). The genetic diversity index was calculated using the formula for the Shannon-Weaver index (H) (see formula 1 in the supplementary files section), where Pj is the frequency of the j-th code of a given character (Bajracharya et al. 2006).
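As a minimal sketch of this calculation, the standard form of the Shannon-Weaver index is H = -Σ Pj ln(Pj), summed over the codes of one character. The supplementary formula is not reproduced here, so the use of the natural logarithm and the function name `shannon_weaver` are assumptions, and the codes below are made-up illustrative values rather than study data.

```python
import math
from collections import Counter

def shannon_weaver(codes):
    """Return H = -sum(Pj * ln Pj) over the distinct codes of one character.

    Assumption: natural logarithm; Pj is the share of samples carrying code j.
    """
    n = len(codes)
    counts = Counter(codes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical leaf-color codes for eight samples (illustrative only).
leaf_color_codes = [1, 1, 2, 2, 3, 3, 3, 1]
print(round(shannon_weaver(leaf_color_codes), 4))
```

A character with a single code gives an index of zero, and the index grows as the codes become more numerous and more evenly distributed.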

Cluster analysis of phenotypic characters
Cluster analysis was carried out on the standardized phenotypic data of the ancient tea germplasm. First, the Euclidean distances among the 145 materials were calculated. The between-groups linkage method, the default in SPSS, was then used to obtain the cluster diagram (Benedetti et al. 2007).
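The two steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the SPSS procedure itself: z-score standardization is assumed for the data standardization, between-groups linkage is implemented as the mean of all pairwise distances between two clusters, and the function names and toy rows are hypothetical.

```python
import math

def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1.

    Assumption: z-scores; a constant column would need special handling.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    distance until n_clusters remain; returns lists of row indices."""
    clusters = [[i] for i in range(len(points))]

    def dist(c1, c2):
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy rows: [leaf length, leaf width] for four hypothetical samples.
rows = [[8.0, 3.0], [8.5, 3.2], [14.0, 6.0], [13.5, 5.8]]
print(average_linkage(zscore_columns(rows), 2))
```

Stopping at a fixed number of clusters stands in for cutting the cluster diagram at a chosen distance.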

SSR molecular marker technology
Genomic DNA of ancient tea germplasm was extracted using the CTAB method (Tamari et al. 2013). A total of 15 pairs of primers with good polymorphism, stable amplification, and interspecific differences were used (Table 1). The PCR mixture consisted of 1.5 µL DNA (2 ng/µL), 10.72 µL ddH2O, 1.5 µL 10× Buffer (Mg2+), 0.6 µL dNTP (2.5 mmol/L), 0.08 µL Easy Taq (5 U/µL), 0.3 µL Primer F (10 µmol/L), and 0.3 µL Primer R (10 µmol/L). A touchdown PCR program with two stages was used. The first stage comprised pre-denaturation at 94 °C for 4 min, followed by 16 cycles of denaturation at 94 °C for 40 s, annealing at Tx+8 °C for 30 s (Tx is the optimal annealing temperature of the primer; the annealing temperature was reduced by 0.5 °C in each cycle), and extension at 72 °C for 90 s. The second stage comprised 24 cycles with the annealing temperature reduced to Tx, each consisting of denaturation at 94 °C for 40 s, annealing at Tx for 30 s, and extension at 72 °C for 90 s, followed by a final extension at 72 °C for 12 min and storage at 12 °C. The SSR-PCR products were detected by capillary electrophoresis, and the original 0-1 matrix was derived (Gomes et al. 2018). The SSR marker experiment was performed in triplicate. POPGENE 32 was used to compute the observed number of alleles (Na), the effective number of alleles (Ne), Nei's gene diversity index (H), the Shannon information index (I), genetic distance, and genetic consistency (Yeh et al. 2000). NTSYS-pc 2.1 was used for cluster analysis by the unweighted pair group method with arithmetic mean (UPGMA), from which the cluster diagram was constructed (Rohlf et al. 2000).
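The annealing schedule of the touchdown program and the total reaction volume can be checked with a short sketch. It assumes that the first cycle anneals at Tx+8 °C and that each later first-stage cycle is 0.5 °C lower, so the 16th cycle anneals at Tx+0.5 °C before the second stage runs at Tx. The function name and the example Tx of 55 °C are hypothetical.

```python
def touchdown_annealing(tx, start_offset=8.0, step=0.5,
                        td_cycles=16, fixed_cycles=24):
    """Return the annealing temperature (deg C) for every cycle.

    Assumption: cycle 1 anneals at tx + start_offset and the temperature
    drops by `step` per cycle during the first stage.
    """
    stage1 = [tx + start_offset - step * i for i in range(td_cycles)]
    stage2 = [tx] * fixed_cycles
    return stage1 + stage2

# Component volumes in microlitres, as listed in the text.
reaction_ul = {
    "DNA": 1.5, "ddH2O": 10.72, "10x Buffer": 1.5, "dNTP": 0.6,
    "Easy Taq": 0.08, "Primer F": 0.3, "Primer R": 0.3,
}
total_volume = sum(reaction_ul.values())

temps = touchdown_annealing(tx=55.0)  # 55 deg C is a hypothetical Tx
print(len(temps), temps[0], temps[15], temps[16])
print(round(total_volume, 2))
```

The listed components add up to a 15 µL reaction, and the two stages together give 40 amplification cycles.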

Diversity analysis of phenotypic characters
The results of the statistical analysis of seven qualitative characters of the 145 ancient tea germplasm resources are given in Table S3. Leaf tooth depth displayed the highest coefficient of variation (50.64%), whereas leaf quality showed the lowest. The coefficients of variation of the seven qualitative characters (leaf tooth depth, leaf tip, leaf color, leaf size, leaf shape, leaf quality, and tree shape) ranged from 18.58% to 50.64% (Table S3). Further, the diversity indices of all qualitative characters except leaf quality were higher than 0.75. The diversity index of leaf quality was 0.55, and leaf color displayed the highest diversity index (0.84).
The statistical analysis of six quantitative characters (Table S4) revealed that tree height had the highest coefficient of variation (60.37%) and leaf vein logarithm the lowest (17.76%). The coefficients of variation decreased in the following order: tree height, leaf area, leaf tooth logarithm, leaf length, leaf width, and leaf vein logarithm. The diversity indices of all quantitative characters were above 1.5; leaf vein logarithm showed the lowest (1.72) and leaf area the highest (2.74). These results indicate rich genetic diversity in the phenotypic characters of ancient tea germplasm in Sandu County, Guizhou Province.
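For reference, a coefficient of variation of the kind reported above can be computed as the standard deviation divided by the mean, expressed as a percentage. The sketch below assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and uses made-up tree heights rather than study data.

```python
import statistics

def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100.

    Assumption: sample (n - 1) standard deviation.
    """
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical tree heights in metres (illustrative only).
tree_height_m = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(round(coefficient_of_variation(tree_height_m), 2))
```

Because the statistic is scaled by the mean, characters measured in different units, such as tree height and leaf area, can be compared directly.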

Phenotypic character diversity analysis of five ancient tea germplasm populations
The coefficients of variation of ancient tea germplasm varied significantly among populations and among phenotypic characters (Table S5). All trees in the shrub populations collected from Landong and Guqi villages were of the shrub type, so the coefficient of variation of tree type was not included for them. Among the arboreal populations, the coefficient of variation of leaf tip was highest in Yangmeng village (41.7%) and lowest in Zenya village (30.84%). The coefficient of variation of leaf shape differed little among populations, ranging from 19.57% to 25.66%. For leaf tooth depth, the largest coefficient of variation was 57.77% in the Zenya village population and the smallest was 34.54% in the Landong village population. For leaf color, the largest coefficient of variation was 55.8% in Yangmeng village and the smallest was 34.81% in Landong village. The coefficients of variation of leaf vein logarithm, leaf tooth logarithm, leaf length, and leaf width were small. The coefficient of variation of tree height was highest in the Zenya village population. Of the two shrub groups, Guqi village had the widest range of coefficients of variation, from 13.05% to 55.36%.
The average genetic diversity index of the arboreal ancient tea germplasm populations was relatively high (Table S6). The genetic diversity index of leaf tip was highest in Yangmeng village (1.0889) and lowest in Landong village (0.6089). The highest genetic diversity indices of leaf shape and leaf size were observed in the Landong village population, at 1.0209 and 0.7653, respectively. The highest diversity index of leaf tooth depth was observed in the Yangmeng village group. The highest genetic diversity index of leaf color was 1.0067, in Zenya village. The genetic diversity index of leaf veins was highest in Zenya village (2.6921) and lowest in Yangmeng village (1.8867). The genetic diversity index of leaf tooth logarithm was highest in Landong village and lowest in Yangmeng village. The highest genetic diversity index of tree height was 3.3431, in Landong village. The genetic diversity index of leaf length ranged from 2.3026 to 3.7226. The genetic diversity index of leaf width was highest in Landong village (3.3193) and lowest in Yangmeng village. The genetic diversity index of leaf area was highest in Zenya village (3.9890), followed by Landong village (arbor) and Yangmeng village. Between the two shrub groups, the genetic diversity indices of leaf size, leaf color, leaf tooth logarithm, tree height, leaf length, leaf width, and leaf area were higher in Guqi village than in Landong village. This indicates that, of the two shrub groups, the ancient tea germplasm in Guqi village has the higher genetic diversity.

Principal component analysis of phenotypic characters
To clarify the role of each character in the phenotypic diversity of ancient tea germplasm, 13 phenotypic characters were subjected to principal component analysis (Upadyayula et al. 2006). The cumulative contribution rate of the first six components reached 85.48%, indicating that they reflect the features of the 13 phenotypic characters (Table S7) (Tehrim et al. 2012). The correlation analysis between the original character variables and the six principal components (Table 2) showed that PC1 mainly represented leaf size, leaf length, leaf width, and leaf area, so it can be defined as the leaf size factor. PC2 mainly represented leaf shape and leaf shape index, while PC4 mainly represented leaf tip and leaf vein logarithm, both of which are related to leaf shape; these two principal components are defined as the leaf shape factor. PC3 mainly represented the depth and logarithm of leaf teeth and is defined as the leaf teeth factor. PC5 mainly represented leaf quality and is defined as the leaf quality factor. Similarly, PC6 represented the leaf vein logarithm and is defined as the vein factor.
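The cumulative contribution rate used to choose the number of components can be illustrated with a short sketch. Each component's contribution is its eigenvalue divided by the sum of all eigenvalues. The eigenvalues below are invented for illustration and are not those of Table S7, and the 0.85 threshold and the function name are assumptions.

```python
def components_needed(eigenvalues, threshold=0.85):
    """Return (k, cumulative) where k is the smallest number of leading
    components whose cumulative contribution rate reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for k, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return k, cumulative
    return len(ordered), cumulative

# Hypothetical eigenvalues of a correlation matrix (illustrative only).
k, rate = components_needed([4.0, 3.0, 2.0, 1.0])
print(k, round(rate, 2))
```

With a correlation matrix of 13 standardized characters the eigenvalues sum to 13, so a cumulative rate of 85.48% for six components corresponds to those six eigenvalues summing to about 11.1.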
The principal component scores of the ancient tea germplasm (Table S8) showed that high positive values of PC1 reflect middle-sized and large leaves. LDT8, ZYT59, ZYT62, ZYT67, ZYT69, ZYT70, ZYT71, ZYT73, ZYT74, ZYT76, ZYT81, ZYT82, ZYT84, ZYT85, ZYT86, ZYT87, ZYT90, ZYT91, ZYT94, ZYT95, ZYT97, ZYT98, YMT99, and YMT101 belong to this category. Conversely, strongly negative values of PC1 indicate smaller leaves. High positive values of PC2 indicate that the leaf shape tends to be elliptic or lanceolate; LDT14, LDT20, LDT33, LDT35, LDT36, LDT54, ZYT55, ZYT56, ZYT57, ZYT58, ZYT61, and ZYT63 fall into this category. Strongly negative values of PC2 indicate that the leaf shape tends to be round; LDT21, LDT22, ZYT71, GQB121, GQB123, and GQB124 belong to this category. High positive values of PC3 indicate deeper leaf teeth and a higher leaf tooth logarithm; ZYT69, YMT100, YMT101, GQB109, GQB110, GQB111, GQB112, GQB113, and GQB115 belong to this category. Strongly negative values of PC3 indicate shallow leaf teeth and a lower leaf tooth logarithm. High positive values of PC4 indicate a high leaf vein logarithm and an obtuse leaf tip; LDT1, LDT2, LDT3, LDT4, LDT14, LDT35, YMT100, YMT101, YMT102, LDB139, and LDB143 can be assigned to this category. Strongly negative values of PC4 reflect a sharp leaf tip and a lower leaf vein logarithm. High positive values of PC5 indicate relatively hard leaves, as in LDT10, LDT16, LDT46, LDT47, ZYT58, ZYT64, ZYT66, ZYT68, ZYT82, ZYT83, ZYT96, YMT99, LDB134, LDB136, LDB137, etc. Strongly negative values of PC5 indicate soft leaves; LDT5, LDT8, LDT9, LDT11, LDT12, LDT20, LDT29, LDT43, LDT44, LDT45, and LDT48 belong to this category. High positive values of PC6 indicate a higher leaf vein logarithm; LDT2, LDT5, LDT7, LDT9, LDT12, LDT20, and LDT33 are in this category. Strongly negative values of PC6 indicate a lower leaf vein logarithm, as in LDT21, LDT29, LDT40, LDT43, ZYT63, and LDT52.

Cluster analysis of phenotypic characters
At a genetic distance of 10, the cluster analysis divided the 145 experimental samples into four classes. The first class comprised 31 ancient tea germplasm, which included ancient tea germplasm from trees in Landong village and ancient tea germplasm from all shrubs. The second class consisted of 38 ancient tea germplasm, composed of arboreal ancient tea germplasm from Landong village and all ancient tea germplasm from Yangmeng village. The third class was formed by 22 shrub ancient tea germplasm from Guqi village, whereas the fourth class comprised 54 arboreal ancient tea germplasm from Zenya village.

Genetic diversity analysis of ancient tea germplasm resources
Three sets of duplicate data were analyzed by POPGENE 32 (Table S9 and Table S10) (Liu et al. 2016).
The results showed that the genetic consistency among the 145 samples of ancient tea germplasm ranged from 0.5765 to 0.9529. The genetic consistency among arbors ranged from 0.5882 to 0.9529; the highest value was between the samples numbered 25 and 37, indicating that their genetic backgrounds are similar. The lowest value was between the samples numbered 48 and 102, indicating that they are genetically relatively distant. The genetic consistency among shrubs ranged from 0.6118 to 0.9294. The samples numbered 124 and 126 had the highest genetic consistency, indicating that their genetic backgrounds are quite similar, whereas the two shrubs numbered 113 and 132 had the lowest, indicating that they are genetically relatively distant. The genetic consistency between arbors and shrubs ranged from 0.5765 to 0.9176. The maximum was observed between arbor no. 105 and shrub no. 108, and the minimum between arbor no. 48 and shrub no. 127, indicating that these two ancient tea germplasms are genetically relatively distant. POPGENE 32 was also used to analyze the genetic diversity indices of the 145 ancient tea resources (Table 3). The average observed number of alleles (Na) was 2.0000, the average effective number of alleles (Ne) was 1.3984, the average Nei's genetic diversity index (H) was 0.2584, and the average Shannon information index (I) was 0.4119. These results indicate that the genetic diversity among the 145 ancient tea germplasm materials is relatively high.
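The per-locus statistics behind Ne, H, and I can be sketched for a locus with two alleles at frequencies p and q = 1 - p, using the standard definitions Ne = 1/(p² + q²), H = 1 - (p² + q²), and I = -(p ln p + q ln q). This is a simplified illustration, not a reproduction of POPGENE: it takes allele frequencies as given and does not cover how they are estimated from the 0-1 band matrix. The function names and example frequencies are hypothetical.

```python
import math

def locus_diversity(p):
    """Return (Ne, H, I) for one biallelic locus with allele frequency p."""
    q = 1.0 - p
    homozygosity = p * p + q * q
    ne = 1.0 / homozygosity   # effective number of alleles
    h = 1.0 - homozygosity    # Nei's gene diversity
    i = -sum(f * math.log(f) for f in (p, q) if f > 0)  # Shannon index
    return ne, h, i

def mean_diversity(freqs):
    """Average Ne, H and I over loci, given one allele frequency per locus."""
    stats = [locus_diversity(p) for p in freqs]
    return tuple(sum(s[k] for s in stats) / len(stats) for k in range(3))

# Hypothetical allele frequencies at four loci (illustrative only).
ne, h, i = mean_diversity([0.9, 0.8, 0.5, 0.95])
print(round(ne, 4), round(h, 4), round(i, 4))
```

All three statistics reach their maxima (Ne = 2, H = 0.5, I = ln 2 ≈ 0.693) when the two alleles are equally frequent, which gives a scale for reading the averages reported above.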

Genetic diversity parameter analysis of ancient tea germplasm at different sampling points
The same genetic parameters were compared among samples collected at different locations (Table S11) to characterize the differences among them. Relative to the arboreal ancient tea germplasm population in Landong village, the number of alleles (Na) increased by 2.21% in the arboreal population in Zenya village, decreased by 2.65% in the shrub population in Guqi village, and decreased by 7.52% in the shrub population in Landong village. Relative to the Shannon information index (I) of the arboreal group in Landong village, the value of I for the arboreal group in Zenya village increased by 4.02%, and the largest decrease, 8.46%, was observed in the arboreal group in Yangmeng village. Relative to the number of polymorphic loci in the arboreal population in Landong village, Zenya village showed an increase of 3.80%, whereas the other populations showed decreases of 27.85%, 6.33%, and 16.46%, respectively; the largest decrease occurred in the Yangmeng village group. Relative to the percentage of polymorphic loci (PPB) in the arboreal population in Landong village, the PPB increased by 4.61% in the arboreal population in Zenya village and decreased by 27.53% in the arboreal population in Yangmeng village, by 5.53% in the shrub population in Guqi village, and by 15.71% in the shrub population in Landong village. Overall, the genetic diversity of arboreal ancient tea germplasm was richer than that of shrub ancient tea germplasm.

UPGMA cluster analysis
Cluster analysis of the 145 ancient tea germplasm was carried out with NTSYS-pc 2.1 (Rout et al. 2009), and the corresponding UPGMA dendrogram (Figure 2) was obtained. At a genetic similarity coefficient of 0.734, the 145 ancient tea germplasm were divided into four categories. The first category included 44 ancient tea germplasm from Landong village. The second category comprised 54 ancient tea germplasm from Zenya village, and the third category contained nine ancient tea germplasm from Yangmeng village. The fourth category was further subdivided into two subclasses: the first included 22 shrub ancient tea germplasm from Guqi village, and the second comprised 15 shrub ancient tea germplasm from Landong village and one shrub ancient tea germplasm from Yangmeng village. The cluster analysis suggests that the shrub and arboreal ancient tea germplasm are genetically distant and only distantly related, which agrees with the phenotypic clustering results. Moreover, the clustering results are closely related to the region of distribution, as the ancient tea germplasm collected from a specific region were largely grouped under one category. Thus, SSR markers can be used to classify and identify ancient tea germplasm resources and to study the relationships among them.
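The pairwise similarity that feeds a UPGMA dendrogram can be computed from the 0-1 band matrix in several ways. The text does not state which coefficient was used, so the sketch below shows two common candidates, the simple matching coefficient and the Jaccard coefficient, as assumptions. The band profiles are invented for illustration.

```python
def simple_matching(x, y):
    """Share of loci at which two 0/1 band profiles agree
    (shared presences and shared absences both count)."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def jaccard(x, y):
    """Shared bands divided by bands present in at least one profile
    (shared absences are ignored)."""
    both = sum(a == 1 and b == 1 for a, b in zip(x, y))
    either = sum(a == 1 or b == 1 for a, b in zip(x, y))
    return both / either if either else 1.0

# Hypothetical band profiles for two samples over five loci.
sample_a = [1, 0, 1, 1, 0]
sample_b = [1, 1, 1, 0, 0]
print(simple_matching(sample_a, sample_b), jaccard(sample_a, sample_b))
```

The two coefficients differ in how they treat bands absent from both samples, so a similarity threshold such as 0.734 is only comparable between studies that use the same coefficient.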

Discussion
There is an obvious connection between phenotypic characters and genetic variation (Rode and Morrow 2010). Abundant genetic variation has led to the abundant phenotypic characters found in the ancient tea germplasm of Sandu, Guizhou Province (Li-Jie et al. 2019). Niu Suzhen et al. (2012) analyzed the diversity of 10 phenotypic characters of 144 ancient tea germplasm resources in Guizhou and reported that, apart from tree appearance, the coefficients of variation of the other nine phenotypic characters were all above 35%. That study also reported that the diversity indices of all 10 phenotypic characters were above 0.85. The coefficients of variation and diversity indices computed in this study were lower than those reported by Niu Suzhen et al. (2012). Their samples were collected from 32 ancient tea germplasm resources distributed across Guizhou, which could account for the difference in diversity indices and coefficients of variation between the two studies.
This study also found higher phenotypic diversity than the previous studies by Huang Haitao et al. (2019) and Xie Wengang et al. (2019). Both of those studies analyzed 30 phenotypic characters; however, Huang Haitao et al. (2019) sampled 100 Longjing tea plants from 10 natural villages in Xihu District, Hangzhou, while Xie Wengang et al. (2019) sampled 109 middle- and small-leaf tea plants in Sichuan. There are two possible reasons why those studies reported lower phenotypic diversity than this one: i) the tea germplasm sampled in this study might have undergone a long period of natural selection, which might have led to an accumulation of genetic variation and thus richer phenotypic characteristics (Rossetto et al. 1999), and ii) the environment of a particular region might affect the phenotypic characteristics of the tea germplasm (Weifeng et al. 2018).
Previous studies indicate that ancient tea germplasm in 26 counties of Guizhou Province has rich genetic diversity (Niu et al. 2013). The genetic diversity observed in the 145 ancient tea germplasm of Sandu County was lower than that reported in the literature. This suggests that gene exchange among ancient tea germplasm within a single county is more limited than across an entire province. Further, the UPGMA clustering tree separated the arbors from the shrubs. The clusters corresponded to the regions in which the samples were collected, which in turn indicates the stability and aggregation of the genes of ancient tea germplasm in a specific region (K. et al. 2003). Surrounding mountains, geographical isolation, and poor transport access could block gene exchange between ancient tea germplasm resources located in different regions. This could ultimately lead to genetic stability and aggregation within a particular region.

Conclusion
This study used phenotypes and SSR markers to analyze the genetic diversity of ancient tea germplasm in Sandu Shui Autonomous County. The results revealed that the genetic diversity of the arbor type is higher than that of the shrub type. Further, considerable genetic stability and aggregation among the ancient tea groups in the studied area, along with rich genetic diversity within these groups, were observed. Thus, this study not only provides a theoretical basis for the protection of ancient tea germplasm but also promotes further research on it.

Declarations
"Survey, Identification and Utilization of Ancient tea germplasm Planting Resources in Sandu"; Science and Technology Support Program (Agriculture) of Guizhou, China. QianKeHeZhiCheng No. [2020]1Y001.
Authors' contributions: ZYC, ZDG and DX conceived and planned the experiments. WRY and LQ carried out the experiments. WRY wrote the manuscript, all authors edited and approved the manuscript.
Data availability statement: The data that support the findings of this study are available from the corresponding author on reasonable request.
Compliance with ethical standards
Conflict of interest: All authors declare no competing or conflicting interests.
Research involving human and animal rights: No human participants or animals were involved in this research.