Analysis of Genetic Diversity and DNA Fingerprinting in Early-Maturing Upland Cotton Using SSR Markers

In this study, DNA fingerprinting and genetic diversity analysis of 79 early-maturing upland cotton (Gossypium hirsutum L.) cultivars were performed using simple sequence repeat (SSR) molecular markers. From 126 pairs of SSR primers, we selected 71 pairs that gave good polymorphism and clear bands, had good stability, and were evenly distributed across the cotton chromosomes; these amplified 142 polymorphic genotypes. The average number of alleles amplified per SSR primer pair was 2.01. The polymorphism information content (PIC) of the markers ranged from 0.1841 to 0.9043, with an average of 0.6494. Fingerprint analysis showed that nine varieties had characteristic bands, and that a minimum of six primer pairs was sufficient to completely distinguish all 79 cotton accessions. In the cluster analysis performed with NTSYS-pc 2.11, the genetic similarity coefficients between the cotton genotypes ranged from 0.3310 to 0.8705, with an average of 0.5861. All cotton accessions were grouped into five categories at a similarity coefficient of 0.57, which was consistent with their pedigree sources. The average genetic similarity coefficients of early-maturing upland cotton varieties in China showed a low-high-low pattern of variation over time, revealing the development history of these varieties from the 1980s to the present. This also indirectly reflects that, in recent years, China's cotton breeders have focused on innovation and have continuously broadened the genetic resources for early-maturing upland cotton. Results indicate that seven varieties, 'ICR-CAAS10', 'Jiu Mian2', 'ICR-CAAS16', 'Yumian5', 'Yumian9', 'Jinmian3', and 'Lumian10', group together in Class I. Eight varieties, 'Sumian1', 'Zhongmian24', 'Zhongmian26', 'Yumian7', 'ICR-CAAS18', 'Heishan Mian1', 'Jinzhong200', and 'Kings improved1', are in Class II.


Introduction
Cotton is an important economic crop in China and the leading raw material used in the textile industry [1][2][3]. In recent years, cotton cultivation in China has shown a trend of moving eastward, westward, and northward [4]. To safeguard cotton production and utilization, China will continue to maintain its three existing cotton-producing regions for the foreseeable future: the Northwest inland (Xinjiang), the Yellow River Basin (YRB), and the Yangtze River Basin (YTRB). However, expansion of cotton production in Xinjiang is limited by water shortages. The cotton planting area in YTRB will be maintained at 660,000 hm² well into the future because of its suitable climate, developed cotton spinning industry, and stable market demand. Cotton is the dominant crop in YTRB, although its competitiveness is weak because of its long growth period, high labor requirements, and high production cost. It is therefore urgent to select and breed new cotton varieties suitable for mechanized production in order to reduce labor costs and increase cotton planting efficiency [5].
Early-maturing upland cotton is characterized by a relatively short growth period and concentrated flowering and boll opening [6][7]. It is one of the main targets of future cotton breeding in YTRB: it not only allows two crops per year by rotation with winter crops such as wheat and rape, but is also suitable for mechanized harvesting, making cotton production simple and efficient [8]. A total of 79 early-maturing cotton accessions were collected and introduced from northern China to improve the local core germplasm resources, which have long growth periods. To make full use of the genetic variation present in the introduced germplasm resources, it is necessary to study the genetic diversity of the 79 accessions.
Molecular marker technology is one of the main tools used for studying the genetic diversity of cotton varieties both in China and overseas. The marker types include, but are not limited to, restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLPs), simple sequence repeats (SSRs), and single nucleotide polymorphisms (SNPs) [9][10][11]. Of these marker types, SSRs have the advantages of high polymorphism, good reproducibility, co-dominance, and simple operation [12][13]. SSRs have been used for cotton DNA fingerprinting, genetic diversity analysis, and QTL mapping [8][14][15][16][17][18], which has enhanced the protection of cotton germplasm resources and enabled the genetic improvement of cotton varieties in China. In the current research, DNA fingerprinting, genetic diversity, and the genetic relationships among 79 early-maturing upland cotton accessions were analyzed with SSR markers. By providing a systematic understanding of the genetic backgrounds of these 79 germplasm accessions, the results of our study will provide genetic resources for the breeding of new varieties of early-maturing upland cotton in Hunan and YTRB.

Experiment material
A total of 79 early-maturing upland cotton germplasm accessions obtained from the National Cotton Mid-term Gene Bank (Anyang, Henan, China) were used for this study. All materials were used with the permission of the National Cotton Mid-term Gene Bank and in accordance with national guidelines. The maturity information for these materials was also provided by the Gene Bank. The materials were collected by the National Cotton Mid-term Gene Bank from six regions: the Institute of Cotton Research of the Chinese Academy of Agricultural Sciences (ICR-CAAS), which we consider to be a separate branch because it is a national institute and its cotton varieties are sui generis (of their own kind); YRB (Henan, Shanxi, Shandong, Jiangsu); the Northwest Inland Region (Xinjiang, Gansu); the Liaohe River Basin; the United States; and the former Soviet Union (Table 1). All materials were planted in the Deshan Experimental Field of the Hunan Institute of Cotton Sciences Research in 2015. We investigated the maturity of these materials, and the growth periods of all materials were less than 110 days in Hunan province. Polymorphic SSR markers were selected based on many years of testing in other materials, and the primers were synthesized by Shanghai Yingjun Biotechnology Co., Ltd. PCR reagents (Taq DNA polymerase, dNTPs, DNA marker size standard) were purchased from Beijing Quanjin Biotechnology Co., Ltd. Our studies did not involve any endangered or protected species. Mature leaves collected from the field were flash frozen in liquid nitrogen and ground to a powder. Genomic DNA was extracted from young leaf tissue using the CTAB DNA extraction procedure described by Zhang [19], with some modifications; the final concentrations were adjusted to 50 ng·μL⁻¹, and the DNA was stored at -20°C.
Amplification reactions were stored at 4°C. The PCR products were separated by polyacrylamide gel electrophoresis (PAGE) on 8% gels. Electrophoresis was performed at 200 V for 45 min, and the bands were observed by silver staining and photographed.

Band recording and data analysis
DNA fragments amplified with a primer pair that had the same migration position in the PAGE gel were recorded as 1, absence of a band was recorded as 0, bands that were blurred or had a deletion were recorded as 999, and the [0, 1] binary data matrix was constructed. The polymorphism information content (PIC) of the SSR primers, the genotypic diversity, and the number of effective alleles per locus were calculated from Pi, where Pi represents the gene frequency of the ith allelic variation at a certain locus. Genetic analysis of the 79 cotton accessions was performed using NTSYS-pc2.1 software. The Jaccard similarity coefficient was calculated using the Qualitative program in Similarity for the original [0, 1] binary data matrix obtained from the EST-SSR markers. Based on the genetic similarity coefficient, the UPGMA (unweighted pair group method with arithmetic mean) algorithm in the SAHN program was used for cluster analysis, and the phenogram was generated using the
each had one characteristic primer (Table 3). The primer pair NAU4044 was able to uniquely identify four varieties: 'Xinluzao25', 'Jiumian9', 'Lumianyan28', and 'Liaomian 5'. Primer pair NAU3254 could distinguish three varieties: 'Xinluzao20', 'Liaomian17', and 'Liaomian19'. These results indicated that these two primer pairs had abundant polymorphism, strong discrimination power, and numerous characteristic bands, and could be used as preferred markers for fingerprint identification.
(Fig. 2). This demonstrates that the genetic relationships are relatively close among the 79 early-maturing upland cotton accessions, but that some genetic diversity is still present.
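The band-recording paragraph names three per-locus statistics (PIC, genotypic diversity, and the number of effective alleles) that depend only on the allele frequencies Pi. The following is a minimal sketch, assuming the forms commonly used in SSR diversity studies: PIC = 1 − ΣPi², Ne = 1/ΣPi², and the Shannon index H′ = −ΣPi ln Pi. These forms, the function name `diversity_stats`, and the example frequencies are assumptions for illustration, not taken from the source.

```python
import math


def diversity_stats(freqs):
    """Return (PIC, Ne, H') for one locus.

    freqs: allele frequencies P_i at the locus; they must sum to 1.
    The three formulas are commonly used forms, assumed here for illustration.
    """
    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must sum to 1")
    sum_sq = sum(p * p for p in freqs)
    pic = 1.0 - sum_sq                      # simplified PIC (assumed form)
    ne = 1.0 / sum_sq                       # effective number of alleles
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)  # H'
    return pic, ne, shannon


# Hypothetical locus with three alleles
pic, ne, shannon = diversity_stats([0.5, 0.3, 0.2])
print(f"PIC={pic:.4f}  Ne={ne:.4f}  H'={shannon:.4f}")
```

A locus with many alleles at similar frequencies gives a PIC close to 1, while a locus dominated by one allele gives a PIC close to 0.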
We performed genetic diversity analysis of the 79 early-maturing upland cotton accessions from six regions: the China Cotton Institute, YRB (Henan, Shanxi, Shandong, Jiangsu), the Northwest Inland Cotton Region (Xinjiang, Gansu), the Liaohe River Basin, the United States, and the former Soviet Union. The accessions from the former Soviet Union had the smallest average genetic similarity coefficient of the six regions; the coefficients then increased in the order of the YRB cotton area, the United States, the China Cotton Institute, the Liaohe River Basin Early Maturing Cotton Area, and the Northwest Inland Cotton Area, indicating that there is ample genetic diversity in cotton resources imported from abroad. The genetic diversity of accessions from the YRB cotton region is also relatively high, which may be related to the complex geographical diversity of the region and the dispersion of breeders across Henan, Shanxi, Shandong, Jiangsu, and other provinces.
We found that the genetic similarity coefficients between accessions from the six regions ranged from 0.5575 to 0.6143, and the highest similarity coefficients were found between accessions from China and the USA. This indicates that early-maturing upland cotton varieties selected by ICR-CAAS have close genetic relationships to selections from the USA. In general, Chinese cotton germplasm is exchanged more frequently with resources from the USA than with those from other regions. The lowest genetic similarity coefficient among the early-maturing cotton areas is between YRB and the Liaohe River Basin, with a value of 0.5575, and the second lowest is between YRB and the Northwest Inland Cotton Area, with a value of 0.5636 (Table 5). The underlying reason may be that the YRB cotton-growing area has a warmer, more favorable climate, resulting in more varieties of early-maturing upland cotton and larger differences between varieties than in the early-maturing cotton areas of the Liaohe River Basin and the Northwest Inland Cotton Area.
Comparisons of the genetic similarity coefficients between domestic and foreign early-maturing upland cotton varieties showed that, except for the Northwestern Inland Cotton Area, the genetic similarity coefficients between cotton varieties from the ICR-CAAS, YRB, and the Liaohe River Basin are higher. This suggests that early-maturing upland cotton grown in the ICR-CAAS, YRB, and Liaohe River Basin early-maturing cotton areas contains more American germplasm. During the introduction and utilization of early-maturing upland cotton in China, the majority of early-maturity genetic resources came from American gold-colored cotton. The genetic similarity coefficients between accessions from the Northwestern Inland Cotton Region and the former Soviet Union are relatively high. This may be because the Northwest Inland Cotton Region is adjacent to the former Soviet Union, so it is easier to introduce germplasm resources into China from there.
The average genetic similarity coefficients of early-maturing upland cotton varieties in China have shown a low-high-low pattern over time (Fig. 3). This may be because before the 1980s, domestic early-maturing upland cotton breeding was mainly carried out by introducing different early-maturing varieties from abroad and systematically using them in breeding. Since the early 1980s, cotton production and the cotton spinning industry have developed rapidly. With economic reform and the opening up of the country, transportation became more convenient, and the exchange of germplasm resources between breeding units became frequent. In particular, a number of outstanding varieties (lines) such as 'Heishanmian1' and 'ICR-CAAS10' stood out from the competition and were used by other breeders as donor parents. This resulted in closer genetic relationships between the varieties selected at this time, with higher genetic similarity coefficients and less genetic difference.
In the 1990s, the difficulties of domestic distant hybridization were progressively overcome, and breeders deliberately chose parental materials with complex genetic backgrounds for cross-breeding, which significantly reduced the genetic similarity coefficients of cotton varieties and increased the genetic difference. After 2000, the use of modern breeding technologies (transgenics and molecular marker-assisted breeding) not only accelerated the cotton breeding process but also broadened the source of available cotton genes, resulting in further reductions in the genetic similarity coefficients among new varieties of early-maturing upland cotton in China [20].
Based on the Jaccard similarity coefficient, the 79 early-maturing upland cotton varieties were grouped using a hierarchical clustering method (UPGMA) (Fig. 4). The clustering results reflect certain geographical distribution characteristics: the genetic differences among cultivars from the same area are relatively small, which is why they cluster together.
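The between-region comparison above can be read as averaging the pairwise similarity coefficients over all accession pairs drawn from two given regions. The sketch below illustrates that averaging under this assumed reading; the function name `region_similarity`, the accession labels, and the similarity values are hypothetical and are not data from the study.

```python
from itertools import combinations
from statistics import mean


def region_similarity(sim, region_of):
    """Mean pairwise similarity for every pair of regions (and within each region).

    sim: dict mapping frozenset({a, b}) -> similarity of accessions a and b
    region_of: dict mapping accession name -> region label
    """
    buckets = {}
    for a, b in combinations(sorted(region_of), 2):
        key = tuple(sorted((region_of[a], region_of[b])))
        buckets.setdefault(key, []).append(sim[frozenset((a, b))])
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical accessions and similarity values
region_of = {"v1": "YRB", "v2": "YRB", "v3": "Liaohe"}
sim = {
    frozenset(("v1", "v2")): 0.62,
    frozenset(("v1", "v3")): 0.55,
    frozenset(("v2", "v3")): 0.57,
}
table = region_similarity(sim, region_of)
print(table)
```

Keys with two identical labels give the within-region average; keys with two different labels give the between-region average.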
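The grouping step can be sketched in a few lines: compute Jaccard similarities from the binary band matrix, then merge clusters by average linkage (UPGMA) until no pair of clusters is at least as similar as the chosen cut-off. This is an illustrative pure-Python sketch, not the NTSYS-pc implementation; skipping scores of 999 in pairwise comparisons is an assumption, and the accession names and band profiles are hypothetical.

```python
from itertools import combinations

MISSING = 999  # code used for blurred or missing bands


def jaccard(a, b):
    """Jaccard similarity of two 0/1 band profiles, skipping missing scores."""
    both = either = 0
    for x, y in zip(a, b):
        if x == MISSING or y == MISSING:
            continue  # assumption: ignore loci with a missing score
        if x == 1 and y == 1:
            both += 1
        if x == 1 or y == 1:
            either += 1
    return both / either if either else 0.0


def upgma_groups(profiles, threshold):
    """Average-linkage merging; stop when the best mean similarity < threshold."""
    names = list(profiles)
    sim = {frozenset(pair): jaccard(profiles[pair[0]], profiles[pair[1]])
           for pair in combinations(names, 2)}
    clusters = [[n] for n in names]

    def mean_sim(c1, c2):
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), mean_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical band profiles for four accessions
profiles = {
    "acc_A": [1, 1, 0, 1, 0, 1],
    "acc_B": [1, 1, 0, 1, 0, 0],
    "acc_C": [0, 0, 1, 0, 1, 1],
    "acc_D": [0, 0, 1, MISSING, 1, 1],
}
groups = upgma_groups(profiles, threshold=0.57)
print(groups)
```

Because average-linkage similarities never increase from one merge to the next, stopping at the threshold gives the same groups as cutting the full phenogram at that similarity level.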

Discussion
In recent years, SSR molecular marker technology has been widely used in genetic diversity analysis and fingerprinting of cotton germplasm resources. For example, Han et al. [21]
However, there are few reports on the use of SSR molecular markers to study early-maturing upland cotton in China. In our study, we constructed DNA fingerprints of 79 early-maturing upland cotton accessions using 73 SSR markers and analyzed the molecular data to determine genetic similarities. We found that 72 main varieties can be divided into five categories (Classes I, II, III, IV, and V) at a genetic similarity coefficient of 0.57, and the clustering results for the 79 early-maturing upland cotton accessions were consistent with the pedigree (Fig. 5).
For example, 'Liaomian15' and 'Liaomian18' were the first to cluster together; tracing their pedigree sources showed that both contain 'Liao 1038' in their ancestry. 'Xinluzao6', 'Xinluzao9', 'Xinluzao27', 'Xinluzao31', and 'Xinluzao39' are grouped together in Class I, and their pedigree sources show that they are all descended from 'Bell Snow'. 'Xinluzao6', 'Xinluzao9', and 'Xinluzao27' clustered into a group with a genetic similarity coefficient of 0.87. We also identified several cases of incomplete matching. The genetic relationship between 'ICR-CAAS37' and 'Liaomian6' seems to be relatively distant, yet they are grouped together, whereas the genetic relationship between 'Liaomian6' and 'Liaomian16' seems to be relatively close, yet they do not cluster together. This suggests that the classification is not entirely dependent on pedigree, but is also influenced by the selection method, breeding process, and target traits. Pedigree analysis can only reflect the relative genetic information between varieties. Molecular marker analysis using marker loci distributed over the whole genome can more accurately reflect the genetic differences between varieties [24-25].
Figure caption: The genetic similarity coefficients of early-maturing upland cotton varieties developed over the past four decades.