Genome Wide SSR Development and Their Application in Genetic Diversity Analysis in Wax Gourd

Background: Wax gourd (Benincasa hispida Cong., 2n=2x=24) is one of the most important winter vegetables of the Cucurbitaceae family. Only a limited number of markers are available for this crop, and the draft genome of wax gourd provides a powerful tool for SSR marker development.
Results: In this study, we developed genome-wide SSR markers from the wax gourd genome and characterized the distribution and frequency of different motifs and repeats. A total of 52,431 microsatellites were identified in the wax gourd genome, from which 39,319 SSR markers were developed. In addition, 1,152 non-wax gourd SSR markers were selected from cucumber, melon, watermelon and pumpkin to test their transferability in wax gourd. Of these, 580 were transferable to wax gourd, and 42 were polymorphic in the 11 tested wax gourd accessions. Eleven highly polymorphic transferable SSR markers and 21 wax gourd SSR markers were then selected to investigate the genetic diversity and population structure of 129 wax gourd accessions. These 32 SSR markers detected 112 alleles. The number of alleles (Na) for each marker ranged from 2 to 9, with an average of 3.5. The effective number of alleles (Ne) ranged from 1.142 to 6.684, with an average of 2.060 per locus. Shannon's information index (I) for each marker had a minimum of 0.265 and an average of 0.750. Observed heterozygosity (Ho) for each marker ranged from 0.009 to 0.561, with an average of 0.230. Expected heterozygosity (He) for each marker had a minimum of 0.125 and an average of 0.427. The Polymorphism Information Content (PIC) value ranged from 0.118 to 0.832, with an average of 0.370 per locus. The population structure analysis divided the 129 wax gourd accessions into two main populations, and the genetic diversity analysis separated them into two clusters.
Conclusions: The large number of wax gourd SSR markers developed in this study provides a valuable resource for genetic linkage map construction, molecular mapping, and marker-assisted selection (MAS) in wax gourd. Furthermore, the transferability of SSR markers developed from cucumber, melon and watermelon was validated and compared in wax gourd. The genetic diversity and population structure of 129 wax gourd accessions collected from different provinces of China were investigated, which showed that the distribution and clustering of Chinese wax gourd are closely related to the distribution of water systems in China. These data improve our understanding of domestic wax gourd germplasm resources and are conducive to the effective research, utilization and introduction of germplasm resources.


Background
Wax gourd (Benincasa hispida Cong., 2n = 2x = 24) is an economically important horticultural crop of the Cucurbitaceae family, which also includes several other important vegetables such as cucumber (Cucumis sativus), melon (Cucumis melo), watermelon (Citrullus lanatus), bottle gourd (Lagenaria siceraria) and pumpkin/squash (Cucurbita spp.). China and East India are considered to be the centers of origin of wax gourd [1]. Today, wax gourd is widely grown in China, India, and many other countries [2], and it has a cultivation history of more than 2000 years in China. Wax gourd germplasm is highly diverse, as is readily seen in the morphological variation of fruit shape, size and weight. Fruit weight ranges from less than 1 kg to more than 25 kg, and fruit shape varies from oblate to long cylindrical. The fruit can be stored for a long time, which gives it an important role in year-round vegetable supply and in regulating off-season availability [3]. Wax gourd is rich in vitamin C, contains no fat, and has high nutritional and medicinal value. Ripe wax gourd juice has been used to treat insanity and epilepsy [4] and possesses anti-ulcer activity [5].
The purpose of studying germplasm resources is to take full advantage of heterosis in breeding, but the relationships among germplasms are difficult to determine by relying only on agronomic traits or geographical proximity. Modern molecular biotechnology is an effective means of studying the genetic relationships and diversity of germplasm resources [6]. Molecular markers facilitate the rapid screening of polymorphic loci, and several types of markers have been developed. Among them, the simple sequence repeat (SSR) marker is ideal for many applications because of its desirable features, including ease of use, relative abundance, reproducibility, codominant inheritance and whole-genome coverage [7,8]. Owing to these advantages, SSR markers have been widely used in applications such as gene mapping, fingerprinting, genetic diversity and population structure analysis, and comparative mapping [9][10][11][12].
However, genomic SSR markers for wax gourd are still very limited. Only a few SSR markers have been developed from transcriptome sequences [3], which greatly limits their application in genetic studies. To date, only a high-density genetic map [19], transcriptome sequences for several tissues [3], and a small number of genomic fragments [3,20] have been developed in wax gourd. In contrast, abundant SSR markers have been widely used in many other crops for diverse studies. For instance, genome-wide SSR markers have been developed and applied in watermelon [21], melon [12] and cucumber [22]. Genome-wide transferable SSR markers have also been identified in cucurbit species by comparative genome analysis between cucumber, melon and watermelon [12,21]. Therefore, it will be valuable to develop whole-genome SSR markers for wax gourd for genetic diversity analysis and genetic mapping.
In the present study, we identified microsatellites in the wax gourd genome and characterized the distribution and frequency of the different motifs. In addition, a total of 1,152 non-wax gourd SSR markers were selected to test their transferability in wax gourd. Finally, 11 highly polymorphic transferable SSR markers and 21 SSR markers developed from the wax gourd genome were selected and applied to a collection of 129 wax gourd accessions to investigate their genetic diversity and population structure.

Results
The transferability of non-wax gourd SSR markers in wax gourd. Eleven wax gourd accessions with high morphological diversity were selected to test the transferability of 1,152 SSR markers developed from other crops of the Cucurbitaceae family. Of these, 580 yielded amplification products in the tested wax gourd accessions, each with clear bands in at least five accessions. The SSR markers developed from watermelon had the highest transferability, with 170 of 288 (59.03%) transferable to wax gourd, followed by 153 from melon, 142 from pumpkin and 115 from cucumber. We also checked the polymorphism of the 580 transferable SSR markers in the eleven wax gourd accessions; 42 were polymorphic, comprising 2, 15, 5 and 20 from cucumber, melon, watermelon and pumpkin, respectively (Table 1).

Table 1 The numbers of transferable and polymorphic SSR markers in wax gourd.
Origins               Watermelon  Melon  Cucumber  Pumpkin  Total
SSR markers           288         288    288       288      1152
Transferable markers  170         153    115       142      580
Polymorphic markers   5           15     2         20       42

Genome wide SSR markers development in wax gourd. The analysis of transferable markers from non-wax gourd genomes showed low transferability of these markers, which is far from sufficient for genetic studies of wax gourd. Therefore, we developed genome-wide SSR markers from the wax gourd draft genome assembly of B227. A total of 52,431 microsatellite loci were identified in the wax gourd draft genome. The total sequence length of all microsatellites accounted for 0.13% of the whole genome, with an average of 55 SSR/Mb. Among the repeat types, dinucleotides were the most common, accounting for 41.19% of the total SSR loci discovered, followed by trinucleotides (16.71%), while octonucleotides were the least frequent repeat type (3.12%) (Table 2). The SSR motif distribution with regard to repeat number was also investigated. Microsatellite frequency decreased as the number of repeat units increased, and this was more pronounced for longer SSR motifs (Fig. 1). For example, the mean numbers of repeat units for heptanucleotides and octonucleotides were 3.16 and 3.14, respectively, and the numbers of microsatellites were 6,170 and 1,637, respectively (Table 2). Furthermore, the repeat motifs for each type of SSR identified in the wax gourd genome were examined. We found that some nucleotide motifs were more prevalent than others. For example, the AT motif was dramatically overrepresented among dinucleotide motifs, and it was also the most frequent motif in the entire wax gourd genome, accounting for 35.8% of the total SSR loci discovered. Similarly, AAT, AAAT, AAAAT, AAAAAG, AAAAAAT, and AAAAAAAG were the most abundant repeat types in their respective classes (Additional Fig. S 1).
The frequency and distribution of SSRs on each chromosome showed that the number of microsatellite loci was positively correlated with chromosome size (Additional Table S 4, Fig. 2). The largest number of microsatellites was detected on chromosome 01 (5,715), followed by chromosome 08 (5,391), and the smallest number was found on chromosome 07 (2,880). SSR density is generally low near the centromeres and is also low at the ends of some chromosomes. The sequences containing microsatellite loci were screened for PCR primer design using Primer 3, and 50,298 SSR loci had flanking sequences suitable for primer design. Finally, we designed 39,319 SSR primer pairs, with some SSR loci combined in the same primer pair as compound SSRs. The exact positions of these SSRs on the wax gourd chromosomes, together with their repeat motifs and expected PCR product sizes, are presented in Additional Table S 5.
The colors from blue to red indicate a gradual increase in the density of SSR markers.
Genetic diversity of different phenotypes in wax gourd. We recorded eight phenotypic traits for the 129 wax gourd accessions and found that their coefficients of variation ranged from 0.11 to 0.40 (Table 3).
The 129 wax gourd accessions showed relatively narrow variation in the eight quantitative traits. The trait with the largest coefficient of variation was fruit weight (0.40), followed by fruit length (0.28) and fruit diameter (0.20), while the trait with the smallest coefficient of variation was leaf length (0.11). The coefficients of variation for fruit-related traits were higher than those of all leaf-related traits, indicating that fruit-related traits have greater potential for improvement in this germplasm.
In this study, we also performed a correlation analysis of the traits. There were strong positive correlations among the fruit-related traits and likewise among the leaf-related traits, whereas the correlations between leaf traits and fruit traits were very weak. The highest positive correlation was between leaf length and leaf width (0.92), followed by single fruit weight and flesh thickness (0.69) (Fig. 3).
Genetic diversity analysis of wax gourd accessions. To investigate the genetic diversity of wax gourd germplasm, 19 SSR markers developed from the wax gourd transcriptome and 48 SSR markers developed from the wax gourd draft genome in this study were evaluated for polymorphism in 11 tested wax gourd accessions. Twenty-one of them showed good polymorphism and were selected, together with 11 non-wax gourd SSR markers, for the genetic diversity analysis of the 129 wax gourd accessions. Information on these 32 SSR markers is listed in Additional Table S6. In total, 112 alleles were detected by these 32 SSR markers.
Population structure and genetic diversity analysis of wax gourd germplasm. The population structure of the 129 wax gourd accessions was analyzed using the model-based software STRUCTURE, which employs Bayesian assignment. Evanno's correction method [23] was applied and showed a clear peak at K = 2 (Table S 1).
The fingerprinting data of the 129 wax gourd accessions were used to construct a dendrogram using the neighbor-joining method in MEGA 6, and the accessions were classified into two major clusters (Fig. 5).

Discussion
With the rapid development of sequencing technologies, the genome sequences of many field crops and horticultural crops have been completed [11,13,24,25,26]. These genomic sequences are valuable resources for SSR development, and genome-wide identification of SSRs has been carried out in many plant species [12,22]. SSR markers have been widely used for genetic diversity analysis in many plant species such as rice [27], wheat [28], maize [29] and cucumber [10]. Although the wax gourd genome was sequenced recently, genome-wide SSR markers had not been developed. The lack of sufficient molecular markers has become a major challenge and has restricted many studies, such as genetic diversity analysis, fine mapping and genome-wide association analysis. SSR markers are conserved among closely related species and genera and have therefore proved useful for cross-species transfer [30][31][32]. Cross-species transferability offers an easy, time-saving and cost-effective way to obtain SSR markers for a species from related species in which SSR markers have already been developed [33]. Such markers are not only useful for genetic diversity analysis and map construction in closely related species, but also provide important genetic information for comparative genomics [11,21,34]. Accordingly, SSR transferability has been widely exploited in the Poaceae [32], Leguminosae [31], Vitaceae [35], and Cucurbitaceae [30,36,37]. However, the transferability of SSR markers from other cucurbit species to wax gourd was still largely unknown. In the present study, the transferability of SSR markers from cucumber, melon, watermelon, and pumpkin was tested in wax gourd by selecting 288 SSR markers developed from each of these species. The numbers of transferable SSR markers in wax gourd were 170, 153, 142 and 115 from watermelon, melon, pumpkin and cucumber, respectively.
The transferability of SSR markers is usually higher between evolutionarily more closely related species [21]. Among these five cucurbit species, wax gourd is most closely related to watermelon [38], and the highest number of transferable SSR markers also came from watermelon, which is consistent with their evolutionary relationship.
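The transferability and polymorphism rates discussed here follow directly from the counts in Table 1. The short Python sketch below simply recomputes them; the dictionary layout and the function name `rates` are illustrative choices, not part of the original analysis.

```python
# Counts taken from Table 1 (markers tested, transferable, polymorphic).
table1 = {
    "watermelon": {"tested": 288, "transferable": 170, "polymorphic": 5},
    "melon": {"tested": 288, "transferable": 153, "polymorphic": 15},
    "cucumber": {"tested": 288, "transferable": 115, "polymorphic": 2},
    "pumpkin": {"tested": 288, "transferable": 142, "polymorphic": 20},
}


def rates(counts):
    """Return (transferability %, polymorphism % among transferable markers)."""
    transfer = 100.0 * counts["transferable"] / counts["tested"]
    poly = 100.0 * counts["polymorphic"] / counts["transferable"]
    return round(transfer, 2), round(poly, 2)


summary = {species: rates(c) for species, c in table1.items()}

# Print species from highest to lowest transferability.
for species, (transfer, poly) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{species:<11} transferable {transfer:5.2f}%  polymorphic {poly:5.2f}%")
```

Ranking by the first value reproduces the order reported above (watermelon, melon, pumpkin, cucumber), while the second value shows how small the polymorphic fraction is within each transferable set.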
Although the number of transferable markers may be high, the proportion that is polymorphic may be very low. In our study, the transferable SSR markers from non-wax gourd species showed low polymorphism in wax gourd germplasm, far from sufficient to meet the needs of map construction and other genetic research. With the draft genome sequence of wax gourd available, we identified 52,431 microsatellites in the wax gourd B227 genome assembly, a frequency of 55 SSR/Mb (Table 2). The number and density of microsatellites identified in our study were lower than those in cucumber (552 SSR/Mb) and Arabidopsis (371 SSR/Mb) [22], watermelon (111 SSR/Mb) [21], and melon (109 SSR/Mb) [12]. One main reason for these differences is the search parameters used to detect microsatellites. For example, different repeat types (mononucleotides to pentanucleotides vs. mononucleotides to octonucleotides) and different minimum lengths (12 vs. 18 bp) were searched using the same or different software. In the present study, we analyzed the distribution and frequency of microsatellites with motifs 2-8 bp long and a minimum length of 18 bp or a minimum of three repeat units in the wax gourd genome (Fig. 1). This criterion was chosen because polymorphism level and mutation rate correlate positively with the number of repeat units. Several studies have shown that long microsatellites with high numbers of repeats, especially dinucleotides and trinucleotides, are more likely to be polymorphic than shorter ones because of a higher rate of DNA replication slippage [39,40].
Differential SSR distribution has been reported between intronic and intergenic regions and between chromosomes, and different species have different frequencies of SSR types and repeat units [41][42][43]. Frequency analysis of the nucleotide repeats in wax gourd revealed that dinucleotide repeats were the most abundant SSRs, followed by trinucleotide, tetranucleotide, pentanucleotide, heptanucleotide, hexanucleotide, and octonucleotide repeats (Fig. 1 and Table 2). This differs from the trend in other species. For example, tetranucleotide repeats were the most abundant in cucumber, Medicago truncatula, Populus trichocarpa, and Vitis vinifera, and trinucleotide repeats were the most abundant in Glycine max, Arabidopsis thaliana, Oryza sativa, and Sorghum bicolor [22]. Overall, AT-rich motifs such as AT and AAT were the predominant SSR repeat types in each class in wax gourd, representing 41.2% and 16.7% in dinucleotide repeats and trinucleotide repeats, respectively. Conversely, GC-rich SSR motifs were very rare in all repeat classes. This result is consistent with other studies indicating that genomic SSRs with GC-rich repeats are rare in plant species [44,45]. The frequency and distribution of the different SSR types across chromosomes revealed that the frequency of microsatellite loci was positively correlated with chromosome size in wax gourd. This differs from the trend in melon and watermelon [12,21].
SSR markers have been used in the genetic analysis of many horticultural crops [12,21,46,47,48]. Although a few wax gourd SSR markers have been developed from the transcriptome [3], the genetic diversity of wax gourd has rarely been analyzed in previous studies. In this study, 32 SSR markers were used to infer the population structure and genetic diversity of 129 wax gourd accessions. These 32 markers detected 112 alleles in the 129 accessions; the number of different alleles (Na) for each marker ranged from 2 to 9, and PIC values ranged from 0.118 to 0.832 with an average of 0.370. The average number of alleles and the PIC values were lower than those reported in bottle gourd [48], radish [46] and watermelon [47]. The lower level of polymorphism may be caused by the narrow genetic background of the 129 wax gourd accessions, which were all collected from China. This was further confirmed by the population structure and genetic diversity analyses. The 129 wax gourd accessions were divided into two populations and two clusters (Figs. 4 and 5), and some accessions from different regions were mixed, suggesting that they have similar genetic backgrounds.

Conclusions
In our study, we identified 52,431 microsatellites in the wax gourd genome and developed 39,319 SSR markers from these loci. The distribution and frequency of different motifs and repeats were also characterized on the different chromosomes. The large number of wax gourd SSR markers developed in this study provides a valuable resource for genetic linkage map construction, molecular mapping, and marker-assisted selection (MAS) in wax gourd. Furthermore, the transferability of SSR markers developed from cucumber, melon and watermelon was validated and compared in wax gourd. The genetic diversity and population structure of 129 wax gourd accessions collected from different provinces of China were investigated, which showed that the distribution and clustering of Chinese wax gourd are closely related to the distribution of water systems in China. These data improve our understanding of domestic wax gourd germplasm resources and are conducive to the effective research, utilization and introduction of germplasm resources.

Methods
Plant materials. A total of 129 wax gourd accessions, collected from 23 provinces of China, were used in this study for genetic diversity analysis. These accessions were provided by the national germplasm repository for vegetatively propagated vegetables. Among them, 24 accessions came from Henan province, 16 from Hunan, 14 from Shandong, 12 from Jiangsu and 10 from Fujian, and each of the remaining provinces contributed fewer than 10 accessions. The origin information of the 129 wax gourd accessions is listed in the additional files. Eight traits were recorded for the 129 accessions: fruit diameter, fruit length, flesh thickness, single fruit weight, leaf diameter, leaf length, petiole length and petiole diameter. Each trait was measured on five plants per accession. The coefficient of variation was calculated as $CV = SD / \bar{X}$, where $SD$ is the standard deviation and $\bar{X}$ is the mean of the trait, and correlations were calculated using R 3.5.1.
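As a minimal illustration of the two statistics named above, the following Python sketch computes a coefficient of variation and a correlation coefficient. The original analysis was run in R 3.5.1; the use of the sample standard deviation (n - 1) and of Pearson's coefficient are assumptions here, and the trait values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = SD / mean, using the sample standard deviation (n - 1)."""
    return stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical trait values for four accessions (illustrative only).
fruit_weight_kg = [2.0, 4.0, 6.0, 8.0]
flesh_thickness_cm = [2.1, 3.0, 3.4, 4.6]

print(round(coefficient_of_variation(fruit_weight_kg), 3))
print(round(pearson_r(fruit_weight_kg, flesh_thickness_cm), 3))
```

Because CV is unitless, it allows traits measured in different units, such as fruit weight and leaf length, to be compared on the same scale.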
Non-wax gourd SSR markers selection. To test the transferability of SSR markers from other species of the Cucurbitaceae family in wax gourd, 1,152 SSR markers were selected from cucumber, melon, watermelon and pumpkin, with 288 SSR markers from each crop. The markers were selected according to their physical positions in the respective genome assemblies. In cucumber, an average of 41 markers were selected from each of the seven chromosomes. In pumpkin, 288 SSR markers were selected from all 20 chromosomes, with at least three markers from each chromosome. In watermelon, 288 SSR markers were selected from chromosomes 2-11 and were evenly distributed across each chromosome. In melon, 288 SSR markers were selected from seven chromosomes, mainly chromosomes 7-12, plus three markers from the end of chromosome 6. Detailed information on these SSR markers is provided in Additional Table S 2. All of these markers are referred to as non-wax gourd SSR markers in this study. Furthermore, 19 wax gourd SSR markers (Additional Table S 3) provided by the Guangdong Academy of Agricultural Sciences were used as controls in the genetic diversity analysis.
SSR identification and primer design in wax gourd genome. The wax gourd draft genome was downloaded from the Cucurbit Genomics Database (http://cucurbitgenomics.org/). To develop a highly polymorphic SSR platform for future studies, microsatellites were identified with motifs of 2 to 8 bp; mononucleotides were not considered because of the difficulty of distinguishing bona fide microsatellites from sequencing or assembly errors. DNA sequences were searched for both perfect and compound microsatellites with a basic motif of 2-8 bp using the computer program MISA (Microsatellite identification tool) [49]. Repeats with a minimum length of 18 bp (for di- to tetranucleotides), 20 bp (for pentanucleotides), 24 bp (for hexanucleotides), 21 bp (for heptanucleotides), and 24 bp (for octonucleotides) were recorded. The physical positions of the SSRs on the chromosomes were also recorded, and oligonucleotide primers were designed for the genomic sequences flanking these SSRs using Primer 3 (v. 1.1.4) software [50]. Primers were designed to generate amplicons of 100-300 bp with the following minimum, optimum and maximum values for the Primer 3 parameters: primer length 18, 20 and 24 bp, and Tm 50, 55 and 60 ℃. All other parameters were left at the program defaults.
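The length thresholds above can be made concrete with a small regular-expression search. The sketch below is not MISA and does not reproduce its output; it is a simplified stand-in that finds perfect SSRs only (no compound SSRs), converts each minimum length in bp to the smallest whole number of repeat units that reaches it, and skips motifs that are themselves repeats of a shorter unit. All function names are hypothetical.

```python
import re

# Minimum tract length in bp per motif size, as stated in the Methods.
MIN_LENGTH_BP = {2: 18, 3: 18, 4: 18, 5: 20, 6: 24, 7: 21, 8: 24}


def min_repeats(motif_len):
    """Smallest whole number of repeat units that reaches the minimum length."""
    return -(-MIN_LENGTH_BP[motif_len] // motif_len)  # ceiling division


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. ATAT or AA)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(sequence):
    """Return sorted (start, end, motif, repeats) tuples; 0-based, end exclusive."""
    seq = sequence.upper()
    hits = []
    for k in sorted(MIN_LENGTH_BP):
        # One captured motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats(k) - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


# Toy sequence: a 20-bp (AT)10 tract and an 18-bp (AAG)6 tract.
demo = "GGC" + "AT" * 10 + "CCT" + "AAG" * 6 + "TTC"
for hit in find_perfect_ssrs(demo):
    print(hit)
```

Rejecting non-primitive motifs is what excludes mononucleotide runs here: a run of A matches the dinucleotide pattern as "AA", which is discarded. A production tool would also merge nearby tracts into compound SSRs and handle ambiguous bases.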
DNA extraction and PCR amplification. Unexpanded young leaves from each accession were collected into 2.0 mL microcentrifuge tubes, lyophilized in a freeze dryer, and ground into a fine powder. Genomic DNA was extracted with a modified cetyl trimethylammonium bromide (CTAB) method [51]; the DNA was quantified with a spectrophotometer and its quality was further checked on an agarose gel. Each polymerase chain reaction (PCR) contained 25 ng template DNA, 0.5 µM each of forward and reverse primers, 0.2 mM dNTP mix, 0.5 unit of Taq DNA polymerase and 1 × PCR buffer in a total volume of 10.0 µl. Amplification consisted of an initial denaturation step at 94 ºC for 4 min followed by 30 cycles of 94 ºC for 20 sec, 58 ºC for 45 sec and 72 ºC for 1 min. A final extension was performed at 72 ºC for 10 min, and the products were stored at 4 ºC until electrophoresis. The PCR products were size-fractionated in a 9% polyacrylamide gel, with a 100-bp DNA ladder as the molecular size marker. After gel electrophoresis, band patterns were visualized with silver staining, and gel images were taken with a digital camera.
Data analysis. When testing the transferability of non-wax gourd SSR markers, DNA from the crop in which each marker was developed was used as the positive control. In addition, 11 wax gourd accessions with high morphological diversity were selected to test the polymorphism of these SSR markers. Furthermore, 12 wax gourd accessions with high morphological diversity were selected to test the polymorphism of 67 SSR markers developed from wax gourd.
The polymorphic SSR markers were manually scored as binary data, with presence as "1" and absence as "0". The observed (Na) and effective (Ne) numbers of alleles, Shannon's information index (I), and the levels of observed (Ho) and expected (He) heterozygosity were calculated with Popgen 32. Polymorphic information content (PIC) for each marker was calculated as $PIC_j = 1 - \sum_i P_{ij}^2$, where $P_{ij}$ is the frequency of the ith allele at the jth SSR locus [52]. The genotypic distance matrix was computed using GenAlEx 6.5 [53], and the neighbor-joining method in MEGA 6 [54] was used to construct the dendrogram from the distance matrix.
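For a single locus, the per-marker statistics can be sketched from allele counts as follows. The PIC line implements the formula given above. The expressions for Ne (1 / Σp²) and Shannon's index (-Σ p ln p) are commonly used definitions and are an assumption here; they are not confirmed to match the software's output exactly, and the allele counts are hypothetical.

```python
import math


def allele_frequencies(allele_counts):
    """Convert allele counts at one locus to frequencies, dropping absent alleles."""
    total = sum(allele_counts)
    return [c / total for c in allele_counts if c > 0]


def diversity_indices(allele_counts):
    """Return Na, Ne, Shannon's I and PIC (1 - sum of squared frequencies)."""
    p = allele_frequencies(allele_counts)
    sum_sq = sum(f * f for f in p)
    return {
        "Na": len(p),                            # observed number of alleles
        "Ne": 1.0 / sum_sq,                      # effective number of alleles (assumed definition)
        "I": -sum(f * math.log(f) for f in p),   # Shannon's index (assumed definition)
        "PIC": 1.0 - sum_sq,                     # formula stated in the text
    }


# Hypothetical allele counts at one SSR locus (illustrative only).
print(diversity_indices([120, 90, 48]))
```

With two equally frequent alleles, this gives Na = 2, Ne = 2, I = ln 2 and PIC = 0.5, which is a convenient sanity check when adapting the sketch to real genotype tables.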
The population structure of the 129 wax gourd accessions was inferred using STRUCTURE 2.3.4 [55]. Population numbers from K = 1 to 10 were tested, and the K with the highest ΔK was taken as the most likely number of populations [23]. The correlated allele frequencies option was selected, with a burn-in period of 50,000 steps followed by 100,000 Markov Chain Monte Carlo (MCMC) replicates; each run was replicated 10 times to ensure consistency of the results.
Distribution of SSR motif repeat numbers and relative frequency in the wax gourd genome. The vertical axis shows the abundance of microsatellites with different motif repeat numbers (from 3 to >15), which are distinguished by different colors in the legend.
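The ΔK criterion used to choose K in the STRUCTURE analysis can be sketched as follows. This version takes the absolute second difference of the mean log-likelihoods across replicate runs and divides it by the standard deviation at that K, which is one common way of applying the Evanno method; the function name and the log-likelihood values are hypothetical and only illustrate the calculation.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Map K -> delta K for every K that has both neighbours K - 1 and K + 1.

    lnp_by_k maps each K to the list of Ln P(D) values from replicate runs.
    Assumes the replicates at each K are not all identical (non-zero SD).
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical Ln P(D) values from three replicate runs per K (illustrative only).
runs = {
    1: [-5200.0, -5201.0, -5199.0],
    2: [-4600.0, -4602.0, -4598.0],
    3: [-4550.0, -4560.0, -4540.0],
    4: [-4530.0, -4545.0, -4515.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Note that ΔK is undefined for the smallest and largest K tested, because both neighbours are needed, so a range of K = 1 to 10 yields ΔK values for K = 2 to 9 only.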