Genome-wide characterization and comparative analysis of simple sequence repeats in Cucurbita genomes

The genus Cucurbita contains economically important crops grown worldwide, yet few molecular markers have been developed for it in past years. Simple sequence repeat (SSR) markers are powerful tools for genetic map construction, genetic diversity analysis and genome-wide association studies. The availability of pumpkin genome sequences has made it possible to analyze SSRs genome-wide across three Cucurbita species. In this paper, based on the whole-genome sequences, 34,375 SSR loci were found in C. moschata, 30,577 in C. maxima and 38,104 in C. pepo. C. pepo had the highest SSR density, with an average of 145 SSR/Mb. In general, the frequency of SSR loci decreased as motif length increased, dinucleotide motifs were the most common motifs in the three species, and, for a given repeat type, the SSR frequency decreased sharply as the repeat number increased. Most of these SSR loci were suitable for marker development (84.75% in C. moschata, 94.53% in C. maxima and 95.09% in C. pepo). Based on these markers, we compared and analyzed the cross-species SSR markers between C. pepo and other Cucurbitaceae species by in silico PCR. Using these cross-species primers, highly collinear relationships were detected between C. pepo and each of the other two Cucurbita species. Furthermore, the application of SSR markers to genetic diversity analysis was tested in C. pepo, and the results showed that they are good tools for this purpose.

cucumber were confirmed by comparative mapping [22]. Based on massive genome-wide SSR markers, the complicated mosaic patterns of chromosome synteny between melon and watermelon or cucumber were also established [23]. However, the development of SSR markers in Cucurbita species has been limited. Based on sequences conserved among species and genera, a few AFLP, RAPD and SSR markers were developed in previous studies [24][25][26][27][28]. These markers are still far from sufficient for genetic diversity or comparative genomics research. Although Esteras et al. constructed the first Cucurbita genetic map based on SNPs and found, using 304 SNPs and 11 SSR markers, that the Cucurbita linkage groups were partially homoeologous to cucumber chromosomes [29], the application of these EST-SNP markers remained greatly limited, for example by the high cost of enzymes and the complicated operating procedure. Only recently have reports on the genus Cucurbita gradually increased and whole-genome sequences of pumpkin become available. Whole-genome synteny analysis indicated that both the C. maxima and C. moschata genomes underwent a whole-genome duplication (WGD) event, and that pairs of C. maxima (or C. moschata) homoeologous regions are shared between chromosomes corresponding to two sub-genomes [9]. Later, Montero et al. analyzed the genomic synteny between C. pepo and other Cucurbitaceae species (Cucumis melo, Cucumis sativus and Citrullus lanatus) and found that most of the covered regions of the zucchini genome had also undergone a whole-genome duplication [8]. Furthermore, transcriptomes of several Cucurbita species have been completed and EST-SSRs developed [30][31][32][33][34]. This succession of Cucurbita genomic resources greatly promotes the large-scale development of SSR markers and makes it possible to construct high-resolution maps and to study the syntenic relationships and chromosomal rearrangements among Cucurbita species.
In this study, we carried out a genome-wide identification of SSR motifs in three Cucurbita species, analyzed the distribution and frequency of the different repeat types, and studied the chromosome synteny of Cucurbita pepo with other Cucurbitaceae species. The derived SSR loci will be useful for studies of population structure, genetic diversity, marker-assisted selection, map-based cloning and other topics in Cucurbita species. Here, we analyzed repeat types ranging from dinucleotide to octanucleotide. Among these, dinucleotide motifs were the most common (41.0% overall), accounting for 41.78%, 39.90% and 41.01% of the total SSR loci discovered in the three genomes, respectively, followed by trinucleotide motifs (16.97%, 19.19% and 17.88%, respectively); octanucleotide motifs (3.78%, 3.76% and 3.38%, respectively) were the least represented repeat type (Table 1). In general, the frequency of SSR loci decreased as motif length increased, except for heptanucleotide motifs.

The frequency and distribution of different SSR types in Cucurbita genomes
We further examined the SSR motif distribution with regard to repeat number (Fig. 1). For all repeat types, the SSR frequency decreased sharply as the repeat number increased, and this change was more pronounced for the longer SSR motifs (Fig. 1). Consequently, dinucleotides had the highest mean repeat number of all the repeat types (Table 1). Analysis of the individual SSR types revealed that, within each class, some specific motifs were more prevalent than others (Additional file 1, Fig. S1). For example, AT was the most frequent motif in all three genomes, accounting for 31.61% (C. moschata), 28.81% (C. maxima) and 30.45% (C. pepo) of the total loci discovered. Similarly, the AAT, AAAT, AAAAT, AAAAAT, AAAAAAT and AAAAAAAT motifs (AATAATAT in C. maxima) were the most frequent types in their respective classes (Additional file 1, Fig. S1). These results revealed that AT-rich motifs were the most abundant of all SSR motifs in the C. moschata, C. maxima and C. pepo genomes.
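Counting motifs by class requires that equivalent motifs (for example AT and TA, or AAT and ATT on the opposite strand) be pooled under one label. The following is a minimal sketch of one common convention, assuming that a motif is represented by the alphabetically smallest rotation of itself or of its reverse complement; the function names are hypothetical, and the text does not state which convention was actually applied.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def canonical_motif(motif: str) -> str:
    """Smallest rotation of the motif or of its reverse complement.

    Assumption: this pooling rule is illustrative, not taken from the paper.
    """
    motif = motif.upper()
    candidates = []
    for strand in (motif, reverse_complement(motif)):
        for i in range(len(strand)):
            candidates.append(strand[i:] + strand[:i])
    return min(candidates)


def count_by_class(motifs):
    """Tally motifs after pooling equivalent forms."""
    return Counter(canonical_motif(m) for m in motifs)


if __name__ == "__main__":
    # Hypothetical motif list, for illustration only.
    print(count_by_class(["AT", "TA", "ATT", "TAA", "GA", "TC"]))
```

Under this rule AT and TA are both counted as AT, ATT and TAA as AAT, and GA and TC as AG.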

We investigated the SSR frequencies on each chromosome of the three Cucurbita species (Table 2). The frequency of microsatellite loci was not correlated with chromosome size. For example, in the C. moschata genome, the SSR density of the longest chromosome (Chr04) was only at the middle level, while Chr02, which is much shorter than Chr04, had the highest SSR density. Similar patterns were seen in the other two genomes, indicating that SSRs were unevenly distributed among the chromosomes (Table 2). To better understand the distribution of the different SSR motifs, we further checked their frequencies on each chromosome (Fig. 2). The results showed that the different SSR types followed the same trend across chromosomes; the abundance of each SSR type on a given chromosome appeared to be determined by its frequency in the whole genome and by the SSR density of that chromosome (Fig. 2).
The genomic sequences containing these microsatellites were screened for PCR primer design, and 94,272 SSR loci had flanking sites suitable for primer design. C. moschata had the lowest proportion of SSRs suitable for primer design (84.75%), while the percentages in C. maxima and C. pepo reached 94.53% and 95.09%, respectively (Table 1). Although the dinucleotide repeat type was the most frequent in all three genomes, it did not perform well in primer design. The hexanucleotide repeat type had the highest proportion of SSRs suitable for primer design, followed by the pentanucleotide repeat type, in all three genomes, indicating that longer motifs were more suitable for primer design in Cucurbita (Table 1). Finally, 91,248 SSR primers (28,194 in C. moschata, 28,061 in C. maxima and 34,993 in C. pepo) were designed, some of which covered more than one SSR locus as a compound SSR (Additional file 2, Tables S2-S4).

Chromosome synteny relationship of Cucurbita pepo with other Cucurbitaceae species
To understand the universality and correlation of SSR markers among Cucurbitaceae crops, we compared and analyzed the cross-species SSR markers between C. pepo and other Cucurbitaceae species by in silico PCR. We identified 391 cross-species SSR markers between C. pepo and C. sativus, 425 cross-species SSR markers between C. pepo and C. melo, 717 cross-species SSR markers between C.  Table S5-S9). Interestingly, the ratio of collinear blocks to inversion blocks was nearly 1:1 among the three Cucurbita species. Each C. pepo block shared 3-34 SSR markers with C. sativus, C. lanatus or C. melo, whereas most C. pepo syntenic blocks shared a larger number of SSR markers (3-1,080) with C. maxima or C. moschata. The C. pepo syntenic block CpeCmos53, between Cpe-Chr03 and Cmo-Chr14, had the largest number of shared SSR markers (1,080), which indicated that these regions were conserved during chromosome evolution.
By comparing the physical positions of the shared markers, the main syntenic relationships between C. pepo and other Cucurbitaceae species were identified (Table 3), and the chromosome-level syntenic relationships of C. pepo with C. lanatus, C. melo and C. sativus were visualized (Fig. 3). The main syntenic relationships among chromosomes revealed complex mosaic patterns. In Fig. 3, each C. pepo chromosome was syntenic to more than two chromosomes in the other Cucurbitaceae species. C. pepo chromosomes Cpe9 and Cpe16 had the simplest syntenic pattern with watermelon, each being mainly syntenic to one watermelon chromosome (Table 3). Cpe9 was syntenic to watermelon chromosome W5, with 14 shared SSR markers between them. From marker CpeSSR15544 to CpeSSR16107 there were three blocks, each containing at least four SSR markers. According to the consecutive physical positions of these markers on the two reference genomes, blocks CpeWM37 and CpeWM38 showed an inversion pattern and block CpeWM39 showed a collinear pattern between C. pepo and C. lanatus (Additional file 2, Table S7). Similar comparisons were carried out between C. pepo and C. sativus and between C. pepo and C. melo using the cross-species SSR markers. C. pepo chromosomes Cpe7, Cpe8, Cpe11 and Cpe20 had the simplest syntenic pattern with C. sativus, each being syntenic to just one cucumber chromosome. Meanwhile, the simplest syntenic pattern between C. pepo and C. melo was found mainly on Cpe15, Cpe18, Cpe19 and Cpe20 (Table 3).
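The distinction between collinear and inversion blocks rests on whether marker order is preserved between the two genomes. The sketch below illustrates that idea under stated assumptions: markers in a block are sorted by their C. pepo position, and the block is labelled by a simple majority vote on whether positions in the other genome then rise or fall. The function name, the voting rule and the coordinates are hypothetical; the exact block-calling procedure used in the study is not described in the text.

```python
def block_orientation(markers):
    """Classify a syntenic block from shared-marker positions.

    markers: iterable of (position_in_C_pepo, position_in_other_genome).
    Returns "collinear", "inversion" or "ambiguous".
    Assumption: a majority vote over consecutive marker pairs decides.
    """
    ordered = sorted(markers)                      # sort by C. pepo position
    other = [pos for _, pos in ordered]
    steps = [b - a for a, b in zip(other, other[1:])]
    rising = sum(1 for s in steps if s > 0)
    falling = sum(1 for s in steps if s < 0)
    if rising > falling:
        return "collinear"
    if falling > rising:
        return "inversion"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical coordinates (bp), for illustration only.
    same_order = [(1_000, 50_000), (2_500, 61_000), (4_000, 75_000), (6_200, 90_000)]
    flipped = [(1_000, 90_000), (2_500, 75_000), (4_000, 61_000), (6_200, 50_000)]
    print(block_orientation(same_order))
    print(block_orientation(flipped))
```

A majority vote tolerates a single misplaced marker inside an otherwise ordered block, which a strict monotonicity test would not.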
The most complicated syntenic pattern was found on Cpe1, which corresponded to five chromosomes of C. moschata, four chromosomes of C. maxima, seven chromosomes of C. lanatus, three chromosomes of C. sativus and five chromosomes of C. melo (Table 3).
Compared with the above, the syntenic relationships between C. pepo and C. moschata and between C. pepo and C. maxima were simpler and clearer. Each of the 20 chromosomes in C. pepo was mainly syntenic with one chromosome in C. moschata and one chromosome in C. maxima (Table 3). To view these relationships more directly, a single-chromosome comparison was carried out (Fig. 4). Fig. 4 shows that each chromosome in C. pepo was collinear with one chromosome in C. moschata and one chromosome in C. maxima, and that these three chromosomes were highly consistent; the corresponding C. moschata and C. maxima chromosomes even carry the same number in their respective genomes. For example, Cpe1 was syntenic with Cmo4 and Cma4, and Cpe2 was syntenic with Cmo1 and Cma1 (Fig. 4). These results indicated that C. pepo is more closely related to C. moschata and C. maxima than to other species in Cucurbitaceae, and that C. moschata and C. maxima might have a closer syntenic relationship with each other than either has with C. pepo.

Application of SSR markers in C. pepo genetic diversity and population structure analysis
In a preliminary experiment, about four hundred SSR markers were screened using 61 materials. Based on allele number, genomic coverage and PCR amplification efficiency, 66 core SSR markers were selected (Additional file 2, Table S10). These markers produced clear banding patterns and were evenly distributed on the chromosomes. In this study, the 66 primer pairs amplified 276 polymorphic sites in the 61 zucchini materials, an average of 4.18 per primer pair (Additional file 2, Table S11). The number of observed alleles (Na) ranged from 2 to 9. The highest Na (9) was observed for SSR010246, SSR026560, SSR026918, SSR027656 and SSR026980, followed by SSR011546, SSR003315 and SSR026797 with eight alleles. The number of effective alleles (Ne) varied from 1.03 to 6.07 with an average of 2.31. The Shannon information index (I) ranged from 0.083 to 1.96 with an average of 0.83. The PIC value ranged from 0.03 to 0.83 with an average of 0.43.
According to the STRUCTURE results, ΔK showed a clear peak at K=2, indicating that the 61 materials selected in this study could be divided into two groups (Additional file 1, Fig. S2), named group I and group II. Group I contained 5 materials (8.20%), all of them wild varieties; group II contained 56 materials (91.80%), all of them cultivars (Fig. 5A). This indicated that our primers could clearly distinguish the cultivated materials from the wild varieties, and that the genetic background of the chosen materials was narrow. Furthermore, a phylogenetic tree was drawn with MEGA 6.0 (Fig. 5B). The 5 materials of group I (2, 29, 30, 31 and 45) were at the bottom of the tree, indicating that the 61 zucchini materials were also divided into two clusters in the genetic analysis, in complete accordance with the result of the structure analysis.

genomes
With the development of sequencing technology, the discovery and mining of genomic SSR loci has been applied successfully in many plant species, such as cotton [35,36], foxtail millet [37], cucumber [38], watermelon [39], tobacco [40] and melon [23]. Nevertheless, little effort has been devoted to Cucurbita species. In this study, the density of SSR markers in the genomes of the Cucurbita species was about 113-145 SSR/Mb; C. pepo had the smallest genome and the largest number of microsatellites. The number of microsatellites and their density identified in this study was lower than that in cucumber (552 SSR/Mb) but higher than that in melon (109 SSR/Mb) and watermelon (111 SSR/Mb) [23,38,39]. Apart from natural differences among genomes, many factors can affect SSR density estimates, including the software used, the parameter settings and the sequencing depth. For example, in this paper, basic motifs from dinucleotides to octanucleotides were checked with a minimum length of 18 bp (for di- and tetranucleotides), 20 bp (for pentanucleotides), 24 bp (for hexanucleotides), 21 bp (for heptanucleotides) and 24 bp (for octanucleotides). In cucumber [38], motifs from dinucleotides to octanucleotides were also checked, but the minimum lengths (12 bp for di- and tetranucleotides, 15 bp for pentanucleotides, 18 bp for hexanucleotides, 21 bp for heptanucleotides and 24 bp for octanucleotides) were mostly shorter than ours. The assembled grass genomes are larger than those of Cucurbita species, which leads to their higher SSR numbers.
In this research, we analyzed the distribution and frequency of microsatellites in the assembled Cucurbita genomes. Of the 36,234 polymorphic SSR markers identified in C. pepo by in silico analysis, 40% (14,489) and 18% (6,578) were of the dinucleotide and trinucleotide types, respectively (Table 1). Previous studies have shown that dinucleotide motifs with high repeat numbers are more polymorphic than those with few repeat units [41]. Consistent with the studies in watermelon and melon, the frequency analysis of the various nucleotide repeat types in the genus Cucurbita indicated that dinucleotide repeats were the most common SSRs, followed by tri-, tetra-, penta-, hepta-, hexa- and octanucleotide repeats [23,39]. A general negative correlation was observed between microsatellite frequency and the number of repeat units. Furthermore, AT or AAT types prevail in dicot plants [20], and our results agreed with this. Recently, SSR marker analysis in bitter gourd showed that trinucleotide repeat units of all kinds were the main type. A/T, AT/AT, AAT/ATT and AAAT/ATTT were overrepresented, together accounting for 63.86-76.30% of all motifs identified in the seven cucurbit genomes [42]. This pattern was also found in other genomes [18,43,44]. In contrast, the frequency of the GC or CCG type was very low at the genomic level [45,46]. GC, TC or GA types have relatively stable structures. Most AT types are distributed in non-genic regions, and TC/GA types in coding sequences [40]. It has been reported that many bacterial SSRs in intergenic regions have regulatory functions [47]. Whether these pumpkin SSR motifs play any role in specialization or protein coding remains uncertain.

The chromosome synteny analysis between C. pepo and other Cucurbitaceae species by cross-species transferable markers
Chromosome synteny analysis has been carried out in many plants, such as cucumber, watermelon and melon [23], but little has been reported in the genus Cucurbita.
In this study, a large number of cross-species SSR markers were developed from the Cucurbita genomes, which enabled us to investigate syntenic blocks at high resolution. Although the pumpkin genome is similar in size to those of the other sequenced Cucurbitaceae species, the number of cross-species SSR markers within the genus Cucurbita is much higher. Compared with the hundreds of shared markers used in previous studies [21], many more cross-species transferable SSR markers were used for chromosome synteny analysis in our study. A possible explanation is that a WGD event led to the high abundance of SSR markers [8,9]; no such event has been observed in the other sequenced Cucurbitaceae species, such as cucumber [15], melon [48] and watermelon [16].
After genome comparison between C. pepo and cucumber, C. pepo and melon, and C. pepo and watermelon, 52, 61 and 89 syntenic blocks were identified, respectively. These blocks were distributed on all chromosomes. Similar work among Cucurbita species was reported in a previous study: about 63.2%, 58.7% and 68.3% of the C. maxima genomic regions were syntenic to melon, cucumber and watermelon, respectively, while the corresponding values for C. moschata were 64.0%, 62.2% and 69.5%, and these regions were composed of homoeologous blocks [9]. In most cases, multiple synteny relationships between C. pepo and other Cucurbitaceae species were observed, which can be attributed to chromosome fission. The most complicated syntenic pattern was found on Chr1 of C. pepo, which was syntenic to seven watermelon chromosomes, indicating that complex structural changes occurred after their divergence from the common ancestor. The ratio of collinear blocks to inversion blocks was nearly 1:1 in the genus Cucurbita, possibly because genome duplication and interchromosomal exchanges occurred randomly [9].
According to the cross-species SSR marker analysis, we identified more syntenic blocks among Cucurbita species than with melon, cucumber or watermelon. We also found that each block contained abundant SSR markers. For example, the C. pepo syntenic block CpeCmos53 had the largest number of shared SSR markers (1,080). By comparison, the largest number of shared SSR markers reported between melon and cucumber was 386, in block C3 [23]. This indicated abundant genetic variation and close evolutionary relationships among the homologous species. The single-chromosome analysis with cross-species shared SSR markers showed six large-scale inversion regions on different chromosomes between C. pepo and C. moschata or between C. pepo and C. maxima. A previous study reported that all chromosomes except Chr4 contained only homologous blocks between C. maxima and C. moschata [9]. This might be due to genome duplication, large-scale interchromosomal exchanges or long-term evolutionary forces. It also suggested that C. pepo underwent a more complex evolutionary process than the other Cucurbita species.

The genetic diversity and population structure of C. pepo
Although C. pepo is one of the species with the most complex genetic background, only limited molecular markers have been used for genetic diversity and population structure analysis in it. For example, Ferriol et al. used SRAP and AFLP markers to analyze the population structure of 69 C. pepo accessions, and found that SRAP markers were more consistent with the morphological characters [49]. Nonthuko et al. used nine RAPD markers and ten SSR primers to study genetic diversity in seven C. pepo inbred lines and detected 100 and 56 alleles, respectively, but only the RAPD markers could distinguish these cultivars according to their fruit colors [50]. Paris et al. used AFLP, ISSR and SSR markers to analyze the relationship between wild and domesticated forms, and successfully clustered them into three groups based on fruit color, fruit size and origin [26]. Because of the scarcity of highly polymorphic markers and genome sequencing data, most previous studies in Cucurbita species have used low-throughput and anonymous markers. In this study, we developed 91,248 SSR markers from the three Cucurbita genomes, which will benefit future genetic studies.
Genome-wide SSR markers could not be developed until genome sequencing was completed in Cucurbita species. In this study, we employed 66 pairs of highly polymorphic SSR markers to investigate the genetic diversity of 61 C. pepo materials, and an average of 4.18 alleles was amplified per pair. The UPGMA cluster analysis results were almost consistent with the results of the population structure analysis (Fig. 5). In this study, we have successfully clustered these materials into two subspecies, subsp. The Cucurbita species are economically important crops, but their breeding lags behind that of other Cucurbitaceae species. The limited number of high-quality cultivars cannot meet production requirements. Thus, accelerating current breeding programs using MAS is becoming more and more important. The whole-genome SSR markers detected in this study will promote their development and utilization in basic and applied research.

Conclusions
In this paper, based on the reference sequences of three Cucurbita species, a total of 91,248 SSR markers were developed at the whole-genome level, and their frequency and distribution were analyzed. Using these markers, cross-species SSR markers were identified by in silico PCR and the synteny relationships between C. pepo and other Cucurbitaceae species were analyzed. Furthermore, 66 polymorphic SSR markers were employed to assess the genetic diversity of C. pepo. These SSR markers will benefit related research and accelerate pumpkin breeding.

Plant Materials
All the pumpkin materials used in this study were obtained in 2018 from the National Crop Germplasm Resource Platform of China (sub-platform of vegetable germplasm resources); 4 materials came from Russia, 1 from America and 56 from 17 provinces of China. The numbers and sources are shown in Table S1.

Genome SSR identification and development in Cucurbitaceae
The genome information of watermelon, melon, cucumber and pumpkin was downloaded from http://cucurbitgenomics.org/. To develop a set of highly polymorphic SSR primers for future studies, microsatellites with motif lengths from 2 to 8 bp were identified; mononucleotide repeats were not considered because of the difficulty of distinguishing bona fide microsatellites from sequencing or assembly errors. The Microsatellite Identification tool (MISA) was used to identify and analyze SSR markers, including perfect and compound microsatellites.
Specific screening details were as follows: repeats with a minimum length of 18 bp
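The screening criteria can be illustrated with a small regular-expression scan for perfect repeats. This is a minimal sketch and not MISA itself: the minimum lengths are those quoted in the Discussion (18 bp for di- and tetranucleotides, 20 bp for penta-, 24 bp for hexa-, 21 bp for hepta- and 24 bp for octanucleotides), the 18 bp threshold for trinucleotides is an assumption because the text does not state it, and compound microsatellites are not handled. All names are hypothetical.

```python
import math
import re

# Minimum SSR length in bp per motif length.
# Assumption: the trinucleotide value (18) is not given in the text.
MIN_LENGTH_BP = {2: 18, 3: 18, 4: 18, 5: 20, 6: 24, 7: 21, 8: 24}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. ATAT)."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(sequence: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for k, min_bp in MIN_LENGTH_BP.items():
        min_repeats = math.ceil(min_bp / k)
        # One k-mer followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skips mononucleotide runs and nested units
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    print(find_ssrs("GGG" + "AT" * 10 + "CCC"))
```

The primitive-motif check is what keeps a run such as (AT)10 from also being reported as a tetranucleotide (ATAT)5, and it drops mononucleotide runs, which the study did not consider.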

In silico PCR and synteny analysis of SSR markers in Cucurbitaceae
Using the zucchini (C. pepo MU-CU-16) genome SSR markers as a reference, we comparatively analyzed the genome SSR information of cucumber (Gy14), melon (DH92), watermelon (97103), C. moschata cv. Rifu and C. maxima cv. Rimu. This was performed with a custom Perl script that used the NCBI BLASTN program as a search engine, with an expect value of 10 and filtering. We allowed up to five nucleotide mismatches at the 5' end of the primer but no mismatches at the 3' end, and required a minimum of 90% overall match homology. To establish the syntenic relationships of chromosomes between C. pepo and C. sativus, C. lanatus, C. melo, C. maxima or C. moschata, we discarded SSR markers that were repeated in the different genomes and kept only those with a single in silico PCR product in these genomes. In addition, shared SSR markers located on unanchored scaffolds were filtered out. The SSR marker-based syntenic relationships were finally visualized as blocks with Circos v0.55 [53].
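The primer-matching rule can be expressed as a short check on an ungapped primer-to-site comparison. The sketch below is not the custom Perl script; it assumes that the "3' end" means the last five bases of the primer (the text gives no window size), that identity is computed over the full primer length, and that the site has already been extracted in primer orientation. The function and parameter names are hypothetical.

```python
def primer_site_ok(primer: str, site: str,
                   three_prime_window: int = 5,   # assumption: not stated in the text
                   max_mismatches: int = 5,
                   min_identity_pct: int = 90) -> bool:
    """Apply the mismatch rules to one primer and one candidate binding site.

    Both strings are written 5'->3' and must have equal length (no gaps).
    """
    if len(primer) != len(site) or not primer:
        return False
    primer, site = primer.upper(), site.upper()
    n = len(primer)
    mismatches = [i for i, (p, s) in enumerate(zip(primer, site)) if p != s]
    # No mismatch is tolerated inside the 3'-terminal window.
    if any(i >= n - three_prime_window for i in mismatches):
        return False
    # At most five mismatches elsewhere, i.e. towards the 5' end.
    if len(mismatches) > max_mismatches:
        return False
    # Overall identity of at least 90%, compared with integers to avoid rounding.
    return (n - len(mismatches)) * 100 >= min_identity_pct * n


if __name__ == "__main__":
    # Hypothetical 20-nt primer, for illustration only.
    primer = "ACGTACGTACGTACGTACGT"
    print(primer_site_ok(primer, "T" + primer[1:]))   # one 5' mismatch
    print(primer_site_ok(primer, primer[:-1] + "A"))  # mismatch at the 3' base
```

Under these assumptions the 90% identity requirement is the tighter constraint for typical primers: a 20-nt primer can carry at most two mismatches, so the five-mismatch allowance only becomes relevant for much longer primers.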

Genomic DNA extraction, PCR amplification and electrophoresis detection
Genomic DNA of all the materials was extracted from 1 g of young leaf tissue with the CTAB method. The extracted DNA was dissolved in 1×TE buffer. The concentration and purity were measured with a Nanodrop-2000 nucleic acid analyzer. The extracted DNA was diluted to 30 ng/μl as a working solution and kept at 4 ℃.
Each PCR reaction contained 1 μl of template DNA, 0.5 μl each of forward and reverse primers, 5 μl of master mix and 3 μl of ddH2O. Amplification was carried out as follows: an initial denaturing step at 95 ℃ for 5 min; 6 cycles of 94 ℃ for 30 s, 68-58 ℃ for 45 s, with the temperature reduced by 2 ℃ in each cycle and an annealing time of 1 min, and 72 ℃ for 1 min; 30 cycles of 94 ℃ for 30 s, 50 ℃ for 30 s and 72 ℃ for 1 min; and a final primer extension at 72 ℃ for 10 min. PCR products were analyzed by 9% polyacrylamide gel electrophoresis, with a 100 bp DNA ladder as the reference marker. After electrophoresis, the PCR products were visualized by silver staining and photographed for the record.

Cluster analysis
The expected heterozygosity (He), observed number of alleles (Na), number of effective alleles (Ne), observed heterozygosity (Ho) and Shannon-Weaver index (I) were calculated with the pop-gen software. The polymorphism information content (PIC) of each SSR marker was computed in Excel. An SSR marker with a PIC below 0.25 was considered a low-polymorphism marker, and one with a PIC above 0.5 a high-polymorphism marker.
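For reference, these statistics can be computed from the allele frequencies at a single locus with the standard formulas: Ne = 1/Σp², He = 1 - Σp² (without sample-size correction), I = -Σ p ln p, and PIC = 1 - Σp² - ΣΣ 2p_i²p_j² for i < j. The sketch below is illustrative only and is not the pop-gen or Excel workflow used in the study; the function names, the label "moderate" for PIC values between the two thresholds, and the example counts are assumptions.

```python
import math


def allele_stats(allele_counts):
    """Per-locus diversity statistics from allele counts (or frequencies)."""
    total = sum(allele_counts)
    p = [c / total for c in allele_counts if c > 0]
    sum_sq = sum(x * x for x in p)
    # Cross term of the PIC formula: sum over i<j of 2 * p_i^2 * p_j^2.
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return {
        "Na": len(p),                                   # observed alleles
        "Ne": 1 / sum_sq,                               # effective alleles
        "He": 1 - sum_sq,                               # expected heterozygosity
        "I": -sum(x * math.log(x) for x in p),          # Shannon index
        "PIC": 1 - sum_sq - cross,
    }


def pic_class(pic: float) -> str:
    """Thresholds from the text; the middle label is an assumption."""
    if pic < 0.25:
        return "low"
    if pic > 0.5:
        return "high"
    return "moderate"


if __name__ == "__main__":
    # Hypothetical allele counts at one locus, for illustration only.
    stats = allele_stats([30, 20, 10, 1])
    print(stats, pic_class(stats["PIC"]))
```

For two equally frequent alleles the formulas give Ne = 2, He = 0.5 and PIC = 0.375, which is a convenient check on any implementation.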
The amplification bands of each SSR primer pair were separated by polyacrylamide gel electrophoresis. The number of alleles at an SSR locus was taken as the total number of bands at that locus. At a given position, the presence of a band was scored as "1", the absence of a band as "0" and missing data as "-1". We used GenAlEx 6 to carry out the matrix calculation on the scored SSR marker data, converted the result to a triangular matrix and saved it as a MEGA file; this file was then imported into MEGA 6.0, and the UPGMA algorithm in the "Phylogeny" drop-down menu was selected to draw the cluster diagram [54].
The software Structure V2.3 was used to analyze the population structure [55,56].
An admixture model with correlated allele frequencies was used to estimate the number of populations. For each value of K (from 1 to 5), ten independent runs were performed with a burn-in period of 100,000 followed by 500,000 Markov chain Monte Carlo iterations. The optimal K was taken at the peak of $\Delta K = \mathrm{mean}(|L''(K)|)/\mathrm{sd}(L(K))$, where $L(K) = \mathrm{Ln}P(D)$. Based on the structure results, the most probable K value was determined with Structure Harvester (http://taylor0.biology.ucla.edu/struct_harvest/).
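The ΔK criterion can be reproduced from the LnP(D) values of the replicate runs. The following is a minimal sketch, assuming the common convention of taking the second difference of the per-K mean of LnP(D) and dividing by the standard deviation across runs at that K; it is not the Structure Harvester code, and the function name and input values are hypothetical. ΔK is undefined for the smallest and largest K tested, so with K from 1 to 5 it is available for K = 2 to 4.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours.

    lnp_by_k: dict mapping K to a list of LnP(D) values, one per run.
    Assumption: second differences are taken on the per-K means.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            spread = stdev(lnp_by_k[k])
            result[k] = second_diff / spread if spread > 0 else float("inf")
    return result


if __name__ == "__main__":
    # Hypothetical LnP(D) values (three runs per K), for illustration only.
    runs = {
        1: [-5000, -5002, -4998],
        2: [-4000, -4001, -3999],
        3: [-3950, -3960, -3940],
        4: [-3920, -3940, -3900],
        5: [-3900, -3930, -3870],
    }
    scores = delta_k(runs)
    print(scores, "best K:", max(scores, key=scores.get))
```

In this made-up example the likelihood rises steeply from K=1 to K=2 and then flattens, so ΔK peaks at K=2.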

Declarations
Ethics approval and consent to participate