Development and characterization of novel EST-based single-copy genic microsatellite DNA markers in white spruce and black spruce

Owing mainly to the large genome size and the prevalence of repetitive sequences in the nuclear genome of spruce (Picea Mill.), it is very difficult to develop single-copy genomic microsatellite markers. We have developed and characterized 25 polymorphic, single-copy genic microsatellites from white spruce (Picea glauca (Moench) Voss) EST sequences and determined their informativeness in white spruce and black spruce (Picea mariana (Mill.) B.S.P.) and their inheritance in black spruce. White spruce EST sequences from NCBI dbEST were searched for the presence of microsatellite repeats. Forty-seven sequences containing dinucleotide, trinucleotide, tetranucleotide and compound repeats were selected to develop primers. Twenty-five of the designed primer pairs yielded scorable amplicons with single-locus patterns and were characterized in 20 individuals each of white spruce and black spruce. All 25 microsatellites were polymorphic in white spruce and 24 in black spruce. The number of alleles at a locus ranged from two to 18, with a mean of 8.8, in white spruce, and from one to 17, with a mean of 7.6, in black spruce. The expected heterozygosity/polymorphic information content ranged from 0.10 to 0.92, with a mean of 0.67, in white spruce, and from 0 to 0.93, with a mean of 0.59, in black spruce. Microsatellites with dinucleotide and compound repeats were more informative than those with trinucleotide and tetranucleotide repeats. Eighteen microsatellite markers that were polymorphic between the parents of a black spruce controlled cross were inherited in a single-locus Mendelian fashion. The microsatellite markers developed can be applied in various genetics, genomics, breeding, and conservation studies and applications.


Introduction
Microsatellites or simple sequence repeats (SSR) are codominant, highly polymorphic and reproducible molecular genetic markers, which have been widely used for various biological, breeding, forensics and conservation studies and applications. Microsatellite markers are especially suitable for determining neutral population genetic processes, and for DNA fingerprinting and forensics applications. Numerous methods based on microsatellite-enriched and non-enriched genomic libraries have been commonly used to isolate microsatellite-containing genomic sequences in plants and animals [1]. Expressed sequence tag (EST) sequences in the NCBI database and transcriptome sequences provide an important resource for identifying microsatellite sequences and developing cDNA-based genic microsatellite markers in plants (e.g., [2,3]), including conifer trees [4-8]. EST-derived microsatellite markers have several advantages: they are genic markers, they are likely to be of the single-locus type, they tend to be conserved across related species, and they can detect variation directly in the expressed genes.
White spruce (Picea glauca (Moench) Voss) and black spruce (Picea mariana (Mill.) B.S.P.) are widely distributed transcontinental tree species of the Canadian boreal forest and have high ecological and economic importance [9]. There are several genetics, genomics, breeding and reforestation programs for these species. Highly informative microsatellite markers, especially genic microsatellites, could facilitate these programs. The white spruce genome is large (~20 Gbp) and has repetitive sequences throughout [10]. Black spruce probably has a similarly large and repetitive genome, as inferred from its nuclear DNA content [11]. Such genome features make the development of single-locus and highly informative genomic microsatellite markers in white spruce and black spruce very difficult, as is evident from the availability of only a handful of genomic microsatellite markers in these species. Nuclear genomic microsatellite markers have been developed in white spruce using microsatellite-enriched and non-enriched sequences [12,13], and in black spruce using microsatellite-enriched genomic libraries [14] and microsatellite-enriched genomic AFLP DNA fragments [15]. Among these methods, the AFLP-microsatellite method [15] was found to yield a higher number of highly informative single-copy microsatellite DNA markers. In addition, 25 cDNA-derived microsatellites have been developed from white spruce ESTs and characterized in several spruce species [5]. The total number of informative markers from the above studies is around 60. Thus, there is a great need to develop additional single-copy microsatellite markers, especially gene-based microsatellite markers, in white spruce and black spruce.
In the present study, we have developed and characterized 25 new cDNA-based genic microsatellite markers in white spruce and black spruce. We have evaluated informativeness of these markers in white spruce and black spruce and established their single-locus Mendelian inheritance in black spruce.

Expressed sequence tags (ESTs) with microsatellites, primers and their sequences
White spruce EST sequences were downloaded from the NCBI dbEST. The sequences were searched for the presence of microsatellite repeat motifs using RepeatMasker (http://www.repeatmasker.org). Attempts were made to select sequences whose gene function was annotated. Forty-seven sequences containing dinucleotide, trinucleotide, tetranucleotide and complex repeats were selected (Tables 1, S1) for primer design. The numbers of sequences containing dinucleotide, trinucleotide, tetranucleotide, and complex repeats were 22, 14, 3, and 8, respectively (Tables 1, S1). The TA repeat was the most abundant among the dinucleotide microsatellites. One sequence (RPGSE45) contained microsatellite repeats at two different places, and two primer pairs were designed from this EST sequence. The primers for the 48 microsatellite sequences were designed using the Primer3 program [16]. However, only one of the two primer pairs developed from the RPGSE45 sequence was used. Information on the source of the EST sequences, GenBank accessions, primer sequences and gene annotations is provided in Tables 1, S1. For each microsatellite marker, the M13 universal primer sequence, labeled with either 700 or 800 IRDye™ (Li-Cor, Lincoln, Nebraska, USA), was added to the forward primer for detection of amplicons on a Li-Cor 4200 automated sequencer (Li-Cor, Lincoln, Nebraska, USA).
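The repeat search described above can be illustrated with a minimal sketch. It is not the RepeatMasker procedure used in the study: it is a plain regular-expression scan for perfect di-, tri- and tetranucleotide repeats, and the function name `find_ssrs` and the minimum repeat counts are assumptions chosen only for illustration. Compound repeats are not handled.

```python
import re

# Minimum number of tandem copies per motif length.
# These thresholds are illustrative assumptions, not the study's criteria.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, n_repeats) tuples for perfect
    di-, tri- and tetranucleotide repeats in a DNA sequence."""
    seq = seq.upper()
    hits = []
    for k, n_min in min_repeats.items():
        # ([ACGT]{k}) captures one motif; \1{n_min-1,} demands further tandem copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are runs of a shorter unit (e.g. AAA, or TATA = TA x 2).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical EST fragment with a (TA)8 and an (AGC)5 repeat.
example = "GGC" + "TA" * 8 + "CCGT" + "AGC" * 5 + "TT"
print(find_ssrs(example))
```

Flanking sequence on both sides of each reported start position would then be passed to a primer-design tool; that step is not shown here.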

PCR amplification, primer optimization and microsatellites visualization
PCR amplification, microsatellite primer testing, and primer optimization were carried out using four black spruce individuals, including two parents of a genetic mapping population. DNA was extracted from these samples using a Qiagen DNeasy Plant mini kit (Qiagen Canada, Mississauga, Ontario, Canada) and then prepared for downstream analysis.
Polymerase chain reactions (PCRs) were performed in a 10 μl mixture containing 10-20 ng of purified spruce DNA, 2 μg BSA, 1 × Taq buffer, 0.2 mM of each dNTP, 0.025 U of Taq DNA polymerase (MBI Fermentas, Waltham, Massachusetts, USA), 0.1 μM of reverse primer, 0.1 nM of M13 forward primer labeled with either 700 or 800 IRDye™ (Li-Cor, Lincoln, Nebraska, USA), and 1.5 mM MgCl2. All PCRs were carried out using a PTC-200 Peltier Thermal Cycler (MJ Research, Waltham, Massachusetts, USA). The PCR program consisted of an initial denaturation at 94 °C for 5 min, followed by 39 cycles each of 94 °C for 30 s, an annealing temperature between 50 and 65 °C (determined for each primer pair using a 12-column gradient) for 30 s, and 72 °C for 1 min, and a final extension at 72 °C for 6 min.
Amplification products were denatured and electrophoresed through a 6.5% polyacrylamide gel run on a Li-Cor sequencing system IR 4200 (Li-Cor, Lincoln, NE, USA) with IRDye 700 and 800 labelled size standards (IR 50-350 bp). Each gel was run for 2.5 h with the run parameters of 1400 V, 40 mA, 40 W, 50 °C plate temperature, and 16-pixel depths for collection of TIFF image files. The genotypes of individuals were scored manually, and the size of alleles was determined by using SAGA Generation 2 software (Li-Cor, Lincoln, Nebraska, USA). Alleles at a microsatellite locus were identified and named by their molecular size.

Informativeness, characterization and inheritance of microsatellite loci
We evaluated the informativeness of the 25 microsatellite loci that could be amplified and resolved by genotyping 20 individuals each of white spruce and black spruce. For each locus, we determined the number of alleles (A), the observed heterozygosity (H_O), and the expected heterozygosity (H_E) or polymorphic information content (PIC), calculated as $\mathrm{PIC} = 1 - \sum_{i} P_i^{2}$, where $P_i$ is the frequency of the ith allele in the 20 individuals of a species examined. Thus, PIC and H_E calculations and estimates are the same. Thirteen white spruce samples originated from seven populations located at five sites in Saskatchewan: Scales Lake, Timber Cove, Snowfield Road, Prairie River, and Tee Pee Creek (details are provided in [18,19]). Seven white spruce samples originated from seven sub-populations (compartments) of the EMEND (Ecosystem Management Emulating Natural Disturbance) project located in northern Alberta, approximately 90 km northwest of Peace River [20]. The black spruce samples originated from two populations at Pine Falls, four populations at Bissett, one population at The Pas and four populations at Snow Lake sites in Manitoba, Canada (described in [21]). Inheritance of the polymorphic microsatellite markers was examined in a controlled cross from a three-generation outbred pedigree of black spruce (described in [22]), consisting of the parents and their 90 progenies. χ² goodness-of-fit tests were performed to test the inheritance and segregation patterns of the microsatellite DNA markers.
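As a rough illustration of the per-locus statistics, the sketch below computes the number of alleles, the observed heterozygosity and the expected heterozygosity from diploid genotypes. It assumes the standard form 1 − ΣPi² (no sample-size correction), which is consistent with the statement that PIC and expected heterozygosity estimates are the same; the function name `locus_diversity` and the allele sizes are hypothetical.

```python
from collections import Counter

def locus_diversity(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Returns (A, H_O, H_E), with H_E = PIC = 1 - sum(P_i ** 2)."""
    alleles = [a for g in genotypes for a in g]
    counts = Counter(alleles)
    total = len(alleles)
    n_alleles = len(counts)
    # Expected heterozygosity from allele frequencies (assumed formula, see lead-in).
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    # Observed heterozygosity: proportion of individuals carrying two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return n_alleles, h_obs, h_exp

# Hypothetical allele sizes (bp) for four individuals at one locus.
toy = [(100, 102), (100, 100), (102, 104), (100, 104)]
print(locus_diversity(toy))
```

A monomorphic locus returns one allele and zero for both heterozygosity values, which matches the lower bounds reported for black spruce.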

ESTs, microsatellite repeat motifs and microsatellite marker resolution
Twenty-two of the 47 selected EST sequences contained dinucleotide repeats, 14 trinucleotide, three tetranucleotide and eight compound repeats. The most common repeat motif was TA, which accounted for 20 of the 22 dinucleotide-repeat and three of the eight compound-repeat microsatellites (Tables 1, S1). Of the 47 primer pairs, 30 amplified DNA and 25 yielded clear single-locus microsatellite variant patterns in black spruce and white spruce. These included nine dinucleotide, eight trinucleotide, three tetranucleotide and five compound-repeat microsatellites (Table 1). Seventeen primer pairs did not amplify any DNA and five produced complex patterns (Table S1). The source ESTs of 19 microsatellites were annotated to specific genes, whereas the source ESTs of three microsatellites were annotated to hypothetical proteins and those of three were not functionally annotated (Table 1). The optimized annealing temperature for the different primer pairs varied from 50 to 63.9 °C (Table 1).

Microsatellite marker informativeness in white spruce and black spruce
In white spruce individuals, all 25 microsatellite loci were polymorphic; however, there was significant variation among the loci for polymorphism and informativeness (Table 2). The number of alleles ranged from two to 18, with a mean of 8.8; observed heterozygosity ranged from 0 to 0.95, with a mean of 0.49; and PIC/expected heterozygosity ranged from 0.10 to 0.92, with a mean of 0.67. The most informative loci were RPGSE48 and RPGSE43 (loci with a simple and a compound TA repeat, respectively), and the least informative locus was RPGSE17, with a trinucleotide repeat (Table 2). Twenty-four of the 25 microsatellite loci were polymorphic in black spruce individuals. The trinucleotide repeat locus RPGSE08 was monomorphic. There was significant variation among the loci for polymorphism and informativeness in black spruce (Table 2). The number of alleles ranged from one to 17, with a mean of 7.6; observed heterozygosity ranged from 0 to 0.90, with a mean of 0.43; and PIC/H_E ranged from 0 to 0.93, with a mean of 0.59 (Table 2). The most informative locus was RPGSE34, a locus with a simple TA repeat, and the least informative locus, besides the monomorphic locus RPGSE08, was RPGSE17, with a trinucleotide repeat (Table 2). In both white spruce and black spruce, significant differences (P < 0.05) in the number of alleles and H_E/PIC were observed among microsatellite loci classes with different repeat types (di, tri, tetra and compound). Microsatellite loci with dinucleotide and compound repeats had significantly higher informativeness for all three parameters (A, H_O and H_E) than the microsatellite loci with trinucleotide and tetranucleotide repeats (Table 2). The trinucleotide microsatellite loci were slightly but not significantly more informative than the tetranucleotide microsatellite loci for one or two of the three parameters (Table 2). Each white spruce or black spruce individual could be uniquely fingerprinted by its two-locus microsatellite genotype.
For the combined sample of white spruce and black spruce individuals, although the informativeness for microsatellite loci was higher than for individual species samples, the informativeness patterns were similar to those observed from individual species samples (Table 2).

Inheritance of microsatellite markers in black spruce
Eighteen microsatellite loci were polymorphic between the parents of the black spruce controlled cross (Table 3). At each of the 18 loci, the progenies of the cross segregated into two to four microsatellite genotypic classes as expected for single-locus Mendelian inheritance patterns (Table 3; Figure S1 for RPGSE35).
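The segregation test applied to each locus can be sketched as a chi-square goodness-of-fit calculation against an expected Mendelian ratio such as 1:1, 1:2:1 or 1:1:1:1. The function name `segregation_test` and the progeny counts below are hypothetical and are not taken from Table 3; the 5% critical values for one to three degrees of freedom are standard tabulated values.

```python
# Critical chi-square values at the 5% significance level for 1-3 degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}

def segregation_test(observed, ratio):
    """observed: progeny counts per genotypic class.
    ratio: expected Mendelian ratio, e.g. (1, 1), (1, 2, 1) or (1, 1, 1, 1).
    Returns (chi2, df, significant_at_5_percent)."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df, chi2 > CHI2_CRIT_05[df]

# Hypothetical counts for 90 progenies segregating into two classes.
print(segregation_test([48, 42], (1, 1)))   # consistent with 1:1
print(segregation_test([60, 30], (1, 1)))   # distorted segregation
```

A non-significant result means the observed classes are compatible with the expected single-locus ratio; it does not by itself prove Mendelian inheritance.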

Discussion
We have developed highly informative single-copy cDNA-based genic microsatellite DNA markers in white spruce and black spruce. From the screening of a partial white spruce EST database, it was apparent that the dinucleotide TA repeat motif is the most common in the set of white spruce EST sequences that we searched. This result is consistent with a similar observation in loblolly pine (Pinus taeda) ESTs [6]. However, the AT repeat motif was found to be most abundant in the Genome BC Forestry spruce EST database [5] and the AG repeat motif in Norway spruce cDNA clones [4]. We establish that 24 of the 25 genic microsatellite markers we characterized in this study are informative in white spruce and black spruce. Of these, 11 (RPGSE24, RPGSE25, RPGSE29, RPGSE34, RPGSE35, RPGSE40, RPGSE41, RPGSE43, RPGSE45, RPGSE46, and RPGSE48) were highly informative loci in white spruce, each with over 10 alleles and H_E/PIC close to 0.90. In black spruce, nine loci (RPGSE02, RPGSE22, RPGSE25, RPGSE29, RPGSE34, RPGSE35, RPGSE43, RPGSE46, and RPGSE48), each with 10 or more alleles, were highly informative.
Table 3 Chi-square goodness-of-fit tests for segregation of microsatellite genotypes in the progeny obtained from a three-generation outbred pedigree in black spruce (n.s., non-significant; s.*, significant at the 5% level of significance)
Our results reveal that microsatellite loci with dinucleotide perfect and dinucleotide compound repeats are more informative than microsatellite loci with trinucleotide or tetranucleotide repeats. These results are consistent with the general view and actual observations (e.g., [23]) that microsatellites with trinucleotide and tetranucleotide repeats have lower polymorphism than those with dinucleotide repeats. In contrast, we previously found trinucleotide repeat microsatellite loci to be highly polymorphic in Populus tremuloides [24,25]. Also, there is a general view and experimental evidence that compound-repeat microsatellite loci display lower variation than simple-repeat microsatellite loci (e.g., [24,25]). However, our study demonstrates that microsatellite loci with compound repeats (RPGSE02, RPGSE24, RPGSE43, RPGSE45, and RPGSE46) are highly polymorphic in both white spruce and black spruce (Table 2).
Gene sequences potentially have lower mutation rates, evolve slowly and are subject to selection constraints, and thus show higher conservation among individuals within a species and across related species. Owing to these features, EST-derived microsatellite markers are expected to show lower polymorphism than genomic microsatellites. This was somewhat true for the EST-based markers developed in this study with regard to the number of alleles, which was lower than that observed for genomic microsatellites in white spruce ([12,13]; Table S2) and black spruce ([15]; Table S2). However, allelic diversity in our genic microsatellites was much higher than that reported for genomic microsatellites by Dobrzeniecka et al. [14] for black spruce (Table S2). The informativeness of the white spruce EST-based genic markers in our study is similar to that previously reported for white spruce EST-based markers in these two species ([5]; Table S2). The average number of alleles per locus and the observed and expected heterozygosity for the eight trinucleotide-repeat microsatellite markers in our study are higher than those reported for EST-derived trinucleotide microsatellite markers in Norway spruce (Picea abies) ([7]; Table S2).
Up to 18 of the 20 (90%) black spruce and 17 of the 20 (85%) white spruce individuals could be uniquely fingerprinted from their single-locus microsatellite genotypes. All individuals of white spruce or black spruce could be uniquely fingerprinted by their genotypes at two microsatellite loci. These results clearly show that the SSR markers reported in this manuscript could provide an excellent molecular tool for fingerprinting and identification of white spruce and black spruce individuals and clones.
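The fingerprinting check described above can be sketched as follows: for a chosen set of loci, count the individuals whose multilocus genotype is shared with no other individual, then search for locus pairs that distinguish everyone. The function names, locus labels and genotypes are hypothetical and serve only as an illustration under the assumption of unordered diploid genotypes.

```python
from itertools import combinations

def unique_fraction(genotypes, loci):
    """genotypes: dict mapping individual -> dict mapping locus -> (allele, allele).
    Returns the fraction of individuals with a unique genotype across `loci`."""
    profiles = {}
    for ind, g in genotypes.items():
        # Sort alleles within a locus so that (100, 102) and (102, 100) are the same genotype.
        key = tuple(tuple(sorted(g[locus])) for locus in loci)
        profiles.setdefault(key, []).append(ind)
    n_unique = sum(len(v) for v in profiles.values() if len(v) == 1)
    return n_unique / len(genotypes)

def fully_resolving_pairs(genotypes, loci):
    """Return the locus pairs that uniquely fingerprint every individual."""
    return [pair for pair in combinations(loci, 2)
            if unique_fraction(genotypes, pair) == 1.0]

# Hypothetical genotypes (allele sizes in bp) for three trees at two loci.
toy = {
    "tree1": {"locusA": (100, 102), "locusB": (200, 200)},
    "tree2": {"locusA": (102, 100), "locusB": (200, 204)},
    "tree3": {"locusA": (104, 104), "locusB": (200, 204)},
}
print(unique_fraction(toy, ["locusA"]))
print(fully_resolving_pairs(toy, ["locusA", "locusB"]))
```

In this toy data set neither locus alone separates all three trees, whereas the two loci together do, which mirrors the single-locus versus two-locus contrast reported in the text.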
Our study demonstrates that the microsatellite variants at each of the 18 loci that were polymorphic between the parents of the controlled cross are controlled by a single nuclear locus, because they conformed to single-locus inheritance patterns (Table 3). The significant deviation from the expected 1:1 segregation ratio for RPGSE05 could be due to chance or to some biological mechanism. Segregation distortion has been commonly observed in backcrosses and F2 progeny of plants [26]. The black spruce cross that we used was from a three-generation outbred pedigree, equivalent to an F2 cross.
Several features of the novel genic microsatellite markers that we have developed, such as their single-copy nature, high informativeness and single-locus co-dominant inheritance patterns, make them highly suitable for various applications in genetics, genomics, DNA fingerprinting and tree forensics, breeding, and genetic resource conservation programs in white spruce and black spruce. Five of these markers have already been successfully used in white spruce population genetics studies [19,20], and 16 of these markers have been genetically mapped on seven linkage groups in black spruce [22]. The microsatellite markers reported here could potentially be used in other spruce species.

Conclusions
We have developed 25 new, reliable and highly informative single-copy genic microsatellites in white spruce and black spruce from white spruce EST sequences. Microsatellites with dinucleotide and compound repeats were more informative than microsatellites with trinucleotide and tetranucleotide repeats. The markers were inherited co-dominantly in a single-locus Mendelian pattern. The new set of genic microsatellite markers provides an excellent genetic/genomic resource that could be used for diverse biological, genetics, DNA fingerprinting, genomics, tree breeding, and genetic conservation studies and applications in white spruce, black spruce and possibly other spruce species.