Development of Novel Coconut SSR Markers Derived From Genome-Wide Bioinformatics Prediction

In the past, simple sequence repeat (SSR) marker development in coconut was achieved through microsatellite probing in bacterial artificial chromosome (BAC) clones or by using previously developed SSR markers from closely related genomes. These coconut SSR markers are publicly available in the published literature and online databases; however, their number is quite limited. Here, we used a locally established, genome-wide SSR prediction bioinformatics pipeline to generate a large number of coconut SSR markers. A total of 7,139 novel SSR markers were derived from the genome assembly of coconut ‘Catigan Green Dwarf’ (CATD). A subset of 131 markers was selected for synthesis based on motif filtering, contig distribution, product size exclusion, and success of in silico PCR in the CATD genome assembly. The OligoAnalyzer tool was also employed using the following desired parameters: %GC: 40–60%; minimum ΔG value for hairpin loop: -0.3 kcal/mol; minimum ΔG value for self-dimer: -0.9 kcal/mol; and minimum ΔG value for hetero-dimer: -0.9 kcal/mol. We successfully synthesized, optimized, and amplified the 131 novel SSR markers in coconut using the ‘Catigan Green Dwarf’ (CATD), ‘Laguna Tall’ (LAGT), ‘West African Tall’ (WAT), and SYNVAR (LAGT x WAT) genotypes. Of the 131 SSR markers, 113 were polymorphic among the analyzed coconut genotypes. The development of novel SSR markers for coconut will serve as a valuable resource for mapping of quantitative trait loci (QTLs), assessment of genetic diversity and population structure, hybridity testing, and other marker-assisted plant breeding applications.


Introduction
Coconut (Cocos nucifera L.) is one of the most economically important crops in the Philippines. In 2017, the country produced 14.05 million metric tons of coconut and the value of production reached 120.3 million pesos (PSA 2018). The Philippines remained the top global supplier of coconut copra and desiccated coconut in both volume and total USD value as of 2010 (FAOSTAT 2013). Coconut oil, one of the many diversified products of coconut, ranked first among the top ten agricultural exports of the Philippines, comprising 21.9 percent of total agricultural exports in 2015 (PSA 2017).
Coconut is distributed across the tropical and subtropical latitudes that are accessible to the equatorial Pacific Ocean current, which possibly favored its evolution and dispersal. Coconut palms thrive in humid coastal environments at about 18 degrees of latitude north or south of the equator where there is fertile soil, favorable temperature, and year-round rainfall (Foale 2003). Coconut belongs to the Indian center (II) and the Indo-Malayan sub-center (II-A, where the Philippines belongs) in Vavilov's centers of origin of cultivated plants (Vavilov 1926). It is generally classified into two types: tall and dwarf. The tall types are generally allogamous (heterozygous) or cross-pollinating and slow to mature; they flower at 6-10 years and have an economic life of 60-70 years. Dwarf types, on the other hand, are highly autogamous (homozygous) or mainly self-pollinating, and flower early, at around 4-6 years after planting, with a productive life of 30-40 years (Harries 1978; Meerow et al. 2012; Batugal et al. 2009).
Coconut is a diploid with 32 chromosomes (2n = 2x = 32). It belongs to the family Arecaceae (Palmaceae), subfamily Cocoideae, and is the lone species of the genus Cocos (Perera et al. 2007). The estimated genome size of coconut is approximately 2.6 Gbp, of which 50-70% consists of repetitive sequences (Alsaihati et al. 2014). Lantican et al. (2019) reported the estimated genome size of 'CATD' to be 2.14 Gbp. The abundance of repeats in the coconut genome is advantageous for the assessment and characterization of coconut varieties/populations using molecular marker techniques. Molecular tools offer a more accurate assessment than the conventional characterization of coconut through morphological and agronomic traits, which are mostly influenced by environmental factors (Perera et al. 2003).
Molecular markers have established their importance as a modern breeding tool for crop improvement (Xu and Crouch 2008; Kesawat and Das 2009; Sindhumole and Ambili 2011). The use of molecular tools can significantly shorten the overall duration of breeding programs for coconut improvement. Among the most extensively used markers in molecular breeding and genetic diversity analyses are simple sequence repeats (SSRs). SSRs are short tandem repeats with repeating units of di-, tri-, tetra-, and pentanucleotides (Powell et al. 1996). They are approximately 1-8 bp long, abundant, and well distributed throughout the genome, and the number of repeat units can vary between genotypes/individuals, which makes them a very useful tool in fingerprinting, genotyping, and genetic diversity analyses (Sharma et al. 2008).
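The notion of a tandem repeat described above can be illustrated with a minimal Python sketch. This is not the pipeline used in this study (which relied on GMATA); the function name `find_ssrs` and the thresholds (unit length 2-5 bp, at least five repeats) are assumptions chosen only for illustration.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    Thresholds are illustrative assumptions, not the settings of any
    published SSR-mining tool.
    """
    seq = seq.upper()
    # The lazy quantifier makes the shortest repeating unit win,
    # so an AG run is reported as AG rather than AGAG.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy sequence: an (AG)8 repeat and a (GAT)5 repeat in non-repetitive flanks.
example = "TTGC" + "AG" * 8 + "CCTA" + "GAT" * 5 + "C"
print(find_ssrs(example))  # [(4, 'AG', 8), (24, 'GAT', 5)]
```

Differences in the reported repeat count between genotypes at the same locus are what an SSR marker detects as a size difference in the amplified product.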
In the past, SSR marker development in coconut was achieved through microsatellite probing in bacterial artificial chromosome (BAC) clones or by using previously developed SSR markers from closely related genomes (Rivera et al. 1999; Perera et al. 2003). These coconut SSR markers are publicly available; however, their number and distribution across chromosomes are quite limited for quantitative trait loci (QTL) mapping and genetic diversity studies. Fortunately, with current advancements in next-generation sequencing (NGS) technologies, it has now become possible to mine SSRs across the entire genome. Genome-wide bioinformatics prediction makes it possible to generate a large number of SSR markers efficiently.
This study aims to provide a valuable resource of SSR markers for potential use in marker-assisted selection and breeding of coconut.

Genomic DNA extraction of Coconut Parental Genotypes
Genomic DNA extraction. Genomic DNA was extracted from a total of eight (8) individual palms of the coconut genotypes (Table 1) following the procedure adapted from Doyle and Doyle (1990) with modifications. DNA quality and yield were determined by electrophoresis in 1% UltraPure™ agarose (Invitrogen Corp., Carlsbad, California, USA) in 1× Tris-borate-EDTA (TBE) running buffer at 100 V for 40 min, staining with 0.5 µg mL⁻¹ ethidium bromide, and UV illumination at 300 nm using the Enduro GDS Touch Imaging System (Labnet International, Inc., Edison, New Jersey, USA). DNA concentration was estimated by visually comparing the fluorescence intensity with that of lambda (λ) DNA molecular weight standards of known concentration (Sigma-Aldrich Inc., St. Louis, Missouri, USA).

Development of SSR markers using the genome assembly of coconut 'Catigan Green Dwarf' (CATD)
A set of 7,139 novel SSR markers was previously generated automatically from the SSR loci annotation of the genome assembly of coconut 'Catigan Green Dwarf' (CATD) using the GMATA software package (Wang and Wang 2016; Lantican et al. 2019). Given the large number of predicted SSR markers, selection criteria were applied to obtain high-quality markers for eventual use in coconut genotyping. The predicted markers were further filtered by manual checking based on motif filtering, contig distribution, and product size exclusion. Markers with AT/AT and TA/TA repeat motifs were excluded from the selection. In silico PCR in the 'CATD' genome assembly (Lantican et al. 2019) was then performed to help ensure in vitro SSR amplification prior to synthesis (Rotmistrovsky et al. 2004). The OligoAnalyzer tool (Integrated DNA Technologies, Inc., Coralville, Iowa) was also used to further filter the SSR markers with the following desired parameters: %GC: 40-60%; minimum ΔG value for hairpin loop: -0.3 kcal/mol; minimum ΔG value for self-dimer: -0.9 kcal/mol; and minimum ΔG value for hetero-dimer: -0.9 kcal/mol.
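The selection criteria listed above can be expressed as a simple filter. The sketch below is illustrative only: in this study the filtering was done by manual checking, the `Candidate` record and its field names are hypothetical, and the ΔG values are assumed to have been obtained separately from a tool such as OligoAnalyzer rather than computed here. The "minimum ΔG" thresholds are interpreted as lower bounds (a primer passes if its ΔG is no more negative than the threshold), which is an assumption about the intended meaning. The 80-400 bp product size window is the range given in the Results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one predicted SSR marker."""
    name: str
    motif: str
    forward: str
    reverse: str
    product_size: int       # bp
    dg_hairpin: float       # kcal/mol, supplied by an external tool
    dg_self_dimer: float    # kcal/mol, supplied by an external tool
    dg_hetero_dimer: float  # kcal/mol, supplied by an external tool


EXCLUDED_MOTIFS = {"AT", "TA"}  # AT/AT and TA/TA repeats are excluded


def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def passes_filters(c, size_range=(80, 400)):
    """Apply the motif, product size, %GC, and delta-G criteria in turn."""
    if c.motif.upper() in EXCLUDED_MOTIFS:
        return False
    if not size_range[0] <= c.product_size <= size_range[1]:
        return False
    if not all(40.0 <= gc_percent(p) <= 60.0 for p in (c.forward, c.reverse)):
        return False
    # Assumed reading of "minimum delta-G": values must not fall below the bound.
    return (
        c.dg_hairpin >= -0.3
        and c.dg_self_dimer >= -0.9
        and c.dg_hetero_dimer >= -0.9
    )
```

Keeping each criterion as a separate early return makes it easy to log which rule rejected a given marker if the filter is later extended.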

Results and Discussion
Local bioinformatics pipeline produces high-quality SSR markers
A total of 7,139 coconut SSR markers were previously generated using a locally [...] on which the high repeat content may hinder specificity of the markers and/or result in non-specific amplification of products. Markers were also selected based on their distribution across contigs to cover the entire coconut genome. In silico PCR in the CATD genome assembly was performed, which allows the contig specificity of each marker to be checked and helps ensure in vitro SSR amplification (Rotmistrovsky et al. 2004). The product size range of the markers was also limited to 80-400 bp for easy visualization in gels, and the OligoAnalyzer tool was used to check the primers for dimerization and hairpin-loop formation to produce high-quality markers.
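The idea behind the in silico PCR check can be sketched as follows. This is a simplified stand-in and not the tool used in the study: it assumes exact primer matches on a single strand of one contig, whereas real in silico PCR tools tolerate mismatches and search both orientations. The function names and the toy primer sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template, forward, reverse, min_size=80, max_size=400):
    """Return (start, product_size) for every exact-match amplicon.

    A marker that yields exactly one product within the size window
    would be considered specific to this template.
    """
    template = template.upper()
    forward = forward.upper()
    # The reverse primer anneals to the opposite strand, so on the
    # template strand its binding site appears as its reverse complement.
    rev_site = reverse_complement(reverse)
    products = []
    start = template.find(forward)
    while start != -1:
        end = template.find(rev_site, start + len(forward))
        while end != -1:
            size = end + len(rev_site) - start
            if size > max_size:
                break  # later sites only give larger products
            if size >= min_size:
                products.append((start, size))
            end = template.find(rev_site, end + 1)
        start = template.find(forward, start + 1)
    return products


# Toy contig: 10 bp flank, forward primer, (AG)40 repeat, reverse site, flank.
fwd, rev = "GATTACAGGC", "CCGTATTGCA"
contig = "T" * 10 + fwd + "AG" * 40 + reverse_complement(rev) + "T" * 10
print(in_silico_pcr(contig, fwd, rev))  # [(10, 100)]
```

Running the same primer pair against every contig and counting the hits gives a simple measure of contig specificity: more than one product suggests the marker may amplify non-specifically.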
A total of 131 SSR markers were synthesized; 98% of these comprised dinucleotide repeats (2-mers), while the remaining 1% and 1% were composed of trinucleotide and tetranucleotide repeats, respectively, as shown in Figure 1. [...] previous studies on which high levels of polymorphism are likely attributed to phenotypic variation and differences in the breeding behaviors of the dwarf and tall varieties, which are said to be generally autogamous (self-pollinating) and allogamous (cross-pollinating), respectively (Perera et al. 1999; Rivera et al. 1999; Teulat et al. 2000). Representative gels of polymorphic SSR markers optimized among coconut genotypes are presented in Figure 2, in which distinct and good amplification patterns were observed. The most common repeat motifs among the polymorphic markers were AG (29%) and GA (19%), as shown in Figure 3. The product size of these markers ranged from 130 to 690 bp. The characteristics of the selected SSR markers are summarized in Table 2, which includes the marker name, annealing temperature, repeat motif, contig distribution, product size range, and number of alleles.
Microsatellites or SSR markers are a very useful molecular tool for studying genetic diversity and genotyping of coconut (Lebrun et al. 1998; Perera et al. 1998; Perera et al. 2003; Konan et al. 2007; Xiao et al. 2013). They have been extensively used in these analyses since SSR markers are abundant and well distributed throughout the genome, multi-allelic, co-dominant, highly polymorphic, and highly reproducible. Here, we demonstrated that a locally established bioinformatics pipeline can mine SSR markers from NGS data with practical utility in terms of amplification and distinguishing power across several varieties of coconut. An advantage of the genome-wide bioinformatics prediction approach to marker development is that it is a relatively fast and cost-effective way of generating large numbers of markers (Anderson and Lubberstedt 2003; Gupta et al. 2013). SSRs and SNPs can be generated automatically from genome sequences with the use of these programs or pipelines (Ching et al. 2002; Lantican et al. 2019).
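Once markers have been amplified across genotypes, the number of alleles and the polymorphism status of each marker can be tabulated from the scored band sizes. The sketch below assumes that a marker is called polymorphic when more than one distinct allele size is observed across the genotypes; the marker names, allele sizes, and function names are hypothetical and are not data from this study.

```python
def summarize_markers(allele_sizes):
    """Summarize allele counts per marker.

    allele_sizes maps marker -> {genotype: tuple of allele sizes in bp}.
    A marker is flagged polymorphic if more than one distinct size is seen.
    """
    summary = {}
    for marker, calls in allele_sizes.items():
        alleles = set()
        for sizes in calls.values():
            alleles.update(sizes)
        summary[marker] = {
            "n_alleles": len(alleles),
            "polymorphic": len(alleles) > 1,
        }
    return summary


def percent_polymorphic(summary):
    """Percentage of markers flagged as polymorphic."""
    n_poly = sum(1 for s in summary.values() if s["polymorphic"])
    return 100.0 * n_poly / len(summary)


# Hypothetical scores for two markers in three genotypes.
scores = {
    "SSR_A": {"CATD": (150,), "LAGT": (150, 162), "WAT": (158,)},
    "SSR_B": {"CATD": (210,), "LAGT": (210,), "WAT": (210,)},
}
summary = summarize_markers(scores)
print(summary["SSR_A"])              # {'n_alleles': 3, 'polymorphic': True}
print(percent_polymorphic(summary))  # 50.0
```

Applied to the figures reported in this study, 113 polymorphic markers out of 131 corresponds to about 86%.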
Polymorphic markers from this study will be further used to genotype the coconut mapping population generated from a three-way cross of 'Pacific' LAGT and CATD, and 'Indo-Atlantic' WAT coconut for QTL mapping analysis. The development of novel SSR markers for coconut will serve as a valuable resource for mapping QTLs, assessment of genetic diversity and population structure, hybridity testing, and other marker-assisted plant breeding applications.

Figure 1
Percentage of repeat motifs of the selected SSR markers.

Figure 2
Representative gels of polymorphic SSR markers optimized among coconut genotypes.