Molecular sexual determinants in Pistacia genus by KASP assay

Pistacia is a genus of dioecious plant species whose trees can take 4–5 years to reach the economically valuable fruit-bearing stage. The fruits have great importance as raw material in the food, healthcare, and baking industries. For that reason, identifying the sex of individual plants in the early juvenile period is crucial for growers when planning pollination and the positioning of trees. The objective of this study is to develop markers for each Pistacia species that can help in screening the sex of plant seedlings before they reach the reproductive stage, without waiting for morphological characteristics to appear. Within this context, by using the kompetitive allele-specific PCR (KASP) assay technology as a marker screening system, we successfully discriminated the sexes in seven out of eight Pistacia species: P. atlantica, P. integerrima, P. khinjuk, P. mutica, P. terebinthus, P. vera, and P. lentiscus. We used high-throughput reads from the DNA sequence read archive (SRA) to assemble a de novo reference genome for our analyses. Four genomic regions from the SRA assembly and three single-nucleotide polymorphism (SNP) positions from Kafkas et al. (BMC Genomics 16:98, 2015) were selected and sequenced with plant material collected predominantly from the Antepfıstığı Research Institute Collection Garden, and the sequences of the eight species were aligned intraspecifically for SNP mining. In total, 12 SNP markers were converted to KASP markers, and 5 of them (SNP-PIS-133396, SNP-PIS-167992, P-ATL-91951-565, P-INT-91951-256, P-KHIN-91951-115) showed clear allelic discrimination between male and female plants. SNP-PIS-167992 and P-ATL-91951-565 were identified as the best marker assays because they showed allelic frequency differences for all individuals and for both homozygous and heterozygous characters. These markers could be the most comprehensive ones for the whole genus because they showed discriminative power for several species.
This study is the first one to use the KASP assay for sex discrimination in Pistacia species, and it can be regarded as a precursor study for sex discrimination by KASP for plants in general.


Introduction
Pistacia belongs to the Anacardiaceae family [1], and the genus has increasing economic value [2] for the food, drug [3], healthcare [4], and baking industries. The primary producers of Pistacia vera are, in order of importance, the USA, Iran, Turkey, and China; however, Turkey's yield is 2 times higher than the worldwide yield [5]. Pistacia species have different types of leaves, such as membranous, leathery, deciduous, evergreen, stipule, and trifoliate leaves. The best-known characteristics of the species are that they have unisexual flowers and a homeochlamydic perianth and that they are dioecious and wind-pollinated (anemophilous) [6].
One of the essential and common characteristics of the Pistacia genus is the sex difference between individuals, which is expressed in the anatomical and structural traits of male and female individuals. Male (♂ staminate) and female (♀ pistillate) flowers occur on different individuals in dioecious plants [7]; therefore, fruit-bearing trees require the proximity of trees of the opposite sex to produce fruits, so identifying the sex of an individual plant is crucial for producers. Marker-assisted selection (MAS) is a promising process and may help to screen the sex and many other traits of plant seedlings before they reach the reproductive stage or show morphological characteristics, which can take several years [8].

Molecular markers are nucleic acid (DNA or RNA) fragments from any gene region or associated with genes in the genome. These markers can be used for a variety of purposes, ranging from creating the genetic and character maps of families to identifying and defining species [9]. Each molecular marker has its advantages and disadvantages [10], so researchers can choose the best marker for a particular study. Some studies also used a combination of different molecular markers to interpret the genetic diversity between species [11].
A single-nucleotide polymorphism (SNP) is a single point mutation in the genome of an individual. There are two main advantages of the use of SNPs: their abundance in genomes (e.g., 1/36 bp [1 SNP per 36 base pairs] for Arabidopsis, 1/21 bp for potato, and 1/78 bp for barley) and the availability of high-throughput SNP genotyping technologies [12]. SNP markers are used in many plant species and on many platforms, such as quantitative trait locus (QTL) analysis in rice [13] and nested association mapping (NAM) in maize for studying the complex trait flowering time [14] or for finding gene-specific diagnostic markers against fungal pathogens such as leaf rust, stripe rust, and powdery mildew [15]. In addition, high-throughput SNP genotyping could be performed with fixed-array or genotyping-by-sequencing (GBS) technologies. Although fixed-array methods have been used for large numbers of samples, their higher cost is always a limiting factor. Flexible technologies such as TaqMan®, Douglas Array Tape, and KASP™ are preferable in certain fields, such as agriculture or personal medicine [16].
Kompetitive allele-specific PCR (KASP) is a PCR-based fluorescently labeled marker scanning system developed by KBioscience. The low-cost KASP assay platform is defined as breeder friendly [17], time effective, novel, and more reliable for routine screening than other similar technologies, such as TaqMan assay and Sanger sequencing [18]. KASP is a flexible genotyping assay that can be used in small research laboratories to obtain high-throughput large-scale results [19]. In addition, according to Semagn's review in 2014, KASP is 12.3–46.1% less expensive than GoldenGate or BeadXpress platforms. Via this uniplex SNP genotyping [20], one marker can be run and scanned in real time by fluorescent plate readers. The KASP assay is based on fluorescence resonance energy transfer (FRET) cassettes in combination with a PCR system. The FRET cassettes consist of the fluorescent dyes FAM (6-carboxyfluorescein) and HEX (hexa-chloro-fluorescein) attached to specific oligo tails, together with complementary quencher molecules that suppress signal generation before matching. The preparation of two unlabeled allele-specific primers is another key point of the reaction. The quenchers are complementary to the tail (extension) sequences of these primers. At the end of the PCR cycles, the primer or primers that were used are identified by the fluorescence signal [21]. The KASP technology has a wide range of usage in different organisms, and it can be easily employed for genotyping at relatively low cost [19]. Cacao [22], apple [23,24], soybean [25,26], grain amaranth [27], ash tree [28], Rubus [29], Dendrobium [30], watermelon [31], peanut [32,33], maize [20], rice [34], and wheat [17,35,36] are some of the plant species in which the KASP assay has been successfully applied for different purposes, such as breeding or identifying genetic diversity. The goal of this study is to develop KASP markers for determining the sex of Pistacia species.

Plant material and genomic DNA extraction
Sixteen samples (8 male and 8 female) from 8 Pistacia species (Pistacia atlantica, Pistacia integerrima, Pistacia khinjuk, Pistacia mutica, Pistacia palaestina, Pistacia terebinthus, Pistacia vera, and Pistacia lentiscus) were used as plant material. They were collected from the Antepfıstığı Research Institute Collection Garden (Gaziantep, Turkey) and from the Çeşme Çiftlikköy region (Pistacia lentiscus L.) in İzmir, Turkey. The CTAB method was used for DNA isolation [37]. Each extracted genomic DNA sample was quantified with the Qubit 2.0 fluorometer according to the manufacturer's instructions and evaluated by loading DNA, based on the Qubit measurements, on a 0.8% agarose gel for electrophoresis. DNA bulks of 10 individuals representing each species and sex were created, and the DNA bulk samples were diluted to 2 ng µL⁻¹. The plan of the study was designed as shown in the chart (Fig. 1) and carried out step by step.
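The dilution of each bulk to the 2 ng µL⁻¹ working concentration follows from C1·V1 = C2·V2. The snippet below is only a minimal sketch of that arithmetic; the function name, the 50 ng µL⁻¹ stock reading, and the 100 µL final volume are hypothetical values chosen for illustration and are not taken from the study.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock_volume_ul, diluent_volume_ul) using C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    # V1 = C2 * V2 / C1
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# Hypothetical Qubit reading of 50 ng/uL, diluted to 2 ng/uL in 100 uL.
stock_ul, diluent_ul = dilution_volumes(50.0, 2.0, 100.0)
print(f"stock: {stock_ul:.1f} uL, diluent: {diluent_ul:.1f} uL")
```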

SNP mining
To establish a broad range of molecular markers for sex discrimination in Pistacia species, we focused on the genomic regions shown to be involved in sex determination. We used two sets of whole-genome sequencing (WGS) reads (accession numbers: SRX2270199 and SRX2270198) that were available in the high-throughput DNA and RNA SRA database of the National Center for Biotechnology Information (NCBI). These reads were assembled de novo (Appendix 1) and used to examine the genomic areas of interest in order to develop further molecular markers for the sex discrimination of other Pistacia species. The data were subjected to quality control with the FastQC tool; adapter sequences and low-quality reads (Phred score < 30) were removed.
SOAPdenovo Assembler (v2.01) was used to assemble the WGS reads into long, high-quality contigs and/or scaffolds. The assembly tool consists of six modules or stages that run during the assembly: the construction of a de Bruijn graph (DBG), paired-end (PE) read mapping, contig assembly, read error correction, gap closure, and scaffold construction. The process began with the preprocessed, quality-filtered reads, followed by the construction of a k-mer dictionary, generally with a k-mer length of 25. The default assembly parameters were used, with a default k-mer length for the creation of graphs. In order to obtain at least one confident assembly, three sets of scaffolds were assembled: (i) using only PE 90 base reads, (ii) using only PE 150 base reads, and (iii) using both PE 90 and 150 reads. Depending on the data used, the k-mer size was estimated using an intermediate tool called KmerGenie. Using the estimated k-mer value, a large k-mer dictionary was created for the data. From this dictionary, the tool selects the most frequent k-mer to seed a contig assembly, and this seed is then extended in each direction by finding the most abundant k-mer with a k−1 overlap. Care was taken to remove any k-mer that had been used for extension, ensuring uniquely assembled contigs. The extension, in both directions, was continued until the contig could not be extended any further, and then the longest linear contig was reported. These steps were repeated with the next most abundant k-mer until the entire k-mer dictionary was exhausted. Finally, a Contigs.fasta file was created, including all the assembled contigs.
Corresponding contig statistics were also generated. The assembled contigs were further optimized by closing gaps and increasing the coverage and length of the assemblies, generating high-quality scaffolds. As for the contigs, quality statistics were also generated for the scaffolds and are provided in Fig. 2.
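The seed-and-extend idea described above (most frequent k-mer as seed, extension in both directions through k−1 overlaps, removal of used k-mers) can be illustrated with a small toy example. This is a simplified sketch for intuition only and is not the SOAPdenovo implementation; the function names, the toy reads, and k = 5 are assumptions made for the example, and real assemblers additionally handle reverse complements, sequencing errors, and paired-end information.

```python
from collections import Counter


def count_kmers(reads, k):
    """Build the k-mer dictionary: k-mer -> number of occurrences in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def _best_base(counts, make_kmer):
    """Return the base whose candidate k-mer is present and most abundant, or None."""
    present = [b for b in "ACGT" if counts.get(make_kmer(b), 0) > 0]
    if not present:
        return None
    return max(present, key=lambda b: counts[make_kmer(b)])


def greedy_contigs(reads, k):
    """Toy greedy assembly: seed with the most frequent k-mer, extend both ways."""
    if k < 2:
        raise ValueError("k must be at least 2")
    counts = count_kmers(reads, k)
    contigs = []
    while counts:
        seed, _ = counts.most_common(1)[0]
        del counts[seed]  # each k-mer is used only once
        contig = seed
        # Extend to the right through (k-1)-base overlaps.
        while True:
            suffix = contig[-(k - 1):]
            base = _best_base(counts, lambda b: suffix + b)
            if base is None:
                break
            del counts[suffix + base]
            contig += base
        # Extend to the left through (k-1)-base overlaps.
        while True:
            prefix = contig[:k - 1]
            base = _best_base(counts, lambda b: b + prefix)
            if base is None:
                break
            del counts[base + prefix]
            contig = base + contig
        contigs.append(contig)
    return contigs


# Hypothetical overlapping reads drawn from one short toy sequence.
toy_reads = ["ATGGCGTGC", "GCGTGCAAT", "GTGCAATCC"]
toy_contigs = greedy_contigs(toy_reads, k=5)
print(toy_contigs)
```

In this toy case every (k−1)-mer overlap is unique, so the reads collapse into a single contig; with repeats or errors the greedy choice can break contigs early, which is one reason production assemblers use a full de Bruijn graph.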

Sanger sequencing
The sequencing analysis was performed at the LGC Genomics GmbH Laboratory. Four target sites (Appendix 1) were amplified in 16 Pistacia bulk samples (8 male and 8 female) with specific primers (Table 1) to verify the DNA sequence variants (substitutions or indels) detected in the de novo assembly. Sequencing runs were performed on a 3730xl DNA Analyzer (Applied Biosystems™/Thermo Fisher Scientific). PCR clean-up was performed with ExoSAP-IT™ PCR Product Cleanup Reagent (Thermo Fisher Scientific) according to the manufacturer's protocol.

Multiple alignment and new SNPs
By aligning four regions (REG-PIS-136404, REG-PIS-174431, REG-PIS-133396, REG-PIS-167992), we selected 9 SNPs among the polymorphisms. For each species, the sequences of female and male individuals were grouped and aligned with the Geneious Prime program to identify sex-based SNP differences. In this program, the Geneious pairwise/multiple alignment tool was used with the global alignment with free end gaps option and a 65% similarity cost matrix. The nine selected SNPs were converted into KASP assays to test for new functional markers (Table 2). In addition, three SNP positions (SNP-PIS-133396, SNP-PISTACIA-176863, SNP-PISTACIA-167992) were selected from Kafkas et al. [38] according to KASP primer design principles. As a result, together with the 9 SNPs from the de novo assembly, 12 SNPs in total were converted into KASP primers.
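The comparison of aligned female and male sequences can be sketched in a few lines. The example below is a hedged illustration, not the Geneious workflow: it assumes that each bulk is represented by one aligned consensus sequence of equal length and that heterozygous positions are written as IUPAC ambiguity codes (for example, W for A/T). The function name and the toy sequences are hypothetical.

```python
# Two-base IUPAC ambiguity codes plus the four unambiguous bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def sex_snp_candidates(female_seq, male_seq):
    """List aligned positions (1-based) where female and male allele sets differ."""
    if len(female_seq) != len(male_seq):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    pairs = zip(female_seq.upper(), male_seq.upper())
    for pos, (f_base, m_base) in enumerate(pairs, start=1):
        # Skip gaps, N, and any other symbol that is not a resolvable base call.
        if f_base not in IUPAC or m_base not in IUPAC:
            continue
        f_alleles = set(IUPAC[f_base])
        m_alleles = set(IUPAC[m_base])
        if f_alleles != m_alleles:
            hits.append((pos, "/".join(sorted(f_alleles)), "/".join(sorted(m_alleles))))
    return hits


# Toy alignment: the female consensus is heterozygous A/T at position 4.
candidates = sex_snp_candidates("ACGWTC-A", "ACGTTCNA")
print(candidates)
```

Positions reported this way are only candidates; in the study they were checked against KASP primer design principles before conversion into assays.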

KASP (Kompetitive Allele Specific PCR)
The KASP assay was performed in a 20 µL reaction volume consisting of 10 µL KASP master mix, 0.28 µL KASP primer assay mix, and 10 µL of 2 ng µL⁻¹ bulk DNA template. Thermocycling was performed in an ABI 9700 realtime with the following LGC standard 61-55 °C touchdown

Sanger sequencing and SNP sex association
Sequence analyses were performed with 75% success, and quality sequences between a minimum of 250 bp and a maximum of 1142 bp were selected. After the alignment of male and female samples for each species, SNPs were determined between the sexes (Fig. 3) for use in the KASP analysis. Consensus sequences are shown in Appendix 2.

Verification of sex association
In this study, 12 SNP markers were converted to KASP markers (Table 2) for use in the assays. Five of them (SNP-PIS-133396, SNP-PIS-167992, P-ATL-91951-565, P-INT-91951-256, P-KHIN-91951-115) were discriminative for sex in seven out of eight Pistacia species, namely P. atlantica, P. integerrima, P. khinjuk, P. mutica, P. terebinthus, P. vera, and P. lentiscus (Fig. 4). According to intraspecific considerations, the SNP-PIS-167992 (A/T) marker showed female/male distinction for three species: P. vera, P. atlantica, and P. terebinthus. While the male samples showed a homozygous allele 2 character (T/T) for this assay and these species, the signal frequencies of the female samples indicated a heterozygous allele 1/allele 2 (A/T) feature, as shown in Fig. 4a, e. The marker P-INT-91951-256 had a discriminative result for P. integerrima, as shown in Fig. 4b. When the whole genus was examined, one of the best assay results was obtained with P-ATL-91951-565. The assay separated male and female samples of four species (P. atlantica, P. khinjuk, P. mutica, and P. terebinthus) in the Pistacia genus intraspecifically (Fig. 4c, d). The SNP-PIS-133396 marker had a discriminative result for P. vera (Fig. 4f). P-KHIN-91951-115 was the fifth marker that discriminated more than one species. The assay showed female/male distinction for three species: P. khinjuk, P. atlantica, and P. lentiscus (Fig. 4g).
Separation of male and female samples of P. atlantica was obtained with three different SNP assays: SNP-PIS-167992, P-ATL-91951-565, and P-KHIN-91951-115. The best result among these reactions was obtained with SNP-PIS-167992 (Fig. 4a). According to this assay, the male samples showed a homozygous (T/T) character, and the female samples showed a heterozygous (A/T) character, as shown in Fig. 4a. The same assay reaction was also the best one for P. terebinthus (Fig. 4e). P. integerrima was discriminated only with the P-INT-91951-256 assay (Fig. 4b). The clearest results for P. khinjuk and P. mutica were obtained with the P-ATL-91951-565 T/C base-pair substitution assay (Fig. 4c, d). With this assay, while both P. khinjuk and P. mutica males showed a homozygous (T/T) allele 1 genotype, their females had a heterozygous (T/C) feature. Analysis with SNP-PIS-133396 containing the C/G base-pair substitution revealed that P. vera females were heterozygous with respect to the mean value of allele frequencies, while all the other individuals in the genus, including P. vera males, exhibited a homozygous allele 2 (G/G) character, as shown in Fig. 4f. P. lentiscus was discriminated in terms of sex only with the P-KHIN-91951-115 assay (Fig. 4g). The result of the P-INT-91951-120 assay indicated that while males and females of P. integerrima showed a heterozygous (T/A) genotype, all other male and female samples of the species in the genus had a homozygous allele 2 (A/A) character. This marker can be used as a species-specific marker for the Pistacia genus.
Fig. 3 Variant detection followed by multiple alignment of Sanger sequencing results using Geneious Prime software: example of a SNP mutation for the P-ATL-91951-565 assay

Discussion
The first sex determination study on Pistacia with molecular markers was published by Hormaza [8]. One thousand random amplified polymorphic DNA (RAPD) markers were used for sex discrimination in that study. Bulk P. vera L. male and female samples were amplified with 700 decamer primers, and as a result, OPO08945 (OPO-08) was defined as a sex marker. A 945-bp band was observed in the female samples, and this band was absent in the male samples. Yakubov et al. verified the primer and developed the OPO-08 RAPD primer with touchdown PCR, and they determined PVF1-2 sequence-characterized amplified region (SCAR) primers based on OPO-08. In that study, four point base deletions (PBDs) for the female samples and one PBD for the male samples were defined as mutations (again with P. vera) [39]. Another sex discrimination study using RAPD markers in wild Pistacia species was conducted by Kafkas et al. [40]. BC156 (1300 bp) and BC360 (500 bp) for P. eurycarpa and OPAK-09 (850 bp) and BC346 (700 bp) for P. atlantica were identified as markers. In the articles published by Esfendiyari et al. [41,42], the first 20 of 30 different 10-mer RAPD primers were tested on three different species: P. atlantica subsp. mutica, P. khinjuk, and P. vera. As a result, they reported that the BC1200 primer is a discriminative marker and designed the primers PVF1 (forward) and PVF2 (reverse) from the PCR-amplified region sequence (converting RAPD to SCAR). The SCAR primers, which had a 300-bp amplifying region, generated a band in all the female samples but not in the male samples.
Fig. 4 … P. terebinthus (e), P. vera (f), and P. lentiscus (g) species with 5 SNP markers that identified gender-based SNP differences
Most studies are based on P. vera [38,39,43–45] because of its worldwide economic value. On this basis, in 2008, P. vera L. cultivars were analyzed with inter-simple sequence repeat (ISSR) primers, and two of them were able to distinguish the sex of the plants. These markers are (AC)8GC and (AC)8TA 10-mer repeats, with band sizes of 2400 bp [43]. In the following year, the same research group studied 32 arbitrary primers for RAPD analysis and mentioned FPK 106 and FPK 105, but there were no data related to the primer sequence. Vedramin et al. carried out an SSR study covering four species [46]. Their analysis contained 50 P. vera, 4 P. integerrima, 12 P. mutica, and 16 P. terebinthus plants for a total of 82 accessions, but the tested markers were insufficient for sex discrimination.
To the best of our knowledge, the most reliable study on sex-linked markers in the Pistacia genus was published by Kafkas et al. [38]. In that study, 38 putative sex-associated SNP markers were identified as heterozygous in female individuals and homozygous in male individuals using RAD-seq, which suggests a ZW/ZZ sex determination system in P. vera.
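The heterozygous-female/homozygous-male pattern expected under a ZW/ZZ system translates into a very simple decision rule for a sex-linked SNP. The sketch below is only an illustration of that rule under the assumption that the marker is fully sex-linked; the function name and the genotype string format ("A/T") are hypothetical conventions chosen for the example.

```python
VALID_BASES = {"A", "C", "G", "T"}


def predict_sex_zw(genotype):
    """Predict sex from a diploid SNP genotype such as 'A/T' under a ZW/ZZ model.

    Assumes a fully sex-linked marker: heterozygous -> female (ZW),
    homozygous -> male (ZZ).
    """
    alleles = genotype.upper().split("/")
    if len(alleles) != 2 or any(a not in VALID_BASES for a in alleles):
        raise ValueError(f"unrecognized genotype: {genotype!r}")
    return "female" if alleles[0] != alleles[1] else "male"


# Genotypes in the style reported for an A/T sex-linked SNP.
for gt in ("A/T", "T/T"):
    print(gt, "->", predict_sex_zw(gt))
```

A rule like this is only as reliable as the linkage between the SNP and the sex-determining region, so it should be validated per species before routine use.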
Several sex discrimination marker studies have been performed in Pistacia species, and various molecular marker methods have been used in these studies, such as RAPD [42] or ISSR [43]. Compared with KASP, these previous methods contain extra steps, such as preparing gels, loading samples, or screening gels. These extra steps can be disadvantageous in terms of time, effort, and cost. Because each extra step increases the risk of making mistakes, KASP seems suitable for scaling up as it does not contain any of these steps.
It is generally known that this new and popular technique, the KASP assay, has a wide range of usage in different species, from the improvement of crops, such as maize [20], wheat [17], and tomato [47], to the detection of diseases in humans [18]. Nonetheless, sex discrimination studies using the KASP technique are not common, except for a few fish species, such as salmon [48] or halibut [49], and one insect species [50]. Thus, our study is not just the first KASP study on sex discrimination in Pistacia but also the first KASP study on sex discrimination in plants in general.
In our study, 12 SNP regions were screened in bulk DNA for each Pistacia species, with separate male and female pools. For the validation of the KASP marker results, the DNA of 10 male and 10 female individuals per species was mixed in equal concentrations to form bulks. In this way, the same samples were used in both KASP and DNA sequencing studies to represent results at the species level. Five SNPs (SNP-PIS-133396, SNP-PIS-167992, P-ATL-91951-565, P-INT-91951-256, P-KHIN-91951-115) showed clear allelic discrimination between the sexes in species. As a result of this allelic distinction, five KASP primers were identified as sex-linked markers for seven out of eight species. In the sex discrimination marker study conducted by Kafkas et al. [38] in P. vera, 8 of 13 SNPs differentiated between the sexes of the plants. When these markers were tested in some other species (P. atlantica, P. terebinthus, P. eurycarpa, P. integerrima) in the genus, a distinction could not be observed in these wild types.
In terms of allelic frequency, the best marker assays were SNP-PIS-167992 and P-ATL-91951-565 (Fig. 5a, c). In the allelic graphs of these markers, frequency differences are apparent for all individuals, and the distinction is exact in terms of homozygosity and heterozygosity (Fig. 5). At the same time, these markers can be considered more comprehensive because they differentiate several species at once for the whole genus (Fig. 5a, b, c). Each SNP marker shows discrimination for three Pistacia species: SNP-PIS-167992 for P. atlantica, P. terebinthus, and P. vera (Fig. 5a); P-KHIN-91951-115 for P. atlantica, P. khinjuk, and P. lentiscus (Fig. 5b); and P-ATL-91951-565 for P. atlantica, P. khinjuk, and P. mutica (Fig. 5c).
When all the results are evaluated, the least discriminative result of allelic signals between males and females was obtained for P. lentiscus with the P-KHIN-91951-115 assay. While the allele 2 (HEX) signals were similar in this case and the points in the plot lie close to each other, the allele 1 (FAM) signal was more than 3 times stronger in the female individuals. Some of the assays produced fluorescence after the reaction, but they did not display a difference in the allelic discrimination plot. In one marker (P-VERA-10733-132), no signal was received, which is in fact an indication of the KASP system's accuracy and the specificity of the primer sequences.
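Allelic discrimination plots of this kind are commonly interpreted by the position of each sample in FAM/HEX signal space. The following sketch shows one simple way to turn a pair of normalized signals into a call using the angle of the point; it is not the plate-reader software's algorithm, and the function name, the 30°/60° angle cut-offs, and the 0.2 minimum-signal threshold are illustrative assumptions that would need tuning on real data.

```python
import math


def call_genotype(fam, hex_signal, min_signal=0.2, low_deg=30.0, high_deg=60.0):
    """Call a KASP genotype from normalized, non-negative FAM and HEX signals.

    The angle of the (FAM, HEX) point separates the clusters: near the FAM
    axis -> allele 1 homozygous, near the HEX axis -> allele 2 homozygous,
    in between -> heterozygous. Weak signals are left uncalled.
    """
    if math.hypot(fam, hex_signal) < min_signal:
        return "no call"
    angle = math.degrees(math.atan2(hex_signal, fam))
    if angle < low_deg:
        return "allele 1 homozygous"
    if angle > high_deg:
        return "allele 2 homozygous"
    return "heterozygous"


# Hypothetical normalized signals for four wells.
wells = {"w1": (1.0, 0.1), "w2": (0.1, 1.0), "w3": (0.8, 0.8), "w4": (0.05, 0.05)}
calls = {name: call_genotype(f, h) for name, (f, h) in wells.items()}
print(calls)
```

With pooled DNA, as used here, intermediate angles reflect allele frequency in the bulk rather than a single individual's genotype, so fixed cut-offs should be treated with caution.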
The assay P-INT-91951-120 discriminated only P. integerrima because the assay was designed specifically for the P. integerrima sequence. Regarding the results of the SNP-PIS-167992 assay, while the research by Kafkas et al. [38] discriminated only P. vera, our assay discriminated two additional species: P. atlantica and P. terebinthus.

Conclusion
This analysis provides insights into the sex discrimination of the Pistacia genus using sequencing-based assays. The SNP assay described in this paper can be used to determine and confirm the sex of Pistacia varieties, so this study underlines the benefits of using SNP marker technology in future breeding programs in terms of cost and time effectiveness. The KASP technique could be a reliable genotyping assay for routinely identifying the sex of plants in the Pistacia genus. Understanding sex determination in plant reproduction is of immense practical importance for biotechnology, the conservation of biodiversity, and the control of invasive species.