Genetic diversity of fig (Ficus carica L.) germplasm from the Mediterranean basin as revealed by SSR markers

The fig tree (Ficus carica L.) is cultivated worldwide and is highly appreciated for its fruit, which is consumed fresh or dried and has high nutritional and pharmaceutical value; for these reasons, interest in its cultivation is increasing. In the present study, an ex situ collection of 60 fig accessions (41 indigenous Greek and 19 from other Mediterranean countries) was established and its diversity was analyzed using eight simple sequence repeat (SSR) loci. Greek fig genotypes showed relatively low allelic variation (the average number of SSR alleles per locus was 3.75), an excess of heterozygosity (mean He = 0.489 and Ho = 0.557), and extensive outbreeding (mean F index −0.151). Cluster analysis showed that the established fig population exhibited weak genetic structure, with most of the genetic variation (89%) being present within individual members of the clusters. Both cluster and principal coordinate analysis confirmed that there is little correlation between genetic makeup and geographical origin of the fig accessions. The SSR loci were reasonably informative, with an average polymorphism information content of 0.421. An identification key scheme for fig cultivars, useful in cultivar discrimination and intellectual property protection, was developed. This work will contribute to sustainable fig production regionally and worldwide, through the establishment and conservation of a reference fig collection that provides germplasm for future breeding efforts.


Introduction
Fig (Ficus carica L.) (2n = 26) belongs to the Moraceae family and is known worldwide for its fruit and for the presence of latex in all plant parts. Figs are eaten fresh or dried, and are rich in phenolic antioxidants and nutrients with high fiber content (Vinson et al. 2005), which makes them well suited to the human diet. According to Vavilov (1951) (Perez-Jimenez et al. 2012). Condit (1955) listed more than 700 fig cultivars, and a great deal of confusion still exists regarding cultivar identification and relationships.
Plant germplasm characterization, aiming at conservation, is traditionally carried out using morphological or agronomical traits. These criteria often vary across years and locations, since phenotypic traits are influenced by genotype-environment interactions. In contrast, DNA-based data are stable, reliable, and detectable in all tissues regardless of developmental and differentiation stage, and are not confounded by environmental, pleiotropic, and epistatic effects (Mondini et al. 2009). Molecular markers such as microsatellites (simple sequence repeats, SSRs), RAPDs, ISSRs, RFLPs and others have been used in fingerprinting and assessing genetic diversity in various fig collections (Papadopoulou et al. 2002; Chatti et al. 2010; Perez-Jimenez et al. 2012; Ganopoulos et al. 2015; Boudchicha et al. 2018; Rodolfi et al. 2018; Ergül et al. 2021). Microsatellites provide adequate resolution of germplasm differences owing to their high polymorphism and codominance; they are simple, quick, relatively inexpensive, and highly reproducible among laboratories, and as a result are still used for fingerprinting in plant species (Gupta and Varshney 2000; Mondini et al. 2009).
In the present study, the genetic diversity in fig germplasm was evaluated for an established population in an ex situ collection (60 fig accessions, from four Mediterranean countries and the USA) and an identification key for fig cultivars was developed using SSRs. The work was also aimed at assigning each accession to a group based on genetic diversity, thus evaluating the structure of the fig population.

Plant material
In the present study, the fig population comprised 60 accessions categorized by geographical origin: 41 accessions from Greece (central-eastern, central-western, and northern Greece, as well as Crete, Peloponnesus, Lesvos, and Syros) and 19 from other Mediterranean countries (Italy (12), Cyprus (6), Turkey (1), Spain (1); Suppl. Table S1). In particular, 33 of them were selected from different regions of Greece for their special agronomical, morphological, and fruit quality characteristics, based on information obtained from farmers and the authors' personal field inspection. The name of an accession usually represents a geographical qualifier, or sometimes derives from a morphological or fruit quality characteristic (Suppl. Table S1).

DNA isolation
Plant DNA was isolated from fig leaves of all 60 accessions using the CTAB method (Murray and Thompson 1980). The DNA concentration was estimated spectrophotometrically and its integrity was evaluated by electrophoresis on 0.8% agarose gel followed by ethidium bromide staining. DNA suitability as PCR template was checked by PCR reaction using primers for the ITS (Internal Transcribed Spacer) locus, following the methodology described by Roy et al. (2010).

SSR analysis
Eight SSR markers, namely MFC1 to MFC8, developed by Khadari et al. (2001) for fig, were used in this study. DNA amplification reactions were carried out in a total volume of 25 μl containing 0.5 mM of each PCR primer, 200 mM of each deoxynucleotide triphosphate, 1.5 mM of MgCl2, 1 U of Taq DNA polymerase (New England Biolabs, USA) and 50 ng of template DNA (Khadari et al. 2001), using a PTC-200/A100 thermocycler (BioRad, USA). No-DNA negative control reactions were performed.
PCR products were resolved using a 12% nondenaturing PAGE in a 20 × 20 gel (Biorad Protean II, USA) at 60 V for 20 min, followed by 180 V for 5 h. Gels were subsequently stained with ethidium bromide and photographed under UV, with photos digitized for further analysis. A DNA ladder (50 bp GeneRuler, Thomas Scientific, USA) was loaded in three wells in each gel, in asymmetric locations, to avoid gel orientation problems when scoring and assist in allele size determination (Hoffman and Amos 2005).
The amplified bands per SSR were scored for each fig accession using GelAnalyzer (2010a) (http://www.GelAnalyzer.com). Only gels/lanes with unambiguous band patterns, after background subtraction using GelAnalyzer, were considered for allele assignment. A band was accepted when the corresponding fluorescence intensity value, from the digitized photos, was > 10 fluorescent units. Bands with a fluorescence intensity < 35% of the main fluorescence intensity value (i.e. stutter bands) were filtered out following previous recommendations (Ewen et al. 2000; UPOV/INF/17/1 2010). The microsatellite alleles were sized using a standard curve generated for each gel from the known molecular sizes of the DNA ladder fragments. Since the PCR products of a fig accession per microsatellite were electrophoresed two to four times (each time in a different gel), the size of an allele was estimated several times, and the mean value was recorded as the allele size. The allele sizes, for each microsatellite and accession, were recorded in an Excel spreadsheet (Microsoft Inc., Redmond, USA), thus producing a data matrix for data storage and further processing. SSR allele size data were binned using Flexibin (Amos et al. 2007), following the methodology described by Ghosh et al. (1997). A final correction of the allele sizes was done by visual gel inspection, as previously reported (Pompanon et al. 2005; Hoffman and Amos 2005).
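The band-acceptance and sizing rules described above can be illustrated with a short sketch. It is not the GelAnalyzer procedure itself: the semi-logarithmic form of the standard curve (log10 of fragment size against migration distance) is an assumption, since the text does not state which curve was fitted, and the function names and the input layout are hypothetical.

```python
import math


def fit_standard_curve(ladder):
    """Least-squares line log10(size_bp) = slope * distance + intercept.

    `ladder` is a list of (migration_distance, size_bp) pairs from one gel.
    The semi-log model is an assumption made for this sketch.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(bp) for _, bp in ladder]
    n = len(ladder)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def call_bands(bands, curve, min_intensity=10.0, stutter_fraction=0.35):
    """Return allele sizes (bp) for the accepted bands of one lane.

    `bands` is a list of (migration_distance, intensity) pairs.
    A band is kept if its intensity exceeds `min_intensity` and is at
    least `stutter_fraction` of the strongest accepted band in the lane.
    """
    slope, intercept = curve
    accepted = [(d, i) for d, i in bands if i > min_intensity]
    if not accepted:
        return []
    main = max(i for _, i in accepted)
    kept = [d for d, i in accepted if i >= stutter_fraction * main]
    return sorted(10 ** (slope * d + intercept) for d in kept)
```

Sizes obtained for the same allele on different gels would then be averaged, as described in the text, before binning.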

Estimation of genotyping error
MicroChecker v.2.2.3 (Van Oosterhout et al. 2004) was used to statistically estimate the percentage of null alleles per SSR (alleles that fail to amplify because of nucleotide changes in the flanking sequences of the SSR), which are the main non-technical contributor to genotyping error.
To estimate the genotyping error, a randomly selected subset of 20% of the genotypes of the fig population was reanalyzed (Pompanon et al. 2005) following the same methodology, except that a different PCR thermocycler (Eppendorf Mastercycler Gradient 5341, USA), a different DNA polymerase (Phusion® High-Fidelity, NEB, USA), and a new fig DNA preparation were used. In addition, to strengthen the reliability of the results: a) independent random PCRs, for each of the eight SSRs, were conducted again for approximately 20% of the fig accessions, which were re-genotyped; b) for each PCR, samples were re-electrophoresed two to four times in different gels; and c) the allele sizes were scored twice by two different persons (Hoffman and Amos 2005).
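One common way to summarize such a re-genotyping exercise is a per-allele error rate, that is, the number of allelic mismatches divided by the number of alleles compared. The sketch below assumes this definition; the text does not state which error metric was computed, and the function name and data layout are hypothetical.

```python
from collections import Counter


def allelic_error_rate(original, repeat):
    """Fraction of mismatched alleles between two genotyping rounds.

    Both arguments map (accession, locus) to a tuple of two allele sizes.
    Only keys present in both rounds are compared; allele order within a
    genotype is ignored.
    """
    mismatches = 0
    compared = 0
    for key, first in original.items():
        second = repeat.get(key)
        if second is None:
            continue  # this genotype was not re-analyzed
        shared = sum((Counter(first) & Counter(second)).values())
        compared += 2
        mismatches += 2 - shared
    return mismatches / compared if compared else float("nan")
```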

Data and cluster analyses
Based on the SSR allele size data, genetic variability parameters were computed for all 60 fig accessions originating from Greece and other Mediterranean countries using GenAlEx v.6.5 (Peakall and Smouse 2012): number of alleles per locus (Na), effective number of alleles per locus (Ne), observed (Ho) and expected (He) heterozygosity, fixation index (F), and the χ² test for deviation from Hardy-Weinberg equilibrium (HWE) per locus (Table 1), as well as the private alleles summary (PAS) per country and the number of genotypes for all SSR loci.
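For readers unfamiliar with these parameters, the following sketch shows how the per-locus statistics are conventionally defined for a diploid codominant marker: He = 1 − Σp², Ne = 1/Σp², and F = 1 − Ho/He, where p is an allele frequency. It is an illustration of the standard formulas, not a reproduction of GenAlEx, and the function name and input format are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Per-locus diversity statistics for diploid genotypes.

    `genotypes` is a list of (allele_1, allele_2) tuples, one per accession,
    with missing data already removed.
    """
    alleles = [a for genotype in genotypes for a in genotype]
    counts = Counter(alleles)
    total = len(alleles)
    sum_p2 = sum((c / total) ** 2 for c in counts.values())
    he = 1.0 - sum_p2  # expected heterozygosity
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return {
        "Na": len(counts),          # number of distinct alleles
        "Ne": 1.0 / sum_p2,         # effective number of alleles
        "Ho": ho,                   # observed heterozygosity
        "He": he,
        # Negative F indicates an excess of heterozygotes.
        "F": 1.0 - ho / he if he > 0 else float("nan"),
    }
```

With this definition, a locus whose observed heterozygosity exceeds the expected value yields a negative F.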
To depict the genetic relationships among accessions of the established fig population, DARwin v6 (Perrier and Jacquemoud-Collet 2006) was employed. Missing allelic data were handled with the pairwise allele deletion option, at a threshold of 70%. A DARwin file with the extension ".DIS" stores the lower dissimilarity semi-matrix (without the diagonal) as computed by the software. Dissimilarity re-sampling was done with 10,000 bootstraps, with each semi-matrix recorded successively at the end of the file. Dissimilarity-based cluster analysis was performed, and the dendrogram was constructed using the Weighted Neighbor-Joining (WNJ) method with 10,000 bootstraps. In the dendrogram, the scale defines the edge length. To determine the genetic relationships of fig accessions per country of origin, MEGA11 (Tamura et al. 2021) was employed using Nei's distance (Nei 1972). Analysis of molecular variance (AMOVA) was carried out in GenAlEx for all SSR loci, based on the groups revealed by the above cluster analysis.

Population structure
The population structure was investigated using a non-Bayesian procedure, the Discriminant Analysis of Principal Components (DAPC) (Jombart et al. 2010), in the adegenet package for R (R Development Core Team 2011), in which the variance in the sample is partitioned into a between-group and a within-group component without making assumptions of panmixia. The number of clusters was assessed using the find.clusters function, which runs successive K-means clustering with an increasing number of clusters (k). The optimal number of clusters was selected using the Bayesian Information Criterion (BIC) to identify the best-supported model, and therefore the number and nature of clusters.

Establishment of an identification key for fig
The polymorphism information content (PIC) value per locus was estimated using PICCalc software (Nagy et al. 2012). To establish an identification (Id) key for fig, the methodology of Tessier et al. (1999) was followed. According to this method, two parameters were estimated: the confusion probability Cj and the discriminating power Dj (Dj = 1 − Cj).

Results
In total, the 60 fig accessions under study yielded 58 different genotypes for the eight SSR loci. The eight SSRs produced 30 alleles, with a mean of 3.75 alleles per SSR. The observed heterozygosity Ho (0.557) was higher than the expected heterozygosity He (0.489). The fixation index (F) was negative (−0.152). In particular, six (MFC2, MFC4, MFC5, MFC6, MFC7 and MFC8) of the eight SSRs had negative F values, while the remaining two (MFC1 and MFC3) had positive F values. Based on the χ² test, three SSRs (MFC5, MFC6, and MFC8) followed HWE, while the remaining five deviated from HWE at a significance level of α = 0.05 (Table 1).
Lastly, the genetic parameters were compared with those reported in the literature where the MFC SSRs were utilized (Suppl. Table S2). The values obtained in the present study agree with previously published work.
From the dendrogram generated (Fig. 1), it appears that the fig population studied could be divided into three large groups, named Cluster I, II, and III, with each cluster subdivided into two subgroups, 1 and 2 (subgroup I-1, subgroup I-2 etc.), resulting in a total of 6 subgroups. Figs from different countries could be found in the same subgroup, except for subgroup III-2, which contains figs only from Greece. The AMOVA for the 6 subgroups showed that 89% of the total variability within the population is due to genetic differences between the individual fig accessions of each subgroup and only 11% is accounted for by differences between the subgroups. An Fst value of 0.132 indicates a small to negligible difference between the six subgroups. When the country of origin of the fig accessions was considered, based on Nei's genetic distance, two major groups were formed, one including the figs from Greece, Italy and Spain, and the second the figs from Cyprus and Turkey (Suppl. Fig. S2).
The genetic structure of the fig cultivars was investigated by non-Bayesian population assignment analysis. For the DAPC analysis, 20 PCA axes and three discriminant functions were retained. The DAPC analysis indicated a partial sub-structuring of the fig accession groups. The Cypriot and Italian accessions were grouped separately, while the remaining accessions clustered together, most likely due to gene flow between these populations (Fig. 2). The results were consistent when the analysis was conducted with clone correction (data not shown).
Analysis of allele patterns per SSR locus in the fig accessions revealed different numbers of genotypes, ranging from two genotypes for MFC4, MFC5, and MFC8 to nine genotypes for MFC3, as shown in Table 1 (Suppl. Table S2).

Identification key for figs
In this study, an identification (Id) key was generated based on the discriminating power (Dj) of each SSR. The SSR loci were hierarchically ordered according to their Dj values. MFC6 ranked first, as it is the most discriminative among the eight SSRs. The resulting order of loci was: MFC6-MFC3-MFC1-MFC2-MFC7-MFC8-MFC5-MFC4. MFC4 and MFC5, having the smallest Dj, were not included in the Id key as they lacked discriminative power (Table 2).

Discussion
The information describing plant germplasm could be used for identification purposes, for recognizing deficiencies of the collection, and for planning future efforts to strategically enrich it with new plant material. Such information includes morphological and agronomical features, as well as biochemical and molecular data. In the present study, the genetic characterization of the established fig population was described using SSR markers. Such investigations of plant genetic resources are a prerequisite for breeding crops to face new challenges, including climate change. The data obtained were also used to propose an identification key scheme.
In a number of genetic studies using microsatellites, genotyping errors that are due to null alleles (nonamplified alleles), DNA degradation and low DNA concentrations are increasingly recognized as important factors that could render the conclusions doubtful (Hoffman and Amos 2005). Especially null alleles that usually result from changes in flanking region sequence of the SSR could alter the estimation of the genetic parameters of the population under study. In the present study, the frequency of null alleles was statistically estimated per SSR, and it ranged from 0 to 3.53%. This range is considered non-significant since frequencies of 5-8% introduce only a small bias in the genetic parameters investigated (Chapuis and Estoup 2007). Moreover, re-genotyping 20% of the members of the population studied reinforced the reliability of the results.
The dendrogram (Fig. 1) shows that in the subgroup I-1 two Italian varieties, namely Dottato   2002). However, these two varieties differ phenotypically in leaf shape and cavity size within the fruit (Ntanos et al. 2015). In addition, in the same subgroup two figs from Greece, namely Maurosykia (accession number 108) and Zakynthos (accession number 160), appear genetically and morphologically close, even though they were collected in very distant regions of Greece. Minor differences between Maurosykia and Zakynthos need to be further investigated. Finally, in subgroup II-1 the genotypes Vasilika Mellisi (140) from Greece and Rosso Dendro (214) from Italy are similar. However, these two varieties differ phenotypically in the size and shape of the leaf, and the fruit stalk is significantly longer in the second one (Ntanos et al. unpublished data).
In the fig population studied in the present work, the parameters that quantify the genetic variability were calculated using 8 microsatellites.
Our results are consistent with published data (Suppl. Table S1) (Khadari et al. 2003; Giraldo et al. 2005, 2008; Saddoud et al. 2007; Achtak et al. 2009; Aradhya et al. 2010; Caliskan et al. 2012; Perez-Jimenez et al. 2012; Ganopoulos et al. 2015; Boudchicha et al. 2018; Rodolfi et al. 2018; Ergül et al. 2021). The differences in genetic parameters among published works may be due to the selection of different SSRs, the genotypes analyzed, and differences in the methodology followed for the examination of the samples. Our finding that observed heterozygosity is higher than expected, resulting in a negative F value, agrees with previous works (Suppl. Table S2). Negative F values can result from negative assortative mating (Lachance 2016), owing to the fig's entomophilous pollination, or from heterotic selection by man. Klekowski (1988) pointed out that perennial species tend to exhibit high heterozygosity as a mechanism to overcome the harmful effects of residual mutations. Despite the high observed heterozygosity in our fig population, we observed limited genetic grouping, as previously published (Aradhya et al. 2010; Caliskan et al. 2012; Perez-Jimenez et al. 2012).
From the present study it appears that the geographical origin could not be the main criterion for classification, as described also elsewhere (Giraldo et al. 2008;Aradhya et al. 2010;Boudchicha et al. 2018). Only Ikegami et al. (2009), using a population of eight Japanese and 11 foreign imports, suggested that figs from Japan are not genetically related to figs from other countries. Many authors attribute the absence of grouping of figs to the propagation of the fig tree, which favors the exchange of plant material between different geographical areas (Giraldo et al. 2008;Aradhya et al. 2010). The same partial lack of clustering was observed in the DAPC analysis. We opted to use this approach, instead of a Bayesian one, because the latter assumes that markers are not linked and that the population is panmictic. The DAPC analysis is a convenient approach for clonal or partially clonal populations (Jombart et al. 2010), where it aims at maximizing the discriminatory capacity of the between-groups variance.
The development of a reliable tool for variety identification is necessary to ensure the identity of the plant material for registration, variety protection, and the management of propagating material. In the present study, we proposed an identification key scheme using six microsatellites that could distinguish 58 of the 60 fig genotypes in the ex situ collection (98% resolution). A study of morphological characteristics showed that fig accessions that did not differ by molecular analysis could differ in the color of the fruit peel and of the fruit flesh (Ntanos et al. 2015). This suggests that a combined approach involving molecular and morphological analyses may be necessary to increase the resolution power.
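The ordering of loci that underlies such a key can be sketched as follows. The sketch assumes the finite-sample form of the confusion probability attributed to Tessier et al. (1999), Cj = Σ p_i (N p_i − 1)/(N − 1), where p_i is the frequency of the i-th genotype pattern at locus j among N accessions, and the standard PIC formula, PIC = 1 − Σ p_i² − Σ_{i<k} 2 p_i² p_k², with p_i the allele frequencies. Function names and the input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def discriminating_power(genotypes):
    """Dj = 1 - Cj for one locus, from a list of (allele_1, allele_2) tuples."""
    n = len(genotypes)
    if n < 2:
        raise ValueError("at least two accessions are required")
    # Allele order within a genotype is irrelevant, so sort each pair.
    patterns = Counter(tuple(sorted(g)) for g in genotypes)
    # p_i * (N * p_i - 1) / (N - 1) rewritten with pattern counts c = N * p_i.
    cj = sum((c / n) * (c - 1) / (n - 1) for c in patterns.values())
    return 1.0 - cj


def pic(allele_freqs):
    """Polymorphism information content from allele frequencies summing to 1."""
    freqs = list(allele_freqs)
    sum_p2 = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - sum_p2 - cross


def rank_loci(data):
    """Order locus names by decreasing discriminating power.

    `data` maps a locus name to its list of genotypes.
    """
    return sorted(data, key=lambda locus: discriminating_power(data[locus]),
                  reverse=True)
```

Loci at the end of the ranking, with Dj close to zero, contribute little to separating accessions and are natural candidates for exclusion from a key.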