Enriched library and microsatellite loci identification
Transferability of microsatellite primers between congeners or related species is possible (Fagundes et al. 2016). However, the transferability rate decreases as phylogenetic divergence increases, and transferred primer pairs tend to reveal fewer polymorphisms (Alves et al. 2007). Since no microsatellite locus sequences were available for C. phaea or any congener at the beginning of this work, we started by testing loci available for P. guajava (Risterucci et al. 2005). We tested a few loci for transferability to cambuci, but most either failed to amplify or amplified non-specific products despite attempts to optimize the amplification conditions; only one locus (mPgCIR11) amplified a monomorphic allele of the expected size. Nonetheless, to succeed in transferring primers between congeners, or between more distantly related taxa as in this case, a large number of primers needs to be evaluated. Nogueira et al. (2016) evaluated 158 microsatellite primer pairs from P. guajava in 18 Myrtaceae fruit species, including C. phaea and two other Campomanesia species (C. guaviroba and C. hirsuta). Around 50% of the primer pairs showed some amplification, with 64 loci considered to exhibit good-quality amplification in C. phaea (Nogueira et al. 2016). However, the transferability screening was based on the detection of amplified products in 1.5% agarose gels stained with ethidium bromide, which is insufficient to discriminate alleles or detect polymorphism. Primers originally developed for Eucalyptus urophylla and E. grandis (Grattapaglia et al. 2015) were tested in C. adamantium and C. pubescens, two species from the Brazilian savanna biome (‘Cerrado’) (Miranda et al. 2016). Of the 120 primer pairs tested, 12 loci were successfully transferred to both Campomanesia species and used to analyze the genetic diversity of two populations of each species.
Accordingly, we proceeded to develop a library enriched for microsatellite sequences of C. phaea using the protocol proposed by Billotte et al. (1999), with a few modifications. We obtained 192 clones, of which 96 were sequenced. Fourteen clones contained microsatellite sequences (14% enrichment), of which seven contained perfect dinucleotide repeats and one a perfect trinucleotide repeat (Table 1). We developed specific primers for each of the 14 loci identified, and six revealed polymorphism (Table 1) across the five regional groups of cambuci accessions analyzed. These loci were named Cam.ph03, Cam.ph04, Cam.ph05, Cam.ph06, Cam.ph09, and Cam.ph13. The remaining eight loci were monomorphic. The six loci disclosed 26 alleles in the 145 accessions tested, ranging from two (Cam.ph05) to six alleles (Cam.ph03 and Cam.ph06) per locus (Table 1). The number of alleles and the profile of polymorphism can be associated with the composition of the repeat motif, the number of repeats, and the size of the microsatellite sequence (Kelkar et al. 2008). Interruption of the repeat tends to reduce DNA polymerase slippage, which affects the level of polymorphism disclosed (Estoup and Cornuet 1999; Lia et al. 2007). The monomorphic nature of the locus Cam.ph12 appears to derive from the type of repeat motif (Table 1). On the other hand, loci with perfect di- and trinucleotide repeat motifs and longer repeats tend to reveal more alleles and more polymorphism (Zalapa et al. 2012), as in Cam.ph03 and Cam.ph06 (Table 1), whereas loci with mononucleotide motifs or with dinucleotide motifs with fewer repeats (< 9) present monomorphic profiles.
The average PIC (Polymorphism Information Content) for all the loci was 0.499, ranging from 0.341 (Cam.ph05) to 0.633 (Cam.ph06) (Table 2). Thus, all loci developed for C. phaea are considered informative, with Cam.ph04, Cam.ph05, and Cam.ph13 classified as moderately informative (0.25 < PIC < 0.50) and Cam.ph03, Cam.ph06, and Cam.ph09 as highly informative (PIC > 0.50), according to the PIC classification proposed by Botstein et al. (1980). No linkage disequilibrium was detected among the six loci after the Bonferroni correction for multiple tests (95%; α = 0.05). The genetic diversity of Nei (1973) is defined as the probability that two random gametes from a population carry different alleles at a given locus, which corresponds to the expected heterozygosity (He). The observed heterozygosity (Ho) is the actual proportion of heterozygous individuals in a specific population. The loci Cam.ph03, Cam.ph05, Cam.ph06, and Cam.ph13 revealed Ho lower than He, whereas for Cam.ph04 and Cam.ph09, Ho was higher than He (Table 2).
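The quantities discussed above can be illustrated with a minimal Python sketch. It assumes the standard formulas, He = 1 - Σp² and the Botstein et al. (1980) PIC, and is not the software used in this study. The function names and the two-allele example are hypothetical; the classification thresholds are the ones quoted in the text.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2): chance that two random gametes carry different alleles."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2 (Botstein et al. 1980)."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross


def observed_heterozygosity(genotypes):
    """Ho = proportion of individuals whose two alleles differ."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def classify_pic(value):
    """Thresholds as quoted in the text: > 0.50 highly, 0.25 to 0.50 moderately informative."""
    if value > 0.50:
        return "highly informative"
    if value > 0.25:
        return "moderately informative"
    return "slightly informative"


# Hypothetical locus with two equally frequent alleles (not data from this study).
toy_freqs = [0.5, 0.5]
toy_he = expected_heterozygosity(toy_freqs)
toy_pic = pic(toy_freqs)
```

Applying `classify_pic` to the extreme values reported in Table 2 (0.341 and 0.633) reproduces the moderately and highly informative labels given in the text.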
Overall, in the C. phaea accessions, the mean observed heterozygosity (Ho) was 0.55 and the expected heterozygosity (He) was 0.64 (Table 2). The analysis of genetic diversity of C. adamantium and C. pubescens using 12 microsatellite loci from Eucalyptus disclosed 82 alleles in C. adamantium individuals, with an average of 6.8 alleles per locus, and 95 alleles in C. pubescens individuals, with an average of 7.8 alleles per locus, ranging from 2 to 16 alleles per locus (Miranda et al. 2016). The average values of He and Ho were 0.517 and 0.504 for C. adamantium, and 0.579 and 0.503 for C. pubescens, respectively (n=80). Seven Eucalyptus microsatellite loci were also used to analyze C. adamantium individuals from six sites in the states of Mato Grosso do Sul (MS) and Goias (GO), and one site in Paraguay (n=208). The analysis revealed 71 alleles, with an average of 10 alleles per locus (ranging from 3 to 21 alleles), and the mean He and Ho were 0.62 and 0.61 for the 207 individuals (Crispim et al. 2018). Both studies using microsatellite loci from Eucalyptus disclosed more alleles per locus than found here for C. phaea, likely because polymorphism was analyzed in automated sequencers with fluorescence detection, which can detect alleles more accurately. In another study of C. adamantium based on 36 polymorphic microsatellite loci developed specifically for this species, the number of alleles varied from 2 to 14 per locus (mean 8.14), whereas the mean values of He and Ho were 0.46 and 0.52, respectively, although based on a small sample (n=10) (Crispim et al. 2019). In general, the overall levels of expected and observed heterozygosity were comparable among the Campomanesia species, mostly with He higher than Ho, but not by a large margin. The values of He and Ho suggest an outcrossing mode of reproduction.
Population genetic structure
The five regional collections exhibited an acceptable level of genetic diversity, as estimated by the percentage of polymorphic loci, the mean number of alleles per locus, and heterozygosity. Among the five regional cambuci collections, the average number of alleles per locus ranged from 3.33 (Salesópolis) to 4.33 (Mogi das Cruzes), with an overall mean of 3.83 alleles per locus (Table 3). The percentage of polymorphic loci varied from 53% (Juquitiba) to 61% (Mogi das Cruzes), with an average of 57% among the collections (Table 3). All accession collections displayed higher He than Ho, except for the one from Ribeirão Pires (Table 3). The fixation index among the populations ranged from -0.13 to 0.20. According to these parameters, the genetic diversity of the populations ranged from 0.53 to 0.61 (Table 3). The highest numbers of polymorphic loci were detected in the populations of Paraibuna and Mogi das Cruzes (Table 3). In these populations, a larger number of samples was collected from wild plants in the forest, especially in Mogi das Cruzes, where collections were carried out mostly within fragments of the Atlantic Forest that are preserved or in an advanced stage of recovery. These results indicate the importance of large-scale action to preserve and/or restore this environment in order to conserve the biodiversity of the biome. On the other hand, in the populations of Salesópolis, Ribeirão Pires, and Juquitiba, although some plants were sampled in the wild, a larger proportion of plants was collected in backyards, which may explain the lower number of polymorphic loci. However, even with lower genetic diversity, it is important to emphasize the relevance of the individual actions of local residents, who, through the economic and cultural value they attach to this species, have also contributed to its conservation.
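The sign of the fixation index follows directly from the relationship between Ho and He, as a short Python sketch can show. It assumes the common single-population estimate F = 1 - Ho/He, which may differ from the estimator used to produce Table 3; the function name and the input values are hypothetical, chosen only to illustrate the two cases.

```python
def fixation_index(ho, he):
    """Single-population estimate F = 1 - Ho/He (assumed form, not necessarily Table 3's)."""
    if he <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - ho / he


# Hypothetical values, not taken from Table 3.
f_deficit = fixation_index(ho=0.48, he=0.60)  # Ho < He: positive F (heterozygote deficit)
f_excess = fixation_index(ho=0.60, he=0.50)   # Ho > He: negative F (heterozygote excess)
```

Under this estimate, a collection with Ho above He, as reported for Ribeirão Pires, yields a negative index, while the remaining collections yield positive values.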
The studies conducted with populations of other Campomanesia species using microsatellite markers revealed similar levels of diversity among natural populations. The populations of C. adamantium from Mineiros and Três Ranchos (He=0.531 and 0.504; Ho=0.504 and 0.505, respectively), and the ones of C. pubescens from Santa Rita do Araguaia and Caiapônia (He = 0.629 and 0.529; Ho = 0.498 and 0.507), all exhibited comparable levels of genetic diversity (Miranda et al. 2016). In another study, seven populations of C. adamantium from the states MS (four populations) and GO (two populations), and one from Paraguay were analyzed for diversity; the values of He and Ho varied from 0.44 to 0.64 and from 0.44 to 0.73 (Crispim et al. 2018). Using a set of primers developed specifically for C. adamantium to analyze three populations from MS and Paraguay, the levels of He estimated ranged from 0.58 to 0.63 and of Ho from 0.50 to 0.62 (Crispim et al. 2019).
The genetic diversity parameter of Nei (1973) for all the C. phaea accessions analyzed here can be considered low (HT’ = 0.10) (Table 4). The genetic diversity among the populations is low, based on the statistics GST’ = 0.19 and θ = 0.09. The within-population estimates were 0.57 (Hs) and 0.13 (f). In general, the regional groups of accessions did not show evidence of inbreeding. In comparison, the analyses of two populations each of C. adamantium and C. pubescens from GO indicated significantly structured diversity, with more diversity found within (~75%) than between populations in both species (Miranda et al. 2016). No inbreeding was detected in any C. adamantium or C. pubescens population, but higher genetic diversity was recorded among individuals of C. pubescens than among individuals of C. adamantium (Miranda et al. 2016). When seven populations of C. adamantium from MS, GO, and Paraguay were investigated for population genetic structure, the AMOVA indicated significant structure but low diversity among populations (FST = 0.06) and more diversity within populations (Crispim et al. 2018). Therefore, in all cases for the Campomanesia species, most of the genetic diversity occurred within populations, with little variation among populations. A similar pattern of high genetic diversity within populations has been described for other Myrtaceae species, such as in three populations of Eugenia uniflora (Ferreira-Ramos et al. 2008) and in Eugenia dysenterica (Zucchi et al. 2002). Conversely, 13 populations of the Myrtaceae camu-camu (Myrciaria dubia), native to the Amazon, analyzed with seven microsatellite loci showed a significant deficit of heterozygotes (He=0.218 to 0.680; Ho=0.137 to 0.527), with high genetic diversity among the populations but also a high degree of inbreeding within them (Šmíd et al. 2017).
Considering that cambuci flowers are self-incompatible (Cordeiro et al. 2017; Tokairin et al. 2018) and therefore dependent on cross-pollination for fruit set, a high degree of allele sharing among individuals within a given population might be expected. Flowering occurs during warm and humid months, and in natural stands of cambuci, flowering synchrony can reach 50% of the individual trees, which may contribute to allele dispersion within populations (Cordeiro et al. 2017). Bees with nocturnal or crepuscular habits, such as Megalopta sodalis, Megommation insigne, Ptiloglossa latecalcarata, and Zikanapis seabrai, as well as Apis mellifera, are responsible for the pollination of cambuci flowers (Cordeiro et al. 2017). Seed dispersal in Campomanesia appears to be performed predominantly by animals, particularly small primates (Gressler et al. 2006). Seeds are dispersed during the period favorable for germination (Van Schaik et al. 1993; Morellato and Leitao-Filho 1996; Tabarelli and Peres 2002). This pattern of flowering and seed dispersal favors greater diversity within populations than among them.
Phylogeny of Campomanesia phaea by ITS sequence
The genus Campomanesia is part of the tribe Myrteae, which holds most of the New World Myrtaceae, including all Brazilian species (Landrum 1986). Campomanesia is a well-defined genus within the Myrtaceae, with distinctive morphological features (Landrum 1986; Luber et al. 2020). However, the distinction among Campomanesia species is more elusive: some species show large variation, whereas others exhibit little infraspecific morphological diversity (Landrum 1986).
To place the Brazilian C. phaea in the context of the genus, and to try to define the potential relationship of the species with its congeners, we sequenced the ITS region of the ribosomal gene. The ribosomal gene contains highly conserved sequences (18S, 5.8S and 26S) separated by two more variable transcribed regions (named ITS1 and ITS2), which are useful markers for elucidating evolutionary history at various taxonomic levels (Hsaio et al. 1994). We amplified, cloned and sequenced fragments from two accessions (#20 and #41), originally collected in Paraibuna, SP. The sequenced fragments were 721 bp long, and comparison against the National Center for Biotechnology Information (NCBI) database confirmed their identity as Campomanesia (GenBank id# MT433815 and MT433816). The sequences from both accessions of C. phaea (#20 and #41) were analyzed together with other ITS sequences available at NCBI: Campomanesia sp. (AM234078.1), C. laurifolia (MK313875.1), C. guazumifolia 1 (MK313874.1), C. guazumifolia 2 (MG708054.1), C. guazumifolia 3 (AM234076.1), C. ilhoensis (MH445990.1), C. xanthocarpa (KF421011.1), C. xanthocarpa 1 (MG708055.1), C. xanthocarpa 2 (MG708055.1), C. xanthocarpa 3 (KF421010.1), C. guaviroba (MG707974.1), C. hirsuta (MG707973.1), C. velutina (MF954026.1), C. adamantium (MF954025.1), and C. pubescens (AM234077.1). The sequence from Psidium guajava (AY487283.1) was used as an outgroup.
The evolutionary history was inferred using the UPGMA method (Schlee et al. 1975). The optimal tree is shown (Figure 2). The evolutionary distances were computed using the Maximum Composite Likelihood method (Tamura et al. 2004) and are in the units of the number of base substitutions per site. The proportion of sites where at least one unambiguous base is present in at least 1 sequence for each descendent clade is shown next to each internal node in the tree. This analysis involved 18 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair (pairwise deletion option). There was a total of 826 positions in the final dataset. Evolutionary analyses were conducted in MEGA X (Kumar et al. 2018).
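The clustering step of UPGMA can be sketched in a few lines of Python. This is a minimal illustration of the general algorithm only, not the MEGA X implementation: it takes a ready-made distance matrix and does not compute Maximum Composite Likelihood distances or branch support. The function name, the four labels, and the toy distances are hypothetical.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Cluster taxa by UPGMA; matrix[i][j] is the distance between labels[i] and labels[j].

    Returns the tree as nested tuples plus a list of (merged subtree, node height).
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    dist = {frozenset((i, j)): matrix[i][j]
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        d_ij = dist[frozenset((i, j))]
        # Distance from the new cluster to each remaining one: size-weighted mean.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        merges.append(((tree_i, tree_j), d_ij / 2.0))  # node sits at half the distance
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# Toy distances between four hypothetical sequences (not the ITS data).
toy_labels = ["A", "B", "C", "D"]
toy_matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
toy_tree, toy_merges = upgma(toy_labels, toy_matrix)
```

Because UPGMA assumes a constant rate of change, every leaf ends at the same distance from the root; in the toy case, A and B join first, then C, then D.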
The clustering analysis formed two groups with strong branch support (89%), plus the outgroup species P. guajava (Figure 2). The first group contained two subgroups, supported by a robust bootstrap value (95%). The first subgroup contained both C. phaea accessions, clustered with robust branch support (88%) together with the samples of C. hirsuta and C. laurifolia, both species with a wide distribution in the Atlantic Forest, similar to C. phaea (Landrum 1986). The second subgroup (branch support of 86%) included C. ilhoensis and C. guazumifolia, species currently distributed in the Brazilian dry area (‘caatinga’). This subgroup appears to contain species from the coastal region of Brazil. The second group showed robust branch support (82%) and contained an unidentified Campomanesia species, C. xanthocarpa, C. adamantium, C. velutina, and C. pubescens, all Brazilian species that occur in the Brazilian savanna, known as ‘cerrado’. The results suggest that C. phaea probably evolved in the Atlantic Forest. The ITS phylogeny did not clearly separate the species, since C. phaea and C. hirsuta overlapped in the grouping.
Choice of accessions for core collection
Ex-situ germplasm collections represent an important resource for plant species conservation and breeding (Odong et al. 2013). For species with recalcitrant seeds, germplasm collections must be kept as living individuals, which can consume a large share of limited funding, since collections can occupy large planting areas. Seeds from some Myrtaceae species appear to be recalcitrant, not tolerating desiccation (Maluf and Pisciottano-Ereio 2005). For C. phaea seeds, there is limited information about long-term storage. In one study, non-desiccated C. phaea seeds maintained a 100% germination rate for 180 days when stored at 8°C in plastic bags (Maluf and Pisciottano-Ereio 2005). Viability of non-desiccated seeds was observed for up to 240 days at 8°C, while natural seed drying reduced seedling vigor. Thus, the seeds of C. phaea can be considered at least partially orthodox, because they can be stored at low temperatures for about 240 days (Maluf and Pisciottano-Ereio 2005). However, for long-term conservation of heterozygous outcrossing genotypes, an active germplasm collection must be preserved to keep the genetic identity of selected cultivars.
The maintenance and management of large germplasm collections are expensive and inefficient because of possible genotype redundancies and/or duplications, and because of the difficulty of performing a detailed evaluation of all the conserved individuals (Grenier et al. 2000). To improve the efficiency of collection maintenance and to establish a representative source of the genetic diversity, the concept of core collections was established (Frankel 1984). A core collection represents the maximum diversity of all the studied populations with the smallest number of individuals and the least redundancy (Egbadzor et al. 2014). To define the core collection, we used the maximum length sub-tree method implemented in the Darwin package (Bernard et al. 2018). This method searches for a subgroup of genotypes that minimizes the redundancy among them while limiting the loss of diversity (Campoy et al. 2016).
The genotypes were chosen based on the genetic distance reflected in a diversity tree. The genotypes with higher genetic distances are those that carry more uncommon characters, i.e., they are genetically distinct (Billot et al. 2013). The most distinct genotypes were identified using the removed edge value, the maximum length edge, and the sphericity index. Campoy et al. (2016) used a reference value of 0.008 as the limit for the removed edge value to select individuals with minimal redundancy. Here, we were able to select 18 accessions to form the core collection (Table 5). This core collection represents 12% of the analyzed individuals and comprises five individuals from Mogi das Cruzes (27, 42, 56, 75, and 77), four from Juquitiba (101, 106, 109, and 112), four from Paraibuna (8, 12, 14, and 18), three from Salesópolis (141, 145, and 146), and two from Ribeirão Pires (119 and 120).
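The composition of the core collection can be tallied with a short Python sketch. The accession numbers and the total of 145 accessions are taken from the text; the variable names are illustrative, and the sketch only checks the arithmetic, not the selection procedure itself.

```python
# Accession numbers per regional collection, as listed in the text.
core_collection = {
    "Mogi das Cruzes": [27, 42, 56, 75, 77],
    "Juquitiba": [101, 106, 109, 112],
    "Paraibuna": [8, 12, 14, 18],
    "Salesópolis": [141, 145, 146],
    "Ribeirão Pires": [119, 120],
}
total_accessions = 145  # accessions genotyped in this study

core_size = sum(len(ids) for ids in core_collection.values())
core_share = round(100 * core_size / total_accessions)  # percentage, rounded

for region, ids in core_collection.items():
    print(f"{region}: {len(ids)} accessions")
print(f"Core collection: {core_size} of {total_accessions} ({core_share}%)")
```

All five regional collections contribute at least two accessions to the core set.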