Fruit Evaluation of 103 Acerola (Malpighia emarginata D.C.) Phenotypes from the Subtropical Region of Brazil

Daniela Farinelli (daniela.farinelli@unipg.it), Universita degli Studi di Perugia, Dipartimento di Scienze Agrarie Alimentari e Ambientali, https://orcid.org/0000-0002-7791-6987
Portarena Silvia, Istituto di Ricerca sugli Ecosistemi Terrestri, Consiglio Nazionale delle Ricerche
Daniel Fernandes Silva, UNIOESTE: Universidade Estadual do Oeste do Parana
Traini Chiara, University of Perugia: Universita degli Studi di Perugia
Silva Giordana Menegazzo, UNIOESTE: Universidade Estadual do Oeste do Parana
Silva Edvan Costa, UNIOESTE: Universidade Estadual do Oeste do Parana
Veiga Joice Ferreira, UNIOESTE: Universidade Estadual do Oeste do Parana
Pollegioni Paola, Istituto di Ricerca sugli Ecosistemi Terrestri, Consiglio Nazionale delle Ricerche
Villa Fabiola, UNIOESTE: Universidade Estadual do Oeste do Parana


Introduction
Acerola (Malpighia emarginata D.C.) is a tropical species native to the Caribbean Islands and adapted to the Northeastern region of Brazil (Ritzinger and Ritzinger 2011). The fruit is considered a super-fruit because of its high ascorbic acid (vitamin C) content, which can reach up to 5% in the flesh (Mezadri et al. 2008; Prakash and Baskaran 2018), about 80 times the amount in oranges and lemons (Rekha et al. 2012; de Ancos et al. 2016).
In addition to vitamin C, acerola contains many other functional substances such as phenols, anthocyanins and carotenoids that make it a healthy food (de Assis et al. 2008; Xu et al. 2020). The fruit may be consumed fresh or used in the processing of several food products, especially in the pharmaceutical industry for vitamin C and phenol extraction, and as a foodstuff supplement (Segtowick et al. 2013; Reis et al. 2017; Belwal et al. 2018; Prakash and Baskaran 2018). The amounts of vitamin C and phenols in acerola are generally the result of a complex combination of multiple factors such as cultivar, environment, and conditions of cultivation and storage (Gomes et al. 2000; Semensato and Pereira 2000; Chitarra and Chitarra 2005; Hanamura et al. 2008; Maciel et al. 2010; Mariano-Nasser et al. 2017; Ribeiro and de Freitas 2020).
To date, Brazil is the world's largest producer of acerola (de Assis et al. 2008), with more than 7000 hectares under cultivation. The most important producing States are Bahia (1466 ha), Paraná (919 ha), Rio Grande do Norte (800 ha), Rondônia (723 ha), Pernambuco (604 ha), Minas Gerais (466 ha), São Paulo (423 ha), Paraíba (400 ha), Ceará (320 ha) and Pará (300 ha) (Calgaro and Brandão 2012). (Alvares et al. 2013). Annual rainfall varies from 1600 to 1800 mm, well distributed throughout the year, and summers are hot. The annual average temperature in the region is between 22 °C and 23 °C (Caviglione et al. 2000). Acerola plantations were established using a few cultivars selected from other geo-climatic areas, such as São Paulo or other foreign States. Propagation from seedlings resulted in highly heterogeneous orchards in terms of fruit quality and yield (Ritzinger et al. 2018). This led to disadvantages such as the segregation of plant and fruit characteristics, making certain farming practices difficult, in particular harvesting (Gomes et al. 2000; Moura et al. 2007; Oliveira et al. 2009).
Although the fruit quality parameters of acerola from different States of Brazil have been reported (Table 1), only one study has reported the biochemical and morphological characterization of 14 clones grown in commercial orchards of Northern Paraná and usable for a breeding program (Carpentieri-Pípolo et al. 2000). Corrêa et al. (2017) No studies have reported the biochemical and morphological characterization of acerola phenotypes for the region of Western Paraná, the area where the commercial production of acerola is expanding in Brazil.
In this context, it is of great importance to assess the variability in fruit quality among different individuals of acerola to aid pre-breeding programs of non-domesticated species (Moura et al. 2013;Almeida Júnior et al. 2014;Silva et al. 2017). Hence, the objectives of this study were to: (1) characterize 103 acerola phenotypes using morphological and biochemical parameters; (2) select the best phenotypes suitable for the pharmaceutical industry or as a foodstuff supplement that can be used in future breeding programs for the subtropical Western Paraná region.

Sampling
Completely ripe acerola fruits were harvested from 103 trees (phenotypes) located in different gardens of the city of Marechal Cândido Rondon, in Western Paraná State, Brazil (latitude 24°33'23.26"S, longitude 54°3'28.33"W, 420 m above sea level), from October to December 2019. Meteorological data were recorded during the sampling period. Rainfall ranged from 25 mm in October to 248 mm in December (Fig. 2). The average temperature varied from 23.3 °C to 25.9 °C; the maximum temperature ranged from 35 °C in December to 39.9 °C in October (Fig. 2).
The fruits were harvested randomly by hand at the full ripening stage, from all parts of the canopy, thus obtaining a representative sample from each tree. Approximately one kilogram of fruit was harvested from each of the 103 trees, each one called a phenotype and numbered from 1 to 103.
The freshly harvested fruits were immediately taken to the Food Technology Laboratory at Western Paraná State University (Unioeste) where they were analyzed physically and chemically.

Fruit size, weight, color parameters and pulp yield
For each phenotype, four repetitions, composed of ten fruits, were used to determine fruit parameters. A digital caliper was used to measure the length between the point of peduncle insertion and the apex of the fruit (longitudinal diameter) and the largest length measured perpendicularly to the peduncle (transverse diameter).
The fresh weight of the fruits was determined on a precision analytical balance and the results are expressed in grams (g). The volume of the fruits was estimated by the water displacement method: the fruits were immersed in a known volume of water in a graduated cylinder and the difference between the final and initial volume of water was recorded. For these two analyses, four repetitions of five fruits each were used. The fruit volume is expressed in cm³.
The same fruits were analyzed for color parameters using a colorimeter (Konica Minolta, model Sensy CR 400, Osaka, Japan). Color is expressed using the rectangular coordinate system L* a* b*, according to the Commission Internationale de l'Éclairage (CIE 1986). Finally, the fruits were depulped and the seeds from each repetition were weighed to determine the pulp yield, expressed as a percentage for each phenotype. The pulp was stored in small plastic bags in a freezer at -10 °C until chemical analysis. The data are expressed as means ± standard error (Table S.1 and Table S.2).
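The pulp yield and the mean ± standard error reported for each phenotype can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text does not give the formula, so the sketch assumes that pulp weight equals fresh fruit weight minus seed weight, and the function names and the weights are hypothetical.

```python
import math
import statistics


def pulp_yield_percent(fruit_weight_g: float, seed_weight_g: float) -> float:
    """Pulp yield (%), ASSUMING pulp weight = fruit weight - seed weight."""
    if fruit_weight_g <= 0:
        raise ValueError("fruit weight must be positive")
    if not 0 <= seed_weight_g <= fruit_weight_g:
        raise ValueError("seed weight must lie between 0 and the fruit weight")
    return 100.0 * (fruit_weight_g - seed_weight_g) / fruit_weight_g


def mean_and_standard_error(values):
    """Return (mean, standard error), using the sample standard deviation."""
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


# Hypothetical repetitions: (total fruit weight, total seed weight) in grams.
repetitions = [(46.0, 11.0), (44.5, 10.2), (47.8, 12.1), (45.1, 10.9)]
yields = [pulp_yield_percent(fruit, seed) for fruit, seed in repetitions]
mean_yield, se_yield = mean_and_standard_error(yields)
print(f"pulp yield: {mean_yield:.2f} ± {se_yield:.2f} %")
```

The same `mean_and_standard_error` helper would apply to any of the per-phenotype parameters summarized in Tables S.1 and S.2.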

Biochemical parameters: soluble solids, pH, titratable acidity, vitamin C and polyphenols
Biochemical parameters were determined on three replicates of five fruits from each phenotype. Soluble solids content (SSC) was determined using a digital refractometer (MA871 Milwaukee, WI, USA) with automatic temperature compensation as described by AOAC (2005). The results are expressed as °Brix (concentration of sucrose w/w). The ratio between soluble solids and titratable acidity (SSC/TA) was also calculated.
Titratable acidity (TA) was determined with phenolphthalein (Ali 2008), according to the methodology proposed by the Adolfo Lutz Institute (Lutz 1985), and the results are expressed in grams of malic acid per 100 grams of pulp. The vitamin C (ascorbic acid) content was determined by titration (modified Tillmans method) based on the reduction of 2,6-dichloro-phenol-indophenol by ascorbic acid (Benassi and Antunes 1988). One gram of pulp was diluted in 100 mL of 0.5% oxalic acid and homogenized. Then, 5 mL of this solution was diluted to 50 mL with distilled water and titrated. The results are expressed as mg 100 g⁻¹ FW (fresh weight).
The content of total phenolic compounds was determined according to the conventional Folin-Ciocalteu spectrophotometric procedure developed by Georgé et al. (2005). Extracts were added to 1 mL of Folin-Ciocalteu reagent (1 N), 2 mL of 20% Na₂CO₃ and 2 mL of distilled water, and absorbance was monitored at 700 nm. Results were calculated from a standard curve of 98% gallic acid (0-50 µg) and are expressed as gallic acid equivalents (GAE) mg 100 g⁻¹ FW.
The analyses of pH, titratable acidity, vitamin C and total phenolics were performed in triplicate and data are expressed as means ± standard error (Table S.2).
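The standard-curve step of the Folin-Ciocalteu assay described above can be illustrated with a short sketch. It assumes a linear absorbance response over the 0-50 µg gallic acid range and fits it by ordinary least squares. The absorbance readings, the amount of pulp represented in the measured aliquot and all function names are hypothetical, chosen only to make the arithmetic visible.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gallic_acid_equivalents_ug(absorbance, slope, intercept):
    """Invert the standard curve: absorbance -> µg gallic acid equivalents."""
    return (absorbance - intercept) / slope


def mg_gae_per_100_g(gae_ug, pulp_in_aliquot_g):
    """Convert µg GAE in the aliquot to mg GAE per 100 g fresh weight.

    pulp_in_aliquot_g is a hypothetical parameter: the grams of fresh pulp
    represented by the measured aliquot (not stated in the text).
    """
    return (gae_ug / 1000.0) / pulp_in_aliquot_g * 100.0


# Hypothetical standard curve: µg gallic acid vs. absorbance at 700 nm.
standard_ug = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
standard_abs = [0.02, 0.12, 0.22, 0.32, 0.42, 0.52]
slope, intercept = fit_line(standard_ug, standard_abs)

sample_ug = gallic_acid_equivalents_ug(0.27, slope, intercept)
print(f"{sample_ug:.1f} µg GAE in the aliquot")
print(f"{mg_gae_per_100_g(sample_ug, 0.002):.0f} mg GAE per 100 g FW")
```

Readings that fall outside the range of the standards should be diluted and re-measured rather than extrapolated from the fitted line.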

Statistical data analysis
To summarize the data for the acerola phenotypes, box plot analysis was applied using SigmaPlot® 8.0, which provides an effective summary of a potentially large amount of data. In the box plot method, the input data set is split into quartiles. A box plot shows the minimum value, the lower quartile (25th percentile), the median, the upper quartile (75th percentile), and the maximum value. The box extends from the lower quartile to the upper quartile, so the length of the box is the difference between the two. Inside the box, a horizontal line marks the median of the dataset. Outside the box, two more horizontal lines are drawn: the upper whisker, beyond the upper quartile, and the lower whisker, beyond the lower quartile. The end points of the whiskers are typically defined as the most extreme data points (Streit and Gehlenborg 2014).
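The quantities that a box plot displays can be computed directly. The sketch below is not the SigmaPlot procedure: it uses Python's `statistics.quantiles` with the inclusive method, which is only one of several quartile conventions, and the fruit weights are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical mean fruit weights (g) for nine phenotypes.
fruit_weight_g = [1.5, 3.2, 3.9, 4.4, 4.6, 4.9, 5.3, 6.1, 8.9]
summary = five_number_summary(fruit_weight_g)
box_length = summary["q3"] - summary["q1"]  # interquartile range
print(summary, box_length)
```

Different software packages interpolate quartiles differently, so small discrepancies from published box plots are to be expected.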
The Hierarchical Cluster Analysis (HCA) was performed using the chemical and physical parameters of the phenotypes as input variables.
Since the variables had different scales and units, PCA was calculated on autoscaled variables. Autoscaling consists of transforming each variable by subtracting its average value and then dividing by its standard deviation. This transformation translates the data to the origin of the reference system, since each variable then has an average value of zero, and makes the variability of each variable equally important (Wise and Gallagher 1996). The primary goal of HCA is to display the data so as to emphasize their natural clusters and patterns in a two-dimensional space. The results, qualitative in nature, are usually presented as a dendrogram, making it possible to visualize the clusters and correlations among samples or variables. In HCA, the Euclidean distances between samples or variables are calculated and transformed into a similarity matrix whose elements are similarity indexes ranging from 0 to 1; a smaller distance means a larger index and, therefore, greater similarity (Granato et al. 2010). For hierarchical cluster analysis, the dataset was treated with Ward's method of linkage, with squared Euclidean distance as a measure of similarity. The quality of the dendrogram obtained after HCA was evaluated by the cophenetic correlation coefficient, a widely used statistical criterion for selecting the hierarchical clustering method when there is no prior knowledge of the pattern of clustering (Matta et al. 2015).
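Autoscaling followed by Ward's agglomeration can be sketched in a few lines. The code below is a naive illustration rather than the software used in the study: it assumes the sample standard deviation (n - 1) for scaling, merges at each step the pair of clusters with the smallest increase in within-cluster sum of squares, and stops at a requested number of clusters instead of building a full dendrogram. The data rows and function names are hypothetical.

```python
import statistics


def autoscale(rows):
    """Center each column to mean 0 and scale to unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def ward_clusters(points, n_clusters):
    """Naive Ward agglomeration; returns lists of row indices per cluster."""
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * squared_distance

    while len(clusters) > n_clusters:
        _, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical rows: [fruit weight (g), transverse diameter (mm), L*].
rows = [
    [6.8, 24.1, 42.0],
    [7.2, 25.0, 45.5],
    [6.5, 23.6, 40.3],
    [3.1, 17.2, 26.0],
    [2.9, 16.8, 24.8],
    [3.4, 17.9, 27.1],
]
groups = ward_clusters(autoscale(rows), n_clusters=2)
print(groups)
```

Scaling before clustering matters here because diameters in millimetres and L* values would otherwise dominate the distances simply through their larger numeric ranges.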
Principal component analysis (PCA) was also performed on the same input variables used for HCA to explore the variability among samples and to detect the most discriminating coordinates (principal components, PCs). PCA summarizes the information contained in the data matrix in fewer independent PCs, obtained as linear combinations of the original variables, lying in the direction of maximum variance (Portarena et al. 2019). The data were statistically evaluated using R.
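The idea of a principal component as the direction of maximum variance can be illustrated without any statistics package. The sketch below autoscales the data, builds the covariance matrix and extracts only the first component by power iteration. This is a simplification chosen for brevity, not the routine used in the study, and the data and function names are hypothetical.

```python
import math
import statistics


def autoscale(rows):
    """Center each column to mean 0 and scale to unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance(centered_rows):
    """Sample covariance matrix of rows whose columns are already centered."""
    n, p = len(centered_rows), len(centered_rows[0])
    return [[sum(r[i] * r[j] for r in centered_rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def first_pc(cov, iterations=1000):
    """Leading eigenvector (loadings) and eigenvalue by power iteration."""
    p = len(cov)
    v = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("power iteration collapsed; try another start vector")
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue


# Hypothetical rows: [fruit weight (g), transverse diameter (mm), pH].
rows = [
    [3.1, 17.0, 3.6],
    [4.5, 19.8, 3.5],
    [5.2, 21.0, 3.3],
    [6.8, 23.5, 3.2],
    [2.4, 15.9, 3.8],
]
scaled = autoscale(rows)
cov = covariance(scaled)
loadings, eigenvalue = first_pc(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
scores = [sum(x * l for x, l in zip(row, loadings)) for row in scaled]
print(f"PC1 explains {100 * explained:.1f}% of the total variance")
```

On autoscaled data the total variance equals the number of variables, so the explained fraction is the eigenvalue divided by that count. Further components would require deflation or a full eigendecomposition.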

Morphological parameters
The acerola fruits from the 103 phenotypes presented a large variability in terms of fresh weight, fruit volume, pulp yield, size and color parameters (Fig. 3, Table S.1).
Fruit fresh weight varied from 1.53 g to 8.85 g, with a mean value of 4.6 g and a median of 4.55 g (Fig. 3, Table S.1).
In 66% of the phenotypes, the mean fruit mass was over 4 g, the limit required by the industry (Semensato and Pereira 2000). Fruit volume varied from 2.2 cm³ to 9.40 cm³, with a mean value of 4.85 cm³ and a median value of 4.70 cm³ (Fig. 3, Table S.1). The range was within that reported by Magalhães et al. (2018), even though maximum values were higher (Table S.3).
Pulp yield ranged from 57.2% to 90.1%, with a mean value of 76.05% and a median of 76.14% (Fig. 3 and Table S.1). These values are within the range reported by Brunini et al. (2004), Magalhães et al. (2018) and Maciel et al. (2010), but with higher maximum values than those recorded by Carpentieri-Pípolo et al. (2000) in Northern Paraná State and in other areas of Brazil (Cavalcante et al. 2007) (Table S.3); they were even higher than the range recorded by Gomes et al. (2000). Table S.1 shows that 6.8% of the phenotypes had a pulp yield of around 84%, which is a good amount for the pulp processing industry (Magalhães et al. 2018). According to these authors, this is an essential quality characteristic of acerola destined for processing because it directly affects the cost/benefit ratio.
The longitudinal diameter of the fruit ranged from 13.0 to 22.7 mm, with a mean value of 17.5 mm and a median value of 17.8 mm. The transverse diameter ranged from 14.3 to 27.8 mm, with a mean value of 20.4 mm and a median value of 20.5 mm (Fig. 3 and Table S.1). These ranges were slightly greater than those reported by Carpentieri-Pípolo et al. (2000), who reported a mean longitudinal diameter between 9.40 and 18.60 mm and a transverse diameter between 8.53 and 17.40 mm, but they were within the ranges reported by others (Table S.3). Only 2% of the phenotypes (numbers 67 and 75) had transverse diameters of less than 15 mm, the value which, according to Semensato and Pereira (2000), is recommended for industrial use (Table S.1). The analysis of longitudinal and transverse diameters showed that the fruit is wider than long. According to Gonzaga Neto et al. (1999), the larger the fruit, the easier and quicker the harvesting, because less labor is involved and production costs are consequently reduced; larger fruit is also more attractive for consumption (Magalhães et al. 2018).
The colors of the acerola phenotypes varied greatly. The brightness index L ranged from 19.4 to 55.9, with a mean value of 31.2 and a median value of 30.4 (Fig. 3, Table S.1). Similar high L values were observed by Brunini et al. (2004) and Mariano-Nasser et al. (2017), while the minimum values were lower than those reported in the literature (Table S.3). Looking at the box plot, it can be seen that the L parameter of half of the phenotypes was over the median value, and the other half below, which is in agreement with data reported by Godoy et al. (2008) (Fig. 3).
Parameter color a, corresponding to red, ranged from 11.3 to 46.6, with a mean value of 28.3 and a median value of 26.6 (Fig. 3 and Table S.1), which was a wider range with respect to those reported by Figueiredo
Parameter color b, corresponding to yellow, showed an even wider range, from 3.0 to 48.9, with a mean value of 14.9 and a median value of 11.9; both the mean and the minimum values were within the ranges reported in the literature (Table S.3), but the maximum values were much higher (Fig. 3 and Table S.1).
The comparison of the two parameters color a and color b showed that yellow is predominant with respect to red, in agreement with Godoy et al., except for two phenotypes (numbers 15 and 102) in which red is dominant, as indicated by the round symbols above the box (Fig. 3, Table S.1). With the exception of phenotypes 15 and 102, all the phenotypes were within the specifications required by the pharmaceutical industry, which prefers orange-colored acerola (Semensato and Pereira 2000) and discards purple and yellow ones (Lima et al. 2014); in fact, in agreement with López (1963), fruits are harvested when they begin to turn a pinkish-orange or light-red color.

Chemical characters of acerola
The acerola fruits had a high variability in terms of vitamin C content, soluble solids, titratable acidity, pH, soluble solids/titratable acidity ratio and total polyphenols (Fig. 4, Table S.2).
The fruit soluble solids content (SSC) varied from 6.4 °Brix to 12.1 °Brix, with a mean value of 8.7 and a median of 8.5, in agreement with Musser et al. (2004), who reported a range of 5-12 °Brix in relation to different edaphoclimatic conditions in Brazil (Fig. 4). The variations in the present study were similar to those described by Moura et al. (2007) (Table 1). These last authors reported the highest SSC values, with means between 8.0 and 15.8 °Brix in acerola accessions from the Northeastern region of Paraná. High values of SSC are important for consumption as fresh or processed fruit. López (1963) showed that SSC increased as the fruit ripened or as the season progressed, and that it can be used as a ripening index. According to Alves et al. (1995), normal maturation occurs when fruits are harvested with at least 6.5% soluble solids. All the phenotypes had values over this threshold except for number 16, which was slightly lower (6.38 °Brix) (Table S.2).
The pH varied from 2.72 to 4.36, with a mean value of 3.43 and a median value of 3.44 (Fig. 4; Table S.2). This range of variation is wider than that found in the literature (Table 1).
It is noteworthy that, except for number 14, all phenotypes are in accordance with the specifications for fruit pulp quality determined by MAPA (Ministério da Agricultura Pecuária e Abastecimento), which requires at least 5.5 °Brix of soluble solids and a pH of 2.8 (Brasil 2000) (Table S.2). In fact, the pH is a very important parameter influencing the quality and safety of the fruit because it gives an indication of its storage potential (which is indicated by the development of acidity), and its assessment is very important in the industrial processing of fruit pulp.
Titratable acidity (TA) varied from 0.4% to 3.8%, with a mean value of 3.07% and a median value of 3.20% (Fig. 4; Table S.2). The lower values are similar to those reported in the literature, while the upper values are slightly higher than those described by Magalhães et al. (2018), who reported values between 0.86 and 3.13% (Table 1).
According to Nascimento et al. (2018), high TA is important for industrial fruit processing, as it reduces the need to add artificial acidic substances, although this is not a limiting factor in genotype selection where other fruit quality parameters are satisfactory. On the other hand, low titratable acidity is relevant for consumption as fresh fruit (Seymour and Tucker 1993; Godoy et al. 2008).
The ratio between soluble solids and titratable acidity ranged considerably, from 1.86 to 18.62, with a mean of 2.95 and a median of 2.70 (Fig. 4 and Table S.2). This wide range of variation was close to those described by Cavalcante et al. (2007) (5.44-19.39) and Souza et al. (2014) (5.98-15.42). The minimum values were slightly lower than those reported in the literature, but higher than those reported by Semensato and Pereira (2000) (Table 1). The ratio between SSC and titratable acidity indicates the degree of balance between the sugar and organic acid content of the fruit, which is directly related to fruit flavor (Chitarra and Chitarra 2005). Therefore, it is an important variable in the selection of table varieties, even for acerola, because the higher the ratio, the sweeter the fruit (Estevam et al. 2018; Magalhães et al. 2018).
The vitamin C content of the 103 phenotypes varied from 425 to 2625 mg/100 g pulp, with a mean value of 1240 mg/100 g pulp and a median value of 1260 mg/100 g pulp (Fig. 4 and Table S.2). In fact, 75.7% of the phenotypes showed a mean content over the minimum (800 mg/100 g) recommended for breeding programs by Brazilian law (Brasil 2000; Godoy et al. 2008) (Fig. 4 and Table S.2). Moreover, 52.4% of the phenotypes reached a mean vitamin C content above the threshold of 1200 mg/100 g pulp recommended by the Instituto Brasileiro de Frutas (1995) for industrial use, and 68.9% exceeded the limit of 1000 mg/100 g required for export to Europe and Japan (Table S.2) (Maciel et al. 2010). Phenotypes 14, 29, 4, 66, 60, 37 and 99 had the highest vitamin C content, ranging from 2025 to 2625 mg/100 g pulp (Fig. 4, Table S.2).
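The threshold shares quoted above are simple proportions of the 103 phenotypes. A minimal sketch of the calculation follows, with hypothetical vitamin C values and a hypothetical helper name; it treats "over" a threshold as strictly greater than it, which is an assumption about how borderline values were counted.

```python
def percent_above(values, threshold):
    """Percentage of values strictly greater than the threshold."""
    if not values:
        raise ValueError("no values supplied")
    return 100.0 * sum(1 for v in values if v > threshold) / len(values)


# Hypothetical mean vitamin C contents (mg/100 g pulp) for five phenotypes.
vitamin_c = [425, 900, 1100, 1300, 2625]

for threshold in (800, 1000, 1200):
    share = percent_above(vitamin_c, threshold)
    print(f"> {threshold} mg/100 g: {share:.1f}% of phenotypes")
```

As a consistency check on the reported figures, 75.7%, 68.9% and 52.4% of 103 phenotypes correspond to 78, 71 and 54 phenotypes respectively, assuming all 103 were included in each count.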
Our results gave the widest range of vitamin C content with respect to those reported in the literature, except for those reported by Figueiredo Neto et al. (2014) in Petrolina (Pernambuco State) in two commercial cultivars (Okinawa and Sertaneja), with a maximum value of 3597 mg/100 g pulp (Table 1). However, our results are close to those reported by Carpentieri-Pípolo et al. (2000), who analyzed fourteen genotypes of acerola in Northern Paraná State, obtaining values between 471 and 2404 mg/100 g pulp in ripe fruits (Table 1); by Oliveira et al. (2009), who analyzed 48 accessions in Itaocara (Rio de Janeiro State), obtaining values between 1116 and 2575 mg/100 g pulp in ripe fruits; and by Nasser and Zonta (2014) and Mariano-Nasser et al. (2017), who obtained values between 825 and 2580 mg/100 g pulp in Adamantina (São Paulo State) (Table 1).
The polyphenol content in the acerola fruits was also quite variable, ranging from 84 mg/kg to 3196 mg/kg, with a mean value of 1397 mg/kg and a median value of 1367 mg/kg (Fig. 4 and Table S.2). Phenotype number 40 gave an exceptionally high value, equal to 3196 mg/kg. However, nearly 50% of the phenotypes had a polyphenol content ranging from 1173 to 1704 mg/kg (Fig. 4 and Table S.2). Moreover, as vitamin C and polyphenol contents are of great interest to the pharmaceutical industry, these phenotypes have potential for use as clonal varieties. Although polyphenols are important in this fruit, there have been very few studies on them in acerola (Table 1). In fact, they are an excellent source of antioxidant activity and contribute to fruit color and flavor quality, producing astringency and bitterness (Vendramini and Trugo 2000). Souza et al. (2014) found values from 1562 to 2631 mg of gallic acid 100 g⁻¹ of pulp; Mariano-Nasser et al. (2017) reported values from 914 to 2428 mg of gallic acid 100 g⁻¹ of pulp; and França et al. (2020) reported values from 1016 to 1801 mg of gallic acid 100 g⁻¹ of pulp. However, the range of variation in these three studies was narrower than that in the current study.
We postulate that the physical and biochemical variations among the 103 phenotypes are all due to intrinsic genetic differences (Nakazone et al. 1966; Gomes et al. 2000; Hanamura et al. 2008; Maciel et al. 2010; Mariano-Nasser et al. 2017; Estevam et al. 2018) rather than to environmental, ripening and storage conditions, since all samples were collected in the same area of the city of Marechal Cândido Rondon, at the same maturation stage and under the same storage conditions (see Sect. 2.1).

Correlations among variables
To examine the relationships among the fruit quality parameters, Pearson's correlation analysis was performed. The results are shown in Fig. 5.
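Pearson's coefficient and the usual t statistic for testing whether it differs from zero can be written out explicitly. This is a generic sketch, not the authors' R code; the paired values and function names are hypothetical, and the significance check assumes that all 103 phenotypes entered the test.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired measurements: pulp pH and color parameter b.
ph = [3.1, 3.3, 3.4, 3.6, 3.8, 4.0]
color_b = [30.2, 24.8, 21.5, 15.0, 12.3, 8.9]
r = pearson_r(ph, color_b)
print(f"r = {r:.2f}, t = {t_statistic(r, len(ph)):.2f}")
```

For a coefficient of 0.24 with n = 103, `t_statistic(0.24, 103)` is about 2.48, which exceeds the two-sided 5% critical value of roughly 1.98 for 101 degrees of freedom, so a weak correlation of that size can still be statistically significant.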
Color traits showed the highest number of correlations (Fig. 5). In particular, L, a and b were positively correlated with fruit size, but negatively correlated with percentage of pulp, titratable acidity and pH (r = -0.55 for b color). Color is one of the most important factors in consumers' decisions and hence affects the price of the fruit. pH influences both the color and the stability of anthocyanins, and enzymatic coloration is favored by lower pH values (Wahyuningsih et al. 2017).
There were no significant correlations between vitamin C and fruit size (Fig. 5). pH and SSC were negatively correlated with longitudinal diameter (r = -0.45), transverse diameter (r = -0.46), fruit weight and fruit volume (r = -0.43 and -0.44, respectively), so the larger the fruit, the lower the pH and SSC, probably due to a dilution effect (Baldicchi et al. 2015; Famiani et al. 2020).
Fruit size, in terms of longitudinal and transverse diameter, was negatively correlated with pH and SSC. Furthermore, vitamin C and polyphenols were weakly but significantly (p < 0.05) correlated (r = 0.24). This correlation, also reported by França et al. (2020), can be explained as the fruit response to progressive oxidative stress. In fact, with ripening there is a reduction in oxygen scavenging enzyme activities and an increase in membrane lipid peroxidation, indicating that acerola ripening is characterized by progressive oxidative stress (Prakash and Baskaran 2018). Moreover, enzymes such as ascorbate oxidase or peroxidase may accelerate ascorbic acid oxidation during ripening and therefore lead to its reduction (França et al. 2020).
In a recent review, Prakash and Baskaran (2018) also reported that, with ripening, vitamin C decreases and phenols degrade; the decrease in total vitamin C and total soluble phenol contents results in lower total antioxidant activity. Even if the reduction in vitamin C has a much more significant influence on acerola antioxidant capacity than phenolic compounds (Xu et al. 2020), the high capacity of ripe acerola fruits to sequester free radicals is due to their high content of vitamin C, in agreement with França et al. (2020). Although vitamin C content had a wide range of variation in the 103 phenotypes, there was no correlation with other fruit quality parameters (Fig. 5).

Classification of acerola phenotypes
The heat map with row and column dendrograms, based on hierarchical clustering (Euclidean distances and Ward's method), provided a framework for exploring how the parameters may explain phenotypical differences among samples. It revealed how the samples and variables cluster together and provided insights into potential sample biases.
The results revealed two distinct sample and variable clusters (Fig. 6).
As observed from the color gradient (red = low intensity; yellow = high intensity), Ward's linkage revealed that fruit size and color parameters were the main sample clustering factors. Biochemical parameters presented a mixture of high and low values in both clusters.
Cluster 1 encompassed 37 phenotypes. It was mostly characterized by relatively high intensity of color parameters and fruit size. Moreover, two sub-groups could be traced. They differed based on color intensity and exhibited a pattern of divergence with weight and size parameters. Cluster 2 contained the largest number of samples (66). It was mainly characterized by low intensity color parameters that mostly overlapped with small fruit size and weight. It included a great variability in the biochemical compounds with many samples having a high content of vitamin C, polyphenols and SSC. Samples from Cluster 2 were the most similar in terms of weight, volume and color.
Biochemical parameters presented no clear clustering and were mixed inside each subgroup. However, the lowest values of pH and titratable acidity were correlated with the sub-cluster with high intensity color parameters, confirming the negative correlation between pH and color parameters (Fig. 5). The soluble solids/titratable acidity ratio displayed relatively stable values in all the clusters, except for sample 1, which had the highest ratio (Table S.2).
The cophenetic correlation coefficient calculated for the two main clusters was 0.7101, indicating the adequacy of the clustering method (Rohlf and Fisher 1968).
PCA was performed on the complete dataset to explore the variability among samples by combining all physical and biochemical parameters and to quantify the contribution of each parameter in determining the two main clusters obtained by HCA. The scatter plot in Fig. 7 shows the geometrical distances among the 103 phenotypes within the two-dimensional plane defined by PC1 and PC2. The two sample groups, distinguished by color, correspond to the clusters defined by HCA.
Almost 55% of the total variance is explained by the first two PCs. The main separation between the two clusters was obtained through PC1. The descriptors contributing the most to PC1 (35.4% of the total variance) were fruit size, color parameters and pH, confirming the results of the heat map cluster analysis (Fig. 6). In particular, samples from cluster 1 (red symbols) showed higher values of fruit size and color parameters.
Samples from cluster 2 (blue symbols) basically presented higher values of soluble solids, pH, polyphenols and pulp yield (Fig. 7).
The descriptors contributing the most to PC2 (18.8% of the total variance) were titratable acidity and the soluble solids/titratable acidity ratio, the first being positively correlated with vitamin C (Fig. 5).
Wide variability was also observed within each cluster. In particular, according to PCA values, phenotypes from the upper-left position of the graph (Fig. 7) showed the highest content of soluble solids. Phenotypes with negative values of both PC1 and PC2 were characterized by the combination of high polyphenol and vitamin C contents. Cruz et al. (2004) suggested performing crosses between parent accessions with the greatest possible divergence and good performance to increase heterosis and the chance of generating superior individuals; in fact, crosses between divergent phenotypes allow the heterotic effect to be exploited and segregating populations with greater variability to be obtained (Oliveira et al. 2009; Bianchi et al. 2017). From the industrial point of view, acerola fruits from cluster 2 were the most interesting for their biochemical constituents, even if they had low average fruit size values (Fig. 7).

Selection of phenotypes
This characterization will make it possible to propagate clonal phenotypes with desirable commercial characteristics, and will aid in the selection of useful traits desirable for the development of new, industrial-driven cultivars.
The high performance of phenotypes 99 and 66, belonging to cluster 2, in terms of vitamin C and pulp yield could be decisive for increasing the vitamin C content in the progeny. On the other hand, the high pulp yield, fruit size and vitamin C content of phenotype 37, belonging to cluster 1, indicated that these phenotypes from the two classes of samples could be useful for crosses aimed at improving both the quality and yield of acerola (Fig. 6, Fig. 7). The suggested combination in breeding programs will produce promising progenies from which superior lines could originate. In any case, phenotype 37, due to its superior chemical and physical characteristics, could already be propagated and compared with the most important acerola cultivars (Fig. 5, Fig. 6). Four other phenotypes could be selected for their high content of vitamin C: 4, 14, 29 (belonging to cluster 1) and 60 (cluster 2). However, their fruit weight and pulp yield were outside the required parameters (Table S.2).
Therefore, together with the analysis of the performance of the phenotypes, studies on genetic divergence are of great importance for breeding programs because they assess the variability among phenotypes and provide information for the identification of mother plants that can be used in crosses with a higher probability of obtaining superior progeny in the segregating generations (Magdales et al. 2018). However, it is not possible to capture the combining ability among parents based solely on their individual performance. The breeder must obtain crosses and evaluate the progenies, or use techniques that allow a specific genotype combination to be predicted before the cross is performed (Mihaljevic et al. 2005).
In perspective, the use of DNA markers to estimate genetic distances among the selected individuals will allow acerola phenotypes to be discriminated, and the divergent phenotypes should be useful in genetic breeding (Oliveira et al. 2009; Reis et al. 2017).

Conclusions
The 103 phenotypes of acerola were classified as belonging to two main clusters. Phenotypes from cluster 1 showed higher values of fruit size and color parameters, while phenotypes from cluster 2 basically had higher values of soluble solids, pH, polyphenols, vitamin C and pulp yield.
The results of the study make it possible to select the best phenotypes suitable for the pharmaceutical industry or as a foodstuff supplement that can be used in future breeding programs for the Western Paraná region. Specifically, the high performance of phenotypes 99 and 66, belonging to cluster 2, in terms of vitamin C and pulp yield, and of phenotype 37, belonging to cluster 1, with high pulp yield, fruit size and vitamin C content, could be decisive for increasing vitamin C content in the progeny. Hence, these phenotypes from the two classes of samples could be useful for crosses aimed at improving both the quality and yield of acerola. Four other phenotypes could be selected for their high content of vitamin C: 4, 14, 29 and 60.
Since genetic improvement depends on the correct choice of the best individuals to be used as parents, the selected phenotypes will also be characterized genetically using DNA markers as the next step in the acerola breeding program at the University of West Paraná.

Scatter plot of the scores of acerola samples on the two-dimensional plane defined by the first three PCs as calculated from the complete dataset by PCA. Samples are grouped according to the HCA results (the confidence level of 0.95 defines the ellipses in such a manner that approximately 95% of the new observations from that group fall inside the ellipse).

Supplementary Files
This is a list of supplementary files associated with this preprint.