3.3. Multivariate statistics
Principal component analysis (PCA) was performed on the matrices obtained from LC-MS in the positive ionization mode, with the variable “region of production” supplied to the software to assist in the interpretation of the data. In the matrix, the leaf extract samples are in rows (X variables) and the chromatographic band areas are in columns (Y variables). PC1 accounts for 55.5% and PC2 for 10.1% of the variance; together, the two PCs explain 65.6% of the total variance in the PCA model (Fig. 4).
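As an illustration of how the percentage explained by each PC is obtained, the following is a minimal sketch of PCA by singular value decomposition on a samples-by-variables matrix. It assumes NumPy and mean-centering only; the scaling actually applied to the LC-MS data is not stated in this section, and the matrix `X` below is random placeholder data, not the coffee dataset.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA by SVD on a samples x variables matrix (mean-centered only)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each variable (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # fraction of variance per PC
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable contributions
    return scores, loadings, explained[:n_components]

# Placeholder data: 12 hypothetical samples x 5 hypothetical band areas.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
scores, loadings, explained = pca_scores(X)
cumulative = explained.sum()                     # e.g. PC1 + PC2 together
```

The cumulative value is the quantity reported in the text as the share explained by the two PCs together.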
As shown in Fig. 4, PC1 separated the samples by Coffea species: the C. arabica samples lie on the left side of PC1, while the C. canephora samples lie on the right side. Furthermore, PC2 separated the C. canephora samples into the two varieties: samples of the conilon variety (light green triangles) appear in the first quadrant, and samples of the robusta variety (orange squares) in the fourth quadrant. These results support the view that genetic factors have a great influence on the production of secondary metabolites in coffee, and that the genetic variability between C. canephora var. robusta and C. canephora var. conilon is higher than the genetic variability among C. arabica varieties (Souza et al., 2013; Ky et al., 2001).
According to Akpertey et al. (2022), the conilon and robusta varieties are considered two heterotic groups with distinct and complementary characteristics within the C. canephora species. Furthermore, using single nucleotide polymorphism molecular markers, Alkimim et al. (2018) showed a greater genetic distance between the conilon and robusta groups (C. canephora) than among varieties within the C. arabica species (Akpertey et al., 2022; Alkimim et al., 2018). In the case of C. canephora, allogamy and gametophytic self-incompatibility are responsible for the high heterogeneity and genetic variability of the species (Machado et al., 2022).
It is important to mention that one of the samples behaved as an outlier in the statistical analysis (i.e., its individuals fall outside the region of the Hotelling T2 ellipse). As expected, this was the commodity coffee, represented by magenta stars in Fig. 4. This coffee was not scored as highly as the others by the professional tasters (final score below 80 points) and, consequently, it was classified as a commodity coffee (a uniform product that is interchangeable with other regular coffees). Similar to other studies (see Maeztu et al., 2001; Bressanello et al., 2021), this result indicates that the sensory evaluation of professional tasters can be associated with the coffee’s metabolomic profile, since the commodity coffee does not present the grain quality and flavor (metabolites) that were responsible for the higher quality of the other samples.
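The outlier criterion can be sketched as follows for a two-component score plot. This is a minimal illustration, not the software's implementation: it assumes uncorrelated scores (as PCA scores are), a 95% limit, and one commonly used form of the critical value, T2crit = A(N² − 1)/(N(N − A)) · F(A, N − A). The score values are hypothetical.

```python
import math

def f_quantile_2(d2, alpha=0.05):
    """Upper-alpha critical value of F(2, d2); a closed form exists
    when the numerator degrees of freedom equal 2."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)

def hotelling_outliers(scores, alpha=0.05):
    """Return (T2 per sample, critical limit, indices outside the ellipse)
    for two-component scores."""
    n, a = len(scores), 2
    means = [sum(s[k] for s in scores) / n for k in range(a)]
    centered = [[s[k] - means[k] for k in range(a)] for s in scores]
    # Score variance per component (n - 1 denominator).
    variances = [sum(c[k] ** 2 for c in centered) / (n - 1) for k in range(a)]
    t2 = [sum(c[k] ** 2 / variances[k] for k in range(a)) for c in centered]
    t2_crit = a * (n * n - 1) / (n * (n - a)) * f_quantile_2(n - a, alpha)
    outside = [i for i, v in enumerate(t2) if v > t2_crit]
    return t2, t2_crit, outside

# Hypothetical scores: 29 samples on a unit circle plus one distant sample.
scores = [(math.cos(2 * math.pi * k / 29), math.sin(2 * math.pi * k / 29))
          for k in range(29)]
scores.append((8.0, 0.0))
t2, t2_crit, outside = hotelling_outliers(scores)
```

A sample whose T2 value exceeds the limit lies outside the ellipse drawn on the score plot and is flagged for inspection.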
A loading analysis of the PCA (Supplementary Material, Figure S1) was then carried out to check for possible markers in the groupings shown by PC1 and PC2. The analysis is performed by superimposing the quadrants of the score plot and the loading plot to correlate the clusters with the metabolites. The loading values on the axes indicate which substances are most relevant to the observed clusters. Based on the PC1 data, caffeine, DIMBOA-Glc, roemerine, and cajanin were determined as markers for the C. canephora samples, and toralactone, cnidilide, LysoPC(18:2(9Z,12Z)), lysophosphatidylcholine(16:0/0:0), and 2,3-dehydrosilybin as markers for the C. arabica samples.
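The superimposition of score and loading plots along PC1 can be sketched as a simple split by loading sign: variables with negative PC1 loadings point toward samples on the left of the score plot, and variables with positive loadings toward samples on the right. The feature names, loading values, and the cut-off of 0.1 below are hypothetical and serve only to illustrate the logic.

```python
def split_by_pc1_loading(loadings, threshold=0.0):
    """Split variables by the sign of their PC1 loading, ignoring
    variables whose absolute loading does not exceed the threshold."""
    left = sorted((name for name, v in loadings.items() if v < -threshold),
                  key=lambda name: loadings[name])        # most negative first
    right = sorted((name for name, v in loadings.items() if v > threshold),
                   key=lambda name: -loadings[name])      # most positive first
    return left, right

# Hypothetical PC1 loadings (not values from Figure S1).
pc1_loadings = {"feature_A": 0.42, "feature_B": -0.31,
                "feature_C": 0.05, "feature_D": -0.12}
left, right = split_by_pc1_loading(pc1_loadings, threshold=0.1)
```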
Next, a supervised analysis by OPLS-DA was carried out with the purpose of classifying the coffee samples and exploring a possible terroir effect by identifying which substances were correlated with the formation of clusters. As shown by Vezzulli et al. (2022), production practices and environmental conditions can influence the plant’s organoleptic properties. In the case of coffee, the terroir is the result of the unique combination of the Coffea species and variety planted, environmental and agricultural parameters, and the harvest and post-harvest methods employed (Lucini et al., 2020; Williams et al., 2022). The OPLS-DA using positive ionization mode on LC-MS is presented in Fig. 5.
The OPLS-DA model in Fig. 5 shows a very high goodness-of-fit (R2X = 0.929; R2Y = 0.841; Q2 = 0.720). R2Y and Q2 values higher than 0.5 and close to 1.0 indicate the high correlation (R2Y) and predictive power (Q2) of the model. The OPLS-DA models were validated by a significant CV-ANOVA p-value (p-value = 1.00 × 10−11 and p-value = 2.77 × 10−26, respectively, for the OPLS-DA model in Fig. 5), and the absence of overfitting was confirmed by permutation tests (200 permutations). Figure S2 (see the Supplementary Material) presents the permutation test for the OPLS-DA model using the LC-MS positive ionization mode.
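For reference, the two statistics can be sketched with their usual definitions: R2Y = 1 − RSS/TSS, computed from the fitted values, and Q2 = 1 − PRESS/TSS, computed from cross-validated predictions. The function and the small response vectors below are illustrative assumptions, not the study's data or the software's internal routine.

```python
def r2_and_q2(y, fitted, cv_predicted):
    """Return (R2Y, Q2) from observed responses, fitted values and
    cross-validated predictions."""
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)                    # total sum of squares
    rss = sum((v - f) ** 2 for v, f in zip(y, fitted))         # residuals of the fit
    press = sum((v - p) ** 2 for v, p in zip(y, cv_predicted)) # residuals under cross-validation
    return 1 - rss / tss, 1 - press / tss

# Hypothetical two-class dummy response (0/1) for four samples.
y = [0, 0, 1, 1]
fitted = [0.1, 0.0, 0.9, 1.0]
cv_predicted = [0.2, 0.1, 0.7, 0.8]
r2y, q2 = r2_and_q2(y, fitted, cv_predicted)
```

Because Q2 uses predictions for samples left out of the fit, it is never higher than what the fit alone would suggest, and a large gap between R2Y and Q2 is one sign of overfitting.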
Based on the OPLS-DA score plot illustrated in Fig. 5, it is possible to group individuals according to the metabolic fingerprints of their respective terroir. Once again, the commodity coffee sample is identified as an outlier. Unlike the C. arabica groups (left side of Fig. 5), which lie close to each other, the C. canephora groups (right side of Fig. 5) are located far from each other, occupying two different sections.
Coffee is a climate-sensitive perennial crop likely to be susceptible to changes in climate, soil, water, altitude, and human conditions (Pham et al., 2019). To identify the impact of terroir on both species, two separate analyses were conducted: one with only the C. arabica samples and the other with only the C. canephora samples.
The C. arabica samples proved to be chemically distinct among themselves. Figure 6 illustrates the results obtained from the OPLS-DA performed only with the C. arabica species.
The OPLS-DA score plot in Fig. 6 shows independent clusters of C. arabica coffee that could be due to terroir effects of the different Brazilian regions. Despite acknowledging that a complete discussion about all possible factors comprising the terroir of each region is virtually impossible, we examined some important aspects of it.
Starting with soil composition, coffee cultivation in Goiás, Minas Gerais and São Paulo lies on a latosol base, relatively high in iron and aluminum oxides (do Amaral et al., 2004; IAC, 2023). The C. arabica coffee samples from the Brazilian state of Goiás (blue hexagons) come from the agricultural area of the city of Cristalina, in the eastern portion of the state. This area lies in a highland above 1,000 m of altitude, in a biome known as “cerrado” or Brazilian savannah. The climate of this region varies with altitude, from the tropical Aw climate to the subtropical Cwa and Cwb climates according to the Köppen climate classification. Because of the severe droughts and high temperatures, much of the vegetation consists of tall bushes and small trees with twisted branches scattered among carpets of grasses (Almeida et al., 2005).
The C. arabica coffee production in Minas Gerais (orange squares in Fig. 6) is carried out at altitudes ranging from 700 to 1,100 m and in a variation of the “cerrado” climate, with temperatures averaging between 22 and 27°C. This coffee production region is known as “Cerrado Mineiro” (Cerrado Mineiro, 2023).
The samples from São Paulo come from the Alta Mogiana region (purple stars in Fig. 6), which lies above 800 m of altitude with milder weather and excellent thermal and hydric conditions for coffee flowering and growth (de Souza Rolim et al., 2020). According to the Köppen climate classification, the climate in Alta Mogiana varies between subtropical climates (Cwa and Cwb), with a fairly narrow annual temperature range, averaging between 18 and 24°C. The flora is mostly tropical forest, reminiscent of the Atlantic Forest (de Souza Rolim et al., 2020).
Alta Mogiana and the Cerrado Mineiro were registered as Brazilian geographical indications (GI). The GIs are collectively owned and act as a marketing tool in product branding and differentiation (important assets to fight price fluctuations) (Cassago et al., 2021; Cerrado Mineiro, 2023).
The Mexican C. arabica coffee samples originated in the southern state of Chiapas (dark green circles and magenta pentagon in Fig. 6). The coffees analyzed are cultivated in the rural areas of the cities of Santa Cruz and Jaltenango, which are only 150 km apart. However, the samples constitute two independent groups in the OPLS-DA score plot, meaning that their chemical composition differs. The landscape of Chiapas is mostly composed of mountains and highlands, which affect wind, temperature, and cloud conditions (Ochoa-Gaona & González-Espinosa). This means that even neighboring cities may present completely different edaphoclimatic features and, consequently, agronomic practices (Cassago et al., 2021).
Similarly, differences in the metabolomic profiles of the C. canephora coffee samples were observed in the results of the OPLS-DA performed using only C. canephora samples. The results are illustrated in Fig. 7.
Unlike the C. arabica samples, the C. canephora samples analyzed belong to two distinct varieties (robusta and conilon). Moreover, besides the aforementioned genetic effects, the two Brazilian states where these coffees were produced are located in regions with completely different environmental and human conditions, which indicates a possible terroir effect in the formation of metabolites (Cassago et al., 2021; Artêncio, Cassago, et al., 2022; Williams et al., 2022). Both regions are registered as geographical indications (Ministério da Agricultura e Pecuária, 2022).
Specifically, the C. canephora var. robusta coffee samples (orange squares in Fig. 7) from the state of Rondônia are cultivated in a transition area between two morphoclimatic zones, the Amazonian and the Cerrado, with tropical rainforest and savannah climates, respectively (Marques et al., 2020). In this area, coffee is cultivated in latosol, acrisol and nitisol soils, all found in tropical regions and differing in their contents, particularly clay (IUSS Working Group WRB, 2015). Coffee cultivation in Rondônia is situated in the North of Brazil, while coffee production in Espírito Santo is located in the Southeast, on the Brazilian Atlantic coast, in a morphoclimatic zone called “Mares de Morros”, where a subtropical highland variety of the oceanic climate and tropical flora prevail (Ab’Sáber, 2012). The predominant soil in Espírito Santo is a red or yellowish oxisol, which presents a high concentration of minerals such as iron and aluminum (de Cunha et al., 2016).
In addition to the natural differences mentioned, the cultivation practices applied to the coffee samples also vary among the states. The C. canephora var. robusta of Rondônia is cultivated around the natural vegetation of the western portion of the Amazonian Forest, in a sustainable agroforestry system. The C. canephora var. robusta samples used in our analysis were cultivated by indigenous producers of the Cinta-Larga, Tupari and Suruí ethnicities, whose cultivation practices focus on sustainability and involve a close relationship with the environment. The engagement of indigenous people is essential for social inclusion and the prosperity of local communities in Brazilian agriculture and the food market.
On the other hand, the production of C. canephora var. conilon coffee in Espírito Santo is carried out on medium and large family properties, with the use of mechanization and different irrigation systems to protect bean development and harvest from drought (Venancio et al., 2020). Coffee is the primary agricultural product of Espírito Santo and the main contributor to employment opportunities (Incaper, 2023).
These metabolomic differences impact the sensory perception of foods and beverages (Cassago et al., 2021; Artêncio et al., 2023). For instance, although professional tasters used the broader term “sweet” to describe all coffees from Minas Gerais (represented by the purple pentagons in Fig. 6), differences in the use of more specific flavor descriptors were observed. In this case, some samples were described as “caramelly” and “honey”, while others were described as “chocolatey”. In the case of C. canephora, the flavor nuances sensed in the C. canephora var. robusta coffee from Rondônia were described as “fine wine” and “buttery”, while the flavor notes of the C. canephora var. conilon coffee from Espírito Santo were defined using terms like “dry berries” and “bright”. It is also worth mentioning that the distance between the commodity coffee and the other groups, regardless of species and variety, is reflected in sensory quality.
In the OPLS-DA, the importance of each variable in explaining X and its correlation with Y is called VIP (Variable Importance in the Projection). Variables with VIP values greater than 1.0 are the most relevant for explaining the response (Y), as they provide greater reliability in determining the most important variables in separating groups (SIMCA-P Manual 13.0.3). The OPLS-DA carried out with the coffee sample extracts (Fig. 5) therefore allowed us to determine the variables correlated with the clusters, based on their VIP values.
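To make the VIP threshold concrete, the following is a minimal sketch of the classic PLS formulation, VIP_j = sqrt(p · Σ_a SSY_a (w_ja/‖w_a‖)² / Σ_a SSY_a), where p is the number of variables, w_a the weight vector of component a, and SSY_a the Y sum of squares explained by that component. This is an assumption for illustration: the exact variant computed by the software for OPLS-DA models may differ, and the weights and sums of squares below are hypothetical.

```python
import math

def vip_scores(weights, ssy):
    """Classic PLS VIP.
    weights[a][j]: weight of variable j in component a.
    ssy[a]: Y sum of squares explained by component a."""
    n_vars = len(weights[0])
    total_ssy = sum(ssy)
    vips = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)       # squared length of the weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq   # contribution of component a
        vips.append(math.sqrt(n_vars * acc / total_ssy))
    return vips

# Hypothetical model: two components, four variables.
weights = [[0.8, 0.5, 0.3, 0.1],
           [0.1, 0.2, 0.9, 0.4]]
ssy = [6.0, 2.0]
vips = vip_scores(weights, ssy)
mean_squared_vip = sum(v * v for v in vips) / len(vips)
```

With this definition the mean of the squared VIP values equals 1, which is why 1.0 is the customary cut-off: a variable above it contributes more than the average variable.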
Based on the OPLS-DA loading plot (Figure S3, Supplementary Material) and the VIP values, the variables (substances of the TIC) that most influenced the formation of groups were identified. The variables with a VIP value greater than 1.0 were determined as possible markers for the clusters. Table 2 details the variables that were identified and determined as markers for the different species of Coffea.
Table 2
Identified variables (compounds) responsible for the groupings in the OPLS-DA analysis of all coffee samples and their respective VIP values.
Compound | VIP
caffeine | 4.72583
chlorogenic acid | 3.33315
HMBOA-Glc | 2.28673
LysoPC(18:2(9Z,12Z)) | 2.1005
2,3-Dehydrosilybin | 1.84786
3-Indoleacetic acid | 1.73559
4-methoxybenzyl glucoside | 1.66728
3,3',4',5,5',8-Hexahydroxyflavone | 1.63826
toralactone | 1.53175
2-(hydroxymethyl)-4(3H)-quinazolinone | 1.51197
calystegine A6 | 1.29546
2-O-glucopyranosyl-4-hydroxy-7-methoxy-1,4-benzoxazin-3-one | 1.28939
cnidilide | 1.22662
lysophosphatidylcholine(16:0/0:0) | 1.19352
(R)-roemerine | 1.19333
cajanin | 1.04053
According to the results of the OPLS-DA analysis, caffeine is one of the main markers grouping the C. canephora samples. For C. canephora var. robusta, the most relevant markers are roemerine, cajanin, and 2-(hydroxymethyl)-4(3H)-quinazolinone. In the case of C. canephora var. conilon, calystegine A6, DIMBOA-Glc and HMBOA-Glc are the main markers in group formation. Finally, for the C. arabica samples, the compounds most responsible for the formation of groups are toralactone, cnidilide, LysoPC(18:2(9Z,12Z)), lysophosphatidylcholine(16:0/0:0), and 2,3-dehydrosilybin.
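The selection rule applied to build Table 2 (retain variables with VIP greater than 1.0, ranked by VIP) can be sketched as below. The three compound values are taken from Table 2; the entry named `hypothetical_feature` is an invented sub-threshold value added only to show that such variables are dropped.

```python
# VIP values: three entries from Table 2 plus one hypothetical
# sub-threshold variable (not a result of the study).
vip = {
    "caffeine": 4.72583,
    "chlorogenic acid": 3.33315,
    "cajanin": 1.04053,
    "hypothetical_feature": 0.85,
}

# Keep variables above the 1.0 cut-off, highest VIP first.
markers = sorted((name for name, v in vip.items() if v > 1.0),
                 key=lambda name: vip[name], reverse=True)
```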