Data Source
We used data from the official Brazilian National Dietary Survey, conducted on a sub-sample of respondents in the Household Budget Survey (NDS-HBS) by the Brazilian Institute of Geography and Statistics (IBGE) between July 2017 and July 2018.
The 2017-2018 IBGE dietary survey used two-stage cluster sampling: census tracts were drawn in the first stage, and households were drawn within each tract in the second. The final sample included 57,920 Brazilian households. The NDS-HBS assessed food intake using 24-hour food recalls on two non-consecutive days in a random subset of 34.7% of households, totaling 46,164 individuals aged 10 years or over. More information about the sampling process can be found in official IBGE publications.13
Food groups and variables
Food groups
All subjects reported all foods and beverages consumed on the day before each of the two interviews, including information on ingredients, preparation, and quantities. We categorized the reported foods into three groups: UFP, edible mushrooms, and wild meat. We excluded edible algae from our analysis because the database contained too few observations (n=1).
UFP
Since there is no consensus list of UFP, we relied on expert consensus to classify the food plants reported by our sample. We first selected all the plants consumed by the survey participants (219 plants) and sent the list to six researchers with recognized scientific production on UFP from different Brazilian biomes. The invited researchers classified the 219 plants according to the following criteria: (1) limited use, in either geographical or cultural terms; (2) potential to contribute to the food and nutritional security of human populations; and (3) potential to contribute to the sustainable use of biodiversity. We considered as UFP those plants that more than 60% of the invited experts judged to meet criterion 1 and, in addition, either criterion 2 or criterion 3.
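The consensus rule can be illustrated with a minimal sketch. It assumes one reading of the rule, namely that each expert's answers are combined first (criterion 1 and either criterion 2 or 3) and the share of agreeing experts is then compared with 60%. The function name, the vote layout, and the example votes are hypothetical.

```python
from typing import Dict, List

# One expert's vote: maps criterion number (1, 2, 3) to True/False.
Vote = Dict[int, bool]


def is_ufp(votes: List[Vote], threshold: float = 0.60) -> bool:
    """Return True if more than `threshold` of the experts judged the plant
    to meet criterion 1 and at least one of criteria 2 or 3.

    Assumption: criteria are combined per expert before computing the share.
    """
    if not votes:
        return False
    agreeing = sum(1 for v in votes if v[1] and (v[2] or v[3]))
    return agreeing / len(votes) > threshold


# Hypothetical votes from six experts for one plant.
yes = {1: True, 2: True, 3: False}
no = {1: False, 2: True, 3: True}
four_of_six = [yes, yes, yes, yes, no, no]
three_of_six = [yes, yes, yes, no, no, no]
```

With six experts, the strict 60% threshold means that at least four must agree: four of six (66.7%) qualifies, whereas three of six (50%) does not.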
Mushroom
As there was no report of edible mushrooms by species in the dietary survey, we included the following reported foods: uncooked mushrooms, preserved mushrooms, and fungi risotto meals.
Wild meat
We defined wild meat, or bushmeat, as meat derived from wild animals, especially non-aquatic vertebrates, harvested for subsistence or trade, excluding fish6. Accordingly, the bushmeat category included wild animals used as a food resource, as well as preparations that included meat from these animals as an ingredient.
We collected the vernacular name, regions of occurrence, and frequency of consumption for UFP and wild meat. We used this information to classify the UFP and the bushmeat by genus and species (if possible). Using vernacular names and the location of consumption provided by the dietary survey and the location of the species occurrence provided by the Flora do Brasil database (https://floradobrasil.jbrj.gov.br/), we produced taxonomic clues that are proxies of scientific names of the species consumed.
We considered the mean consumption across the two recall days for all food groups. We only included recipes that mentioned foods in one of the three groups, and we used the Table of Reference Measures for Food Consumed in Brazil14 to estimate the amounts consumed in grams or milliliters of each food or beverage.
Socioeconomic and demographic variables
We used the variables sex (male or female), age (years), state of Brazil (name), degree of urbanization of the household (urban or rural), education (years of schooling), per capita income (USD), ethnicity (white, black, Asian, multiethnic, or indigenous), and household food insecurity status. The IBGE survey measured food insecurity with the reduced eight-item version of the Brazilian Food Insecurity Scale (EBIA), the official Brazilian tool to determine food insecurity levels in the population. We classified degrees of food insecurity based on the final scale score, with the following cutoff points: food security (0), mild food insecurity (1-3), moderate food insecurity (4-5), and severe food insecurity (6-8).15
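As an illustration, the cutoff points above map an EBIA score to a category as in the following sketch. The function name is hypothetical, and the sketch assumes an integer score between 0 and 8.

```python
def classify_ebia(score: int) -> str:
    """Map a reduced (eight-item) EBIA score to a food insecurity category
    using the cutoffs 0, 1-3, 4-5, and 6-8."""
    if not 0 <= score <= 8:
        raise ValueError("EBIA score must be between 0 and 8")
    if score == 0:
        return "food security"
    if score <= 3:
        return "mild food insecurity"
    if score <= 5:
        return "moderate food insecurity"
    return "severe food insecurity"
```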
Data analysis
Descriptive analysis
We performed descriptive analyses of the food groups and of the socioeconomic and demographic variables, using relative frequencies, means, and 95% confidence intervals. We applied the sample weights so that the estimates represent the study population according to the sample design of the survey. We conducted these analyses in the R language through the RStudio interface, using the survey package.
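The role of the sample weights in a point estimate can be shown with a minimal sketch. It is written in Python for illustration only and is not the R code used in the study. The function names and example values are hypothetical. The sketch does not compute design-based confidence intervals, which also require the cluster and stratum information handled by survey software.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean: each respondent counts in proportion to the sample weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(x * w for x, w in zip(values, weights)) / total


def weighted_proportion(indicator: Sequence[bool], weights: Sequence[float]) -> float:
    """Weighted relative frequency of a yes/no characteristic (e.g. consumer or not)."""
    return weighted_mean([1.0 if flag else 0.0 for flag in indicator], weights)


# Hypothetical respondents: consumed the food (True/False) and their sample weights.
consumed = [True, False, True]
weights = [2.0, 1.0, 1.0]
share = weighted_proportion(consumed, weights)  # 3.0 / 4.0
```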
Identifying socioeconomic predictors of unconventional food plants consumption
In addition to analyzing the overall consumption of biodiverse foods, we specifically analyzed the data on UFP to identify the socioeconomic predictors of their consumption. Unfortunately, we were unable to perform a similar analysis on wild meat and edible mushrooms due to the lack of observations related to the consumption of these food resources (consumption frequency <1%).
To choose the classifier best able to model the phenomenon, we tested several machine learning architectures. Before testing, we standardized the independent variables, a common requirement for many machine learning estimators. We calculated the standard score as z = (x - u) / s, where x is the value of an independent variable, u is the mean of the training samples, and s is the standard deviation of the training samples, so that each transformed variable has a mean of 0 and a standard deviation of 1. The entire dataset was then used in a stratified K-fold cross-validation strategy with K=10; in other words, the data were divided into ten groups of similar size, stratified by the dependent variable, and the predicted values for each group were obtained from a model trained on the remaining nine groups. We selected the classifier with the highest Matthews Correlation Coefficient (MCC). The MCC measures the agreement between observed and predicted classes on a scale from -1 to 1 and is closely related to the chi-square statistic for a 2x2 contingency table.
After choosing the classifier, we trained a new instance of the model on the entire dataset for SHAP (SHapley Additive exPlanations) value analysis. SHAP is a method based on cooperative game theory that increases the transparency and interpretability of machine learning models. In summary, the procedure (1) chooses the classifier that best approximates the phenomenon and then (2) evaluates the importance of each independent variable for the model output. We chose the Logistic regression + CatBoost architecture, which combines a logistic regression model with non-linear analysis, because it had the highest MCC (Supplementary Table 1). All analyses were developed in the Python programming language using the Jupyter Notebook interface.
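The standardization, stratified fold assignment, and MCC described above can be sketched in plain Python. This is an illustrative sketch, not the code used in the study. The function names are hypothetical, the population standard deviation is assumed for s, and the fold assignment is a simple round-robin without shuffling.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Sequence


def standardize(train: Sequence[float], values: Sequence[float]) -> List[float]:
    """Standard score z = (x - u) / s, with u and s estimated on the training samples.
    Assumption: s is the population standard deviation."""
    u, s = mean(train), pstdev(train)
    if s == 0:
        raise ValueError("training values have zero variance")
    return [(x - u) / s for x in values]


def stratified_folds(labels: Sequence[Hashable], k: int = 10) -> List[List[int]]:
    """Split sample indices into k folds, spreading each class evenly across folds."""
    by_class: Dict[Hashable, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from the cells of a 2x2 confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


z = standardize([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
folds = stratified_folds([0] * 20 + [1] * 10, k=10)
```

Standard scores are not bounded: in the small example, the largest value maps to a z-score above 1. For a 2x2 table with n observations, the absolute value of the MCC equals the square root of the chi-square statistic divided by n.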
To verify our results, we fitted a multiple zero-inflated Poisson (ZIP) regression model, which accommodates the skewed distribution of UFP consumption and its large proportion of zeros. This is a mixture model that estimates the distribution of the outcome by combining two components: a logistic regression model for the zero portion and a Poisson regression model for the count portion (Atkins, 2007). The results of the ZIP model are presented as (log)β regression coefficients, their standard errors, and their p values, all of which refer to the count portion of the model.
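The mixture structure of the ZIP distribution can be made concrete with a short sketch of its probability mass function. The function name and parameter values are hypothetical. In a regression setting, the probability of a structural zero (pi) and the Poisson mean (lam) would each depend on the covariates, through a logit link and a log link respectively; here they are fixed numbers.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Probability of observing count k under a zero-inflated Poisson model.

    pi  : probability of a structural zero (zero portion of the mixture)
    lam : mean of the Poisson component (count portion of the mixture)
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero arises either as a structural zero or as a Poisson zero.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Hypothetical parameters: 90% structural zeros, Poisson mean of 2 servings.
p_zero = zip_pmf(0, pi=0.9, lam=2.0)
total = sum(zip_pmf(k, pi=0.9, lam=2.0) for k in range(60))
```

With pi set to 0, the function reduces to the ordinary Poisson distribution, which is why the zero portion is needed to account for the many non-consumers.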
To use the ZIP model, we transformed the continuous variable of UFP consumption (g/day) into a count variable (number of UFP servings adjusted for energy intake in kilocalories). We defined one serving of UFP as an intake of 30 g/1000 kcal, based on the average intake of the population. We included the same covariates in the ZIP model as in the SHAP model: area, ethnicity, age, food insecurity, per capita income, sex, and educational level. We also accounted for the complex sample design so that the estimates represent the entire population.
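A minimal sketch of this transformation follows. The text does not state how fractional servings were converted to whole counts, so rounding down is an assumption. The function and variable names are hypothetical.

```python
import math

SERVING_G_PER_1000_KCAL = 30.0  # serving size defined in the text


def ufp_servings(ufp_g_per_day: float, energy_kcal_per_day: float) -> int:
    """Convert UFP intake (g/day) into a count of energy-adjusted servings.

    Assumption: fractional servings are rounded down to the nearest integer.
    """
    if energy_kcal_per_day <= 0:
        raise ValueError("energy intake must be positive")
    density = ufp_g_per_day / (energy_kcal_per_day / 1000.0)  # g per 1000 kcal
    return math.floor(density / SERVING_G_PER_1000_KCAL)


# Hypothetical respondent: 120 g of UFP in a 2000 kcal day gives 60 g/1000 kcal.
servings = ufp_servings(120.0, 2000.0)
```

Under this assumption, a respondent with 50 g of UFP in a 2000 kcal day (25 g/1000 kcal) is counted as having zero servings, which adds to the zeros handled by the zero portion of the model.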