The main purpose of this study was to compile, describe and test a reclassification of grocery product groups (LCFC) that could serve as a well-grounded basis for future examination of associations between grocery purchase data, dietary quality, sustainability and health outcomes. The LCFC hierarchy contains four levels, of which the broadest is called Class 1 and includes food groups such as ‘Vegetables’. The division into the three more detailed sub-classes was based on the grocery product group’s type, quality (e.g. fibre or fat content), purpose of use, processing, carbon footprint and national food culture. As expected, the nutrient profiles (defined by NRFI) showed considerable within-group variation in the nutrient quality of the food groups at Class 1. The variation declined in the sub-classes, which indicates that the more detailed sub-classes are more suitable, and indeed a prerequisite, for examining associations between grocery purchases and dietary quality.
Only a few studies have carefully described their justification and process for classifying food purchase data for use in studying diet quality and health-related outcomes (9, 10, 22–26). The Food Price Database created by the US Center for Nutrition Policy and Promotion for the National Food Plans (22) is one of the oldest and most extensive classification systems for grocery purchase data. The Food Plans provide representative healthful market baskets at three different cost levels in the United States, and the database merges information about food consumption from the National Health and Nutrition Examination Survey (NHANES) with national data on food prices from the Nielsen Homescan™ Panels. The Panels contain the prices paid for food items by 16,821 households that reflect the US population. The food classification groups 4152 individual foods into 58 food categories and five broad food groups based on similarity of nutrient content, food costs, number of cup or ounce equivalents in MyPyramid (27) and use in meals.
The Quarterly Food-at-Home Price Database (QFAHPD) was developed after the Food Price Database to fill the gap in available food price data and to support research on the economic determinants of diet quality and health outcomes (23). The database contains quarterly market-level prices for food at home. Its principles of food classification resemble those of the Food Price Database. Foods in the Nielsen Homescan™ Panels have been categorised into seven main food groups identified in the dietary guidelines for Americans: grains, vegetables, fruits, milk, meat and beans, oils and discretionary calories. The seven main food groups were further classified into 26 separate categories based on the 2005 Dietary Guidelines (28) and other factors relevant for food shopping and preparation (premiums paid for preparation and other processing). The finest level, comprising 52 categories, defines the processing level of the food (e.g. whether the food is sold fresh, canned or frozen).
The principles and solutions of the Food Price Database (22) and QFAHPD (23) are closest to our classification. They reflect healthy or unhealthy eating through categorisation based on dietary guidelines. The reports openly discuss many of the challenges of the classifications. For example, the QFAHPD report pointed out the difficulty of classifying foods that are composed of several ingredients, a difficulty faced by many researchers trying to classify food purchases. Classifying mixed foods was also one of our main challenges, and our solution was the same as in QFAHPD: creating a ‘Miscellaneous’ class. However, we tried to minimise the number of grocery product groups in this class. This may have resulted in greater variation in the overall nutrient quality of the food groups in each class compared to the QFAHPD. This cannot be ascertained, as the nutritional quality of the QFAHPD classes has not been examined.
Despite the extensive classifications done in the Food Price Database and QFAHPD, we decided to create a new classification for our purposes. The main reasons were differences in food culture and our research purposes. Although Finland, like the US, is a high-income economy, there are still differences in food culture and grocery food supply (e.g. the types of bread and oil used). Moreover, the primary purpose of our LoCard grocery purchase data is to study interactions between food healthiness, environmental impact and price within the context of socio-demographic background and intentional (e.g. new taxation of foods) and sporadic (e.g. COVID-19, Ukraine crisis) transformation; hence, the classification needed to support this research context.
Other classifications that have been well described are the NOVA classification (10) and the Convenience Food Classification Scheme (9). However, these classifications differ considerably from our principles and would not have suited our purpose of linking purchases primarily with health.
We chose to use NRFI to examine how well we succeeded in classifying the data based on dietary quality (11). Other options for nutrient profiling would have been available, such as the Grocery Purchase Quality Index-2016 (GPQI-2016) (12), which has been shown to be associated with the Healthy Eating Index at both the food group and total diet levels. There is also a scoring system developed for the QFAHPD to measure the overall quality of grocery purchases, which has been tested against the Healthy Eating Index (29). However, NRFI is well known and widely used in nutrient profiling and allows the examination of all food groups that could be connected to the food composition database. We do not claim that NRFI is any better than other profiling systems, and whichever system is chosen will always affect the results. However, based on the profiling results, our classification was logical, meaning that food classes that are assumed to have relatively better nutritional quality, such as fruits and vegetables, got higher index values than foods considered to have low nutritional quality, such as sweets or chocolate. Further, our results imply that, on the more detailed levels, food classes became more homogeneous in their nutrient profiles.
Of the 11 nutrients that we included in NRFI, intakes of fibre, PUFA and vitamin D have been identified as relatively low at the population level in Finland (30). In contrast, high intakes of SFA and salt have been public health concerns for decades among the Finnish population. Intakes of iron and folate have been low among Finnish women of fertile age. Including these nutrients in the NRFI was therefore justified. An interesting question is whether protein should be in the equation. Although protein per se is needed for health, most of the protein in the Finnish diet comes from animal sources (31). Hence, the environmental impact of protein intake is not optimal (32).
Strengths and limitations
Our starting point was automatically collected customer loyalty card data, which were provided to us at the grocery product group level. Thus, the most obvious limitation affecting our classification method was that we could not classify at the most detailed level (brand level). This led to some compromises and required more assumptions about the grocery items within the grocery product groups. For example, frozen pizzas were classified under cereals because we had no information about whether they were meat or vegetarian pizzas.
In our study, we used 11 nutrients to profile all food groups, but one could also have looked at the food groups at Class 1 and created a separate index for each food group that included only the relevant nutrients. This might have reflected the nutritional quality of the food groups better. For example, vegetables are generally perceived as very healthy, but they are not the main sources of iron, vitamin D, fibre or protein. Thus, judging vegetables by how much of these nutrients they contain is not relevant. The class ‘Vegetables’ had a relatively low NRFI, which does not reflect the true nutritional quality of this group.
Our evaluation of the nutrient quality of the classifications has possible weaknesses that need to be mentioned. First, although the NRFI is a well-established method for profiling groceries based on their nutrient content, it has methodological weaknesses. The nutrients included in the index are selected subjectively by the researchers using it. Moreover, the ranking of foods by NRFI varies depending on which nutrients have been selected for the index.
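The dependence of the ranking on nutrient selection can be illustrated with a minimal sketch. It assumes a generic NRF-type structure (sum of encouraged nutrients minus sum of nutrients to limit, each expressed as a percentage of a reference value per 100 kcal); it is not the exact NRFI specification used in this study, and the food names, nutrient values and function names are hypothetical and invented purely for illustration.

```python
# Hypothetical nutrient densities (% of a daily reference value per 100 kcal).
# These values are invented for illustration and are not study data.
FOODS = {
    "rye bread":  {"fibre": 20.0, "protein": 6.0,  "iron": 8.0,
                   "vitamin_d": 0.0,  "sfa": 1.0, "salt": 9.0},
    "fatty fish": {"fibre": 0.0,  "protein": 20.0, "iron": 3.0,
                   "vitamin_d": 60.0, "sfa": 8.0, "salt": 2.0},
    "lentils":    {"fibre": 25.0, "protein": 15.0, "iron": 15.0,
                   "vitamin_d": 0.0,  "sfa": 0.0, "salt": 0.0},
}


def nrf_style_score(profile, encouraged, limited):
    """Generic NRF-type score: encouraged nutrients minus nutrients to limit."""
    positive = sum(profile[n] for n in encouraged)
    negative = sum(profile[n] for n in limited)
    return positive - negative


def rank_foods(foods, encouraged, limited):
    """Return food names ordered from highest to lowest score."""
    return sorted(
        foods,
        key=lambda name: nrf_style_score(foods[name], encouraged, limited),
        reverse=True,
    )


limited = ["sfa", "salt"]
with_vitamin_d = ["fibre", "protein", "iron", "vitamin_d"]
without_vitamin_d = ["fibre", "protein", "iron"]

print(rank_foods(FOODS, with_vitamin_d, limited))
print(rank_foods(FOODS, without_vitamin_d, limited))
```

With these invented values, dropping a single nutrient (vitamin D) moves the fish product from the top of the ranking to the bottom, which is the kind of sensitivity described above.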
Second, since we had the grocery purchase data at the grocery product group level, we had to select one food from the Finnish Food Composition Database to represent the nutrient content of each grocery product group. Again, because we did not have comprehensive knowledge of which types of grocery items were in some of the grocery product groups, the selected food from the composition database may not always have been the optimal reference food. A future improvement to this approach could be to select 3–5 of the most purchased foods that represent the grocery product group and that are also among the most consumed ones in the Finnish population, and to assign the average nutrient values of those foods to represent the nutrient content of the grocery product group.
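The suggested improvement amounts to averaging the nutrient values of a few reference foods. A minimal sketch is given below, assuming that each reference food is available as a mapping from nutrient name to content per 100 g; the function name, the product group and all values are hypothetical and are not taken from the Finnish Food Composition Database.

```python
from statistics import mean


def average_profile(reference_foods):
    """Average each nutrient over the reference foods chosen for one group.

    reference_foods: list of dicts mapping nutrient name -> amount per 100 g.
    All foods are required to report the same set of nutrients.
    """
    if not reference_foods:
        raise ValueError("at least one reference food is required")
    nutrients = set(reference_foods[0])
    for food in reference_foods:
        if set(food) != nutrients:
            raise ValueError("reference foods must share the same nutrients")
    return {n: mean(food[n] for food in reference_foods) for n in sorted(nutrients)}


# Hypothetical reference foods for a 'yoghurt' product group (values invented).
yoghurt_references = [
    {"protein_g": 3.0, "sfa_g": 2.0, "fibre_g": 0.0},
    {"protein_g": 4.0, "sfa_g": 0.5, "fibre_g": 0.0},
    {"protein_g": 5.0, "sfa_g": 0.5, "fibre_g": 0.0},
]

group_profile = average_profile(yoghurt_references)
print(group_profile)
```

A simple unweighted mean is used here because the text refers only to average nutrient values; how the reference foods would be chosen in practice is left open.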
Last, NRFI does not have any upper or lower limits, meaning that the underlying assumption of the index is ‘the more nutrients the better’. In practice, this is not true: nutrient intake that exceeds the recommended value does not bring additional health benefits. This becomes especially relevant when nutrient profiling is examined together with environmental impacts. For example, in our results, plant-based protein products received a relatively low NRFI value even though the use of these products may be advisable from an environmental perspective (32).
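The effect of the missing upper limit can be shown with a small sketch that compares an uncapped sum of encouraged nutrients with a capped one. Capping each nutrient at 100% of its reference value is an assumption made here for illustration, not a feature of the NRFI as used in this study; the function name and the nutrient values are likewise hypothetical.

```python
def encouraged_sum(percent_of_reference, cap=None):
    """Sum encouraged nutrients given as % of reference value.

    If cap is given, each nutrient contributes at most `cap` percent,
    so amounts above the reference value earn no extra credit.
    """
    total = 0.0
    for value in percent_of_reference.values():
        total += value if cap is None else min(value, cap)
    return total


# Hypothetical fortified product (values invented for illustration).
fortified_drink = {"vitamin_d": 250.0, "folate": 40.0, "iron": 10.0}

uncapped = encouraged_sum(fortified_drink)           # no upper limit
capped = encouraged_sum(fortified_drink, cap=100.0)  # assumed 100% cap

print(uncapped, capped)
```

In this invented example, a single nutrient far above its reference value accounts for most of the uncapped sum, whereas the capped variant limits its contribution.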