5.1 Big Data in Food Analytics
Since 2013, BD has been a research focus, mostly in food safety, food security, and agriculture. Although its application to food safety is still in its early stages, the technology is already affecting the entire supply chain. The most common sources of food BD are regulatory data, food enterprise data (encompassing data generated at every link of the industrial chain from farms to restaurants), and media data (including food-related news, video, pictures, and audio). High-quality BDA can help the food business thrive, whereas low-quality DA can impair managers' forecasting of market demand and undermine social stability (Tao et al., 2021).
The recent introduction of advanced technologies has drawn the attention of practitioners and academics to the use of the enormous amounts of data being generated as a critical and efficient instrument for addressing modern food supply chain management concerns. With food and beverage firms increasingly focusing on acquiring, processing, and analyzing useful information derived from different sources within their respective food systems, data management has grown into a crucial tool in today's food supply chain. Within this paradigm, BD, which refers to massive data assets that are highly dispersed and heterogeneous in nature, has received a lot of attention in recent decades. The annual added value that could be realized using BD in food supply chains is projected to be between $120 and $150 billion (Margaritis, Madas & Vlachopoulou, 2022). BD is mostly used to monitor the factors behind global problems such as food security, safety, feasibility, and efficiency growth, which expands the scope of BD beyond farming to include the entire food supply chain. The interconnection of agricultural and supply chain devices through seamless wireless IoT produces data that evolves in real time. The data sets used to derive insights for agricultural and business process improvements are process-oriented, machine-oriented, and human-sourced (Sharma et al., 2021).
Turning to a different setting, the analysis, use, and monetization of data are increasingly crucial for the profitability and sustainability of multi-sided platforms (MSPs). Previous literature shows that, apart from revenue growth and cost optimization, DA can decrease customer acquisition costs, retain valuable customers, help predict customer behavior, improve customer experience, reduce fraud, provide real-time offers, and enhance DM. However, data on its own is not a source of competitive advantage, since all firms can collect large volumes of data from a variety of sources. Rather, data must be purposefully analyzed and acted upon. Nonetheless, firms face a host of issues (organizational, financial, physical, and human-resource related) in their attempts to build a competitive capability from the use of data, and they may easily fail to exploit the benefits of DA. Regarding the growth of data, along with the trends in digital business models and expected benefits from DBMS, a recent global survey of approximately 400 companies showed that 77% of companies do not have strategies to use BD effectively. Many companies are thus failing to benefit from integrating BD into their business models (Isabelle et al., 2020). The literature offers several reasons for such failures. According to Morabito (2015), BD emphasizes “utility from” data rather than “ownership of” data, which means that access to purposeful data is key. Further, raw data is useless unless it is purposefully analyzed. Jones (2019) notes that there is a difference between data that can be recorded and data that actually gets recorded, as well as between the analysis results that are extracted and those that are actually understood and exploited for business benefits.
Further, by integrating information on environmental parameters with pathogen development and/or hazard occurrence, BD can be utilized to forecast the presence of diseases or contaminants. For example, by monitoring agricultural conditions in the field, locations with a high frequency of aflatoxins can be identified before contaminated crops enter the food chain. Another study combined a range of models and databases, including weather data, to construct quantitative models that forecast contamination of wheat with the mycotoxin deoxynivalenol (DON) in northwestern Europe. Similarly, the presence of Listeria monocytogenes could be anticipated by determining the presence of pathogens on farm fields and integrating this with environmental and meteorological data (Marvin et al., 2017).
This brings us to the realization that BD is a largely underutilized instrument in the food safety and quality sectors. Researchers have stated that “big data is high volume, high velocity, and high variety information assets that require new processing forms to enhance DM, insight discovery, and process optimization”, which corresponds to the description of sensory data generated in agriculture (Sharma et al., 2021). Ciccullo et al. (2022) claimed that approximately 80% of the world’s data is unstructured. Information and insights can be derived from such BD by applying advanced analytics (i.e., BDA), which brings unprecedented opportunities to benefit from it. In fact, previous literature has found that firms using BDA are 36% more likely to surpass their competitors in revenue growth and operating efficiency, and can decrease their customer acquisition costs by 47% (Isabelle et al., 2020).
To conclude, cloud computing, IoT, BD, and blockchain can transform the discrete production lines of the food supply chain into data-driven, interconnected intelligent systems. Each action is automatically integrated by semantic active technology, increasing the efficiency of precision agriculture and company administration. Farmers can improve crop planting and animal growth cycles by using sensors and drones to collect data on weather, topography, and animal and crop behavior. Intelligent devices collect actionable data and make decisions that reduce equipment downtime. By 2026, smart agriculture is projected to save 4–6% of agricultural costs while increasing market value by 3%. The use of BD can help firms not only deal with issues in food production but also find more economical raw materials, lowering manufacturing costs. It also encourages the development of smart agriculture, which saves water, preserves soil, reduces carbon emissions, and increases output (Tao et al., 2021).
5.2 Data Science in Food Analytics
According to the literature, BDA can be used to improve the sustainability performance of supply chains (Margaritis, Madas & Vlachopoulou, 2022). Studies also demonstrate how BD collected along all phases of agricultural supply chains, from pre-production to distribution, and analyzed using ML approaches can improve supply chain performance and environmental sustainability. For instance, accurate weather forecasts can help with water resource management. Furthermore, the capacity to extract meaningful insights from BD may increase supply chain visibility and facilitate collaboration for long-term sustainability. In general, supply chain environmental sustainability can be improved by employing BDA to estimate the environmental impact of supply chain-related choices (Ciccullo et al., 2022).
Additionally, several BD collection and analytics platforms, such as SemaGrow (http://www.semagrow.eu/), have been developed to assist farmers in DM. This system uses algorithms and tools to efficiently query large-scale data collections and independent data sources. It concentrates on the agriculture domain and its use cases by combining and integrating massive and heterogeneous spatiotemporal data sets. Regional agro-climatic modeling in the context of climate adaptation is one of the use cases investigated by SemaGrow. In the supply chain, food tracking and tracing is required to ensure fast recalls. It can be extended by incorporating GPS, sensor-based, and RFID technology, which allows for the collection of near-real-time data on the location and other characteristics of the food. Such monitoring data is intended to aid in the early detection of a problem, allowing for prompt preventive steps and, as a result, the prevention of an outbreak (Marvin et al., 2017).
Climate change, population growth, water shortages, soil degradation, and food security are all important concerns confronting traditional agriculture (Pitts et al., 2020; Sadiku et al., 2020). In smart farming systems, the data collected from sensors, such as pH and temperature sensors, is processed using ML techniques to determine the system's health. Countries with a diverse land distribution and a scarcity of water for domestic and agricultural use require a viable solution to food and environmental issues. The aquaponics farming approach promises to be a prominent alternative. For effective growth and information usage, cloud-based sustainable smart aquaponics farming employs IoT-based predictive analytics. Aquaponics is a closed-loop symbiotic system that allows aquatic life and plants to coexist. Aquatic animals produce a nutrient-rich byproduct (ammonia), which plants utilize as fertilizer. The system involves several mechanical and biological operations, such as heating, pumping, and filtering (Paul et al., 2022). DA and sensor technology enable the creation of optimal conditions for both plants and fish, ensuring high output and improving resource efficiency by regularly monitoring water quality, water level, temperature, fish health, salinity, pH level, humidity, and sunlight. Accurate data can boost fish and plant yields by supporting the transfer of nutrients between fish and plants. The data can also be utilized to automate tasks so that they require less human participation (Elijah et al., 2018).
Ciccullo et al. (2022) analyzed the business models of a couple of startup companies in the food industry to understand how they leverage BDA for food waste prevention and management. In the long run, BD could be an asset to a company in terms of ensuring business stability and high operational performance, since BDA can support efforts to address food waste. Some of these businesses reduce food waste by leveraging other solutions, even though BDA could in theory have provided extra assistance and even allowed them to advance to a higher stage in the food waste hierarchy (FWH). Companies with a high BDA exploitation level attempt to maximize material efficiency. Companies that create value from waste, on the other hand, currently have poor BDA capabilities, but if they start implementing BDA in the future, they may be able to tap unexploited food waste opportunities. In the advanced BDA and food waste leverage cluster, firms prevent the formation of food surplus, hence preventing food waste at its source. Despite their high performance, these organizations are interested in making their food waste avoidance methods more efficient, as achieving 100% efficiency is an ideal and demanding goal. Hence, by leveraging BDA, companies can make better decisions to combat food waste at the source, improve forecasting and management of recycling input flows, and address higher stages of the FWH.
Food, like any living entity, shows huge diversity among samples, even within the same batch of products, and this issue is exacerbated when environmental impact, contamination, storage and processing conditions, and other factors are considered. As a result, huge volumes of data are clearly necessary for the construction of robust and efficient models. Nychas et al. (2021) proposed developing a distributed data repository that integrates a vast amount of heterogeneous data (such as microbiological, metabolomics, and multi/hyperspectral fingerprints taken from a wide range of handling, storage, and distribution conditions) and is based on the quantification and correlation of changes associated with agri-food products. A robust and reliable monitoring service can only be produced and used profitably by correlating different food quality criteria and thereby creating reliable models of product quality and safety. This information could be tracked through production, supply, and distribution, uploaded to cloud data storage, and made accessible to people via live tracking systems that combine mobile and web technologies, using uniquely designed QR codes on food packages or food safety apps that provide personnel with the necessary product information, such as the product origin, nutritional value, food miles, safety profile, and predicted hazards. Hence, the incorporation of such apps drives the agri-tech industry to depend on DS, a multidisciplinary field that combines data, mathematics, statistics, algorithms, and computing to model discrete aspects of the whole supply chain in order to identify and improve inefficient industry practices.
A major driver for food producers to alter food items and ingredients unlawfully is the wish to increase the quantity of the finished product with inexpensive replacements, keeping net costs low and net revenues high. Unfortunately, vulnerabilities in the food supply chain still exist, and with mounting pressure to keep food both cheap and in stock to meet demand, the risk of adulteration increases (Nychas et al., 2021).
There are various innovative but underutilized DS technologies available today that have the potential to revolutionize and shape the future of food safety, such as BD methods, AI, blockchain, IoT, and the digital twin (DT), a digital representation of a physical object such as a city or factory (Margaritis, Madas & Vlachopoulou, 2022; Nychas et al., 2021; Tao et al., 2021). The combination of these technologies would have a significant impact on the food sector's so-called Industry 4.0. Each technology, while not necessarily complementary to the others, has essential limitations and qualities that, when combined, result in an improved system; one example is AI coupled with DT for predictive modeling in risk assessment. When combined with IoT technology, AI will be boosted by the volume and intrinsic variability of IoT data; additionally, combining blockchain with AI will strengthen the transparency and traceability of such data. IoT is critical for considerably improving traceability, food safety, and waste reduction, since it is built on the interconnection of all things (sensors, devices, machines, computing devices) via communication media (e.g., WiFi, Bluetooth, RFID). Embedded sensors, low-power wireless communications, and signal processing techniques have recently received attention because they allow devices such as mobile phones to collect data and transfer it to repositories over channels such as WiFi, enabling real-time monitoring and control. In other words, the linked sensors could be placed in, on, or near the production and distribution lines, from where multidimensional and multivariate data is transferred over the air to a cloud-based central or distributed data repository. Validated mathematical models on the cloud server can accept HTTP POST requests from sensors, which carry the measurements as input variables, and return the safety profile and risk parameters to the sensor in real time (Nychas et al., 2021).
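The request/response exchange described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the architecture of Nychas et al. (2021): the field names `sensor_id` and `temperature_c`, the 5 °C limit, and the one-line rule standing in for a "validated mathematical model" are all assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical placeholder limit; a real deployment would call a validated model.
MAX_SAFE_TEMP_C = 5.0


def score_reading(reading, max_temp_c=MAX_SAFE_TEMP_C):
    """Toy stand-in for a validated model: flag readings above a temperature limit."""
    temp = float(reading["temperature_c"])
    excess = max(0.0, temp - max_temp_c)
    return {
        "sensor_id": reading["sensor_id"],
        "excess_c": excess,
        "risk": "elevated" if excess > 0 else "normal",
    }


class SensorHandler(BaseHTTPRequestHandler):
    """Accepts a JSON reading via HTTP POST and replies with a risk profile."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            reading = json.loads(self.rfile.read(length))
            status, payload = 200, score_reading(reading)
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON or missing fields: report a client error.
            status, payload = 400, {"error": str(exc)}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch server until interrupted (not started automatically)."""
    HTTPServer(("", port), SensorHandler).serve_forever()


# The scoring step can be exercised without starting the server.
example = score_reading({"sensor_id": "s1", "temperature_c": 7.5})
```

Keeping the model in a plain function (`score_reading`) separate from the transport layer means the same logic could be reused behind a different protocol or replaced by a properly validated predictive model.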
5.3 Artificial Intelligence & Machine Learning in Food Analytics
AI is widely regarded as important, yet it remains poorly implemented. AI supports detection using sensors and helps set and maintain optimum farming conditions for greater and better crop yields, which is critical to meeting the needs of the world’s growing population. In addition, BDA helps farmers identify problems early and hence take the necessary corrective and/or preventive actions to enhance production (Sadiku et al., 2020).
BDA and allied technologies in food science and related domains improve performance by using IoT for food safety and supply chain traceability, helping to solve food security challenges. Modern digitalization creates large amounts of data at a breakneck pace, which can be used to make better decisions about agricultural produce, investigate complicated agrarian potential, and monitor equipment performance. AI developments in the supply chain include AI-based demand forecasting, risk management, resilience, transportation, supplier selection, and inventory management. Data collection and implementation can aid in the efficient and responsible use of resources, the improvement of DM, and the application of a circular economy strategy in the food chain. Automation and intelligent systems can also be utilized to choose appropriate chemicals for food safety, and multitasking robots can help to speed up the process while maintaining quality (Sharma et al., 2021).
Blue River Technology, a start-up recently acquired by John Deere, has invented a weed management system that utilizes cameras, computers, and AI to differentiate crops from weeds. The tractor-propelled machine, which is now operating on a limited scale in cotton weeding, attacks weeds chemically by spraying herbicides only on weed-infested areas. The fundamental advantage of this method is that it reduces the amount of chemicals required in agriculture, which has both economic and environmental benefits (Misra et al., 2022).
Garbero, Carneiro & Resce (2021) used a number of ML algorithms: those from Natural Language Processing (NLP), which encompasses algorithms capable of understanding the content of documents, extracting insights, and organizing and categorizing the information itself; and LASSO (least absolute shrinkage and selection operator), a regression analysis method in statistics and ML that performs both variable selection and regularization to enhance the prediction accuracy and interpretability of the statistical model. The initial stage in data preparation was to make any unsuitable documents appropriate for analysis. This involved transforming the project report files into machine-readable documents so that data extraction could take place. The food systems perspective has become increasingly important because meeting the targets of SDG 2 (to end hunger, achieve food and nutrition security, improve nutrition, and promote sustainable agriculture) necessitates more holistic approaches to understanding food insecurity and nutrition deficiencies, approaches that critically analyze the interactions between food production and consumption and their socioeconomic and institutional contexts. One significant challenge is exploiting integrated data that can describe the growth of food systems and the interconnections between their components. Through project-level data analysis, machine-driven analytics can play an important role in expediting evidence generation around strategic themes, hence reinforcing the usefulness of these methodologies for portfolio systematization (Garbero, Carneiro & Resce, 2021).
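To make the "variable selection and regularization" property of LASSO concrete, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent in plain Python. It is not the implementation used by Garbero, Carneiro & Resce (2021); the toy data, the penalty value `lam`, and the function names are assumptions chosen for illustration. The objective minimized is (1/(2n))·||y − Xw||² + lam·||w||₁.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n))*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w


# Toy data (illustrative only): y depends on the first feature alone,
# while the second feature is irrelevant.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]

weights = lasso_coordinate_descent(X, y, lam=0.5)
```

On this toy data the irrelevant second coefficient is set exactly to zero (variable selection), while the first coefficient is shrunk below its unpenalized value of 2 (regularization).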
In recent years, the success of AI in other scientific areas has led to deep NNs and convolutional NNs (CNNs) providing superior results in machine vision compared to more traditional ML approaches such as support vector machines, genetic algorithms, and partial least-squares discriminant analysis. CNNs have been used to recognize and classify food categories. While this technology is still in its early stages, it has yielded encouraging results in food type recognition using image databases such as Food101 and UECFood (Zhou et al., 2019). However, one obstacle to the next step of applying DL image-based ML algorithms is the lack of reference photos in the libraries for all food types at varied levels of quality, adulteration, and contamination. To summarize, CNNs mimic the human visual system to extract information from visual material, where color and form are the major aspects of an image that assist in identifying its subject. These can be extracted using mathematical approaches such as edge detection and image segmentation, both of which can be automated using ML methods to gather data relating to regions of interest within an image. Edge detection transforms an image by emphasizing specific features within it, such as pixel intensity gradients in specific directions, using convolutional kernels. These characteristics are used to define edges, and different kernels can extract different types of visual data, such as corners, straight lines, textures, and blank spaces. Together, these features create a profile of an image subject, which an NN can use to predict the subject's class or type (Shelhamer, Long & Darrell, 2017). See Fig. 7 below.
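The kernel-based edge detection described above can be illustrated with a small sketch. It applies the well-known Sobel kernels to a tiny synthetic image in plain Python; the image values and the function name are assumptions made for the example, and a real pipeline would use an image-processing or DL library. As in CNN layers, the kernel is applied without flipping (cross-correlation).

```python
# Sobel kernels respond to intensity changes along x (vertical edges)
# and along y (horizontal edges), respectively.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Synthetic 5x5 grayscale image: dark on the left, bright on the right,
# i.e. a single vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(5)]

gx = correlate2d_valid(image, SOBEL_X)  # strong response at the vertical edge
gy = correlate2d_valid(image, SOBEL_Y)  # no horizontal edges, so all zeros
```

Here `gx` is non-zero only where the window straddles the brightness step, and `gy` is zero everywhere, which shows how different kernels pick out different directional features.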
5.4 IoT in Food Analytics
The introduction of advanced IoT technology has led to the adoption of more efficient and sustainable agricultural processes, progressively changing traditional agriculture into precision agriculture (PA). PA is considered one of the fastest developing fields of technology for both large and small-scale farming operations and is now a major contributor to the drive to increase global food supply. Because this concept centers on data collection and effective data management in support of DM, PA is essentially the cornerstone of smart agriculture development (i.e., data-driven agricultural activities) and, naturally, of the use of BDA in agriculture. Indeed, the application of BDA fosters opportunities in the agriculture supply chain, not only in terms of process and results benchmarking but also in the tracking and tracing of agricultural products throughout the entire supply chain, and eventually in predictive modeling for agricultural product production and distribution. According to Wolfert et al. (n.d.), BDA is incorporated into the agricultural BD value chain, which is essentially a system of sequential, interconnected activities running from data collection and storage to data transformation and utilization through DA, eventually leading to data visualization and presentation to agriculture value chain stakeholders. Data in smart agriculture is typically recorded, enriched, upgraded, or combined as materials and information travel through the downstream supply chain using wireless sensor network (WSN) technology. A WSN is based on interconnected sensor units that collect data from the environment relevant to agricultural product storage, monitoring, and tracking, and transmit it as digital signals across the IoT network via a radio frequency (RF) communication unit.
RFID technology, which could also be considered a wireless node application because it constitutes the most basic example of interconnected “things”, is probably the most common technology for exploiting data collection within tracking and tracing networks. RFID tags contain electronic product code (EPC) data that is accessible via RFID readers and provides information for item identification, inventory management, quality tracking, and agricultural product lifecycle evaluation. Essentially, RFID tags operate as an upgrade of barcodes, allowing agricultural products to be tracked while enriching the associated information each time the tag is read along the supply chain network. The use of RFID technology within a WSN, supported by BD, has been investigated in automated irrigation systems to identify the optimal allocation of water. Another intriguing use of RFID within a WSN is the smart monitoring of agricultural infrastructure via scalar sensor units that enable remote image capture to analyze agricultural production processes and spot the appearance of pest problems or plant diseases in real time (Margaritis, Madas & Vlachopoulou, 2022). Consequently, the use of IoT networks involving humidity, temperature, light, and microbiological and product quality sensors for real-time monitoring of products in transit helps the food industry to reschedule, recall, or take other appropriate actions.
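The idea that each RFID read enriches the information attached to a product can be sketched with a simple in-memory event log keyed by EPC. This is an illustrative data structure only; the class names, the attribute keys, the locations, and the example EPC value are hypothetical and do not come from the cited studies or from any RFID middleware API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ReadEvent:
    """One RFID read of a tagged item somewhere in the supply chain."""
    epc: str                 # electronic product code stored on the tag
    location: str
    timestamp: int           # seconds since an arbitrary reference time
    attributes: dict = field(default_factory=dict)  # data added at this read


class TraceLog:
    """Collects read events per EPC and derives a trace and an enriched profile."""

    def __init__(self):
        self._events = defaultdict(list)

    def record(self, event):
        self._events[event.epc].append(event)

    def trace(self, epc):
        """Return all reads for an item in chronological order."""
        return sorted(self._events.get(epc, []), key=lambda e: e.timestamp)

    def profile(self, epc):
        """Merge attributes over time; later reads overwrite earlier values."""
        merged = {}
        for event in self.trace(epc):
            merged.update(event.attributes)
        return merged


# Hypothetical item and reads, recorded out of order on purpose.
EPC = "urn:epc:id:sgtin:0614141.107346.2017"
log = TraceLog()
log.record(ReadEvent(EPC, "cold-store", 200, {"temp_c": 4.0}))
log.record(ReadEvent(EPC, "farm", 100, {"batch": "B-17"}))
log.record(ReadEvent(EPC, "retailer", 300, {"temp_c": 5.5}))

route = [event.location for event in log.trace(EPC)]
latest = log.profile(EPC)
```

Sorting by timestamp rather than by arrival order matters in practice because reads from different sites may reach the repository late or out of sequence.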
In the agricultural sector, IoT and DA are employed to improve operational efficiency and productivity. The use of WSNs as the main driver of smart agriculture is giving way to the use of IoT and DA. IoT aspires to integrate the physical and virtual worlds by utilizing the internet as a communication and information exchange platform. IoT is defined as a networked system of interconnected computing devices, mechanical and digital machinery, items, animals, or people with unique identifiers and the ability to transfer data without requiring human-to-human or human-to-computer interaction. As the world population is expected to reach 9.7 billion in 2050, there will be a higher demand for food. To meet the world's food demands in the coming years, the world is adopting the use of IoT paired with DA. IoT and DA will make smart agriculture possible, which is predicted to deliver high operational efficiency and yield (Elijah et al., 2018).
There are numerous instances of IoT use in agriculture (Misra et al., 2022). Examples of such applications are crop and livestock tracking, machinery, irrigation and water quality monitoring, weather monitoring, soil monitoring, disease and pest management, automation, greenhouse production, and precision agriculture. Several environmental conditions influence crop yield in agricultural cultivation (see Fig. 8). Acquiring data on these conditions aids in understanding the farm's trends and processes. Rainfall, leaf wetness, temperature, humidity, soil moisture, salinity, climate, dry circle, solar radiation, pest migration, and human activities are examples of such data. The acquisition of such extensive records enables optimal DM to increase farm produce quality, reduce risk, and optimize profitability (Elijah et al., 2018).
Smart farming enables farmers and growers to increase output while decreasing waste, from the amount of fertilizer used to the number of trips made by farm vehicles. Food waste is a major threat to food security, with waste estimated to account for up to 50% of all food produced due to poor management from production to retail. The advent of wireless communication technology, mobile devices, and ubiquitous services has paved the way for widespread internet connectivity. According to a Machina Research report, the number of connected agricultural devices is predicted to increase from 13 million at the end of 2014 to 225 million by 2024. By connecting IoT devices to the internet, data can be accessed whenever and wherever it is needed. However, data transfer through the internet necessitates security measures, real-time data support, and flexibility of access. The internet has also cleared the path for cloud computing, which collects massive amounts of data for storage and processing. Cloud computing entails managing user interfaces and services, organizing and coordinating network nodes, computation, and data processing. The use of cloud IoT platforms enables large amounts of data collected from sensors to be stored in the cloud. However, there are several critical technical criteria to consider when deploying IoT devices. For wireless connectivity, the following factors should be examined: communication distance, data rate, battery life, mobility, latency, security and resilience, and gateway modem cost (Elijah et al., 2018).
Lately, a vast amount of data has been generated from discrete sources, such as food safety monitoring systems, IoT, media, and other devices (Nychas et al., 2021; Sadiku et al., 2020; Tao et al., 2021). These data ought to be used for the betterment of the food supply chain, and DS undoubtedly plays a vital role in achieving this. Along with poor communication and coordination between the various participants in the food supply chain, customers’ attitudes regarding existing food standards are another major contributor to food loss and waste. The food industry relies extensively on regulatory inspection to monitor the quality and safety of food, with analyses performed via conventional methods (e.g., the International Organization for Standardization method for total viable count). Although they serve to maintain food safety standards, such processes are costly, time-consuming, destructive to foods, and retrospective, and they preclude the provision of real-time information on remaining shelf-life and safety throughout the foods’ lifecycle (Nychas et al., 2021).
Sensing is the source of all data in the IoT. With the deployment of numerous IoT devices, the agri-food sector generates a vast number of datasets that are heterogeneous in terms of content, structure, and storage type. BD is characterized by variability, variety, unstructured nature, noise, and excessive redundancy. To extract important information from such massive amounts of data, complex methods for data curation and storage, as well as expensive statistical methodologies and programming models, are required. Conditioning and preprocessing of primary data yield the knowledge needed to comprehend the state of the (agri-food) system. By employing advanced algorithms and measuring the system's performance in relation to the desired outcome, a system can be made capable of making independent localized judgments and taking appropriate actions. This degree of autonomy in sensing, DM, and actuation is what distinguishes an IoT system as intelligent (Misra et al., 2022).
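The sense, decide, and actuate loop described above can be sketched with a very simple local controller. The example below is an assumption-laden illustration rather than a method from Misra et al. (2022): it uses a hypothetical soil-moisture sensor, made-up thresholds of 30% and 45%, and a plain hysteresis rule in place of the "advanced algorithms" the text refers to.

```python
def decide_pump(moisture_pct, pump_on, low=30.0, high=45.0):
    """Hysteresis rule: start irrigating below `low`, stop above `high`,
    and otherwise keep the current state to avoid rapid on/off switching."""
    if moisture_pct < low:
        return True
    if moisture_pct > high:
        return False
    return pump_on


def run_controller(readings, pump_on=False):
    """Apply the local decision rule to a stream of sensor readings and
    return the pump state chosen after each reading."""
    states = []
    for moisture in readings:
        pump_on = decide_pump(moisture, pump_on)  # decide from sensed value
        states.append(pump_on)                    # a real node would actuate here
    return states


# Hypothetical soil-moisture readings in percent.
readings = [50.0, 40.0, 28.0, 35.0, 44.0, 46.0, 40.0]
states = run_controller(readings)
```

The gap between the two thresholds is the design choice worth noting: with a single threshold, sensor noise around that value would toggle the pump on every reading.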
Millions of global monitoring data entries are stored in the Global Environment Monitoring System (GEMS/Food) database. Given the relatively high number of entries (600–800 entries per month), the data is organized logically and is easily retrievable. Information on chemical properties, microorganisms’ growth conditions, and weather can be useful for food safety studies because it can be used in models to anticipate the presence of hazards, such as mycotoxins in wheat. Data is also obtained in numerous other ways. Food safety authorities and food-related groups are already using social media platforms like Facebook, Twitter, and YouTube to communicate with the general public about food safety issues. By monitoring consumers' social media interactions, food agencies will gain a better understanding of their audience and may discover new challenges. Approaches to web mining and social media analysis are being developed to use these massive amounts of data as an early warning system for identifying potential health and food safety hazards that could lead to a crisis. As for data storage, it is generally accomplished using data management systems such as MySQL, Oracle, and PostgreSQL. However, such systems are insufficient to handle large amounts of data, which require much more speed, flexibility, and dependability than standard systems can provide. As a result, next-generation databases known as NoSQL have been developed that are nonrelational, open source, and horizontally scalable. Examples of such systems are MongoDB, Cassandra, and HBase. Following storage, the next difficulty is to move large amounts of data from many sources into a NoSQL cluster for processing. Transfer software is required for this, and examples of such software used to manage massive amounts of data include Aspera and Talend (Marvin et al., 2017).
To conclude, BDA helps companies in the food industry to understand customers’ needs and preferences by gathering real-time information about consumers, products, brands, and competitors. As a result, it plays a vital role in the expansion of the food industry. DA is a complicated process in which raw data is translated into meaningful insights that drive business decisions. It is a matter of employing the proper tools and understanding what to do with the gathered data (Trends, 2021). For this, decision-makers leverage new technologies to further support their businesses. McCurdy (2022) claims that DA unlocks great value by increasing employees’ efficiency, reducing waste, ensuring consistent quality, optimizing product portfolios, and finally enabling the company to become a DD organization.