This section compares pre-pandemic and post-pandemic data using a prediction method and a clustering approach. The data on essential commodities are predicted by category. The dataset contains 8 major categories: Flours, Rice, Sugar, Grains, Pulses, Oil, Seasonal food, and Dried nuts. Each category consists of subcategories, as shown in Fig 1. The figure lists only a few subcategories; the dataset contains others as well.
The proposed work is implemented on a dataset collected from a company in the B2B sector. The proposed approach comprises 2 methods: Prediction/Forecasting and Data Clustering.
3.1 Prediction: Regression analysis forecasts a dependent variable from one or more independent variables. Sales prediction is currently an important part of Business Intelligence.
In this work, the value generated is the dependent variable, and the independent variables are the individual store number, month, category, and quantity sold. The equation relating the dependent variable to the independent variables is termed the regression model. The value of the dependent variable changes with each month-wise execution.
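As an illustration of how such a regression model can be fitted, the following Python sketch estimates a linear model by ordinary least squares. It rests on several assumptions that are not taken from this work: the records are synthetic, the model is fitted for a single category with only month and quantity sold as independent variables, and the helper names (solve_linear_system, fit_linear_regression, predict) are hypothetical. The regression variant and software actually used are not stated here.

```python
# Illustrative sketch only: synthetic data, not the dataset used in this work.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the inputs are left unmodified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: independent variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, feature_row):
    """Evaluate intercept + sum of coefficient * independent variable."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], feature_row))


# Hypothetical records for one category: (month, quantity_sold).
features = [(1, 100), (2, 120), (3, 90), (4, 150), (5, 130), (6, 160)]
# Synthetic dependent variable (value generated), built to be exactly linear.
targets = [50 + 10 * month + 2 * qty for month, qty in features]

coefficients = fit_linear_regression(features, targets)
forecast = predict(coefficients, (7, 140))  # forecast for month 7, 140 units sold
print(coefficients, forecast)
```

In practice a library routine would normally replace the hand-written solver, and categorical variables such as store number and category would have to be encoded numerically before fitting.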
3.2 Clustering: Comparing actual and predicted values shows that the forecasted data differ between the pre-COVID and post-COVID periods, so the k-means algorithm is used to analyse:
- Which categories of data vary?
- By how much do the different categories vary month-wise?
K-Means: This helps us analyse the sales of essential commodities month-wise. Three clusters are formed, as shown in Fig 2: Lowest, Average, and High sales. For each month, K-Means assigns each of the 8 categories to one of these 3 clusters. The steps of the proposed work are as follows:
Step 1: Input the half-yearly dataset of 2019.
Step 2: Forecast separately by category and by month.
Step 3: Input the half-yearly dataset of 2020.
Step 4: Compare the predicted values of 2019 and 2020.
Step 5: Observe the major variation in March to May, that is, in the post-COVID prediction compared with the pre-pandemic one.
Step 6: Apply K-Means to the 2020 data to examine the sales of the various categories separately.
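
The K-Means step above can be sketched for a single month as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the sales figures are invented, clustering is done on one scalar value per category, the centroids are initialised deterministically from the sorted data, and the function name kmeans_1d is hypothetical.

```python
# Illustrative sketch only: the monthly sales figures below are invented.

def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the starting centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical sales of the 8 categories for one month of 2020.
sales = {
    "Flours": 120, "Rice": 135, "Sugar": 60, "Grains": 55,
    "Pulses": 90, "Oil": 140, "Seasonal food": 20, "Dried nuts": 25,
}

centroids, labels = kmeans_1d(list(sales.values()), k=3)

# Name the clusters by ranking their centroids from smallest to largest.
order = sorted(range(3), key=lambda c: centroids[c])
names = {cluster: name for cluster, name in zip(order, ["Lowest", "Average", "High"])}
groups = {category: names[label] for category, label in zip(sales, labels)}

for category, group in groups.items():
    print(f"{category}: {group}")
```

Repeating this for each month gives, per month, the set of categories that fall in the Lowest, Average, and High sales clusters, which can then be compared across months.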