Results of Regression Modeling
The aim of using multiple regression models in this research is to determine whether the relationship between material prices and the influencing indicators can be described by equations. Interpreting the results involves 1) analyzing the data, 2) estimating the model, i.e., fitting the line, and 3) evaluating the validity and efficiency of the model. SPSS software was used to analyze the data. The actual price is the dependent variable (Y) in the regression analysis. The assigned independent variables (X) are shown in Table 1.
Analyzing The Data
This study builds prediction models using only significant indicators, i.e., indicators with strong t-statistics and a significance value (p-value) below 0.05. As a result, the final model may not include all of the initially selected indicators. These significance tests determine whether each explanatory variable is required in the model, given that the others are already present. The "P-value" column in Table 2 reports the significance level of each indicator. For steel, for example, the indicators inflation rate, GDP-construction, GDP, revenue, expenditure, industrial production, import, export, external reserve, and balance of payment have p-values of 0.058, 0.635, 0.983, 0.983, 0.313, 0.520, 0.322, 0.444, 0.801, and 0.983, respectively, all greater than 0.05. These indicators are therefore not significant for the steel model, while the remaining indicators, with p-values of 0.00 (< 0.05), contribute significantly to explaining the change in steel prices, as indicated in Table 2.
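The screening rule described above can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study; it simply applies the 0.05 threshold to the steel p-values reported in Table 2. The names `STEEL_P_VALUES` and `split_by_significance` are hypothetical.

```python
# Steel p-values as reported in Table 2 (0.00 means the value rounds to zero).
STEEL_P_VALUES = {
    "Inflation rate": 0.058,
    "Exchange rate": 0.00,
    "Interest rate": 0.00,
    "GDP-construction": 0.635,
    "Producer Price Index": 0.00,
    "GDP": 0.983,
    "EGX index": 0.00,
    "Employment rate": 0.00,
    "Revenue": 0.983,
    "Expenditure": 0.313,
    "Wages": 0.00,
    "Industrial production": 0.520,
    "Import": 0.322,
    "Export": 0.444,
    "External reserve": 0.801,
    "Money supply": 0.00,
    "External debt": 0.00,
    "Unemployment rate": 0.00,
    "Balance of payment": 0.983,
}


def split_by_significance(p_values, alpha=0.05):
    """Return (kept, dropped) indicator names for a given significance level."""
    kept = [name for name, p in p_values.items() if p < alpha]
    dropped = [name for name, p in p_values.items() if p >= alpha]
    return kept, dropped


kept, dropped = split_by_significance(STEEL_P_VALUES)
print("Kept in the steel model:", kept)
print("Dropped from the steel model:", dropped)
```

Note that the inflation rate (p = 0.058) falls just above the threshold and is dropped under this rule.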
Estimated Models Coefficients
The general forms of the equations for predicting the prices of the aforementioned material types are obtained from Table 2. Holding all other independent variables constant, each regression coefficient gives the expected change in the dependent variable for a one-unit increase in the corresponding independent variable.
Table 2. Coefficients of the equations for the predicted models ("_" marks an indicator that is not included in the model).

| Model | Steel coefficient | Steel P-value | Cement coefficient | Cement P-value | Brick coefficient | Brick P-value | Ceramic coefficient | Ceramic P-value | Gravel coefficient | Gravel P-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Constant | -6842.9 | | -74.664 | | 834.04 | | 11.653 | | 59.7 | |
| Inflation rate | _ | 0.058 | _ | 0.876 | 7.85 | 0.00 | _ | 0.479 | _ | 0.583 |
| Exchange rate | 140.05 | 0.00 | _ | 0.639 | _ | 0.808 | 1.253 | 0.00 | _ | 0.123 |
| Interest rate | 370.7 | 0.00 | _ | 0.605 | 6.8 | 0.00 | -0.763 | 0.00 | _ | 0.642 |
| GDP-construction | _ | 0.635 | _ | 0.741 | _ | 0.690 | _ | 0.902 | _ | 0.202 |
| Producer Price Index | 15.8 | 0.00 | _ | 0.538 | _ | 0.646 | 0.162 | 0.00 | 0.144 | 0.00 |
| Gross domestic product (GDP) | _ | 0.983 | _ | 0.736 | _ | 0.854 | _ | 0.782 | _ | 0.076 |
| EGX index | 2.3E-09 | 0.00 | 3.2E-10 | 0.00 | _ | 0.796 | -3E-11 | 0.00 | _ | 0.096 |
| Employment rate (ER) | 184.9 | 0.00 | _ | 0.549 | _ | 0.547 | _ | 0.477 | _ | 0.671 |
| Revenue | _ | 0.983 | _ | 0.992 | -0.002 | 0.00 | _ | 0.988 | _ | 0.397 |
| Expenditure | _ | 0.313 | 0.601 | 0.00 | 1.64 | 0.00 | 0.029 | 0.00 | _ | 0.342 |
| Wages | -1.8 | 0.00 | 0.211 | 0.00 | _ | 0.258 | 0.009 | 0.00 | _ | 0.438 |
| Industrial production (IP) | _ | 0.520 | -26.023 | 0.00 | 24.95 | 0.00 | _ | 0.193 | _ | 0.346 |
| Import | _ | 0.322 | _ | 0.967 | _ | 0.854 | _ | 0.915 | _ | 0.939 |
| Export | _ | 0.444 | _ | 0.985 | _ | 0.854 | _ | 0.886 | _ | 0.915 |
| External Reserve | _ | 0.801 | 3.046 | 0.00 | _ | 0.333 | 0.274 | 0.00 | 0.303 | 0.00 |
| Money supply (MS) | 1404.9 | 0.00 | 63.382 | 0.00 | 144.163 | 0.00 | _ | 0.775 | 19.64 | 0.00 |
| External Debt | -2E-08 | 0.00 | -6E-09 | 0.00 | -7E-09 | 0.00 | -2E-10 | 0.00 | -2E-10 | 0.00 |
| Unemployment rate | -270.7 | 0.00 | 13.751 | 0.00 | -49.7 | 0.00 | _ | 0.374 | -1.7 | 0.00 |
| Balance of payment | _ | 0.983 | _ | 0.969 | -4E-09 | 0.00 | -3.193E-10 | 0.00 | _ | 0.915 |
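To make the reading of Table 2 concrete, the following Python sketch assembles the steel equation from its constant and significant coefficients and shows the one-unit interpretation of a coefficient. The coefficient values are taken from Table 2; the function name `predict_price` and the placeholder indicator values (1.0 for every indicator) are hypothetical and are not real economic data, since the units of the indicators are not stated here.

```python
# Steel model terms as reported in Table 2 (significant indicators only).
STEEL_CONSTANT = -6842.9
STEEL_COEFFICIENTS = {
    "Exchange rate": 140.05,
    "Interest rate": 370.7,
    "Producer Price Index": 15.8,
    "EGX index": 2.3e-09,
    "Employment rate": 184.9,
    "Wages": -1.8,
    "Money supply": 1404.9,
    "External debt": -2e-08,
    "Unemployment rate": -270.7,
}


def predict_price(constant, coefficients, indicators):
    """Linear prediction: constant + sum of coefficient * indicator value."""
    return constant + sum(
        coef * indicators[name] for name, coef in coefficients.items()
    )


# Placeholder inputs (NOT real data): every indicator set to 1.0.
baseline = dict.fromkeys(STEEL_COEFFICIENTS, 1.0)

# Raise the exchange rate by one unit, holding everything else constant.
bumped = dict(baseline)
bumped["Exchange rate"] += 1.0

change = (
    predict_price(STEEL_CONSTANT, STEEL_COEFFICIENTS, bumped)
    - predict_price(STEEL_CONSTANT, STEEL_COEFFICIENTS, baseline)
)
print("Change in predicted steel price:", change)
```

Under these assumptions, the change in the prediction equals the exchange-rate coefficient (up to floating-point rounding), which is exactly the interpretation given for the regression coefficients.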
Determine The Suitability Of The Models
The model summaries are given in Table 3. This table provides the values of R, R square (R2), and adjusted R2 for each model, which can be used to assess how well the regression models fit the data.
Table 3. Model summary.

| Material type | R | R Square | Adjusted R Square |
|---|---|---|---|
| Steel | 0.995 | 0.990 | 0.990 |
| Cement | 0.981 | 0.963 | 0.960 |
| Brick | 0.994 | 0.987 | 0.986 |
| Ceramic | 0.994 | 0.988 | 0.987 |
| Gravel | 0.997 | 0.993 | 0.993 |
The "R" column reports the multiple correlation coefficient, R, which can be regarded as a measure of how well the model predicts the dependent variable. For the steel model, a value of 0.995 indicates a high level of predictability. The "R Square" column reports R2 (the coefficient of determination), the proportion of variance in the dependent variable that is explained by the independent variables. The steel model's value of 0.990 shows that the independent variables account for 99 percent of the variability in the steel price. R2 is a simple measure of how well a regression model fits a set of data, but it does not tell the whole story: it must be read together with residual plots, other statistics, and an in-depth understanding of the subject area. The "Adjusted R Square" (adj. R2) value must also be interpreted appropriately. For steel, the value of 0.990 (Table 3) shows that the predictors retained in the model explain 99 percent of the variance in the outcome variable after accounting for the number of predictors. A large difference between R2 and adjusted R2 suggests a poor model fit. Adding a superfluous variable to a model reduces adjusted R2, whereas adding a useful variable increases it. Adjusted R2 is always less than or equal to R2, because it penalizes the number of terms in the model.
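The relationship between R2 and adjusted R2 can be sketched with the standard adjustment formula, adj. R2 = 1 - (1 - R2)(n - 1)/(n - k - 1). In the Python sketch below, the sample size `n = 120` and the predictor count `k = 9` are hypothetical values chosen for illustration; the study does not report them in this section.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Standard adjusted R-squared: penalizes R-squared for model size."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1.0 - (1.0 - r_squared) * penalty


# R-squared for steel from Table 3; n and k are assumed, not from the study.
r2_steel = 0.990
adj = adjusted_r_squared(r2_steel, n_observations=120, n_predictors=9)
print("Adjusted R-squared:", round(adj, 4))

# Adding predictors without improving R-squared lowers the adjusted value.
adj_more_terms = adjusted_r_squared(r2_steel, n_observations=120, n_predictors=19)
print("Adjusted R-squared with more terms:", round(adj_more_terms, 4))
```

Because the penalty factor is at least 1 whenever there is at least one predictor, the adjusted value can never exceed R2.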
The residual diagnostics for the steel model, taken as an example, are shown in Fig. 2. The histogram of the regression standardized residuals (Fig. 2a) indicates that the residuals are approximately normally distributed. In the plot of the regression standardized residuals versus the regression standardized predicted values, the points are roughly randomly scattered, indicating that the assumption of homoscedasticity (equality of variances) is satisfied.
Results Of ARIMA Modeling
Stationarity test
This section outlines the general ARIMA modeling and prediction method, which is depicted in Fig. 1. The procedure is not strictly sequential; it can contain iterative loops depending on the results of the diagnostic and forecasting stages. Because the ARIMA model is applied to stationary time-series data, the data must first be shown to be stationary in mean and variance. The historical price data for steel, cement, brick, gravel, and ceramic are plotted in Fig. 4. On first inspection, the data were non-stationary for all material types. The natural logarithm of each series was taken to reduce the non-stationarity, and the augmented Dickey-Fuller (ADF) test was applied to the logged series; the ADF statistic was still greater than the critical values at the 0.01 and 0.05 significance levels. The first-order difference was then taken, giving the sequences DLsteel, DLcement, DLgravel, DLceramic, and DLbrick. After taking the logarithm and the first difference, the ADF statistic became smaller than the critical values for all material types. That is, the series became stationary and passed the significance test for stationarity, as shown in Fig. 3.
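The transformation that produces the DL sequences can be sketched in a few lines of Python. The price values below are hypothetical, and the function name `log_difference` is illustrative only. The ADF test itself is not reimplemented here; in practice it would be run on the output with a statistics package (for example, the `adfuller` function in statsmodels), whereas the study used EViews.

```python
import math


def log_difference(prices):
    """First difference of the natural logarithm of a price series."""
    logs = [math.log(p) for p in prices]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Hypothetical monthly prices (not data from the study).
prices = [100.0, 110.0, 121.0, 118.0]
dl_series = log_difference(prices)
print(dl_series)
```

Each element of the result approximates the period-to-period growth rate, and the differenced series is one observation shorter than the original.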
Model Identification
After determining the order of differencing required to make the series stationary, the next step is to find a suitable ARMA form to model the stationary series. The classic method is the Box-Jenkins procedure, an iterative process of model identification, model estimation, and model evaluation. It is a semi-formal approach that identifies models through subjective evaluation of plots of the auto-correlogram and partial auto-correlogram of the series. The auto-correlogram shows the autocorrelation of the time series at different lag lengths and must be plotted before a Box-Jenkins model can be identified. The analyst examines the sample auto-correlogram, partial auto-correlogram, and inverse auto-correlogram and infers the appropriate type of ARMA model from the patterns detected in these functions, by comparison with the theoretical auto-correlograms for various orders of AR, MA, and ARMA models. EViews software was used to compute the autocorrelation function (ACF) and partial autocorrelation function (PACF) for all of the aforementioned material types. Figure 4 shows the ACF and PACF for steel and cement as an example of the model identification process.
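As an illustration of what the auto-correlogram plots, the following Python sketch computes the sample autocorrelation at each lag using the usual estimator (lagged cross-products of deviations from the mean, divided by the total sum of squared deviations). It is a minimal sketch, not the EViews routine; the function name `sample_acf` and the input series are hypothetical.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical series that alternates in sign (strong negative lag-1 correlation).
series = [1.0, -1.0] * 4
acf_values = sample_acf(series, max_lag=2)
print(acf_values)
```

A slowly decaying pattern in these values, combined with a sharp cut-off in the partial autocorrelations, is the kind of signature used to suggest an AR form during identification.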
Identify The Most Significant Model
Because the Box-Jenkins methodology is highly subjective, time series analysts have sought objective alternatives for identifying ARMA models. Examples of such identification criteria are the Akaike Information Criterion (AIC) or Final Prediction Error (FPE) criterion (Akaike, 1974) and the Schwarz Criterion (SC), also known as the Bayesian Information Criterion (BIC). In this study, eight candidate models were estimated for each type of material, and the best model was chosen based on the adjusted R-squared, AIC, and SC values. The lowest AIC and SC values alone, however, are not sufficient to identify the best ARMA model. The procedure followed was to first select the model with the lowest Root Mean Square Error (RMSE), AIC, and SC values, and then run a parameter significance test and a residual randomness test on the estimation result. If the model passes these tests, it is considered the best model. If it fails, the model with the next-lowest AIC and SC values is chosen and the same tests are run, and so on until a model passes. Table 4 shows the most significant model chosen for each type of material. The criteria used for the judgment take the following form:
where N = total number of data points; y_t = actual material price; ŷ_t = forecasted material price; ȳ = mean of actual material prices; and k = total number of estimated parameters.
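As a hedged illustration, the Python sketch below computes RMSE, R2, AIC, and SC from actual and forecast values using the symbols defined above. The AIC and SC expressions assume the per-observation Gaussian log-likelihood form that econometrics packages commonly report; the exact variant used in the study is not stated here, so these two formulas are an assumption. The function name `fit_criteria` and the sample data are hypothetical.

```python
import math


def fit_criteria(actual, forecast, k):
    """RMSE, R-squared, AIC and SC for a forecast with k estimated parameters."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ssr = sum((y - f) ** 2 for y, f in zip(actual, forecast))  # residual sum of squares
    sst = sum((y - mean_actual) ** 2 for y in actual)          # total sum of squares
    rmse = math.sqrt(ssr / n)
    r2 = 1.0 - ssr / sst
    # ASSUMPTION: Gaussian log-likelihood evaluated at the estimated variance ssr / n.
    loglik = -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(ssr / n))
    aic = -2.0 * loglik / n + 2.0 * k / n
    sc = -2.0 * loglik / n + k * math.log(n) / n
    return {"RMSE": rmse, "R2": r2, "AIC": aic, "SC": sc}


# Hypothetical data (not from the study).
actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
forecast = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9]
criteria = fit_criteria(actual, forecast, k=2)
print(criteria)
```

Under this form, SC penalizes each additional parameter more heavily than AIC once the sample has at least eight observations, which is why SC tends to favor more parsimonious models.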
Table 4
Significant model for each type of material.

| Criteria | Steel | Cement | Gravel | Brick | Ceramic |
|---|---|---|---|---|---|
| Model | ARIMA (1,1,0) | ARIMA (2,1,0) | ARIMA (1,1,0) | ARIMA (1,1,1) | ARIMA (1,1,0) |
| AIC | -3.31 | -2.96 | -4.6 | -5 | -4.02 |
| SC | -3.24 | -2.89 | -4.5 | -4.9 | -3.9 |
| R2 | 0.05 | 0.089 | 0.009 | 0.04 | 0.001 |
| RMSE | 0.22 | 0.12 | 0.1 | 0.05 | 0.07 |
Model Diagnostic
The next stage is the formal evaluation of each time series model, which entails a thorough examination of its diagnostic tests. A variety of diagnostic techniques are available to ensure that an acceptable model has been built. A useful first check is to plot the residuals of the estimated model. This should highlight any outliers that may affect the parameter estimates, as well as any potential autocorrelation or heteroscedasticity issues. The second test of model adequacy is to plot the auto-correlogram of the residuals. If the model is adequately specified, the residuals should be white noise, so the auto-correlogram of the residuals should show no significant autocorrelation at any non-zero lag.
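The residual auto-correlogram check can be sketched as follows. This minimal Python example flags lags whose sample autocorrelation falls outside the approximate 95 percent white-noise band of ±1.96/√N, a common rule of thumb that is assumed here rather than taken from the study. The function name `flag_autocorrelated_lags` and the residual series are hypothetical; the series is deliberately constructed with a repeating pattern so that some lags are flagged.

```python
import math


def flag_autocorrelated_lags(residuals, max_lag):
    """Return the lags whose autocorrelation exceeds the 95% white-noise band."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    bound = 1.96 / math.sqrt(n)  # approximate band for white noise
    flagged = []
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        if abs(r_k) > bound:
            flagged.append(k)
    return flagged


# Hypothetical residuals with a period-4 pattern (clearly not white noise).
residuals = [1.0, 1.0, -1.0, -1.0] * 25
flagged = flag_autocorrelated_lags(residuals, max_lag=4)
print("Lags with significant autocorrelation:", flagged)
```

An empty result would be consistent with white-noise residuals; any flagged lag suggests that the model has left structure in the data and should be re-specified.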
Comparison Of Arima And Regression Prediction Models
To validate the proposed time series models, the predictive accuracy of the Box-Jenkins (ARIMA) models was compared with that of the structural multiple regression models, using the actual material price series as the basis. The validity of each model was tested using the actual and predicted values for the six-month out-of-sample period from January 2021 to June 2021. The difference in predictive accuracy between the regression models and the ARIMA models was not very large, as shown in Table 5 and Fig. 6. Given the small forecast errors of both, it may be stated that both types of model predicted well. On the test data, however, the ARIMA models outperformed the regression models in forecasting accuracy, as shown in Fig. 6. This finding demonstrates that, for material prices available as time-series data, time series models are able to predict well. The recommended model for each type of material, selected according to the mean absolute percentage error (MAPE), is denoted by * in Table 5.
Table 5
Best fitting model for each type of material according to MAPE value.

| Material type | ARIMA (10 years history data) | ARIMA (5 years history data) | Regression |
|---|---|---|---|
| Steel | 7.8 | 2.8* | 12.3 |
| Cement | 1.8 | 1.7* | 7.1 |
| Brick | 1.5* | 3 | 21 |
| Ceramic | 2 | 1.9* | 9 |
| Gravel | 1.4* | 2.6 | 3 |

(*) Best model for prediction
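The selection rule behind Table 5 can be sketched in Python: compute the MAPE for each candidate model, then pick the model with the lowest value. The `mape` function uses the standard definition; the names `TABLE_5_MAPE` and `best_model` are hypothetical, the MAPE values are copied from Table 5, and the small price example is made up for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# MAPE values as reported in Table 5.
TABLE_5_MAPE = {
    "Steel": {"ARIMA (10 years)": 7.8, "ARIMA (5 years)": 2.8, "Regression": 12.3},
    "Cement": {"ARIMA (10 years)": 1.8, "ARIMA (5 years)": 1.7, "Regression": 7.1},
    "Brick": {"ARIMA (10 years)": 1.5, "ARIMA (5 years)": 3.0, "Regression": 21.0},
    "Ceramic": {"ARIMA (10 years)": 2.0, "ARIMA (5 years)": 1.9, "Regression": 9.0},
    "Gravel": {"ARIMA (10 years)": 1.4, "ARIMA (5 years)": 2.6, "Regression": 3.0},
}


def best_model(mape_by_model):
    """Name of the model with the lowest MAPE."""
    return min(mape_by_model, key=mape_by_model.get)


# Hypothetical actual and forecast prices (not data from the study).
example_mape = mape([100.0, 200.0], [110.0, 180.0])
print("Example MAPE (%):", example_mape)

for material, scores in TABLE_5_MAPE.items():
    print(material, "->", best_model(scores))
```

Applied to the Table 5 values, this rule reproduces the starred entries: the five-year ARIMA model for steel, cement, and ceramic, and the ten-year ARIMA model for brick and gravel.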