Prediction of construction material prices using ARIMA and multiple regression models

doi:10.21203/rs.3.rs-2481703/v1

Variations in construction material prices (CMP) have become a major issue in budgeting construction projects properly. The inability to forecast CMP volatility accurately can lead to price overestimation or underestimation, whereas more accurate CMP predictions also improve the accuracy of total construction cost predictions. The purpose of this study is to present a model for predicting construction material prices that helps decision-makers make better decisions over the life cycle of a project. Price records for five materials in Egypt, namely steel, cement, brick, ceramic, and gravel, together with the indicators affecting them, were used for the prediction procedures. This research outlines practical methods for applying Box-Jenkins Autoregressive Integrated Moving Average (ARIMA) time series models and multiple regression models to forecast building material prices. Out-of-sample predictions are used to evaluate the models' performance in predicting future prices, and the models are compared according to the Mean Absolute Percentage Error (MAPE). The generated models show good results in predicting month-to-month variations in material prices, with MAPE ranging from 1.4 to 2.8 percent for the selected models. This research can assist both owners and contractors in improving their budgeting processes and preparing more accurate cost estimates.

Construction materials

Prices

Forecast

Model

Box-Jenkins

Regression

The number of large-scale construction projects for residential, commercial, and government structures has recently surged around the world. Construction costs for mega projects have become a major source of concern under current conditions, owing to their high prices and the numerous design modifications made during their long construction durations. This issue has also hindered contractors from producing accurate cost estimates. Since material prices can account for up to one-fourth of overall project costs (Hwang et al, 2012), the wide variation in the CMP makes accurate planning and cost estimating difficult for both owners and contractors. Contractors may lose bids or revenues owing to cost overestimation or underestimation (Ashuri and Lu, 2010). Many researchers have attempted to forecast cost increases accurately, but predicting prices for a variety of construction materials requires a simple and automated procedure. Improving the accuracy of material price predictions can also improve the accuracy of total cost estimates. Various project stakeholders can benefit from predictions of short- and long-term variations in construction material prices. Contractors can avoid losing bids or profits by improving the accuracy of their cost forecasting. Avoiding these losses leads to fewer hidden price contingencies, postponed or canceled projects, budget irregularities, and erratic project flows, and project owners benefit from avoiding these undesirable consequences.

To account for probable changes in future material prices, some cost estimators escalate the estimated material costs to the midpoint of the planned construction period (Anderson et al, 2006). Other cost estimators add a fixed percentage of the overall estimated cost as a risk premium to account for increases in the prices of materials such as asphalt cement (Laryea and Hughes, 2009). These simplistic solutions ignore the fact that CMP varies significantly even over short periods of time. Because of the considerable uncertainty regarding the rate of escalation of material prices, a probabilistic approach based on Monte Carlo simulation has been used to assess the project cost range (Back et al, 2000). In that approach, Monte Carlo simulation was used to generate random values for the escalation rate of material prices. However, Monte Carlo simulation does not address the effects of autocorrelation in historical CMP, which is a critical flaw of this approach (autocorrelation represents the relationship between the values of a time series at different time intervals). The results show that the suggested model performs similarly to present practice in terms of expectation while also offering theoretical uncertainty bounds that are well suited to future volatility, which is possibly more relevant.

Literature review

Numerous studies have sought to address cost escalation factors by concentrating on rapidly fluctuating construction material costs in an effort to make cost planning more feasible. The primary problems here are identifying escalation drivers and calculating project costs properly and simply. Shiha et al. (2020) presented three models that employ artificial neural networks (ANNs) to estimate the future costs of major building materials, such as steel reinforcing bars and Portland cement, six months ahead in the Egyptian construction sector. The three models were built using a Genetic Algorithm (GA), the Neural Tools software, and the Python programming language. Historical data on steel and cement prices in Egypt, as well as macroeconomic indices, were used to train, test, and validate the suggested models. Marzouk and Amin (2013) formulated a fuzzy logic model to assess the degree of importance of each material type using three main criteria: (1) the percentage contribution of the elements to the total cost items; (2) the difference in the calculated price index of the elements during the research period; and (3) the percentage difference in the price of the cost elements. In that research, they also compared artificial neural networks with regression analysis. The results showed that neural networks surpass regression analysis in terms of estimation error.

Lee et al. (2019) suggested a technique for forecasting raw material prices with the purpose of inspiring more accurate predictions. The prediction approach is a multivariate analysis of the time series, and the prediction goal is iron ore price, which is the primary driver of steel raw material prices. The accuracy of the prediction results over a specified period was compared with past average values. The results show that the proposed method is 2.3 times more accurate than previous average values. Faghih and Kashani (2018) introduced a vector error correction (VEC) model for estimating construction material prices in the United States. The association between construction material pricing and a collection of key explanatory factors was studied using this model. The use of VEC models to anticipate construction material prices filled a gap in the current literature caused by the necessity of forecasting both short and long-term movements of particular construction materials being overlooked.

Kissi et al. (2018) modelled the tender price index (TPI) in Ghana using an autoregressive integrated moving average model with exogenous factors (ARIMAX). The results showed that the ARIMAX model outperformed the single method in terms of predictive ability. The study backs up prior research by emphasizing the importance of using an integrated modeling technique to forecast TPI. Oshodi et al. (2017) studied the accuracy of univariate models for tender price index predictions. The modeling tools used in that study were Box-Jenkins and neural networks, and the results show that the neural network model outperforms the Box-Jenkins model in terms of accuracy. Ilbeigi et al. (2017) characterized and analyzed the observed fluctuations in actual asphalt and cement prices over time in order to create time series forecasting models. That study investigated whether and how time series prediction models can predict future prices with higher accuracy than established approaches. Four univariate time series forecasting models, namely Holt Exponential Smoothing (ES), Holt-Winters ES, Autoregressive Integrated Moving Average (ARIMA), and seasonal ARIMA (SARIMA), were built to study short-term variation in future prices. The forecast results show that all four time series models can predict prices with better accuracy than current approaches such as Monte Carlo simulation. The ARIMA and Holt ES models were the most reliable of the four predictive models, with errors of less than 2%. To forecast future values of the Engineering News-Record (ENR) Construction Cost Index (CCI) over a 12-month period, Ashuri and Lu (2010) employed an ARIMA model that took seasonality into account. The mean absolute error (MAE), mean square error (MSE), and mean absolute percentage error (MAPE) were used to assess forecasting accuracy. Sonmez et al. (2007) employed regression analysis to develop a model with 14 potential independent variables to predict cost contingency in international projects. Abu Hammad et al. (2010) used several explanatory parameters, such as project area and duration, to design a probabilistic regression model for predicting the cost of public building projects.

These models were beneficial for addressing cost escalation issues and for preliminary estimating in the early design phase, but they are limited in handling time-varying variables and in representing different time lags between influencing factors. In fact, much time-related data is dependent or autocorrelated. Applying time-related techniques to anticipate trends in material prices is one way to address these limitations. Time-series approaches, which predict the future movement of a variable from its historical values and other relevant factors, have been used to handle the time-related problems of the aforementioned methodologies. Time-series models forecast trends in a systematic and time-related manner, so that useful projections can be generated from historical trends (Wong et al, 2005).

Research Objective

To make the research presented in this paper suitable for accurate and updatable material price prediction, an automated forecasting system is developed on the basis of both ARIMA and regression modeling, using historical Egyptian data. A time series forecasting model identifies relevant patterns in the history of a variable and predicts future values from those patterns and earlier observations. Regression models account for the fact that price fluctuations are influenced by a variety of independent variables. Accordingly, this paper aims to show that price projection models can be created that perform well in terms of expectation and produce good estimates of future material price volatility, even when data availability is limited. The data used in this analysis come from publicly accessible databases, and the techniques used are available in several statistical software packages (the analysis is conducted in SPSS and EViews). To make this research practical and implementable for both practitioners and academics, the goals of this study are to:

Discover and analyze fluctuations in actual material prices.

Apply this information to develop CMP forecasting models.

Evaluate if the proposed models can estimate future prices more accurately.

To satisfy the research objectives, the remainder of the study is organized as follows. The next section presents the research approach and the steps taken in this study. The material price time series data set is then introduced, and its main features (autocorrelation and stationarity) are studied. The most important indicators influencing the CMP are listed. ARIMA and regression models are constructed based on these properties. Finally, each model's predictive ability is assessed and compared.

Accurate forecasting of construction material prices is an essential practice, particularly in developing countries where high price fluctuations can adversely affect the success of projects and even their viability. To avoid this, a system that can predict the size of the change in material prices with acceptable accuracy is required. As a result, a technique is used, with univariate time-series (ARIMA) and regression approaches used to forecast material prices. Figure 1 presents the process map of the procedures employed in this study. The methods include all important information about the required data, where and how it was obtained, and how a sample was chosen. This method entails four high-level processes, which are briefly outlined below and expanded upon later.

Determine the long-term price trend of construction materials, as well as the most relevant indicators influencing price change over time. Build the ARIMA models from the historical material price data, and build the regression models with the historical prices as the dependent variable and the indicators that have a significant relationship with material prices as independent variables.

Validate model performance relative to existing practice. Out-of-sample projections are used to assess the accuracy of the price projection models against current assumptions: in this step, price forecasts are produced for a period that has already passed and are compared with what actually happened.

Compare results.

Recommend the best fitting model for each material type.

Prediction Of Construction Material Prices

Data description (input data)

Models were created using publicly available price data from CAPMAS Egypt. The materials considered in this research are steel, cement, brick, ceramic, and gravel, which represent an important part of all construction work items.

ARIMA Model

The Box-Jenkins (ARIMA) approach is a relatively advanced time series prediction method that can describe the dynamic behavior of a series realistically. Under certain conditions, it can be used for statistical analysis and forecasting of time series. The model is particularly well suited to short-term forecasting; when the forecast horizon is long, large variances occur. An ARIMA model is built in three stages: (1) model identification, (2) parameter estimation, and (3) diagnostic checking. In the model identification stage, the stationarity of the time-series data is assessed, and non-stationary data are made stationary using ordinary differencing or a logarithmic transformation. The transformed data are then used in the next modeling stage. In prediction form, the ARIMA model can be expressed using Eq. (1). The order of the autoregressive component (p) and the order of the moving average component (q) are established by examining the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots at the relevant lags, which yields candidate ARMA (p, q) models, while the order of differencing (d) is identified in the model identification stage as the number of differences needed to make the data stationary. Fitted ARIMA models are obtained in the parameter estimation stage. EViews version 10, a software package built specifically for processing time-series data, was used for the ARIMA models. The models were built on the published monthly material prices for the last 10 years (January 2011 to December 2020) and the last 5 years (January 2016 to December 2020). After the appropriate model for each material type was tested, it was used to predict prices for the first six months of 2021.
The best model can be chosen as the one with the lowest mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). Most researchers suggest MAPE as the criterion for judgment. For prediction models, a MAPE of up to 10% is commonly considered acceptable (Fan et al, 2010; Hwang et al, 2012). MAPE is calculated using Eq. (2).

$$Y_t = c + \sum_{i=1}^{p} \alpha_i Y_{t-i} + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} + \epsilon_t \qquad (1)$$

where $c$ is a constant; $p$ is the order of the autoregressive (AR) component; $q$ is the order of the moving average (MA) component; $\alpha_i$ are the coefficients of the autoregressive component; $\theta_j$ are the coefficients of the moving average component; $\epsilon_t$ is the error term, and $\epsilon_{t-j}$ are its lagged values.
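As a minimal sketch of how Eq. (1) turns into a one-step-ahead forecast, the Python function below evaluates the AR and MA sums for given coefficients and histories. The function name `arma_one_step`, the coefficient values, and the sample histories are hypothetical and are not taken from the fitted models in this study; the unknown future error term is replaced by its expected value of zero.

```python
def arma_one_step(c, alphas, thetas, y_hist, e_hist):
    """One-step-ahead ARMA(p, q) forecast following Eq. (1).

    y_hist and e_hist are ordered oldest to newest. The future error
    term is unknown, so it is set to its expected value of zero.
    """
    p, q = len(alphas), len(thetas)
    if len(y_hist) < p or len(e_hist) < q:
        raise ValueError("not enough history for the requested orders")
    # alphas[i] multiplies the value i + 1 steps back, and likewise for thetas.
    ar_part = sum(alphas[i] * y_hist[-1 - i] for i in range(p))
    ma_part = sum(thetas[j] * e_hist[-1 - j] for j in range(q))
    return c + ar_part + ma_part


# Hypothetical ARMA(1, 1) coefficients and history (illustration only).
forecast = arma_one_step(c=0.5, alphas=[0.6], thetas=[0.3],
                         y_hist=[1.0, 2.0], e_hist=[0.1, -0.2])
print(round(forecast, 4))
```

For an ARIMA model, the same recursion would be applied to the differenced series rather than to the raw prices.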

Mean absolute percentage error equation:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{Y_t - f_t}{Y_t}\right| \times 100 \qquad (2)$$

where $Y_t$ is the actual value at any specified time; $f_t$ is the forecasted value at any specified time; and $n$ is the number of forecasts.
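Eq. (2) can be sketched directly in Python. The function name `mape` and the sample prices below are illustrative assumptions, not data from this study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, Eq. (2), returned in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((y - f) / y) for y, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical actual and forecasted prices (illustration only).
actual_prices = [100.0, 200.0, 400.0]
forecast_prices = [110.0, 190.0, 400.0]
error_percent = mape(actual_prices, forecast_prices)
print(round(error_percent, 2))
```

Because each error is divided by the actual value, the measure is scale-free, which is what allows materials priced in very different ranges to be compared on one criterion.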

Multiple Regression Models

Multiple linear regression (MLR) is a linear statistical technique for investigating the relationship between a dependent variable and two or more independent variables. It encompasses numerous techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. Regression analysis helps to show how the typical value of the dependent variable (or 'criterion variable') changes when one of the independent variables is varied while the others are held constant. As a result, it provides a strong basis for predicting price changes. For the regression models, IBM SPSS Statistics version 25 was used to analyze the data. A set of nineteen indicators that can potentially influence the CMP was identified through an extensive literature review. The collected information was split into two categories: independent input variables and dependent output variables. When several inputs are used to predict an output, the primary assumption is that these inputs are independent variables that predict the dependent output variable. Raw prices are treated as the dependent variable, and the indicators affecting construction material prices are used as independent variables. The indicators used in this study all have publicly available data on one of the official websites listed in Table 1. If $Y$ is the dependent variable and $X_1, \dots, X_n$ are the independent variables, the multiple regression model predicts $Y$ from them as follows:

$$Y = C + b_1 X_1 + b_2 X_2 + \dots + b_n X_n \qquad (3)$$

Where Y denotes the output of the dependent variable, C denotes the constant, b denotes regression coefficients, and X denotes the input of independent variables.
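The study fitted Eq. (3) in SPSS. As a rough sketch of the underlying ordinary least squares computation, and not of the SPSS procedure itself, the pure-Python function below solves the normal equations for a small data set. The function name `fit_mlr` and the sample data are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares fit of Y = C + b1*X1 + ... + bn*Xn.

    X is a list of rows of predictor values. Returns [C, b1, ..., bn].
    """
    rows = [[1.0] + list(r) for r in X]  # leading 1.0 gives the constant C
    m = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y).
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * t for r, t in zip(rows, y))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


# Hypothetical data generated from Y = 2 + 3*X1 - 1*X2 (illustration only).
X_sample = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y_sample = [3, 7, 7, 11, 12]
coefficients = fit_mlr(X_sample, y_sample)
print([round(b, 6) for b in coefficients])
```

Solving the normal equations directly keeps the sketch dependency-free; statistical packages typically use more numerically stable decompositions and also report the significance statistics used later in this study.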

The following assumptions govern the multiple regression models:

Linearity: The dependent variable $Y$ is a linear combination of the independent variables $X_1, \dots, X_n$.

Independence: Observations are chosen from the population independently and randomly.

Normality: Observations are normally distributed.

Variance homogeneity: All observations have the same variance.

Regression models were created for each type of material. Each model was then used to predict future prices using the out-of-sample method, with the first six months of 2021 as the prediction period. The predictions were compared with the actual values: the percentage error was calculated for each value separately, and the MAPE was then calculated for each model following Eq. (2).

Table 1

Indicators, references and their data source.
No.	Indicator	Paper	Year	source
1.	Unemployment rate	Faghih and Kashani	(2018)	CAPMAS^(a)
2.	Employment rate (ER)	Shahandashti and Ashuri	(2016)	CAPMAS^(a)
3.	industrial production (IP)	Grum and Govekar	(2016)	Trading Economics ^(b)
4.	Exchange rate	Akanni et al.,	(2014)	Central bank of Egypt ^(c)
5.	Interest rate	Akanni et al.,	(2014)	Central bank of Egypt ^(c)
6.	EGX index	Shahandashti and Ashuri	(2013)	The Egyptian Exchange ^(d)
7.	wages (Average hourly earnings (AVE))	Shahandashti and Ashuri	(2013)	CAPMAS^(a)
8.	Balance of Payment	Oladipo and Oni	(2012)	World bank ^(e)
9.	Export	Oladipo and Oni	(2012)	World bank ^(e)
10.	External Debt	Oladipo and Oni	(2012)	World bank ^(e)
11.	External Reserve	Oladipo and Oni	(2012)	World bank ^(e)
12.	Import	Oladipo and Oni	(2012)	World bank ^(e)
13.	Inflation rate	Windapo and Cattell	(2012)	CAPMAS^(a)
14.	National revenue	Ashuri et al.,	(2012)	The global Economy ^(f)
15.	Producer Price Index	Ashuri et al.,	(2012)	CAPMAS^(a)
16.	National expenditure	Chen, H. L.	(2010)	The global Economy ^(f)
17.	GDP-construction	Ng et al.,	(2000)	CAPMAS^(a)
18.	Gross domestic product(GDP)	Ng et al.,	(2000)	CAPMAS^(a)
19.	Money supply (MS)	Ng et al.,	(2000)	Trading Economics ^(b)
(a) https://www.capmas.gov.eg/ (b) https://tradingeconomics.com/egypt/ (c) https://www.cbe.org.eg/ar/Pages/default.aspx (d) https://egx.com.eg/en/homepage.aspx (e) https://data.worldbank.org/ (f) https://www.theglobaleconomy.com/Egypt/

Results of Regression modeling

The aim of using multiple regression models in this research is to find out whether the relationship between prices and the influencing indicators can be described by equations. Interpreting the results involves (1) analyzing the data, (2) estimating the model, i.e. fitting the line, and (3) evaluating the validity and efficiency of the model. SPSS software was used to analyze the data. The actual price is the dependent variable (Y) in the regression analysis, and the independent variables (X) are those listed in Table 1.

Analyzing The Data

This study aims to create prediction models using only significant indicators, i.e., only indicators with strong t-statistics and a significance value of less than 0.05 are used in the prediction process. As a result, the final model may not include all of the selected indicators. These tests of significance are useful for determining whether each explanatory variable is required in the model, assuming that the others are already present. The "P-value" columns in Table 2 report the significance level. Taking steel as an example, the indicators inflation rate, GDP-construction, GDP, revenue, expenditure, industrial production, import, export, external reserve, and balance of payment have p-values of 0.058, 0.635, 0.983, 0.983, 0.313, 0.520, 0.322, 0.444, 0.801, and 0.983 respectively, all greater than 0.05. The test therefore indicates that these indicators are not significant for the modeling process, while the other indicators, which have p-values of 0.00 (< 0.05), contribute significantly to explaining the change in steel prices, as indicated in Table 2.
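The screening rule described above can be sketched in a few lines of Python. The p-values are the steel values reported in Table 2; the variable names and the dictionary layout are illustrative choices, and the sketch reproduces only the filtering step, not the SPSS estimation.

```python
# Steel p-values as reported in Table 2.
steel_p_values = {
    "Inflation rate": 0.058, "Exchange rate": 0.00, "Interest rate": 0.00,
    "GDP-construction": 0.635, "Producer Price Index": 0.00,
    "Gross domestic product (GDP)": 0.983, "EGX index": 0.00,
    "Employment rate (ER)": 0.00, "Revenue": 0.983, "Expenditure": 0.313,
    "wages": 0.00, "industrial production (IP)": 0.520, "Import": 0.322,
    "Export": 0.444, "External Reserve": 0.801, "Money supply (MS)": 0.00,
    "External Debt": 0.00, "Unemployment rate": 0.00,
    "Balance of payment": 0.983,
}

SIGNIFICANCE_LEVEL = 0.05  # threshold stated in the text

# Keep an indicator only if its p-value is below the threshold.
significant = [name for name, p in steel_p_values.items()
               if p < SIGNIFICANCE_LEVEL]
dropped = [name for name, p in steel_p_values.items()
           if p >= SIGNIFICANCE_LEVEL]

print(len(significant), "kept;", len(dropped), "dropped")
```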

Estimated Models Coefficients

The general forms of the equations for predicting the prices of the aforementioned material types are obtained from Table 2. Each coefficient shows how much the dependent variable changes with an independent variable when all other independent variables are held constant; that is, the regression coefficient gives the expected change in the dependent variable for a one-unit increase in the independent variable.

Table 2. Equations’ coefficients for the Predicted models.
Model	Steel		cement		Brick		ceramic		Gravel
Model	coefficient	P-value	coefficient	P-value	coefficient	P-value	Coefficient	P-value	coefficient	P-value
constant	-6842.9		-74.664		834.04		11.653		59.7
Inflation rate	_	0.058	_	0.876	7.85	0.00	_	0.479	_	0.583
Exchange rate	140.05	0.00	_	0.639	_	0.808	1.253	0.00	_	0.123
Interest rate	370.7	0.00	_	0.605	6.8	0.00	-0.763	0.00	_	0.642
GDP-construction	_	0.635	_	0.741	_	0.690	_	0.902	_	0.202
Producer Price Index	15.8	0.00	_	0.538	_	0.646	0.162	0.00	0.144	0.00
Gross domestic product (GDP)	_	0.983	_	0.736	_	0.854	_	0.782	_	0.076
EGX index	2.3E-09	0.00	3.2E-10	0.00	_	0.796	-3E-11	0.00	_	0.096
Employment rate (ER)	184.9	0.00	_	0.549	_	0.547	_	0.477	_	0.671
Revenue	_	0.983	_	0.992	-0.002	0.00	_	0.988	_	0.397
Expenditure	_	0.313	0.601	0.00	1.64	0.00	0.029	0.00	_	0.342
wages	-1.8	0.00	0.211	0.00	_	0.258	0.009	0.00	_	0.438
industrial production (IP)	_	0.520	-26.023	0.00	24.95	0.00	_	0.193	_	0.346
Import	_	0.322	_	0.967	_	0.854	_	0.915	_	0.939
Export	_	0.444	_	0.985	_	0.854	_	0.886	_	0.915
External Reserve	_	0.801	3.046	0.00	_	0.333	0.274	0.00	0.303	0.00
Money supply (MS)	1404.9	0.00	63.382	0.00	144.163	0.00	_	0.775	19.64	0.00
External Debt	-2E-08	0.00	-6E-09	0.00	-7E-09	0.00	-2E-10	0.00	-2E-10	0.00
Unemployment rate	-270.7	0.00	13.751	0.00	-49.7	0.00	_	0.374	-1.7	0.00
Balance of payment	_	0.983	_	0.969	-4E-09	0.00	-3.193E-10	0.00	_	0.915
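To illustrate how a column of Table 2 is read as an instance of Eq. (3), the sketch below assembles the gravel model from its constant and its five significant coefficients. The coefficients are those in Table 2; the function name `predict_price` and all indicator values in `example_indicators` are hypothetical placeholders, since the units of the indicators are not stated here, so the resulting number has no real-world meaning.

```python
# Gravel model: constant and significant coefficients from Table 2.
GRAVEL_MODEL = {
    "constant": 59.7,
    "coefficients": {
        "Producer Price Index": 0.144,
        "External Reserve": 0.303,
        "Money supply (MS)": 19.64,
        "External Debt": -2e-10,
        "Unemployment rate": -1.7,
    },
}


def predict_price(model, indicators):
    """Evaluate Y = C + b1*X1 + ... + bn*Xn for one set of indicator values."""
    return model["constant"] + sum(
        coef * indicators[name]
        for name, coef in model["coefficients"].items()
    )


# Hypothetical indicator values (placeholders, not real Egyptian data).
example_indicators = {
    "Producer Price Index": 100.0,
    "External Reserve": 10.0,
    "Money supply (MS)": 1.0,
    "External Debt": 1e9,
    "Unemployment rate": 10.0,
}
predicted = predict_price(GRAVEL_MODEL, example_indicators)
print(round(predicted, 2))
```

Inputs must be expressed in the same units as the data used to fit the model; the very small coefficients on External Debt and the EGX index suggest those series were entered in raw currency or index units.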

Determine The Suitability Of The Models

The model summaries are given in Table 3. This table provides the values of R, R square (R²), and adjusted R² for each estimate, which can be used to determine how appropriate the regression models are for the data.

Table 3

Model summary
Material type	Model summary
Material type	R	R Square	Adjusted R Square
Steel	0.995	0.990	0.990
Cement	0.981	0.963	0.960
Brick	0.994	0.987	0.986
Ceramic	0.994	0.988	0.987
Gravel	0.997	0.993	0.993

The "R" column reports the multiple correlation coefficient, which can be thought of as a measure of how accurately the dependent variable is predicted. For the steel model, a value of 0.995 implies a good level of predictability. The R² value in the "R Square" column (also known as the coefficient of determination) indicates the proportion of variance in the dependent variable that can be explained by the independent variables. The steel model's value of 0.99 shows that the independent variables account for 99 percent of the variability in the dependent variable. R² appears to be a simple statistic that measures how well a regression model fits a set of data, but on its own it does not give the full picture: it must be read together with residual plots, other statistics, and an in-depth understanding of the subject area. Another key issue is the correct interpretation of the "Adjusted R Square" (adj. R²). In this example, a value of 0.99 (Table 3) shows that the predictors retained in the model explain 99 percent of the variance in the outcome variable. A large difference between the R² and adjusted R² values suggests a poor model fit. Any superfluous variable introduced into a model reduces adjusted R², whereas adjusted R² rises when more useful variables are included. Adjusted R² is always less than or equal to R². In this way, adjusted R² compensates for the number of terms in a model.
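The relationship between R² and adjusted R² can be made concrete with the standard adjustment formula, adj. R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The sketch below applies it to the cement model, using R² = 0.963 from Table 3 and k = 8 significant predictors from Table 2. The sample size n = 120 is an assumption (monthly data from January 2011 to December 2020); the number of observations used in the regression is not stated explicitly.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Cement: R-squared from Table 3, eight significant predictors from Table 2.
# n = 120 is an assumed sample size (ten years of monthly prices).
cement_adj_r2 = adjusted_r2(r2=0.963, n=120, k=8)
print(round(cement_adj_r2, 3))
```

Under these assumptions the result rounds to 0.960, in line with the adjusted value reported for cement in Table 3. Because (n - 1)/(n - k - 1) is at least 1, the adjusted value can never exceed R².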

The histogram of residuals for the steel model, taken as an example, is shown in Fig. 2(a); it indicates that the residuals are approximately normally distributed. In the plot of the regression standardized residuals versus the regression standardized predicted values, the points are roughly randomly distributed, indicating that the assumption of homoscedasticity, or equality of variances, is satisfied.

Results Of ARIMA Modeling

Stationary test

The general ARIMA modeling and prediction procedure is outlined in this section and depicted in Fig. 1. It is worth noting that this is not a strictly sequential procedure; it can contain iterative loops depending on the results of the diagnostic and forecasting stages. Because the ARIMA model is applied to stationary time-series data, the data must first be shown to be stationary in mean and variance. The historical price data for steel, cement, brick, gravel, and ceramic are plotted in Fig. 4. On first inspection, the data were non-stationary for all material types. The natural logarithm of each material's price series was taken to reduce its non-stationarity, and the augmented Dickey-Fuller (ADF) test was applied to the logged series; the ADF statistic was still greater than the critical values at the 0.01 and 0.05 significance levels. The first-order difference was then taken, giving the DLsteel, DLcement, DLgravel, DLceramic, and DLbrick series. After the logarithm and the first difference were taken, the ADF statistic became smaller than the critical value for each of these materials. That is, the series became stationary and passed the significance test for stationarity, as shown in Fig. 3.
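The transformation behind the DL series, a natural logarithm followed by a first difference, can be sketched as below, together with the inverse step needed to turn forecasts of the transformed series back into price levels. The function names and the sample prices are hypothetical, and the ADF test itself (run in EViews in this study) is not reproduced here.

```python
import math


def log_diff(prices):
    """First difference of the natural log of a positive price series."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive to take logarithms")
    logs = [math.log(p) for p in prices]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


def undo_log_diff(last_price, dl_forecasts):
    """Convert forecasts of the DL series back to price levels."""
    level = math.log(last_price)
    prices = []
    for step in dl_forecasts:
        level += step          # accumulate log changes
        prices.append(math.exp(level))
    return prices


# Hypothetical monthly prices (illustration only).
sample_prices = [100.0, 110.0, 121.0]
dl_series = log_diff(sample_prices)
restored = undo_log_diff(sample_prices[0], dl_series)
print([round(p, 2) for p in restored])
```

Each element of the DL series is approximately the month-to-month percentage change in price, which is why this transformation often removes both the trend and the growth in variance.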

Model Identification

Once the order of differencing required to make the series stationary has been determined, the next step is to find a suitable ARMA form to model the stationary series. The classic method is the Box-Jenkins procedure, which involves an iterative process of model identification, model estimation, and model evaluation. The Box-Jenkins process is a semi-formal approach that relies on subjective evaluation of plots of the auto-correlogram and partial auto-correlogram of the series to identify models. Plotting the auto-correlogram of a time series is also a way to investigate its characteristics: it shows the autocorrelation of the series at different lag lengths, and it must be plotted before a Box-Jenkins model can be identified. The Box-Jenkins technique involves evaluating plots of the sample auto-correlogram, partial auto-correlogram, and inverse auto-correlogram and inferring the correct type of ARMA model from the patterns detected in these functions, which are compared with the theoretical auto-correlograms for various orders of AR, MA, and ARMA models. EViews software was used to compute the autocorrelation function (ACF) and partial autocorrelation function (PACF) for all of the aforementioned material types. Figure 4 shows the ACF and PACF for steel and cement as an example of the model identification process.
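As a minimal sketch of what the auto-correlogram contains, the function below computes the sample autocorrelations of a series, and a second helper gives the approximate 95% bound, 1.96/√N, that is commonly drawn on correlogram plots to judge which lags are significant. Both function names and the sample series are hypothetical, and the PACF is not covered by this sketch.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a one-dimensional series."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def significance_bound(n):
    """Approximate 95% bound for autocorrelations of a white-noise series."""
    return 1.96 / math.sqrt(n)


# Hypothetical series standing in for a differenced log-price series.
sample_series = [1.0, 2.0, 3.0, 4.0, 5.0]
acf_values = sample_acf(sample_series, max_lag=2)
bound = significance_bound(len(sample_series))
print([round(r, 3) for r in acf_values], round(bound, 3))
```

Lags whose autocorrelation lies outside the bound are the ones that suggest AR or MA terms; the same check applied to model residuals is one way to judge whether they resemble white noise.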

Identify The Most Significant Model

Because the Box-Jenkins methodology is highly subjective, time series analysts have sought alternative, objective approaches for identifying ARMA models. Examples of such identification criteria, which analysts use to select the model that minimizes error, are the Akaike Information Criterion [AIC] or Final Prediction Error [FPE] criterion (Akaike, 1974) and the Schwarz Criterion [SC] or Bayesian Information Criterion [BIC]. In this study, eight models were fitted for each type of material, and the best model was chosen based on the values of adjusted R-squared, the Akaike info criterion (AIC), and the Schwarz criterion (SC). However, the lowest AIC and SC values alone are not sufficient conditions for the best ARMA model. The procedure followed in this study was to first select the model with the lowest Root Mean Square Error (RMSE), AIC, and SC values, and then run a parameter significance test and a residual randomness test on the estimation result. If the model passes these tests, it is taken as the best model. If it fails, the model with the next-lowest AIC and SC values is chosen and the same tests are run, and so on until a model passes. Table 4 shows the most significant model chosen for each type of material. The criteria used for the judgment take the following form:
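The expressions below are a sketch of the conventional definitions of these criteria, not a transcription of the authors' formulas. The information criteria are written in the per-observation form that EViews reports, and $\ell$, the maximized log-likelihood of the fitted model, is a symbol assumed here rather than one defined in the text.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2}, \qquad R^2 = 1 - \frac{\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{N}\left(y_t - \bar{y}\right)^2}$$

$$\mathrm{AIC} = -\frac{2\ell}{N} + \frac{2k}{N}, \qquad \mathrm{SC} = -\frac{2\ell}{N} + \frac{k \ln N}{N}$$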

Where $N$ = total number of data points; $y_t$ = actual material price; $\hat{y}_t$ = forecasted material price; $\bar{y}$ = mean of actual material prices; and $k$ = total number of estimated parameters.

Table 4

Significant model for each type of material.
Criteria	Steel	Cement	Gravel	Brick	Ceramic
Criteria	ARIMA (1,1,0)	ARIMA (2,1,0)	ARIMA (1,1,0)	ARIMA (1,1,1)	ARIMA (1,1,0)
AIC	-3.31	-2.96	-4.6	-5	-4.02
SC	-3.24	-2.89	-4.5	-4.9	-3.9
R²	.05	.089	.009	.04	.001
RMSE	.22	.12	0.1	.05	.07

Model Diagnostic

The next stage is the formal evaluation of each time series model, which entails a thorough examination of each model's diagnostic tests. A variety of diagnostic techniques are available to ensure that an acceptable model has been built. One useful diagnostic check is to plot the estimated model's residuals, which should highlight any outliers that may affect the parameter estimates, as well as any potential autocorrelation or heteroscedasticity issues. Plotting the auto-correlogram of the residuals provides the second test of model adequacy. If the model is correctly specified, the residuals should be 'white noise', so their auto-correlogram should show no significant autocorrelation at any non-zero lag.

Comparison Of ARIMA And Regression Prediction Models

To validate the proposed time series models, the predictive accuracy of the Box–Jenkins (ARIMA) models was compared with that of structural multiple regression models, using the actual material price series as the basis. The validity of each model was tested by comparing actual and predicted values over a six-month out-of-sample period from January 2021 to June 2021. The difference in predictive accuracy between the regression and ARIMA models was not found to be very significant, as shown in Table 5 and Fig. 6. Given the small forecast errors of both models, both can be said to have performed well in prediction. On the test data, however, the ARIMA models outperform the regression models in forecasting accuracy, as shown in Fig. 6. This finding demonstrates that, for material prices recorded as time-series data, time series models are able to predict well. The recommended model for each type of material, chosen according to the mean absolute percentage error (MAPE), is denoted by * in Table 5.

Table 5

Best-fitting model for each type of material according to MAPE value (%).
Material type	ARIMA (10 years of historical data)	ARIMA (5 years of historical data)	Regression
Steel	7.8	2.8*	12.3
Cement	1.8	1.7*	7.1
Brick	1.5*	3	21
Ceramic	2	1.9*	9
Gravel	1.4*	2.6	3
(*) Best model for prediction
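As a brief illustration of how the comparison works, the sketch below computes MAPE for a pair of series and then picks, for each material, the model with the lowest MAPE. The `mape` function and the six-month example series are assumptions made for illustration (the example prices are invented, not data from the study); the `table5` dictionary copies the MAPE values reported in Table 5.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Invented six-month price series, for illustration only.
actual = [100.0, 102.0, 105.0, 103.0, 108.0, 110.0]
forecast = [101.0, 101.0, 104.0, 105.0, 107.0, 112.0]
print(round(mape(actual, forecast), 2))

# MAPE values (%) as reported in Table 5.
table5 = {
    "Steel":   {"ARIMA (10 years)": 7.8, "ARIMA (5 years)": 2.8, "Regression": 12.3},
    "Cement":  {"ARIMA (10 years)": 1.8, "ARIMA (5 years)": 1.7, "Regression": 7.1},
    "Brick":   {"ARIMA (10 years)": 1.5, "ARIMA (5 years)": 3.0, "Regression": 21.0},
    "Ceramic": {"ARIMA (10 years)": 2.0, "ARIMA (5 years)": 1.9, "Regression": 9.0},
    "Gravel":  {"ARIMA (10 years)": 1.4, "ARIMA (5 years)": 2.6, "Regression": 3.0},
}

# For each material, keep the model with the smallest MAPE.
best = {material: min(scores, key=scores.get) for material, scores in table5.items()}
for material, model in best.items():
    print(material, "->", model)
```

Selecting the minimum in each row reproduces the asterisks in Table 5: the five-year ARIMA model for steel, cement, and ceramic, and the ten-year ARIMA model for brick and gravel.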

Construction project partners commonly struggle to estimate future material prices precisely, especially in the face of economic volatility. This difficulty can deter developers from investing, reduce contractor profitability, and cause owners to delay payments. The techniques proposed in this research are intended to help construction contractors and owners estimate material prices more accurately. The prediction models draw on the predictive power of ARIMA, which learns from historical trends, and on the power of regression models, which take the affecting indicators into consideration. Although many studies have presented prediction models, the value of this study lies in its ability to forecast price fluctuations even in economically unstable circumstances, which other approaches would not have been able to capture.

In the context of the Egyptian construction sector, this research proposes six-month price prediction models for steel reinforcing bars, Portland cement, brick, ceramic, and gravel. To this end, relevant Egyptian indicators were collected and associated with material prices during the study period, which ran from November 2018 to January 2020. ARIMA models were built from the historical price data of each type of material, using the data available from CAPMAS Egypt. Regression models were built using the historical price data as dependent variables and the most important quantifiable indicators as independent variables. Each model was evaluated using the mean absolute percentage error (MAPE) of its predictions. The results of this research indicate that, because construction material prices form time-series data, ARIMA models outperformed the regression models in predicting future material prices, with a very small error rate.

Conflict of interest

The authors report no conflicts of interest and have no competing interests relevant to the subject matter of this study. The authors alone are responsible for the content and writing of this article. This research received no specific grant from any funding organization.

References

Abu Hammad, A. A., Ali, S. M. A., Sweis, G. J., & Sweis, R. J. (2010). Statistical analysis on the cost and duration of public building projects. Journal of Management in Engineering, 26(2), 105-112.
Akanni, P. O., Oke, A. E., & Omotilewa, O. J. (2014). Implications of rising cost of building materials in Lagos State Nigeria. SAGE Open, 4(4), 2158244014561213.
Anderson, S. D., Molenaar, K. R., & Schexnayder, C. J. (2006). Guidance for cost estimation and management for highway projects during planning, programming, and preconstruction. NCHRP Rep. No. 574, Transportation Research Board, Washington, DC.
Ashuri, B., & Lu, J. (2010). Forecasting ENR construction cost index: A time series analysis approach. Construction Research Congress 2010, ASCE, Reston, VA, 1345–1355.
Ashuri, B., & Shahandashti, S. M. (2012, April). Quantifying the relationship between construction cost index (CCI) and macroeconomic factors in the United States. In Proceedings of the 48th ASC Annual International Conference, Birmingham City University, Birmingham, April 11 (Vol. 14).
Back, W. E., Boles, W. W., & Fry, G. T. (2000). Defining triangular probability distributions from historical cost data. Journal of Construction Engineering and Management, 126(1), 29-37.
CAPMAS (Central Agency for Public Mobilization and Statistics). Monthly bulletin of average retail prices of major important building materials. Arab Republic of Egypt: Public Information Center, Printing Office.
Chen, H. L. (2010). Using financial and macroeconomic indicators to forecast sales of large development and construction firms. The Journal of Real Estate Finance and Economics, 40(3), 310-331.
Faghih, S. A. M., & Kashani, H. (2018). Forecasting construction material prices using vector error correction model. Journal of Construction Engineering and Management, 144(8), 04018075.
Fan, R. Y., Ng, S. T., & Wong, J. M. (2010). Reliability of the Box–Jenkins model for forecasting construction demand covering times of economic austerity. Construction Management and Economics, 28(3), 241-254.
Grum, B., & Govekar, D. K. (2016). Influence of macroeconomic factors on prices of real estate in various cultural environments: Case of Slovenia, Greece, France, Poland and Norway. Procedia Economics and Finance, 39, 597-604.
Hwang et al. (2012). Automated Time-Series Cost Forecasting System for Construction Materials.
Hwang, S. (2011). Time series models for forecasting construction costs using time series indexes. Journal of Construction Engineering and Management, 137(9), 656-662.
Ilbeigi, M., Ashuri, B., & Joukar, A. (2017). Time-series analysis for forecasting asphalt-cement price. Journal of Management in Engineering, 33(1), 04016030.
Kissi, E., Adjei-Kumi, T., Amoah, P., & Gyimah, J. (2018). Forecasting construction tender price index in Ghana using autoregressive integrated moving average with exogenous variables model. Construction Economics and Building, 18(1), 70-82.
Laryea, S., & Hughes, W. (2009). How contractors in Ghana include risk in their bid prices. Proc., 25th Annual ARCOM Conf., Association of Researchers in Construction Management, Nottingham, U.K., 1295–1304.
Lee, C., Won, J., & Lee, E. B. (2019). Method for predicting raw material prices for product production over long periods. Journal of Construction Engineering and Management, 145(1), 05018017.
Lowe, D. J., Emsley, M. W., & Harding, A. (2006). Predicting construction cost using multiple regression techniques. Journal of Construction Engineering and Management, 132(7), 750-758.
Marzouk, M., & Amin, A. (2013). Predicting construction materials prices using fuzzy logic and neural networks. Journal of Construction Engineering and Management, 139(9), 1190-1198.
Oladipo, F. O., & Oni, O. J. (2012). Review of selected macroeconomic factors impacting building material prices in developing countries–A case of Nigeria. Ethiopian Journal of Environmental Studies and Management, 5(2), 131-137.
Oshodi, O. S., Ejohwomu, O. A., Famakin, I. O., & Cortez, P. (2017). Comparing univariate techniques for tender price index forecasting: Box-Jenkins and neural network model. Construction Economics and Building, 17(3), 109-123.
Shahandashti, S. M., & Ashuri, B. (2013). Forecasting engineering news-record construction cost index using multivariate time series models. Journal of Construction Engineering and Management, 139(9), 1237-1243.
Shahandashti, S. M., & Ashuri, B. (2016). Highway construction cost forecasting using vector error correction models. Journal of Management in Engineering, 32(2), 04015040.
Shiha, A., Dorra, E. M., & Nassar, K. (2020). Neural networks model for prediction of construction material prices in Egypt using macroeconomic indicators. Journal of Construction Engineering and Management, 146(3), 04020010.
Sonmez, R., Ergin, A., & Birgonul, M. T. (2007). Quantitative methodology for determination of cost contingency in international projects. Journal of Management in Engineering, 23(1), 35-39.
Thomas Ng, S., Cheung, S. O., Martin Skitmore, R., Lam, K. C., & Wong, L. Y. (2000). Prediction of tender price index directional changes. Construction Management and Economics, 18(7), 843-852.
Williams, T. P. (1994). Predicting changes in construction cost indexes using neural networks. Journal of Construction Engineering and Management, 120(2), 306-320.
Windapo, A., & Cattell, K. (2012). Examining the trends in building material prices: built environment stakeholders’ perspectives. Manage Construct Res Pract, 1, 187-201.
Wong, J. M., Chan, A. P., & Chiang, Y. H. (2005). Time series forecasts of the construction labour market in Hong Kong: the Box‐Jenkins approach. Construction Management and Economics, 23(9), 979-991.
