Data
The current study used secondary data from the Rwanda National Health Accounts (NHA) produced by the Rwanda Ministry of Health (MOH) in partnership with the World Health Organization (WHO). The data give detailed information on Rwanda’s healthcare expenditure from both public and private sources. Since the results of this study depend on the precision and accuracy of the data, the study used quarterly time series data from 2006 to 2018 with the following variables: total healthcare expenditure at current prices, government healthcare expenditure, and household healthcare expenditure.
Models
In the analysis of historical data, time series forecasting is a widely used technique for predicting a country’s healthcare expenditure. However, healthcare expenditure data contain a mixture of linear and non-linear characteristics, so a single model cannot be expected to capture all of their patterns. As a result, a variety of methods, including linear and non-linear time series methods, are used for prediction (Jianqing Fan, 2003).
Autoregressive Integrated Moving Average (ARIMA) models
The ARIMA model is among the most widely used methods for modeling non-stationary data. In an ARIMA model, the series \({r}_{t}\) is expressed as a function of its own lagged values and stochastic error components, which is not the case in standard linear regression models. The general form of the model is ARIMA(p, d, q). With B denoting the backshift operator (\(B{r}_{t}={r}_{t-1}\)), an ARIMA model is presented as:
If \({w}_{t}={\varDelta }^{d}{r}_{t}={(1-B)}^{d}{r}_{t} \left(1\right)\)
Then \({w}_{t}={\phi }_{1}{w}_{t-1}+{\phi }_{2}{w}_{t-2}+\dots +{\phi }_{p}{w}_{t-p}+{\epsilon }_{t}-{\theta }_{1}{\epsilon }_{t-1}-{\theta }_{2}{\epsilon }_{t-2}-\dots -{\theta }_{q}{\epsilon }_{t-q} \left(2\right)\)
The seasonal autoregressive integrated moving average (SARIMA) model is written as (p, d, q)(P, D, Q), where p is the number of autoregressive terms, q is the number of moving-average terms, and d is the number of times the series must be differenced to induce stationarity. Similarly, P is the number of seasonal autoregressive terms, Q is the number of seasonal moving-average terms, and D is the number of seasonal differences required for the series to be stationary (Box & Jenkins, 1994; Brockwell & Davis, 1996).
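The differencing step in Eq. (1), and its seasonal counterpart, can be illustrated with a short sketch. The function names and the numbers below are hypothetical and purely illustrative (they are not taken from the NHA data), and the seasonal period s = 4 is an assumption based on the quarterly frequency of the series.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) once: w_t = r_t - r_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def arima_difference(series, d=0, D=0, s=4):
    """Apply D seasonal differences of period s, then d regular differences.

    s = 4 is an assumption reflecting quarterly data.
    """
    out = list(series)
    for _ in range(D):
        out = difference(out, lag=s)
    for _ in range(d):
        out = difference(out, lag=1)
    return out


# Made-up quarterly values with a trend and a repeating seasonal pattern.
r = [10, 12, 15, 11, 13, 15, 18, 14, 16, 18, 21, 17]

w = arima_difference(r, d=1, D=0)                 # regular difference, d = 1
w_seasonal = arima_difference(r, d=0, D=1, s=4)   # seasonal difference, D = 1
```

In this toy series, the seasonal difference removes the repeating quarterly pattern and leaves a constant, whereas the regular difference still shows the pattern. Each difference shortens the series by its lag.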
Identification
When a positive d is set to transform a non-stationary series, the orders (p and q) of the autoregressive and moving-average terms must be identified so that the model represents the dominant features of the data. For this purpose, graphical procedures based on plots of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) were used.
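As an illustration of the quantities behind the ACF and PACF plots, the following minimal sketch computes sample autocorrelations and then derives partial autocorrelations with the Durbin-Levinson recursion. The function names and the input values are hypothetical; in practice a statistical package would normally produce these plots.

```python
def acf(x, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    rho = acf(x, max_lag)
    out = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1..k-1} from the previous step
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi
    return out


# Made-up differenced series, for illustration only.
x = [1, 3, 2, 5, 4, 6, 5, 8]
rho = acf(x, 2)
partial = pacf(x, 2)
```

A common rule of thumb is that a PACF that cuts off after lag p suggests an AR(p) term, while an ACF that cuts off after lag q suggests an MA(q) term.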
Estimation
When estimating the coefficients φ1...φp and θ1...θq of the AR and MA processes, respectively, the estimation is guided by the PACF and ACF, respectively. If the series is a pure autoregressive process, the coefficients are estimated by the least-squares method. If the series contains an MA or ARMA component, non-linear estimation techniques such as maximum likelihood are used, with the parameters obtained through numerical optimization algorithms.
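For the pure autoregressive case, the least-squares idea can be shown in its simplest form. The sketch below estimates the single coefficient of an AR(1) model without a constant, matching the form of Eq. (2) with p = 1 and q = 0. The function name and the data are hypothetical, and models with MA terms would need numerical optimization instead.

```python
def fit_ar1_least_squares(w):
    """Least-squares estimate of phi_1 in w_t = phi_1 * w_{t-1} + e_t.

    Minimizing the sum of squared errors gives
    phi_1 = sum(w_t * w_{t-1}) / sum(w_{t-1}^2).
    """
    lagged = w[:-1]
    current = w[1:]
    numerator = sum(c * l for c, l in zip(current, lagged))
    denominator = sum(l * l for l in lagged)
    return numerator / denominator


# Made-up noiseless series that halves at every step, so phi_1 should be 0.5.
w = [8.0, 4.0, 2.0, 1.0, 0.5]
phi_1 = fit_ar1_least_squares(w)
```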
Diagnostic Checking
After a model has been fitted using the procedures described above, the next step is to carry out diagnostic checking. Diagnostic checking refers to statistical tests on the behavior of the residuals and on the order of the model. If the fitted model adequately represents the data-generating process, the residuals behave as white noise.
This means that the residuals must not be autocorrelated. Autocorrelation of the residuals is examined with the Ljung-Box Q statistic (1978, 1979), defined as:
$${Q}_{LB}=n\left(n+2\right)\sum _{s=1}^{m}\frac{{\widehat{\rho }}_{s}^{2}}{n-s} \left(3\right)$$
Where,
n: denotes the number of observations
m: denotes the number of auto-correlation coefficients, with \(m=\sqrt{n}\)
\({\widehat{\rho }}_{s}\): denotes the sample residual auto-correlation at lag s.
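Eq. (3) translates directly into code. The following is a minimal sketch under two assumptions: the function name is hypothetical, and \(m=\sqrt{n}\) is rounded down to an integer when no value of m is supplied.

```python
import math


def ljung_box_q(residuals, m=None):
    """Ljung-Box Q statistic of Eq. (3) for a list of model residuals."""
    n = len(residuals)
    if m is None:
        m = int(math.sqrt(n))  # m = sqrt(n), rounded down (assumption)
    mean = sum(residuals) / n
    c0 = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for s in range(1, m + 1):
        # Sample residual autocorrelation at lag s.
        rho_s = sum(
            (residuals[t] - mean) * (residuals[t - s] - mean) for t in range(s, n)
        ) / c0
        total += rho_s ** 2 / (n - s)
    return n * (n + 2) * total


# Made-up, strongly alternating residuals: clearly not white noise.
q_stat = ljung_box_q([1, -1, 1, -1])
```

Under the null hypothesis of no residual autocorrelation, Q is usually compared with a chi-square distribution, so a large value such as the one produced by this alternating toy series points to remaining structure in the residuals.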
To test whether the residuals are normally distributed, the standardized residual plot along with the Shapiro-Wilk test is used. When comparing the descriptive efficiency of alternative models that differ in both number of parameters and sample size, the Akaike, Hannan-Quinn and Schwarz information criteria are used as the evaluation metrics. The lower the value of these three criteria, the better the model. If the residual autocorrelations and partial autocorrelations are both small, the model is considered adequate for forecasting.
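The three criteria are not written out in the text. The sketch below uses their standard textbook definitions in terms of the maximized log-likelihood, the number of estimated parameters k and the sample size n. The function name and the log-likelihood values are hypothetical, and n = 52 assumes 13 years of quarterly observations (2006 to 2018).

```python
import math


def information_criteria(log_likelihood, k, n):
    """Standard AIC, Schwarz (BIC) and Hannan-Quinn (HQIC) criteria."""
    aic = -2.0 * log_likelihood + 2.0 * k
    bic = -2.0 * log_likelihood + k * math.log(n)
    hqic = -2.0 * log_likelihood + 2.0 * k * math.log(math.log(n))
    return {"AIC": aic, "BIC": bic, "HQIC": hqic}


# Two made-up candidate models fitted to n = 52 observations.
small_model = information_criteria(log_likelihood=-100.0, k=3, n=52)
large_model = information_criteria(log_likelihood=-99.5, k=6, n=52)
```

In this invented comparison, the larger model gains only a little likelihood for three extra parameters, so all three criteria favor the smaller model.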
Forecasting
Forecasts of future values are generated from the most appropriate model selected in the previous stages. The forecasting performance of the ARIMA model is evaluated using the Mean Squared Error (MSE) as the main metric. Other indicators commonly used to measure predictive accuracy are the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE), and the Theil (U-Theil) inequality coefficient. These indicators are defined as follows:
$$MSE=\frac{1}{T}{\sum _{t=1}^{T}({\widehat{y}}_{t}-{y}_{t})}^{2} \left(4\right)$$
$$MAE=\frac{1}{T}\sum _{t=1}^{T}\left|{\widehat{y}}_{t}-{y}_{t}\right| \left(5\right)$$
$$RMSE=\sqrt{\frac{1}{T}{\sum _{t=1}^{T}({\widehat{y}}_{t}-{y}_{t})}^{2}} \left(6\right)$$
The Theil inequality coefficient is as follows:
$$U=\frac{\sqrt{\frac{1}{T}{\sum _{t=1}^{T}({\widehat{y}}_{t}-{y}_{t})}^{2}}}{\sqrt{\frac{1}{T}{\sum _{t=1}^{T}({\widehat{y}}_{t})}^{2}}+\sqrt{\frac{1}{T}{\sum _{t=1}^{T}({y}_{t})}^{2}}},\quad 0\le U\le 1 \left(7\right)$$
Where,
\({y}_{t}\): is the actual value of the endogenous variable y at time t.
\({\widehat{y}}_{t}\): is the forecast value of the endogenous variable y at time t.
T: is the number of observations in the forecast sample.
When the Theil inequality coefficient U = 0, the forecast values equal the actual values of the series, \({y}_{t}={\widehat{y}}_{t}\) for all t, which represents a perfect fit between actual and predicted values. On the other hand, U = 1 indicates the worst possible predictive performance of the model for the sample under investigation.
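The four indicators in Eqs. (4) to (7) can be computed together, as in the following minimal sketch. The function name and the sample values are hypothetical and serve only to show the two boundary cases of U.

```python
import math


def forecast_metrics(actual, predicted):
    """MSE, MAE, RMSE and Theil's U as defined in Eqs. (4)-(7)."""
    T = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / T
    mae = sum(abs(e) for e in errors) / T
    rmse = math.sqrt(mse)
    denominator = math.sqrt(sum(p * p for p in predicted) / T) + math.sqrt(
        sum(a * a for a in actual) / T
    )
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "U": rmse / denominator}


# Made-up values: a perfect forecast (U = 0) and a forecast with the
# opposite sign of the actual values (U = 1).
perfect = forecast_metrics([3.0, 4.0], [3.0, 4.0])
worst = forecast_metrics([1.0, 1.0], [-1.0, -1.0])
typical = forecast_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```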
Artificial Neural Network (ANN) models
ANNs are architectures of interconnected neurons that aim to mimic the functionality of the human brain (McCulloch & Pitts, 1943). The main challenge in designing an ANN for time series prediction is to identify the optimum number of layers and the number of nodes in each layer. These are determined through experimentation because there is no theoretical basis for choosing them. A simple ANN architecture of size \(p\times q\times 1\) (p input nodes, q hidden nodes and one output node) is
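$${y}_{t}={\varnothing }_{0}+\sum _{j=1}^{q}{\varnothing }_{j}g\left({\theta }_{0j}+\sum _{i=1}^{p}{\theta }_{ij}{y}_{t-i}\right)+{\epsilon }_{t} \left(8\right)$$
Here, \({\varnothing }_{j}\left(j=0,1,2,\dots ,q\right)\) and \({\theta }_{ij}\left(i=0,1,2,\dots ,p;j=1,2,\dots ,q\right)\) are the connection weights of the network, and g is the transfer (activation) function of the hidden layer.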
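The one-step prediction of such a network, that is Eq. (8) without the error term, can be sketched as a single forward pass. The logistic transfer function is an assumption here (the text does not name one), and the function names and weights are hypothetical. Training the weights is outside the scope of this sketch.

```python
import math


def logistic(z):
    """Logistic transfer function, assumed for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def ann_forecast(lags, hidden_bias, hidden_weights, out_bias, out_weights):
    """Forward pass of a p x q x 1 network.

    lags              : [y_{t-1}, ..., y_{t-p}]
    hidden_bias[j]    : bias of hidden node j (theta_{0j} in Eq. (8))
    hidden_weights[j] : p input weights of hidden node j (theta_{ij})
    out_bias          : output bias (the j = 0 output weight)
    out_weights[j]    : weight from hidden node j to the output
    """
    total = out_bias
    for j in range(len(out_weights)):
        z = hidden_bias[j] + sum(w * y for w, y in zip(hidden_weights[j], lags))
        total += out_weights[j] * logistic(z)
    return total


# A 2 x 2 x 1 network with made-up weights. With all hidden weights at zero,
# each hidden node outputs logistic(0) = 0.5 regardless of the inputs.
y_hat = ann_forecast(
    lags=[0.3, 0.7],
    hidden_bias=[0.0, 0.0],
    hidden_weights=[[0.0, 0.0], [0.0, 0.0]],
    out_bias=1.0,
    out_weights=[2.0, 4.0],
)
```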
ARIMA-ANN hybrid model (Zhang’s hybrid model)
When both linear and non-linear behaviors are present in the data under study, neither ARIMA nor ANN alone is adequate for modeling the series. Thus, a model that can capture both linear and non-linear patterns is preferred. Zhang argued that it is rational to develop a hybrid model that combines ARIMA and ANN to model the linear and non-linear behavior in historical data (Zhang G., 2003). According to Zhang, there are two forms:
Additive hybrid model: \({y}_{t}={L}_{t}+{N}_{t}\)
Multiplicative hybrid model: \({y}_{t}={L}_{t}\times {N}_{t}\)
Where \({y}_{t}\) is the observation at time t, and \({L}_{t}\) and \({N}_{t}\) denote the linear and non-linear components, respectively, at time t. An autoregressive (ARIMA) model is appropriate for the linear component; its forecast at time t is \(\widehat{{L}_{t}}\), and \({e}_{t}={y}_{t}-\widehat{{L}_{t}}\) is the residual at time t.
Zhang showed that an ANN is appropriate for modeling the residuals from ARIMA, which contain only the non-linear relationships (Zhang G., 2003). Using \(\rho\) input nodes, the ANN for the residuals is represented as follows:
\({e}_{t}=f\left({e}_{t-1},{e}_{t-2},\dots ,{e}_{t-\rho }\right)+{\epsilon }_{t} \left(9\right)\)
Where \(f\) denotes the non-linear function determined by the ANN and \({\epsilon }_{t}\) represents white noise. Denoting the ANN forecast from Eq. (9) by \(\widehat{{N}_{t}}\), the hybrid estimate at time t is:
\(\widehat{{y}_{t}}=\widehat{{L}_{t}}+\widehat{{N}_{t}}\) for the additive hybrid model
\(\widehat{{y}_{t}}=\widehat{{L}_{t}}\times \widehat{{N}_{t}}\) for the multiplicative hybrid model
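The data flow of the hybrid procedure can be sketched in three small steps: take the residuals of the linear model, arrange them into the lagged inputs and targets that the residual ANN is trained on, and combine the two forecasts. The sketch assumes the additive residual (observation minus linear forecast). All function names and numbers are hypothetical, and fitting the ARIMA and ANN models themselves is not shown.

```python
def additive_residuals(y, linear_fit):
    """Residuals e_t = y_t - L_hat_t of the linear (ARIMA) model."""
    return [obs - fit for obs, fit in zip(y, linear_fit)]


def lagged_design(e, rho):
    """Inputs (e_{t-1}, ..., e_{t-rho}) and targets e_t for the residual ANN."""
    inputs = [[e[t - k] for k in range(1, rho + 1)] for t in range(rho, len(e))]
    targets = e[rho:]
    return inputs, targets


def combine(linear_forecast, nonlinear_forecast, mode="additive"):
    """Hybrid estimate from the linear and non-linear forecasts."""
    if mode == "additive":
        return linear_forecast + nonlinear_forecast
    if mode == "multiplicative":
        return linear_forecast * nonlinear_forecast
    raise ValueError("mode must be 'additive' or 'multiplicative'")


# Made-up observations and linear fits, for illustration only.
y = [10.0, 12.0, 11.0, 13.0]
linear_fit = [9.0, 12.5, 11.0, 12.0]

e = additive_residuals(y, linear_fit)
inputs, targets = lagged_design(e, rho=2)
y_hat_additive = combine(12.0, 0.5)
y_hat_multiplicative = combine(12.0, 1.5, mode="multiplicative")
```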
The hybrid models are evaluated using two metrics, namely the Mean Absolute Percentage Error (MAPE) and the Root Mean Squared Error (RMSE).
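MAPE is not written out in the text. A minimal sketch using its usual definition, the mean of the absolute errors relative to the actual values expressed as a percentage, is given below. The function name and the values are hypothetical, and the definition assumes that no actual value is zero.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    T = len(actual)
    return 100.0 / T * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


# Made-up values: both forecasts are off by 10% of the actual value.
error_pct = mape([100.0, 200.0], [110.0, 180.0])
```

Because MAPE is scale-free, it allows comparison across expenditure series of different magnitudes, whereas RMSE is expressed in the units of the series.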