This section describes the study's methodology. Specifically, it presents the data type and sources, the model specification and the basic panel data econometric tests, together with their justifications. It then presents the estimation techniques and procedures and the reasons for choosing them.
Data type, sources and model specification
Except for mean years of schooling (from the United Nations Development Programme¹) and average dietary energy supply adequacy (from the Food and Agriculture Organization, FAO²), all data were obtained from the World Bank³. The sample covers 31 SSA countries: Angola, Benin, Botswana, Burkina Faso, Cameroon, Cabo Verde, Chad, Congo Rep., Côte d'Ivoire, Ethiopia, Gabon, The Gambia, Ghana, Kenya, Lesotho, Liberia, Madagascar, Malawi, Mali, Mauritania, Mauritius, Mozambique, Namibia, Nigeria, Rwanda, Senegal, Sierra Leone, South Africa, Sudan, Tanzania and Togo. Because SSA countries face food insecurity and related health problems, this sample suits the study's objective. The study period, from 2001 to 2018, is also appropriate because it covers the Millennium Development Goals, the start of the SDGs and other economic developments such as the rise of SSA economies and the global financial crisis of the late 2000s. The analysis therefore spans several global development programmes and events.
In addition to social factors, the study includes economic determinants of people's health status. Because both food insecurity and health status are each measured with two proxy indicators, the general model is specified as follows:
$$\overbrace{LNLEXP\,(LNINFMOR)}^{\text{Target variables}} = f\left(\overbrace{PRUND\,(AVDES)}^{\text{Target variables}},\ GDPPC,\ GOVEXP,\ MNSCHOOL,\ URBAN\right) \qquad (1)$$
The study uses four models, one for each pairing of a health outcome proxy with a food insecurity proxy, to analyse the impact of food insecurity on health outcomes:
$$LNLEXP_{it} = \alpha_{0} + \alpha_{1} PRUND_{it} + \alpha_{2} GDPPC_{it} + \alpha_{3} GOVEXP_{it} + \alpha_{4} MNSCHOOL_{it} + \alpha_{5} URBAN_{it} + \eta_{it} \qquad (1A)$$
$$LNLEXP_{it} = \beta_{0} + \beta_{1} AVDES_{it} + \beta_{2} GDPPC_{it} + \beta_{3} GOVEXP_{it} + \beta_{4} MNSCHOOL_{it} + \beta_{5} URBAN_{it} + v_{it} \qquad (1B)$$
$$LNINFMOR_{it} = \theta_{0} + \theta_{1} PRUND_{it} + \theta_{2} GDPPC_{it} + \theta_{3} GOVEXP_{it} + \theta_{4} MNSCHOOL_{it} + \theta_{5} URBAN_{it} + \epsilon_{it} \qquad (1C)$$
$$LNINFMOR_{it} = \delta_{0} + \delta_{1} AVDES_{it} + \delta_{2} GDPPC_{it} + \delta_{3} GOVEXP_{it} + \delta_{4} MNSCHOOL_{it} + \delta_{5} URBAN_{it} + \mu_{it} \qquad (1D)$$
where LNLEXP and LNINFMOR are the natural logarithms of life expectancy at birth and of the infant mortality rate, the proxy variables for health outcomes. Similarly, PRUND and AVDES are the prevalence of undernourishment and the average dietary energy supply adequacy, the proxy variables for food insecurity. GDPPC is GDP per capita, GOVEXP is domestic general government health expenditure, MNSCHOOL is mean years of schooling and URBAN is urbanisation. Further, \(\eta_{it}\), \(v_{it}\), \(\epsilon_{it}\) and \(\mu_{it}\) are the stochastic error terms for country i in period t. The parameters \(\alpha_{0}\), \(\beta_{0}\), \(\theta_{0}\) and \(\delta_{0}\) are the intercept terms, and \(\alpha_{1}\)–\(\alpha_{5}\), \(\beta_{1}\)–\(\beta_{5}\), \(\theta_{1}\)–\(\theta_{5}\) and \(\delta_{1}\)–\(\delta_{5}\) are the long-run coefficients to be estimated. Since health outcomes and food insecurity each have two proxy indicators, this study estimates several alternative models and uses them to check the robustness of the main results.
Basic panel econometric tests and their justifications
Cross-sectional dependence (CD)
A growing body of panel data literature concludes that panel data models are likely to exhibit substantial CD in the errors, arising from frequent shocks, unobserved components, spatial dependence and idiosyncratic pairwise dependence. Although the impact of CD on estimation depends on several factors, it is more severe for dynamic panel estimators than for static models [22]. Moreover, Pesaran [23] notes that events such as recessions and economic or financial crises can affect all countries, even if they originate in only one or two. Such events inevitably introduce interdependencies across cross-sectional units, their regressors and their error terms. Hence, overlooking CD in panel data leads to biased estimates and spurious results [22, 24]. The outcome of the CD test also determines which panel unit root and cointegration tests should be applied. Examining CD is therefore vital and is the first step of the panel data analysis.
The literature offers several tests for CD, such as the Breusch and Pagan [25] Lagrange multiplier (LM) test, the Pesaran [26] scaled LM test, the Pesaran [26] CD test and the Baltagi et al. [27] bias-corrected scaled LM test (for more detail, see Tugcu and Tiwari [28]). Friedman [29] and Frees [30, 31] also proposed CD tests (for more detail, see De Hoyos and Sarafidis [22]). Among the existing CD tests, this study mainly employs those of Frees [30] and Pesaran [26]. Unlike the Breusch and Pagan [25] test, which is designed for fixed N as T tends to infinity, these tests are applicable when both N and T are large. Additionally, because Frees' test is based on squared rank correlations, it is not weakened when positive and negative pairwise correlations cancel each other out. When these two tests give mixed results, the study also applies Friedman's [29] CD test.
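The three CD statistics discussed above can be computed from a balanced panel of residuals as in the minimal Python sketch below. It assumes each model has already been estimated and that `resid` holds one equal-length residual series per country; the function names (`pearson`, `ranks`, `cd_tests`) are hypothetical, and this is not the STATA routine used in the study (routines such as De Hoyos and Sarafidis's `xtcsd` report the same three statistics). Pesaran's CD sums pairwise Pearson correlations, Friedman's statistic uses the average Spearman rank correlation, and Frees' statistic uses squared Spearman correlations, so opposite-signed correlations cannot cancel in it.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def ranks(values):
    """Ranks 1..T; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    k = 0
    while k < len(order):
        m = k
        while m + 1 < len(order) and values[order[m + 1]] == values[order[k]]:
            m += 1
        avg = (k + m) / 2 + 1  # mean of 1-based positions k+1 .. m+1
        for idx in order[k:m + 1]:
            out[idx] = avg
        k = m + 1
    return out


def cd_tests(resid):
    """resid: list of N residual series, each of length T (balanced panel)."""
    n, t = len(resid), len(resid[0])
    pairs = list(combinations(range(n), 2))
    rho = [pearson(resid[i], resid[j]) for i, j in pairs]
    rk = [ranks(series) for series in resid]
    r = [pearson(rk[i], rk[j]) for i, j in pairs]  # Spearman correlations

    # Pesaran CD: approximately N(0, 1) under cross-sectional independence.
    cd = math.sqrt(2 * t / (n * (n - 1))) * sum(rho)
    p_cd = math.erfc(abs(cd) / math.sqrt(2))  # two-sided p-value

    # Friedman: compare with chi-squared, T - 1 degrees of freedom.
    r_ave = 2 / (n * (n - 1)) * sum(r)
    friedman = (t - 1) * ((n - 1) * r_ave + 1)

    # Frees: compare with tabulated critical values of Frees' Q distribution.
    r2_ave = 2 / (n * (n - 1)) * sum(x * x for x in r)
    frees = n * (r2_ave - 1 / (t - 1))

    return {"pesaran_cd": cd, "p_cd": p_cd, "friedman": friedman, "frees": frees}
```

The sketch reports a p-value only for Pesaran's CD; Friedman's statistic would be referred to the χ² distribution with T − 1 degrees of freedom and Frees' statistic to its tabulated Q distribution.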
Unit root test
Panel unit root and panel cointegration tests are the usual steps following the CD test. There are generally two types of panel unit root tests: (1) first-generation tests, such as Im et al. [32], Maddala and Wu [33], Choi [34], Levin et al. [35], Breitung [36] and Hadri [37], and (2) second-generation tests, such as [24, 38–47].
First-generation panel unit root tests have been criticised because they assume cross-sectional independence [48–51]. This assumption is restrictive and unrealistic, as macroeconomic time series exhibit significant cross-sectional correlation among the countries in a panel [50], and co-movements of economies are observed in most macroeconomic applications of unit root tests [49]. Indeed, cross-sectional correlation of the errors in economic panel data applications is likely to be the rule rather than the exception [51]. Moreover, applying first-generation unit root tests in the presence of CD can generate substantial size distortions [48], so that the null hypothesis of nonstationarity is rejected too often [24, 52]. Second-generation panel unit root tests have therefore been proposed to take CD into account. Among them, this study employs Pesaran's [24] cross-sectionally augmented IPS (CIPS) panel unit root test for models 1A–1C. The rationale is that, unlike other unit root tests that allow for CD, such as those of Bai and Ng [38], Moon and Perron [45] and Phillips and Sul [42], Pesaran's [24] test is simple and clear. It is also robust to time-series heteroskedasticity in the unobserved common factor [53]. Although the Moon and Perron [45], Choi [54] and Pesaran [24] tests theoretically require large N and T, Pesaran's [24] test is particularly robust in small samples [55]. The CIPS test therefore accounts for CD, heteroskedasticity in the unobserved common factor, and both large and small samples. Since there is no CD in model 1D, however, the study applies first-generation unit root tests to that model: the Levin, Lin and Chu (LLC), Im, Pesaran and Shin (IPS) and Fisher-type augmented Dickey–Fuller (ADF) tests.
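For reference, the CIPS statistic chosen above averages the t-ratios of individual cross-sectionally augmented Dickey–Fuller (CADF) regressions. In the form given by Pesaran [24], each country's ADF regression for a series \(y_{it}\) is augmented with the cross-sectional mean \(\bar{y}_{t} = N^{-1}\sum_{j=1}^{N} y_{jt}\) and its differences, which proxy the unobserved common factor; the lag order \(p\) is a modelling choice not reported in this section:

$$\Delta y_{it} = a_{i} + b_{i}\, y_{i,t-1} + c_{i}\, \bar{y}_{t-1} + \sum_{j=0}^{p} d_{ij}\, \Delta \bar{y}_{t-j} + \sum_{j=1}^{p} \delta_{ij}\, \Delta y_{i,t-j} + e_{it}$$
$$CIPS(N,T) = \frac{1}{N}\sum_{i=1}^{N} t_{i}(N,T)$$

where \(t_{i}(N,T)\) is the t-ratio of \(b_{i}\). The null \(H_{0}: b_{i} = 0\) for all i (every series has a unit root) is tested against the alternative that \(b_{i} < 0\) for a subset of countries, using the non-standard critical values tabulated by Pesaran [24].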
Cointegration test
The most common panel cointegration tests in the presence of CD are those of Westerlund [56], Westerlund and Edgerton [57], Westerlund and Edgerton [58], Groen and Kleibergen [59], Westerlund's [60] Durbin-Hausman test, Gengenbach et al. [61] and Banerjee and Carrion-i-Silvestre [62]. However, with few exceptions, they are not implemented in STATA or EViews (Econometric Views), and they are affected by insufficient observations. The current study primarily uses the Westerlund [56] and Banerjee and Carrion-i-Silvestre [62] tests for models 1A–1C and, to resolve inconclusive results, also applies the McCoskey and Kao [63] test to model 1C. The rationale for using Westerlund's [56] test is that most panel cointegration tests fail to reject the null hypothesis of no cointegration because the common-factor restriction they impose does not hold [64]. Westerlund's [56] test, by contrast, does not require any common-factor restriction [65] and allows for a large degree of heterogeneity (e.g. individual-specific short-run dynamics, intercepts, linear trends and slope parameters) [50, 65, 66]. It is also readily available as a STATA command. However, it runs short of observations, especially as the number of independent variables increases; the study employs the Banerjee and Carrion-i-Silvestre [62] and McCoskey and Kao [63] tests to overcome this limitation. When there is no CD, three widely used Engle-Granger-based cointegration tests available in EViews and STATA are the Pedroni [67, 68], Kao [69] and Fisher-type [34] tests. Because the Pedroni and Kao tests are more efficient and comprehensive than the Fisher-type test, this study uses them for model 1D.
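Westerlund's [56] tests avoid the common-factor restriction because they test for error correction directly in a conditional error-correction model rather than in the residuals of a static regression. In the form given by Westerlund [56], with \(d_{t}\) collecting the deterministic terms and \(y_{it}\) and \(x_{it}\) standing for the dependent variable and regressors of a given model:

$$\Delta y_{it} = \delta_{i}' d_{t} + \alpha_{i}\left(y_{i,t-1} - \beta_{i}' x_{i,t-1}\right) + \sum_{j=1}^{p_{i}} \alpha_{ij}\, \Delta y_{i,t-j} + \sum_{j=-q_{i}}^{p_{i}} \gamma_{ij}\, \Delta x_{i,t-j} + e_{it}$$

The null of no cointegration is \(H_{0}: \alpha_{i} = 0\) for all i. The group-mean statistics \(G_{t}\) and \(G_{a}\) test it against \(\alpha_{i} < 0\) for at least one i, and the panel statistics \(P_{t}\) and \(P_{a}\) against \(\alpha_{i} = \alpha < 0\) for all i; for example, \(G_{t} = N^{-1}\sum_{i=1}^{N} \hat{\alpha}_{i} / SE(\hat{\alpha}_{i})\). Because every country's equation carries its own lags, leads and deterministic terms, the number of parameters per equation grows quickly with the number of regressors, which illustrates why the test runs short of observations in a panel with only T = 18 annual observations.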
Estimation techniques, procedures and justifications
To examine the impact of food insecurity on health outcomes, this study mainly employs the Driscoll-Kraay [70] standard error (DKSE) regression for models 1A and 1B, fixed effects (FE) for model 1C and the two-step generalised method of moments (GMM) for model 1D. It also applies the Granger causality test. For robustness checks, it employs fully modified ordinary least squares (FMOLS), dynamic OLS (DOLS), panel-corrected standard errors (PCSE) and feasible generalised least squares (FGLS) for models 1A and 1B, random effects (RE) for model 1C and the panel dynamic fixed effects (DFE) estimator for model 1D.
Although several panel estimation techniques allow for CD, most of them, such as the cross-section augmented autoregressive distributed lag (CS-ARDL), cross-section augmented distributed lag (CS-DL), common correlated effects pooled (CCEP) and common correlated effects mean group (CCEMG) estimators, require a large number of observations across groups and time periods. Similarly, the continuously updated fully modified (CUP-FM) and continuously updated bias-corrected (CUP-BC) estimators are available in neither STATA nor EViews. Others, such as PCSE, FGLS and seemingly unrelated regression (SUR), are feasible when T (the number of time periods) > N (the number of cross-sectional units) [71, 72], whereas DKSE estimation is feasible when N > T [71], as in this panel (N = 31, T = 18). Therefore, based on the CD and cointegration test results, availability in STATA and EViews, and the relative sizes of N and T, this study mainly employs the DKSE regression for models 1A and 1B. Because the variables in model 1C are not cointegrated, and to deal with heterogeneity and spatial dependence in the dynamic panel, the study employs FE for model 1C. Finally, because model 1D shows no CD, its variables are all I(1) and cointegrated, and N > T, the study uses GMM for model 1D.
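To make the DKSE correction concrete, the minimal Python sketch below estimates a one-regressor FE (within) model and computes a Driscoll-Kraay standard error: the moment contributions \(x_{it} e_{it}\) are summed across countries in each period, and a Newey-West (Bartlett kernel) estimator is applied to that single time series, which is what makes the standard error robust to CD, heteroskedasticity and autocorrelation. It is a sketch under stated assumptions, not the study's code: a balanced panel, a single regressor (models 1A and 1B have five), a user-chosen lag length, and no small-sample corrections, so figures would differ from routines such as Hoechle's `xtscc` for STATA. The names `demean` and `fe_driscoll_kraay` are hypothetical.

```python
import math


def demean(row):
    """Subtract the series mean (the within/FE transformation for one country)."""
    m = sum(row) / len(row)
    return [v - m for v in row]


def fe_driscoll_kraay(y, x, lag):
    """One-regressor fixed-effects slope with a Driscoll-Kraay standard error.

    y, x: lists of N country series, each of length T (balanced panel).
    lag:  maximum lag m of the Bartlett (Newey-West) kernel.
    No small-sample correction is applied.
    """
    n, t = len(y), len(y[0])
    yd = [demean(row) for row in y]
    xd = [demean(row) for row in x]
    sxx = sum(v * v for row in xd for v in row)
    sxy = sum(a * b for ra, rb in zip(xd, yd) for a, b in zip(ra, rb))
    beta = sxy / sxx

    # Cross-sectional sum of moment contributions in each period:
    # h_t = sum_i x_it * e_it, where e_it is the within residual.
    h = [sum(xd[i][s] * (yd[i][s] - beta * xd[i][s]) for i in range(n))
         for s in range(t)]

    # Newey-West long-run variance of h_t with Bartlett weights 1 - j/(m+1).
    s_hat = sum(v * v for v in h)
    for j in range(1, lag + 1):
        w = 1 - j / (lag + 1)
        s_hat += 2 * w * sum(h[s] * h[s - j] for s in range(j, t))

    # The sandwich (X'X)^-1 S (X'X)^-1 reduces to S / sxx^2 for one regressor;
    # max() guards against tiny negative values caused by rounding.
    se = math.sqrt(max(s_hat, 0.0)) / sxx
    return beta, se
```

Working with period-level sums \(h_{t}\) rather than country-level residuals is the design choice that distinguishes DKSE from clustered standard errors: any correlation between countries within a period is absorbed into \(h_{t}\) and therefore into the variance estimate.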
The DKSE regression can be estimated in three ways: FE with DKSE, RE with DKSE and pooled ordinary least squares/weighted least squares (OLS/WLS) with DKSE. The most appropriate specification is therefore chosen using the Hausman test and the Breusch-Pagan LM test for random effects. That is, the study selects among FE, RE and pooled OLS for models 1A and 1B, between FE and RE for model 1C, and among the panel ARDL estimators (pooled mean group (PMG), mean group (MG) and DFE) for model 1D. On this basis, it uses FE with DKSE for models 1A and 1B and FE for model 1C (to save space, these test results are not reported here but are available from the author).
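The two specification tests named above can be sketched in Python as follows, assuming a balanced panel. The Breusch-Pagan LM statistic, \(LM = \frac{NT}{2(T-1)}\left[\frac{\sum_{i}(\sum_{t} \hat{e}_{it})^{2}}{\sum_{i}\sum_{t} \hat{e}_{it}^{2}} - 1\right]^{2}\), is computed from pooled OLS residuals and compared with \(\chi^{2}(1)\). The Hausman function is deliberately restricted to a single coefficient for illustration; with the five regressors used here the matrix form \((\hat{b}_{FE}-\hat{b}_{RE})'[\widehat{Var}(\hat{b}_{FE})-\widehat{Var}(\hat{b}_{RE})]^{-1}(\hat{b}_{FE}-\hat{b}_{RE})\) with k degrees of freedom would be needed. All function names are hypothetical.

```python
import math


def chi2_1_sf(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def breusch_pagan_lm(resid):
    """Breusch-Pagan LM test for random effects.

    resid: pooled OLS residuals, a list of N series of equal length T.
    Returns (LM, p-value); H0: no individual-specific variance component.
    """
    n, t = len(resid), len(resid[0])
    num = sum(sum(row) ** 2 for row in resid)       # sum_i (sum_t e_it)^2
    den = sum(e * e for row in resid for e in row)  # sum_i sum_t e_it^2
    lm = n * t / (2 * (t - 1)) * (num / den - 1) ** 2
    return lm, chi2_1_sf(lm)


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one coefficient; H0: RE is consistent (and efficient)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    return h, chi2_1_sf(h)
```

A small p-value from `breusch_pagan_lm` favours RE over pooled OLS, and a small p-value from the Hausman statistic favours FE over RE.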
Finally, to check the robustness of the main results, this study employs the FMOLS, DOLS, FGLS and PCSE estimators for models 1A and 1B. Although the Hausman test favours FE for model 1C, the study also estimates an RE model for it, because Firebaugh et al. [73] note that RE and FE models are the best performers with panel data and because, unlike FE, RE treats differences between countries as random. The study also uses the panel DFE estimator for model 1D (selected using the Hausman test). Robustness is further checked with an alternative specification in which the dependent variables enter without the natural logarithm, and with the Granger causality test.
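For reference, panel DOLS as used in these robustness checks estimates the long-run (cointegrating) relationship after adding leads and lags of the differenced regressors, which absorb the endogeneity and serial correlation that would otherwise bias OLS; FMOLS targets the same biases through nonparametric corrections based on long-run covariance estimates. In a common form, with \(c_{i}\) the country effects, \(x_{it}\) the regressors of a given model, \(\phi\) the long-run coefficient vector and \(q\) a lead/lag order not reported in this section:

$$y_{it} = c_{i} + x_{it}'\phi + \sum_{j=-q}^{q} \kappa_{ij}\, \Delta x_{i,t+j} + v_{it}$$

so that \(\hat{\phi}\) is directly comparable with the long-run coefficients \(\alpha_{1}\)–\(\alpha_{5}\) and \(\beta_{1}\)–\(\beta_{5}\) of models 1A and 1B.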