Developing and Validating Regression Models for Predicting Household Consumption to Introduce an Equitable and Sustainable Health Insurance System in Cambodia
Background
Financial protection is a key health system objective and an essential dimension of universal health coverage. However, it is difficult to achieve in low- and middle-income countries, where general tax revenue is limited and a majority of the population is engaged in the informal economy. This study developed and validated regression models for Cambodia to predict household consumption, which would allow the country to collect insurance contributions according to households' ability to pay. This strategy would maximize the contribution revenue, optimize the government subsidy, and simultaneously ensure equity in healthcare access.
Methods
This study used nationally representative survey data collected annually between 2010 and 2017, covering 38,472 households. We developed four alternative prediction models for annual household consumption: ordinary least squares (OLS) regression with manually selected predictors, OLS regression with stepwise backward variable selection, mixed-effects linear regression, and elastic net regression, which resulted in an adaptive least absolute shrinkage and selection operator (LASSO) regression. Household-level socioeconomic characteristics were included as predictors. We then performed out-of-sample cross-validation for each model and evaluated prediction performance using mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE).
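The validation step described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names (`mae`, `rmse`, `mape`, `fit_simple_ols`, `k_fold_cv`) are hypothetical, a single-predictor OLS fit stands in for the multi-predictor models, the number of folds is an assumption, and the toy data are synthetic rather than taken from the survey.

```python
import math
import random


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    ratios = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * ratios / len(observed)


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_cv(x, y, k=5, seed=0):
    """Out-of-sample k-fold cross-validation: each point is predicted by a
    model fitted without the fold that contains it."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    observed, predicted = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_simple_ols([x[i] for i in train],
                                          [y[i] for i in train])
        for i in held_out:
            observed.append(y[i])
            predicted.append(intercept + slope * x[i])
    return {
        "MAE": mae(observed, predicted),
        "RMSE": rmse(observed, predicted),
        "MAPE": mape(observed, predicted),
    }


# Synthetic toy data (hypothetical): consumption is an exact linear
# function of one predictor, so held-out errors should be near zero.
toy_x = [float(i) for i in range(1, 21)]
toy_y = [2.0 + 3.0 * xi for xi in toy_x]
cv_results = k_fold_cv(toy_x, toy_y, k=5)
print(cv_results)
```

Pooling the held-out predictions across folds before computing the metrics is one common design choice; averaging per-fold metrics is an equally reasonable alternative.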
Results
Overall, we found a positive linear relationship between observed and predicted household consumption in all four models. While the prediction performance of the four alternative models did not differ substantially, the stepwise linear model performed best, with the lowest values on all three measures, including a MAPE of 1.376%. Regularization and mixed effects were not particularly effective in this setting. Household consumption was predicted more accurately for households with lower consumption, and predictive performance declined as the consumption level increased. Although the consumption of richer households was likely to be overestimated, this trend was less noticeable in the adaptive LASSO model.
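One way to examine how accuracy varies with consumption level is to sort households by observed consumption, split them into equal-sized groups, and compute the error within each group. The sketch below assumes this grouping approach, which the abstract does not specify; the function name `errors_by_consumption_group` and the toy numbers are hypothetical and are not study results.

```python
def errors_by_consumption_group(observed, predicted, n_groups=2):
    """Sort pairs by observed consumption, split them into n_groups
    contiguous groups, and return per-group MAPE and mean signed
    percentage error (positive values indicate overestimation)."""
    pairs = sorted(zip(observed, predicted))
    n = len(pairs)
    summary = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        signed = [100.0 * (p - o) / o for o, p in group]
        summary.append({
            "group": g + 1,  # 1 = lowest observed consumption
            "mape": sum(abs(s) for s in signed) / len(signed),
            "mean_signed_pct_error": sum(signed) / len(signed),
        })
    return summary


# Hypothetical toy values, chosen only to illustrate the calculation.
toy_observed = [100.0, 200.0, 300.0, 400.0]
toy_predicted = [100.0, 200.0, 330.0, 480.0]
group_summary = errors_by_consumption_group(toy_observed, toy_predicted, n_groups=2)
for row in group_summary:
    print(row)
```

Reporting the signed error alongside MAPE separates the size of the error from its direction, which is what distinguishes overestimation from underestimation in the higher-consumption groups.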
Conclusions
This study suggests that household consumption can be predicted with reasonable accuracy from existing survey data. Such predictions would enable the country to raise secure health insurance revenue equitably. The prediction model should be tested in real settings and continuously improved.
Figure 1
Figure 2
Posted 23 Sep, 2020