We conducted an ecological study using open data collected, compiled, and published as government statistics for 2010, 2013, and 2016 by prefecture. The data used for analysis were LE, three measures of HLE, population by age, number of deaths, and number of unhealthy people. The values of the three measures of HLE and LE were obtained from the “Healthy Life Report from Japan.”3 All other variables were derived from the government statistics website “e-Stat.”14 The population by age was derived from the 2010 population census and population estimates from 2013 and 2016. The number of deaths was derived from official vital statistics. There are three measures of “number of unhealthy people,” as they vary according to the method of HLE (Additional file1). In DFLE-AL, the people who answered “Yes” to the question “Do health problems currently affect your daily life in some way? (Yes/No)” were regarded as unhealthy, and in LE-SH, the people who answered “Not very good” or “Not good” to the question “What is your current state of health?”, which is rated on a 5-point scale (range, not good–very good), were regarded as unhealthy. The numbers of unhealthy persons in DFLE-AL and LE-SH were derived from a comprehensive survey of living conditions.15 In DFLE-CN, the people requiring long-term care (level 2 or higher) were determined to be unhealthy in a survey of long-term care benefit expenses.16
Main predictors
The predictor variables were converted to the following rates or ratios: aging rate, mortality and proportion of unhealthy people. Proportion of unhealthy people was used as main predictor. In this paper, we define the proportion of unhealthy people for three measures of HLE as the restriction rate for DFLE-AL, the subjective unhealthy rate for LE-SH, and the care need rate for DFLE-CN. The restriction rate and the subjective unhealthy rate were calculated by dividing the number of people who responded as being unhealthy to an anonymous self-administered questionnaire by the number of all survey respondents. The care need rate was calculated through dividing the number of persons requiring long-term care (level 2 or higher) by the number of persons aged ≥ 40 years who were eligible for long-term care insurance.
Other variables
Mortality, aging rate and data year were used as the other predictor variables. Mortality was calculated by dividing the number of deaths by the total population. Aging rate was calculated by dividing the population aged ≥ 65 years by the total population. Data year was used after converting from 1 to 3 in the order of 2010, 2013, and 2016. To assist with interpretation, all variables except data year were expressed in “per 1000 persons.”
Statistical analyses
First, Spearman's rank correlation coefficients for LE and for the three measures of HLE were calculated to confirm their similarity. Then, regression analysis was performed using a generalized linear mixed model (GLMM), in which three measures of HLE were used as dependent variables. Data year, aging rate, mortality, and proportion of unhealthy people were included as independent variables. The random effects of prefectures were assumed for data year and the intercept.
In the regression analysis according to the GLMM, data of 47 prefectures in 2010, 2013, and 2016 were combined for all variables and treated as variables of 140 samples (the data for Kumamoto Prefecture in 2016 are missing due to an earthquake). For the independent variables, the model was constructed after the evaluation of multicollinearity using a variance inflation factor (VIF). Models with both aging rate and mortality as independent variables had a high VIF value of 8.4–15.2, which could lead to multicollinearity problems. Therefore, Model 1, with data year, aging rate, restriction rate, subjective unhealthy rate, and care need rate as independent variables, and Model 2, with mortality as an independent variable used instead of the aging rate, were developed. In these two models, regression analysis assuming random effects of prefectures for data year and the intercept was conducted according to sex and HLE (Figs. 1 & 2).
For estimation of the parameters, simulated draws from the posterior were obtained for each parameter using the Markov chain Monte Carlo (MCMC) method. MCMC is a method of deriving an estimate according to a desired distribution by combining a chain in which the next state is determined only from the previous state (Markov chain) and an algorithm for generating random numbers (Monte Carlo method).17–18 The simulated draws were preceded with 2500 “burn in” draws, which were discarded from the analysis. To reduce temporal auto-correlation among the draws, the MCMC chain was thinned by including only every second draw, yielding 5000 simulated posterior observations. Then, Rhat was calculated to confirm the convergence of the simulation. Rhat is an index of the divergence between chains, and in the case of three or more chains, if it is 1.1 or less by convention; it is considered to have converged. The calculation formula is as follows:

Analysis was performed using Open source statistical software R (ver. 3.6.2)19, and Rstan18 package was used for the parameter estimation by MCMC.