1 Data source
The data used in this study were obtained from the Chinese Longitudinal Healthy Longevity Survey (CLHLS), an ongoing open cohort study of the elderly conducted jointly by the Peking University Center for Healthy Aging and Development and the National Development Institute. The survey collected health-related information on Chinese adults aged 65 and older in 23 provinces, including socio-demographic characteristics, socioeconomic status, cognitive function, lifestyle, ability to perform daily activities, health status, and healthcare utilization. The goal of the survey is to investigate the factors related to the health and longevity of the Chinese population. Details of the CLHLS are available elsewhere [18]. The biomedical ethics committee of Peking University approved the study (IRB00001052-13074), and all participants or their proxies provided written informed consent.
The CLHLS baseline survey was conducted in 1998, with follow-up waves in 2000, 2002, 2005, 2008, 2011, 2014, and 2018. The healthcare cost items varied across waves. From 1998 to 2008, only respondents' total healthcare costs in the past year were recorded. From 2011 onward, outpatient and inpatient costs were recorded separately. In this study, we examined the impact of different chronic diseases on healthcare utilization and healthcare costs among the elderly from the perspectives of outpatient, inpatient, and overall healthcare services.
We therefore selected the three most recent waves of the CLHLS (2011, 2014, and 2018) and retained the elderly who were surveyed in all three waves as the research sample. After removing observations with missing or outlier values for key variables, 5511 observations remained for analysis (1837 individuals in each wave).
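The sample restriction described above (keeping only respondents observed in all three waves) can be illustrated with a minimal Python sketch. The field names `id` and `wave` and the toy records are hypothetical and are not CLHLS variable names; the handling of missing and outlier values is not shown.

```python
from collections import defaultdict

# Waves retained in the analysis, as stated in the text.
WAVES = (2011, 2014, 2018)


def balanced_panel(records, waves=WAVES):
    """Keep only records of individuals who appear in every listed wave."""
    seen = defaultdict(set)
    for rec in records:
        if rec["wave"] in waves:
            seen[rec["id"]].add(rec["wave"])
    # An individual is complete when observed in all of the waves.
    complete = {pid for pid, observed in seen.items() if observed == set(waves)}
    return [r for r in records if r["id"] in complete and r["wave"] in waves]


# Hypothetical toy data: "a" is surveyed in all three waves, "b" misses 2014,
# and "c" only appears in an earlier wave.
records = [
    {"id": "a", "wave": 2011}, {"id": "a", "wave": 2014}, {"id": "a", "wave": 2018},
    {"id": "b", "wave": 2011}, {"id": "b", "wave": 2018},
    {"id": "c", "wave": 2008},
]
panel = balanced_panel(records)
```

In this toy example only individual "a" is retained, contributing one observation per wave, which mirrors the structure of 1837 individuals observed three times each.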
2 Measurement
2.1 Healthcare utilization and costs
In terms of healthcare costs, outpatient costs (including registration, outpatient examinations, drugs, outpatient treatment, and all other ambulatory medical expenses), inpatient costs (including post-hospital examinations, surgical and non-surgical treatment, drugs, escort, and hospital costs), and healthcare costs (the sum of outpatient and inpatient costs) were included in our analysis.
Based on these healthcare costs, we constructed dummy variables for whether an individual used healthcare services: whether they used outpatient services (an indicator of positive outpatient costs), whether they used inpatient services (an indicator of positive inpatient costs), and whether they used any healthcare services (an indicator of positive healthcare costs).
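As an illustration of how the three utilization indicators follow from the cost variables, a minimal Python sketch is given here. The function and key names are hypothetical; the only rule taken from the text is that an indicator equals 1 when the corresponding cost is positive and that healthcare costs are the sum of outpatient and inpatient costs.

```python
def utilization_indicators(outpatient_cost, inpatient_cost):
    """Derive total cost and the three 0/1 utilization dummies from costs."""
    total_cost = outpatient_cost + inpatient_cost
    return {
        "total_cost": total_cost,
        "used_outpatient": int(outpatient_cost > 0),
        "used_inpatient": int(inpatient_cost > 0),
        "used_any": int(total_cost > 0),
    }


# Hypothetical respondent with outpatient spending only.
example = utilization_indicators(350.0, 0.0)
```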
2.2 Chronic disease
We selected the six chronic diseases with the highest prevalence in our sample for cost attribution analysis: hypertension, arthritis, heart disease, cataracts, chronic lung disease (bronchitis, emphysema, asthma, or pneumonia), and stroke or cerebrovascular disease (CVD). The primary explanatory variables were the dummy variables for these six chronic diseases (1 indicates having the chronic disease, 0 indicates not having the chronic disease).
2.3 Covariates
Following existing studies [12, 19, 20], we controlled for a number of variables that influence the utilization and cost of healthcare services among the elderly: age, gender (male = 0, female = 1), residence (rural = 0, urban = 1), education level (primary and below = 0, middle and above = 1), marital status (other = 0, cohabited = 1), health insurance coverage (no = 0, yes = 1), smoking status (none = 1, quit = 2, still = 3), drinking status (none = 1, quit = 2, still = 3), body mass index (BMI) (underweight = 1, normal weight = 2, overweight = 3, obesity = 4), activities of daily living (ADL), and annual household income per capita.
Education level was categorized into two groups according to years of education: primary and below (≤ 6 years) and middle and above (> 6 years). BMI was calculated as weight in kilograms divided by the square of height in meters (kg/m²). The BMI criteria were adopted from “The guidelines for prevention and control of overweight and obesity in Chinese adults” [21], which define four groups: underweight (< 18.5 kg/m²), normal weight (18.5–23.9 kg/m²), overweight (24.0–27.9 kg/m²), and obesity (≥ 28.0 kg/m²). ADL is used to evaluate the daily activities of the elderly and comprises six items: 1) bathing, 2) dressing, 3) toileting, 4) indoor transfer, 5) continence, and 6) eating [22]. Each item is coded as 0 (completely independent) or 1 (needs help). The total ADL score ranges from 0 to 6, with higher values indicating more limited activities of daily living.
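The BMI grouping and the ADL score can be sketched in Python as follows. The function names and the item keys are hypothetical. One assumption is made explicit: because the published ranges are reported to one decimal place, the sketch uses 24.0 and 28.0 as lower bounds of the overweight and obesity groups so that unrounded BMI values do not fall between categories.

```python
# Hypothetical keys for the six ADL items listed in the text.
ADL_ITEMS = ("bathing", "dressing", "toileting",
             "indoor_transfer", "continence", "eating")


def bmi_category(weight_kg, height_m):
    """Return the BMI group code (1 underweight ... 4 obesity)."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return 1  # underweight
    if bmi < 24.0:
        return 2  # normal weight (assumed cut-off for unrounded values)
    if bmi < 28.0:
        return 3  # overweight (assumed cut-off for unrounded values)
    return 4      # obesity


def adl_score(needs_help):
    """Sum the six 0/1 items; higher scores mean more limitation."""
    return sum(int(bool(needs_help[item])) for item in ADL_ITEMS)


# Hypothetical respondent who needs help with bathing and dressing only.
respondent = dict.fromkeys(ADL_ITEMS, 0)
respondent.update(bathing=1, dressing=1)
score = adl_score(respondent)
group = bmi_category(60.0, 1.65)
```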
3 Statistical analyses
First, descriptive statistics were used to summarize the basic characteristics of participants in each survey year. Categorical variables were described by frequency and percentage, and continuous variables by mean and standard deviation (SD).
Then, to investigate the influence of chronic diseases on the healthcare utilization and costs of older adults, we referred to studies on the impact of smoking [23] or obesity [24] on healthcare costs and developed the following two-part random effects model.
$$\text{DExp}_{it}=\gamma_{0}+\sum_{j=1}^{6}\gamma_{j}\text{Chronic}_{it}+\gamma_{7}X_{it}+\gamma_{8}Y_{it}+\gamma_{9}Z_{it}+\alpha_{i}+u_{it} \tag{1}$$
$$\text{lnExp}_{it}=\beta_{0}+\sum_{j=1}^{6}\beta_{j}\text{Chronic}_{it}+\beta_{7}X_{it}+\beta_{8}Y_{it}+\beta_{9}Z_{it}+\alpha'_{i}+u'_{it} \tag{2}$$
Eq. (1) examines the effect of chronic disease on healthcare utilization among older adults, and Eq. (2) examines the effect of chronic disease on their healthcare costs. \(\text{DExp}_{it}\) is a dummy variable indicating whether the individual used healthcare services (that is, whether healthcare costs were positive); \(\text{lnExp}_{it}\) is the logarithm of healthcare costs for older adults with positive healthcare costs; \(\text{Chronic}_{it}\) denotes the dummy variables for the six chronic diseases with the highest prevalence described above; \(X_{it}\) denotes the sociodemographic characteristics of the individual; \(Y_{it}\) denotes the socioeconomic status of the individual; \(Z_{it}\) denotes the health-related variables of the individual; \(\alpha_{i}\) and \(\alpha'_{i}\) represent the time-invariant random heterogeneity of the ith individual; and \(u_{it}\) and \(u'_{it}\) represent the random disturbance terms of Eq. (1) and Eq. (2), respectively. Because the dependent variable of Eq. (1) is one of three dummy variables (use of outpatient services, use of inpatient services, and use of any healthcare services), Eq. (1) is estimated with a random effects logit model, whereas Eq. (2) is estimated with a random effects generalized least squares model.
In addition, we used the parameters obtained by fitting the two models to predict healthcare costs in scenarios with and without each chronic disease. Following existing studies [24, 25], we first used the estimated parameters to predict the annual per capita healthcare cost of the elderly under their observed characteristics, which serves as the actual value (\(\widehat{Y}_{i}^{\text{A}}\)). We then predicted the healthcare cost of the elderly without the chronic disease as the counterfactual value (\(\widehat{Y}_{i}^{\text{CF}}\)) by setting the dummy variable for that chronic disease to 0 (assuming that the elderly do not have the chronic disease) while holding all other characteristics constant. The chronic disease attributable fraction (CDAF) is calculated from the predicted actual value and the counterfactual value using Eq. (3).
$$\text{CDAF}=\frac{\widehat{Y}_{i}^{\text{A}}-\widehat{Y}_{i}^{\text{CF}}}{\widehat{Y}_{i}^{\text{A}}} \tag{3}$$
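To make the attribution step concrete, the following Python sketch combines the two parts of the model into a predicted cost (probability of any use multiplied by the expected cost among users) and then applies Eq. (3). It is a sketch under stated assumptions, not the estimation code used in the study: all coefficient values and variable names are hypothetical, the random effects are set to zero, the log-scale prediction is retransformed by simple exponentiation with an optional `smearing` factor, and the predictions are summed over individuals before the fraction is formed.

```python
import math


def linear_index(coefs, covariates):
    """Intercept plus the coefficient-weighted sum of the covariates."""
    return coefs["const"] + sum(coefs[k] * v for k, v in covariates.items())


def predict_cost(logit_coefs, cost_coefs, covariates, smearing=1.0):
    """Two-part prediction: Pr(any use) times expected cost among users."""
    p_use = 1.0 / (1.0 + math.exp(-linear_index(logit_coefs, covariates)))
    # Assumption: naive retransformation from the log scale; `smearing`
    # stands in for any retransformation correction and defaults to 1.
    cost_if_used = math.exp(linear_index(cost_coefs, covariates)) * smearing
    return p_use * cost_if_used


def cdaf(logit_coefs, cost_coefs, people, disease):
    """Attributable fraction: (actual - counterfactual) / actual."""
    actual = sum(predict_cost(logit_coefs, cost_coefs, p) for p in people)
    # Counterfactual: same individuals with the disease dummy set to 0.
    counterfactual = sum(
        predict_cost(logit_coefs, cost_coefs, {**p, disease: 0}) for p in people
    )
    return (actual - counterfactual) / actual


# Hypothetical coefficients and individuals, for illustration only.
logit_coefs = {"const": -0.5, "hypertension": 0.4, "age": 0.01}
cost_coefs = {"const": 6.0, "hypertension": 0.3, "age": 0.005}
people = [
    {"hypertension": 1, "age": 80},
    {"hypertension": 0, "age": 75},
    {"hypertension": 1, "age": 90},
]
hypertension_cdaf = cdaf(logit_coefs, cost_coefs, people, "hypertension")
```

Because both coefficients on the disease dummy are positive in this toy example, the counterfactual cost is lower than the actual cost and the resulting fraction lies between 0 and 1. Individuals without the disease contribute the same amount to both sums and therefore do not affect the numerator.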
A p-value of less than 0.05 was considered statistically significant. All data were collated and analyzed using Stata version 16.0 (StataCorp, College Station, TX, USA).