1. Data source, study design, and study sample
We employed a cross-sectional study design. Data came from the China Health and Retirement Longitudinal Study (CHARLS) 2011 National Baseline Survey, which included 17,708 Chinese adults aged over 45. Participants came from 28 provinces in mainland China and were recruited through a multistage random sampling procedure. Information on sampling, recruitment, response rate, and data collection procedures is available in a prior study. The present study used individual-level data from the individual survey together with neighborhood-level characteristics from the CHARLS community questionnaire. We defined neighborhoods as villages in rural areas and communities in urban areas. Because CHARLS was a community-based survey, individuals were clustered within neighborhoods (communities or villages) and linked through a shared unique neighborhood identifier. After excluding 4,295 observations with missing data, the analytic sample comprised 13,413 subjects from 432 neighborhoods.
The survey recorded whether a subject had ever been diagnosed with each of fourteen non-communicable chronic diseases (NCDs): hypertension, dyslipidemia, diabetes, cancer or malignant tumor, chronic lung diseases, chronic liver disease, heart problems, stroke, chronic kidney disease, stomach or other digestive diseases, mental problems, memory-related disease, arthritis or rheumatism, and asthma. Data were derived from answers to the question, "Have you been diagnosed with the following 14 NCDs?". For hypertension, chronic lung disease, and mental problems, the data also included answers to the question, "Do you know if you have hypertension, chronic lung disease, and mental problems, respectively?".
The primary outcome was multimorbidity, defined as the total number of the 14 NCDs recorded for a subject, with values ranging from 0 to 14. The secondary outcomes were the diagnoses of each of the 14 NCDs, each coded as a binary variable (1 vs. 0).
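The construction of the primary outcome from the 14 binary indicators can be sketched as follows. This is a minimal illustration rather than the study's actual code: the indicator names are hypothetical labels, not CHARLS variable names, and each record is assumed to be complete (the study excluded observations with missing data).

```python
# Illustrative labels for the 14 NCDs; these are NOT CHARLS variable names.
NCDS = [
    "hypertension", "dyslipidemia", "diabetes", "cancer",
    "chronic_lung_disease", "chronic_liver_disease", "heart_problems",
    "stroke", "chronic_kidney_disease", "digestive_disease",
    "mental_problems", "memory_related_disease", "arthritis", "asthma",
]


def multimorbidity_count(record):
    """Return the number of NCDs coded 1 in a complete record (0 to 14).

    A missing indicator raises KeyError, mirroring a complete-case analysis.
    """
    return sum(1 for name in NCDS if record[name] == 1)


# Example: a subject with hypertension, diabetes, and arthritis.
subject = {name: 0 for name in NCDS}
subject.update(hypertension=1, diabetes=1, arthritis=1)
count = multimorbidity_count(subject)
```

Each secondary outcome corresponds to a single entry of the record, for example `subject["hypertension"]`.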
The first exposure was urban versus rural setting, reflecting the neighborhood's locality. The second was the neighborhood's main road type, categorized as unpaved roads, paved roads, and others (e.g., sand-stone roads and highways). Data on road types came from the question "What type of road does your village/community mainly have?" in the CHARLS community questionnaire.
The number of primary care institutions within the neighborhood (community health centers, community health care medical posts, township health clinics, and village medical posts) was used to measure residents' access to primary care, since prior studies have documented that access to health care resources may be associated with population health [42-44]. We categorized the number of primary care institutions in each neighborhood (village or community) as 0, 1, 2, and ≥3. Data were derived from the question "How many community health centers, community health care medical posts, township health clinics, or village medical posts in the village or community?". Finally, whether the neighborhood had a groundwater system (yes vs. no) was included as a covariate to reflect the neighborhood's living conditions.
Individual-level confounders were included as covariates because prior studies have considered them potential indicators of multimorbidity [6, 9, 45-48]. They comprised age (in years), sex (women vs. men), marital status (married vs. others), educational attainment (illiterate, some primary school, primary school, junior school or above), household income (four ordered groups: 1st, 2nd, 3rd, and 4th), body mass index (BMI; <18.5, ≥18.5 and <25, ≥25 and <30, and ≥30), physical activity (never, seldom, weekly, and daily), and health care insurance. Health care insurance was classified as uninsured, rural cooperative medical insurance (RCMI), and others. The "others" category combined business medical insurance, Urban Residents Medical Insurance, and Urban Employees Medical Insurance because few CHARLS subjects held any of these three types.
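As an illustration of how the categorical covariates are coded, the four BMI groups can be assigned as in the sketch below. The function name and the category labels are assumptions made for this example; only the cut-points (18.5, 25, and 30) come from the text.

```python
def bmi_category(bmi):
    """Map a BMI value to one of the four groups used as a covariate."""
    if bmi <= 0:
        raise ValueError("BMI must be positive")
    if bmi < 18.5:
        return "<18.5"
    if bmi < 25:
        return "18.5 to <25"
    if bmi < 30:
        return "25 to <30"
    return ">=30"


# Boundary values fall in the higher group, matching the ">= and <" cut-points.
categories = [bmi_category(value) for value in (17.9, 18.5, 24.9, 25.0, 30.0)]
```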
3. Statistical analyses
First, we stratified study subjects into urban and rural groups to examine the distribution of baseline characteristics. A t-test was used for age. Mann–Whitney U tests were used for the number of primary care institutions, educational attainment, household income, BMI, and physical activity. Chi-squared tests were used for road types, groundwater systems, sex, marital status, and health care insurance. Because these analyses were intended for in-sample interpretation only, CHARLS survey weights were not applied (Table 1).
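For the categorical comparisons, the Pearson chi-squared statistic for a contingency table can be computed as in the sketch below. It is a standard-library illustration under stated assumptions, not the Stata procedure used in the study: the 2×2 counts are invented for the example, no continuity correction is applied, and the closed-form P value shown holds only for one degree of freedom.

```python
import math


def pearson_chi_squared(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_one_dof(statistic):
    """Upper-tail P value of a chi-squared statistic with exactly 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Invented counts: rows are urban/rural, columns are the two levels of a binary trait.
table = [[10, 20], [20, 10]]
statistic, dof = pearson_chi_squared(table)
p_value = p_value_one_dof(statistic)
```

For tables with more than one degree of freedom (for example, the three road types), the P value would come from the chi-squared distribution in a statistics package.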
When estimating the prevalence of multimorbidity and of each NCD among China's middle-aged and older adults, CHARLS complex sampling weights were used to account for selection bias (Table 2). Because results from weighted and unweighted analyses did not differ significantly, unweighted analyses were used in the subsequent modeling.
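The difference between a weighted and an unweighted prevalence estimate can be illustrated with the sketch below. The case indicators and weights are invented for the example and are not CHARLS values; the sketch covers only the point estimate and does not reproduce the variance estimation that a complex survey design requires.

```python
def weighted_prevalence(cases, weights):
    """Weighted proportion of subjects with cases[i] == 1."""
    if len(cases) != len(weights):
        raise ValueError("cases and weights must have the same length")
    total_weight = sum(weights)
    case_weight = sum(w for c, w in zip(cases, weights) if c == 1)
    return case_weight / total_weight


# Invented data: four subjects, two of whom have the disease.
cases = [1, 0, 1, 0]
weights = [1.0, 3.0, 2.0, 2.0]

unweighted = sum(cases) / len(cases)            # 2 of 4 subjects
weighted = weighted_prevalence(cases, weights)  # 3.0 of 8.0 total weight
```

Here the unweighted estimate is 0.5 and the weighted estimate is 0.375, because the subjects without the disease carry more sampling weight.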
Negative binomial regression was used to investigate variations in multimorbidity. We chose negative binomial regression over Poisson regression because the variance of the dependent variable was larger than its mean. Univariate analyses were performed to examine the association between multimorbidity and each independent variable separately. A multivariate negative binomial regression was then fitted with all covariates included. We performed this between-person analysis (Model 1 in Table 3) to compare our results on urban-rural disparities with those of prior studies, most of which did not adjust for neighborhood-level variations [21, 24]. Clustered robust standard errors were generated in Model 2 to account for individuals nested within neighborhoods. Furthermore, we used multilevel logistic regression to investigate the association between neighborhood characteristics and each of the fourteen chronic diseases (Table 4). Variance inflation factors were calculated to examine collinearity among independent variables, and the results suggested only slight collinearity. Model significance was assessed with the Pearson chi-square test.
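The overdispersion check that motivates the choice of negative binomial over Poisson regression can be sketched as follows. The counts are invented for the example, and comparing the sample variance with the sample mean is a simple descriptive check, not a formal test and not the study's actual code.

```python
import statistics


def dispersion_ratio(counts):
    """Return (mean, sample variance, variance / mean) of a count outcome.

    A Poisson model assumes variance equal to the mean, so a ratio well
    above 1 points to overdispersion.
    """
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
    return mean, variance, variance / mean


# Invented disease counts for eight subjects.
counts = [0, 0, 0, 1, 1, 2, 5, 7]
mean, variance, ratio = dispersion_ratio(counts)
overdispersed = variance > mean
```

In this toy sample the mean is 2 and the sample variance is 48/7 (about 6.86), so the variance exceeds the mean, which is the pattern the text describes for multimorbidity.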
As a sensitivity analysis, we performed a multinomial logistic regression with five response categories (Y = 0, 1, 2, 3, and ≥4) and robust standard errors. Results were qualitatively similar to those from the negative binomial regression. Statistical analyses were performed with Stata/SE 15.0 (StataCorp, TX, USA). A two-tailed P value of less than 0.05 was considered statistically significant.
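The five-category response used in the sensitivity analysis can be derived from the multimorbidity count as in the sketch below. The function name is hypothetical, and coding the top category as the integer 4 is an assumption made for the example.

```python
def five_level_response(count):
    """Collapse a multimorbidity count (0 to 14) into 0, 1, 2, 3, or 4 (meaning >= 4)."""
    if not 0 <= count <= 14:
        raise ValueError("count must be between 0 and 14")
    return min(count, 4)


# Counts of 4 or more all map to the top category.
responses = [five_level_response(c) for c in (0, 1, 2, 3, 4, 9, 14)]
```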