Data Source and Study Population
We used data from the Harmonized China Health and Retirement Longitudinal Study (CHARLS), a population-based survey of non-institutionalized middle-aged and older individuals in China. The baseline (wave 1) survey was conducted between May 2011 and March 2012 with 17,708 participants, and follow-up surveys were conducted in 2013 (wave 2) and 2015 (wave 4). Trained interviewers conducted face-to-face interviews. The CHARLS sampling strategy has been described in detail elsewhere. In this study, we analyzed Harmonized CHARLS data from waves 1, 2, and 4. Wave 3 was a special life-history survey that did not collect data on health behaviour or mental health variables, so wave 3 data were not included in our analysis. More detailed information can be found at www.g2aging.org.
Respondents were excluded from the present analysis for the following reasons: age < 50 years at baseline; inconsistent age information across waves (e.g., younger reported age than in a previous wave); missing value for any covariate of interest (age, gender, residence, educational level, marital status, number of children, co-residence with a child) in at least one wave; and missing 10-item Centre for Epidemiologic Studies Depression Scale (CES-D 10) score in at least one wave. The procedure used for sample selection is summarized in Fig. 1.
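The exclusion rules above can be sketched as a single eligibility check. This is a minimal illustration, not the authors' code: the record layout (one dictionary per analysed wave) and the field names are hypothetical, and missing values are assumed to be stored as None.

```python
from typing import Optional, Sequence

# Hypothetical field names for the covariates listed in the text.
COVARIATES = (
    "age", "gender", "residence", "education",
    "marital_status", "n_children", "coresides_with_child",
)
MIN_BASELINE_AGE = 50  # age threshold stated in the text


def is_eligible(waves: Sequence[dict]) -> bool:
    """Return True if a respondent passes all exclusion rules.

    `waves` holds one record per analysed wave, ordered in time
    (assumed layout; missing values are represented as None).
    """
    for record in waves:
        # Missing covariate in at least one wave -> excluded.
        if any(record.get(name) is None for name in COVARIATES):
            return False
        # Missing CES-D 10 score in at least one wave -> excluded.
        if record.get("cesd10") is None:
            return False

    ages = [record["age"] for record in waves]
    # Younger than 50 years at baseline -> excluded.
    if ages[0] < MIN_BASELINE_AGE:
        return False
    # Reported age lower than in a previous wave -> excluded.
    if any(later < earlier for earlier, later in zip(ages, ages[1:])):
        return False
    return True
```

Applying the check per respondent, rather than per observation, matches the text: a single problematic wave removes all of that respondent's observations.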
We applied the age threshold of 50 years to focus on middle-aged and older adults, based on previous studies [13, 28] and in accordance with the World Health Organization’s Study on Global Ageing and Adult Health. The final sample comprised 25,317 observations for 8439 respondents.
Physical activity, social participation, and smoking served as manifest items in the latent class analysis (LCA). These manifest items were used to identify the underlying (latent) health behaviour patterns.
Respondents were asked whether they performed vigorous physical activity (VPA) or moderate physical activity (MPA) for at least 10 minutes every week (“yes” or “no”). “Yes” responses prompted the interviewers to ask on how many days in a usual week (0–7) respondents performed at least 10 minutes of VPA and MPA. VPA was defined to include activities that made respondents breathe much harder than normal, such as heavy lifting, digging, plowing, aerobics, fast bicycling, and cycling with a heavy load. MPA was defined to include activities that made respondents breathe somewhat harder than normal, such as carrying a light load, bicycling at a regular pace, or mopping the floor. In this study, we dichotomized the physical activity variable, classifying respondents as physically active (MPA/VPA on ≥5 days/week) or physically inactive (MPA/VPA on <5 days/week).
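The dichotomization can be illustrated with a short sketch. It rests on two assumptions that the text does not spell out: "MPA/VPA on ≥5 days/week" is read as either activity type being reported on at least five days, and respondents who were not asked the questions are kept as missing (None) rather than coded as inactive. The function name is hypothetical.

```python
from typing import Optional

ACTIVE_DAYS_THRESHOLD = 5  # days per week, as stated in the text


def is_physically_active(vpa_days: Optional[int],
                         mpa_days: Optional[int]) -> Optional[int]:
    """Return 1 (active), 0 (inactive), or None if both inputs are missing.

    Assumption: a respondent is active if EITHER vigorous or moderate
    activity was reported on at least ACTIVE_DAYS_THRESHOLD days.
    A "no" answer to the screening question should be passed as 0 days.
    """
    reported = [d for d in (vpa_days, mpa_days) if d is not None]
    if not reported:
        return None  # not asked, or no usable answer
    return int(max(reported) >= ACTIVE_DAYS_THRESHOLD)
```

Keeping unasked respondents as missing, instead of dropping them, is what allows the later latent class estimation to use their remaining items.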
In the CHARLS, the social participation variable was operationalized by asking participants whether they had done any of the following in the month prior to the survey: 1) interacted with (a) friend(s); 2) played mahjong, chess, or cards or gone to a community club; 3) gone to a sporting event or participated in a social group or other type of club; 4) engaged in the activities of a community-related organization; 5) conducted volunteer or charity work; and 6) attended an educational or training course. The social participation variable was dichotomized, with 0 indicating that the respondent did not participate in any listed social group or activity and 1 indicating that the respondent participated in at least one of the listed social activities in the past month.
We used respondents’ current smoking habits to evaluate smoking status in this study. A value of 0 was assigned to respondents who never smoked or were ex-smokers, and a value of 1 to current smokers. Notably, wave 2 information on smoking was missing for 2098 respondents, the majority of whom reported the same smoking status in waves 1 and 4. In these cases, we assigned the status reported in waves 1 and 4 to wave 2.
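A minimal sketch of this wave 2 fill-in rule follows. The function name is hypothetical, and one detail is an assumption: where waves 1 and 4 disagree, the wave 2 value is left missing, since the text describes the rule only for respondents with a consistent status.

```python
from typing import Optional


def fill_wave2_smoking(wave1: Optional[int],
                       wave2: Optional[int],
                       wave4: Optional[int]) -> Optional[int]:
    """Return the wave 2 smoking status (0 = non-smoker, 1 = current smoker).

    An observed wave 2 value is never overwritten. A missing wave 2 value
    is filled only when waves 1 and 4 are both observed and agree.
    """
    if wave2 is not None:
        return wave2
    if wave1 is not None and wave1 == wave4:
        return wave1
    return None  # assumption: disagreeing or missing neighbours stay missing
```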
Time-varying and time-invariant covariates
Based on previous studies, the availability of CHARLS data, and known associations with health behaviours and depression [31–34], age, gender, residence, level of education, marital status, number of children, and co-residence with a child were included as potential confounders. Time-varying covariates were age (interview date – birth date), residence (urban/rural), marital status (single [separated, divorced, widowed, or never married]/not single [currently married or cohabiting]), co-residence with at least one child (no/yes), and number of living children (0/1/2/3/≥4). Time-invariant covariates were gender (male/female) and educational level (higher [upper secondary school or more]/lower [less than lower secondary school]).
Depressive symptoms, assessed using the CES-D 10, made up the distal outcome in this study. The respondents were asked how often they had experienced each of the following in the past week: 1) “I was bothered by things that do not usually bother me,” 2) “I had trouble keeping my mind on what I was doing,” 3) “I felt depressed,” 4) “I felt hopeful about the future,” 5) “I felt everything I did was an effort,” 6) “I felt fearful,” 7) “My sleep was restless,” 8) “I was happy,” 9) “I felt lonely,” and 10) “I could not get ‘going’.” Item responses were provided on a four-point scale ranging from 0 (“rarely or none of the time”) to 3 (“most or all of the time”). Responses to the positively worded items 4 and 8 were reverse coded before analysis. Total CES-D 10 scores range from 0 to 30, with higher scores indicating higher levels of depressive symptoms. The CES-D 10 has demonstrated sufficient reliability and validity among community-dwelling older adults in China, and showed good reliability for all three CHARLS waves in this study (Cronbach’s α, 0.777–0.807).
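The scoring rule and the reliability statistic can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, responses are assumed to be listed in the item order given above, and Cronbach's α is computed with the standard formula on items that have already been reverse coded.

```python
from statistics import variance
from typing import Sequence

REVERSE_CODED_ITEMS = (4, 8)  # positively worded items (1-based numbering)
MAX_ITEM_SCORE = 3            # responses range from 0 to 3


def cesd10_total(responses: Sequence[int]) -> int:
    """Sum the ten item scores after reverse coding items 4 and 8."""
    if len(responses) != 10:
        raise ValueError("CES-D 10 requires exactly ten item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("item responses must be integers from 0 to 3")
    total = 0
    for item_number, score in enumerate(responses, start=1):
        if item_number in REVERSE_CODED_ITEMS:
            score = MAX_ITEM_SCORE - score  # reverse code: 0<->3, 1<->2
        total += score
    return total


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals).
    Items are expected to be reverse coded already.
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

For example, a respondent who answers 3 on every negatively worded item and 0 on items 4 and 8 receives the maximum score of 30.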
The analysis performed for this study consisted of multiple steps. First, we performed LCA to identify distinct longitudinal health behaviour profiles. LCA identifies mutually exclusive and exhaustive unobserved classes in a population via a set of manifest items [36–38], whereby between-class variation is maximized and within-class variation is minimized. Specifically, we aimed to identify latent profiles underlying manifest information about physical activity, smoking, and social participation across the three CHARLS waves. We began with a two-class model and added classes until we observed no further improvement of model fit. Identification of the optimal number of classes was based on the lowest values of the Bayesian information criterion (BIC), which favors more parsimonious models, as it penalizes model complexity relatively strongly. The BIC is the most widely used statistic in LCA model selection. Each empty model was estimated 500 times with different initial values. We did not exclude respondents with missing information on health behaviour variables because the expectation-maximization algorithm used in LCA enables latent class model estimation even when manifest item information is missing. The underlying assumption that information is missing completely at random holds, as CHARLS interviewers asked only a randomly selected subsample (half of the total sample) questions about physical activity.
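The BIC-based choice of the number of classes can be sketched as below. The sketch makes several assumptions: the manifest items are treated as nine binary indicators (three behaviours in three waves), the parameter count is the usual one for an unconstrained latent class model without covariates, and the log-likelihood values are invented purely for illustration. They are not results from this study.

```python
import math
from typing import Dict, Tuple


def lca_n_params(n_classes: int, n_binary_items: int) -> int:
    """Free parameters of an unconstrained LCA with binary items:
    (K - 1) class proportions plus K * J item-response probabilities."""
    return (n_classes - 1) + n_classes * n_binary_items


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 * logL + p * ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(log_likelihoods: Dict[int, float],
                  n_binary_items: int,
                  n_obs: int) -> Tuple[int, Dict[int, float]]:
    """Return the class count with the lowest BIC and all BIC values."""
    scores = {
        k: bic(ll, lca_n_params(k, n_binary_items), n_obs)
        for k, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# HYPOTHETICAL log-likelihoods keyed by number of classes (not study results).
ILLUSTRATIVE_LOGLIK = {2: -41000.0, 3: -40500.0, 4: -40480.0}
best_k, bic_values = select_by_bic(ILLUSTRATIVE_LOGLIK, n_binary_items=9, n_obs=8439)
```

With these made-up values the fourth class improves the log-likelihood only slightly, so the penalty term outweighs the gain and the three-class model has the lowest BIC.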
After determination of the optimal number of latent classes, the latent class with the greatest posterior probability, i.e., the class that corresponded most strongly to the observed health behaviour pattern, was stored for each respondent [43, 44]. We then estimated longitudinal random-effects models of depressive symptoms by wave, whereby we allowed the effect of wave to vary as a function of the recorded latent health behaviour class. These models, adjusted for the potential confounders, were used to examine associations between health behaviour classes and levels of depressive symptoms.
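Modal class assignment, as described above, amounts to taking the class with the largest posterior probability for each respondent. A minimal sketch follows; the function name is hypothetical, classes are numbered from 1, and ties are resolved in favour of the lowest class number, which is an assumption.

```python
from typing import Sequence


def modal_class(posteriors: Sequence[float]) -> int:
    """Return the 1-based index of the class with the greatest
    posterior probability (first class wins in case of a tie)."""
    if not posteriors:
        raise ValueError("at least one posterior probability is required")
    best_index = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return best_index + 1
```

For instance, posterior probabilities of 0.10, 0.70, and 0.20 assign the respondent to class 2. The assigned class then enters the random-effects model as a categorical predictor interacted with wave.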
LCA was performed using the poLCA package in R version 3.6.1 (R Foundation for Statistical Computing). Random-effects modeling was performed using Stata software version 15.1 (Stata Corporation, College Station, TX, USA).