Tobacco smoking and depressive symptoms in Chinese middle-aged and older adults: Handling missing values in panel data with multiple imputation

Introduction The high co-occurrence of tobacco smoking and depression is a major public health concern during the novel coronavirus disease-2019 pandemic. However, no studies have dealt with missing values when assessing depression. Therefore, the present study aimed to examine the effect of tobacco smoking on depressive symptoms using a multiple imputation technique. Methods This research was a longitudinal study using data from four waves of the China Health and Retirement Longitudinal Study conducted between 2011 and 2018, and the final sample consisted of 74,381 observations across all four waves of data collection. The present study employed a multiple imputation technique to deal with missing values, and a fixed effects logistic regression model was used for the analysis. Results The results of fixed effects logistic regression showed that heavy smokers had 20% higher odds of suffering from depressive symptoms than those who never smoked. Compared to those who never smoked, for short-term and moderate-term quitters, the odds of suffering from depressive symptoms increased by 30% and 22%, respectively. The magnitudes of the odds ratios for the variables short-term quitters, moderate-term quitters, and long-term quitters decreased in absolute terms with increasing time-gaps since quitting. The sub-group analysis for men and women found that heavy male smokers and short-term and moderate-term male quitters had higher odds of suffering from depressive symptoms than those who never smoked. However, associations between smoking status and depressive symptoms were not significant for women. Conclusions The empirical findings suggested that among Chinese middle-aged and older adults, heavy smokers and short-term and moderate-term quitters have higher odds of suffering from depressive symptoms than those who never smoked. Moreover, among former smokers, the probability of having depressive symptoms decreased with a longer duration since quitting. Nevertheless, the association between depressive symptoms and smoking among Chinese middle-aged and older adults is not straightforward and may vary according to gender. These results may have important implications that support the government in allocating more resources to smoking cessation programs to help middle-aged and older smokers, particularly men.

Introduction
Tobacco smoking is one of the biggest public health threats, resulting in more than 7 million deaths a year worldwide (1). In China, 50.5% of adult men were current smokers in 2018, although the figure was only 2.1% for adult women. The prevalence of current smoking was 30.2% among adults aged 45-64 years and 23.1% among adults aged 65 years or older, implying that there are more than 163 million middle-aged and older adults who smoke in China. Even though tobacco smoking has been proven to be a major cause of diseases such as cancers, heart diseases, and respiratory diseases, only 16.1% of current smokers in China plan to quit smoking within 12 months (2). Meanwhile, depression is currently becoming a significant public health problem, with more than 300 million people estimated to suffer from depression worldwide (3). The prevalence of depressive symptoms among older adults was 20.0%. Depression in late life is associated with an elevated risk of morbidity and suicide and decreased physical and cognitive functioning (4,5).
The novel coronavirus disease 2019 pandemic has led to adverse changes in health behaviors such as smoking and physical activity and to widespread mental disorders such as depression and anxiety (6,7). The high co-occurrence of tobacco smoking and depression is a major public health concern during this unprecedented crisis. The reciprocal relationships between tobacco smoking and depression have been widely documented; for example, depression is associated with subsequent smoking behavior, and smoking exposure is associated with subsequent depression (8-12). Several previous studies have demonstrated that tobacco smoking increases the risk of depressive symptoms. Such an association has been shown across different age groups, such as adolescents (13), adults (14,15), middle-aged and older adults (16,17), and elderly people (18). Conversely, a few studies have shown that tobacco smoking reduces depressive symptoms (19,20). Furthermore, other studies have shown that depression predicts the persistence of tobacco smoking (21-25). Previous studies document that the association between depressive symptoms and smoking is relatively stable across ages (26). However, the prevalence of both depression and smoking peaks in adolescence (27), and most studies on depression and smoking have been conducted among adolescents. Few studies have explicitly focused on middle-aged and older adults (20,28). Across the lifespan, a more modest peak in the prevalence of depression occurs in the fifth and sixth decades (29). Moreover, middle-aged and older smokers are an underserved population for smoking cessation interventions (28). Therefore, this study analyzes the association between depressive symptoms and smoking, focusing on middle-aged and older adults.
Past studies provide conflicting evidence on the association between depressive symptoms and smoking among Chinese middle-aged and older adults. On the one hand, smokers had higher odds of suffering from depressive symptoms (30), and women smokers showed a higher likelihood of suffering from depressive symptoms (31). On the other hand, smokers were less likely to develop depressive symptoms (20), and former smoking was inversely associated with the risk of depressive symptoms (32). These latter results, which show a negative association between depressive symptoms and smoking, were obtained using data from the China Health and Retirement Longitudinal Study (CHARLS). In the CHARLS, there were 1,965 missing 10-item Center for Epidemiologic Studies Depression Scale (CESD-10) baseline scores and 2,888 missing CESD-10 follow-up scores among respondents in both interviews. Therefore, the CESD-10 items with missing values may have contributed to the conflicting findings. Missing values are a common challenge in most social research studies, and the problem is often pronounced in studies using self-rated instruments. Previous studies on depression may likewise have encountered the problem of missing values (33,34). Missing values reduce statistical power, cause bias in the estimation of the parameters, and lessen the representativeness of the samples (35).
Frontiers in Public Health frontiersin.org
Missingness mechanisms were first introduced by Rubin (36). Rubin distinguished three fundamental missing-values mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). MCAR occurs when the missingness is unrelated to both the observed and the unobserved values for a unit. Under an MAR mechanism, the probability of a missing value for an item may depend on observed data but not on unobserved data. MNAR means that the probability of missingness depends on the underlying value of an item. Many researchers adopt a listwise deletion approach (complete-case analysis) to deal with missing values. This approach is based on the assumption of MCAR. However, this assumption is sometimes difficult to justify in practice (37). Therefore, various imputation methods have been developed to compensate for missing values in survey data. The methods include random selection, preceding question, question mean, individual mean, single regression, and multiple imputation. Multiple imputation is the most accurate technique for dealing with missing values when assessing depressive symptoms (33).
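To make the three mechanisms concrete, the following toy simulation is a minimal sketch and is not based on the CHARLS data. The variables (`age`, `score`), the missingness probabilities, and the function names are all hypothetical choices made for illustration; age and score are drawn independently of each other.

```python
import random

def simulate(mechanism, n=20000, seed=0):
    """Return (age, score, missing) tuples under a toy missingness mechanism."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(45, 90)      # always observed
        score = rng.randint(0, 30)     # the value that may go missing
        if mechanism == "MCAR":
            p = 0.10                              # unrelated to any value
        elif mechanism == "MAR":
            p = 0.05 if age < 65 else 0.30        # depends on observed age only
        elif mechanism == "MNAR":
            p = 0.05 if score < 10 else 0.30      # depends on the unseen score itself
        else:
            raise ValueError("mechanism must be 'MCAR', 'MAR', or 'MNAR'")
        rows.append((age, score, rng.random() < p))
    return rows

def missing_rate(rows, keep):
    """Share of missing scores among the rows selected by the predicate `keep`."""
    subset = [r for r in rows if keep(r)]
    return sum(r[2] for r in subset) / len(subset)

for mechanism in ("MCAR", "MAR", "MNAR"):
    rows = simulate(mechanism)
    younger = missing_rate(rows, lambda r: r[0] < 65)
    older = missing_rate(rows, lambda r: r[0] >= 65)
    print(mechanism, round(younger, 3), round(older, 3))
```

In this sketch, only the MAR case shows a missingness rate that differs by the observed age group. In the MNAR case the rate depends on the score itself, so a comparison based on observed variables alone cannot reveal it.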
Overall, the difficulty of estimating the effect of smoking status on depression has led to conflicting findings in the general population. Moreover, although the association between tobacco smoking and depression has been documented, thus far, there have been no studies dealing with missing values when assessing depression. To bridge these gaps, the present study aims to examine the effect of tobacco smoking on depressive symptoms using panel data with a multiple imputation technique.

Data source
The data used in this study were obtained from the China Health and Retirement Longitudinal Study (CHARLS), launched by the National School of Development of Peking University. The CHARLS sample was drawn from 28 provinces, 150 counties/districts, and 450 villages/residential committees. Multistage stratified sampling with a probability proportional to size was used for the survey. More details on the sampling procedure and data collection process are available in the study by Zhao et al. (38). The CHARLS questionnaires include sections on household information, demographic background, health status and physical functioning, health care and insurance, work, retirement, pension, income, expenditure, assets, housing characteristics, etc.
The CHARLS is a biennial longitudinal survey of Chinese families and individuals aged 45 years or older. In the first wave in 2011, 17,337 persons were successfully interviewed; in the three waves of full sample follow-up surveys in 2013, 2015, and 2018, 18,248 persons, 20,083 persons, and 19,584 persons were successfully interviewed, respectively. The measured variables and their respective percentages of missing values are presented in Table 1. After excluding respondents with missing values on the measured variables whose proportion of missing values was <1% (871 persons), the final sample consisted of a total of 74,381 observations across the four data collection waves: 58,854 with no missing values and an additional 15,527 with missing values in at least one of the measured variables.
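The sample accounting above can be checked with simple arithmetic. The short sketch below uses only the counts reported in the text; the variable names are chosen for illustration.

```python
# Wave-level interview counts as reported in the text.
wave_counts = {2011: 17_337, 2013: 18_248, 2015: 20_083, 2018: 19_584}
excluded = 871        # persons removed before analysis
complete = 58_854     # observations with no missing values
incomplete = 15_527   # observations with at least one missing value

total_interviews = sum(wave_counts.values())
final_sample = total_interviews - excluded
share_incomplete = incomplete / final_sample

print(total_interviews)                   # all interviews across the four waves
print(final_sample)                       # should match the reported 74,381
print(complete + incomplete == final_sample)
print(round(100 * share_incomplete, 1))   # percent of observations needing imputation
```

Roughly one in five observations has at least one missing value, which is the share that listwise deletion would discard.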

Measurements

Depressive symptoms
The CESD-10 was used to assess depressive symptoms in the CHARLS questionnaire. First, the CESD-10 was developed from the full-length version of the 20-item Center for Epidemiologic Studies Depression Scale (CESD-D), which was designed as a screening instrument for depressive symptoms in older adults (39). Second, the CESD-10 indicated good predictive accuracy when compared to the CESD-D. Last, the CESD-10 has shown reasonable validity and reliability in the Chinese population and adequate validity across a range of ages in longitudinal studies (40,41).
In the CHARLS, each respondent was asked, "How often have you felt or behaved this way during the last week?". The scale consists of 10 items (e.g., I felt depressed, I felt fearful, my sleep was restless, and I was happy), each rated on a 4-point Likert scale from 0 (<1 day) to 3 (5-7 days). The range of the CESD-10 total score is 0-30, and a cutoff score of 10 or higher indicates the presence of significant depressive symptoms (39). Therefore, the variable of depressive symptoms was set as a dichotomous variable that equaled 1 if the individual's self-rated CESD-10 score was 10 or higher and 0 otherwise.
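The dichotomization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the sketch assumes that the ten item scores have already been coded 0-3 in the direction of more depressive symptoms (any reverse-coding of positively worded items is not described in the text and is assumed to have been done beforehand). A missing item is represented by `None`.

```python
from typing import Optional, Sequence

CUTOFF = 10  # cutoff reported in the text

def cesd10_total(items: Sequence[Optional[int]]) -> Optional[int]:
    """Sum ten item scores (each 0-3); return None if any item is missing."""
    if len(items) != 10:
        raise ValueError("CESD-10 requires exactly 10 item scores")
    if any(score is None for score in items):
        return None  # left for the imputation step rather than scored
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    return sum(items)

def has_depressive_symptoms(items: Sequence[Optional[int]]) -> Optional[int]:
    """Return 1 if the total is >= 10, 0 if it is lower, None if it is missing."""
    total = cesd10_total(items)
    if total is None:
        return None
    return 1 if total >= CUTOFF else 0

print(has_depressive_symptoms([1] * 10))          # total of exactly 10
print(has_depressive_symptoms([0] * 9 + [3]))     # total of 3
print(has_depressive_symptoms([1] * 9 + [None]))  # one missing item
```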

Smoking status
In the CHARLS, each adult was asked, "Have you ever chewed tobacco, smoked a pipe, smoked self-rolled cigarettes, or smoked cigarettes/cigars?" and "Do you still have a smoking habit or have you totally stopped smoking?". According to these two questions, all adults were divided into three mutually exclusive groups: never smoked, current smokers, and former smokers. For further analysis, the present study categorized current smokers into three subgroups (light, moderate, and heavy smokers) based on their average cigarette consumption ("In 1 day about how many cigarettes do you consume?"). Light smokers were current smokers who reported consuming from 1 to 10 cigarettes per day, moderate smokers were those who consumed from 11 to 19 cigarettes per day, and heavy smokers were those who consumed 20 cigarettes or more per day. Former smokers were categorized into three subgroups (short-term, moderate-term, and long-term quitters) based on the total number of years since the respondents had quit smoking ("At what age did you totally quit smoking?"): short-term quitters were former smokers who had quit smoking ≤1 year ago, and moderate-term quitters and long-term quitters were former smokers who had quit smoking 2-5 years ago and ≥6 years ago, respectively (42).
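The seven-category smoking status variable can be expressed as a small classification function. The sketch below is illustrative only: the function and argument names are hypothetical, and `years_since_quit` is assumed to be a whole number of years derived from the respondent's age at interview and reported age at quitting. Cases that cannot be classified return `None`, which corresponds to a missing value.

```python
from typing import Optional

def classify_smoking(
    ever_smoked: bool,
    still_smoking: bool,
    cigarettes_per_day: Optional[int] = None,
    years_since_quit: Optional[int] = None,
) -> Optional[str]:
    """Map survey answers to the smoking status categories defined in the text."""
    if not ever_smoked:
        return "never smoked"
    if still_smoking:
        if cigarettes_per_day is None or cigarettes_per_day < 1:
            return None                      # consumption not reported
        if cigarettes_per_day <= 10:
            return "light smoker"            # 1-10 cigarettes per day
        if cigarettes_per_day <= 19:
            return "moderate smoker"         # 11-19 cigarettes per day
        return "heavy smoker"                # 20 or more cigarettes per day
    if years_since_quit is None or years_since_quit < 0:
        return None                          # quitting age not reported
    if years_since_quit <= 1:
        return "short-term quitter"          # quit <=1 year ago
    if years_since_quit <= 5:
        return "moderate-term quitter"       # quit 2-5 years ago
    return "long-term quitter"               # quit >=6 years ago

print(classify_smoking(True, True, cigarettes_per_day=20))
print(classify_smoking(True, False, years_since_quit=3))
print(classify_smoking(True, True))  # e.g., a pipe smoker with no cigarette count
```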

Covariates
The analysis also considered the following three categories of variables as covariates to explain depressive symptoms: (1) current health-related factors, including self-rated health, functional limitations, and chronic conditions; (2) several demographic characteristics that may also affect depressive symptoms, such as sex, age, educational attainment, marital status, and rural residency; and (3) in addition to smoking, another health behavior factor, drinking. The definitions of the variables are provided in Table 2.
Table note: "Don't know," "Refused," and "Blank" equal missing.

Multiple imputation of missing values
During the CHARLS investigation, the respondents were not required to answer any question in the cognition and depression section that they did not want to answer, and the interviewers went on to the next question. Moreover, the CESD-10 must be answered by the respondents themselves and cannot be answered by other family members. As a result, the proportion of missing values for the CESD-10 items was approximately 10% (see Table 1). In addition, only the respondents who smoked filtered or unfiltered cigarettes answered the question about average cigarette consumption. The respondents who smoked a pipe, self-rolled cigarettes, cigars, or water cigarettes skipped the average cigarette consumption question. Therefore, missing values accounted for approximately 12% of the values for the variable of smoking status.
We tested whether the data were MCAR or MAR. Little's MCAR test gave a chi-square distance of 22,619.85 with 10,099 degrees of freedom and a p-value < 0.001. The test suggests that the missing data of the measured variables of interest are not MCAR at the 0.05 significance level. We then used a logistic model to identify other variables predicting missing responses to the CESD-10 items and smoking status (data not shown). The logistic model indicated that the missingness of the CESD-10 items and smoking status was related to the respondents' age. This finding provides evidence consistent with the data being MAR. The present study therefore adopted the MAR assumption and employed a multiple imputation technique to deal with missing values.
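As a rough plausibility check on the reported p-value, the upper tail of a chi-square distribution can be approximated with the Wilson-Hilferty cube-root normal approximation. This is a swapped-in approximation chosen so that the sketch needs only the standard library; it is not the procedure used by the authors' software, and the function name is hypothetical.

```python
import math

def chi2_upper_tail_wilson_hilferty(statistic: float, df: int) -> float:
    """Approximate P(X >= statistic) for X ~ chi-square(df).

    Uses the Wilson-Hilferty result that (X / df) ** (1/3) is approximately
    normal with mean 1 - 2/(9 df) and variance 2/(9 df).
    """
    if statistic < 0 or df <= 0:
        raise ValueError("statistic must be >= 0 and df must be > 0")
    h = 2.0 / (9.0 * df)
    z = ((statistic / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal upper tail

# Values reported in the text for Little's MCAR test.
p_value = chi2_upper_tail_wilson_hilferty(22_619.85, 10_099)
print(p_value < 0.001)
```

With a statistic more than twice its degrees of freedom, the approximate tail probability is far below 0.001, in line with the rejection of MCAR reported above.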
Multiple imputation is a simulation-based statistical technique that allows researchers to increase the availability of data points, thus reducing biases associated with the deletion of observations due to missing values (43). Multiple imputation has three elemental phases: imputation, analysis, and pooling. In the imputation phase, m copies of the dataset are created, with the missing values replaced by imputed values using an appropriate model. Rubin suggested that m = 5 should be sufficient to obtain valid inference (44). Therefore, 5 copies were created in this study to reduce the sampling error due to imputations.
Two common imputation approaches, multiple imputation with the multivariate normal model (MVN) and multiple imputation by chained equations (MICE), are widely available in statistical software. MVN assumes a joint multivariate normal distribution of all variables and uses multivariate normal data augmentation to impute missing values of imputation variables. MVN has a theoretical justification and appears to perform well compared to MICE (45). However, most epidemiologists work with datasets that include non-continuous variables, which cannot be modeled by MVN (46). MICE, on the other hand, lacks a rigorous theoretical justification but is a more flexible approach that imputes missing data for multiple variables based on a set of univariate imputation models (47). Therefore, the imputation process was carried out based on MICE. The present study selected conditional models based on the type of variables. MICE allows the use of logistic and Poisson regression models to impute binary variables, such as rural residency, and count variables, such as chronic conditions. Moreover, ordered logistic and multinomial logistic regression models can impute ordered categorical variables, such as the CESD-10 items, educational attainment, and self-rated health, and unordered categorical variables, such as smoking status. Multiple imputation should include variables associated with the probability of missing values (48). The variables listed in Table 2 were used in the imputation models.
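In the analysis phase, each of the 5 completed datasets was analyzed using the desired statistical method. In the pooling phase, the results obtained from the 5 completed datasets were combined into a single multiple-imputation result. The single parameter estimate is calculated as the mean of the m (= 5) parameter estimates $\hat{Q}_i$:

$$\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i$$

The estimated variance of this MI estimate is calculated based on Rubin's rules as expressed below:

$$T = \bar{U} + \left(1 + \frac{1}{m}\right) B, \qquad \bar{U} = \frac{1}{m} \sum_{i=1}^{m} \hat{U}_i, \qquad B = \frac{1}{m-1} \sum_{i=1}^{m} \left(\hat{Q}_i - \bar{Q}\right)^2$$

where $\hat{U}_i$ is the estimated variance of $\hat{Q}_i$ in the i-th completed dataset, $\bar{U}$ is the within-imputation variance, and $B$ is the between-imputation variance.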
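The pooling phase under Rubin's rules can be sketched in a few lines. The example is illustrative only: the function name is hypothetical, the five estimates and standard errors are made-up numbers rather than results from this study, and pooling on the log-odds scale before exponentiating is an assumption about common practice, not a detail stated in the text.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total

# Hypothetical log odds ratios and standard errors from m = 5 completed datasets.
log_ors = [0.17, 0.19, 0.18, 0.20, 0.16]
std_errors = [0.07, 0.07, 0.07, 0.07, 0.07]

q_bar, total_var = pool_rubin(log_ors, [se ** 2 for se in std_errors])
pooled_se = math.sqrt(total_var)
print(round(math.exp(q_bar), 3), round(pooled_se, 4))
```

The pooled standard error is slightly larger than the within-imputation value of 0.07 because the between-imputation term adds the uncertainty caused by the missing values.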

Statistical analysis
Depressive symptoms may be both an antecedent and a consequence of tobacco smoking (10). Panel data, also referred to as longitudinal data in epidemiology, are a dataset in which observations of multiple subjects are collected over time. The benefit of panel data is that it is possible to control for the unknown or unmeasured determinants of depressive symptoms that are constant over time (49). Based on a four-wave unbalanced panel dataset, the current study used a logistic regression model to estimate the effect of tobacco smoking on depressive symptoms. The logistic regression model was built on a latent regression and was defined as follows:

$$y^{*}_{it} = x'_{it}\alpha + S'_{it}\beta + \mu_i + \varepsilon_{it}, \qquad i = 1, \ldots, n, \quad t = 1, \ldots, T_i$$

Here, $y^{*}_{it}$ is an unobserved latent variable linked to the observed binary response variable $y_{it}$ (with or without depressive symptoms), with $y_{it} = 1$ if $y^{*}_{it} > 0$ and $y_{it} = 0$ otherwise. $x'_{it}$ is the vector of the demographic characteristics and health status of an individual. The vector $S'_{it}$ represents smoking status, including heavy smokers, moderate smokers, light smokers, short-term quitters, moderate-term quitters, and long-term quitters. $\alpha$ and $\beta$ are the coefficients to be estimated. $\mu_i$ is the unobserved individual-specific heterogeneity, and $\varepsilon_{it}$ is a time-varying error term.

The analysis started with a pooled logistic regression. After that, this study treated the data as a panel structure and made a choice between the fixed effects and random effects logistic models. In this study, a possible unobserved variable was attitudes toward smoking, which was correlated with the time-varying explanatory variables (tobacco smoking) in the model. With such correlated heterogeneity, a fixed effects logistic model should be preferred over a random effects logistic model; however, estimating a fixed effects logistic model discards a substantial amount of information. Therefore, a random effects logistic model was also presented in this study (49,50). The results are presented as odds ratios (ORs) along with 95% confidence intervals (CIs). All statistical analyses were conducted using the Stata 15 statistical software package.
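One reason a fixed effects logistic model loses information is that, in the conditional likelihood used to estimate it, individuals whose binary outcome never changes across their observed waves contribute nothing. The sketch below illustrates this with a tiny made-up panel; the record layout and function name are hypothetical and this is not the authors' Stata code.

```python
from collections import defaultdict

def informative_ids(records):
    """Return the ids whose binary outcome varies across their observed waves.

    `records` is an iterable of (person_id, wave, outcome) with outcome in {0, 1}.
    """
    outcomes = defaultdict(set)
    for person_id, _wave, outcome in records:
        outcomes[person_id].add(outcome)
    return {pid for pid, seen in outcomes.items() if len(seen) > 1}

# Made-up unbalanced panel: person, survey year, depressive symptoms (0/1).
panel = [
    ("A", 2011, 0), ("A", 2013, 0), ("A", 2015, 0), ("A", 2018, 0),  # never depressed
    ("B", 2011, 1), ("B", 2013, 1), ("B", 2018, 1),                  # always depressed
    ("C", 2011, 0), ("C", 2013, 1), ("C", 2015, 1),                  # status changes
    ("D", 2015, 1), ("D", 2018, 0),                                  # status changes
]

kept = informative_ids(panel)
everyone = {pid for pid, _wave, _y in panel}
print(sorted(kept), len(kept) / len(everyone))
```

In this toy panel only two of the four individuals would inform the fixed effects estimates, whereas pooled and random effects logistic models use all of them.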

Results
A descriptive summary of all variables over time is displayed in Table 3. The total sample size was 74,381 observations, with 51.76% of the respondents being female and 32.61% aged 65 years or older. In addition, 77.88% of the respondents reported living in rural areas, and approximately 34% of the respondents completed at least middle school. The proportions of respondents with depressive symptoms were 37.55% in Wave 1, 32.26% in Wave 2, 33.68% in Wave 3, and 38.50% in Wave 4, implying that the proportion of respondents with depressive symptoms first decreased and then increased. Approximately one in four respondents were current smokers over the years, and the proportions of light, moderate, and heavy smokers were approximately 9%, 2%, and 14%, respectively. Moreover, approximately 10% of the respondents were former smokers from 2011 to 2018. The proportions of short-term, moderate-term, and long-term quitters were approximately 2%, 2%, and 6%, respectively.
The results of the logistic regression analysis are shown in Table 4 as ORs. Column III of Table 4 presents the effect of tobacco smoking on depressive symptoms using the fixed effects logistic model. The results revealed that smoking status was associated with depressive symptoms. Heavy smokers had 20% higher odds of suffering from depressive symptoms than those who never smoked (OR = 1.20; 95% CI: 1.05, 1.37). Compared to those who never smoked, for short-term and moderate-term quitters, the odds of suffering from depressive symptoms increased by 30% and 22%, respectively (OR = 1.30; 95% CI: 1.04, 1.63 and OR = 1.22; 95% CI: 1.02, 1.47). Long-term quitters also had a higher likelihood of suffering from depressive symptoms than those who never smoked, but the difference was not statistically significant. The magnitudes of the ORs for the variables short-term quitters, moderate-term quitters, and long-term quitters decreased in absolute terms with increasing time-gaps since quitting. Therefore, among former smokers, the probability of suffering from depressive symptoms decreased with increasing duration since quitting smoking.
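The link between a logistic regression coefficient, its OR, the 95% CI, and the "percent higher odds" wording can be illustrated as follows. The sketch back-calculates an approximate coefficient and standard error from the OR and CI reported for heavy smokers; this back-calculation and the function names are illustrative assumptions, not the authors' procedure, and it relies on the CI being symmetric on the log-odds scale.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_summary(beta: float, se: float):
    """Return (OR, lower CI, upper CI, percent change in odds) for a logit coefficient."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - Z_95 * se)
    upper = math.exp(beta + Z_95 * se)
    percent_change = (odds_ratio - 1.0) * 100.0
    return odds_ratio, lower, upper, percent_change

def se_from_ci(lower: float, upper: float) -> float:
    """Approximate the log-odds standard error implied by a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)

# Heavy smokers versus never smokers, as reported in the text.
beta = math.log(1.20)
se = se_from_ci(1.05, 1.37)

odds_ratio, lower, upper, percent_change = odds_ratio_summary(beta, se)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2), round(percent_change))
```

An OR of 1.20 therefore corresponds to 20% higher odds, which is not the same as a 20% higher probability of depressive symptoms.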
Irrespective of the estimation method in Scenario 1, heavy smokers showed an increased probability of suffering from depressive symptoms compared to those who had never smoked. Short-term quitters also showed an increased probability of suffering from depressive symptoms compared to those who had never smoked (see Columns I-III of Table 4). In Scenario 2, in which the missing values were not imputed, the results indicated that former smokers showed a decreased probability of suffering from depressive symptoms compared to those who had never smoked. The complete case and multiple imputation analyses are shown in Table 5. The coefficient of variation (CV) of the OR offers a normalized measure of its dispersion. For the smoking status variables, the CV is clearly reduced after multiple imputation compared with complete case analysis. The variation rate (VR) assesses the relative variation of the OR obtained from complete case and multiple imputation analyses. For smoking status variables, the VR varied from 28.05% for long-term quitters to 66.67% for light and heavy smokers.
The smoking prevalence in China varies widely between men and women. Therefore, the current study conducted a subgroup analysis for men and women separately (see Table 6). The results of the fixed effects logistic model revealed that heavy male smokers and short-term and moderate-term male quitters had higher odds of suffering from depressive symptoms than those who never smoked (OR = 1.22; 95% CI: 1.05, 1.41; OR = 1.31; 95% CI: 1.05, 1.64; and OR = 1.25; 95% CI: 1.02, 1.54, respectively). However, associations between smoking status and depressive symptoms were not significant for women.

Discussion
To the best of our knowledge, this is the first study to examine the effect of tobacco smoking on depressive symptoms among individuals aged 45 years or older in China using a nationally representative survey with a multiple imputation technique. The results indicated that among Chinese middle-aged and older adults, heavy smokers and short-term and moderate-term quitters have higher odds of suffering from depressive symptoms than those who never smoked after controlling for other relevant variables. Our findings are consistent with the findings for European middle-aged and older adults (16) and American middle-aged adults (17). Therefore, local communities and primary care facilities should consider promoting health education programs for middle-aged and older smokers and improving their understanding of the hazards of tobacco smoking. This type of analysis, however, does not identify pathways between tobacco smoking and depressive symptoms. One possible explanation is that smoking or chewing tobacco releases nicotine, which affects an individual's neurocircuitry and increases their susceptibility to depression (13,15). Another possible explanation comes from the self-medication model, which suggests that smokers use nicotine to alleviate depressed mood (9,18,51).
This study found that short-term and moderate-term quitters show higher odds of suffering from depressive symptoms than those who never smoked, but the likelihood of having depressive symptoms declined with increasing time gaps since stopping smoking. Although the causality between smoking cessation and depression cannot be established without a longitudinal follow-up design, the decline in the prevalence of depressive symptoms after 1 year of quitting implies that smoking cessation and depressive symptoms are related. These results are comparable to other findings in the literature (52-54). In this study, among former smokers, the probability of having depressive symptoms decreased with a longer duration since quitting, and hence, the earlier a smoker stops smoking, the greater the benefit of smoking cessation for avoiding depression. In other words, smoking cessation may be an effective way to reduce middle-aged and older adults' risk of depression. Therefore, the government should consider allocating more resources to smoking cessation programs to help adult and adolescent smokers quit as early as possible and/or remain non-smokers. It is also essential to tailor smoking cessation programs for middle-aged and older adults to help them quit smoking and prevent relapse. The sub-group analysis for men and women found that associations between smoking status and depressive symptoms were significant for men but not women. Therefore, the association between depressive symptoms and smoking among Chinese middle-aged and older adults is not straightforward and may vary according to gender. According to the multiply imputed datasets, the proportion of current smoking in men (54.17%) was significantly higher than in women (5.39%). However, the proportion of women with depressive symptoms (42.26%) was significantly higher than that of men (28.23%). The results revealed that men are more likely to smoke and women are more likely to suffer from depressive symptoms. Given this pattern, further studies employing different research methods will be needed to investigate gender differences in the association between depressive symptoms and smoking.
Extensive health-related surveys, such as the CHARLS, provide numerous data regarding health-related behaviors and health outcomes. However, almost every analysis faces the problem of missing data. When comparing the results from multiple imputation and listwise deletion (Table 4), multiple imputation recovers a fully observed sample size. More importantly, multiple imputation restores the natural variability of the missing values. Recovering information and restoring variability may reduce bias or increase precision, which results in a valid statistical inference from multiple imputation (35,55). It will be essential to employ multiple imputation in future analyses when survey items on smoking status and depressive symptoms have a large number of missing values. However, researchers should be aware of hazards in multiple imputation analyses. First, multiple imputation is valid under the MAR assumption and gives biased results for MNAR mechanisms. Unfortunately, the distinction between MAR and MNAR is based on a non-testable assumption. Second, the misspecification of the imputation model gives biased results unless enough variables predictive of missing data are included in the imputation model (56-58).

Limitations
Although the current study employs a national survey with multiple imputation to analyze the effect of smoking status on depressive symptoms in Chinese middle-aged and older adults, several limitations should be emphasized. First, the results provide no evidence about the causal direction between smoking and depressive symptoms. The authors can only conclude that the findings suggest the co-occurrence of smoking and depressive symptoms among Chinese middle-aged and older adults. Second, since the CESD-10 is a self-reported screening instrument for symptoms of depression and not a diagnostic tool, our analysis may have resulted in either an underestimation or overestimation of the association between tobacco smoking and depressive symptoms. Third, the data were obtained via surveys, and thus the limitations of all self-reported data exist, such as recall bias and the unreliability of responses when respondents are under pressure. Last, respondents whose depressive symptom status did not change across the four waves did not contribute to the likelihood of the fixed effects model, so its results could be less precise and have larger standard errors.

Conclusion
The purpose of this study was to empirically ascertain the effect of tobacco smoking on depressive symptoms among individuals aged 45 years or older in China using multiply imputed datasets. The empirical findings suggest that among Chinese middle-aged and older adults, heavy smokers and short-term and moderate-term quitters have higher odds of suffering from depressive symptoms than those who never smoked. Moreover, among former smokers, the probability of having depressive symptoms decreased with a longer duration since quitting. Nevertheless, the association between depressive symptoms and smoking among Chinese middle-aged and older adults is not straightforward and may vary according to gender. These results may have important implications that support the government in allocating more resources to smoking cessation programs to help middle-aged and older smokers, particularly men.

Data availability statement
Publicly available datasets were analyzed in this study. These data can be found here: https://opendata.pku.edu.cn/dataverse/CHARLS.