The impact of internet use on health outcomes in China: A longitudinal study using a three-wave nationwide survey

Background: Previous studies have examined the impact of internet use on health in China but have not addressed the reverse causality problem nor analyzed the health status in detail. In this study, we conducted a longitudinal analysis to investigate the causal association between internet use and health status in China to address these problems. Methods: Using three-wave longitudinal data from the China Family Panel Studies conducted in 2014, 2016, and 2018, we adopted dynamic regression models with lagged internet use variables to examine the association between internet use and five types of health outcomes. Results: Internet use was positively associated with health outcomes (self-rated health, mental health, and outpatients), and these effects differed by gender, age, and urban/rural region groups. Conclusions: The results provided rich evidence of the positive effect of internet use on health outcomes in China. Thus, digital economic policies are expected to improve individuals’ health status.

level information, such as a set of indices on health, demographic characteristics, family structure, household income, house ownership, health behavior, and enrollment in social insurance, which were used in this study. The CFPS sample sizes for 2014, 2016, and 2018 were 37,147, 36,892, and 37,354, respectively. This study focused on individuals who were aged 16 years or older in the baseline survey and who participated in at least one of the two follow-up surveys. After excluding respondents who were missing key variables used in the statistical analysis, the total number of observations used in this study was 60,077 (20,024 from 2014, 20,026 from 2016, and 20,027 from 2018). The sample used in the regression differed slightly depending on the model.

Variables
The key dependent variables were five indices of health outcomes: (i) self-rated health (SRH); (ii) mental health, including total mental health disorder (TMH), "I find nothing exciting" (MH1), "I feel nervous" (MH2), "I cannot concentrate on things" (MH3), "I feel depressed" (MH4), "I find it difficult to do anything" (MH5), and "I feel that I cannot continue with my life" (MH6); (iii) chronic disease; (iv) outpatient; and (v) inpatient, all of which are binary variables. We constructed the binary variable of SRH as 1 = excellent or good and 0 = otherwise. As for TMH and MH1-6, we categorized the answers to the question "How often do you feel hopeful about the future?" into each day = 5, often = 4, half of days = 3, sometimes = 2, rarely or never = 1. We constructed the binary variables of MH1-6, taking the value 1 when the values of MH1-6 were equal to 4 or 5 and 0 otherwise. The total score of MH1-6 ranged from 1 to 30, and we constructed the binary variable of TMH as 1 = score is more than 15, and 0 = otherwise. TMH and MH1-6 were part of the original questionnaire in the CFPS and were used for the first time in this study; a high value indicates a higher probability of developing a mental health disorder. We constructed a binary variable of disease by allocating 1 to those who answered that they had one or more diseases diagnosed by doctors and 0 otherwise. We also constructed binary variables of outpatient and inpatient by allocating 1 to those who answered that they had outpatient or inpatient experiences in the survey year and 0 otherwise. Higher values indicate poor health status for all indices of health outcomes.
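The outcome coding described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names and the response labels ("excellent", "good") are hypothetical and are not taken from the CFPS codebook; only the thresholds (items coded 1 when the score is 4 or 5, TMH coded 1 when the total score is more than 15) come from the text.

```python
from typing import Sequence

# Hypothetical response labels; the actual CFPS codings may differ.
GOOD_SRH_ANSWERS = {"excellent", "good"}


def srh_binary(answer: str) -> int:
    """SRH dummy as stated in the text: 1 = excellent or good, 0 = otherwise."""
    return 1 if answer.strip().lower() in GOOD_SRH_ANSWERS else 0


def mh_item_binary(score: int) -> int:
    """MH1-6 dummy: 1 when the 1-5 frequency score is 4 or 5, 0 otherwise."""
    if not 1 <= score <= 5:
        raise ValueError("item score must be between 1 and 5")
    return 1 if score >= 4 else 0


def tmh_binary(item_scores: Sequence[int]) -> int:
    """TMH dummy: 1 when the summed MH1-6 score is more than 15, 0 otherwise."""
    if len(item_scores) != 6:
        raise ValueError("expected six item scores (MH1-MH6)")
    return 1 if sum(item_scores) > 15 else 0


# Hypothetical respondent: item scores for MH1..MH6
example_scores = [3, 3, 3, 3, 3, 1]
example_item_dummies = [mh_item_binary(s) for s in example_scores]
example_tmh = tmh_binary(example_scores)
```

Keeping the item-level and total-score dummies as separate functions mirrors the text, where MH1-6 and TMH are analyzed as distinct outcomes.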
The key independent variable was the internet usage dummy variable. Based on the question item "Did you use the internet in the past year?" we scored internet usage as 1 for "used the internet" and 0 for "did not use the internet." As covariates, we considered the following variables, all of which are likely to have affected the health outcomes and were available from the CFPS: (1) demographic factors, including age, sex, years of education, ethnicity (Han), Communist Party of China membership, and urban residence; (2) family factors, including having a spouse or not and the number of family members; (3) income factors, including per capita household income and house ownership; (4) health behavior (smoking, drinking, and weekly exercise); (5) enrollment in pension/medical insurance (1 = enrollment, 0 = otherwise); (6) regions (east, central, and west); and (7) survey years (2014, 2016, and 2018).

Analytic strategy
As the benchmark, we considered the regression model to estimate the association between internet use and health outcomes, along with a set of covariates, X:
$$H_i = a + \beta\,\mathrm{INT}_i + \sum_n \delta_n X_{ni} + \epsilon_i, \qquad (1)$$
where $i$ and $n$ denote the individual and covariate, and $\epsilon$ is an error term.
We addressed the initial value problem [24][25][26]: health at time $t$ might be affected by health at time $t-1$. To deal with this problem, we considered a dynamic model that included health at time $t-1$ as an explanatory variable. We further addressed the reverse causality problem by using the internet use status at time $t-1$, which allows a one-wave (that is, two-year) lag from internet use to health [27,28]. Overall, we estimated the following dynamic model using balanced panel data:
$$H_{it} = a + \rho H_{i,t-1} + \beta\,\mathrm{INT}_{i,t-1} + \sum_n \delta_n X_{nit} + u_{it}, \qquad (2)$$
where $t$ and $t-1$ denote a combination of survey years (2014 and 2016) or (2016 and 2018), and $u$ is an error term. In the actual regression analysis, we estimated logistic regression models using a set of binary variables of health indices. We estimated these models not only for the entire sample but also for each sex group (male and female), age group (aged 16-24, 25-44, 45-59, and 60 or above), and area group (urban and rural) to examine heterogeneity.
Table 1 summarizes the key features of the study samples used in the statistical analysis. In general, the proportion of those who answered "have used the internet in the past year" was 40.4% in China from 2014 to 2018.
Note: Age, years of education, number of family members, and per capita household income are shown as the mean and SD values.
Table 2 presents the unadjusted association between internet use in 2016 and health outcomes in 2018, comparing health outcomes between internet users and non-users using the entire sample; high values indicate poor health outcomes. Internet use was positively associated with SRH, TMH, MH3-6, disease, outpatient, and inpatient. It should be noted, however, that the comparisons in this table did not control for covariates and were not adjusted for potential biases related to cross-sectional comparisons.
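To make the lag structure of the dynamic model in Equation (2) concrete, the following sketch pairs each person's record at wave t with their health and internet use at wave t−1. It is an illustration only: the record layout, field names, and toy data are hypothetical, and the sketch keeps any person observed in both waves of a pair, which may differ from the balanced-panel restriction used in the study.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (person_id, survey_year) -> {"health": 0/1, "internet": 0/1}
Panel = Dict[Tuple[int, int], Dict[str, int]]

# (t-1, t) combinations of survey years named in the text
WAVE_PAIRS = [(2014, 2016), (2016, 2018)]


def build_lagged_rows(panel: Panel) -> List[Dict[str, int]]:
    """Build one regression row per person and wave pair.

    Each row holds health at t together with health and internet use at t-1,
    i.e. the variables entering the right-hand side of the dynamic model.
    """
    rows: List[Dict[str, int]] = []
    person_ids = sorted({pid for pid, _ in panel})
    for pid in person_ids:
        for prev_year, year in WAVE_PAIRS:
            prev = panel.get((pid, prev_year))
            curr = panel.get((pid, year))
            if prev is None or curr is None:
                continue  # person not observed in both waves of this pair
            rows.append({
                "id": pid,
                "year": year,
                "health_t": curr["health"],
                "health_lag": prev["health"],
                "internet_lag": prev["internet"],
            })
    return rows


# Toy data: person 1 is observed in all three waves, person 2 only in two.
toy_panel: Panel = {
    (1, 2014): {"health": 0, "internet": 1},
    (1, 2016): {"health": 0, "internet": 1},
    (1, 2018): {"health": 1, "internet": 1},
    (2, 2014): {"health": 1, "internet": 0},
    (2, 2016): {"health": 1, "internet": 0},
}
rows = build_lagged_rows(toy_panel)
```

The resulting rows could then be passed to any logistic regression routine, with `health_t` as the outcome and `health_lag`, `internet_lag`, and the covariates as regressors.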

Regression analysis
The results of the regression models are summarized in Table 3, which reports the odds ratios (ORs) of reporting poor health status, along with 95% confidence intervals (CIs), in response to internet use.
Note: a Obtained from dynamic logistic models with lagged explanatory variables (controlled for covariates). * p < 0.05, ** p < 0.01.
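For readers less familiar with how ORs and their CIs relate to logistic regression output, the sketch below shows the conventional conversion: the OR is the exponential of the coefficient, and a Wald-type 95% interval exponentiates the coefficient plus or minus 1.96 standard errors. The coefficient and standard error used here are hypothetical, and the text does not state which interval method was actually used.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a logit coefficient and its standard error to (OR, lower, upper).

    Uses a Wald-type interval: exp(beta - z * se) to exp(beta + z * se).
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient on lagged internet use (not taken from the tables)
or_value, ci_lower, ci_upper = odds_ratio_with_ci(beta=-0.20, se=0.05)

# An OR below 1 with an upper bound below 1 would indicate significantly
# lower odds of reporting poor health at the 5% level.
significant_reduction = ci_upper < 1.0
```

Because the outcomes are coded so that higher values indicate poor health, an OR below 1 for internet use corresponds to a favorable association.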
Tables 4-6 summarize the results obtained from separate estimations by sex, age, and area group. Table 4 indicates that, for males, internet use had significantly negative associations with poor SRH and mental health disorders of MH3 and MH4, and modestly negative associations with mental health disorders of TMH and MH6 (p < 0.1); for females, it had significantly negative associations with poor SRH and mental health disorders of MH4-6. In sum, internet use had a positive effect on improving the health status of both males and females, while the effect was modestly greater for females than males.
Note: a Obtained from dynamic logistic models with lagged explanatory variables (controlled for covariates). ** p < 0.01, * p < 0.05, † p < 0.1.
Table 5 compares the association between internet use and health by age group. The most noticeable finding is that the positive effect of internet use on SRH and mental health was significantly smaller for the younger generation (aged 16-24 years) than for the middle and older generations (aged 45-59, and 60 and over), while the positive association between internet use and outpatient visits was greater for the younger generation (OR: 2.06, p < 0.01) than for the middle and older generations. This suggests that the negative effect of internet use was significantly greater for the younger generation than for other age groups.
Note: a Obtained from dynamic logistic models with lagged explanatory variables (controlled for covariates). ** p < 0.01, * p < 0.05, † p < 0.1.
Table 6 compares the association between internet use and health in urban and rural area groups. The results show that internet use had significantly negative associations with poor SRH for both urban and rural area groups, significantly negative associations with mental health disorders of MH6 and MH6, and modestly negative associations with mental health disorders of MH4 (p < 0.1) for the urban group; that it had significantly negative associations with mental health disorders of MH4 and MH6 for the rural group; and that it had significantly negative associations with outpatients for the urban group and inpatients for the rural group. In sum, internet use had a positive effect on improving the health status of both urban and rural groups, while this effect differed between urban and rural residents.

Discussion
We examined how internet use was associated with health in China from 2014 to 2018. Our regression analysis, based on three-wave longitudinal data and dynamic models with the lagged internet use variable, indicated that internet use had significant positive associations with SRH. These results on SRH were generally in line with the positive results from previous studies in China that used cross-sectional data analysis [14][15][16][17], which did not fully control for statistical biases, and with the study based on two-wave longitudinal data [18].
Regarding the association between internet use and other health outcomes, which were estimated in this study for the first time, our results showed that internet use may reduce the probability of developing a mental health disorder and becoming an inpatient nationwide.
These findings can contribute to the literature on the association between internet use and health outcomes from multiple perspectives. It is reported that in 2017, 792 million people lived with a mental health disorder; the proportion was 10.7%, which is slightly more than one in 10 people worldwide [29]. The World Health Organization (WHO) reported that 54 million people suffered from depression and about 41 million from anxiety disorders in China [29], and the proportion of people with mental health disorder in this country was more than 12% of that worldwide. In addition to increasing public health care expenditure on the treatment of mental health disorders, our results suggest that policies promoting the digital economy and expanding internet penetration may contribute to improving mental health status.
Our estimation results indicated that the positive effect of internet use was modestly more significant for males than for females, which may be due to the difference in internet access by gender. It is argued that a gender digital gap exists in internet access and that it arose in developed countries in the early stages of ICT development [31][32][33]. According to data from the CNNIC, the proportion of female internet users in China was 30% in 2000 and 48.1% in 2020 [1], which suggests a gender disparity in internet access in China. Additionally, the gender gap in educational attainment can lead to a gender gap in internet use skills, which may explain the difference in internet effects on health outcomes by gender. Based on the data from the CFPS, in 2018, the period of schooling of individuals aged 24 and over was 8.17 years for males, longer than the 6.68 years for females.
The results indicated disparities in internet use effects among age groups. The positive effect of internet use on health outcomes was greater for the middle and older generations; this may be because the problem of addictive use (overuse) of the internet is more serious in younger generations than in other age groups, as the ability to control internet addiction is weaker for teenagers than for adults [11].
Furthermore, a disparity existed in the effects of internet use among urban and rural residents. The positive effect of internet use on health outcomes was greater for urban residents than for their rural counterparts. The reasons for this can be explained as follows. First, the internet penetration rate is lower for rural residents than for urban residents, as the development of internet infrastructure lags in rural areas. According to No. 45 of the Statistical Report on the Development of the Internet in China published by the China Internet Network Information Center (CNNIC), the number of internet users in China reached 989 million in June 2020, including 680 million urban residents and 309 million rural residents with internet penetration rates of 76.4% and 52.3%, respectively [1]. Second, the level of educational attainment is higher for urban residents. For example, based on the CFPS, the average period of schooling of individuals aged 24 and over was 8.84 years for urban residents and 5.98 years for rural residents in 2018. It is expected that well-educated individuals may have a higher ability to use the internet to improve their health outcomes. The gap in internet access and use skills between urban and rural residents can widen the disparity in health outcomes between the two groups, which may cause severe socioeconomic status inequality [34].
Based on the results of this study, we can argue that, in general, policies promoting the development of the digital economy may improve the nation's health status. To reduce digital divide problems, the Chinese government should consider policies for reducing problematic internet use among teenagers, promoting internet infrastructure expansion in rural areas, and reducing the gender and urban-rural gaps in internet access and educational attainment.

Conclusions
We conclude that internet use was positively associated with the health outcomes of individuals aged 16 and over in China nationwide, and the positive effects of internet use differed by sex, age, and urban/rural groups based on three-wave longitudinal data from 2014 to 2018.
This study has several limitations. First, although we used dynamic models with lagged internet use variables, we could not identify causation from internet use to health, which should be investigated in a more in-depth analysis. Second, as no policy reform existed for internet use during the period 2014-2018, we could not investigate the policy effect on the association between internet use and health outcomes based on a quasi-experiment method, which has become an issue to be addressed in future research.
Despite these limitations, we believe that the current study, which took full advantage of longitudinal data, provided new insights for understanding the association between internet use and health. We also expect the Chinese experience to provide valuable lessons for other countries that are looking to improve their populations' health outcomes in the digital economy era.

Declarations
Ethics approval and consent to participate
The dataset used in this study, the China Family Panel Studies (CFPS), is publicly available (http://opendata.pku.edu.cn/en), and the study protocol was approved by the Ethical Review Committee of Peking University, China. Hence, ethical approval was not required for this study. Survey data were obtained from Peking University with official permission; therefore, the current study did not require further ethical approval, and the need for written consent was waived by the committee.

Consent for publication
Not applicable.