1. Data Source
The data for this study come from the third wave of the China Health and Retirement Longitudinal Study (CHARLS; https://g2aging.org/). CHARLS is a nationwide cohort study conducted by the National Development Research Institute of Peking University. It uses multi-stage stratified probability sampling to sample the middle-aged and elderly population of 150 county-level units in 28 provinces, municipalities, and autonomous regions across the country. Baseline data were collected from 2011 to 2012, and follow-up data have been collected every two years since. CHARLS provides longitudinal data on demographic indicators, family conditions, biomedical indicators, health status, and physical function of the middle-aged and elderly population in China. CHARLS was approved by the Ethics Committee of Peking University School of Medicine, and the detailed survey design has been described elsewhere[28].
The third wave of the CHARLS survey had a total of 20,284 participants. After excluding 571 participants under the age of 45 and a further 7,194 participants who did not undergo venous blood collection, this study ultimately included 12,519 participants. The exclusion process is detailed in Fig. 1.
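The exclusion steps can be checked with simple arithmetic. The short sketch below only restates the counts reported in the text; the variable names are illustrative and are not taken from the CHARLS codebook.

```python
# Participant flow for the third CHARLS wave, using the counts given in the text.
# Variable names are hypothetical labels chosen for this illustration.
total_participants = 20284
excluded_under_45 = 571
excluded_no_blood_sample = 7194

# Step 1: drop participants younger than 45.
after_age_filter = total_participants - excluded_under_45

# Step 2: drop participants without venous blood collection.
final_sample = after_age_filter - excluded_no_blood_sample

print(f"After age exclusion: {after_age_filter}")
print(f"Final analytic sample: {final_sample}")
```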
2. Definition of Environmental NO2 Exposure Concentration and HDL Level
The NO2 exposure concentration in each participant's environment was obtained from the ChinaHighAirPollutants (CHAP) dataset[29], which combines multi-source satellite remote sensing and artificial intelligence with extensive ground observations, satellite remote sensing products, atmospheric reanalysis, and model simulations to account for the spatiotemporal heterogeneity of air pollution. High-quality NO2 data for China at 1-kilometer resolution, covering 2000 to 2018, were obtained. Because of privacy considerations regarding participants' residences, the NO2 concentration of the city in which each participant resides was used as that participant's individual-level value, and its average from the beginning of the study (2011) to the end of the study (2015) was taken as the participant's environmental NO2 exposure level. The collected blood samples were analyzed for complete blood count (CBC) at the local health centre and then sent to the research headquarters for HDL measurement[30].
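The exposure assignment described above (a city-level average over 2011 to 2015, attached to every participant living in that city) can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the city names, concentration values, units, and record layout are all assumptions made for the example and are not taken from CHAP or CHARLS.

```python
from statistics import mean

# Hypothetical annual mean NO2 concentrations by city (units assumed to be ug/m^3).
# These numbers are invented for illustration and are NOT CHAP values.
annual_no2_by_city = {
    "city_A": {2011: 30.0, 2012: 32.0, 2013: 34.0, 2014: 31.0, 2015: 28.0},
    "city_B": {2011: 18.0, 2012: 19.0, 2013: 21.0, 2014: 20.0, 2015: 17.0},
}

STUDY_YEARS = range(2011, 2016)  # 2011 through 2015 inclusive


def city_exposure(annual_values, years=STUDY_YEARS):
    """Average the annual city-level concentrations over the study period."""
    return mean(annual_values[year] for year in years)


def assign_exposure(participants, annual_by_city):
    """Attach the city-level study-period mean to each participant record."""
    city_means = {city: city_exposure(vals) for city, vals in annual_by_city.items()}
    return [{**p, "no2_exposure": city_means[p["city"]]} for p in participants]


# Hypothetical participant records: only an identifier and a city of residence.
participants = [
    {"id": 1, "city": "city_A"},
    {"id": 2, "city": "city_B"},
    {"id": 3, "city": "city_A"},
]
exposed = assign_exposure(participants, annual_no2_by_city)
for row in exposed:
    print(row)
```

Under this scheme, participants in the same city share the same exposure value, so the exposure varies only between cities.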
3. Covariates
To control for potential confounding, this study adjusted for variables that have previously been shown to affect HDL levels. Age was measured in years. BMI was calculated as weight in kilograms divided by the square of height in meters. Education level was divided into two categories: secondary education and below, and higher education. Residential location was classified as rural or urban. Marital status was divided into two categories: married or partnered, and separated, divorced, widowed, or unmarried. Participants were also grouped according to their smoking and alcohol consumption. In addition, participants were asked whether a doctor had ever diagnosed them with any of the following conditions: hypertension, diabetes, respiratory disease, heart disease, stroke, psychological disease, arthritis, dyslipidemia, liver disease, kidney disease, or digestive system disease, among others. Missing covariate values were imputed using multivariate imputation based on predictive mean matching.
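Two of the steps above lend themselves to a short illustration: the BMI formula and the idea behind predictive mean matching. The sketch below is a deliberately simplified, single-predictor version written for this example. It is not the procedure used in the study, it omits the random draw of regression parameters that full predictive mean matching implementations perform, and all function names and data values are hypothetical.

```python
import random


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(x, y, k=3, rng=None):
    """Fill None entries of y by simplified predictive mean matching on x.

    For each missing case, the k observed cases whose predicted values are
    closest act as donors, and one donor's OBSERVED value is drawn at random.
    """
    rng = rng or random.Random(0)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_simple_ols([o[0] for o in observed],
                                      [o[1] for o in observed])

    def predict(xi):
        return intercept + slope * xi

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)  # observed values are never altered
            continue
        donors = sorted(observed, key=lambda o: abs(predict(o[0]) - predict(xi)))[:k]
        completed.append(rng.choice(donors)[1])
    return completed


# Hypothetical records: age is always observed, BMI is missing for two people.
ages = [50, 55, 60, 65, 70, 75]
bmi_values = [bmi(70, 1.75), None, 23.5, 22.8, None, 21.9]
completed_bmi = pmm_impute(ages, bmi_values, k=2, rng=random.Random(42))
print(completed_bmi)
```

Because imputed values are always copied from observed donors, predictive mean matching cannot produce implausible values such as a negative BMI.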
4. Statistical Analysis
Continuous variables in the baseline characteristics are presented as mean and standard deviation, and categorical variables as frequencies and percentages. The Kruskal-Wallis test was used to obtain p-values for continuous variables, and the chi-square test was used for categorical variables. Fisher's exact test was used when the expected (theoretical) frequency was less than 10[31].
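The rule for choosing between the chi-square test and Fisher's exact test depends on the expected cell frequencies under independence. The sketch below computes those expected frequencies and the Pearson chi-square statistic for a contingency table, then applies the threshold of 10 stated in the text. The function names and the example tables are hypothetical, and the sketch only selects the test; it does not compute Fisher's exact p-value.

```python
def expected_counts(table):
    """Expected cell counts under independence for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def choose_test(table, threshold=10):
    """Select the test using the rule stated in the text (threshold of 10)."""
    smallest = min(min(row) for row in expected_counts(table))
    return "fisher_exact" if smallest < threshold else "chi_square"


# Hypothetical 2 x 2 tables (rows: two groups, columns: a binary characteristic).
large_table = [[30, 70], [50, 50]]
sparse_table = [[3, 17], [9, 11]]

print(expected_counts(large_table), choose_test(large_table))
print(expected_counts(sparse_table), choose_test(sparse_table))
print(round(chi_square_statistic(large_table), 2))
```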
Environmental NO2 exposure concentration and HDL level were treated as continuous variables. Baseline data were stratified by gender to examine potential differences among the middle-aged and elderly participants. Multiple linear regression was used to study the relationship between environmental NO2 exposure concentration and HDL level. To reduce the impact of confounding factors on the results, four models (I-IV) adjusting for different sets of covariates were fitted. Model I adjusted only for age, categorized into four groups (50 years, 60 years, 70 years, and over 70 years); Model II adjusted for age, gender, education level, marital status, and place of residence; Model III additionally adjusted for variables such as smoking, alcohol consumption, and BMI; Model IV additionally adjusted for previously diagnosed diseases. Interaction analysis and covariate stratification were used to examine heterogeneity in the association between environmental NO2 exposure concentration and HDL level. Subgroup analyses used stratified linear regression, and p-values for interaction were obtained from a log-likelihood ratio test comparing models with and without the covariate interaction terms. Results were considered statistically significant at a significance level of 0.05. All analyses were conducted using R version 4.2.2.
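The likelihood ratio test for interaction compares a linear model without the exposure-by-covariate product term against one that includes it. The sketch below illustrates the idea for a single interaction term, written in plain Python rather than the R code used in the study. For Gaussian linear models the statistic equals n ln(RSS_reduced / RSS_full) and is referred to a chi-square distribution with one degree of freedom. The data are synthetic, every function name is hypothetical, and the other covariates of Models I-IV are left out for brevity.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def residual_sum_of_squares(design, y):
    """Fit ordinary least squares via the normal equations and return the RSS."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def interaction_lr_test(exposure, modifier, outcome):
    """Likelihood ratio test for one exposure-by-modifier product term (df = 1)."""
    reduced = [[1.0, e, m] for e, m in zip(exposure, modifier)]
    full = [[1.0, e, m, e * m] for e, m in zip(exposure, modifier)]
    n = len(outcome)
    ratio = residual_sum_of_squares(reduced, outcome) / residual_sum_of_squares(full, outcome)
    statistic = max(0.0, n * math.log(ratio))  # guard against rounding below zero
    p_value = math.erfc(math.sqrt(statistic / 2))  # chi-square upper tail, df = 1
    return statistic, p_value


# Synthetic data with a built-in interaction; the values carry no real-world meaning.
n_obs = 120
exposure = [10.0 + (i * 7) % 41 for i in range(n_obs)]
modifier = [float(i % 2) for i in range(n_obs)]  # e.g. a binary subgroup indicator
noise = [((i * 37) % 11 - 5) / 5.0 for i in range(n_obs)]
outcome = [50.0 - 0.2 * e + 2.0 * m - 0.3 * e * m + eps
           for e, m, eps in zip(exposure, modifier, noise)]

statistic, p_value = interaction_lr_test(exposure, modifier, outcome)
print(f"LR statistic = {statistic:.2f}, p = {p_value:.3g}")
```

A p-value for interaction below 0.05 would indicate that the association between exposure and outcome differs across levels of the covariate.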