Nationwide prospective cohort study
The data used in this study were obtained from CHARLS, which has been described in previous studies [16]. CHARLS is a prospective survey of the Chinese population aged 45 years and above, covering 28 provinces, including 150 regions and 450 villages/communities. The survey mainly used standardized questionnaires to collect socio-demographic, lifestyle, and health-related data. Baseline data were collected through computer-assisted face-to-face interviews in 2011 (wave 1), followed by subsequent surveys in 2013 (wave 2), 2015 (wave 3), 2018 (wave 4), and 2020 (wave 5). In CHARLS, blood samples were drawn in 2011 and 2015.
Our study focused on individuals aged 45 years or older in CHARLS over the 2011-2018 period. Eligible participants were required to have documented hs-CRP levels at both wave 1 and wave 3, along with survival records from 2018. A total of 11,848 participants underwent baseline blood examinations in 2011. The exclusion criteria were: (1) age younger than 45 years; (2) missing important baseline covariates; and (3) a diagnosis of cancer (Figure 1).
CHARLS obtained approval from the Ethics Committee of Peking University, and informed consent forms were signed by all participants.
Definition of hs-CRP
The principal exposure in this investigation was the dynamics of hs-CRP. Our analysis considered both the longitudinal change in hs-CRP over time and the cumulative hs-CRP (cumhs-CRP) level, to provide a comprehensive view of its role in the population. Hs-CRP was measured using the immunoturbidimetric method, and a concentration above 2 mg/L is generally indicative of low-grade systemic inflammation [17]. Following this criterion, we stratified the longitudinal cohort (2011-2015) into four distinct trajectories [6]: Class 1: persistently low (low-low), Class 2: increasing from low to high (low-high), Class 3: decreasing from high to low (high-low), and Class 4: persistently high (high-high).
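The four-class stratification can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: it assumes a 2 mg/L cut-off applied to the wave 1 and wave 3 measurements, treating a value of exactly 2 mg/L as high is an assumption, and all function and constant names are hypothetical.

```python
# Assumed cut-off (mg/L); counting the boundary value itself as "high" is an assumption.
HIGH_THRESHOLD_MG_L = 2.0

# Mapping from trajectory label to the class numbers used in the text.
TRAJECTORY_CLASS = {"low-low": 1, "low-high": 2, "high-low": 3, "high-high": 4}


def classify_trajectory(hs_crp_wave1: float, hs_crp_wave3: float,
                        threshold: float = HIGH_THRESHOLD_MG_L) -> str:
    """Return the trajectory label for one participant's two measurements."""
    first = "high" if hs_crp_wave1 >= threshold else "low"
    second = "high" if hs_crp_wave3 >= threshold else "low"
    return f"{first}-{second}"


# Hypothetical participant: low at wave 1, high at wave 3.
label = classify_trajectory(0.9, 3.4)
print(label, TRAJECTORY_CLASS[label])
```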
To assess cumulative inflammatory load, we used a time-weighted cumhs-CRP index, calculated as follows: cumhs-CRP = [(hs-CRP2011 + hs-CRP2015) / 2] × (time2 − time1), where hs-CRP2011 is the hs-CRP level at the first measurement (wave 1), hs-CRP2015 is the level at the subsequent measurement (wave 3), and (time2 − time1) is the duration between these two assessments.
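As a worked illustration of the formula, the sketch below computes the index for a single participant. The function name, the example values, and the choice of years as the time unit are assumptions made for illustration.

```python
def cumulative_hs_crp(hs_crp_first: float, hs_crp_second: float,
                      years_between: float) -> float:
    """Mean of the two hs-CRP values multiplied by the time between them.

    With hs-CRP in mg/L and time in years (an assumed unit), the result
    is expressed in mg/L x years.
    """
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    return (hs_crp_first + hs_crp_second) / 2.0 * years_between


# Hypothetical participant: 1.0 mg/L at wave 1, 3.0 mg/L at wave 3, 4 years apart.
print(cumulative_hs_crp(1.0, 3.0, 4.0))  # (1.0 + 3.0) / 2 * 4 = 8.0
```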
Definition of death
Death data were collected for participants under observation from 2015 to 2018, and the occurrence of death was recorded as the outcome. To ascertain the vital status of participants over the study period, interviewers used a computer-assisted personal interviewing system during on-site visits. When a participant had died, the interviewers obtained the pertinent details by interviewing household members of the deceased.
Determination of covariates
In this study, trained investigators administered a questionnaire survey to collect detailed data on the demographic characteristics, lifestyle, and health conditions of the participants. The survey covered demographic characteristics (age, gender, marital status, education level, and residence) and lifestyle (smoking and drinking history), together with measurements of blood pressure and body mass index (BMI, kg/m²). Health conditions were assessed as self-reported, physician-diagnosed conditions, including hypertension, dyslipidemia, diabetes, cardiovascular diseases, stroke, kidney disease, liver disease, and digestive diseases. The definitions of these conditions have been reported in prior studies [18]. Information on medication use for these conditions was also collected and considered as an important covariate. In CHARLS, blood samples were collected by trained staff of the Chinese Center for Disease Control and Prevention (CCDC) according to standardized procedures, with participants required to fast prior to sampling. All samples were promptly frozen and transferred to the CCDC within two weeks, and detailed biochemical analyses were conducted at the Capital Medical University laboratories. The following indicators were considered as covariates: fasting glucose, glycated hemoglobin, low-density lipoprotein cholesterol (LDL-c), triglycerides, high-density lipoprotein cholesterol (HDL-c), total cholesterol, hemoglobin, and serum creatinine.
Statistical analysis
The statistical analysis for this study was conducted using Stata 16.0 and R version 4.2. Continuous variables with a normal distribution were presented as the mean ± standard deviation, and group comparisons were conducted utilizing one-way analysis of variance (ANOVA). For continuous variables that did not display a normal distribution, we performed group comparisons using the Kruskal-Wallis H test. Categorical variables were expressed as counts and/or percentages, and differences between groups were assessed using the chi-square test.
Prior to conducting multivariate regression analysis, a thorough evaluation for multicollinearity was undertaken using the variance inflation factor (VIF). Then we employed stepwise regression analysis to select models with the lowest Akaike information criterion (AIC) to ensure the inclusion of the most appropriate variables.
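For reference, the two quantities used in this step have standard definitions, restated below. Here R_j^2 is the coefficient of determination from regressing covariate j on all other covariates, k is the number of estimated parameters, and \hat{L} is the maximized likelihood of the model. A VIF above 5 or 10 is a common rule of thumb for problematic collinearity; the cut-off applied in this analysis is not stated, so none is assumed here.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Stepwise selection then retains the candidate model with the smallest AIC among those visited.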
We used binary logistic regression, with Class 1 as the reference, to investigate the relationship between dynamic changes in hs-CRP levels and mortality risk. Three models were developed in the primary analysis. Model 1 was adjusted for age, gender, residence, marital status, and smoking. Model 2 extended Model 1 by adding hemoglobin level. Model 3 was adjusted for the variables in Model 1 plus medication use for digestive diseases. We also conducted a comprehensive subgroup analysis, considering the following subgroups: age (≥60 years or <60 years), gender, marital status, education, residence, smoking, drinking, hypertension, diabetes, dyslipidemia, cardiovascular diseases, and stroke. In the subgroup analysis, interactions between longitudinal changes in hs-CRP levels and these factors were also assessed.
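The adjusted models require a fitted logistic regression. As a simpler illustration of the quantity being estimated, the sketch below computes an unadjusted odds ratio of death for one trajectory class relative to Class 1 from a 2×2 table, with a Wald (Woolf) 95% confidence interval. It is not the study's model: the counts are hypothetical and the function name is an assumption.

```python
import math


def odds_ratio_with_ci(deaths_exposed: int, survivors_exposed: int,
                       deaths_reference: int, survivors_reference: int,
                       z: float = 1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval."""
    cells = (deaths_exposed, survivors_exposed,
             deaths_reference, survivors_reference)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (deaths_exposed * survivors_reference) / (
        survivors_exposed * deaths_reference)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / count for count in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 deaths / 80 survivors in the compared class,
# 10 deaths / 90 survivors in the reference class.
print(odds_ratio_with_ci(20, 80, 10, 90))
```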
In addition, cumhs-CRP levels were categorized into quartiles, with Q1 as the reference. After stepAIC selection, binary logistic regression was used to evaluate the association between cumhs-CRP quartiles and mortality, adjusting for age, education, gender, smoking, drinking, diastolic blood pressure, glucose, HDL-c, and digestive diseases. We also examined potential non-linear associations between cumhs-CRP levels and mortality using restricted cubic splines (RCS). In the RCS analysis, the 50th percentile of the predictor was selected as the reference point. Cumhs-CRP levels were then divided into two segments at this inflection point, and segmented logistic regression was used to characterize the association within each segment.
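The quartile categorization can be illustrated as follows. This is a minimal sketch using only the Python standard library: the cut points come from `statistics.quantiles` with its default method, which may differ slightly from the quantile rule used by Stata or R, and the input values and function name are hypothetical.

```python
import statistics
from bisect import bisect_left


def quartile_labels(values):
    """Label each value Q1..Q4 using the sample quartile cut points."""
    cut_points = statistics.quantiles(values, n=4)  # [Q1/Q2, Q2/Q3, Q3/Q4]
    # bisect_left counts the cut points strictly below the value, so a value
    # equal to a cut point falls into the lower quartile.
    return ["Q" + str(bisect_left(cut_points, value) + 1) for value in values]


# Hypothetical cumhs-CRP values (mg/L x years) for eight participants.
cumulative_values = [1, 2, 3, 4, 5, 6, 7, 8]
print(quartile_labels(cumulative_values))
```

Each participant's label can then enter the regression as a categorical term with Q1 as the reference level.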
We conducted three sensitivity analyses to validate our findings: (1) the primary analysis was repeated using multiple imputation; (2) the associations of hs-CRP measured at baseline (2011) and in 2015 with mortality risk were evaluated; and (3) the relationship between longitudinal changes in hs-CRP levels and mortality recorded in the 2020 survey (wave 5) was examined.
Two-sample Mendelian randomization
Based on a two-sample Mendelian randomization (TSMR) design, we analyzed the causal relationship between CRP and TL using the inverse variance weighted (IVW) method. The summary data for CRP comprised a discovery set and a validation set. The discovery set was obtained from the CHARGE Consortium, the largest genome-wide association study (GWAS) on CRP, including 575,531 subjects [19]. The validation set was derived from pooled UK Biobank (UKB) data, which included 389,057 subjects and a total of 10,783,679 SNP loci [20]. The pooled data for TL were obtained from a large GWAS involving 472,174 participants of European ancestry and 20,134,421 SNP loci (ieu-b-4879). First, candidate instrumental variables (IVs) reaching the genome-wide significance threshold (P < 5 × 10⁻⁸) were included. SNPs were then clumped with PLINK using a linkage disequilibrium threshold of r² < 0.001 within a 10,000 kb window. Second, we checked whether these SNPs were linked to known confounders using the GWAS Catalog and PhenoScanner; any such SNP was excluded. IVs with an F statistic > 10 were considered unlikely to be affected by weak-instrument bias. We also used additional approaches to assess the robustness of these findings: the MR-Egger intercept was used to assess pleiotropy, the Cochran Q statistic to assess heterogeneity, and the leave-one-out method to examine whether any single SNP drove the results.
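The core IVW calculation can be sketched from per-SNP summary statistics. The code below is a hedged illustration rather than the pipeline used in the study: it implements the fixed-effect IVW estimator with first-order weights, approximates each SNP's F statistic as (beta / SE)², which is a common approximation and an assumption here, and uses hypothetical input values and function names.

```python
import math


def f_statistic(beta_exposure: float, se_exposure: float) -> float:
    """Approximate per-SNP instrument strength as (beta / SE) squared."""
    return (beta_exposure / se_exposure) ** 2


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate, its standard error, and Cochran's Q."""
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome) > 0):
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-SNP Wald ratios and their inverse-variance (first-order) weights.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of the ratios from the estimate.
    cochran_q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, standard_error, cochran_q


# Hypothetical summary statistics for three SNPs.
snp_exposure_betas = [0.10, 0.20, 0.40]
snp_outcome_betas = [0.05, 0.10, 0.20]
snp_outcome_ses = [0.01, 0.02, 0.02]
print(ivw_estimate(snp_exposure_betas, snp_outcome_betas, snp_outcome_ses))
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-square distribution with (number of SNPs − 1) degrees of freedom.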
Feature interpretation and visualization of mortality
We also employed the Extreme Gradient Boosting (XGBoost) algorithm and SHapley Additive exPlanations (SHAP) to identify determinants of mortality risk. SHAP is a game-theoretic approach to interpreting machine learning models that assigns each feature an importance value representing its contribution to the predictions [21]. The SHAP importance plot and heat force plot allowed us to visualize which variables were most important to all-cause mortality and how they influenced it.
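The attribution idea behind SHAP can be shown on a toy model by computing exact Shapley values by brute force. This sketch does not use XGBoost or the SHAP library, which rely on far more efficient tree-specific algorithms. The toy scoring function, the single-baseline replacement of "absent" features, and all names are assumptions for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over orderings.

    A feature is "absent" when it takes its baseline value (an assumption).
    The cost grows factorially, so this suits only a handful of features.
    """
    n_features = len(instance)
    contributions = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        current = list(baseline)
        previous_output = predict(current)
        for feature in order:
            current[feature] = instance[feature]  # switch this feature "on"
            new_output = predict(current)
            contributions[feature] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in contributions]


def toy_risk_score(features):
    """Hypothetical score with an interaction between the two inputs."""
    return features[0] * features[1]


# Both features contribute equally to the interaction, so each receives half.
print(shapley_values(toy_risk_score, [2.0, 3.0], [0.0, 0.0]))  # [3.0, 3.0]
```

The values sum to the difference between the prediction for the instance and the prediction for the baseline. SHAP plots build on this property to decompose each individual prediction.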