2.1 Study Population
The National Health and Nutrition Examination Survey (NHANES) is a comprehensive study of the health and nutrition of people in the United States. It combines interviews and physical examinations to collect data. The interviews cover demographic information, socioeconomic status, diet, and health-related questions. The physical examinations take place in mobile examination centers (MECs) and include medical, dental, and physiological measurements, as well as laboratory tests.
We conducted a prospective study using data from NHANES 1999–2018, combined with mortality data from the National Death Index (NDI). We focused on participants aged ≥ 20 years with prediabetes, which yielded an initial sample of 13,268 subjects. We then excluded individuals with missing data on dietary caffeine intake (n = 833), participants who were pregnant at baseline (n = 78), those with extreme energy intake (n = 755), and participants with missing follow-up data (n = 21). The final study population comprised 11,581 participants with prediabetes (Fig. 1).
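The participant flow above can be checked with a short arithmetic sketch. It assumes the exclusions were applied one after another in the order listed and that the reported counts do not overlap; the function and variable names are illustrative and not part of the original analysis.

```python
# Counts as reported in the text; the sequential order is an assumption.
INITIAL_SAMPLE = 13_268
EXCLUSIONS = [
    ("missing dietary caffeine intake", 833),
    ("pregnant at baseline", 78),
    ("extreme energy intake", 755),
    ("missing follow-up data", 21),
]


def apply_exclusions(initial, exclusions):
    """Return (reason, n_excluded, n_remaining) for each exclusion step."""
    remaining = initial
    flow = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        flow.append((reason, n_excluded, remaining))
    return flow


flow = apply_exclusions(INITIAL_SAMPLE, EXCLUSIONS)
for reason, n_excluded, remaining in flow:
    print(f"Excluded {n_excluded:>4} ({reason}); {remaining} remain")

final_n = flow[-1][2]  # size of the analytic sample after all exclusions
```

The final value matches the 11,581 participants reported in the text.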
2.2 Ascertainment of Prediabetes
According to the American Diabetes Association (ADA) guidelines, prediabetes was defined as the absence of diabetes mellitus together with at least one of the following criteria: 1) fasting plasma glucose of 100–125 mg/dL (5.6–6.9 mmol/L); 2) 2-hour plasma glucose during the oral glucose tolerance test (OGTT) of 140–199 mg/dL (7.8–11.0 mmol/L); or 3) hemoglobin A1c (HbA1c) of 5.7%–6.4% [20].
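As an illustration, the three criteria can be written as a small classification rule. This is a minimal sketch rather than the authors' code: the function name and arguments are hypothetical, and it assumes that diabetes status has already been determined elsewhere and is passed in as a flag.

```python
def has_prediabetes(fpg_mg_dl=None, ogtt_2h_mg_dl=None, hba1c_pct=None,
                    has_diabetes=False):
    """Return True if any of the three glycemic criteria is met.

    Missing measurements are passed as None and simply do not count.
    `has_diabetes` is assumed to be derived upstream (not shown here).
    """
    if has_diabetes:
        return False
    criteria = [
        fpg_mg_dl is not None and 100 <= fpg_mg_dl <= 125,          # fasting glucose
        ogtt_2h_mg_dl is not None and 140 <= ogtt_2h_mg_dl <= 199,  # 2-hour OGTT
        hba1c_pct is not None and 5.7 <= hba1c_pct <= 6.4,          # HbA1c
    ]
    return any(criteria)
```

For example, `has_prediabetes(fpg_mg_dl=110)` is true, while a participant with normal values on every available measurement is not classified as having prediabetes.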
2.3 Assessment of Caffeine Intake
In NHANES 1999–2002, all participants underwent a single 24-hour dietary recall interview. From 2003 onward, two interviews were conducted: the first followed the same procedure as before, and the second was conducted by telephone 3 to 10 days later. Consequently, for the 1999–2002 cycles, information from a single 24-hour dietary recall was used, while for the 2003–2018 period, the average of the two 24-hour dietary recalls was used [21]. To record food quantities more accurately, each MEC was equipped with a standard measuring tool, and the data were recorded using an automated multiple-pass method [22]. Participants were categorized into quartiles (Q1–Q4) based on their daily caffeine intake: < 29.5 mg/d, 29.5–101.0 mg/d, 101.1–210.0 mg/d, and > 210.0 mg/d.
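The handling of one versus two recalls and the assignment to the four intake groups (Q1–Q4) can be sketched as follows. The function names are illustrative. The sketch also assumes that each reported upper bound is inclusive, so that a value falling between 101.0 and 101.1 mg/d is assigned to Q3; the text does not state how such values were handled.

```python
def mean_caffeine(day1_mg, day2_mg=None):
    """Average two recalls when both exist; otherwise use the single recall."""
    if day2_mg is None:
        return float(day1_mg)
    return (day1_mg + day2_mg) / 2


def caffeine_group(mg_per_day):
    """Map daily caffeine intake (mg/d) to Q1-Q4 using the reported cut-points."""
    if mg_per_day < 29.5:
        return "Q1"
    if mg_per_day <= 101.0:
        return "Q2"
    if mg_per_day <= 210.0:  # assumption: upper bound inclusive
        return "Q3"
    return "Q4"
```

A participant with recalls of 100 mg and 300 mg would have a mean intake of 200 mg/d and fall into Q3.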
2.4 Ascertainment of Mortality
In this study, we focused on two main outcomes: all-cause mortality and cardiovascular mortality. To obtain information on mortality, we linked survey data from the National Center for Health Statistics (NCHS) to NDI death certificate records. This linked mortality file (LMF) was updated through December 31, 2019 [23]. To identify the specific cause of death, we used codes from the International Classification of Diseases, 10th Revision (ICD-10): codes I00–I09, I11, I13, I20–I51, and I60–I69 denote deaths due to cardiovascular disease (CVD), and codes C00–C97 denote deaths due to cancer. Follow-up time for each participant was defined as the number of days between the first examination date and the last known survival date.
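The code ranges and the follow-up definition above translate into a simple lookup. The sketch below assumes that a raw ICD-10 code string and two calendar dates are available for each participant; the function names and the "other" label are hypothetical and not taken from the linked mortality file.

```python
import re
from datetime import date


def classify_cause(icd10_code):
    """Classify an ICD-10 underlying-cause code as 'cvd', 'cancer', or 'other'."""
    match = re.match(r"^([A-Z])(\d{2})", icd10_code.strip().upper())
    if not match:
        return "other"
    letter, number = match.group(1), int(match.group(2))
    if letter == "I" and (
        0 <= number <= 9           # I00-I09
        or number in (11, 13)      # I11, I13
        or 20 <= number <= 51      # I20-I51
        or 60 <= number <= 69      # I60-I69
    ):
        return "cvd"
    if letter == "C" and 0 <= number <= 97:  # C00-C97
        return "cancer"
    return "other"


def follow_up_days(exam_date, last_known_date):
    """Days between the first examination and the last known survival date."""
    return (last_known_date - exam_date).days
```

Note that I10 (essential hypertension) and I12 fall outside the listed ranges and are therefore classified as "other" here.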
2.5 Assessment of Covariates
We collected information on age, gender, race or ethnicity, education level, family income-to-poverty ratio (PIR), smoking status, drinking status, physical activity, and energy intake from standardized questionnaires. Race or ethnicity was classified as Mexican American, other Hispanic, non-Hispanic White, non-Hispanic Black, or other. Education level was classified as below high school, high school, or above high school. Family PIR was stratified into three groups based on eligibility for federal food assistance programs, taking into account the number of individuals in the household [24]. Smoking status was classified as nonsmoker, former smoker, or current smoker. Drinking status was classified as nondrinker, low-to-moderate drinker, or heavy drinker. Based on self-reported weekly leisure-time activities and metabolic equivalents (METs), physical activity was classified as inactive (no leisure-time activity), insufficiently active (moderate activity 1–5 times weekly with MET between 3 and 6, or vigorous activity 1–3 times weekly with MET > 6), or active (moderate or vigorous activity exceeding the above levels) [25]. Energy intake was determined from the 24-hour dietary recall. Hypertension was defined as a previous diagnosis of hypertension or use of antihypertensive drugs. CVD was defined as a previous diagnosis of coronary heart disease (CHD), myocardial infarction, heart failure, or stroke. Cancer was defined as a previous diagnosis of cancer or any type of malignancy.
In addition, we collected physical examination and laboratory data, including body mass index (BMI), fasting blood glucose (FBG), HbA1c, homeostatic model assessment of insulin resistance (HOMA-IR), total cholesterol (TC), triglycerides (TG), C-reactive protein (CRP), systolic blood pressure (SBP), and diastolic blood pressure (DBP). BMI was calculated as weight in kilograms divided by height in meters squared and categorized as < 25.0, 25.0–29.9, and > 29.9 kg/m². HOMA-IR was calculated as the product of insulin and glucose divided by 22.5. Measurement procedures for the other biochemical indicators are described in the NHANES Laboratory Procedures Manual [26].
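The two derived measures can be written out as follows. This is a sketch under stated assumptions: the text does not give units for HOMA-IR, so the code assumes the conventional ones for a divisor of 22.5 (fasting insulin in μU/mL and fasting glucose in mmol/L). It also treats the upper BMI category as 30.0 kg/m² and above so that unrounded values are always assigned to a group. The function names are illustrative.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Assign a BMI value to one of the three reported categories."""
    if value < 25.0:
        return "< 25.0"
    if value < 30.0:  # assumption: 29.9 < value < 30.0 stays in the middle group
        return "25.0-29.9"
    return "> 29.9"


def homa_ir(insulin_uU_ml, glucose_mmol_l):
    """HOMA-IR = insulin * glucose / 22.5 (units assumed: uU/mL and mmol/L)."""
    return insulin_uU_ml * glucose_mmol_l / 22.5
```

For instance, a fasting insulin of 10 μU/mL with a fasting glucose of 4.5 mmol/L gives a HOMA-IR of 2.0.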
2.6 Statistical Analysis
Sample weights, strata, and primary sampling units were included in all analyses. The appropriate weights were selected for each study variable, as recommended by the US Centers for Disease Control and Prevention.
Baseline characteristics of the study participants were summarized separately for continuous and categorical variables. Continuous variables were presented as means and standard deviations (SDs) or as medians and interquartile ranges (IQRs). Categorical variables were presented as percentages. The study population was divided into quartiles (Q1–Q4) based on daily caffeine intake.
The association of caffeine intake with all-cause and cardiovascular mortality was evaluated using weighted Cox proportional hazards regression models with the following covariates: age, gender, and race/ethnicity (model 1); and model 1 plus education level, family PIR, smoking status, drinking status, physical activity, total energy intake, and BMI (model 2). The results were expressed as hazard ratios (HRs) and 95% confidence intervals (CIs).
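For reference, the general form of the Cox proportional hazards model and the way a hazard ratio is read from it are shown below. The notation is introduced here for illustration only: \(\mathbf{x}\) stands for the covariate vector of a given model (caffeine group indicators plus the adjustment variables), and the Wald-type interval is the usual textbook form, which may differ in detail from the design-based interval produced by survey software.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j),
\qquad
95\%\ \mathrm{CI} = \exp\!\left(\hat{\beta}_j \pm 1.96\,\mathrm{SE}(\hat{\beta}_j)\right)
```

Here \(h_0(t)\) is the unspecified baseline hazard, so each HR compares the hazard between two covariate values with all other covariates held fixed.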
Stratified analyses were performed by age, gender, race/ethnicity (non-Hispanic White and others), family PIR (≤ 1.0, 1.1–3.0, and > 3.0), smoking status (nonsmoker, former smoker, and current smoker), drinking status (nondrinker, low-to-moderate drinker, and heavy drinker), physical activity (inactive, insufficiently active, and active), BMI (< 25.0, 25.0–29.9, and > 29.9 kg/m²), and self-reported hypertension (yes or no). P values were also calculated for the interaction between caffeine intake levels and each stratification variable. To investigate dose-response associations of caffeine intake with all-cause and cardiovascular mortality, we applied a restricted cubic spline (RCS) model.
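To make the RCS step concrete, the sketch below builds the spline terms for a single intake value using the standard truncated-power construction for restricted cubic splines (linear beyond the outer knots). The number and placement of knots are not stated in the text, so the knots used in the usage note are purely hypothetical, as is the function name.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for a restricted cubic spline.

    Each nonlinear term is built so that the fitted curve is linear below
    the first knot and above the last knot. Terms are divided by
    (last knot - first knot)^2 to keep them on a scale comparable to x.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")
    t_last, t_prev = t[-1], t[-2]
    scale = (t_last - t[0]) ** 2

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    terms = [float(x)]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t_prev) * (t_last - t[j]) / (t_last - t_prev)
            + cube_plus(x - t_last) * (t_prev - t[j]) / (t_last - t_prev)
        )
        terms.append(term / scale)
    return terms
```

With hypothetical knots at 30, 100, 210, and 400 mg/d, `rcs_basis(150, [30, 100, 210, 400])` returns three values: the intake itself and two nonlinear terms. These columns would then enter the Cox model in place of the categorical intake variable.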
In addition, we carried out sensitivity analyses to assess the robustness of the findings. First, to reduce the potential for reverse causality, we excluded participants who died within the first 2 years of follow-up. Second, individuals with a history of CVD were excluded. Third, those with a history of cancer were excluded. Finally, we additionally adjusted for several cardiovascular risk factors to explore the relationship between caffeine intake and the risk of mortality in individuals with prediabetes.
The statistical analyses for this study were performed using R software, version 4.2.2. We considered two-sided P values less than 0.05 as statistically significant.