The present study is reported in accordance with the REporting of studies Conducted using Observational Routinely collected Data (RECORD) guidelines (http://www.record-statement.org), as advised by Benchimol et al. (14), and the guidelines of the EQUATOR Network (equator-network.org) (15).
Study design
This is a cross-sectional study designed to assess the association between B12 and B9 levels and lipidic and non-lipidic biological markers of CVD, as well as inflammatory biomarkers, using archived laboratory data from a university hospital’s Laboratory Information System (LIS).
Settings
The laboratory database analyzed was that of Hôtel Dieu de France, a university hospital in Beirut, Lebanon. It includes laboratory testing results of inpatients and outpatients referred to the hospital’s laboratories. To increase the sample size and its representativeness, and to derive stable estimates of the measured biological variables, a lookback period of five and a half years (1 January 2017–30 June 2022) was selected. The LIS database also contains minimal demographic variables, including age, gender, and the setting in which laboratory testing was conducted (inpatient vs outpatient). For inpatients, the database also records the department from which testing was prescribed (e.g., cardiology, nephrology).
Study Population
In compliance with data privacy policies, patient identification codes were automatically anonymized to preserve patient confidentiality. Subjects were included if they: (1) were aged 16 years or older; (2) had at least one valid serum B12 and/or B9 measurement; (3) had that measurement recorded between 1 January 2017 and 30 June 2022; (4) had at least one of the following serum assays performed at the same time as the vitamin assay(s): creatinine, estimated Glomerular Filtration Rate (eGFR), HbA1c, homocysteine, uric acid, C-reactive protein (CRP), TC, Low Density Lipoprotein (LDL) cholesterol, HDL cholesterol, triglycerides, fibrinogen, or ferritin; and (5) were outpatients or patients admitted to a cardiology department (cardiology, coronary care, cardio-thoracic surgery, cardiac reanimation). Data from non-cardiology inpatients (e.g., oncology) were excluded from the analysis.
Variables
The exposure variables were the levels of B9 (in nmol/l) and B12 (in pmol/l). The outcome variables defined for the study were the serum levels of uric acid (in µmol/l), CRP (in mg/l), eGFR (in ml/min/1.73 m²), HbA1c (in %), lipid measurements (in mmol/l), fibrinogen (in g/l), ferritin (in µg/l), and homocysteine (in µmol/l).
Given the absence of clinical information except the hospitalization department, admission to a cardiology department (cardiology ward, coronary care unit, thoracic surgery, and cardiac reanimation) at least once during the analysis period was used as a surrogate for CVD status.
The confounding variables are defined later in the logistic regression models section.
Data sources/measurement
Reflecting real-life settings, not all assays were performed for all subjects. A complete-case approach, in which only patients with all measurements are studied, was therefore deemed detrimental: it would drastically reduce the sample size and select a biased subset of subjects with a complete biological profile that is not representative of the underlying population. Instead, depending on data availability, different subsets were created to measure the association between B9 and B12 levels and each of the dependent variables.
Bias
For each outcome, subjects had at least one measurement in addition to the vitamin measurements. For subjects with multiple records, the mean value of each assay was used instead of the most recent value, to limit measurement bias and the influence of outlier observations. Because the database was not intended to capture all biological information for all subjects, some assays had missing values. Dividing the dataset into subsets, as described earlier, circumvented this issue.
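As an illustration only, the averaging of repeated measurements and the creation of outcome-specific subsets can be sketched as follows. The study itself used IBM SPSS; the record layout, subject identifiers, values, and function names below are hypothetical and serve only to show the logic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (subject_id, assay_name, value).
records = [
    ("s1", "B12", 200.0), ("s1", "B12", 300.0), ("s1", "CRP", 4.0),
    ("s2", "B12", 120.0), ("s2", "HbA1c", 7.1),
    ("s3", "B9", 9.0), ("s3", "CRP", 12.0), ("s3", "CRP", 8.0),
]

def mean_per_subject(rows):
    """Collapse repeated measurements to one mean value per subject and assay."""
    values = defaultdict(list)
    for subject, assay, value in rows:
        values[(subject, assay)].append(value)
    wide = defaultdict(dict)
    for (subject, assay), vals in values.items():
        wide[subject][assay] = mean(vals)
    return dict(wide)

def subset_for(wide, exposure, outcome):
    """Keep only subjects for whom both the exposure and the outcome are available."""
    return {
        subject: (assays[exposure], assays[outcome])
        for subject, assays in wide.items()
        if exposure in assays and outcome in assays
    }

wide = mean_per_subject(records)
b12_crp = subset_for(wide, "B12", "CRP")  # available-case subset for one exposure/outcome pair
```

Each exposure/outcome pair thus gets its own subset, so a subject missing one assay is dropped only from the analyses that require that assay.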
Study size
The anonymized extraction in long format included 20,836 observations, yielding 14,312 unique subjects. Applying inclusion/exclusion criteria resulted in retaining 12,293 unique patients, of whom 5,649 patients had multiple records (Supplementary Material 1). Following data cleaning, subjects were deemed suitable for the analysis. Further subsets were created according to the availability of biological assays (Supplementary Material 2).
Quantitative variables
To generate univariable and multivariable odds ratios measuring the association between the exposure variables and the outcome variables, quantitative variables were dichotomized as follows. When not reported, eGFR was derived from creatinine and age according to the CKD-EPI equation. An eGFR cut-off of 60 ml/min/1.73 m² was selected because it defines moderate renal failure. B12 was recoded into 2 categories with a cut-off of 148 pmol/l, according to the BSH definition of B12 deficiency. B9 was also recoded into 2 categories with a cut-off of 7 nmol/l, as defined by the BSH. The CRP cut-off was set at 5 mg/l following clinical practice guidelines. The HbA1c cut-off was set at 6.5%, the cut-off defined by the American Diabetes Association for the diagnosis of non-insulin dependent diabetes (16). The uric acid cut-off was set at 356 µmol/l for women and 416 µmol/l for men. Distributional percentile cut-offs were used for lipids and other biomarkers: TC (90th percentile, 6.25 mmol/l), triglycerides (90th percentile, 2.49 mmol/l), HDL-C (10th percentile, 0.90 mmol/l), LDL-C (90th percentile, 4.05 mmol/l), fibrinogen (90th percentile, 4.875 g/l), ferritin (90th percentile, 247.54 µg/l), and homocysteine (90th percentile, 22.36 µmol/l). These cut-offs were used in both univariate and multivariate statistical analyses.
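The recoding can be illustrated with a minimal sketch using the clinical cut-offs quoted above. The function names are hypothetical, and the handling of values falling exactly on a cut-off (strict versus inclusive comparison) is an assumption of this sketch, since it is not specified in the text.

```python
# Clinical cut-offs quoted in the text. Whether each comparison is strict or
# inclusive at the boundary is an assumption made for this illustration.

def b12_deficient(b12_pmol_l):
    """B12 below the 148 pmol/l cut-off."""
    return b12_pmol_l < 148

def b9_deficient(b9_nmol_l):
    """B9 below the 7 nmol/l cut-off."""
    return b9_nmol_l < 7

def reduced_egfr(egfr_ml_min):
    """eGFR below 60 ml/min/1.73 m2."""
    return egfr_ml_min < 60

def elevated_crp(crp_mg_l):
    """CRP above 5 mg/l."""
    return crp_mg_l > 5

def elevated_hba1c(hba1c_percent):
    """HbA1c at or above 6.5%."""
    return hba1c_percent >= 6.5

def elevated_uric_acid(uric_acid_umol_l, sex):
    """Sex-specific cut-off: 356 umol/l for women ("F"), 416 umol/l for men ("M")."""
    cutoff = {"F": 356, "M": 416}[sex]
    return uric_acid_umol_l > cutoff
```

For example, a uric acid level of 400 µmol/l would be classified as elevated in a woman but not in a man.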
Statistical methods
Missing values analysis
As eGFR was not systematically reported for subjects with available creatinine levels, missing values were calculated using the CKD-EPI equation. The validity of the calculation was assessed using the intraclass correlation coefficient type 3 with absolute agreement, yielding a value of 0.775 [95% confidence interval 0.487–0.901].
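The text does not state which version of the CKD-EPI equation was applied. As an illustration only, the sketch below assumes the 2021 race-free creatinine equation, which takes sex as an input in addition to creatinine and age, and assumes creatinine is stored in µmol/l (converted to mg/dl by dividing by 88.4). The function name and parameter names are hypothetical.

```python
def egfr_ckd_epi_2021(creatinine_umol_l, age_years, sex):
    """eGFR (ml/min/1.73 m2) from the CKD-EPI 2021 creatinine equation.

    Assumption: the 2021 version is used here for illustration; the study
    may have used a different version of the CKD-EPI equation.
    """
    scr_mg_dl = creatinine_umol_l / 88.4       # unit conversion, umol/l -> mg/dl
    kappa = 0.7 if sex == "F" else 0.9         # sex-specific creatinine scaling
    alpha = -0.241 if sex == "F" else -0.302   # sex-specific exponent below kappa
    ratio = scr_mg_dl / kappa
    egfr = (
        142
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    if sex == "F":
        egfr *= 1.012
    return egfr
```

With this form, higher creatinine or older age both yield a lower estimated eGFR, which is the behavior the 60 ml/min/1.73 m² cut-off relies on.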
Exploratory univariable analysis
Normality assumptions were checked using histograms, quantile–quantile plots and the Shapiro–Wilk test. All quantitative variables departed from normality, and logarithmic, inverse, and square-root transformations did not result in normal distributions; therefore, exploratory univariate analysis relied on non-parametric correlations (Spearman’s rho) between B12/B9 and CRP, fibrinogen, ferritin, HbA1c, TC, HDL-C, LDL-C, triglycerides, uric acid, homocysteine, age, and eGFR. The 95% confidence interval limits for rho were derived using the formula proposed by Fieller, Hartley, and Pearson. Univariable unadjusted odds ratios were calculated to measure the association between dichotomized lipidic and non-lipidic biomarkers as dependent variables and B12, then B9, as independent variables. Fisher’s exact test was used to assess their significance. The cut-offs for these variables were those described above in the “quantitative variables” section.
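For reference, the Fieller, Hartley, and Pearson approach places the confidence interval on the Fisher z scale, with a standard error of sqrt(1.06 / (n − 3)) for Spearman’s rho, and then back-transforms the limits. A minimal sketch, with a hypothetical function name and a normal critical value of about 1.96 for a 95% interval:

```python
import math

def spearman_ci(rho, n, z_crit=1.959964):
    """Approximate CI for Spearman's rho (Fieller, Hartley and Pearson).

    The interval is built on the Fisher z scale with standard error
    sqrt(1.06 / (n - 3)), then mapped back to the correlation scale.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(rho)              # Fisher z-transform of rho
    se = math.sqrt(1.06 / (n - 3))   # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = spearman_ci(0.3, 500)  # illustrative values, not study data
```

Because the interval is computed on the z scale, it is asymmetric around rho on the correlation scale and always stays within −1 and 1.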
Since eGFR is calculated using serum creatinine, the two variables are intrinsically highly correlated. Therefore, serum creatinine was dropped from multivariate analysis.
Multivariable models
To derive adjusted multivariable estimates, separate logistic regression models were fitted with B12 or B9 as the independent variable and relevant lipidic and non-lipidic CVD biomarkers as dependent variables, chosen after assessment of the univariate analysis results. Biological markers of inflammation (CRP and ferritin) were included in the models when the subset sample size allowed it, to adjust for confounding resulting from inflammation. Fibrinogen was excluded from multivariate analyses since fewer than 1% of patients had fibrinogen measurements. Age and gender were included in all models. Hence, in each logistic model, the association between B9/B12 and the lipidic and non-lipidic CVD biomarkers was adjusted for age, gender, and inflammatory biomarkers. The overall performance of the logistic regression models was assessed using the Omnibus Test of model coefficients. Their goodness of fit was checked using the Hosmer-Lemeshow statistic. The 95% confidence intervals for multivariate ORs were derived from the Wald statistic. The models’ diagnostic performance was assessed with the C-statistic. Statistical analysis was performed using IBM SPSS (IBM Corp. Released 2021. IBM SPSS Statistics for Windows, Version 28.0. Armonk, NY: IBM Corp.).
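Two of the quantities named above can be illustrated with a short sketch: the Wald confidence interval for an odds ratio, obtained by exponentiating the coefficient plus or minus the critical value times its standard error, and the C-statistic, computed as the proportion of concordant case/non-case pairs. The function names and input values are hypothetical; in the study these quantities were produced by SPSS.

```python
import math

def wald_or_ci(beta, se, z_crit=1.959964):
    """Odds ratio and Wald CI from a logistic coefficient and its standard error."""
    return (
        math.exp(beta),
        math.exp(beta - z_crit * se),
        math.exp(beta + z_crit * se),
    )

def c_statistic(predicted, outcomes):
    """Proportion of (case, non-case) pairs in which the case has the higher
    predicted probability; tied pairs count as one half."""
    cases = [p for p, y in zip(predicted, outcomes) if y == 1]
    controls = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes are required")
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(controls))

# Illustrative values only, not study results.
odds_ratio, ci_low, ci_high = wald_or_ci(math.log(2), 0.2)
```

A C-statistic of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination between subjects above and below the biomarker cut-off.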
Data access and cleaning methods
The data were accessed after approval by the Ethics Committee of Hôtel Dieu de France university hospital (CEHDF 2031). An anonymized raw dataset was first securely handed to the authors for analysis. As the laboratory reports were in French, they were translated into English and harmonized.