Recalibration of biochemistry measurements in a multinational cohort study: LIFE course study in CARdiovascular disease Epidemiology (LIFECARE)

Background: Various cardiovascular biomarkers are used to assess and compare the risk of cardiovascular diseases across populations. However, artefactual variations due to the use of different laboratories may make these comparisons invalid. This work describes the inter-laboratory variations in a multi-country cohort, LIFECARE, and the use of recalibration to a reference laboratory to minimise this variability. Methods: LIFECARE is a cohort of 10,479 participants recruited from Indonesia, Malaysia, the Philippines and Thailand between 2008 and 2011, with blood samples analysed at country-specific laboratories (n=4). The laboratory in Thailand was the designated reference laboratory. The measurements from each laboratory were compared against the reference laboratory using a common set of samples analysed at all laboratories, using the MethComp package in R. Laboratory values for cohort participants were recalibrated using the equation generated by the package if large, statistically significant differences were observed during the comparison. Results: Glucose, total cholesterol, HDL cholesterol, LDL cholesterol and triglyceride measurements were reported for all four countries. Cholesterol and HDL from all laboratories required recalibration while glucose did not. Recalibration altered the proportions of the population at risk substantially, with prevalence of high cholesterol changing from 56.3% to 75.0% in Malaysia, 52.1% to 37.5% in Indonesia and 31.3% to 22.7% in the Philippines. Prevalence of low HDL was similarly altered. Conclusion: There was significant variation in serum lipid levels measured by different laboratories, leading to variations in estimates of population at risk. Recalibration to a reference laboratory can overcome this variability and facilitate meaningful comparisons of laboratory data across countries.


INTRODUCTION
Cardiovascular disease (CVD) is the single most important cause of death globally, with 32% of all deaths, and 46% of non-communicable disease (NCD) deaths, being attributed to it in 2013 [1,2]. Of the estimated 17 million CVD deaths in 2013, 8 million were due to ischaemic heart disease, and 6 million due to stroke [3]. The growing burden of CVD has been driven mainly by the demographic and epidemiological transitions in Asia, which has seen the biggest jumps in CVD deaths between 1990 and 2013, with an increase of 97% in South Asia and 47% in East Asia [3].
The risk factors for ischaemic heart disease and stroke are well known and have been identified consistently across countries and regions.
However, there are substantial variations in the prevalence of these risk factors, both within and across countries, as well as across studies [4]. In 2011, low income countries showed the lowest, and upper-middle-income countries the highest, prevalence of diabetes, whereas the prevalence of elevated total cholesterol was highest in Europe, followed by the Americas, Africa and South-East Asia, respectively [5].
These variations in the prevalence of risk factors may be real, due to different lifestyles, socioeconomic status, ethnic groups and genetic predisposition, but may also be artefactual, due to differences in methods of estimation. This artefactual variation is of importance as prevention strategies and interventions are based on assumptions of population risk, and such variation can lead to an erroneous estimation of risk and consequently an inappropriate allocation of prevention efforts and resources. The use of different assays, instruments, analytic and calibration reagents can introduce systematic errors in the measurement of biochemical parameters. Such variability and bias have been reported for a variety of biochemical analytes, including glucose and lipids [6][7][8][9].
Analytical variability interferes with the meaningful comparison of data on risk across studies, or even within studies where multiple laboratories are involved. This is especially true of cross-country epidemiological studies, with large numbers of participants spread across multiple countries and the use of multiple laboratories [10]. Hence there is a need to devise effective means to make data from different laboratories comparable. In this paper, we demonstrate the use of recalibration to a reference laboratory to minimise variability in the estimation of common cardiovascular risk biomarkers, namely fasting glucose and lipids, across laboratories, using data from a multinational cohort study. We also demonstrate the effect of laboratory variations on population risk estimates when raw results are used.

Recalibration study
A convenience sample of 63 participants (male=34, female=29) aged above 21 years was recruited from an outpatient clinic in a local hospital in Singapore. Ethics approval was obtained from the National Health Care Group Domain Specific Review Board. Written, informed consent was obtained from each participant. Blood samples were obtained from all participants after an overnight fast. We excluded samples with a high degree of haemolysis. A total of 57 fresh frozen plasma samples and 54 serum samples were shipped to local laboratories in four countries, including the reference laboratory in the Division of Clinical Chemistry, Faculty of Medicine, Mahidol University. This reference laboratory is ISO 15189 certified and is a participating laboratory for the CDC lipid standardization program, with performance within the acceptable criteria of the National Cholesterol Education Program [11,12]. All laboratories measured glucose and lipids (total cholesterol, high density lipoprotein cholesterol (HDL-C), low density lipoprotein cholesterol (LDL-C), triglycerides) on these samples. The results obtained from these samples were used to generate the recalibration equations as described in the section on data analysis below.

LIFECARE study
LIFECARE is a multinational cohort study conducted in four Southeast Asian countries (Thailand, Malaysia, the Philippines and Indonesia) [13]. The aim of the LIFECARE study is to identify factors that underlie the changes in cardiovascular risk factors over time in these four countries and determine the impact of changes in these risk factor levels on Health-  [15,16]. Ethics approval was obtained from the Institutional Review Board at Mahidol University.

Consent
Written informed consent was obtained from each participant before the start of the study.

Exclusion
We excluded participants who were not within the pre-defined age  Table 1 shows the details of sample collection and processing in each of the four countries, as well as the coefficients of variation of the individual tests during the study. LDL-C level was directly measured in all countries except Malaysia, where it was calculated using the Friedewald formula.
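The Friedewald formula is not written out in the text. In its usual form for concentrations in mg/dL, it estimates LDL-C as total cholesterol minus HDL-C minus one fifth of the triglyceride level. The Python sketch below illustrates that calculation under stated assumptions: the function name is hypothetical, and the 400 mg/dL triglyceride guard reflects the conventional limit of the formula rather than any rule reported for the Malaysian laboratory.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL-C (mg/dL) as TC - HDL-C - TG/5.

    Returns None when triglycerides are 400 mg/dL or higher, a range in
    which the formula is conventionally considered unreliable (assumed
    guard, not taken from the study).
    """
    if triglycerides >= 400:
        return None
    return total_chol - hdl - triglycerides / 5.0


# Hypothetical fasting values in mg/dL, not taken from the study.
print(friedewald_ldl(200, 50, 150))  # 120.0
```

Because the estimate is a difference of three measured quantities, any bias in total cholesterol, HDL-C or triglycerides carries directly into the calculated LDL-C.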

Definitions
We defined cut-offs of the various biomarkers for identifying the proportion of each population with risk factors. Diabetes mellitus was defined as a fasting plasma glucose level of ≥ 126 mg/dL. High cholesterol was defined as a total cholesterol level of ≥ 200 mg/dL. Low HDL-C was defined as HDL-C < 40 mg/dL in males and < 50 mg/dL in females. High LDL-C was defined as LDL-C ≥ 130 mg/dL and high triglycerides as levels ≥ 150 mg/dL.
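These definitions can be expressed compactly in code. The following Python sketch applies the cut-offs above to one participant; the function name, argument names and dictionary keys are illustrative choices, and all concentrations are assumed to be in mg/dL.

```python
def classify_risk_factors(sex, glucose, total_chol, hdl, ldl, triglycerides):
    """Apply the study cut-offs (mg/dL) to one participant.

    sex must be "male" or "female"; the HDL-C cut-off is sex-specific.
    """
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    hdl_cutoff = 40 if sex == "male" else 50
    return {
        "diabetes": glucose >= 126,
        "high_cholesterol": total_chol >= 200,
        "low_hdl": hdl < hdl_cutoff,
        "high_ldl": ldl >= 130,
        "high_triglycerides": triglycerides >= 150,
    }


# Hypothetical participant, not taken from the study.
print(classify_risk_factors("female", 110, 210, 45, 125, 150))
```

Because each definition is a hard threshold, a small systematic shift in a laboratory's measurements can move many participants across a cut-off at once.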

Data analysis
Biomarkers from each country were recalibrated against the measurements made in the reference laboratory, using a workflow (Figure 1) adapted from Bendix Carstensen's recommendations for the statistical analysis of method comparison studies and the MethComp package in R [17,18].
Each dataset, comprising both original laboratory and reference measurements of a set of samples, was verified to exhibit constant variance before a recalibration equation was estimated. Data with non-constant variance were log-transformed. Alternative transformations such as the square root transformation were applied if the logarithmic transformation did not remedy the non-constant variance. The recalibration equation consisted of a slope term and a constant and was estimated using the MethComp package in R as described by Carstensen [17,18].
Bland-Altman (BA) plots were used to detect persistent low-quality measurements in the datasets. We first identified samples that deviated from the random scatter about zero in each BA plot for each biomarker. If a sample was identified three or more times across different biomarkers within each country, the quality of the measurement was classified as low.
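The analysis itself was carried out with MethComp in R. As a language-neutral illustration only, the Python sketch below uses a plain Deming regression as a stand-in for estimating a slope and constant, and implements one possible reading of the BA screening rule. Several elements are assumptions rather than details from the study: the function names, the default error-variance ratio of 1, and the use of 1.96 SD limits of agreement to decide that a sample "deviates" from the scatter about zero.

```python
import math
from statistics import mean, stdev


def deming_fit(local, reference, variance_ratio=1.0):
    """Fit reference = intercept + slope * local by Deming regression.

    variance_ratio is the assumed ratio of the reference-laboratory error
    variance to the local-laboratory error variance (1.0 = orthogonal fit).
    """
    n = len(local)
    mx, my = mean(local), mean(reference)
    sxx = sum((x - mx) ** 2 for x in local) / (n - 1)
    syy = sum((y - my) ** 2 for y in reference) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(local, reference)) / (n - 1)
    d = syy - variance_ratio * sxx
    slope = (d + math.sqrt(d * d + 4 * variance_ratio * sxy * sxy)) / (2 * sxy)
    return my - slope * mx, slope


def bland_altman_limits(local, reference):
    """Return the 1.96 SD limits of agreement for local - reference."""
    diffs = [x - y for x, y in zip(local, reference)]
    centre, spread = mean(diffs), stdev(diffs)
    return centre - 1.96 * spread, centre + 1.96 * spread


def flag_low_quality(local_by_marker, reference_by_marker, min_hits=3):
    """Return sample indices outside the limits for >= min_hits biomarkers."""
    hits = {}
    for marker, local in local_by_marker.items():
        reference = reference_by_marker[marker]
        low, high = bland_altman_limits(local, reference)
        for i, (x, y) in enumerate(zip(local, reference)):
            if not low <= x - y <= high:
                hits[i] = hits.get(i, 0) + 1
    return sorted(i for i, count in hits.items() if count >= min_hits)


# Hypothetical paired measurements (mg/dL); reference is exactly 5 + 0.9 * local.
local = [100.0, 150.0, 200.0, 250.0]
reference = [95.0, 140.0, 185.0, 230.0]
intercept, slope = deming_fit(local, reference)
print(round(intercept, 3), round(slope, 3))
```

Where the variance is not constant, the same fit could be run on log-transformed values, with recalibrated results back-transformed by `math.exp`. Unlike ordinary least squares, Deming regression allows for measurement error in both laboratories, which is why it is a common choice for method comparison.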
These low-quality measurements were removed and the workflow was reapplied to the amended dataset for a separate recalibration.
nine country-specific biomarkers were not recalibrated. Table 3 illustrates the recalibration parameters used on each biomarker from each country. Between biomarkers, it is evident that glucose performed the best, as it did not require recalibration against the reference lab. Cholesterol and HDL most often required recalibration.
In general, the trends in prevalence corresponded with the trends in mean values and ranges of the biomarkers. In Malaysia, mean recalibrated HDL values were lower than the original values, yet the prevalence of low HDL decreased due to a reduction in the range after recalibration. Changes in prevalence of lipid abnormalities persisted after removing low-quality measurements.
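It may seem counter-intuitive that a lower mean HDL can coincide with a lower prevalence of low HDL. The Python sketch below reproduces the effect with made-up numbers: a recalibration equation with a slope below 1 compresses the range, pulling low values up past the cut-off while pulling the mean down. The values, intercept and slope are hypothetical and are not the Malaysian data or the parameters in Table 3.

```python
from statistics import mean


def recalibrate(values, intercept, slope):
    """Apply a linear recalibration equation to each measurement."""
    return [intercept + slope * v for v in values]


def prevalence_below(values, cutoff):
    """Proportion of measurements strictly below the cut-off."""
    return sum(v < cutoff for v in values) / len(values)


# Hypothetical HDL-C values in mg/dL for six male participants.
original = [30.0, 36.0, 39.0, 50.0, 60.0, 70.0]
recalibrated = recalibrate(original, intercept=22.0, slope=0.5)

print(mean(original), mean(recalibrated))  # 47.5 45.75
print(prevalence_below(original, 40))      # 0.5
print(prevalence_below(recalibrated, 40))  # one of six participants
```

Here the recalibrated values span 37 to 57 mg/dL instead of 30 to 70 mg/dL, so two of the three participants originally below 40 mg/dL are no longer classified as having low HDL-C even though the mean has fallen.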

Variation across gender and age categories
On stratifying the data by gender, it was observed that females, in general, had a significantly more favourable profile than males for diabetes, high cholesterol, and high triglycerides (Table 5). The difference was most pronounced in Malaysia. The significant differences in prevalence between the two genders persisted after recalibration. Prevalence of high cholesterol (HC) and high TG (HT) in Malaysia increased and remained  In general, the same phenomenon was observed for mean values of the biochemistry variables.

CONCLUSIONS
In our study, we found that there was substantial variation among the measurements from different laboratories, especially for lipids. This variability could result either from the pre-analytic stage (i.e. procedure in the field) or from the analytic stage (i.e. laboratory assessment). At the pre-analytic stage, sample collection, storage, processing and transportation may introduce variability. All study sites followed the same procedures for sample collection. The Philippines analysed lipids from plasma samples whereas the other countries analysed them from serum, which may lead to differences in the measured concentrations, though of small magnitude [19,20]. Due to the distance from the study site to the laboratory, the Philippine team had to centrifuge and separate plasma before transporting the samples to the laboratory for analysis. However, storage at different temperatures is unlikely to have been a major source of variability, as the samples were analysed within a few days and bacterial contamination was avoided [19,20].
Hence, the significant variability between laboratories observed in our study is likely to have occurred at the analytic stage.
Variation at the analytic stage can result from the method of determination (enzymatic or chemical), the calibrator, the instruments and the reagents used. In the LIFECARE study, all the laboratories used enzymatic methods, so the variability can be attributed to differences in calibrators, instruments or reagents. In our study, we found that there was a significant change in the values of almost all lipid components after recalibration. The change was most prominent in the prevalence of high LDL, which decreased from 55.5% to 32.2% after recalibration, and the prevalence of low HDL, which increased from 36.2% to 56.7% in Indonesia.
One of the reasons that LDL and HDL are more prone to variability than other analytes could be the heterogeneity of LDL and HDL in terms of particle size, density, shape, lipid and apolipoprotein composition, with different assays measuring different subclasses of particles [21,22].
Significantly greater magnitudes of bias and total error have been reported when estimating lipid levels in individuals with disease, compared with those without disease [21]. In addition, abnormally high or low TG levels [23,24], and bias and errors in HDL measurement [25,26], can lead to under- and over-estimation of LDL-C levels by the Friedewald formula.
As seen from the results of our study, the magnitude and direction of error for a biomarker can vary between laboratories and even between different biomarkers measured within the same lab. In Indonesia, for instance, people were being over-diagnosed with high LDL and underdiagnosed for low HDL within the same laboratory based on the original measurements. Such variability has important clinical implications. Diagnosis and treatment of cardiovascular risk factors like diabetes and dyslipidaemias is based on cut-offs recommended by national and international guidelines, and not individual laboratory reference ranges. LDL-C is the primary target for medical treatment of hypercholesterolemia due to proven efficacy in CVD risk reduction [27], and an overestimation can result in CVD risk misclassification into high risk. This might result in unnecessary treatment of a patient whose LDL-C levels can be managed just by dietary control and physical activity. Conversely, underestimation can result in misclassification of a person who actually has high risk into a low risk category resulting in a failure to treat and reduce risk appropriately. In terms of population health, this variability in estimates may lead to erroneous projections of population risk and disease burden, leading to inaccurate prioritization of the issue and allocation of resources disproportionate to the need.
Hence, it is crucial to standardize the measurements of these analytes in order to have a meaningful comparison when we study the cardiovascular risk factors across countries and even within the same country over time.
The ideal way to achieve this is by accreditation of laboratories and/or manufacturers for accuracy with a reference gold-standard procedure, such as the lipid standardization programme run by the Centers for Disease Control and Prevention (CDC) in the US [11][28][29][30]. Such accuracy-based standardization ensures that measurements from any laboratory or any combination of instruments, reagents and assays are directly traceable to the reference measurement [31]. However, most national and international laboratory certification and external quality assurance programmes compare individual laboratory performance for a specific analyte with pooled means derived from all the laboratories that participate in the programme. While such comparisons can give an estimate of deviation of laboratory performance against peers, they do not give any information on the accuracy of the results obtained [6,32]. In addition, many of these programmes use lyophilised sera, which may have significant matrix effects, leading to erroneous conclusions about system performance [32,33]. While there are on-going efforts to harmonize quality assurance performance using commutable materials, i.e. materials without matrix effects, and with a focus on traceability to reference standards, this will take time given the sheer number of laboratories around the world, and will require impetus from national agencies to adopt such targets [34,35].
In the short term, another alternative for epidemiological studies is to recalibrate the results from the various laboratories involved to a reference laboratory, as we have demonstrated in this paper. This may be more feasible when studies are conducted over geographically dispersed populations, where sample storage and transport to a reference laboratory may be more challenging. This reduces pre-analytical variations due to storage conditions and duration. Using an accredited laboratory with a strong quality assurance program for recalibration allows us to improve the accuracy and precision of the biochemical measurements without the need for individual laboratories to invest in new quality assurance initiatives that may be time consuming and economically challenging.
In summary, we have demonstrated artefactual variations in serum lipid levels and prevalence of lipid abnormalities due to variation in estimation of these parameters between laboratories and showcased a method for recalibration to ensure comparability of the results. Researchers, policy makers and health professionals need to take such artefactual variations into consideration while comparing biochemical data across studies and countries.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analysed during this study are included in this published article.

Competing interests
The authors declare that they have no competing interests.

Funding
The LIFECARE study is supported by an Investigator Initiated Research Grant from Pfizer Inc. The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

Authors' contributions
All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.