A Method to Adjust for Measurement Error in Three Exposures Measured with Correlated Errors in the Absence of Internal Validation Study

Difficulty in obtaining the correct measurement for an individual's long-term exposure is a major challenge in epidemiological studies that investigate the association between exposures and health outcomes. Measurement error in an exposure biases the association between the exposure and a disease outcome. Usually an internal validation study is required to adjust for exposure measurement error; adjustment is challenging if such a study is not available. We proposed a method (the trivariate method) that adjusts for measurement error in three correlated exposures in the absence of an internal validation study and illustrated the method using real data. We compared the results from the proposed method with those obtained using a method that ignores measurement error and a method that ignores correlations between the errors and between the true exposures (the univariate method). We found that ignoring measurement error leads to bias and underestimates the standard error. We also found that the magnitude of adjustment in the trivariate method is sensitive to the magnitude of measurement error and to the sign and magnitude of the correlation between the errors. We conclude that the proposed method can be used to adjust for bias in the outcome-exposure association in a case where three exposures are measured with correlated errors in the absence of an internal validation study. The method is also useful in conducting a sensitivity analysis on the magnitude of measurement error and the sign of the error correlation.


Background
Difficulty in obtaining correct measurements of an individual's long-term exposure is a major challenge in epidemiological studies that investigate the association between an exposure and a health outcome. For instance, several studies estimated the correlations between self-reported intake from a questionnaire and the true long-term intake to be less than 0.82 for fruits and about 0.72 for vegetables [1,2,3,4,5], which implies that some of the variation in the dietary intake measurements is due to random error. Because of random error, the association between dietary intakes and health outcomes may be biased. The effect of measurement error can be quantified using either (i) the attenuation factor, which quantifies the bias in the association, or (ii) the correlation coefficient between the true and the observed exposure (i.e. the validity coefficient), which quantifies the loss of statistical power to detect a significant association [6].
Validation studies are used to assess the accuracy of a dietary questionnaire [7,8,9,6,10,11]. A validation study comprises a small number of individuals whose dietary intakes are measured repeatedly using an unbiased instrument [12]. However, these studies are expensive to conduct and, in some cases, not feasible. Several methods have been proposed to handle measurement error in the absence of internal validation data [13,14,15,16,17].
Agogo et al. [13] conducted a sensitivity analysis to investigate the effect of the magnitude of correlation between errors in the covariates of interest and found that the magnitude of measurement error adjustment is sensitive to the assumed measurement error structure. Dellaportas and Stephens [14] presented a Bayesian method for the analysis of non-linear errors-in-variables models in which prior knowledge of the unknown true covariate is incorporated. Huang et al. [15] proposed quantile regression-based non-linear mixed effects joint models for longitudinal data that simultaneously account for a response with non-central location and for a covariate with non-normality and measurement error under a Bayesian framework. Lin [16] proposed a Bayesian semi-parametric accelerated failure time model to analyze censored survival data with covariate measurement error and evaluated the method using an intensive simulation study. Muff et al. [17] introduced a Bayesian method to handle a mixture of classical and Berkson measurement errors in a single explanatory variable and illustrated the method in a study of cardiovascular disease mortality.
The majority of these authors considered a case where one exposure is measured with error (hereafter, a univariate case); Agogo et al. [13] focused on a case where two exposures are measured with error (hereafter, a bivariate case). In a univariate case, the bias in the association between an outcome and the exposure is adjusted by dividing the unadjusted association estimate by the attenuation factor [18]. The attenuation factor is the ratio of the variance of the true exposure to the variance of the observed exposure. This method ignores correlations between the errors, which can lead to substantial bias. In this study, we extend bias adjustment methods to a case where three exposures are measured with correlated errors (hereafter, a trivariate case) in the absence of an internal validation study and demonstrate the implementation of this method using R software [19]. We use real data to illustrate the method. Specifically, we use a subset of data from a home-based HIV counseling and testing study that was conducted in rural and peri-urban communities in KwaZulu-Natal Province, South Africa [20]. Unlike the methods proposed in the literature, we developed an R package for our method.
The remaining sections of this paper are organized as follows. In section 2, we discuss materials and methods used in this study. We present the results of the study in section 3. Finally, we provide a discussion and conclusion in section 4.

Data and study design
In this work, we use a subset of data from a home-based HIV counseling and testing (HBCT) study that was conducted in rural and peri-urban communities in KwaZulu-Natal Province, South Africa, between November 2011 and June 2012 [20]. The data were obtained from the Human Sciences Research Council (HSRC) of South Africa [20]. This study aimed to provide a better understanding of the complexity, severity and prevalence of non-communicable diseases (NCDs) in a community known to have one of the highest rates of HIV incidence and prevalence in the world [20].
Home-based HIV counseling and testing is a cross-sectional, single site study in South Africa which aims to increase engagement in HIV care by integrating NCDs screening with community-based HIV testing [21]. A random sampling approach was used, where 587 participants over the age of 18 were selected from 50,000 people living in Mpumuza suburb [20]. Anthropometric and biological measures were collected in the survey with the purpose of establishing the prevalence of a range of NCDs and associated risk factors. Eligible individuals participated in a face-to-face interview, physical, psychological and clinical examinations. Persons younger than 18 years living in Mpumuza and all household members not previously enrolled, and members unable to give written consent were excluded from the study. Mobile phones were used for data collection to increase efficiency in data capture and analysis [20].
In our study, we used a subset of data consisting of 76 current daily cigarette smokers to model the association between body mass index (BMI) and three exposures, namely smoking, fruit consumption and vegetable consumption. BMI was measured in kg/m², while smoking was measured as the average number of cigarettes smoked per day. Initially, fruit and vegetable intakes were measured in number of servings consumed per day. It is often assumed that a standard portion of fruit or vegetable weighs about 80 g [5]. Therefore, for this study, we converted the number of servings to grams per day (g/day) by multiplying the reported number of servings by 80 g. In this setup there are two drawbacks: (1) measurement error in the recorded number of cigarettes smoked due to possible misreporting and (2) measurement error in fruit and vegetable consumption due to recall bias and to assuming an average weight for a portion of fruit or vegetable. The subset of data used in this study is not representative of the HBCT-NCD cohort; it is used only to illustrate how to adjust for correlated measurement error in the absence of internal validation data and not for inferential purposes.

Ethical statement
Ethics approval was granted by both HSRC Research Ethics Committee (REC: 1/26/05/11) and the University of Washington Institutional Review Board (48733). Informed written consent was obtained from each participant in the study. Participants were provided with written information on the study (including the background and objectives of the study) and their rights regarding participation and withdrawing at any time.

A measurement error model for the data
An interest in an epidemiological study could be to investigate the association between BMI and three exposures, namely fruit intake, vegetable intake and smoking, using multiple linear regression, defined by the following generalized linear model:

\[ g\{E(Y \mid X_1, X_2, X_3)\} = \beta_0 + \beta_{X_1} X_1 + \beta_{X_2} X_2 + \beta_{X_3} X_3, \qquad (1) \]

where $g(\cdot)$ is the identity link function, $Y$ denotes the BMI, $\beta_0$ is the intercept, and $\beta_{X_1}$, $\beta_{X_2}$ and $\beta_{X_3}$ are the coefficient parameters for the true long-term fruit ($X_1$), vegetable ($X_2$) and cigarette ($X_3$) intake, respectively. In this study, we use vegetable intake and cigarette smoking as confounders and assume that the main interest is in estimating $\beta_{X_1}$. In practice, the true intakes are unobservable and, therefore, the intakes recorded in self-reported questionnaires are used. Let $W_1$, $W_2$ and $W_3$ denote the measured versions of $X_1$, $X_2$ and $X_3$, respectively. The use of the $W_p$'s in place of the $X_p$'s ($p = 1, 2, 3$) in equation (1) yields biased estimates $\hat{\beta}_{W_1}$, $\hat{\beta}_{W_2}$ and $\hat{\beta}_{W_3}$ of $\beta_{X_1}$, $\beta_{X_2}$ and $\beta_{X_3}$, respectively. Let $\hat{\beta}_W = (\hat{\beta}_{W_1}, \hat{\beta}_{W_2}, \hat{\beta}_{W_3})^T$.

A univariate method
In a univariate case, bias in the association between an outcome and an exposure is adjusted by dividing the unadjusted association estimate by the attenuation factor [18]. The attenuation factor ($\lambda$) is defined as $\lambda = \mathrm{var}(X_i)/\mathrm{var}(W_i)$, i.e., the ratio of the variance of the true exposure to the variance of the observed exposure, also referred to as the reliability ratio. This method ignores the correlations between the errors and also the correlations between the true exposures.
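The univariate adjustment can be sketched in a few lines. The paper's own implementation is in R; the Python below is only an illustration, and it assumes classical additive error so that the attenuation factor equals the squared validity coefficient ($\lambda = \rho_{W_iX_i}^2$). The function name `univariate_adjust` and the numbers are hypothetical and are not taken from the study data.

```python
def univariate_adjust(beta_naive, validity_coef):
    """Divide a naive coefficient by the attenuation factor lambda.

    Assumes classical additive error, so lambda = var(X)/var(W) = rho**2,
    where rho is the validity coefficient (correlation between W and X).
    """
    if not 0.0 < validity_coef <= 1.0:
        raise ValueError("validity coefficient must lie in (0, 1]")
    attenuation = validity_coef ** 2
    return beta_naive / attenuation


# Hypothetical values for illustration only.
adjusted = univariate_adjust(beta_naive=0.010, validity_coef=0.5)
print(adjusted)
```

With a validity coefficient of 0.5 the attenuation factor is 0.25, so the naive coefficient is multiplied by four. Nothing in this calculation uses information about the other exposures, which is the limitation the bivariate and trivariate methods address.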

A bivariate method
In the case where two exposures are measured with correlated errors (hereafter, a bivariate case), bias in the association between an outcome and the exposures can be adjusted using the relationship $\beta_X = (\Lambda_2^T)^{-1} \beta_W$, where $\Lambda_2$ denotes a 2 × 2 attenuation-contamination matrix [18,22]. The off-diagonal elements of $\Lambda_2$ are known as contamination factors, while the diagonal elements are called attenuation factors [13]. The bivariate method therefore accounts for the contamination effect caused by correlated measurement errors. However, when dealing with three exposures measured with error, this method ignores the measurement error and the error correlation for the third exposure variable.

The proposed method
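For simplicity and without loss of generality, we consider the additive measurement error model

\[ W_i = \alpha_{0i} + \alpha_{1i} X_i + \varepsilon_{W_i}, \qquad i = 1, 2, 3, \qquad (2) \]

and assume that $W_i$ is measured without systematic bias (i.e., $\alpha_{0i} = 0$, $\alpha_{1i} = 1$ for the three exposures). When dealing with three exposures measured with correlated errors (hereafter, the trivariate case), we define the estimate of the attenuation-contamination matrix $\hat{\Lambda}$ as

\[ \hat{\Lambda} = \hat{\Sigma}_X \hat{\Sigma}_W^{-1} =
\begin{pmatrix}
\hat{\sigma}^2_{X_1} & \hat{\sigma}_{X_1X_2} & \hat{\sigma}_{X_1X_3} \\
\hat{\sigma}_{X_1X_2} & \hat{\sigma}^2_{X_2} & \hat{\sigma}_{X_2X_3} \\
\hat{\sigma}_{X_1X_3} & \hat{\sigma}_{X_2X_3} & \hat{\sigma}^2_{X_3}
\end{pmatrix}
\begin{pmatrix}
\hat{\sigma}^2_{W_1} & \hat{\sigma}_{W_1W_2} & \hat{\sigma}_{W_1W_3} \\
\hat{\sigma}_{W_1W_2} & \hat{\sigma}^2_{W_2} & \hat{\sigma}_{W_2W_3} \\
\hat{\sigma}_{W_1W_3} & \hat{\sigma}_{W_2W_3} & \hat{\sigma}^2_{W_3}
\end{pmatrix}^{-1}, \qquad (3) \]

where $\hat{\Sigma}_X$ is the estimate of the covariance matrix of the true intakes, $\hat{\Sigma}_W^{-1}$ is the inverse of the estimate of the covariance matrix of the measured exposures, $\hat{\sigma}^2_{X_i}$ is the variance estimate of $X_i$ ($i = 1, 2, 3$), $\hat{\sigma}_{X_iX_j}$ ($i \neq j$) denotes the covariance estimate between the true exposures, $\hat{\sigma}^2_{W_i}$ is the variance estimate of $W_i$, and $\hat{\sigma}_{W_iW_j}$ ($i \neq j$) is the covariance estimate between the observed exposures.

In the trivariate case, we can obtain the adjusted association estimates by premultiplying the unadjusted association estimates by the inverse of the transpose of the attenuation-contamination matrix as

\[ \hat{\beta}_X = (\hat{\Lambda}^T)^{-1} \hat{\beta}_W, \qquad (4) \]

where $\hat{\beta}_W$ can be obtained from the observed questionnaire data. The elements of the variance-covariance matrix of the observed exposures, $\Sigma_W$, are estimated from the observed data. The variances of the true exposures, the $\sigma^2_{X_i}$'s, can be estimated using validity coefficients for the questionnaire. According to Kipnis et al. [6], the validity coefficient is given by

\[ \rho_{W_iX_i} = \frac{\mathrm{cov}(W_i, X_i)}{\sigma_{W_i}\sigma_{X_i}} = \frac{\sigma_{X_i}}{\sigma_{W_i}}, \qquad (5) \]

where $W_i$ is assumed to be measured with the random error term $\varepsilon_{W_i}$ only and $\varepsilon_{W_i}$ is assumed to be independent of $X_i$. From equation (5), we estimate the variance of the true exposures as

\[ \hat{\sigma}^2_{X_i} = \hat{\rho}^2_{W_iX_i}\, \hat{\sigma}^2_{W_i}, \qquad (6) \]

by incorporating external validation information on $\rho_{W_iX_i}$. To obtain the covariances between the true exposures (i.e. $\hat{\sigma}_{X_1X_2}$, $\hat{\sigma}_{X_1X_3}$ and $\hat{\sigma}_{X_2X_3}$), one of the following two approaches is used:

(i) If external information about $\hat{\rho}_{X_iX_j}$ is available, we obtain the covariances between the true exposures as

\[ \hat{\sigma}_{X_iX_j} = \hat{\rho}_{X_iX_j}\, \hat{\sigma}_{X_i}\hat{\sigma}_{X_j}, \qquad (7) \]

where the $\hat{\sigma}_{X_i}$ are obtained as shown in equation (6).

(ii) If we can obtain prior information about the error correlation $\hat{\rho}_{\varepsilon_{W_i}\varepsilon_{W_j}}$, we can solve for $\hat{\sigma}_{X_iX_j}$ by decomposing the covariance of the observed exposures into the unknown covariance between the true exposures and the unknown covariance between the errors:

\[ \sigma_{W_iW_j} = \mathrm{cov}(X_i + \varepsilon_{W_i},\, X_j + \varepsilon_{W_j}) = \sigma_{X_iX_j} + \sigma_{\varepsilon_{W_i}\varepsilon_{W_j}}, \qquad (8) \]

where $X_i$ and $\varepsilon_{W_j}$, and $X_j$ and $\varepsilon_{W_i}$, are assumed to be uncorrelated. From equations (2) and (6), the estimate of the error variance is

\[ \hat{\sigma}^2_{\varepsilon_{W_i}} = \hat{\sigma}^2_{W_i} - \hat{\sigma}^2_{X_i} = \hat{\sigma}^2_{W_i}\,(1 - \hat{\rho}^2_{W_iX_i}). \qquad (9) \]

See Additional file 1: Appendix B for the proof. From equations (8) and (9), the covariances between the true exposures are given by

\[ \hat{\sigma}_{X_iX_j} = \hat{\sigma}_{W_iW_j} - \hat{\rho}_{\varepsilon_{W_i}\varepsilon_{W_j}}\, \hat{\sigma}_{W_i}\hat{\sigma}_{W_j} \sqrt{(1 - \hat{\rho}^2_{W_iX_i})(1 - \hat{\rho}^2_{W_jX_j})}. \qquad (10) \]

Using the observed data and external information, we can determine all the terms required to estimate the attenuation-contamination matrix, $\Lambda$, as shown in equation (3), and adjust for the bias in the association between the exposures measured with error and the outcome using equation (4).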
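The algebra of the trivariate adjustment can be sketched for fixed (point) inputs as follows. This is a Python illustration under stated assumptions, not the authors' R code: it assumes classical additive error without systematic bias, builds the covariance matrix of the true exposures from the observed covariance matrix, the validity coefficients and the error correlations (approach (ii)), and then applies $\hat{\beta}_X = (\hat{\Lambda}^T)^{-1}\hat{\beta}_W$. All function names and all numbers are hypothetical.

```python
import math


def mat_inv3(m):
    """Invert a 3x3 matrix with the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def true_covariance(sigma_w, validity, error_corr):
    """Build Sigma_X from Sigma_W, validity coefficients and error correlations."""
    n = len(sigma_w)
    # Error SD: sigma_eps_i = sigma_W_i * sqrt(1 - rho_i^2)
    err_sd = [math.sqrt(sigma_w[i][i] * (1.0 - validity[i] ** 2)) for i in range(n)]
    sigma_x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Variance of the true exposure: rho_i^2 * sigma_W_i^2
                sigma_x[i][j] = validity[i] ** 2 * sigma_w[i][i]
            else:
                # Observed covariance minus the covariance between the errors
                sigma_x[i][j] = sigma_w[i][j] - error_corr[i][j] * err_sd[i] * err_sd[j]
    return sigma_x


def trivariate_adjust(beta_naive, sigma_w, validity, error_corr):
    """Return (Lambda^T)^{-1} beta_W with Lambda = Sigma_X Sigma_W^{-1}."""
    sigma_x = true_covariance(sigma_w, validity, error_corr)
    lam = mat_mul(sigma_x, mat_inv3(sigma_w))
    lam_t_inv = mat_inv3(transpose(lam))
    return [sum(lam_t_inv[r][k] * beta_naive[k] for k in range(3)) for r in range(3)]


# Hypothetical inputs (order: fruit, vegetable, cigarettes); not the study data.
sigma_w = [[100.0, 30.0, -5.0], [30.0, 80.0, -4.0], [-5.0, -4.0, 25.0]]
validity = [0.55, 0.50, 0.60]
error_corr = [[1.0, 0.2, -0.1], [0.2, 1.0, -0.1], [-0.1, -0.1, 1.0]]
beta_naive = [0.01, 0.02, -0.05]
print(trivariate_adjust(beta_naive, sigma_w, validity, error_corr))
```

When all observed covariances and error correlations are zero, the result reduces to the univariate adjustment (each coefficient divided by its squared validity coefficient), which is a convenient sanity check. The sketch does not verify that the implied covariance matrix of the true exposures is positive definite; some combinations of validity coefficients and error correlations may produce an inadmissible matrix, and a fuller implementation would need to check for that.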

Illustration of the proposed method using the study data
We illustrate the method in a way that accounts for uncertainty in the validity measures attributable to heterogeneity in the study populations and to parameter estimation. The proposed Bayesian method applies a Markov chain Monte Carlo (MCMC) estimation approach to combine observed self-reported data and external validation data in adjusting for measurement error in three exposures measured with correlated errors (hereafter, the trivariate method). MCMC is a class of algorithms that sample from the posterior distribution by traversing the parameter space [23]. The posterior distribution is obtained by updating the prior distribution with the observed data. The steps for implementing the proposed trivariate method are described below.
First, we obtained external information on validity coefficients and generated validity coefficients by interpreting the lower and upper limits obtained from the literature as the limits of the 95% credible interval (CI) of the distribution of possible values. Because the distribution of validity coefficients is skewed, Fisher's transformation was used to generate the validity coefficients, as explained in the next section.
Second, we estimated the posterior distribution of the covariance matrix for the observed exposures ($\Sigma_W$). The exposures were assumed to follow a multivariate normal distribution with mean $\mu_W$ and covariance $\Sigma_W$, i.e. $W \sim N_3(\mu_W, \Sigma_W)$. We assumed a weakly informative multivariate normal prior for $\mu_W$, namely $\mu_W \sim N_3(0, 10^6 I_3)$, where $I_3$ is the 3 × 3 identity matrix. In a multivariate normal distribution, $\Sigma_W$ must satisfy two conditions: (1) it must be positive definite (i.e. $w^T \Sigma_W w > 0$ for every non-zero vector $w$) and (2) it must be symmetric. The semi-conjugate prior distribution for $\Sigma_W$ that has these two properties is the inverse-Wishart distribution [23]. To minimize the influence of the prior information on the estimate of $\Sigma_W$, we used a weakly informative inverse-Wishart prior, $\Sigma_W \sim IW(I_3, v)$, where $v = 3$ is the degrees of freedom.
Third, using the validity coefficients generated from the external data and the posterior distribution of the covariance matrix for the observed exposures, we estimated the distribution of the variance of the true intakes ($\sigma^2_{X_i}$) using equation (5). To estimate the covariance between the true intakes ($\sigma_{X_iX_j}$) using equation (9), we required external validation information on the correlation between the errors ($\rho_{\varepsilon_{W_i}\varepsilon_{W_j}}$). Similar to Agogo et al. [13], we generated the correlation between errors from a plausible range guided by the correlation in the observed data and prior expert information on the most likely sign of the correlation between the exposures, as described in the next section. Having obtained the covariance matrices of the true and observed exposures, we estimated the attenuation-contamination matrix ($\Lambda$) from their joint distribution as shown in equation (3).
Lastly, we fitted a Bayesian multiple linear regression model (hereafter, the naive method) to obtain the posterior distributions of the unadjusted coefficient estimates $(\hat{\beta}_{W_1}, \hat{\beta}_{W_2}, \hat{\beta}_{W_3})^T$. In the naive model, we assumed weakly informative, independent normal priors by choosing a very small precision (large variance) for the unadjusted coefficients, $\beta_{W_i} \sim N(0, 10^6)$. The adjusted coefficient estimates $\hat{\beta}_X$ were then obtained from the joint posterior distribution of $\hat{\Lambda}$ and $\hat{\beta}_W$ as $\hat{\beta}_X = (\hat{\Lambda}^T)^{-1} \hat{\beta}_W$.
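The first step above, turning a reported range into draws of a validity coefficient with Fisher's transformation, can be sketched as follows. This is a Python illustration rather than the authors' R code. It assumes that the reported lower and upper limits are symmetric quantiles of a normal distribution on the Fisher-Z scale; the default multiplier of 1.645 corresponds to reading the limits as the 0.05 and 0.95 quantiles and should be changed if the limits are read as a different interval. The function names, the seed and the example range are hypothetical choices.

```python
import math
import random


def fisher_z(r):
    """Fisher Z-transformation of a correlation r in (-1, 1)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def sample_validity(lower, upper, n_draws, z_quantile=1.645, seed=1):
    """Draw validity coefficients whose Fisher-Z values are normally distributed.

    lower/upper are treated as symmetric quantiles of that normal distribution;
    z_quantile is the standard normal quantile matching the upper limit
    (an assumption, e.g. 1.645 for the 0.95 quantile).
    """
    z_lo, z_hi = fisher_z(lower), fisher_z(upper)
    mu = (z_lo + z_hi) / 2.0                  # midpoint on the Z scale
    sd = (z_hi - z_lo) / (2.0 * z_quantile)   # SD implied by the two quantiles
    rng = random.Random(seed)
    # tanh is the inverse of the Fisher Z-transformation
    return [math.tanh(rng.gauss(mu, sd)) for _ in range(n_draws)]


# Example range for illustration: 0.3 to 0.8.
draws = sample_validity(0.3, 0.8, n_draws=10000)
share_inside = sum(0.3 <= d <= 0.8 for d in draws) / len(draws)
print(round(share_inside, 2))
```

Because the back-transformation uses tanh, every draw lies strictly between -1 and 1, so the generated validity coefficients are always admissible correlations. Each draw could then be combined with a posterior draw of the observed covariance matrix to propagate uncertainty into the attenuation-contamination matrix.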

Software implementation of the proposed method
We implemented the proposed method in R using the rjags, coda, MCMCpack and mvtnorm packages. To facilitate Bayesian estimation of the covariance matrix of the observed exposures ($\Sigma_W$), the rjags package was used to provide an interface from R to the JAGS library [24]. JAGS is a Gibbs sampler that uses MCMC to draw dependent samples from the posterior distribution of the parameters [25]. The Bayesian estimation of $\Sigma_W$ proceeded in the following steps: (1) defining a model for $\Sigma_W$ in the BUGS (Bayesian inference Using Gibbs Sampling) language in a stand-alone file, (2) reading the model file using the jags.model function, (3) updating the model using the update method for jags objects and (4) extracting the posterior samples of the model using the coda.samples function, which rjags provides to return samples in the format used by the coda package.
The MCMCregress function from the MCMCpack package was used to generate a posterior sample from the naive linear regression model [26]. MCMC convergence of all the model parameters was assessed using trace plots and autocorrelation function (ACF) plots from the coda package [27]. See Additional file 1: Appendix D for the convergence diagnostics results. For each model, the number of burn-in iterations was set to 2,000, and 10,000 MCMC iterations were run after the burn-in. Every sampled value was kept in the MCMC simulations by using a thinning interval of 1. When compiling a JAGS model, an initial sampling step may be needed during which the samplers adapt their behaviour to maximize their performance [28]. Therefore, the number of iterations for adaptation in the JAGS model was set to 500. The results were presented in terms of density plots, posterior means, medians and 95% CIs. We compared the results obtained under the method that ignores measurement error, the univariate method and the trivariate method. The R code used for the analysis is presented in Additional file 1: Appendix C.

External information on the validity coefficient and error correlations for the study data
External information on the validity coefficients and error correlations for fruit intake, vegetable intake and cigarette smoking was obtained from the literature. According to Kaaks et al. [1], the validity coefficient of self-reported fruit intake ranged from 0.33 to 0.79, while that of vegetable intake ranged from 0.30 to 0.60. A meta-analysis of the validity of questionnaires assessing fruit and vegetable consumption by Collese et al. [2] reported validity coefficients of 0.26 for vegetables and 0.49 for fruits. Other similar validation studies reported validity coefficients in the aforementioned ranges for fruits and vegetables [3,4,29]. Therefore, based on this information, we considered a range of 0.3 to 0.8 for fruits and a range of 0.25 to 0.7 for vegetables.
In the Scottish Heart Health Study of 2,849 men and 2,900 women [30], the correlation between the self-reported number of cigarettes and biochemical measures was reported to be between 0.67 and 0.72. In a study on the validation of self-reported smoking by analysis of hair for nicotine and cotinine [31], the validity coefficient between the number of cigarettes smoked per day and nicotine/cotinine levels in hair and plasma was found to be between 0.48 and 0.63, while the correlation between the number of cigarettes smoked and carboxy-haemoglobin was 0.70. In a follow-up study examining the relationships among self-reported cigarette consumption, exhaled carbon monoxide, and urinary cotinine/creatinine ratio in pregnant women [32], a validity coefficient in the range of 0.61 to 0.70 was reported. A study by Stram et al. [33] found the correlation between the self-reported number of cigarettes smoked and the true lung dose to be between 0.40 and 0.70, and this range was consistent with the findings from the validation studies discussed above. Based on this information, we considered a validity coefficient range of 0.40 to 0.70 for cigarette smoking.
Similar to Agogo et al. [13], we generated the correlation between errors from plausible ranges that were determined based on the correlation in the observed data and the most probable sign of the correlation among fruits, vegetables and cigarettes, as explained below:
a. Since the correlation coefficient between fruit and vegetable intake in the observed data was positive, we assumed the error correlation between fruit and vegetable intake to be mostly positive.
b. The correlation coefficient between cigarette smoking and fruit/vegetable intake in the observed data was negative. Based on this, and on the fact that persons who tend to overstate fruit and vegetable consumption are likely to understate the number of cigarettes smoked, we assumed the corresponding error correlations to be mostly negative.
We obtained the upper limits of the error correlations by assuming that the error covariance equals the covariance in the observed data, and we set the lower limit of the error correlation to zero, based on the assumption that the covariance in the observed data equals the covariance between the true intakes [13].
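The limit described above can be sketched numerically. The Python below is an illustration under stated assumptions, not the authors' R code: it assumes that each error standard deviation follows from the observed variance and the validity coefficient as $\sigma_{\varepsilon_{W_i}} = \sigma_{W_i}\sqrt{1 - \rho^2_{W_iX_i}}$, and it sets the error covariance equal to the observed covariance. Clamping the result to [-1, 1] is an added safeguard that the source does not mention. The function name and the numbers are hypothetical.

```python
import math


def error_corr_limit(cov_w_ij, var_w_i, var_w_j, rho_i, rho_j):
    """Error correlation implied when all observed covariance is due to errors.

    cov_w_ij       : observed covariance between exposures i and j
    var_w_i, var_w_j : observed variances
    rho_i, rho_j   : validity coefficients (assumed strictly below 1)
    """
    err_sd_i = math.sqrt(var_w_i * (1.0 - rho_i ** 2))
    err_sd_j = math.sqrt(var_w_j * (1.0 - rho_j ** 2))
    limit = cov_w_ij / (err_sd_i * err_sd_j)
    # Safeguard (an assumption of this sketch): keep the value a valid correlation.
    return max(-1.0, min(1.0, limit))


# Hypothetical values for illustration only.
print(error_corr_limit(cov_w_ij=6.0, var_w_i=100.0, var_w_j=64.0, rho_i=0.6, rho_j=0.8))
```

A negative observed covariance gives a negative limit, which matches the case where the error correlation is assumed to be mostly negative. The other end of the range is zero, which corresponds to attributing all observed covariance to the true intakes.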

Estimating the distribution of $\rho_{W_iX_i}$
Using the range of plausible values obtained from external validation information, we generated the validity coefficients using the Fisher Z-transformation method by assuming that the reported lower and upper limits are the 0.05 and 0.95 quantiles of the uncertainty distribution, respectively. The Fisher Z-transformation is a commonly used method to transform the sampling distribution of correlation coefficients so that it becomes approximately normal [34,35]. The procedure is as outlined below:
(i) Using the Fisher Z-transformation formula, $F_Z = \tfrac{1}{2}\ln\{(1 + r)/(1 - r)\}$, transform the lower ($r_l$) and upper ($r_u$) limits of the validity coefficient $\rho_{W_iX_i}$ to get the corresponding Fisher-Z transformed values $F_{Z_l}$ and $F_{Z_u}$, respectively.
(ii) Compute the mean $\mu_{Z_i}$ and the standard deviation $\sigma_{Z_i}$ of $F_{Z_i}$ as $\mu_{Z_i} = (F_{Z_l} + F_{Z_u})/2$ and $\sigma_{Z_i} = (F_{Z_u} - F_{Z_l})/(2 z_{0.95})$, where $z_{0.95} \approx 1.645$ is the 0.95 quantile of the standard normal distribution.
(iii) Generate $F_{Z_i} \sim N(\mu_{Z_i}, \sigma^2_{Z_i})$.
(iv) Using the inverse of the Fisher Z-transformation, back-transform the generated $F_{Z_i}$'s to validity coefficients as $\rho_{W_iX_i} = \{\exp(2F_{Z_i}) - 1\}/\{\exp(2F_{Z_i}) + 1\}$.

Sensitivity analysis
We investigated how varying the level of uncertainty assumed for the limits of the validity coefficients reported in the literature affected the estimates for fruit intake, vegetable intake and the average number of cigarettes smoked. We also investigated how the estimates varied with the magnitude of the correlation between errors in fruit and vegetable intake, fruit intake and cigarette smoking, and vegetable intake and cigarette smoking. This helps to assess the sensitivity of the estimates to the assumed width of the CI and to the magnitude of the correlation between errors when using the proposed method.

Results
Table 1 presents regression coefficient estimates for fruit intake (g/day), vegetable intake (g/day) and the average number of cigarettes smoked per day obtained using the naive method and the two bias adjustment methods (i.e. the univariate and trivariate methods). The regression coefficient estimate adjusted for bias using either the univariate or the trivariate method was greater in absolute value than that obtained using the naive method. Specifically, for fruit intake and the average number of cigarettes smoked, the bias-adjusted coefficient estimates were three times as large as the naive coefficient estimates. For vegetable intake, the bias-adjusted estimate was about four times as large as the naive regression coefficient estimate.
[Table 1]
For both fruit intake and the average number of cigarettes smoked, the univariate method gave slightly greater estimates, while the bias-adjusted value for vegetable intake was slightly lower in the univariate method. The variability of the regression coefficient estimate for the number of cigarettes smoked was higher than that for both fruit and vegetable intake. Again, the variability in either the univariate or the trivariate method was higher than in the naive method because of the uncertainty involved in adjusting for measurement error. Figures 1-3 show the kernel densities representing the distributions of the estimates adjusted for measurement error (solid curves) and the naive estimates (dotted curves) for fruit intake, vegetable intake and the number of cigarettes smoked, respectively. The solid vertical lines on the density plots depict the posterior means of the adjusted regression coefficients, while the dotted vertical lines show the posterior means of the naive regression coefficient estimates. The posterior means represented by the vertical lines show that the bias-adjusted regression coefficient estimates are generally higher (in absolute value) than their corresponding naive estimates.
[Figure 1] [Figure 2]
With the naive method, the variance of the regression coefficient for vegetable intake is more underestimated than that for fruit intake, as depicted by the shorter distance between the tails of the density plots. Of the three exposures considered in this study, the variance of the regression coefficient for the average number of cigarettes smoked is the most underestimated (see Table 1 and Figures 1-3). In general, a comparison of the variance of the regression coefficients in the naive and the proposed method shows that the naive method underestimates the variance of the regression coefficients.
[Figure 3]
Table 2 presents the mean (standard deviation), median and 95% CI for the estimates for fruit intake, vegetable intake and the average number of cigarettes smoked, adjusted for measurement error using the trivariate (proposed) method, when exploring the effect of the magnitude of uncertainty in the reported validity coefficients. The CI assumed for the distribution of the validity coefficient does not affect the mean and median estimates for fruit, vegetable and smoking. With the proposed method, the results further show that the uncertainty in the estimates is slightly affected by the level of uncertainty assumed for the validity coefficients.
[Table 2]
Tables 3 to 5 present the mean (standard deviation), median and 95% CI for the estimates for fruit intake, vegetable intake and the average number of cigarettes smoked, adjusted for measurement error using the trivariate method, in the sensitivity analysis that varies the magnitude of the error correlation between measurements of the exposures.
[Table 3] [Table 4]
The results show that varying the magnitude of the correlation between errors in any two exposures affects the estimates for all three exposures. For instance, increasing the magnitude of the positive correlation between errors in fruit and vegetable intake increases the mean and median estimates for both fruit and vegetable intake, while it decreases (in absolute value) the estimate for the average number of cigarettes smoked (Table 3). Increasing the magnitude of the negative correlation between errors in the measurements of fruit intake and cigarette smoking increases (in absolute value) both the mean and the median estimates for fruit intake and the average number of cigarettes smoked, while it decreases the estimate for vegetable intake (Table 4). Similarly, an increase in the magnitude of the negative correlation between errors in vegetable intake and cigarette smoking increases the estimates for both vegetable intake and cigarette smoking and decreases the estimate for fruit intake (Table 5).

Discussion and Conclusion
In this study, we proposed and illustrated a method that adjusts for measurement error in three exposures measured with correlated errors in the absence of internal validation data. The proposed method combines external validation data from the literature with the observed self-reported data to adjust for bias in the association between the exposures and the outcome. The advantages of the trivariate method proposed in this work include the following: (1) the method can be used to adjust for bias in the outcome-exposure association caused by measurement error in three exposures and can be extended to more than three exposures measured with correlated errors; (2) the method is useful in the absence of costly internal validation data, provided that the external information on the correlation between the observed and the true data, or on the error correlations of the observed data, is plausible within the study context; (3) it can be used in a sensitivity analysis on the effect of uncertainty in the reported validity coefficients; (4) it can be used for a sensitivity analysis on the magnitude and the direction of correlated errors; (5) the method can adjust for confounding in the outcome regression model; and (6) the method can be easily implemented in the free and readily available software R, as shown in Additional file 1: Appendix C. Often, fruit and vegetable intakes are considered as one food group. Our study is relevant because fruit intake and vegetable intake are assessed separately as distinct food groups and adjusted for correlated measurement errors. In the HBCT study example used for illustration, the estimates for fruit intake, vegetable intake and the average number of cigarettes smoked adjusted for bias using the trivariate method were very similar to the estimates adjusted for bias using the univariate method.
The slight differences between the bias-adjusted coefficient estimates in the univariate and trivariate methods could be attributed to the weak correlations between errors assumed in this study. The sensitivity analysis on the magnitude of the error correlation showed that the estimates obtained using the two methods will differ when stronger error correlations are assumed. Further, from the sensitivity analysis, we found that in a case where three exposures are measured with correlated errors, an increase in the magnitude of the error correlation between two exposures can increase their estimates and decrease the estimate for the other exposure. From the sensitivity analysis of the level of uncertainty, expressed through the CI assumed for the validity coefficients, we found that the estimates for the exposures were minimally influenced by the assumed CI. However, the CIs for the validity coefficients should be chosen with care, as studies have shown that uncertainty in the estimates may be affected by the level of uncertainty assigned to the validity coefficients [13]. From our results, we also noted that the presence of measurement error in the exposures can bias the association in either direction. These results support the finding by Agogo et al. [13] that it is difficult to predict the direction and magnitude of the bias in the association with the exposure(s) of interest when several exposures are measured with correlated errors. Notably, our results show that both fruit and vegetable intakes have a weak positive association with BMI. This is contrary to expectation. However, some types of vegetables and fruits have a high sugar content, which can be associated with a slight increase in BMI if they are consumed in excess [36,37].
This study has a few limitations: (1) for simplicity, we assumed that the exposures are measured without systematic bias, i.e., only with random errors. In practice, however, the exposures can be measured with systematic error. In such a case, the systematic error components can be incorporated in the measurement error model and also in estimating the attenuation-contamination matrix; (2) although a multiplicative measurement error structure is possible [38], our study assumed an additive measurement error structure. Exposures measured with multiplicative error can be handled using our method by first converting the multiplicative structure to an additive structure through a suitable transformation that linearizes the error structure; and (3) our study focused on a subset of current daily smokers that is not representative of the HBCT cohort and, therefore, the results are not generalizable.
From the findings of this study, we conclude that the proposed method can be used to adjust for bias in the outcome-exposure association in a case where three or more exposures are measured with correlated errors. This is possible even in the absence of internal validation data, provided that there is prior information about the validity of the data collection instruments and the magnitude of the measurement error correlation between the exposures. The method is useful in conducting a sensitivity analysis on the magnitude of measurement error and the sign of the error correlation.

Data availability
Data used in this study are made available to researchers upon registration and agreement to the terms and conditions of use on the HSRC web site at http://curation.hsrc.ac.za/Dataset-565-datafiles.phtml.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Authors' contributions
AKM, HM, GOA and ON conceived the idea. AKM contributed to developing the method, wrote the R code, analysed the data and wrote the draft manuscript. GOA, HM and ON helped in developing the method and writing the paper.