A generalisation of the method of regression calibration and comparison with the Bayesian 2-dimensional Monte Carlo method

For many cancer sites it is necessary to assess risks from low-dose exposures via extrapolation from groups exposed at moderate and high levels of dose. Measurement error can substantially alter the shape of this relationship and hence the derived population risk estimates. Even in studies with direct measurement of low-dose exposures, measurement error could be substantial in relation to the size of the dose estimates and thereby distort population risk estimates. Recently, much attention has been devoted to the issue of shared errors, which are common in many datasets and particularly important in occupational settings. In this paper we test a fairly recently proposed Bayesian model averaging method, the so-called Bayesian two-dimensional Monte Carlo (2DMC) method, against a very recently proposed modification of the regression calibration method, which is particularly suited to studies in which there is a substantial amount of shared error, and in which there may also be curvature in the true dose response. We also compare both methods against standard regression calibration and Monte Carlo maximum likelihood. The Bayesian 2DMC method performs poorly, with coverage probabilities for both the linear and quadratic dose coefficients that are under 5%, particularly when the magnitudes of classical and Berkson error are both moderate to large (20%-50%). The method also produces substantially biased (by a factor of 10) estimates of both the linear and quadratic coefficients, with the linear coefficient overestimated and the quadratic coefficient underestimated. By comparison the extended regression calibration method yields coverage probabilities that are too low when shared and unshared Berkson errors are both large (50%), although otherwise it performs well, and coverage is generally better than for the Bayesian 2DMC method and all other methods.
The bias of the predicted relative risk at a variety of doses is generally smallest for extended regression calibration, and largest for the Bayesian 2DMC method (apart from unadjusted regression), with standard regression calibration and Monte Carlo maximum likelihood exhibiting bias in predicted relative risk generally somewhat intermediate between the other two methods.


Introduction
Moderate and high doses of ionising radiation are well established causes of most types of cancer 1,2. There is emerging evidence, particularly for leukaemia and thyroid cancer, of risk at low doses (<100 mGy) of low-linear energy transfer (LET) radiation [3][4][5][6]. For most other cancer endpoints it is necessary to assess risks via extrapolation from groups exposed at moderate and high levels of dose. A number of recent reviews of low-dose risk have been conducted [7][8][9][10][11][12][13]. Crucial to the resolution of this area of uncertainty is the modelling of the dose-response relationship and the importance of both systematic and random dosimetric errors for analyses of the dose response, in particular in the Japanese atomic bomb survivors, which are central to evaluations of population risks by a number of committees assessing radiation risk 1,14. The problem of allowing for measurement error in dose when estimating dose-response relationships has been the subject of much interest in epidemiology [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30], and it is known that measurement error can substantially alter the shape of this relationship and hence the derived population risk estimates 31. Dose measurement errors can arise in a number of different ways. Berkson errors commonly result when the true doses D_i are randomly distributed around a measured value d_i, for example a dial setting on a radiotherapy (RT) machine, with error U_i, so that D_i = d_i + U_i, implying that d_i and U_i are independent. Alternatively, when the measured "doses" d_i are distributed at random around the true doses D_i, so that d_i = D_i + U_i with D_i and U_i independent, the "classical" error model results. A crucial difference between these error models is that in the Berkson model the nominal dose and the error are independent, whereas in the classical error model it is the true dose and the error that are independent. In the atomic bomb survivors, radiation doses are estimated using estimates of the position of the survivors in each city, their orientation with respect to the bomb, and other shielding structures, e.g., buildings. In this case the estimated doses d_i are thought to be lognormally distributed around the true doses D_i (i.e., classical error model) 32. This assumption underlies many of the attempts that have been made to model dose error in the Japanese atomic bomb survivor Life Span Study (LSS) data [15][16][17][18][19][21][22][23]29.
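The distinction between the two error models can be illustrated with a small simulation; the distributions and error sizes below are arbitrary choices for illustration, not those of any actual dosimetry system:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Berkson: true dose D = d + U scatters around the nominal dose d,
# so the nominal dose d and the error U are independent.
d_nominal = rng.uniform(0.5, 2.0, n)       # e.g. radiotherapy dial settings
U_berkson = rng.normal(0.0, 0.2, n)
D_true_b = d_nominal + U_berkson

# Classical: measured dose d = D + U scatters around the true dose D,
# so the true dose D and the error U are independent.
D_true_c = rng.uniform(0.5, 2.0, n)
U_classical = rng.normal(0.0, 0.2, n)
d_measured = D_true_c + U_classical

# The independence structure shows up in the (near-zero) correlations...
print(np.corrcoef(d_nominal, U_berkson)[0, 1])    # ~0 by construction
print(np.corrcoef(D_true_c, U_classical)[0, 1])   # ~0 by construction
# ...and in which variable carries the extra variance:
print(D_true_b.var() > d_nominal.var())    # Berkson: truth noisier than measurement
print(d_measured.var() > D_true_c.var())   # classical: measurement noisier than truth
```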
Regression calibration has been frequently used to correct for the effects of classical error, and entails replacing terms for true dose, D_i, in regression models by the conditional expectation of true dose given the observed dose 31. Regression calibration works well when dose errors are not substantial and when there is no curvature in the dose response 31. However, substantial bias can result when regression calibration is applied in settings where dose errors are substantial, even when errors are non-differential 31,33,34.
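Under a simple normal measurement model (an illustrative assumption; the LSS errors discussed above are lognormal) the regression calibration substitution E[D|d], and the attenuation it corrects, can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
mu, s2, t2 = 1.0, 0.25, 0.25           # mean/variance of true dose, error variance (assumed)

D = rng.normal(mu, np.sqrt(s2), n)     # true doses
d = D + rng.normal(0, np.sqrt(t2), n)  # classical error: d = D + U
y = 0.5 * D + rng.normal(0, 0.1, n)    # linear outcome, true slope 0.5

# Naive regression on d attenuates the slope by s2/(s2+t2) = 0.5 here
slope_naive = np.polyfit(d, y, 1)[0]

# Regression calibration: replace d by E[D | d] (normal-normal shrinkage)
d_calib = mu + s2 / (s2 + t2) * (d - mu)
slope_rc = np.polyfit(d_calib, y, 1)[0]

print(round(slope_naive, 2))  # ≈ 0.25 (attenuated)
print(round(slope_rc, 2))     # ≈ 0.50 (recovered)
```

With curvature in the true dose response, or errors as large as some scenarios below, this simple substitution is exactly where the method can break down.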
A modification of the regression calibration method has very recently been proposed which is particularly suited to studies in which there is a substantial amount of shared error, and in which there may also be curvature in the true dose response 35. This so-called extended regression calibration (ERC) method can be used in settings where there is a mixture of Berkson and classical error 35. In fits to synthetic datasets in which there is substantial upward curvature in the true dose response, and varying (and sometimes substantial) amounts of classical and Berkson error, the ERC method generally outperformed standard regression calibration, Monte Carlo maximum likelihood (MCML) and unadjusted regression, particularly with respect to the coverage probabilities of the quadratic coefficient, and especially for larger magnitudes of Berkson error, whether shared or unshared 35.
A Bayesian model averaging (BMA) method has recently been proposed, the so-called 2-dimensional Monte Carlo with Bayesian model averaging (2DMC with BMA) method 28, which has been used in fits to radiation thyroid nodule data 36. In the present paper we assess the performance of the 2DMC with BMA method against other methods of correction for dose error using simulated data. The simulated data we use are exactly as in the previous report 35.

Synthetic data used for assessing corrections for dose error
The methods used closely parallel those of the previous paper 35. We used the publicly available version of the leukaemia and lymphoma data of Hsu et al 37 to guide construction of a synthetic dataset, which we outline in Table 1. Specifically, we used the person-year distribution by bone marrow dose groups 0-0.07, 0.08-0.19, 0.20-0.99, 1.00-2.49 and ≥2.50 Gy. The central estimates of dose we assumed are close to the person-year-weighted means of these groups, as given in Table 1, although for the uppermost dose group we assigned a central estimate of 2 Gy.
the scaling constant being chosen for each simulation (but not for the Bayesian model fits) to make these sum to 1. We assumed coefficient values derived from fits of a similar model to the 237 leukaemias in the data of Hsu et al 37. A total of n = 1000 samples were taken of each type of dose, as given by expressions (1) and (2). A total of m = 500 simulations of these dose+cancer ensembles were used to fit models and evaluate fitted model means and coverage probabilities. Having derived synthetic individual-level data, for the purposes of model fitting, for all models except MCML and 2DMC with BMA, the data were then collapsed (summing cases, averaging doses) into the 5 dose groups given in Table 1. Poisson linear relative risk generalised linear models 38 were fitted to this grouped data, with rates given by expression (3), using as offsets the numbers per group in Table 1. Models were fitted using five separate methods: (1) unadjusted, using only the mean surrogate doses per group given by group means of the samples generated by expression (2), using a single sampled dose per individual for each of the m = 500 dose+cancer ensembles; (2) regression calibration adjusted, using the mean true doses per group given by group means of the samples generated by expression (1); (3) ERC adjusted, using the mean true doses per group given by group means of the samples generated by expression (1), averaged over the n = 1000 dose samples, for each of the m = 500 dose+cancer ensembles, and with additional adjustments to the likelihood outlined in Appendix A; (4) MCML, using the full set of mean true doses per group, the mean doses per group for each simulation being given by group means of the samples generated by expression (1); and (5) 2DMC with BMA, using the full set of mean doses as for MCML and fitted using Bayesian Markov chain Monte Carlo (MCMC) methods.
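As an illustrative sketch of this kind of simulation, the following code (with invented group sizes, dose-group means and coefficient values, and a single dose realisation rather than the full set of dose samples) generates shared and unshared lognormal Berkson errors plus classical error, collapses the data into groups, and fits a Poisson linear-quadratic relative risk model by maximum likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Hypothetical group structure loosely modelled on Table 1 (all values assumed)
group_dose = np.array([0.03, 0.12, 0.5, 1.6, 2.0])   # central dose estimates (Gy)
group_n    = np.array([2000, 1500, 1000, 400, 100])  # persons per group

# Individual true doses: nominal dose perturbed by shared and unshared
# lognormal Berkson errors (GSD ~ 20% here)
d_nom = np.repeat(group_dose, group_n)
shared = rng.normal(0, 0.2)                          # one draw shared by everyone
D_true = d_nom * np.exp(shared + rng.normal(0, 0.2, d_nom.size))

# Surrogate doses add classical lognormal error on top of the true doses
z_surr = D_true * np.exp(rng.normal(0, 0.2, d_nom.size))

# Simulate cases from a linear-quadratic relative risk model (coefficients assumed)
alpha, beta1, beta2 = 0.005, 0.3, 0.2
lam = alpha * (1 + beta1 * D_true + beta2 * D_true**2)
cases = rng.poisson(lam)

# Collapse to the 5 groups (sum cases, average doses), then fit the Poisson
# linear relative risk model to the grouped data, group sizes as offsets
edges = np.cumsum(group_n)[:-1]
g_cases = np.array([c.sum() for c in np.split(cases, edges)])
g_dose  = np.array([d.mean() for d in np.split(D_true, edges)])

def negll(p):
    a, b1, b2 = p
    mu = group_n * a * (1 + b1 * g_dose + b2 * g_dose**2)
    if np.any(mu <= 0):
        return np.inf
    return float(np.sum(mu - g_cases * np.log(mu)))

fit = minimize(negll, x0=[0.004, 0.1, 0.1], method="Nelder-Mead")
print(fit.x)   # estimates of (alpha, beta1, beta2)
```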
For the Bayesian model (5), associated with the dose vector is a vector of probabilities p_j, j = 1,...,1000, which is generated using auxiliary variables indexed by j = 1,...,999. All main model parameters had normal priors, with mean 0 and standard deviation (SD) 1000. The Metropolis-Hastings algorithm was used to generate samples from the posterior distribution. Random normal proposal distributions were assumed for all variables, with SDs of 0.2, 1 and 2 for the various parameter blocks, the auxiliary variables being proposed in blocks of 10. Two separate chains were used, in order to compute the Brooks-Gelman-Rubin (BGR) convergence statistic 39,40. The first 1000 simulations were discarded, and a further 1000 simulations taken for sampling. The proposal SDs and number of burn-in samples were chosen to give mean BGR statistics (over the 500 simulated datasets) that were in all cases less than 1.03, and acceptance probabilities of about 30% for the main model parameters, suggesting good mixing and likely chain convergence. For models (1)-(4) confidence intervals were derived using the profile likelihood 38. The relative risk was evaluated for two values of predicted dose, 0.1 Gy and 1 Gy. The datasets are temporarily stored in computer memory, and the program uses them for fitting the Poisson models described in the Methods section.
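A minimal version of this sampling scheme, with a stand-in one-parameter posterior instead of the full model, shows the random-walk Metropolis update and the BGR (potential scale reduction) computation for two chains:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_post(theta):
    # Stand-in log-posterior: a N(0, 1) target plus a diffuse N(0, 1000^2)
    # prior term, mirroring the diffuse normal priors described in the text
    return -0.5 * theta**2 - 0.5 * (theta / 1000.0)**2

def mh_chain(n_iter, proposal_sd, start):
    theta, out = start, np.empty(n_iter)
    lp = log_post(theta)
    for i in range(n_iter):
        prop = theta + rng.normal(0, proposal_sd)  # random-walk normal proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis acceptance step
            theta, lp = prop, lp_prop
        out[i] = theta
    return out

# Two chains, 1000 burn-in iterations discarded + 1000 retained, as in the text
chains = np.array([mh_chain(2000, 1.0, start)[1000:] for start in (-5.0, 5.0)])

# Brooks-Gelman-Rubin (potential scale reduction) statistic
m, n = chains.shape
W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
R_hat = np.sqrt(((n - 1) / n * W + B / n) / W)
print(round(R_hat, 3))   # values near 1 (e.g. < 1.03) suggest convergence
```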

Results
As shown in Table 2, the coverage probabilities of all methods apart from 2DMC with BMA for the linear coefficient are near the desired 95% level, irrespective of the magnitudes of assumed Berkson and classical error, whether shared or unshared. As would be expected, for all methods save the uncorrected regression the magnitude of classical error makes no difference, since all regression correction methods use doses based on the true (rather than surrogate) dose as given by expression (1), without the classical error component. However, the coverage probabilities for the quadratic coefficient are generally too low for the unadjusted and regression calibration methods, particularly for larger magnitudes of Berkson error (with GSD = 50%), whether shared or unshared (Table 2). The ERC method also yields coverage probabilities that are somewhat too low when shared and unshared Berkson errors are both large (with GSD = 50%), although otherwise it performs well, and coverage is uniformly better than for all other methods (Table 2). In contrast, MCML yields coverage probabilities for the quadratic coefficient that are uniformly too high (Table 2).
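For readers unfamiliar with the metric, coverage probability is scored as follows (a toy example with ordinary least squares standing in for the Poisson models used here):

```python
import numpy as np

rng = np.random.default_rng(11)

# Over repeated simulations, count how often the 95% confidence interval
# for a coefficient contains the true value used to generate the data.
true_beta, n_sim, n_obs = 0.5, 500, 400
covered = 0
for _ in range(n_sim):
    x = rng.uniform(0, 2, n_obs)
    y = true_beta * x + rng.normal(0, 1, n_obs)
    # OLS slope and its standard error
    xc = x - x.mean()
    b = (xc * y).sum() / (xc**2).sum()
    resid = y - y.mean() - b * xc
    se = np.sqrt((resid**2).sum() / (n_obs - 2) / (xc**2).sum())
    lo, hi = b - 1.96 * se, b + 1.96 * se
    covered += (lo <= true_beta <= hi)
print(covered / n_sim)   # close to 0.95 for a well-calibrated method
```

A method whose intervals are too narrow, or whose point estimates are biased, will score well below the nominal 95%, which is what Table 2 shows for 2DMC with BMA.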
By contrast, the coverage probabilities of both the linear coefficient and the quadratic coefficient for the 2DMC with BMA method are generally much too low, and when shared Berkson error is large (50%) the coverage probabilities do not exceed 5%.
Table 3 shows the coefficient mean values, averaged over all 500 simulations. A notable feature is that for larger values of Berkson error the linear coefficient for 2DMC with BMA is substantially overestimated, and the quadratic dose coefficient substantially underestimated, both by factors of about 10. For all other methods apart from ERC the estimates of the quadratic coefficient are upwardly biased, but not by such large amounts. There is upward bias in estimates of both the linear and quadratic coefficients in the unadjusted analysis (using surrogate dose) even when there are no Berkson errors, for various magnitudes of classical error, as shown by the first four rows of Table 3. As above, and as would be expected, for all models apart from the uncorrected one the magnitude of classical error makes no difference.
Table 4 shows that the bias in RR for ERC, evaluated either at 0.1 Gy or 1 Gy, does not exceed 20%. Regression calibration performs somewhat worse, with bias of about 40% when shared and unshared Berkson errors are large (50%) for predictions at 1 Gy, although otherwise under 25%. For all but the smallest shared and unshared Berkson errors (both 0% or 20%) MCML has bias of about 25-45% at 1 Gy, although under 3% otherwise. 2DMC with BMA performs somewhat worse, with bias of about 30-45% when Berkson errors are large, whatever the evaluated dose. Unadjusted regression yields the most severe bias, which exceeds 50% in many cases for predictions at 1 Gy, and when shared or unshared classical errors are large (50%) often exceeds 100% (Table 4).
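The percentage bias figures in Table 4 are simple functions of the fitted and true coefficients; a small illustration (with invented coefficient values chosen to mimic the overestimated-linear/underestimated-quadratic pattern seen for 2DMC with BMA) shows why bias can differ markedly between 0.1 Gy and 1 Gy:

```python
# Percentage bias of the predicted relative risk at a given dose, from true
# and estimated linear-quadratic coefficients (illustrative numbers assumed)
def rr(dose, b1, b2):
    return 1.0 + b1 * dose + b2 * dose**2

def pct_bias(dose, b1_hat, b2_hat, b1_true, b2_true):
    return 100.0 * (rr(dose, b1_hat, b2_hat) / rr(dose, b1_true, b2_true) - 1.0)

# A method that overestimates the linear and underestimates the quadratic
# coefficient can be nearly unbiased at 0.1 Gy yet badly biased at 1 Gy
b1_true, b2_true = 0.2, 0.5
b1_hat, b2_hat = 0.24, 0.1
print(round(pct_bias(0.1, b1_hat, b2_hat, b1_true, b2_true), 1))  # → 0.0
print(round(pct_bias(1.0, b1_hat, b2_hat, b1_true, b2_true), 1))  # → -21.2
```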

Discussion
We have demonstrated that the 2DMC with BMA method performs poorly, with coverage probabilities for both the linear and quadratic dose coefficients that are under 5%, particularly when the magnitudes of classical and Berkson error are both moderate to large (20%-50%) (Table 2). This method also produces substantially biased (by a factor of 10) estimates of both the linear and quadratic coefficients, with the linear coefficient overestimated and the quadratic coefficient underestimated (Table 3). By comparison the ERC method yields coverage probabilities that are too low when shared and unshared Berkson errors are both large (50%), although otherwise it performs well, and coverage is generally better than for Bayesian 2DMC with BMA and all other methods (regression calibration, MCML, unadjusted regression). The upward bias in estimates of the linear coefficient and the downward bias in estimates of the quadratic coefficient, at least for larger magnitudes of error (Table 3), largely explain the poor coverage in these cases (Table 2). The bias of the predicted RR at a variety of doses is generally smallest for ERC, and largest (apart from unadjusted regression) for 2DMC with BMA, with standard regression calibration and MCML exhibiting bias in predicted RR generally somewhat intermediate between the other two methods (Table 4).
The form of the 2DMC with BMA model that we fit differs slightly from that employed by Kwon et al 28, which uses an approximate Monte Carlo sampler, the so-called stochastic approximation Monte Carlo (SAMC) method of Liang et al 41. Unfortunately Kwon et al 28 do not provide enough information to infer the precise form of SAMC that they used. It is possible that the SAMC implementation used by Kwon et al 28 may behave differently from the more standard implementation of BMA given here. Kwon et al 28 report results of a simulation study that tested the 2DMC with BMA method against what they term "conventional regression", which may have been regression calibration. They did not assess performance against MCML.
Kwon et al 28 report generally better performance of 2DMC with BMA against the regression calibration alternative.
The 2DMC with BMA method relies on Monte Carlo simulations from a Monte Carlo dosimetry system (MCDS). The key aspect is that ensembles of doses (D_ijk), k = 1,...,n_j, j = 1,...,N, are produced for all N individuals for a large number of scenarios i, 1 ≤ i ≤ M. However, unlike other uses of MCDS, it is assumed that only one of the dose scenarios i, and therefore one of the sets of dose realisations (D_ijk), is the correct one. Essentially this method therefore assumes something like a combination of functional and structural approaches: there are assumed to be random errors in the data, but certain parameters are assumed fixed (but unknown). The BMA approach is used to reweight the scenarios depending on the goodness of fit 28. So realisations in which the risk-dose relationship was linear would be much more highly weighted than realisations in which this was not the case. The contrast with MCML is quite pronounced: MCML works by averaging the likelihood in one go and then maximising the averaged likelihood with respect to the parameters of interest.
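The difference between reweighting scenarios by goodness of fit and averaging the likelihood before maximising can be caricatured in a toy Poisson setting (all models, numbers and the simplified likelihood-based weights below are illustrative assumptions, not the implementation of Kwon et al 28):

```python
import numpy as np

rng = np.random.default_rng(5)

# K candidate dose realisations; Poisson cases with rate 0.01 * (1 + beta * D)
n, K, beta_true = 500, 50, 1.0
D_true = rng.uniform(0, 2, n)
cases = rng.poisson(0.01 * (1 + beta_true * D_true))
D_real = D_true[None, :] * np.exp(rng.normal(0, 0.3, (K, n)))  # dose realisations

def loglik(beta, D):
    mu = 0.01 * (1 + beta * D)
    return np.sum(cases * np.log(mu) - mu)

betas = np.linspace(0.01, 5, 400)
ll_k = np.array([[loglik(b, D) for b in betas] for D in D_real])  # (K, 400)

# BMA-style: fit each realisation separately, then weight the per-realisation
# estimates by relative goodness of fit (normalised maximised likelihood)
best = ll_k.max(axis=1)
w = np.exp(best - best.max())
w /= w.sum()
beta_bma = float(np.sum(w * betas[ll_k.argmax(axis=1)]))

# MCML-style: average the likelihood over realisations first (log-mean-exp),
# then maximise the averaged likelihood once
ll_avg = [np.log(np.mean(np.exp(ll_k[:, j] - ll_k[:, j].max()))) + ll_k[:, j].max()
          for j in range(betas.size)]
beta_mcml = float(betas[int(np.argmax(ll_avg))])

print(beta_bma, beta_mcml)
```

The two strategies coincide only in special cases; when realisations that happen to fit well are systematically unrepresentative, the reweighting step can bias the averaged estimate, consistent with the behaviour reported in the Results.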
The 2DMC with BMA method appears designed for applications where there is a substantial amount of shared error. This method has been applied to the analysis of thyroid nodules in a dataset of persons exposed to atmospheric weapons tests in the former Soviet Union 36. The method has been much discussed 42. Stram et al 42 suggested that the method will produce substantially upwardly biased estimates of risk, and that the coverage may be poor, somewhat confirming our own findings; although we do not always find upward bias, the coverage of both regression coefficients is uniformly poor (Tables 2, 3). The implementation of the methodology presently relies on proprietary software, and has only been used by the group that developed it. Another substantial problem with the method is the use of BMA, reflecting general criticism made of this class of models in the literature 43,44. An implicit assumption of BMA is that one of the underlying models is the "true" one, with convergence guaranteed to the "true" model 45.
As previously discussed 35, the defects in the standard type of regression calibration that our modelling has revealed are not too surprising, as it is well known that this method can break down when dose error is substantial 31, as it is in many of our scenarios. Nevertheless the method has the considerable advantage that, once the conditional expectations have been derived, conventional statistical software can be used to perform the regressions. The method has been successfully applied to the LSS cohort by a number of investigators [15][16][17][18][19]46 and has also been used in a few other radiation-exposed groups 25. There have also been extensive applications in the non-radiation literature, reviewed by Carroll et al 31 and more recently in a series of papers by Shaw et al 33,34. Calibration approaches that take account of mixtures of Berkson and classical error have also been developed and used to fit domestic radon case-control data 20. More surprising is the relative weakness of MCML, which is discussed at greater length elsewhere 35.
A Bayesian approach to the measurement error problem has been developed over the last 30 years which rests on the formulation of conditional independence relationships between different model components 47,48, following the general structure outlined by Clayton 49. These approaches are quite distinct from the 2DMC with BMA method. In the canonical Bayesian approach three basic submodels are distinguished and linked: the disease model, the measurement model and the exposure model. The power of this Bayesian approach, as with MCML, is that the dosimetric uncertainty is (in principle) reflected in the variability of the model parameters relating dose to health effects.
An adapted Bayesian method of correction for measurement error has been fitted to various versions of the LSS mortality data 21,23,29, and also to an older version of the LSS incidence data 22. Derivation of the posterior distribution is generally analytically infeasible, and relies on the MCMC algorithm, specifically the Metropolis sampler, which converges to the posterior distribution of the risk parameters. However, the speed of convergence is not known, and in practice one relies on a number of ad hoc tests of convergence, such as the Brooks-Gelman-Rubin statistic 39,40 and other less formal methods, e.g., inspection of caterpillar plots. As such, all one can know is when convergence has not taken place. Flexible and efficient software exists to run this on a number of platforms, e.g., OpenBUGS 50 or rjags 51. The method is exceptionally computationally burdensome. As with all Bayesian methods, in particular the 2DMC with BMA method used here, the choice of prior is critical.


with the geometric standard deviations (GSDs) of the errors taking values of 0.2 (20%) or 0.5 (50%). This individual dose data was then used to simulate the distribution of N = 250 cancers for each of the n = 1000 simulated datasets, indexed by j, using a linear-quadratic model for the assumed probability of being a case for individual i, with coefficient values derived from fits of a similar model to the 237 leukaemias in the data of Hsu et al 37.

Table 2. Coverage probability of profile likelihood confidence intervals for fits of the linear-quadratic model.
Coverage probability evaluated using m=500 dose+cancer simulations. All but the rightmost two columns are taken from the paper of Little et al 35.

Table 3. Mean over m=500 dose+cancer simulations of regression coefficients in fits of the linear-quadratic model.
All but the rightmost two columns are taken from the paper of Little et al 35. Notes: GSD, geometric standard deviation. ERR, excess relative risk.

Table 4. Percentage bias in relative risk predicted at 0.1 Gy or 1 Gy with various sorts of dose error correction, using mean regression coefficients over m=500 dose+cancer simulations (as in Table 3).
Notes: GSD, geometric standard deviation.