The data on contaminant concentrations in fish tissue show that PBTs generally covary positively across all types of waterbodies, with the exception of mercury against other contaminants in both lakes and rivers, and total PFCs and heptachlor against other contaminants in rivers. In coastal waters, these particular contaminants were not measured concurrently with others. The positive covariance among most contaminants implies that, lacking any more detailed information on fish characteristics or environmental conditions, one should assume that if one contaminant is elevated at a site, most others will be as well. However, in lakes and rivers, mercury is low when other contaminants are high in fish tissue. Gandhi et al. (2017) show empirically that the covariance patterns present in fish tissue data from the Great Lakes imply that the current practice of basing fish consumption advisories on the single most restrictive contaminant is not adequately protective. Gandhi et al. note, however, that both the magnitude and sign of the covariance among contaminants can differ according to location and fish species. They thus call for a more generalizable statistical approach for performing chemical mixture assessments than the simulation-based method they employ. In particular, they call not for multiple single-contaminant models, but rather for single multi-contaminant models such as that employed in the current study.
The performance of our multivariate modelling approach differed across the three data sets. In lakes, the joint distribution of multiple contaminants was captured reasonably well, with R2 values as high as 0.72 and never below 0.29, and similar proportions of covariance among contaminants explained by the model. This is a level of explanatory power that is similar to previously published, single-contaminant models [5,30, 35]. The model performed less well in rivers, with captured proportions of variances and covariances ranging from 0.16 to 0.59. This may be because riverine fish may occupy a variety of locations over the course of their lifetimes and therefore land use variables are not predictive fish tissue contaminant concentrations. The model of coastal marine systems was able to account for only a small proportion of the variance and covariance among fish tissue contaminant concentrations (R2 values ranging from 0.04 to 0.66, with most in the lower half of this range). This may be because of the wider variety of tissue types, species, and corresponding life cycles of fish included in this data set. Also, as for rivers, fish living in coastal areas are highly mobile and may not spend the majority of their time near the land used to calculate land use percentages. In general across all waterbodies, the contaminants best fit by the models were mercury, total PCBs, DDT and its metabolites, the nonachlors, and oxychlordanes and their covariances.
The majority of the (co-)variation accounted for by the models could be attributed to fish characteristics, such as tissue type, lipid content, length, trophic level, and species. However, watershed land uses and waterbody characteristics were also significant predictors – again in line with the results of single-contaminant modeling [5, 30, 36]. In the model of the EMAP lake and river data, for example, lakes with watersheds having more urban land showed consistently higher fish tissue contaminant concentrations (except for mercury). For coastal fish, agricultural and forested land were significant predictors of contaminant concentrations, although the sign of these relationships varied by contaminant.
Water chemistry has been shown to be important in the bioaccumulation of Hg but is less well explored for organic contaminants [30, 37–39]. We found that fish tissue mercury concentrations decreased with greater alkalinity in lakes and pH in rivers. This is consistent with the finding that acidity is associated with enhanced mercury methylation and uptake in lower alkalinity and pH systems [38]. Concentrations of contaminants other than mercury were positively associated with pH levels in rivers. While metal contaminants are more bioavailable at lower pH, there is little research on the relationship of pH with organic contaminant bioavailability. Mercury, PFCs, and some other contaminants were found to be positively associated with DOC in riverine fish from the NRSA dataset. Like pH, the influence of DOC on metal transport and bioavailability are known but not commonly studied for organic contaminants. Mercury concentrations in water are positively related to DOC, but DOC appears to reduce the bioaccumulation of mercury particularly at high DOC ranges [40, 41]. In the few studies that exist for organic contaminants, the influence of DOC does not show consistent patterns [42–45].
Our intent with this study was to demonstrate a method of modeling multiple contaminants simultaneously, using a Bayesian approach that can account for both missing and censored values (i.e. below detection limit). With such a model, one can assess how much of the covariance present among fish tissue contaminants can be accounted for by similar responses to modeled factors, and how much remains unaccounted for. Because we used a simple linear approach to model many contaminants, in many species, across many aquatic environments, each with particular fish characteristics, water chemistry attributes, and land use variables, our model results may not be as precise or mechanistically interpretable as more focused analyses. Yet, the R2 values we were able to obtain are comparable to those of previously-published statistical models of single contaminants.
A more novel contribution of this study is our demonstration that, while a multivariate model can explain some of the covariance inherently present among contaminants, the proportion of covariance that remains unexplained is at least as large as the proportion of variance left unexplained for individual contaminants. This means that the application of multiple single-contaminant models – even those that fully account for predictive uncertainty – misestimate true human health risk. This is because unexplained covariance implies that, even after all modeled factors are taken into account, the likelihood of unusually high (or low) levels of one contaminant at a site are not independent of the likelihood of unusually high (or low) levels of another contaminant. The fact that most contaminants (except mercury and heptachlor in rivers) display unexplained variation with positive covariance means that unexpectedly high levels of multiple contaminants are likely to occur simultaneously, exacerbating human health risk.
It is important to emphasize that this underestimation of health risk that results from ignorance of positive error covariance among contaminants is a systematic bias and not just an overlooked contribution to predictive uncertainty. This is clear from the fact that the variance of the sum of two correlated random variables X1 and X2, for example, is computed as:
$$\begin{array}{c}{\sigma }_{{\text{X}}_{1}+{\text{X}}_{2}}^{2}={\sigma }_{{\text{X}}_{1}}^{2}+{\sigma }_{{\text{X}}_{2}}^{2}+2\bullet Cov\left({\text{X}}_{1}, {\text{X}}_{2}\right)={\sigma }_{{\text{X}}_{1}}^{2}+{\sigma }_{{\text{X}}_{2}}^{2}+2\bullet \rho \bullet {\sigma }_{{\text{X}}_{1}}\bullet {\sigma }_{{\text{X}}_{2}}\left(2\right)\end{array}$$
where σ2 represents variance, Cov( ) represents covariance, and ρ represents correlation between X1 and X2. Thus, with positive covariance/correlation between two variables, the variance of the sum is always greater than the sum of the variances. Thus, the probability of high concentrations will be larger than would be calculated by assuming independence between the variables. This result can easily be extended to more than two random variables and to any linear combination of variables, making it applicable to the analysis of contaminants assumed to have additive health effects [16].
Models can be useful tools for predicting fish tissue contaminant concentrations at unsampled sites or under new conditions as a function of watershed attributes, water quality measures, and fish characteristics. However, model-based estimates of human dietary exposure for the purposes of setting fish consumption advisories need to account for the full distribution of possible contaminant concentrations, not just single point estimates. Several studies have demonstrated the value of a Bayesian approach in this context, as these methods allow for the characterization of both parametric and structural error [35, 46, 47]. This approach has long been advocated in the academic literature [48, 49] and is starting to make its way into fish consumption advisory decisions, at least for individual contaminants [50]. The present study is the logical extension of that approach to multiple contaminants, acknowledging that, if one contaminant happens to be unexpectedly elevated at a site, others are likely to be as well.
While our study demonstrates a moderate ability to predict contaminant concentrations with a multivariate model, much of the covariance among contaminants remains in the residuals. This indicates that the presence and composition of mixtures are relatively unpredictable, even knowing watershed attributes, water quality measures, and fish characteristics. Further analyses might reveal the importance of including additional predictor variables or formulating nonlinear models, and thus more precisely characterize the patterns of suites of contaminants. In the meantime, uncertainty in both single contaminants and their co-occurrence needs to be reflected in fish consumption advisories and in more holistic assessments of risk-risk and risk-benefit tradeoffs [51, 52].