Characterizing Environmental Inequalities Using Integrated Exposure Assessment and Spatial Approach


Background
At a regional or continental scale, the characterization of environmental health inequalities (EHI) expresses the idea that populations are not equal in the face of pollution. It requires analyses that identify and manage areas at risk of overexposure, where an increased risk to human health is suspected. Developing such methods is a prerequisite for implementing public health actions aimed at protecting populations.
Methods
This paper presents the methodological framework developed by INERIS (French National Institute for Industrial Environment and Risks) to provide a common framework for conceptualizing and operationalizing environmental exposures, an important step towards articulating a science of EHI. We develop an integrated exposure assessment approach capable of integrating multiple exposure pathways from various sources, through a series of models, up to internal exposure.
Results
Measured data from environmental monitoring networks, which reflect the actual contamination of the environment, are reused to characterize population exposure. Advanced spatial analysis methods are applied to include additional information and to take advantage of spatial and inter-variable correlations, improving data representativeness and characterizing the associated uncertainty. Integrated approaches bring together all the information necessary for assessing the source-to-human-dose continuum, using a Geographic Information System (GIS), a multimedia exposure model and a toxicokinetic model.
Conclusion
This framework could be used for many purposes, such as mapping EHI; identifying vulnerable populations and determinants of exposure in order to manage and plan remedial actions; and assessing spatial relationships between health and environmental factors to identify those that influence the variability of disease patterns.


Introduction
The World Health Organization, in a 2012 report [1], identified environmental inequalities as a priority issue that national governments in Europe need to address. Reducing health inequalities requires identifying and characterizing exposures in order to understand how they accumulate across a territory and to prioritize interventions. Because the health status of a population results from complex interactions between social, territorial and environmental factors, all related information needs to be studied to assess it. At a regional or continental scale, the characterization of environmental health inequalities (EHI) expresses the idea that populations are not equal in the face of pollution. It requires analyses that identify and manage areas at risk of overexposure, where an increased risk to human health is suspected. Developing such methods is a prerequisite for implementing public health actions aimed at protecting populations. Building tools to guide public action to reduce EHI requires evaluating phenomena that are not always easy to apprehend, and assessing the reliability and representativeness of the available information, which usually demands statistical processing [2].
After more than 10 years of actions aimed at preventing environmental health risks, the third French National Health and Environment Plan (PNSE 3, 2015-2019) proposes a new EHI approach that is more robust, more closely connected to the territories, and that integrates the scientific concept of the exposome. The recently coined term exposome [3] describes these complex exposures, considering all sources, routes and, when possible, the interactions of stressors that are likely to contribute to the alteration of individuals' health. The external contribution to the human exposome is determined by environmental exposure, also termed the eco-exposome [4], and includes exposure through air, water, soil and food. A coherent conceptual framework for exposure assessment is needed to tackle EHI, one that permits estimating the magnitude, frequency and duration of exposure to chemicals, along with the number and characteristics of the exposed population.
Quantitative exposure assessment for characterizing environmental inequalities raises specific questions that need to be addressed:
- identification of the contamination source(s);
- characterization of exposure mechanisms (pathways and relevant routes);
- prioritization of vulnerable populations or specific susceptible groups (e.g. infants).
The contamination process is extremely complex and variable in space and time, with multiple localized sources at larger scales. At a regional scale, to better evaluate exposure to large chemical emissions, fate and transport models can account both for relevant spatial variability (e.g. around emission sources or densely populated areas) and for temporal variability during a specific period of contamination [5].
Exposure assessment to identify and characterize territorialized EHI depends on the availability of data. Exposure assessment is generally complex because of a lack of data and the inherent natural variability of exposure levels, which lead to uncertainty in the estimates [6]. The temporal support also differs between available data (point measurements, annual averages, etc.), which requires additional treatment. Furthermore, the data often lack a common spatial support, so preliminary spatial analysis is required to homogenize them or increase their resolution. The available databases are often assembled for diverse objectives and often re-processed using statistical methods. Spatializing and crossing these data pose several methodological difficulties and can introduce uncertainties into the resulting maps. For this reason, different methods and techniques are employed to treat environmental databases specifically, in order to take advantage of all available information and reduce uncertainties (see Section 3.2). This paper presents the methodological framework developed by INERIS to provide a common framework for conceptualizing and operationalizing environmental exposures, an important step towards articulating a science of EHI.
To build a computational infrastructure able to characterize the eco-exposome at the territorial level, we had to solve several methodological issues: (1) define an integrated exposure assessment framework, which first requires overcoming several scientific limitations, such as linking the whole source-effect chain; (2) provide statistical methods and numerical tools for the spatial and temporal processing of data from existing environmental and population databases; (3) link, adapt or develop transport and transfer models.

The Integrated Exposure Assessment Framework
The characterization of the territorialized exposome implies developing dynamic, multidimensional and longitudinal approaches, as well as information systems that require transdisciplinary methods of data analysis. To meet this general objective, various levels of data from different environmental compartments and exposure media must be integrated and combined. Data and information emerging from the expanding field of exposure science can be integrated into the exposome conceptual framework, which provides the necessary linkages between source and internal exposure and helps to identify and compare relationships between different levels at critical life stages, personal health outcomes, and health disparities at the population level across space, place and time [7]. This framework could be a layered structure that describes the elements of exposure pathways (Fig. 1), the relationships between those elements, and how the data describing the elements are stored and used for selected outputs, such as exposure assessment, exposure prediction, epidemiology or public health decision making [8]. Refined aggregate exposure assessment is data-intensive, requiring detailed information at every step of the source-to-dose pathway. Integrated exposure assessment requires 1) methodologies for calculating aggregate exposure systematically and 2) computational research tools for estimating the exposure from the different contributing sources. For example, integrated approaches can bring together all the information necessary for assessing the source-to-exposure continuum, i.e. the linkage connecting the source of exposure with the target of exposure. When mapping environmental inequalities to identify vulnerable individuals and communities at risk and to target public health interventions, the exposure assessment process has additional requirements compared with classical risk assessment methodology.
Environmental inequalities operate at different scales (global, regional, local) and cannot be captured by studying a single medium; they require integrating various contamination pathways: air, water, soil and food. The study design should be able to:
- integrate the processes that take place at the interface between the environmental contaminants of interest and the organisms;
- characterize the principal exposure pathways;
- define realistic scenarios that integrate past and present sources;
- describe the phenomena at a fine temporal and spatial resolution.
Based on these needs, the research objectives are to bring together all available information within a coherent methodological framework for assessing the source-to-dose continuum over an extensive chemical space. We develop an integrated exposure assessment approach able to integrate multiple exposure pathways from various sources, through a series of models, up to internal exposure. The main objective of our projects, i.e. testing the feasibility of the methodology, has been achieved. For the substances of interest, our framework allows:
- identifying areas of potential overexposure by analyzing the variations of the indicators in space;
- analyzing sources and environmental compartments potentially associated with overexposure;
- explaining the variability of exposure inequalities across pollutants and study areas;
- estimating internal exposure and linking it with human biomonitoring data.
This approach involved implementing different models, namely atmospheric dispersion modeling, spatial analysis for processing environmental and population data, a multimedia exposure model and a physiologically-based pharmacokinetic (PBPK) model. The models have been adapted and coupled so that the output data of an upstream model serve as input data for the downstream model. The coupling also brings together information on contamination sources, the quality of environmental media and resident populations on the same analysis support, namely the reference grid. A Geographic Information System (GIS) thus makes it possible to cross the estimated exposure with biomonitoring (body burden) data to provide elements for interpreting the environmental determinants of exposure. The coupling of numerical and statistical models has established a scientific and technical basis for integrating and processing data and for assessing the transfer of contaminants from the environment to populations. In this way, all available data can be integrated on a common spatial support despite their heterogeneity. The selected reference grid reflects local variations and allows the integration of environmental monitoring databases in France. The GIS modeling platform enables the coupling and interoperability of all spatial data via the reference grid, on which the input data and the variables of interest are discretized after processing.
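The upstream-to-downstream coupling described above can be illustrated with a minimal Python sketch. Every variable is stored as one value per reference-grid cell, and each model reads the fields it needs and adds its outputs to the shared grid. The three models (`dispersion`, `exposure`, `internal_dose`) and all their coefficients are hypothetical linear placeholders, not the INERIS models. The sketch only shows the chaining pattern and a consistency check on the common grid.

```python
from typing import Callable, Dict, List

# One list of values per variable, indexed by reference-grid cell.
Grid = Dict[str, List[float]]
Model = Callable[[Grid], Grid]


def run_chain(models: List[Model], grid: Grid) -> Grid:
    """Run models in order; each model's outputs become inputs for the next."""
    n_cells = len(next(iter(grid.values())))
    for model in models:
        outputs = model(grid)
        for name, values in outputs.items():
            # All variables must be discretized on the same reference grid.
            if len(values) != n_cells:
                raise ValueError(f"{name} has {len(values)} cells, expected {n_cells}")
        grid = {**grid, **outputs}
    return grid


# Hypothetical placeholder models with made-up linear coefficients.
def dispersion(g: Grid) -> Grid:
    # emission -> air concentration (ug/m3) via an assumed transfer factor
    return {"air": [e * 0.01 for e in g["emission"]]}


def exposure(g: Grid) -> Grid:
    # intake (ug/day) = air * 20 m3/day inhaled + soil (ug/kg) * 1e-4 kg/day ingested
    return {"intake": [a * 20.0 + s * 1e-4 for a, s in zip(g["air"], g["soil"])]}


def internal_dose(g: Grid) -> Grid:
    # steady-state blood concentration (ug/L) = intake / assumed clearance (10 L/day)
    return {"blood": [i / 10.0 for i in g["intake"]]}


result = run_chain([dispersion, exposure, internal_dose],
                   {"emission": [100.0, 0.0], "soil": [50.0, 10.0]})
print(result["blood"])
```

Because each model only exchanges named gridded fields, any stage can be replaced by a more detailed model without touching the others, provided it keeps the same grid.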

Integrated data from existing environmental health monitoring programs
Many different approaches can be used to quantify environmental exposures: direct methods (measurement, monitoring or biomonitoring) or indirect methods, which estimate exposure from measurements and existing data such as environmental monitoring, questionnaires and exposure models. The availability of data on the geographic area of interest for the evaluated pollutants is an essential prerequisite. The quality and usability of all environmental data should be assessed before using them in health or risk assessment, as many factors can bias environmental sampling results [9]. Ideally, direct measures of exposure (e.g. biomarkers or personal monitoring data) would be needed for all key stressors related to health effects, throughout the critical period of exposure, and in the population of interest [10]. However, relying exclusively on biomarker data to characterize EHI is currently impracticable for a large number of diverse chemicals because of analytical and resource limitations [11], especially when the assessment should cover a large territory at fine resolution. Environmental quality data are often available at a fine administrative or spatial resolution and enable environmental indicators to be built at regional or national scale. Processing variables to identify and characterize environmental inequalities depends on reusing this type of data, which is very diverse in nature with respect to its initially intended objectives. Determining how representative the measured contamination levels are of other locations or time frames is not always a simple task [12].
Health and environment databases have been developed over several years; they are evolving and expanding rapidly. Actions to identify and monitor the quality of the environment (soil, water and air) are conducted by different agencies, institutes and observatories. The production of this type of data and advances in computer technology allow their reuse in conceptual frameworks and for objectives different from those that prevailed when they were set up. The emergence of quality data and their integration into GIS make territorial analysis possible. These environmental data reflect the actual contamination of the environment and therefore the overall exposure of populations. Indicators based on these data make it possible to characterize population exposure and its evolution as public prevention policies are implemented. When reusing this type of data for exposure science, a database must be set up in which the variables are associated with the modes of exposure (concentrations in environmental and exposure media, eating behavior, space-time budget, etc.).
These variables must go through several processing stages before indicators can be constructed:
- identification of the data sources from which the different variables can be constructed;
- acquisition of the data, given the access modalities and the financial, legal or human aspects;
- analysis of the quality and representativeness of the databases with respect to the objective of the study (choice of a database, validity and representativeness of the data), sometimes involving approximations or simplifying assumptions;
- preprocessing of the databases: cleaning and reconstructing missing data;
- construction of ad hoc data where appropriate data sources are unavailable or not exhaustive with respect to the objectives of the study;
- data transformation (homogenization, aggregation or disaggregation).
Estimating exposure requires knowledge of the concentrations in the environmental compartments to which an individual or a population is exposed. These concentrations can be measured or modeled. A wide range of data can potentially be mobilized for integrated assessment. Database selection and study design should aim at the best compromise between data representativeness and method robustness, consistent with the objectives of the study.
Characteristics of air pollution (e.g. chemical components, particle properties) vary spatially [13] and may differ between areas near to and far from monitors [14]. Automated monitoring networks operating in Europe provide detailed air quality information on a regular basis. Soil contaminants reach humans through inhalation of dust and vapor released from soil, ingestion of contaminated soil particles (mainly by children) or of contaminated food, and dermal absorption. Once a site is considered contaminated, sufficiently accurate data must be provided to limit the lack of statistical representativeness and improve spatial quantification. An adequate sampling plan can reduce the time spent evaluating the presence and extent of contamination [15] and, at the same time, reduce project costs [16]. A soil monitoring system can provide comparable and objective data on the current state and evolution of soils. The database of such a system allows data to be created and maintained for each monitoring site on agricultural land and prepared for further processing by specialized programs [17]. Position information provides a link to the GIS and thus opens possibilities for further spatial analysis and for identifying and assessing risk areas. For example, in France, soil pollutant stocks and properties and most explanatory variables were derived from the French National Soil Monitoring Network (Réseau de Mesures de la Qualité des Sols, RMQS). The RMQS surveys soils and their properties on a regular 16 km grid across mainland France (around 2,200 sites covering 550,000 km²) [18].
The Drinking Water Directive (80/778/EEC) and its successor (98/83/EC, whose requirements applied from 2003) aim to ensure that water intended for human consumption is safe. In addition to microbiological and physicochemical parameters, a number of toxic substances such as pesticides, polycyclic aromatic hydrocarbons, cyanide compounds and heavy metals must be monitored, because the raw supply may be contaminated, for example by pesticides from agricultural land that have leached into groundwater, or within the distribution system, for example by lead from piping. In France, 300,000 samples are tested each year; tap water is one of the most strictly controlled foodstuffs. Each year, the health agencies carry out close to 12.3 million tests covering all of the country's public water and wastewater services (both publicly and privately managed). In 2013, more than 8.1 million tests were carried out on services managed by private water companies.
INERIS has worked on identifying environmental and spatialized databases for characterizing exposures, in association with the main data producers and managers identified [19,20]. This work supports specifications for environmental health platforms and improves data integration when building an environmental health tracking information system. However, the spatial data used to characterize environmental exposures were not always initially collected and collated to meet these objectives, which introduces bias when they are reused. Measurement frequencies or spatial sampling densities are not always sufficient. To partially overcome these problems, different techniques are adopted to address the different environmental, behavioral or population databases specifically. The choice of a treatment method depends on the problem to be solved and on the quality of the available data.

Statistical approaches to link and optimize data representativeness
The data available in a region of interest characterize contamination levels at very specific locations, over a given spatial support (i.e. the support on which the data are measured, such as a point, surface or volume), and for very specific time frames. To construct exposure maps from spatialized databases for evaluating environmental inequalities, methods are needed to process and harmonize the available data, accounting for their specificities (missing values, limited number of observations, etc.), on the same resolution and support.
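The simplest harmonization step for point data is to assign each measurement to a cell of a regular reference grid and average the values within each cell. The sketch below assumes projected coordinates in metres and square 3 km × 3 km cells (9 km², the resolution of the French reference grid). The grid origin and the function name `to_reference_grid` are illustrative assumptions. Cells without any measurement are simply absent from the result; filling them is the job of the interpolation methods discussed next.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def to_reference_grid(points: Iterable[Tuple[float, float, float]],
                      cell_size: float = 3000.0,
                      origin: Tuple[float, float] = (0.0, 0.0)) -> Dict[Cell, float]:
    """Average point measurements (x, y, value) within square grid cells.

    Coordinates are assumed to be in a projected system in metres; the cell
    index of a point is the floor of its offset from the origin / cell_size.
    """
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for x, y, value in points:
        cell = (math.floor((x - origin[0]) / cell_size),
                math.floor((y - origin[1]) / cell_size))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


grid = to_reference_grid([(100.0, 100.0, 1.0), (2900.0, 2900.0, 3.0), (3100.0, 50.0, 5.0)])
print(grid)
```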
In numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points. In recent years, the increasing availability of spatial and spatiotemporal data has driven the development of many spatial interpolation methods, including geostatistics [21]. Spatial analysis encompasses the formal techniques that study entities using their topological, geometric or geographic properties. Spatial dependence is the co-variation of properties within geographic space: features at nearby locations tend to be correlated. The fundamental principle is Tobler's first law of geography, which states that everything is related to everything else, but near things are more related than distant things; when the interrelation between entities increases with proximity in the real world, representation in geographic space and evaluation using spatial analysis techniques are appropriate [22]. In statistics, spatial autocorrelation measures the correlation of a georeferenced variable with itself, i.e. the degree of similarity between neighboring observations. Spatial dependence violates the assumption of independence between observations made by classical statistical techniques, but it should also be considered a source of information. To characterize the local, regional and global scales of variability of the phenomena studied, spatial data structures are often analyzed with geostatistical tools (variogram, autocorrelation analysis) [23].
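The two structural tools mentioned above can be sketched in plain Python. The first is a global Moran's I, which measures spatial autocorrelation. The second is an empirical semivariogram based on Matheron's estimator, γ(h) = Σ(z_i - z_j)² / (2N(h)), computed for distance bins. The binary distance-band weights, the bin layout and the toy transect are assumptions chosen for illustration. In practice, the weights, lag classes and any anisotropy would be chosen from the data.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def morans_i(values: Sequence[float], coords: Sequence[Point], max_dist: float) -> float:
    """Global Moran's I with binary distance-band weights.

    w_ij = 1 for i != j with distance(i, j) <= max_dist, else 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    w_sum = 0.0
    for i, j in combinations(range(n), 2):
        if math.dist(coords[i], coords[j]) <= max_dist:
            # symmetric weights: count both (i, j) and (j, i)
            num += 2.0 * dev[i] * dev[j]
            w_sum += 2.0
    if w_sum == 0:
        raise ValueError("no neighbouring pairs within max_dist")
    return (n / w_sum) * num / sum(d * d for d in dev)


def empirical_semivariogram(values: Sequence[float], coords: Sequence[Point],
                            lag_width: float, n_lags: int) -> List[Tuple[float, float, int]]:
    """Matheron estimator per distance bin.

    Bin k holds pairs with k * lag_width <= distance < (k + 1) * lag_width.
    Returns (mean pair distance, gamma, number of pairs) for non-empty bins.
    """
    sq = [0.0] * n_lags
    dist_sum = [0.0] * n_lags
    count = [0] * n_lags
    for i, j in combinations(range(len(values)), 2):
        h = math.dist(coords[i], coords[j])
        k = int(h // lag_width)
        if k < n_lags:
            sq[k] += (values[i] - values[j]) ** 2
            dist_sum[k] += h
            count[k] += 1
    return [(dist_sum[k] / count[k], sq[k] / (2 * count[k]), count[k])
            for k in range(n_lags) if count[k]]


# Toy transect: 10 sites 1 km apart with a linear trend (strong spatial structure).
coords = [(float(x), 0.0) for x in range(10)]
values = [float(x) for x in range(10)]
print(morans_i(values, coords, max_dist=1.0))
print(empirical_semivariogram(values, coords, lag_width=1.0, n_lags=3))
```

A Moran's I close to +1 and a semivariogram that increases with distance both indicate the kind of spatial dependence that kriging exploits.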
More advanced spatial analysis methods can be applied to include additional information and to take advantage of spatial and inter-variable correlations, improving data representativeness and characterizing the associated uncertainty [24].
For air, several methods exist for estimating exposure to pollutants. They include monitor-based approaches, such as proximity-based assessments and statistical interpolation, as well as land-use regression and air quality modeling [25]. Using data from existing monitoring networks remains popular because of cost, data availability and population coverage. Such statistical methods combine multiple types of information to inform exposure estimates and allow exposure to be estimated in areas far from monitors. In addition to data fusion, several other approaches have been developed to estimate individual- and population-level exposures, including various interpolation methods, land-use regression (LUR) models, satellite-derived aerosol measurements, and source- and traffic-proximity analyses [26]. Stochastic methods such as kriging are generally preferred [27]. A commonly reported issue is data availability. Some databases have limitations, for example a limited number of observations, so the population's exposure cannot be assessed adequately from them alone. External drift kriging is therefore widely used in air and soil quality modeling to combine different kinds of information by including secondary information in the model.
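Kriging with external drift (KED) can be sketched in NumPy. The mean of the variable is modeled as a linear function of one or more auxiliary variables (for example an emission or population proxy), and the residuals are kriged. The weights come from the augmented system [C F; Fᵀ 0][λ; μ] = [c₀; f₀], and the kriging variance is C(0) - λᵀc₀ - μᵀf₀. This is an illustrative sketch, not the implementation used in the study. It assumes an exponential covariance model with a known sill and range and no nugget (in practice these are fitted to the residual variogram). The function names are hypothetical.

```python
import numpy as np


def exp_cov(h, sill=1.0, range_=10.0):
    """Exponential covariance model C(h) = sill * exp(-h / range_) (no nugget)."""
    return sill * np.exp(-np.asarray(h, float) / range_)


def ked_predict(xy, z, aux, xy0, aux0, sill=1.0, range_=10.0):
    """Kriging with external drift at one target location.

    The mean of z is modelled as b0 + b1 * aux (one column per auxiliary
    variable); the residual covariance is the exponential model above.
    Returns (estimate, kriging variance).
    """
    xy = np.asarray(xy, float)
    z = np.asarray(z, float)
    n = len(z)
    aux = np.asarray(aux, float).reshape(n, -1)
    F = np.column_stack([np.ones(n), aux])              # drift functions at data points
    f0 = np.concatenate([[1.0], np.atleast_1d(np.asarray(aux0, float))])
    p = F.shape[1]

    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.zeros((n + p, n + p))
    A[:n, :n] = exp_cov(d, sill, range_)
    A[:n, n:] = F
    A[n:, :n] = F.T                                     # unbiasedness constraints
    c0 = exp_cov(np.linalg.norm(xy - np.asarray(xy0, float), axis=1), sill, range_)

    sol = np.linalg.solve(A, np.concatenate([c0, f0]))
    lam, mu = sol[:n], sol[n:]
    estimate = lam @ z
    variance = sill - lam @ c0 - mu @ f0
    return float(estimate), float(variance)


xy = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3)]
aux = [1.0, 2.0, 3.0, 5.0, 4.0]          # e.g. an emission/population proxy
z = [2.0 + 3.0 * a for a in aux]         # data that follow the drift exactly
est, var = ked_predict(xy, z, aux, xy0=(4, 6), aux0=2.5)
print(est, var)
```

The drift constraints force the weights to reproduce the auxiliary variable exactly. When the data follow the drift, as in this toy example, the estimate therefore equals the drift value at the target (2 + 3 × 2.5).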
Machine learning uses algorithms and statistical methods to "learn" information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases. Machine learning makes it possible, for example, to build a metamodel (a fast statistical surrogate) from a dataset of deterministic model outputs. The fundamental concepts of machine learning and its uses for spatially distributed data are given in Kanevskij et al. [28].
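As an illustration of a metamodel, the sketch below trains a k-nearest-neighbours regressor on input/output pairs from a deterministic model and then predicts new outputs without re-running that model. `expensive_model` is a hypothetical analytic stand-in for a costly simulator. k-NN was chosen only because it fits in a few lines of NumPy; a Random Forest or another learner could play the same role.

```python
import numpy as np


def expensive_model(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a slow deterministic simulator (one run per row)."""
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2


class KNNMetamodel:
    """k-nearest-neighbours regression used as a cheap surrogate (metamodel)."""

    def __init__(self, k: int = 5):
        self.k = k

    def fit(self, X, y):
        self.X = np.asarray(X, float)
        self.y = np.asarray(y, float)
        return self

    def predict(self, Xq):
        Xq = np.asarray(Xq, float)
        # Euclidean distances between every query and every training input
        d = np.linalg.norm(Xq[:, None, :] - self.X[None, :, :], axis=-1)
        nearest = np.argsort(d, axis=1)[:, : self.k]
        return self.y[nearest].mean(axis=1)


rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(500, 2))
y_train = expensive_model(X_train)                  # the "costly" simulations
meta = KNNMetamodel(k=5).fit(X_train, y_train)

X_test = rng.uniform(-1.0, 1.0, size=(100, 2))
mae = np.mean(np.abs(meta.predict(X_test) - expensive_model(X_test)))
print(f"mean absolute error of the metamodel: {mae:.3f}")
```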
To construct exposure maps from spatialized databases in the context of evaluating health risks, methods have been developed to process and harmonize the available data on the same resolution and support, accounting for their specificities (missing values, limited number of observations, etc.). PLAINE (environmental inequalities analysis platform [29]) is a GIS-based modeling platform developed by INERIS for quantifying human exposure to chemical substances. It aims to spatialize environmental indicators related to human health using risk assessment methods and to map environmental disparities at fine resolution. The main aim of the PLAINE project, developed in France, is to build a platform of environmental and health data for the systematic collection, integration and analysis of data on emission sources, environmental contamination, exposure to environmental hazards, and population and health. Ad hoc methodologies are used to align the available data on the same pixels. Spatial analysis and statistical methods are employed, using R and QGIS, to process the databases (georeferencing, data checking, preprocessing, reformatting) and assemble them for the purpose of the study. For example, atmospheric concentration data were collected in France for regulatory surveillance over two years (2010 and 2011). Estimating concentrations over France with a classical interpolation method could misrepresent the spatial distribution because of the limited number of observations. To address this issue, auxiliary variables were used in an external drift kriging framework [30]. The best auxiliary variable for defining linear drifts was found to be one combining atmospheric emissions, population and altitude. Measurements of PAH topsoil concentrations are available through the French Soil Monitoring Network.
Qualitative data on the location of polluted sites were integrated by computing a distance-to-polluted-soil proxy. This proxy and 14 variables describing physicochemical soil properties were combined in a hybrid regression-kriging approach fitted with Random Forest [31] models, which was shown to outperform the traditionally used linear regression. Because of its hydrophobic nature, B[a]P occurs in water at low concentrations, so exact measurements cannot always be reported. The proportion of observations below the detection limit is high, which requires careful handling. A multiple imputation method was developed to extract the maximum information from the available measurements without introducing excessive bias into the results. It takes advantage of the temporal dimension and of the correlations between the substance of interest and other PAHs. Spatial estimation of water concentrations was carried out with a bootstrap-based expectation-maximization algorithm that accounts for multi-annual data and the complexity of the water distribution network. These methods made it possible to construct a representative spatial database on a reference grid of 9 km² cells covering the whole of France (550,000 km²), which was used to perform the integrated exposure assessment [3].
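To show what multiple imputation of non-detects involves, here is a deliberately simplified sketch. It is not the method developed in the study, which also exploits temporal structure and correlations with other PAHs. The sketch makes four assumptions: concentrations are lognormal, there is a single detection limit, the lognormal is fitted to the detected values only (a crude shortcut that ignores censoring in the fit), and the quantity of interest is the mean. Each non-detect is drawn from the fitted distribution truncated above at the LOD, m completed datasets are produced, and the m means are pooled with Rubin's rules.

```python
import math
import random
import statistics
from statistics import NormalDist
from typing import List, Optional, Sequence, Tuple


def impute_nondetects(values: Sequence[Optional[float]], lod: float,
                      m: int = 20, seed: int = 0) -> List[List[float]]:
    """Create m completed datasets; None marks a result below the detection limit.

    Non-detects are drawn from a lognormal truncated above at the LOD. The
    lognormal is fitted to the detected values only, a simplification that
    ignores censoring in the fit.
    """
    rng = random.Random(seed)
    logs = [math.log(v) for v in values if v is not None]
    dist = NormalDist(statistics.fmean(logs), statistics.stdev(logs))
    p_lod = dist.cdf(math.log(lod))              # probability mass below the LOD
    datasets = []
    for _ in range(m):
        completed = []
        for v in values:
            if v is None:
                u = p_lod * (1.0 - rng.random())     # uniform on (0, p_lod]
                completed.append(math.exp(dist.inv_cdf(u)))
            else:
                completed.append(v)
        datasets.append(completed)
    return datasets


def pool_mean(datasets: List[List[float]]) -> Tuple[float, float]:
    """Combine the mean over imputed datasets with Rubin's rules.

    Returns (pooled mean, total variance = W + (1 + 1/m) * B).
    """
    m = len(datasets)
    estimates = [statistics.fmean(d) for d in datasets]
    within = statistics.fmean([statistics.variance(d) / len(d) for d in datasets])
    between = statistics.variance(estimates)
    return statistics.fmean(estimates), within + (1 + 1 / m) * between


obs = [0.8, 1.2, None, 2.5, None, 0.9, 1.7, None, 3.1, 1.1]   # ng/L, made-up data
completed = impute_nondetects(obs, lod=0.5, m=20)
mean, var = pool_mean(completed)
print(f"pooled mean {mean:.2f} ng/L, total variance {var:.4f}")
```

The between-imputation term B is what distinguishes this from single substitution (e.g. LOD/2): it carries the uncertainty about the censored values into the final variance.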

Outdoor air dispersion modeling
Atmospheric chemistry and dispersion modeling have improved considerably over the last two decades. A large variety of modeling systems and options now exists, from simpler to more complex ones, covering scales from global and regional down to urban and street level.
Air quality models simulate the atmospheric concentrations of air pollutants and their deposition fluxes to the Earth's surface by solving the transport equations that represent the emissions, advection, diffusion, transformations and removal of those pollutants and associated chemical species.
Contemporary air quality models can be grouped into two major categories. Source-specific models calculate the concentrations of air pollutants near a source. Among them, Gaussian models simulate the atmospheric dispersion of non-reactive pollutants near the source (steady-state approach), while Lagrangian models treat the atmospheric dispersion of reactive substances as a source-specific process. Grid-based (Eulerian) models calculate concentrations of reactive air pollutants over large areas, ranging from an urban area to a region, a continent or the globe.
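The steady-state Gaussian approach can be made concrete with the textbook plume equation for a continuous point source with ground reflection:
C = Q / (2π u σy σz) · exp(-y²/(2σy²)) · [exp(-(z-H)²/(2σz²)) + exp(-(z+H)²/(2σz²))]
The dispersion coefficients σy and σz below use the Briggs open-country formulas for neutral stability (class D). That stability class, the function names and the example release are assumptions for illustration. Operational Gaussian models add stability classification, plume rise, building effects and deposition.

```python
import math


def briggs_rural_class_d(x: float):
    """Dispersion coefficients sigma_y, sigma_z (m) at downwind distance x (m).

    Briggs open-country formulas for neutral stability (class D), assumed here.
    """
    sigma_y = 0.08 * x / math.sqrt(1.0 + 0.0001 * x)
    sigma_z = 0.06 * x / math.sqrt(1.0 + 0.0015 * x)
    return sigma_y, sigma_z


def gaussian_plume(q: float, u: float, x: float, y: float, z: float, h: float) -> float:
    """Steady-state concentration (g/m3) from a continuous point source.

    q: emission rate (g/s); u: wind speed (m/s); x, y, z: downwind, crosswind
    and vertical receptor coordinates (m); h: effective release height (m).
    The second vertical term reflects the plume at the ground.
    """
    if x <= 0.0:
        return 0.0  # no concentration upwind of the source in this model
    sy, sz = briggs_rural_class_d(x)
    lateral = math.exp(-y ** 2 / (2 * sy ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2 * sz ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sz ** 2)))
    return q / (2 * math.pi * u * sy * sz) * lateral * vertical


# Ground-level concentrations along the plume axis for a 50 m release (illustrative values).
for x in (250.0, 500.0, 1000.0, 2000.0):
    print(x, gaussian_plume(q=10.0, u=4.0, x=x, y=0.0, z=0.0, h=50.0))
```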
Inputs to air quality models include the emission rates of primary air pollutants and of precursors of secondary air pollutants, meteorology (three-dimensional fields of wind, turbulence, temperature, pressure, boundary layer height, relative humidity, clouds, solar radiation, etc.), and boundary conditions (baseline or background conditions). For grid-based models, an emission model is used to translate an emission inventory into a spatially distributed and temporally resolved gridded structure.
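The core of translating an emission inventory onto a model grid can be sketched as two proportional splits. A regional total is first distributed over grid cells using a spatial proxy (here, population per cell). Each cell's annual amount is then divided across months with relative weights. The proxy, the winter-heavy heating profile and the 120 kg/yr total are hypothetical. Emission models used with chemistry-transport models typically combine sector-specific proxies with monthly, weekly and hourly profiles.

```python
from typing import List, Sequence


def spatial_split(total: float, proxy: Sequence[float]) -> List[float]:
    """Distribute a regional emission total over grid cells proportionally to a proxy."""
    weight_sum = sum(proxy)
    if weight_sum <= 0:
        raise ValueError("proxy weights must have a positive sum")
    return [total * w / weight_sum for w in proxy]


def temporal_split(annual: float, monthly_weights: Sequence[float]) -> List[float]:
    """Split an annual emission into 12 monthly amounts using relative weights."""
    if len(monthly_weights) != 12:
        raise ValueError("expected 12 monthly weights")
    weight_sum = sum(monthly_weights)
    return [annual * w / weight_sum for w in monthly_weights]


# Hypothetical example: 120 kg/yr from residential heating in a region of 4 cells.
population = [5000, 20000, 1000, 14000]
heating_profile = [3, 3, 2, 1, 0.5, 0.25, 0.25, 0.25, 0.5, 1, 2, 3]  # winter-heavy, assumed
cells = spatial_split(120.0, population)
gridded = [temporal_split(e, heating_profile) for e in cells]   # kg per cell per month
print(cells)
```

Both splits are mass-conserving by construction, so the gridded inventory sums back to the regional total.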
As an example, BaP, used as a tracer of the carcinogenic risk associated with PAHs, has been the subject of several recent INERIS studies using the CHIMERE model at European scale [36,37]. The population exposure estimate shows that 20% of the European population is exposed to background ambient BaP concentrations above the EU target value. Only 7% live in areas with concentrations below the estimated acceptable risk level of 0.12 ng m⁻³. Heavy metals have also been addressed with the CHIMERE model [38], which was used to model background air concentrations of Pb, Cd, As, Ni, Cu, Zn, Cr and Se in Europe. An evaluation of the model's ability to reproduce observed levels identifies what is needed to model these air pollutants adequately: more recent annual emission totals, information on SNAP activities for each metal, higher spatial resolution and better knowledge of the temporal behavior of emissions.
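Population exposure statistics of this kind come from crossing gridded concentrations with gridded population. The minimal sketch below uses made-up cell values. The thresholds are the 1 ng/m³ EU target value for BaP (Directive 2004/107/EC) and the 0.12 ng/m³ acceptable risk level quoted above. The function name is illustrative.

```python
from typing import Sequence


def population_fraction(conc: Sequence[float], pop: Sequence[float],
                        threshold: float, above: bool = True) -> float:
    """Share of the population living in cells above (or below) a concentration threshold."""
    total = sum(pop)
    selected = sum(p for c, p in zip(conc, pop)
                   if (c > threshold if above else c < threshold))
    return selected / total


bap = [0.05, 0.3, 1.5, 0.8]          # annual mean BaP per cell, ng/m3 (made-up)
residents = [1000, 2000, 500, 1500]   # population per cell (made-up)
print(population_fraction(bap, residents, 1.0))                # above the 1 ng/m3 target value
print(population_fraction(bap, residents, 0.12, above=False))  # below the 0.12 ng/m3 risk level
```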

Multimedia exposure models
Spatially resolved multimedia fate and multi-pathway exposure models help predict three quantities at the regional or local scale: environmental concentration distributions, the related contaminant levels in different media, and the fraction of a chemical release taken in by the entire human population (the intake fraction). When the spatial resolution of the computations is low, variations in environmental characteristics tend to average out, and roughly selected representative or characteristic values yield outputs of the correct order of magnitude. Research has begun to address spatially explicit fate and transport models of increasing resolution, and a few models with resolutions from a few tens of km down to 1 km are now available for calculations at the continental scale [39,40]. However, the computational effort of this modeling strategy is generally high, which limits routine applications when a large number of chemicals need to be evaluated.
A multimedia fate and exposure model called Modul'ERS [41,42], developed by INERIS, is used to estimate intakes from inhalation of air and from ingestion of soil, tap water, marketed food products and locally produced fruits and vegetables. Local foodstuff concentrations are estimated from atmospheric deposition of particulate pollutants, air concentrations (for POPs) and soil concentrations. Mechanistic and dynamic plant models require many input data that can be difficult to define: data may be lacking, the magnitude of variability and uncertainty is hard to estimate, and even the qualitative effect of input variations on results is hard to anticipate. Therefore, the contributions of gaseous air and soil concentrations to the edible parts of plants are estimated with bioconcentration factors. These factors are specific to the different categories of fruit and vegetables cultivated in domestic gardens and are applied to concentrations averaged over the growing period.
The model inputs for media concentration estimates are therefore the georeferenced environmental databases, with direct reuse of the treated data incorporated into the GIS for tap water and marketed food products.
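For one grid cell and one age group, the pathway structure described above reduces to a sum of concentration × contact-rate terms divided by body weight. The pathways are inhalation plus ingestion of soil, tap water and garden produce, with plant concentrations derived from soil through bioconcentration factors. This is a simplified sketch of that principle, not Modul'ERS. It ignores the air and deposition contributions to plants, and every parameter value is a placeholder.

```python
from typing import Dict


def daily_dose(c_air: float, c_soil: float, c_water: float,
               rates: Dict[str, float], bcf: Dict[str, float],
               veg_intake: Dict[str, float], body_weight: float) -> Dict[str, float]:
    """Average daily dose per pathway (mg/kg bw/day) for one grid cell.

    c_air in mg/m3, c_soil in mg/kg, c_water in mg/L; rates holds inhalation
    (m3/day), soil ingestion (kg/day) and water ingestion (L/day); bcf maps a
    vegetable category to a soil-to-plant factor; veg_intake gives kg/day eaten.
    """
    dose = {
        "inhalation": c_air * rates["inhalation"],
        "soil": c_soil * rates["soil"],
        "water": c_water * rates["water"],
    }
    for category, intake in veg_intake.items():
        c_plant = bcf[category] * c_soil          # plant concentration from soil
        dose[f"veg_{category}"] = c_plant * intake
    return {k: v / body_weight for k, v in dose.items()}


doses = daily_dose(
    c_air=2e-6, c_soil=0.5, c_water=1e-5,
    rates={"inhalation": 20.0, "soil": 5e-5, "water": 2.0},
    bcf={"leafy": 0.01, "root": 0.02},
    veg_intake={"leafy": 0.05, "root": 0.1},
    body_weight=70.0,
)
print(sum(doses.values()), max(doses, key=doses.get))
```

Keeping the result broken down by pathway is what allows the dominant routes of exposure to be identified cell by cell.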
In the model used, attention was focused on the quality of the values used to define all inputs (exposure, environmental and chemical parameters). Available data were systematically analyzed. For most parameters, all the collected data with their contextual information, as well as the selection criteria used, are described in dedicated reports. Depending on the level of knowledge and the quantity and relevance of the available data, each parameter is finally defined by a point value, a range of values or a probability distribution.
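Parameters defined by a point value, a range or a probability distribution can be propagated together by Monte Carlo sampling. The minimal sketch below makes three assumptions: ranges are treated as uniform, distributions as lognormal, and the model is a single soil-ingestion dose. The parameter names and values are illustrative only.

```python
import random
import statistics
from typing import Dict, Tuple

Spec = Tuple  # ("point", v) | ("range", lo, hi) | ("lognormal", mu, sigma)


def draw(spec: Spec, rng: random.Random) -> float:
    """Sample one value from a parameter specification."""
    kind = spec[0]
    if kind == "point":
        return spec[1]
    if kind == "range":
        return rng.uniform(spec[1], spec[2])
    if kind == "lognormal":
        return rng.lognormvariate(spec[1], spec[2])   # mu, sigma of log(value)
    raise ValueError(f"unknown parameter type: {kind}")


def monte_carlo_dose(params: Dict[str, Spec], n: int = 10_000, seed: int = 1):
    """Return the 5th, 50th and 95th percentiles of a soil-ingestion dose."""
    rng = random.Random(seed)
    doses = []
    for _ in range(n):
        p = {name: draw(spec, rng) for name, spec in params.items()}
        doses.append(p["c_soil"] * p["soil_ingestion"] / p["body_weight"])
    q = statistics.quantiles(doses, n=20)     # 19 cut points: 5%, 10%, ..., 95%
    return q[0], statistics.median(doses), q[-1]


params = {
    "c_soil": ("lognormal", -1.0, 0.8),        # mg/kg
    "soil_ingestion": ("range", 2e-5, 1e-4),   # kg/day
    "body_weight": ("point", 70.0),            # kg
}
print(monte_carlo_dose(params))
```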

Physiologically-based pharmacokinetic (PBPK) models
PBPK models are a specific class of biokinetic models, based on the physiology and anatomy of individuals, that can predict the kinetics and metabolism of substances in the body. These models describe the body as a set of compartments corresponding to specific organs or tissues (e.g. adipose tissue, bone, brain, gut, heart, kidney, liver, lung, muscle, skin and spleen). Between compartments, the transport of substances is governed by physiological flows (blood, bile, pulmonary ventilation, etc.) or by diffusion [43,44]. The model structure can be described by a set of differential equations whose parameters (blood flow rates, organ volumes, etc.) are available in the published scientific literature or may be obtained in vitro [45]. Numerical integration of this differential system computes the quantity and concentration of the substance in each compartment as a function of time and exposure dose. INERIS has developed a stochastic whole-body PBPK model covering the human lifespan [46] and integrated it into the EHI context. It predicts internal concentrations from multi-route exposure (inhalation, ingestion, dermal exposure), in blood but also in other tissues or biological matrices (e.g. urine). Such models are used to link exposure with biomarker data [47,48] and have proven successful in integrating and evaluating the influence of age- or gender-dependent changes on the pharmacokinetics of xenobiotics throughout life [49,50].
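To make the set of differential equations concrete, here is a deliberately small flow-limited PBPK sketch. Venous blood exchanges with liver, fat and a lumped rest-of-body compartment, metabolism occurs in the liver, and the absorbed dose enters blood at a constant rate. Each tissue follows the generic flow-limited equation dA_t/dt = Q_t (C_blood - C_t/P_t), and the system is integrated with a fixed-step Runge-Kutta (RK4) scheme. All volumes, flows, partition coefficients and the clearance are illustrative placeholders, not parameters of the INERIS lifespan model. Routing the dose directly into blood also ignores route-specific absorption such as hepatic first pass.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Tissue:
    volume: float      # L
    flow: float        # blood flow, L/h
    partition: float   # tissue:blood partition coefficient


def simulate_pbpk(tissues: Dict[str, Tissue], v_blood: float, cl_liver: float,
                  dose_rate: float, t_end: float, dt: float) -> Dict[str, float]:
    """Integrate a flow-limited PBPK model with a fixed-step RK4 scheme.

    Returns amounts (mg) in blood, each tissue, and the cumulative amount
    metabolized in the liver at time t_end (h). Dose enters blood (mg/h).
    A tissue named "liver" must be present.
    """
    q_total = sum(t.flow for t in tissues.values())

    def deriv(a: Dict[str, float]) -> Dict[str, float]:
        c_blood = a["blood"] / v_blood
        d: Dict[str, float] = {}
        venous_return = 0.0
        for name, t in tissues.items():
            c_out = a[name] / (t.volume * t.partition)   # venous concentration leaving tissue
            d[name] = t.flow * (c_blood - c_out)
            venous_return += t.flow * c_out
        c_liver_out = a["liver"] / (tissues["liver"].volume * tissues["liver"].partition)
        d["liver"] -= cl_liver * c_liver_out              # hepatic metabolism
        d["metabolized"] = cl_liver * c_liver_out
        d["blood"] = venous_return - q_total * c_blood + dose_rate
        return d

    state = {name: 0.0 for name in ["blood", *tissues, "metabolized"]}
    for _ in range(round(t_end / dt)):
        k1 = deriv(state)
        k2 = deriv({k: state[k] + 0.5 * dt * k1[k] for k in state})
        k3 = deriv({k: state[k] + 0.5 * dt * k2[k] for k in state})
        k4 = deriv({k: state[k] + dt * k3[k] for k in state})
        state = {k: state[k] + dt / 6 * (k1[k] + 2 * k2[k] + 2 * k3[k] + k4[k]) for k in state}
    return state


tissues = {
    "liver": Tissue(volume=1.8, flow=90.0, partition=2.0),
    "fat": Tissue(volume=15.0, flow=15.0, partition=50.0),
    "rest": Tissue(volume=48.0, flow=195.0, partition=1.0),
}
amounts = simulate_pbpk(tissues, v_blood=5.0, cl_liver=20.0, dose_rate=1.0, t_end=48.0, dt=0.005)
print({k: round(v, 3) for k, v in amounts.items()})
```

A useful check on such a model is mass balance: the amounts in all compartments plus the amount metabolized must equal the total dose administered (here 1 mg/h × 48 h).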
Each model represents a different component of the emission-environmental quality-exposure-internal dose (and effects) continuum. These models can operate at different spatio-temporal scales, which poses a challenge when coupling them in a coherent framework and can result in structural uncertainty and long computation times.

Conclusion
The exposome concept has been proposed as an emerging exposure science paradigm for conceptualizing the cumulative effects of environmental exposures across the whole human lifespan. Risk managers need to identify at-risk populations despite substantial data deficiencies that hinder the evaluation of cumulative health risks. This need calls for an operational translation of the concept at the territorial scale, in the context of EHI characterization. The characterization of the territorialized exposome implies developing dynamic, multidimensional and longitudinal approaches, as well as information systems that require transdisciplinary methods of data analysis. For example, integrated approaches bring together all the information necessary for assessing the source-to-human-dose continuum, using GIS, multimedia exposure models and toxicokinetic models.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.