Contribution of Bayesian Networks to Environmental Health Issues, Application to Etang de Berre


Background: The aim of this study is to examine in detail the potential links between proven pathologies in the population and the atmospheric pollution to which this population is exposed, in an industrial context (Etang de Berre) marked by an abundant cocktail of pollutants linked to the diversity of emitting sources in the area (petrochemical industry, steel industry, refining, energy production, cement works, road, sea and air transport). Methods: A total of 178 variables of various natures (health, environmental and socio-economic) at a fine territorial scale (infra-communal) were mobilised and integrated simultaneously within a Bayesian model based on artificial intelligence. Various unsupervised and supervised algorithms, together with sensitivity analyses, made it possible to uncover the links between these variables and the inhabitants' living space. Results: By mobilising a large number of variables broken down at a fine territorial scale, and by using an appropriate tool to report on the existing links between them, we were able to bring to light a number of relations between exposure and proven pathologies, for example between cadmium and diabetes in people over 65, or between vanadium and respiratory diseases. Threshold effects were also revealed, notably for SO2, whose effect appears from a very low exposure threshold (6 µg/m³), far below the standard, which is set at 50 µg/m³. The same is true of hydrofluoric acid (HF), whose effect is already felt from a threshold of 0.0028 µg/m³. Exposure to cadmium in its particle phase, between 0.214 and 0.250 µg/m³, disrupts insulin metabolism in people over 65. Finally, as soon as benzo[k]fluoranthene (particle phase) exceeds the 0.672 µg/m³ threshold, a higher number of hospital stays for respiratory diseases is observed in the 15-65 age group. Vulnerability also differed by age group.
Comorbidities known in the literature were found (respiratory and heart diseases), as was the influence of socio-economic factors on some pathologies (single-parent families and people aged 15 or over without qualifications). Finally, diffuse PCB pollution was observed in the study area. Conclusion: The study has brought out differentiated health profiles between the various IRISes¹ that make up the Pays de Martigues territory. This should lead to the deployment of health services better suited to the needs of the population from a prevention perspective, and even to the promotion of consultations with certain specialists: for example, with a cardiologist in those IRISes where the number of unqualified people aged 15 or over is significant, since a relation was found at that level, most probably due to more difficult living and/or working conditions for these people. Likewise, prevention and social support measures should be taken for single-parent families. At the same time, companies should pursue the desulphurisation of their installations, and even reduce their emissions of some noxious pollutants. Trial registration: Decision DE-2017-413 authorising the Métropole d'Aix Marseille Provence to carry out the processing of personal health data for a retrospective and local environmental health study titled "Air-Health study in the Pays de Martigues territory." (Authorisation application No. 917120)


Background
Open onto the Mediterranean, on the shores of the Etang de Berre, the Pays de Martigues is a living area for close to 71,000 inhabitants spread over 3 communes: Martigues, Port-de-Bouc and Saint-Mître-les-Remparts. Beyond its geographical setting, Martigues is part of Europe's leading petrochemical complex and therefore has a high concentration of polluting industries affecting air quality and the health of the population.
It was only from 2013 that scientific studies began examining the health effects of air pollution around the Etang de Berre. First came the REVELA study, conducted by Santé Publique France between 2013 and 2016 and focused on kidney and bladder cancers and acute myeloid leukaemia, which showed incidence rates of bladder cancer higher than those observed in mainland France [1]. Then the FOS EPSEAL study [2,3], launched in 2015 and conducted in Fos-sur-Mer and Port-Saint-Louis-du-Rhône, attracted considerable attention because health data were collected from a random sample of inhabitants (participatory study) [4]. The study revealed that chronic diseases and acute symptoms were a health experience shared in both towns. The prevalence in adults of cumulative asthma (starting most of the time in adulthood), of cancers (notably in women) and of diabetes (notably type 1 diabetes) is higher in Fos-sur-Mer and Port-Saint-Louis-du-Rhône than the French average. Respiratory ailments (hay fever excepted) concern almost one in two adults and one in four children. The INDEX study [5], conducted in 2016 at Fos-sur-Mer by the Institut Ecocitoyen pour la Connaissance des Pollutions (IECP) on blood and urine samples from a selection of 138 inhabitants, revealed an over-impregnation of the population exposed by inhalation with lead, with two furans characteristic of industrial emissions and with benzene, but only in the oldest people. Gardening in an exposed zone was associated with higher impregnation with total PCBs compared with the control zone. Eating vegetables from one's own garden was associated with higher impregnation with cadmium in the exposed zone, whereas the effect was protective in the control zone (Saint-Martin de Crau). Consuming local seafood (fish, shellfish) was associated with higher levels of impregnation with PCBs, dioxins/furans, mercury and chromium.
Finally, the Scenarii and POLIS studies [6,7], conducted in 2010 by AtmoSud in 66 communes around the Etang de Berre and covering 39 substances, led to the calculation of an excess health risk in a number of overexposed sectors. This paper presents the results of a retrospective health geography study [8] crossing data on pollution, on the impact of pollution on the environment (lichens), on the precariousness of residents and on hospital stays by age group (178 variables in all), relating to the year 2015, in 3 communes of the Pays de Martigues south of the Etang de Berre (Martigues, Port-de-Bouc and Saint-Mitre-les-Remparts), at the scale of the 30 IRISes of the Pays de Martigues, hence at a fine spatial scale.

Materials
Health data
The health data consist of 18 variables covering the pathologies of interest for the 3 standard age groups (<15 years, 15-65 years, >65 years). They relate to patients who stayed in 2015 at the Martigues Centre Hospitalier (CHM), at the Martigues private clinic, at one of the 5 public hospitals² or at the Paoli Calmettes Institute (IPC). These data therefore relate to numbers of hospital stays, not numbers of patients: at the time the study was carried out, it was impossible in France to obtain patient numbers. Sex could not be used in addition to age, because combining age and sex is too discriminating and could lead to patients being identified.

The pathologies recognised in the literature as being connected with this type of pollution [9,10,11,12,13,14,15,16,17,18,19,20,21] correspond to the following ICD-10³ codes:
- I00 to I99: all diseases of the circulatory system, including for example ischaemic heart diseases and coronary diseases;
- J00 to J99: all diseases of the respiratory system: laryngitis, pharyngitis, sinusitis, tracheitis, bronchopulmonary diseases, bronchitis, bronchiolitis, asthma, etc.;
- E10 to E14: all types of diabetes (type 1 insulin-dependent and type 2 non-insulin-dependent);
- C34: all types of bronchus and lung cancer;
- C67: bladder cancer;
- C64: kidney cancer.
Working at a fine territorial scale requires enough cases per IRIS, hence the use of batches of codes (e.g. I00 to I99) rather than separate codes.

Air pollution data
These health variables were related to air pollution data, also dating from 2015, provided by AtmoSud, the approved air quality monitoring association of the Sud region and a partner of this study. In recent years, AtmoSud has developed an IT tool that maps annual pollutant levels in ambient air [6]. We thus had data on 44 pollutants, half in gaseous form and half in particle form, whose levels were estimated both from models (air emission inventories + dispersion) and from measurements (long time series or one-off campaigns carried out in recent years) (Tables No 1 & 2). We could choose between pollution values at the IRIS scale or at the scale of urban areas (built-up areas to whose perimeter a 100 m buffer was applied) (Figures No 1 & 2). We preferred the second option, which gives a more accurate measure of pollution in the population's living spaces (Map No 1), all the more so since some IRISes are vast and their population is sometimes gathered in a single area, as can be seen on Map No 1.

Lichen readings: bio-indication and bio-impregnation
In addition to these 44 air pollution variables, we have 109 others corresponding to measurements carried out on lichen samples in September 2017 by the Institut EcoCitoyen pour la Connaissance des Pollutions (IECP) on specific plots (see Map No 2). These plots were selected in cooperation with the project's various stakeholders and adapted according to the presence or absence of trees on site on which readings could actually be taken. Regarding the plots' spatial validity, a radius of 500 m around the GPS point retained for each bio-impregnation plot can be considered: according to a recent study by the IECP [22], the variability measured within a 500 m radius of a plot is at most 35% for metals and 30% for the various PAH congeners. The impregnation results represent a 6-month integration period. Measurements on living organisms such as lichens help to limit a confounding factor linked to residents' way of life rather than to the effects of air pollution (notably smoking); incidentally, most of the plots are located in IRISes known to be socio-economically disadvantaged.

Map No 2. Lichen sample plots, Source IECP

Socio-economic data
Finally, we used 7 socio-economic variables, because the effects of pollution do not affect everybody in the same way. Disadvantaged people may be more exposed to pollution, but may also have fewer resources to counter it (medical consultations postponed for lack of means⁴, less well controlled asthma and diabetes). These variables come from the infra-communal databases of the National Institute of Statistics and Economic Studies (INSEE) in 2017. They are:
- the number of people in 2017 in households that have lived in their home for over 10 years, a very useful variable for addressing people's exposure (P14_PMEN_ANEM10P);
- the number of unschooled people aged 15 or over without qualifications (P14_NSCOL15P_DIPL0);
- the number of people in households whose main family is a single-parent family (C14_PMEN_MENFAMMONO);
- the number of people in main homes occupied by tenants (P14_NPER_RP_LOC);
- the number of immigrants (P14_POP_IMM)⁵;
- the number of unemployed people aged 15 to 64 (P14_CHOM1564)⁶;
- the median income (DEC_MED14).
These variables describe the IRISes' level of precariousness or social disadvantage. We could have used a composite index, but we considered it important to keep these variables disaggregated, in order to see precisely on which of them differences between IRISes appear.
Altogether, these 178 variables × 30 IRISes make up 5,340 cells in our database. They are integrated within a single Bayesian model⁷, whose probabilistic formalism is particularly suited to environmental health issues.
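As a quick consistency check, the counts given in the Materials section (18 health, 44 air pollution, 109 lichen and 7 socio-economic variables) add up to the 178 variables and 5,340 cells stated above. The dictionary keys below are illustrative labels chosen for this sketch, not names from the study database.

```python
# Variable counts per data family, as reported in the Materials section.
# The keys are illustrative labels, not field names from the study database.
VARIABLE_COUNTS = {
    "health": 6 * 3,       # 6 ICD-10 code batches x 3 age groups
    "air_pollution": 44,   # AtmoSud pollutants (gaseous and particle phase)
    "lichen": 109,         # IECP lichen readings
    "socio_economic": 7,   # INSEE infra-communal variables
}
N_IRIS = 30  # IRISes in the Pays de Martigues


def tally(counts, n_units):
    """Return (number of variables, number of cells in the variables x units table)."""
    n_vars = sum(counts.values())
    return n_vars, n_vars * n_units


n_vars, n_cells = tally(VARIABLE_COUNTS, N_IRIS)
print(n_vars, n_cells)  # 178 5340
```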

The Bayesian networks
Bayesian networks make it possible to establish relations of dependence or independence between variables and to analyse their interrelations and combinations by quantifying them with probabilities [23,24,25,26,27,28]. They provide a mathematical (probabilistic) formalism for representing uncertain knowledge that is particularly well suited to health risk issues, because relations between variables in health matters are not always deterministic but often indirect. Moreover, Bayesian networks can both model existing knowledge and produce new knowledge by revealing hitherto hidden causal relations (causal inference) or latent variables, within the framework of unsupervised analysis (causal knowledge discovery). They often bring added value in terms of knowledge and are a useful tool for the quantitative modelling of complex systems in uncertain fields, whether in health or elsewhere. This is why they are increasingly used worldwide, in fields as diverse as industry, finance, marketing and security [29].
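To make the probabilistic formalism concrete, here is a minimal sketch of a three-node network in which a pollutant and a socio-economic variable both point to a number of hospital stays. The joint probability factorises along the arcs, P(Poll, Socio, Stays) = P(Poll) P(Socio) P(Stays | Poll, Socio), and conditioning on an observed state is done by summing out the other variables and renormalising (exact inference by enumeration). All variable names and probabilities are hypothetical; software such as BayesiaLab performs this kind of inference on far larger networks.

```python
# Hypothetical two-state variables; every probability below is invented.
P_POLL = {"low": 0.7, "high": 0.3}    # P(Pollutant)
P_SOCIO = {"fav": 0.6, "dis": 0.4}    # P(Socio): favourable / disadvantaged
# P(Stays = "high" | Pollutant, Socio); P(Stays = "low" | ...) is the complement.
P_STAYS_HIGH = {
    ("low", "fav"): 0.10,
    ("low", "dis"): 0.25,
    ("high", "fav"): 0.30,
    ("high", "dis"): 0.55,
}


def joint(poll, socio, stays):
    """Joint probability, factorised along the arcs Pollutant -> Stays <- Socio."""
    p_high = P_STAYS_HIGH[(poll, socio)]
    p_stays = p_high if stays == "high" else 1.0 - p_high
    return P_POLL[poll] * P_SOCIO[socio] * p_stays


def posterior_pollutant(stays):
    """P(Pollutant | Stays = stays): sum out Socio, then normalise."""
    unnormalised = {
        poll: sum(joint(poll, socio, stays) for socio in P_SOCIO) for poll in P_POLL
    }
    z = sum(unnormalised.values())
    return {poll: p / z for poll, p in unnormalised.items()}


print(posterior_pollutant("high"))
```

With these invented numbers, observing a high number of stays raises the probability of high pollutant exposure from its prior of 0.3 to about 0.52.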
A Bayesian network is made up of two elements: a conceptual map and a database. The conceptual map defines the network's structure; it is a tool for organising and representing knowledge. This map is built either from the data in the database or from expert knowledge of the subject studied. In our case, we chose to extract the knowledge contained in our database⁸.
The network's structure (or conceptual map) is represented visually by nodes, for the variables, and directed arcs, for the causal relations linking them.
⁷ The variables are discretised as soon as they are integrated into the software, so that they can be compared.
⁸ All processing was carried out using BayesiaLab 7.0.1, a software package dedicated to Bayesian networks.
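Footnote 7 states that variables are discretised as soon as they enter the software, and the monitoring section later mentions K-means with 5 classes. BayesiaLab's exact discretisation procedure is not described in the text, so the sketch below assumes a plain one-dimensional Lloyd's K-means with a deterministic initialisation at evenly spaced order statistics, and places class limits halfway between consecutive centroids. The data are hypothetical.

```python
def kmeans_1d(values, k=5, n_iter=100):
    """Lloyd's K-means on a single variable; returns the sorted centroids."""
    xs = sorted(values)
    # Deterministic initialisation at evenly spaced order statistics (an assumption).
    centroids = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # Assign x to its nearest centroid.
            j = min(range(k), key=lambda c: abs(x - centroids[c]))
            clusters[j].append(x)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)


def class_limits(centroids):
    """Class boundaries halfway between consecutive centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def discretise(x, limits):
    """Index (0 .. k-1) of the class that x falls into."""
    return sum(x > lim for lim in limits)


# Hypothetical annual concentrations for 15 spatial units.
values = [0.9, 1.0, 1.2, 4.9, 5.0, 5.1, 9.8, 10.0, 10.2,
          19.5, 20.0, 20.5, 39.0, 40.0, 41.0]
limits = class_limits(kmeans_1d(values, k=5))
print([discretise(v, limits) for v in values])
```

Each distinct group of values ends up in its own class (0 to 4), which is the behaviour K-means discretisation aims for when the data are clearly clustered.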

The variables are then represented in node form:

Figure No 3. The variables represented in node form
Between the nodes, causal links are drawn using an unsupervised artificial intelligence learning algorithm of the "maximum spanning tree" type, which runs until all the variables are connected. The resulting conceptual map for our 178 variables, obtained without supervision, is the Bayesian network proper (Figure No 5). To better distinguish the nature of the variables in the Bayesian network below, we have colour-coded them: orange for pathologies, green for variables from the lichen readings, purple for air pollutants and blue for socio-economic variables.
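The text does not state which score BayesiaLab's maximum spanning tree algorithm uses. A common textbook version is the Chow-Liu construction: weight every pair of discretised variables by their empirical mutual information and keep the spanning tree of maximum total weight (here with Kruskal's algorithm). The sketch below assumes that weighting; it yields an undirected skeleton, and orienting the arcs is a separate step. Variable names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def maximum_spanning_tree(data):
    """Kruskal's algorithm on MI edge weights; data maps name -> list of states."""
    names = list(data)
    edges = sorted(
        ((mutual_information(data[a], data[b]), a, b) for a, b in combinations(names, 2)),
        reverse=True,  # strongest dependencies first
    )
    parent = {v: v for v in names}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding (a, b) does not create a cycle
            parent[ra] = rb
            tree.append((a, b, w))
    return tree


# Hypothetical discretised states for 8 spatial units.
data = {
    "A": [0, 0, 1, 1, 2, 2, 0, 1],
    "B": [0, 0, 1, 1, 2, 2, 0, 1],  # identical to A -> strongest link
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
    "D": [0, 1, 0, 1, 0, 1, 0, 1],  # identical to C
}
for a, b, w in maximum_spanning_tree(data):
    print(f"{a} - {b}: MI = {w:.3f}")
```

With n variables the tree always has n - 1 edges, which matches the text's description of an algorithm that runs until every variable is connected.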
Let us consider the health variables in orange, bearing in mind that they are the very object of this study⁹, and see which other variables they are connected with. The first relation links the number of people in households that have lived in their IRIS for over 10 years (P14_PMEN_ANEM10P) to the relative abundance of lichen species¹⁰ (positive relation, +0.70). It could be interpreted as follows: the higher the relative abundance of lichen species, the more people tend to stay where they live (amenities). There is also a negative correlation between the relative abundance of lichen species and CB194 (-0.70); in addition, a weaker positive correlation (+0.6) links P14_PMEN_ANEM10P to Co (carbon monoxide), and to cardiovascular diseases in the 15-65 age group (I00I1991565ans, +0.65).
A second relation can be observed in patients over 65 between cardiovascular (I00I19965ans) and respiratory diseases (J00J19965ans). It reveals a comorbidity well known to physicians [30], which already appears in younger patients (aged 15 to 65: I00I1991565ans and J00J1991565ans). It is explained by the chronic nature of these pathologies (I00I1991565ans - I00I19965ans, +0.67), as is the case for diabetes (E10E1465ans - E10E141565ans, +0.73). We also note a rather strong relation (+0.78) between respiratory diseases in the over 65 and a dioxin-like PCB recognised for its toxicity: CB169¹¹ [31].
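Footnote 11 gives the WHO 2005 toxic equivalency factor (TEF) of CB169 as 0.03, and footnote 12 defines the TEQ as the sum over congeners of concentration × TEF. A minimal sketch of that weighted sum, assuming hypothetical lichen concentrations expressed in a common unit; the TEF of PCB 126 (0.1) is the WHO 2005 value and is added only to show the summation over several congeners.

```python
# WHO 2005 toxic equivalency factors. CB169 = 0.03 is quoted in footnote 11;
# PCB 126 = 0.1 is the WHO 2005 value, included only to illustrate the sum.
TEF_WHO_2005 = {"CB169": 0.03, "CB126": 0.1}


def teq(concentrations, tef=TEF_WHO_2005):
    """TEQ = sum of concentration x TEF over the congeners that have a TEF.

    Congeners without a TEF (e.g. non-dioxin-like PCBs such as CB28) contribute nothing.
    """
    return sum(c * tef[name] for name, c in concentrations.items() if name in tef)


# Hypothetical lichen concentrations, all in the same unit (e.g. ng/kg dry weight).
sample = {"CB169": 10.0, "CB126": 2.0, "CB28": 500.0}
print(f"TEQ = {teq(sample):.2f}")  # 10 x 0.03 + 2 x 0.1 = 0.50
```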
An "interesting" link is observed (+0.63) between cardiovascular diseases in the over 65 (I00I19965ans) and a socio-economic variable, absence of qualifications (P14_NSCOL15P_DIPLMIN), and also between income (DEC_MED14) and exposure to HCl (hydrochloric acid; negative relation of around -0.65): the higher a household's income, the less it is exposed to this pollutant. At this stage of the analysis, it is time to take a closer look at the relations found between pathologies, pollutants and socio-economic factors. To do so, we will monitor these relations in order to observe how the variables interact. Indeed, the contribution of AI is not limited to easily establishing causal inferences between all the variables considered simultaneously and representing them very didactically in a conceptual map: it also makes it possible to examine these interactions in detail.
We will take the 2 strongest relations between a pathology and a variable of a different nature (environmental or socio-economic). We will therefore analyse the interactions between respiratory diseases in the over 65 and CB169, and between diabetes in the over 65 and a heavy metal in its particle phase, cadmium (Cd_p).

Interactions between variables: monitoring
Interactions between respiratory diseases in the over 65 (J00J19965ans) & CB169 (R +0.78)
When analysing the relations, we saw that respiratory pathologies in the over 65 are linked with CB169 and that the relation is positive overall (+0.78). But what happens, more precisely, at the level of the various classes (modalities) of these 2 variables, and how do they interact? To answer this question, we monitor them. The relation is first located in the Bayesian network (a), then the variables are monitored (b), and we observe what happens at the level of some of their modalities (c). The number of modalities corresponds to the number of classes requested (K-means, 5 classes).
¹¹ Toxic equivalency factor (TEF) of CB169: 0.03
¹² TEQ 2005 corresponds to the sum of PCBs and dioxins/furans expressed in toxic equivalents. The toxic equivalent of each congener is obtained by multiplying its concentration by its toxic equivalency factor (TEF), which weights the concentration of each congener by its "toxic efficiency". The TEFs used are those defined by the World Health Organization (WHO) in 2005 (Van den Berg, 2006).
¹³ Lichen Diversity Value (standard EN 16413)
What is particularly interesting now is to force one of these classes, for example the modality corresponding to the highest number of stays (here: number of stays > 51.75), which then turns green, and to observe how the associated probabilities of CB169 are instantly updated. When we focus on the highest numbers of stays, the probabilities of the higher CB169 values increase, mainly for the last 2 modalities (for example from 16.67 in the initial monitoring to 31.86 in d.1). In other words, the highest numbers of hospital stays recorded for respiratory pathologies in the over 65 in this territory are associated with the highest values of CB169 found in the lichen readings¹⁵.
The most frequent modality (44.38%) was also forced, without producing any notable change in the monitoring in that case.
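Forcing a modality means setting hard evidence on one node and recomputing the probabilities of the others. With no other evidence set, forcing a state of the stays variable amounts to reading off the conditional distribution of CB169 from the joint distribution of the two variables: keep the cells that match the forced state and renormalise. The sketch below uses three invented classes instead of the study's 5 K-means classes, and all probabilities are hypothetical.

```python
# Hypothetical joint distribution P(Stays, CB169) over three illustrative classes
# (the study used 5 K-means classes; these numbers are invented).
JOINT = {
    ("low", "low"): 0.20, ("low", "mid"): 0.10, ("low", "high"): 0.05,
    ("mid", "low"): 0.10, ("mid", "mid"): 0.15, ("mid", "high"): 0.10,
    ("high", "low"): 0.05, ("high", "mid"): 0.10, ("high", "high"): 0.15,
}


def marginal_cb169(joint):
    """P(CB169): sum the joint over the Stays states (the initial monitoring)."""
    out = {}
    for (_stays, cb), p in joint.items():
        out[cb] = out.get(cb, 0.0) + p
    return out


def force_stays(joint, stays_state):
    """P(CB169 | Stays = stays_state): keep the matching cells and renormalise."""
    rows = {cb: p for (stays, cb), p in joint.items() if stays == stays_state}
    z = sum(rows.values())
    return {cb: p / z for cb, p in rows.items()}


prior = marginal_cb169(JOINT)
posterior = force_stays(JOINT, "high")
for cb in ("low", "mid", "high"):
    print(f"CB169={cb}: {prior[cb]:.2f} -> {posterior[cb]:.2f}")
```

Here forcing the highest stays class raises the probability of the highest CB169 class from 0.30 to 0.50, which is the kind of shift the monitoring displays.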
Interactions between diabetes in the over 65 (E10E1465) & Cd_p (R +0.52)
A relation has appeared between diabetes in the over 65 and cadmium in its particle phase. Cadmium is ingested with food (via deposition and air-to-soil pollution transfer) and could disturb insulin metabolism in the pancreas [32,33]. So far the analyses have been global, covering all the IRISes of the Pays de Martigues territory. It is now time to move to a finer scale to see how these relations vary between IRISes and whether some are more exposed than others. This is done through a new process that produces sensitivity analyses for each of the 30 IRISes in our study area (study design, Figure 8 below).

Sensitivity analyses per IRIS
From now on, the IRIS variable becomes the most important variable: it is the target variable. Below is the result of the supervised learning algorithm of the "augmented naive Bayes" type run on the IRIS target variable, for pathologies. As an example, the sensitivity analysis on pathologies carried out on IRIS 130560101 (Côte Bleue) in Martigues shows that respiratory diseases in the over 65 have a high probability of being found in that IRIS, approximately 50%, followed by cardiovascular diseases in the 15-65 age group at 35%, and so on.

Figure No 11. Sensitivity/pathologies analysis for the Côte Bleue IRIS (130560101) in Martigues
We then took the probability values¹⁶ for each IRIS, for pathologies only, and divided them by the % of individuals in the corresponding age group, in order to contextualise them and eliminate possible size effects where the proportion of elderly people is higher in one IRIS than in others. A high value therefore indicates a high number of stays for the pathology and IRIS in question, independently of the % of elderly people in that IRIS. These values were then entered into an Ascending Hierarchical Clustering (HCA, below) to form groups of IRISes with similar health profiles. In total, 5 groups of IRISes can be clearly distinguished.
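The normalisation and clustering steps described above can be sketched as follows. The text does not specify the distance or linkage used in the HCA; this sketch assumes Euclidean distance with average linkage (Ward's method is another common choice), in a naive bottom-up implementation that is adequate for about 30 IRISes. IRIS labels, probabilities and age-group shares are hypothetical.

```python
from itertools import combinations
from math import dist


def normalise(probabilities, age_shares):
    """Divide each pathology probability by the share (0-1) of the IRIS population
    in the age group that pathology refers to, removing age-structure size effects."""
    return [p / s for p, s in zip(probabilities, age_shares)]


def average_linkage_clusters(points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two clusters with the smallest
    mean pairwise Euclidean distance. Returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical IRISes: probabilities for (respiratory >65, cardiovascular 15-65)
# and the share of residents in the matching age groups.
IRIS = {
    "A": ([0.50, 0.35], [0.25, 0.60]),
    "B": ([0.48, 0.33], [0.24, 0.58]),
    "C": ([0.10, 0.20], [0.20, 0.65]),
    "D": ([0.12, 0.22], [0.22, 0.62]),
}
names = list(IRIS)
profiles = [normalise(*IRIS[n]) for n in names]
groups = [[names[i] for i in c] for c in average_linkage_clusters(profiles, 2)]
print(groups)  # [['A', 'B'], ['C', 'D']]
```

In the study the same procedure would be run on 30 IRIS profiles with the number of groups set to 5.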
The second group differs from the others by fewer cardiovascular, respiratory and diabetes pathologies, but a higher rate of lung cancers in the over 65 (C3465ans).
What sets group No 3 apart are the high rates of cardiovascular diseases in the under 15 and, above all, of cardiovascular diseases in the over 65 (1.21 on average), higher than in the first group (0.81). The same is true of respiratory diseases from the age of 15, with rates almost twice as high as those of group 1 (1.33 against 0.70). Diabetes in the under 15 also distinguishes these IRISes. Bronchus and lung cancers in the over 65, without reaching the values of the second group, are also noteworthy (0.62 on average), as are bladder cancers in the same age group, though without exceeding group 1 (0.64 on average).
Group 4 is in a way the reverse of group 2: we observe more cardiovascular pathologies in the over 65 (1.13 on average), more respiratory pathologies in both the over 65 and the under 15, and a lower cancer rate. There is thus a dissociation between pathologies: observing certain pathologies does not mean that there will be a higher rate of cancers (notably bronchus and lung cancer). This could be linked to the different nature of the pollutants, which do not induce the same effects.
Finally, the last group includes IRISes with comparatively lower pathology rates than the other groups. It is used as the control group for our second verification of the model. It may be tempting to take an "out-of-area" control group estimated to be less exposed to pollution, usually a "countryside" area. However, the endocrine disruptors to which the population can be exposed via pesticides can induce the same pathologies (diabetes) [35,36], which could confuse "control case" comparisons, since we might end up with high diabetes rates on both sides.

Looking for causal factors
We then searched for the causal factors of the health profiles, which may be environmental and/or socio-economic in nature; the combination of both can form what we call pathogenic spaces [34]. Sensitivity analyses were carried out on each IRIS, taking simultaneously into account all the air pollutants provided by AtmoSud (44 pollutants), the 17 lichen-reading variables previously found to play a part in pathologies¹⁷, and the 7 socio-economic variables, i.e. a new total of 68 variables (Figure No 12). The aim is to distinguish the IRISes where environmental and/or socio-economic factors have an influence.

We distinguished 3 main families of IRISes. In each, pollutants were numerous and formed what is commonly called a cocktail of pollutants [37,38], containing either a majority of PAHs, a majority of PCBs, or pollutants belonging to neither the PAH nor the PCB family, which we have therefore designated as "Others" but whose potential impact on health nonetheless deserves attention. In total, we have 3 IRIS profiles:
- Cocktail of PAH pollutants (7 IRISes)
- Cocktail of PCB pollutants (8 IRISes)
- Cocktail of "Others" pollutants (15 IRISes)
The first variable to come out of each sensitivity analysis has a probability value that, in a way, indicates its importance in the IRIS. At times, this ranking was qualified in light of the surrounding variables. For example, in the Plaine de Courouche IRIS, the first variable in the sensitivity analysis is butadiene (0.31), but we preferred to retain CB20, just behind at 0.29, because it is a PCB potentially more harmful to health. Likewise, where relevant, we indicated the socio-economic variable that ranked first and integrated it into the cocktail of pollutants to give an idea of the context. Socio-economic conditions, as we know, can add to the effect of pollutants, but when favourable (better access to health care, better quality of life, better diet, etc.) they can also offset it in some cases.
PAH cocktails group together several pollutants emitted at the same time; their number per IRIS varies by a factor of two (from 6 to 12), with repercussions on pathologies: the more PAHs, the more severe the pathology. This is attributed to the carcinogenic nature of these PAHs¹⁹ (BaP, BaA, BbF, BkF, DahA, fluoranthene, IcdP) and to their potentially noxious mixing. The favourable socio-economic situation of the Jonquières Foulettes IRIS (130560107) (income above average) seems to offset its residents' exposure, in the same way as for the Coudoulière IRIS (130560113), but to a lesser extent, because PAHs are less numerous in that IRIS.
In the IRISes with a PCB-type cocktail, the health impact is essentially characterised, even at low doses (owing to the toxicity of PCBs), by bladder cancers in the over 65, for example (Plan de Fossan IRIS, 130560114).
Finally, the last category, "Others", covers pollutants that cannot be classified in the previous categories. It includes "traditional" pollutants such as NO2 or SO2, but also hydrofluoric acid (HF), as well as other potentially harmful elements such as V_p, Hg, Zn_p, Cd_p, Cu_p and PCDD_PCDF. Let us now consider the pathologies found in the IRISes and "correlate" them with the highest-ranking factors found there. The results are shown in the maps below.
¹⁹ https://www.atsdr.cdc.gov/csem/polycyclic-aromatic-hydrocarbons/health_effects.html

Results
Group 1 of the HCA is exposed to particularly potent pollutants: PCDD_PCDF at La Lèque, CB28²⁰ at Plan Fossan, HF at PdB Centre and St-Jean Bergerie, and dibenzo[a,h]anthracene (DahA_p) at Boudème. In 4 of the 6 IRISes, socio-economic factors (represented by white squares) combine with the pollutants and correspond to single-parent families. In fact, this is the prevailing variable for the Rayettes IRIS (0.31).
Profile of IRISes according to the probability of the prevalence of environmental and/or socio-economic factors
Map No 3. Group 1 HCA: IRISes characterised by an increased prevalence of hospital stays for all the pathologies studied

As for group 2 IRISes, they are exposed to PCBs (Tassy Est: CB20; St-Pierre & St-Julien: PCB), to SO2 (Tassy Ouest) and to vanadium [39,40] in its particle phase, at much higher levels than in other IRISes for Lavéra and Côte Bleue. The Jonquières Est IRIS is exposed to over 11 different PAHs and the Notre Dame Paradis IRIS to NO2; in the latter, as in Tassy Est, socio-economic factors also seem to play a pernicious part (unschooled people aged 15 or over, and unemployment). The nature of the pollutants (PCBs, vanadium) and their multiplicity (11 different PAHs) could explain the pathologies found in the IRISes of this second HCA group.
Profile of IRISes according to the probability of the prevalence of environmental and/or socio-economic factors
Map No 4. Group 2 HCA: IRISes characterised by an increased prevalence of hospital stays related to lung cancers (>15) and bladder cancers

From the 4th group onwards, the number of stays is relatively lower. Canto Perdrix and Les Comtes Est stand out for cardiovascular diseases in the over 65, with a prevalence of BkF_p for Canto Perdrix, and a rather disadvantaged socio-economic situation and a seemingly deleterious cocktail (DCE, V_p, Hg and HF) for Les Comtes Est, since an impact is also observed there on respiratory diseases in the over 65.

Finally, the control group is characterised by an overall number of pathologies lower than elsewhere and, above all, by less dangerous pollutants and/or lower quantities than in the IRISes of the previous groups. Moreover, a more favourable socio-economic situation²¹ tends to offset their effects, particularly in St-Mitre Centre. Nevertheless, it is "interesting" to note that SO2 alone has an influence on respiratory diseases in the 15-65 age group in the Figuerolles IRIS.
Profile of IRISes according to the probability of the prevalence of environmental and/or socio-economic factors
Map No 7. Group 5 HCA: IRISes characterised by a lower prevalence of pathologies ("Control group")
²¹ Except in Comtes Ouest

The summary map below depicts PAH emissions in the north-east of the Pays de Martigues (Jonquières, Boudème), vanadium and SO2 emissions in the southern neighbourhoods of Martigues (Lavéra, Côte Bleue), and PCB pollution over the entire territory. In the centre of the Pays de Martigues, disadvantaged socio-economic conditions combine with the pollutants.
Profile of IRISes according to the prevalence of environmental and/or socio-economic factors
Map No 8. Summary map

Discussion
Like any study, this one has limitations, which we set out here. First, the health data relate to numbers of hospital stays and not numbers of patients, because when this study was conducted we could not access patient numbers. Cases may therefore be overestimated, since some pathologies may have required several hospital stays (this is particularly true of bladder cancers). Likewise, because we work at a fine scale, the number of cases for some pathologies in some IRISes can be low and induce biases. However, because Bayesian networks work, in a way, by comparing changes in proportions between variables, the risk of bias is minimised. Using annual averages for atmospheric pollutants rather than deciles tends to smooth out their effects. Pollutant data, like health data, are from 2015; we know that for cancers the time lag between exposure and effect can be long, but this seems to be less so for respiratory and cardiovascular diseases and diabetes. We also took only residential exposure into account, not occupational exposure or exposure related to daily commuting. Finally, although the lichen plots are scattered over the entire study territory and located according to the IRISes' socio-economic level, more plots would have allowed us to consolidate our results further. Environmental health studies are often criticised because of the difficulty of getting to the root causes of pathologies. Another frequent argument is the multifactorial nature of an individual's health. Furthermore, these studies are most often ecological in design and can lead to attributing to individuals the effect of pollutants measured over an area (ecological fallacy), although this argument is itself being questioned [42].
Nonetheless, by taking into account as many variables as possible in terms of pollutants and socio-economic factors that may influence residents' health, doing so at a fine scale (the closest to that of most individuals living in urban areas), starting from proven pathologies, and using an appropriate mathematical formalism based on conditional probabilities, we can put forward a number of presumed causal factors. We have thus been able to bring out several relations between exposure to xenobiotics and proven pathologies, serious enough to have required a hospital stay in 2015, which suggest the recommendations below.
Among the relations found, efforts must focus on reducing SO2 emissions, since SO2 was observed to have an impact from a very low threshold, far below the standards. Companies must continue and step up the desulphurisation of their installations or, where possible, use lower-sulphur fuel. However, SO2 exposure is not only due to companies operating in the territory: the annual volume of tankers delivering their cargo to the Fos-sur-Mer industrial port zone also plays a part, and here again lower-sulphur fuel could help improve residents' health. There also appears to be diffuse PCB pollution in this territory, linked to respiratory diseases in the over 65, which should be investigated more closely so that appropriate measures can be taken quickly (site remediation by phytomanagement) [43,44].
Several thresholds have appeared above which certain pollutants are particularly noxious, and the Bayesian model has helped to reveal them. This is the case, for example, for hydrofluoric acid (HF), whose effect already appears at the 0.0028 µg/m³ threshold, and for cadmium in its particle phase, which, for an exposure between 0.214 and 0.250 µg/m³, disrupts insulin metabolism in the over 65. Finally, as soon as benzo[k]fluoranthene (particle phase) exceeds the 0.672 µg/m³ threshold, this is reflected in a higher number of hospital stays for respiratory diseases in the 15-65 age group.
The role of specific pollutants identified in the IRIS profiles, such as chromium, chromium VI and vanadium in their particle phase, deserves appropriate monitoring. Emissions of these pollutants should therefore be reduced below such thresholds when technically and financially possible (substitute products, less polluting industrial processes, more effective filters).