Benzo[a]pyrene sourcing and abundance in a Coal Region in Transition reveals historical pollution, rendering soil screening levels impractical. CURRENT STATUS:

Benzo[a]pyrene (BaP) is a hazardous compound for human health and for environmental compartments. Its transfer and deposition through the atmosphere affects soil quality. In this regard, soil screening levels are commonly used to monitor the degree of soil contamination. However, these thresholds are often established without considering historical anthropogenic activities that affect soil (diffuse pollution). In this context, this study had the following objectives: 1) to identify the potential pollutant load of BaP, as well as other Polycyclic Aromatic Hydrocarbons (PAHs), in the soil of an area in a prominent Coal Region in Transition in NW Spain; 2) to test whether soil screening levels are realistic and whether they reflect the complexity of regions closely linked to heavy industries and mining; and 3) to determine whether there is a relationship between PAHs and inorganic pollutants.

The municipality of Langreo (Asturias, NW Spain) is an area characterised by heavily industrialized mining activity. In this regard, multiple types of industries-some still in operation and others closed or abandoned-have released pollutants into the environment. By the mid XIX century, these industries had taken up coal mining as a primary activity. Since then, metallurgy, coal power stations, and chemical and fertilizer industries, among others, have successively occupied the main valley in the area [59]. Nowadays, the continuous transition towards cleaner energies and other technologies has impacted this area and many other similar regions in Europe [37], which have all experienced progressive abandonment that started with mine closures. In this context, the European Commission has recently drawn up a list of 14 Coal Regions in Transition, which includes Langreo. Along the same lines, in 2017, the European Commission launched a platform to encourage environmental studies of these areas, among other actions [3].
Potential soil pollution in the area of Langreo and also in specific brownfields has been previously addressed at various scales [9,10,22]. It has been concluded that soils of the zone present significant pollution by Potentially Toxic Elements (PTEs), in particular As, Cu, Hg, Pb and Zn in Langreo itself [9] and the neighbouring area [26]. In this context, the simultaneous presence of PAHs and PTEs is foreseeable since both types of pollutant can be released by the same sources, as is the case of coal power stations releasing Cd, Cr, Pb, Tl, or V together with BaP and other PAHs [38,61]. In this regard, soils, particularly in urban-peri-urban environments, can provide a geological record of multi point-source pollution [13,75,77].
Given the preceding considerations, this study had the following objectives: 1) to examine the presence of BaP and another 15 (EPA) priority PAHs in the soils of the municipality of Langreo (82 km 2 ), located in the heart of a Coal Region in Transition; 2) to evaluate the validity of Spanish RBSSLs for BaP in the context of regions highly linked to coal mining and heavy industries; and 3) to ascertain whether there is a link between the PAH and PTE pollution observed in Langreo in previous studies. To this end, the study applied a methodology combining multivariate statistical and geostatistical approaches with molecular ratios. We propose that the strategy used herein can be easily extrapolated to studies of similar characteristics.

Study area and sampling strategy
The municipality of Langreo (Asturias, NW Spain, Fig. 1) is a traditional mining and industrial area whose origins date back to 1850. The most predominant activities have been coal mining, ferrous metallurgy, coal power stations and chemical industries [42]. This long and extensive activity has impacted environmental compartments, as evidenced in previous studies, e.g. PTEs in urban air [46], soils [10] and groundwater [7], and the presence of a number of abandoned brownfields in the area [22].
In the context of geomorphology, the Nalón river crosses this 82 km 2 municipality, which is comprised of a series of valleys with altitudes ranging from 200 to 900 m. Our sampling consisted of a representative set of data collected from grasslands, forests, and natural soils in rural and peri-urban areas. The sampling design followed a stratified systematic strategy, covering 10 equidistant transects (250 m wide and 1000 m apart) perpendicular to the Nalón river [9]. This specific direction was chosen to achieve representativeness of the hydrological sub-basins, which in this case are perpendicular to the river. Under this premise, 150 samples were gathered, with the number of samples per transect determined proportionally to its length, the location of sampling within the transects being random (Fig. 1).
Samples were taken from the first 20-25 cm of soil in each vertex of a 1-m square and its central point. After this, composite samples were passed through a 2-cm mesh screen to remove large material such as organic matter, rocks and gravel. They were then transported to laboratory, stored at 4ºC to prevent biodegradation, and air-dried at room temperature to prevent evaporation.

Analytical procedures
Representative soil subsamples weighing 10 g were extracted with dichloromethane:acetone (1:1) in a Soxtherm system (Gerhardt). Concentrations of PAHs were then determined after sample injection into a 7890A GC System coupled to a 5975C Inert XL MSD with a Triple-Axis Detector (Agilent Technologies), following Environmental Protection Agency (EPA) method 8272 with modifications. A capillary column DB-5 ms (5% phenyl and 95% dimethylpolysiloxane) 30 m x 0.25 mm i.d. x 0.25 µm film (Agilent Technologies) was used, with He as carrier gas at a flow rate of 1 ml·min − 1 . The initial oven temperature was 70ºC (held for 2 min), and it was ramped up at 20 ºC/min to 220 ºC, raised to 270 ºC at 10 ºC/min (held for 1 min), then raised to 290 ºC at 10 ºC/min (held for 1 min), and finally ramped up at 10 ºC/min to 300ºC (held for 7 min). The gas chromatography (GC) injector was operated in splitless mode for 2 min, and its temperature was maintained at 260ºC. The mass spectrometer was operated in selected ion monitoring mode (SIM), and the quantification m/z relations were as follows: 128 (Naphthalene, abbreviated N); 152, 153 and 154 (Acenaphthylene, Acy; and Acenaphthene, Ace); 165 and 166 (Fluorene, F); 178 (Phenanthrene, Phe; and Anthracene, Ant); spectrometer was operated in electron ionization mode (EI) at 70 eV and was calibrated daily by autotuning with perfluorotributylamine (PFTBA). In addition, a qualitative full-scan analysis with a mass range acquisition from 45 to 500 (m/z) was also performed in relevant samples (see results).
In addition, the PTE content of the 150 samples was taken from a geochemical database reported in a previous study [9]. In brief, measurements were carried out by Inductively Coupled Plasma Mass Spectroscopy (ICP-MS) after 1:1:1 "aqua regia" digestion (ISO 9002-accredited Bureau Veritas Laboratories, Vancouver, Canada). The elements analysed and detection limits were: As (0.1 ppm); Ba

Outlier identification and clustering
Correct application of the multi-variate statistics requires previous detection of extreme values representing anomalous data. Therefore, outliers were identified by means of the percentile 95 of the robust Mahalanobis distance, which is an accepted method to identify anomalous data [44]. The general expression is defined by Eq. (1) as where Xa and Xb are the pair of objects and C is the sample covariance matrix. All samples identified as outliers through this equation were included in Cluster 1 (see below).
The remaining samples were grouped in terms of a HCA (Hierarchical Clustering Analysis) that was carried out using SPSS v. 24 software by means of the Ward algorithm and Euclidean distance [50].

Molecular ratios
PAHs were divided into two main groups of compounds on the basis of the number of aromatic rings; low molecular weight PAHs (LMW PAHs, 2-3 aromatic rings) and high molecular weight PAHs (HMW PAHs, 4 or more aromatic rings). HMW PAHs are produced mainly by pyrogenic processes while LMW PAHs have both petrogenic and pyrolytic sources. Thus, a concentration ratio of HMW/LMW > 1 may reflect the pyrogenic nature of PAHs [71]. In subsequent steps, a large number of molecular ratios can be calculated conceived, and some such ratios were used in this study to propose sources (Table 1). As seen above, ratios such as Fluo/(Fluo + Pyr) and Ind/(Ind + BghiP) provide a larger discrimination within pyrogenic sources, whereas others such as A/(A + Phe) are applied specifically to differentiate 8 between petrogenic and pyrogenic sources.

Spatial distribution of Benzo[a]pyrene
The spatial distribution of a variable can be determined through geostatistical techniques. Of these, Ordinary Kriging (OK) is a widely accepted interpolation approach used in environmental studies when data present a Gaussian distribution. In this regard, the performance of an interpolation by OK requires the design of a variogram, that is used to calculate the spatial variation structure of regionalized variables [27,31]. It is defined as follows (Eq. 2) where h is the distance argument, and Z(Xi) and Z(Xi + h) are numerical values of a variable observed at point Xi and Xi + h. N(h) is the number of pairs forming for an h distance. Therefore, it is the median value of the square of the distances between all pairs of points in the geometric field spaced at an h distance [43].
In this study, OK was used to determine the spatial distribution of BaP in the soils of Langreo. Given the known presence of outliers, normality conditions could not be achieved with raw data and thus a log-transformation was used. The geostatistical simulation was done using Biomedware's SpaceStat v.

Correlation Benzo[a]pyrene -PTEs
The degree of relationship between BaP and PTEs was studied by factor analysis. The number of factors was determined by the Kaiser/Guttman criterion while the principal components method was used for factor extraction, following recommendations for geochemical data [21,63]. These calculations were made using SPSS v. 24.

Results And Discussion PAH content in the study area
The eescriptive statistics for PAH concentrations of the 150 soil samples in Langreo are shown in Table 2. At first glance, a notable variability in the concentrations of all the compounds can be appreciated. This observation implies marked heterogeneity in these soils in terms of PAH content, especially those with four or more rings (HMW PAHs), as suggested by many high RSD values, which are concordant with the differences between the mean and the median. On the whole, data did not show normality, thus evidencing the presence of outliers.
In turn, our results also revealed the predominance of HMW PAHs. If we apply the most restrictive Spanish RBSSLs, namely "other uses" (in fact this category accounts for more than 90% of the soils where the samples were taken), around 5-15% of samples exceeded the threshold levels for heavy PAHs such as BaA, BbF, DbA and Ind. Of note, BaP surpassed the legal limits in 79% of the samples, presenting a median that was almost 20 times the value of the RBSSL.

Outliers and clustering
The high RSDs suggest the presence of outliers in the data set. To identify these outliers, percentile 95 of the Robust Mahalanobis distance was used. The results are shown in Fig. 2. Seven samples were identified as outliers and separated into an independent group (namely Cluster 1). Once outliers had been identified, the statistical analysis was repeated with the remaining 143 samples. An improvement in normality was observed, mainly through the reduction of the RSD coefficient (see RSD*, in Table 2).
The exclusion of outliers is also useful to reduce uncertainty and noise, thereby enhancing group formation in the HCA. In this context, the dendrogram resulting from this multivariate analysis of with increasing distance from the urban-industrial core (Fig. 3). Note that this cluster contained samples that did not surpass the RBSSL for BaP and also some that surpassed this value slightly. Consequently, this group corresponded to soils with a lower degree of contamination.
On the whole, the concentration of PAHs in the soil increased dramatically with proximity to the urban and industrial core (Fig. 3). The pollution was, therefore, located mainly in areas that held the the largest number of sources.

PAH sources
Of note, the distribution of individual PAHs revealed a relatively homogeneous fingerprint for the four clusters (predominance of 3-4-ring compounds such as Phenanthrene, Fluoroanthene and Pyrene), thereby suggesting similar profiles and common sources (Fig. 4), with the possible exception of Cluster 1.
Regarding the molecular diagnostic ratios, the mean concentration ratio of ΣHMW/ΣLMW was higher than 1 for all the clusters, initially suggesting that the PAH source was pyrogenic (Table 3) [80].
Furthermore, the A/(A + Phe) ratio reaffirmed the pyrogenic hypothesis, as values higher than 0.1 were found [58]. The values found for the Ind/(Ind + BghiP) and Fluo/(Fluo + Pyr) ratios suggest grass, wood and coal combustion [17,79]. According to these data and the industrial profile of the area, coal burning activities (power station, metallurgical activities, chemical industry, etc.) are most likely to be the main source of PAHs. However, emission from the coal-fuelled heating systems in houses cannot be ruled out as these systems were used for decades (still active in many locations) both in rural and urban areas. Nevertheless, it should be noted that these and other ratios have several limitations [71]. In fact, relative proportions of the organic compounds are usually assumed to be conserved between each emission source and the downwind measurement point [65], and they do not take into account possible alterations of the target compounds in soil (biodegradation, etc.). This implies that the diagnostic ratio results should preferably be used as a guiding methodology to reinforce a hypothesis and not as a conclusive tool. In this context, here we applied the molecular diagnostic ratios combined with complementary interpretation methods, such as the cluster analysis and the factor analysis (see below).
On the other hand, the previous statement (diffuse pollution caused by pyrogenic PAH sources) could be masking the additional existence of potential point-sources. In this context, although the identification of local point sources (associated with brownfields, spills, waste disposal, etc.) was beyond the objectives of this work, the two main outliers found (samples 76 and 29 as shown in Fig. 2) were analysed in-depth to exemplify the superposition of diffuse and point source effect., this was done using a qualitative GC-MS approach. As shown in Fig. 5a, the chromatographic fingerprint of sample 76 differed greatly to that of 29. Indeed, the former reflects the fingerprint of a coal tar distillate [19] while the latter shows a typical profile of an oil derivative [36]. However, at the same time, the star diagram revealed a very similar PAH fingerprint profile for both samples (Fig. 5a). For sample 76, additional information from specific compounds such as alkanes (Fig. 5b) revealed the abundance of odd carbon number n-alkanes, typical of terrestrial vegetation [57; 72], whereas the alkane fingerprint of sample 29 showed the abundance of light-medium n-alkanes and isoprenoids, as corresponds to petroleum products [57]. Finally, detailed insight into a representative PAH (phenanthrene, see Fig. 5c) showed a clear predominance of the parent-PAH in sample 76, thereby pointing to the coal tar hypothesis [67]. In contrast, the PAH alkyl-derivatives in sample 29 were much more abundant, suggesting a mixture of both pyrogenic and petrogenic inputs [79].

Benzo[a]pyrene: A real anomaly in the study area?
A geostatistical analysis of BaP content was carried out to determine its distribution across the study area. OK was applied after performing a log-transformation of the 150 data set of BaP analyses in order to improve normality conditions, which are mandatory in kriging. The resulting map is shown in  (Table 4). However, the background of Langreo would even surpass many of these limits. The wide range of toxicological data and methodologies used to establish such limits do not consider the effects of historical diffuse pollution. Consequently, these levels do not have the precision needed to evaluate BaP pollution in heavily industrialized areas, such as the one presented here or other sites with similar characteristics worldwide. BaP correlation with PTE anomalies A previous study reported the presence of PTE anomalies in the soil of Langreo [9]. Therefore, there may be a relationship between pollution by PTEs and PAHs, since the industrial processes described in previous sections facilitate the co-occurrence of both inorganic and organic contaminant emission. To this end, we carried out factor analyses, considering the multivariate database, including PTEs [9], together with the new BaP data. In order to identify different trends, we applied two approaches with enough samples to offer statistical significance. The first used the 47 samples from Clusters 1, 2 and 3 (corresponding to the soil with the highest PAH concentrations, as shown above), while the second used the 103 samples of Cluster 4, with the lowest concentrations. The results are shown in Table 5.
For the first case, the value of the Kaiser-Meyer-Olkin test for the factor analysis was a significant 0.782, providing reliable results. Three factors explained more that 80% of the total variance: The first factor comprised Cu, Pb, Cd, Sb, Zn, Ba and Cr, a typical association of heavy metals that contains some of the main pollutants of the area [9]. BaP showed a null relationship with this group, suggesting that the sources differed. In this regard, previous studies reported the presence of some of these potential pollutants linked to the heavy industry (metallurgy mainly) and traffic, especially Pb [39,60].
The second factor included high loads for V, As, Tl, and secondarily for Mo and Fe, together with the highest load of BaP (0.631). V and Tl are frequently associated with coal emissions [12,38], and in Langreo they have been previously linked to coal combustion and are therefore detected in PM 10 particles [53]. Moreover, As has also been identified as a common pollutant in activities involving coal burning [34]. This association with typical trace elements of coal emission supports the hypothesis of the sources of PAHs presented above. However, BaP presented a low communality (0.415), which suggests that it shows a distinct behaviour to that of these elements.
Co, Mn, Ni and Mg were included mainly in a third group. This association probably corresponds to geogenic elements present in background concentrations in the area.
The results for the "low pollution" Cluster 4 (KMO = 0.702) demonstrated BaP loadings (Table 5) divided among three distinct factors without special significance, and no clear relationship with any inorganic element. This finding suggests that the lower extent of pollution in rural areas dilutes the correlation between organic and inorganic emissions.

Conclusions
Here we conducted a high-density soil sampling and analysis (150 samples) in one of the historically most industrialized areas of Spain. Our results revealed high HMW PAH concentrations for BaA, BbF, DbA, Ind, and a particularly anomalous content of BaP.
Multivariate statistics were used to group samples within the study area after the identification of outliers via the Mahalanobis distance. We conclude that clustering is influenced mainly by the total concentration of PAHs, which is lower in soils of the rural or mountainous areas than in the urban and peri-urban zones. Most of the anomalous PAH contents were attributed to atmospheric deposition of pyrogenic particles originated in coal combustion processes, as revealed by molecular diagnostic ratios (diffuse pollution). However, some of the outliers found had local point-source origins, which should be addressed in further studies given the very high concentrations detected. The location of the main industries and the urban core, the predominant winds, and the topography of the area emerge as key factors in the distribution of the pollutants. Also, BaP showed a partial but significant correlation with PTEs commonly associated with coal combustion, such as V, Tl and As. This observation thus supports the hypothesis of pollution by atmospheric deposition.
Regarding the BaP anomaly, our geostatistical study confirmed that its concentration greatly exceeds the RBSSLs in the entire study area. In this context, a geochemical background of 0.24 mg·kg -1 was found, surpassing the Spanish RBSSLs currently in force (based on only toxicity features and soil use).
Consequently, this study questions the usefulness of RBSSLs as a valid reference for areas with a high density of heavy industries (past and present), particularly those linked to coal. This consideration should be taken into account when developing environmental policies for Coal Regions in Transition.
Here we conducted a high-density soil sampling and analysis (150 samples) in one of the historically most industrialized areas of Spain. Our results revealed high HMW PAH concentrations for BaA, BbF, DbA, Ind, and a particularly anomalous content of BaP.
Multivariate statistics were used to group samples within the study area after the identification of outliers via the Mahalanobis distance. We conclude that clustering is influenced mainly by the total concentration of PAHs, which is lower in soils of the rural or mountainous areas than in the urban and peri-urban zones. Most of the anomalous PAH contents were attributed to atmospheric deposition of pyrogenic particles originated in coal combustion processes, as revealed by molecular diagnostic ratios (diffuse pollution). However, some of the outliers found had local point-source origins, which should be addressed in further studies given the very high concentrations detected. The location of the main industries and the urban core, the predominant winds, and the topography of the area emerge as key factors in the distribution of the pollutants. Also, BaP showed a partial but significant correlation with PTEs commonly associated with coal combustion, such as V, Tl and As. This observation thus supports the hypothesis of pollution by atmospheric deposition.
Regarding the BaP anomaly, our geostatistical study confirmed that its concentration greatly exceeds the RBSSLs in the entire study area. In this context, a geochemical background of 0.24 mg·kg -1 was found, surpassing the Spanish RBSSLs currently in force (based on only toxicity features and soil use).
Consequently, this study questions the usefulness of RBSSLs as a valid reference for areas with a high Location of the study area in Spain and sampling design.