For this demonstrative analysis, a dataset of PAHs in sediment was obtained from waterway Superfund sites. Historical industrial operations that were located along and discharged into the waterways included manufactured gas plants, shipbuilding operations, chemical manufacturing, tar distillation and power generation facilities, and petroleum terminals, among others. Sewers that received runoff from the watersheds surrounding the waterways, as well as historical wastewater discharges from the industrial facilities, discharged to the waterways via combined sewer overflows (CSOs) when water flows exceeded the sewer system capacities (referred to as bypass or overflow events). Examples of CSO events are described by Brokamp et al. (2017) and Al Aukidy and Verlicchi (2017).
The sediment samples were collected from three general layers representative of any waterway: 1) surface sediment (0 – 0.5 ft), 2) subsurface sediment, and 3) native sediment. In total, over 700 sediment samples were collected. Sample extracts were analyzed using a high-resolution gas chromatograph equipped with a flame ionization detector (GC/FID). High-resolution GC/FID fingerprints were generated over a broad carbon range (approximately n-C9 to n-C40) and provided an overall assessment of the non-volatile hydrocarbons present in each sample. These fingerprints provided information on the dominant extractable hydrocarbons, which might include pyrogenic PAHs and petroleum products. The total concentration of these hydrocarbons was measured as total petroleum hydrocarbons (TPH). The sample extracts were also analyzed using a high-resolution gas chromatograph equipped with a mass spectrometer operated in the selected ion monitoring mode (GC/MS/SIM). The instrument was calibrated to allow quantification of a broad range of 2- through 6-ring PAHs, selected alkylated PAH homologues, and selected sulfur-containing compounds (dibenzothiophenes), among other waterway-specific contaminants. The demonstration presented in this paper focuses on the parent and alkylated PAH dataset. The dataset covered a wide range of tPAH16 concentrations, from below 20 mg/kg to thousands of mg/kg. Variable PAH fingerprints were found in the samples, reflecting the long industrial histories around the waterways and mixing of the sediment.
Positive Matrix Factorization (PMF)
Positive matrix factorization (PMF) is a multivariate receptor modeling technique introduced by Paatero and Tapper (1994), and the EPA has sponsored its development. In this work, PMF version 5.0.14 was used to analyze the sediment PAH data.
PAH data pre-processing
In the PMF run presented in this work, 44 PAH compounds were input for each sediment sample, with each compound concentration normalized to the total PAH concentration in the sample. Non-detect (ND) PAH concentrations were replaced with a small value (ND = 0.001 mg/kg) because PMF does not accept a zero concentration. PMF also requires an uncertainty matrix associated with the PAH fingerprint data matrix. A fixed 10% uncertainty was used for all PAH compound concentration measurements (Saba and Su 2013; USEPA 2008).
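The pre-processing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow actually used for the dataset: the function name `preprocess`, the convention that non-detects arrive as NaN or non-positive values, and the choice to substitute non-detects before normalizing are all assumptions. Only the 0.001 mg/kg substitute and the 10% uncertainty come from the text.

```python
import numpy as np

ND_VALUE = 0.001         # mg/kg substitute for non-detects (from the text)
REL_UNCERTAINTY = 0.10   # fixed 10% uncertainty (from the text)

def preprocess(conc):
    """Build the fingerprint and uncertainty matrices for a PMF-style input.

    conc: (n_samples, n_compounds) array of concentrations in mg/kg.
    Assumption: a non-detect is encoded as NaN or as a value <= 0.
    """
    raw = np.nan_to_num(np.asarray(conc, dtype=float), nan=0.0)
    # Replace non-detects with the small substitute value.
    filled = np.where(raw <= 0.0, ND_VALUE, raw)
    # Assumption: substitution is done before normalization.
    totals = filled.sum(axis=1, keepdims=True)
    fingerprints = filled / totals          # each row sums to 1
    uncertainty = REL_UNCERTAINTY * fingerprints
    return fingerprints, uncertainty
```

Each row of the returned fingerprint matrix sums to one, and the uncertainty matrix has the same shape, which is the pairing a PMF input requires.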
Calculation of the number of factors
In PMF, the user is required to define the number of end-member PAH sources that produced the measured sediment PAH fingerprints. Defining the number of end-member PAH sources (called Factors in PMF) is an iterative process that requires knowledge of the potential sources and depends on the case details. Once the number of end-member factors is defined, PMF calculates the end-member source fingerprints that, when numerically mixed, provide the best fit to the input PAH fingerprints. The statistical goodness-of-fit between the field-measured PAH fingerprints and the PMF-calculated PAH fingerprints (referred to in PMF as Q(true)) can be evaluated to determine the optimal number of end-member Factors. Note that Q(true) decreases as the number of factors increases, indicating a better fit between the numerically mixed PAH fingerprints and the site-measured fingerprints. However, adding too many Factors causes PMF to calculate factor fingerprints that represent noise in the data rather than meaningful end-member sources. As such, the PMF user must determine the optimal number of factors.
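To make the fit statistic concrete, the sketch below computes a Q-type value as the sum of squared residuals scaled by their uncertainties, and fits non-negative factors with weighted multiplicative updates. This is a simplified stand-in chosen for illustration; it is not the solver used in the EPA PMF software, and the names `q_value` and `fit_factors` are hypothetical.

```python
import numpy as np

def q_value(X, U, G, F):
    """Sum of squared residuals of X ~ G @ F, each scaled by its uncertainty."""
    return float(np.sum(((X - G @ F) / U) ** 2))

def fit_factors(X, U, n_factors, n_iter=300, seed=0):
    """Weighted non-negative factorization X ~ G @ F (illustrative only).

    X: (n_samples, n_compounds) fingerprints; U: matching uncertainties.
    G holds sample contributions, F holds factor profiles.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_compounds = X.shape
    G = rng.uniform(0.1, 1.0, size=(n_samples, n_factors))
    F = rng.uniform(0.1, 1.0, size=(n_factors, n_compounds))
    W = 1.0 / U ** 2     # weights derived from the uncertainty matrix
    eps = 1e-12          # guards against division by zero
    q_history = [q_value(X, U, G, F)]
    for _ in range(n_iter):
        # Multiplicative updates keep G and F non-negative.
        G *= ((W * X) @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ (W * X)) / (G.T @ (W * (G @ F)) + eps)
        q_history.append(q_value(X, U, G, F))
    return G, F, q_history
```

Running `fit_factors` for a range of factor counts and comparing the final entry of `q_history` gives the kind of Q-versus-factors curve used to pick the number of factors.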
For the demonstrative PAH dataset, Q(true) was calculated for different numbers of factors. The result, presented in Figure 2, demonstrates that the optimal number of factors is 3 or 4; adding further factors does not result in a substantial decrease in the Q(true) value. The 3-Factor run produced three end-member sources that closely resemble petrogenic, pyrogenic, and runoff source profiles (Figure 3). Increasing the number of end-member factors to 4 retained the runoff profile generated from the 3-Factor run, with an R2 of 0.92 between the two runoff profiles. This consistency across runs indicates that the runoff profile is a stable source fingerprint. The petrogenic and pyrogenic end-member sources obtained from the 3-Factor run were replaced in the 4-Factor run by end members representing petrogenic, pyrogenic, and an additional source representing a mix of petrogenic and pyrogenic. Adopting 4 Factors may be reasonable if such a mixed fingerprint were associated with known source(s). For the purpose of the illustration presented in this work, the 3-Factor PMF run is more appropriate, as its factors relate directly to primary PAH source origins.
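The comparison of factor profiles between the 3-Factor and 4-Factor runs can be sketched as follows. The text does not state how R2 was computed; this example assumes it is the squared Pearson correlation between two profiles. The helper names `profile_r2` and `best_matches` are hypothetical.

```python
import numpy as np

def profile_r2(a, b):
    """Squared Pearson correlation between two factor profiles (assumed R2)."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)

def best_matches(F_small, F_large):
    """For each profile (row) in F_small, find the closest row in F_large.

    Returns a list of (row index in F_large, R2) tuples.
    """
    matches = []
    for profile in F_small:
        scores = [profile_r2(profile, other) for other in F_large]
        best = int(np.argmax(scores))
        matches.append((best, scores[best]))
    return matches
```

A profile that keeps a high R2 with one factor of the larger run, as the runoff profile does here, can be read as stable. A profile whose best match is weak has likely been split or merged.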
PMF Output Results and Analysis
The 3-Factor PMF run had a Q(true) value less than 1.5 times the Q(robust) value, which indicates that outliers did not unduly influence the PMF analysis results. The 3-Factor profiles can also be presented as the percent of species sum (Figure 4), which apportions each PAH compound among the petrogenic, pyrogenic, and runoff source factors. For example, Figure 4 shows that the BaP in the entire sediment dataset is attributed to the pyrogenic source Factor (19.9%), the petrogenic source Factor (4.1%), and the runoff-like source Factor (76.1%). Figure 4 also shows that the majority of the alkylated PAH compounds originated from the petrogenic source, consistent with the fact that crude oil and petroleum products are rich in alkylated PAHs. Runoff contributed mostly to the high molecular weight PAHs (i.e., BBF, BKF, BEP, BaP, PERY, INDY, DBHA, and BGHIP; Figure 4). The pyrogenic source contributed high percentages of the parent PAH compounds, including naphthalene, acenaphthylene, acenaphthene, and phenanthrene.
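A percent-of-species calculation of this kind can be sketched as below. The example assumes the factor profile matrix is expressed in concentration units with sample contributions scaled to an average of one, so that each compound's share per factor is its profile value divided by the sum over factors. The function name `percent_of_species` and the numbers in the usage note are hypothetical, not values from the dataset.

```python
import numpy as np

def percent_of_species(F):
    """Percent of each compound attributed to each factor.

    F: (n_factors, n_compounds) profile matrix in concentration units.
    Assumption: contributions are scaled so profiles are directly comparable.
    """
    F = np.asarray(F, dtype=float)
    return 100.0 * F / F.sum(axis=0, keepdims=True)
```

For a hypothetical compound with profile values of 2, 1, and 1 across three factors, the function returns 50%, 25%, and 25%. Each column of the result sums to 100.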
For each PAH compound, PMF presents plots comparing measured and predicted concentrations. The plots can be used to identify samples with anomalous PAH profiles. For example, Figure 5 identifies an anomalous anthracene result. Sample anomalies are to be expected when handling large datasets. One method of handling anomalous results is to remove them from the PMF analysis and examine them in a separate, sample-specific analysis. Another method is to increase the uncertainty of the measured compound concentrations for the anomalous results. Other PAH compounds, such as benzo(e)pyrene, are captured by PMF with a high degree of accuracy (R2 = 0.92; Figure 6).
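Both ways of handling anomalies can be illustrated with a short sketch that flags samples by their uncertainty-scaled residual for one compound and then inflates the uncertainty of the flagged results. The threshold of 3 and the inflation factor of 3 are illustrative assumptions, not values taken from this analysis. The function names are hypothetical.

```python
import numpy as np

def flag_anomalies(measured, predicted, uncertainty, threshold=3.0):
    """Indices of samples whose scaled residual exceeds the threshold.

    Assumption: |measured - predicted| / uncertainty > threshold marks an anomaly.
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    scaled = (measured - predicted) / uncertainty
    return np.flatnonzero(np.abs(scaled) > threshold)

def inflate_uncertainty(uncertainty, indices, factor=3.0):
    """Return a copy of the uncertainties with the flagged entries multiplied."""
    out = np.array(uncertainty, dtype=float)
    out[indices] *= factor
    return out
```

The flagged indices can either be dropped from the input for separate review or passed to `inflate_uncertainty` so that those results carry less weight in a rerun.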
In addition to determining the factor profiles, PMF provides the percent contribution of each profile to each sediment sample in the input database, as presented in Figure 7. The ability to calculate the percent contribution of each PMF source factor (i.e., petrogenic, pyrogenic, and runoff) to the sediment samples may provide a first step toward apportionment or allocation of contamination between PAH sources. To illustrate, an example PAH profile is presented in Figure 8. Visual inspection alone might not be sufficient to calculate the percent contribution of different sources to such a profile, whereas the PMF analysis provides the percent contribution of each end-member source directly. While the R2 presented in Figure 8 indicates a good match between the measured and PMF-generated profiles (0.89), PMF should not be expected to match every PAH profile in the sediment with a high R2. Indeed, Figure 9 presents R2 values between the PAH fingerprints in the sediment samples and the PMF-generated PAH profiles obtained by mixing the three factors. The figure shows that most of the samples were replicated with R2 > 0.9. However, some samples had low R2 (< 0.5). These samples may be analyzed separately, if needed (e.g., separate PMF analysis, additional analysis to determine if the samples are associated with specific sources [either spatially or temporally], and review of laboratory reports to determine the accuracy of the analysis results or whether the samples were associated with a specific sampling program).
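The per-sample calculations described above can be sketched as follows, given a contribution matrix G and a profile matrix F from a factorization. The percent contribution is taken here as the share of each sample's reconstructed total attributed to each factor, and R2 is assumed to be the squared Pearson correlation between measured and reconstructed fingerprints. The text confirms neither definition, and the function names are hypothetical.

```python
import numpy as np

def percent_contributions(G, F):
    """Percent of each sample's reconstructed total attributed to each factor.

    G: (n_samples, n_factors) contributions; F: (n_factors, n_compounds) profiles.
    """
    G = np.asarray(G, dtype=float)
    F = np.asarray(F, dtype=float)
    mass = G * F.sum(axis=1)   # total amount each factor adds to each sample
    return 100.0 * mass / mass.sum(axis=1, keepdims=True)

def sample_r2(X, G, F):
    """Squared Pearson correlation between measured and reconstructed profiles."""
    predicted = np.asarray(G, dtype=float) @ np.asarray(F, dtype=float)
    return np.array([np.corrcoef(x, p)[0, 1] ** 2 for x, p in zip(X, predicted)])
```

Samples with a low value from `sample_r2` (for instance below 0.5) are the candidates for the separate review described above.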