In this study, non-target analysis was performed retrospectively on samples from Swiss WWTP effluents that had been collected as part of an existing regulatory environmental monitoring campaign. In contrast to the exploratory approach still common amongst NTA studies, the research questions directing this study were derived from regulatory priorities, thereby ensuring outcomes of direct and immediate relevance for environmental monitoring and protection.
Unknowns of regulatory interest were defined as those with the highest intensities and the highest temporal frequency, originating from point sources, across all samples of the sampling campaign. These features were prioritized in the data using enviMass, resulting in lists of m/z that were the starting point of the current work. The mass spectra of these m/z underwent prescreening and quality control (Fig. 2) to ensure their suitability for use in non-target identification. Quality control isolated measurements worthy of further identification efforts and eliminated those of poor quality, effectively resulting in data reduction (Fig. 3). The prescreening workflow was written in R and is now openly available within the package Shinyscreen (Kondić et al.).
Then, MetFrag (Wolf et al. 2010; Ruttkies et al. 2016) was employed to provide tentative identifications for these unknowns, leveraging its extensive “post-relaunch” metadata capabilities, as well as several open resources and information sources, including chemical information from regulators around the world. MetFrag analysis was performed via the command line using scripts based on ReSOLUTION (Schymanski 2020a) and RchemMass (Schymanski 2020b).
Tentative identifications for 22 m/z were obtained using MetFrag (21 at Level 3, 1 at Level 2a, whose identity was eventually confirmed to Level 1). These identifications were evaluated in terms of (i) a score distribution for the top candidates (Fig. 4) and (ii) Scenario Analysis (Table 3) according to the regulatory context and research questions underlying this work. Final candidate recommendations were given based on MetFrag Score breakdowns, thereby providing in-depth and transparent analyses of the spectral and metadata evidence for proposed candidates. For the 22 m/z analysed, 32–58 candidates were recommended for further identification efforts.
Quality control was a critical element of the prescreening workflow, as preliminary manual inspection of the data using XCalibur revealed variable data quality. In fact, most data (> 80% of cases) were not fully suitable for the intended non-target identification. R scripts (now embedded within the Shinyscreen package) were written to automate most of the quality control checks (Table 1, checks 1–5). Automated quality control allowed quick and reproducible processing of the large quantity of data needed to answer the superlative research questions (highest intensities, highest temporal frequencies) guiding this work. The variable quality of the data had several likely causes: (i) List B masses were not in the inclusion list, (ii) MS2 were not measured immediately after MS1, so that sample degradation during the long storage time between MS1 and MS2 measurements could have occurred, and (iii) possibly over-restrictive enviMass prioritization criteria. Thus, the small number of cases (~ 0.03% of the total) passing all quality control checks and qualifying for MetFrag identification was not unexpected.
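To illustrate how such automated checks can be expressed, the following minimal R sketch filters a table of prioritised features using simple pass/fail criteria; the column names, thresholds and the specific checks shown are illustrative placeholders and do not reproduce the exact criteria of Table 1 or the Shinyscreen implementation.

```r
# Illustrative automated quality-control filter (all thresholds are hypothetical).
# 'features' is assumed to be a data.frame with one row per m/z and file,
# containing MS1/MS2 summary information extracted beforehand.
library(dplyr)

qc_filter <- function(features,
                      min_ms1_intensity = 1e5,   # precursor must be well above noise
                      max_rt_deviation  = 0.5,   # MS1 vs. MS2 retention time deviation (min)
                      max_mz_deviation  = 0.005, # precursor m/z deviation (Da)
                      min_ms2_peaks     = 3) {   # minimum number of MS2 fragment peaks
  features %>%
    mutate(
      qc_ms2_found = !is.na(rt_ms2),
      qc_ms1_int   = ms1_intensity >= min_ms1_intensity,
      qc_rt_match  = qc_ms2_found & abs(rt_ms1 - rt_ms2) <= max_rt_deviation,
      qc_mz_match  = abs(mz_ms1 - mz_ms2) <= max_mz_deviation,
      qc_ms2_peaks = n_ms2_peaks >= min_ms2_peaks,
      qc_pass      = qc_ms2_found & qc_ms1_int & qc_rt_match &
                     qc_mz_match & qc_ms2_peaks
    )
}

# Only features passing all checks would proceed to MetFrag, e.g.:
# passed <- qc_filter(features) %>% filter(qc_pass)
```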
MetFrag was configured to comprise both Spectral and Metadata scoring terms, including, within the latter, chemical suspect lists and scoring terms from international regulators such as KEMIMARKET_EXPO, KEMIMARKET_HAZ, REACH2017, NORMANSUSDAT, and CPDAT_COUNT. Paired with CompTox as its candidate database, MetFrag was thus specifically customised in this work to perform non-target identification of environmental unknowns in WWTP samples within a regulatory context. Beyond using fragmentation information alone, using metadata to inform MetFrag’s identifications proved especially important in certain situations, e.g., when spectral scores based on fragmentation were not informative enough to distinguish candidates from each other (Tables 7 & 8). Crucially, the information provided by metadata can serve as guidance for future regulatory actions in the context of the environmental protection aims of this study. For example, although a candidate may not be top-ranked or have strong spectral evidence (Table 6), potentially concerning hazard and exposure scores may qualify it for serious consideration in future work, in the spirit of applying the Precautionary Principle.
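To make this configuration concrete, a MetFrag command-line parameter file along the following lines could be used, assuming the metadata terms are supplied as numeric columns of a local CompTox CSV export; the file paths, mass values, weights and the (partial) list of score types shown here are illustrative placeholders rather than the exact settings used, which follow the ReSOLUTION/RchemMass-based scripts.

```
PeakListPath = spectra/feature_285.1234_ms2.txt
MetFragDatabaseType = LocalCSV
LocalDatabasePath = databases/CompTox_with_metadata.csv
IonizedPrecursorMass = 285.1234
PrecursorIonMode = 1
IsPositiveIonMode = True
DatabaseSearchRelativeMassDeviation = 5
FragmentPeakMatchAbsoluteMassDeviation = 0.001
FragmentPeakMatchRelativeMassDeviation = 5
MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,KEMIMARKET_EXPO,KEMIMARKET_HAZ,REACH2017,NORMANSUSDAT,CPDAT_COUNT,INDACT
MetFragScoreWeights = 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
MetFragCandidateWriter = CSV
SampleName = feature_285.1234
ResultsPath = results/
```

Such a file is then passed to the MetFrag command-line jar (the exact jar name depends on the release), and the weights can be adjusted to reflect the relative importance given to spectral versus metadata evidence.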
Regarding the components of the MetFrag Score, a total of ten scoring terms, three Spectral and seven Metadata, were used to score candidates. Compared to most previous studies using MetFrag, as mentioned in the Introduction, this number may seem large. However, adding extra scoring terms does not appear to compromise MetFrag’s identification capabilities. In fact, the additional scoring terms were beneficial because further bases for differentiating between candidates became available. In other words, using more scoring terms can provide more granularity when distinguishing candidates, which is important for candidate evaluation and recommendation. Further scoring terms based on physical-chemical properties could be integrated in the future, such as the correlation of the partitioning coefficient logKow (or log P) with retention time (Ruttkies et al. 2016). Such scoring criteria would filter out unrealistic candidates based on objective criteria like ionisability and polarity. (Insufficient information was available to perform retention time correlation via MetFrag in this study.)
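In simplified form, the resulting MetFrag Score is a weighted sum of the individual scoring terms after each term has been normalised to its maximum value across the candidate list (see Ruttkies et al. 2016 for the exact formulation). A minimal R sketch of this combination, with illustrative column names:

```r
# Simplified combination of scoring terms into one composite score per candidate.
# 'candidates' is assumed to be a data.frame with one numeric column per term.
combine_scores <- function(candidates, terms, weights) {
  stopifnot(length(terms) == length(weights))
  # Normalise each term to its maximum across the candidate list (0 if all zero).
  normalised <- sapply(terms, function(term) {
    x <- candidates[[term]]
    m <- max(x, na.rm = TRUE)
    if (m > 0) x / m else rep(0, length(x))
  })
  candidates$composite_score <- as.vector(normalised %*% weights)
  candidates[order(-candidates$composite_score), ]
}

# e.g. combine_scores(candidates, c("FragmenterScore", "KEMIMARKET_HAZ"), c(1, 1))
```

The dependence of each normalised term on the maximum value within the candidate list is also the reason why, as discussed below, such scores are sensitive to the candidate database chosen.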
With respect to the individual terms, CPDAT_COUNT, INDACT, and OfflineIndividualMoNA proved to be relatively uninformative in this particular study, as evidenced by their frequent zero-value scores. As a chemical products database, CPDAT’s limited applicability in wastewater studies such as the present one is unsurprising; it may instead be more suitable for exposomics studies involving, e.g., household dust. INDACT, the list of industrial activity chemicals known to be used in the vicinity of the WWTPs as disclosed to the regulator, had the strongest potential to improve the identification results. However, not a single candidate across all the MetFrag results was present on this suspect list, which could suggest that the disclosures made by the industries were either incomplete or unsuitable for identification purposes (e.g., parent compounds were disclosed but possibly only their transformation products are present in the environment or detectable, or UVCBs with unspecific chemical identities were disclosed). Lastly, while mass spectral libraries are inherently incomplete (Oberacher et al. 2020), a low OfflineIndividualMoNA score does not necessarily indicate poor spectral library matches. Rather, low OfflineIndividualMoNA scores could also signify that the candidate is not present within MoNA to begin with, or result from noisy experimental spectra even if the match would otherwise be good. Therefore, evaluating candidates on this scoring term alone must be done with these factors in mind, and improvements to its design to avoid possible faulty interpretations could constitute future work.

Other future work on MetFrag itself could involve the addition of new Spectral scoring terms that do not require scaling via normalisation to the maximum value, as this maximum value is highly dependent on the candidate database chosen. For instance, a simple spectral similarity metric such as cosine similarity would evaluate how well the in silico and experimental fragmentation spectra align, independently of the spectra of other candidates.
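As an illustration only (this is not part of the current MetFrag implementation), such a database-independent term could be computed along the following lines, using a deliberately simple greedy peak matching within an absolute m/z tolerance:

```r
# Cosine similarity between an experimental and an in silico MS2 spectrum.
# Each spectrum is a data.frame with columns 'mz' and 'intensity'.
cosine_similarity <- function(exp_spec, insilico_spec, mz_tol = 0.005) {
  matched_exp <- numeric(0)
  matched_sil <- numeric(0)
  used <- rep(FALSE, nrow(insilico_spec))
  for (i in seq_len(nrow(exp_spec))) {
    d <- abs(insilico_spec$mz - exp_spec$mz[i])
    d[used] <- Inf
    j <- which.min(d)
    if (length(j) == 1 && d[j] <= mz_tol) {
      # experimental peak has an in silico counterpart within tolerance
      matched_exp <- c(matched_exp, exp_spec$intensity[i])
      matched_sil <- c(matched_sil, insilico_spec$intensity[j])
      used[j] <- TRUE
    } else {
      # unmatched experimental peak contributes a zero on the in silico side
      matched_exp <- c(matched_exp, exp_spec$intensity[i])
      matched_sil <- c(matched_sil, 0)
    }
  }
  # unmatched in silico peaks contribute zeros on the experimental side
  matched_exp <- c(matched_exp, rep(0, sum(!used)))
  matched_sil <- c(matched_sil, insilico_spec$intensity[!used])
  sum(matched_exp * matched_sil) /
    (sqrt(sum(matched_exp^2)) * sqrt(sum(matched_sil^2)))
}
```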
CompTox, the candidate database chosen here, remains one of the most environmentally-focused open databases of chemical compounds, as it exclusively contains chemicals of environmental and toxicological relevance. Compared to other open databases like PubChem, CompTox is also smaller in size. Therefore, MetFrag paired with CompTox is likely to suggest smaller lists of candidates which are de facto environmentally meaningful, making workflow runtimes shorter and candidate evaluation relatively easier. However, using CompTox has drawbacks, principally stemming from its lack of comprehensiveness compared with PubChem, which is larger and covers a wider chemical space beyond just environmentally and toxicologically relevant chemicals. Therefore, false negatives can result should certain compounds matching the identification criteria not exist within CompTox to begin with. The forthcoming PubChemLite (Bolton et al. 2020; Schymanski et al. 2020) represents one complementary alternative addressing these issues, as it is by design essentially a subset of environmentally-relevant compounds based on compound classifications. Overall, the ability to subset databases based on the usage and classification information of chemicals can be beneficial, as different regulatory bodies may have different mandates, and studies can be designed to align with those mandates accordingly, e.g., focusing only on (i) chemicals with known usage in industrial manufacturing, (ii) agricultural chemicals, or (iii) pharmaceuticals.
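As a simple illustration of such subsetting, a local candidate database could be filtered by a usage or classification column before being passed to MetFrag; the file names, the column 'use_category' and its values below are hypothetical placeholders.

```r
# Subset a local candidate database (e.g. a CompTox CSV export) by usage class.
library(readr)
library(dplyr)

candidates <- read_csv("CompTox_with_metadata.csv")   # placeholder file name

pharma_agri <- candidates %>%
  filter(use_category %in% c("pharmaceutical", "agrochemical"))  # hypothetical column and values

write_csv(pharma_agri, "CompTox_pharma_agri_subset.csv")
# The resulting file can then be supplied to MetFrag as the local candidate database.
```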
Using scenarios as a framework to interpret MetFrag’s results was critical considering the specific regulatory aims of this work: to tentatively identify pollutants of high priority (with a minimum of Level 3 confidence) to guide further monitoring and identification efforts.
Scenario Analysis revealed in detail whether Spectral scores, Metadata scores, or both contributed to a given MetFrag Score, and in turn provided the rationale behind proposed candidates. As our evaluation has shown, multiple candidates are worth considering, especially if they have very similar scores (e.g., Table 6) or more compelling evidence in individual scoring terms (e.g., Table 13), as described above. In this way, Scenario Analysis as used here is highly suitable for transparently communicating scientific results in a regulatory context. On a larger scale, such analyses address a key weakness common to NTA studies: the current lack of ability to perform detailed data interpretation, especially in a high-throughput, automatable and reproducible manner.
Furthermore, Scenario Analysis as used here can inform decision-making regarding the next steps. Besides addressing study priorities based on “depth vs. breadth” as discussed in the Results, the scenarios can be used to devise a prioritisation scheme for future work. For example, if authentic standards can only be purchased or analysed for 10 compounds due to resource limitations, those compounds should be the recommended candidates with MetFrag Scores from Scenario 1 > Scenarios 2/3 >>> Scenario 4. Alternatively, if it is known from the outset that the spectral data may be of poor quality, Scenario 2 candidates may take precedence over Scenario 3 candidates, as the former rely on high Metadata Scores rather than high Spectral Scores for their high MetFrag Scores. Additionally, applying the Precautionary Principle may motivate prioritising identity confirmations of candidates with concerning metadata such as high toxicity and/or exposure (corresponding to the KEMIMARKET_HAZ and KEMIMARKET_EXPO scores), even if those candidates are not necessarily ranked highly by MetFrag.
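A minimal R sketch of such a prioritisation scheme, assuming a table of recommended candidates carrying a scenario label, the MetFrag Score and the hazard/exposure terms (the column names and the threshold are illustrative):

```r
# Rank recommended candidates for standard purchase under a fixed budget,
# following the ordering Scenario 1 > Scenarios 2/3 >>> Scenario 4.
library(dplyr)

prioritise_for_confirmation <- function(recommended, n_standards = 10) {
  scenario_rank <- c("1" = 1, "2" = 2, "3" = 2, "4" = 3)  # Scenarios 2 and 3 treated equally
  recommended %>%
    mutate(scenario_priority = scenario_rank[as.character(scenario)]) %>%
    arrange(scenario_priority, desc(metfrag_score)) %>%
    slice_head(n = n_standards)
}

# Precautionary variant: candidates with concerning hazard or exposure metadata
# could be brought forward regardless of rank (0.8 is a hypothetical threshold):
# recommended %>% filter(KEMIMARKET_HAZ >= 0.8 | KEMIMARKET_EXPO >= 0.8)
```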
Practically speaking, next steps in environmental monitoring based on the results here (besides identity confirmation using authentic standards) could include expanding suspect lists with the recommended candidates to improve future suspect screening activities. These new suspects could in turn be added to the inclusion lists of future measurements, thereby already gaining an analytical ‘upper hand’ for future NTA studies. Expanding suspect and inclusion lists in this way, possibly in combination with a rarity score (Krauss et al. 2019), represents an evidence-based approach towards more meaningful environmental monitoring in the long run, as these candidate compounds were tentatively ‘observed’ and are therefore site-specific. Otherwise, suspect lists are typically expanded based on information from national or international chemical registration lists, whose applicability may be limited depending on the actual usage/exposure in the region of concern. Therefore, an additional outcome of this study is a means to bridge Target and Non-target Analysis by supplying meaningful candidates for Suspect Screening.
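In practice, expanding the inclusion lists could be as simple as deriving expected precursor m/z values from the recommended candidates and appending them to the existing lists. The sketch below assumes the candidates' monoisotopic masses are available and considers only the [M+H]+ and [M-H]- adducts; the identifiers and masses shown are placeholders.

```r
# Turn recommended candidates into inclusion-list entries for future measurements.
library(dplyr)

# Hypothetical example of recommended candidates (identifiers and masses are placeholders).
recommended <- data.frame(
  DTXSID            = c("DTXSID0000001", "DTXSID0000002"),
  monoisotopic_mass = c(284.0948, 313.1321)
)

proton <- 1.007276  # mass of a proton in Da

inclusion_list <- recommended %>%
  transmute(
    identifier = DTXSID,
    mz_pos = monoisotopic_mass + proton,  # [M+H]+ precursor m/z
    mz_neg = monoisotopic_mass - proton   # [M-H]- precursor m/z
  )

write.csv(inclusion_list, "inclusion_list_additions.csv", row.names = FALSE)
```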
This work is one contribution to a much larger discussion surrounding (i) how NTA can support regulatory environmental monitoring and (ii) the practical feasibility of applying NTA in routine environmental monitoring. Regarding the former, this work demonstrates that NTA can be used to address the concerns of regulators by translating research questions arising from regulatory priorities into peak-picking/mass prioritisation criteria: in this case, high-concentration unknown pollutants with point sources that occurred persistently were taken to be high-intensity precursors found at one or a few sampling sites at both sampling time points. Since quantification was not possible, the assumption that high ion intensity represents high concentration could be validated in future work by using different chromatographic solvent systems as a test of ionisation efficiency, or by implementing ionisation efficiency models (Liigand et al. 2020; Panagopoulos Abrahamsson et al. 2020).
On the feasibility of performing NTA as part of routine regulatory environmental monitoring, the overall method described here offers a highly automated approach comprising (i) feature prioritization via enviMass, (ii) prescreening and quality control (plus a manual step), and (iii) in silico identification, of which (ii) and (iii) were developed in this work. The results interpretation and candidate recommendation processes performed manually in this work form the basis of future efforts towards automated reporting based on Scenario Analysis, MetFrag Score distributions, and the evaluation of critical parameters such as thresholds for potential toxicities and exposure levels. Such automated reporting would not only allow scalability of future regulatory NTA studies, but could also eliminate potential biases in unknown identification: analysts would not be able to ‘cherry-pick’ candidates based on their familiarity with certain compounds, because non-descriptive identifiers (e.g., DTXSIDs) would be used up until the final results are delivered at the end of the entire method. Furthermore, while the prescreening, quality control, and identification workflow was applied retrospectively here, the improvements to workflow automation detailed above could allow for quicker data analysis turnaround in the future, which would help guide future sampling and measurements planned in the short to medium term and prevent the long delays between remeasurements still commonly observed in NTA investigations, effectively moving towards ‘real-time’ instead of retrospective NTA approaches.

Two concrete follow-up initiatives are foreseen: (i) building an interface connecting Shinyscreen and MetFrag, including the automated reporting features described above, and (ii) developing a set of ‘default’ scoring terms and settings tailored for NTA of wastewater samples. Further collaborations involving non-target wastewater studies and database hosts will help augment expert knowledge on more use cases, which would be leveraged to develop this approach further.