Upper Klamath Lake chemical and microbial composition
The mean microcystin concentration among UKL samples with detectable toxin was 8.7 ppb (Table 1), surpassing the United States Environmental Protection Agency’s recommended health advisory limits for drinking water of 0.3 ppb for pre-school aged children and 1.6 ppb for children and adults50. The minimum reporting, recreational, and drinking water limits for microcystin vary by state depending on water use and potential for exposure51. Of the 70 samples collected over 2018–2019, ten UKL samples and three canal samples were contaminated with microcystin at concentrations ≥ 0.3 ppb. Toxic samples occurred mostly in the summer months (July–September), occasionally in November 2019, and at all four lake sites (NAL, WBR, EPP, PEL) (Fig. 1). The highest microcystin concentration was 469 ppb, measured at NAL in September 2019. Environmental parameters varied widely in UKL (Fig. 1; Table 1), and toxic samples were sometimes associated with high temperature, chloride, pH, POC, PON, chlorophyll, ammonium, and conductivity; however, no significant correlations were observed between microcystin concentration and any single parameter measured at UKL (Fig S1).
Table 1
Environmental parameters collected at Upper Klamath Lake, OR. Microcystin concentration is in bold in the top row. ‘Low cost’ parameters are shaded grey and ‘high cost’ parameters are white.
Environmental parameter | Abbreviation | Minimum | Maximum | Mean | Standard deviation
Microcystin (ppb) | TOX | 0 | 469.51 | 8.72 | 56.03
Chlorophyll (µg/mL) | CH | 0 | 6.8 | 0.24 | 0.94
Temperature (°C) | TEMP | -0.1 | 26.87 | 17.62 | 6.65
pH | PH | 7.22 | 10.22 | 8.59 | 0.94
Conductivity (S/m) | COND | 61.4 | 133.6 | 102.9 | 12.6
Particulate organic carbon (µg/mL) | POC | 0.45 | 432.32 | 18.38 | 65.53
Particulate organic nitrogen (µg/mL) | PON | 0.066 | 100.20 | 3.96 | 15.12
Chloride (ppm) | CHL | 2.392 | 50 | 4.39 | 6.94
Sulfate (ppm) | None | 1.70 | 50 | 4.46 | 6.97
Nitrate (ppm) | None | 0.2 | 15.54 | 0.73 | 2.26
Phosphate (ppm) | None | 0.11 | 50 | 1.37 | 7.35
Ammonium (ppm) | AMM | 0.01 | 5.43 | 0.41 | 0.82
Untargeted volatilomics detected 229 m/z + 1 values in samples collected at UKL and associated canals during 2018 and 2019. Seven m/z + 1 values differed significantly in abundance between toxic and non-toxic samples (Fig. 2), but a multiple linear regression model built on these seven values failed to predict microcystin contamination or concentration (R² = 0.08; p-value = 0.89). Volatilomes clustered well by sampling date, and samples collected in 2018 mostly clustered separately from those collected in 2019 (Fig S2). Volatilomes of toxic samples did not demonstrate clear clustering (Fig S2).
Four phyla, Cyanobacteria, Bacteroidota, Pseudomonadota, and Actinobacteria, together represented 79–99% of the 16S rRNA sequences in all UKL samples during 2018–2019 (Fig. 4). The class Cyanophyceae made up only ~10% of the microbial community in May and peaked in September 2019 at up to 75% of the community before decreasing in the autumn months. The four bloom-forming and potentially microcystin-producing cyanobacteria genera in UKL were Aphanizomenon, Anabaena/Dolichospermum, Microcystis, and Gloeotrichia. Anabaena/Dolichospermum sequences were always the dominant Cyanobacteria, contributing 75% to >99% of the sequences in all samples. The relative abundance of Microcystis was 5–25% of sequences in August through December and in May, but Microcystis was absent in June and July.
16S rRNA-based phylogenies have so far been unable to resolve Aphanizomenon from Anabaena52. Cell morphologies characteristic of Aphanizomenon, which is the dominant cyanobacterium during mid-summer in UKL53, were commonly observed in UKL samples inspected by light microscopy (Fig S3). Nevertheless, few sequences were placed within Aphanizomenon; instead, sequences often grouped with representatives of Anabaena sp. strain 90, Dolichospermum circinale strain ACBU02, and Anabaena sp. strain WA 102.
Microcystin toxin prediction using the volatilome
We developed elastic net regularized regression models using the volatilome with outputs that were either linearly predictive of microcystin concentration (linear models) or predictive of microcystin concentration ≥ 0.3 ppb (logistic models) to facilitate different water management approaches (Table 2). Linear model M1 and logistic model M2 were developed using only the 229 m/z + 1 values. Linear model M7 and logistic model M8 were developed using the 229 m/z + 1 values and ‘low-cost’ environmental parameters (e.g., buoy data such as temperature, pH, and conductivity, which are rapidly retrieved by current technologies) (Table 1). Across the four elastic net models, variable selection identified 29 of the 229 unique m/z + 1 values as important for predicting microcystin contamination (Table 3), and their relative concentrations are shown in Fig. 1. Nine m/z + 1 values were selected in two elastic net models, and four m/z + 1 values (151.119, 157.157, 199.189, and 203.185) were selected in three elastic net models (Table 3).
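The variable selection described above can be illustrated with a minimal sketch of linear elastic net regression fitted by coordinate descent. This is not the authors' implementation (the software they used is not stated in this section); the function names, the toy data, and the penalty settings `lam` and `alpha` are all hypothetical and chosen only to show how the L1 part of the penalty sets the coefficients of weak or irrelevant predictors exactly to zero.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside the threshold band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, sweeps=100):
    """Coordinate descent for the objective
        (1/2n) * ||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    X is a list of rows; columns of X and y are assumed centered (no intercept).
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    resid = list(y)  # residuals for the current coefficients (all zero at start)
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * coef[j]) for c, r in zip(col, resid)) / n
            z = sum(c * c for c in col) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            # Keep the residual vector in sync with the updated coefficient.
            resid = [r - c * (new - coef[j]) for c, r in zip(col, resid)]
            coef[j] = new
    return coef


# Hypothetical toy data: 8 samples, 3 centered, mutually orthogonal "VOC" features.
# The response depends strongly on feature 0 and only weakly on feature 2.
X = [[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
     [-1, 1, 1], [-1, 1, -1], [-1, -1, 1], [-1, -1, -1]]
y = [2.0 * row[0] + 0.1 * row[2] for row in X]

coef = elastic_net(X, y, lam=0.5, alpha=0.5)
selected = [j for j, b in enumerate(coef) if b != 0.0]  # indices of retained features
```

In this toy case only feature 0 is retained, with its coefficient shrunk from 2.0 toward zero. A logistic variant for the binary (≥ 0.3 ppb) outcome would replace the squared-error loss with the logistic log-likelihood while keeping the same penalty.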
Table 2
Models developed for prediction of microcystin contamination
Model number | Model type | Input variables | Output type
M1 | Linear elastic net | VOCs | Continuous
M2 | Logistic elastic net | VOCs | Binary
M3 | Linear regression | Low cost environmental parameters | Continuous
M4 | Logistic regression | Low cost environmental parameters | Binary
M5 | Linear regression | Low + high cost environmental parameters | Continuous
M6 | Logistic regression | Low + high cost environmental parameters | Binary
M7 | Linear elastic net | VOCs + low cost environmental parameters | Continuous
M8 | Logistic elastic net | VOCs + low cost environmental parameters | Binary
Table 3. Putative chemical formulas and identifications for m/z+1 values identified in models predicting microcystin contamination. An ‘x’ indicates the m/z+1 value was selected; m/z+1 values selected in two models are shaded light orange, and m/z+1 values selected in three models are shaded dark orange. m/z+1 values in grey were also important in predicting bacterial relative abundance (Fig 5). Chemical identifications were made using either the GLOVOC (G superscript) or Ionicon PTR viewer integrated database (I superscript).
Four additional regression models based on the ‘low cost’ environmental parameters (M3, M4) or the full collection of environmental parameters (‘low + high cost’, M5, M6; Table 2) were developed to compare against the skill of the VOC-based elastic net models. Similar to previous studies54, ‘low-cost’ linear M3 was weakly predictive of toxin concentration and retained only pH and chlorophyll (Table S1). POC, PON, and AMM strongly boosted the predictive power of linear M5. Neither logistic ‘low-cost’ M4 nor ‘low + high cost’ M6 was able to discriminate toxic and non-toxic samples at greater than 50% probability (Fig. 3, Table S2, Fig S4).
All of the VOC-based models outperformed the ‘low-cost’ comparator models in predicting UKL toxicity (Fig. 3). Adding ‘low-cost’ environmental parameters to the training data did not improve VOC-based model performance (Fig. 3), and, except for ‘month’ in M8, these parameters were not retained in the final equations (Table 3, Table S1, Table S2). The high Akaike Information Criterion (AIC) values of logistic M2 and M8 are partly attributable to the number of selected variables and were strongly balanced by area under the receiver operating characteristic curve (AUC) values of 0.78 and 0.88, compared to 0.50 (no better than chance) for M4 and only 0.22 for M6 (Fig. 3; Fig S4).
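To make the two comparison metrics concrete, the sketch below computes AUC as the fraction of (toxic, non-toxic) sample pairs that a model ranks correctly, and AIC from its standard definition, 2k − 2 ln L. The function names and the example labels and scores are hypothetical and are not data from this study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for toxic, 0 for non-toxic; scores: predicted probabilities.
    A tie between a toxic and a non-toxic score counts as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aic(n_params, log_likelihood):
    """Akaike Information Criterion: grows with the number of fitted parameters."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical example: four samples, two of them toxic.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc_model = auc(labels, scores)                    # informative ranking
auc_flat = auc(labels, [0.5, 0.5, 0.5, 0.5])       # no discrimination
auc_inverted = auc(labels, [-s for s in scores])   # ranking reversed
```

An AUC of 0.5 corresponds to a model that cannot separate the classes, and a value below 0.5 means the model tends to rank non-toxic samples above toxic ones. Because AIC adds 2 for every fitted parameter, a model that retains many variables can have a high AIC even when its ranking performance is good.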
VOCs were effective predictors of UKL toxicity. Our ability to rapidly and inexpensively measure volatile metabolites in water samples (5 min PTR-MS measurement of raw water samples) provides a unique platform to explore relationships between the volatilome and ecosystem health and the potential for VOCs to be leveraged in cyanotoxin monitoring. The low volatility of toxins, including microcystin, makes their detection by PTR-ToF-MS infeasible. Direct toxin measurement by ELISA or mass spectrometry is the current gold standard for monitoring but remains too expensive for the widespread and frequent application needed to provide timely public health advisories19. The metabolome is increasingly used to evaluate human health55,56,57 and ecosystem status, such as shifts in soil microbial ecology58. Similarly, the success of the volatilome in providing information about toxin presence and concentration suggests that unique collections of VOCs in UKL are produced depending on organism physiology and community composition.
Predicting microbial community composition using the volatilome
Elastic net models were also developed using the relative abundances of the four most abundant phyla, classes, and cyanobacteria genera as dependent variables and the 229 m/z + 1 values as independent variables. The 12 resulting models selected a total of 71 m/z + 1 values (Tables S3 and S4). All 12 elastic net models performed well, yielding mean squared prediction errors (MSPE) of 0.75–1.02 with SD of 0.08–0.54 (Fig S5). The m/z + 1 value 205.204 was an important predictor of the relative abundance of the Cyanobacteria phylum, the Cyanophyceae class, and all four Cyanobacteria genera (Fig. 5). Twelve of the 18 m/z + 1 values predictive of the Cyanobacteria phylum relative abundance were also predictive of Cyanophyceae (class) relative abundance, and 13 were predictive of the relative abundance of at least one of the Cyanobacteria genera. Similarly, seven of the eight m/z + 1 values predictive of Actinobacteriota relative abundance were predictive of Actinobacteria (class) relative abundance (Fig. 5). Six m/z + 1 values identified in models predicting microcystin concentration were also identified in models predicting the relative abundances of Cyanobacteria genera (Table 3).
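For readers unfamiliar with the metric, MSPE is the average squared difference between observed and predicted values on data held out from model fitting. The sketch below assumes the reported SD summarizes variation across cross-validation folds, which is not stated in this section; the function name and the fold values are hypothetical.

```python
import statistics


def mspe(observed, predicted):
    """Mean squared prediction error over held-out samples."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical held-out folds: (observed, predicted) values of a scaled response.
folds = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
    ([0.0, 1.0, 2.0], [1.0, 1.0, 1.0]),
    ([2.0, 2.0, 2.0], [1.0, 2.0, 3.0]),
]

fold_errors = [mspe(obs, pred) for obs, pred in folds]
mean_error = statistics.mean(fold_errors)  # summary MSPE across folds
sd_error = statistics.stdev(fold_errors)   # fold-to-fold spread
```

MSPE is expressed in the squared units of the response, so values are comparable across models only when the responses are on the same scale.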
Elastic net regularized regression yielded a collection of VOC-based models that were highly effective at predicting the relative abundance of key cyanobacteria, including Microcystis, which is thought to be the primary source of microcystin in UKL. The success of these models is likely a consequence of seasonal changes in the microbial community composition and taxonomic and physiological differences in the spectrum of VOCs released32, 59–61. We are unaware of studies that have leveraged the metabolome to describe microbial community composition; however, neural networks and linear regression approaches are being used to integrate metabolomic, metagenomic, and taxonomic data62–65. In our study, elastic net machine learning applied to volatilomes yielded models that were strongly predictive of ecosystem toxicity, microbial community composition, and ecosystem stress.
The molecular weights of the catecholamines norepinephrine (noradrenaline, 169.18 g/mol), epinephrine (adrenaline, 183.20 g/mol), and L-3,4-dihydroxyphenylalanine (L-Dopa, 197.19 g/mol) are approximately 2 mass units less than the m/z + 1 values 171.171, 185.185, and 199.189, which were retained with positive coefficients in each of the four VOC-based models predicting microcystin toxicity: M1, M2, M7 and M8. Further, m/z + 1 171.171 was positively correlated with Phylum Cyanobacteria and Class Cyanophyceae. The biosynthetic pathway leading to epinephrine originates from oxygenation of tyrosine yielding L-Dopa, which is decarboxylated to dopamine. Dopamine hydroxylation yields norepinephrine, which is methylated to epinephrine (Fig. 6). We assign the three m/z + 1 values to catecholamines based on their common biosynthetic pathway and likelihood for double protonation in our system. The primary (L-Dopa and norepinephrine) or secondary (epinephrine) amine sites of all three compounds readily protonate in water. Either phenolic site will accept a proton66 during the proton transfer reaction in the PTR-MS. PTR-MS operating with H3O+ as the proton source ionizes VOCs with proton affinities greater than that of H2O (691 kJ mol−1) by proton transfer. The proton affinity of dopamine is 934 ± 6 kJ/mol. The proton affinities of norepinephrine and epinephrine are not known but are expected to be higher than that of dopamine67 because of their high pKas (8.5–13.1)66. Although protonation of the amines will decrease the proton affinities of norepinephrine and epinephrine by 236.4 and 241.5 kJ mol−1 at 0 K, respectively68, the resulting proton affinities still favor the proton transfer reaction from H3O+.
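The mass arithmetic behind this assignment can be checked with a short sketch. It uses the standard molecular weights of the three compounds (norepinephrine 169.18 g/mol, epinephrine 183.20 g/mol, L-Dopa 197.19 g/mol) and the m/z + 1 values and proton affinities quoted above; the variable and function names are hypothetical.

```python
# Proton affinity of water in kJ/mol, as quoted in the text.
WATER_PROTON_AFFINITY = 691.0

# (molecular weight in g/mol, observed m/z + 1 value)
catecholamines = {
    "norepinephrine": (169.18, 171.171),
    "epinephrine": (183.20, 185.185),
    "L-Dopa": (197.19, 199.189),
}

# Offset between each observed value and the neutral molecular weight.
offsets = {name: mz - mw for name, (mw, mz) in catecholamines.items()}


def ionizable_by_h3o(proton_affinity):
    """Proton transfer from H3O+ is favorable when the analyte's proton
    affinity exceeds that of water."""
    return proton_affinity > WATER_PROTON_AFFINITY


# Dopamine: 934 +/- 6 kJ/mol, checked at the lower bound of the stated range.
dopamine_ok = ionizable_by_h3o(934.0 - 6.0)
```

All three offsets are close to 2 mass units (about 1.99, 1.99, and 2.00), which is the pattern the double-protonation assignment relies on, and the dopamine proton affinity exceeds that of water by more than 200 kJ/mol.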
The molecules in the epinephrine pathway could be indicators of system-wide stress. Acute stress, either caused directly by microcystin exposure or indirectly by ecosystem hypoxia, an abundance of reactive oxygen species, or shifts in food web dynamics, could upregulate the catecholamine pathway. Catecholamines are important neurotransmitters and neurohormones in plants, animals, and invertebrates. Epinephrine induces inflammation and immune responses during low and ongoing stress69. Cyanotoxins, including microcystin, can function as neurotransmitter agonists or antagonists that bind neuroreceptors in rotifers, zooplankton, fish and mammals70. Neuroreceptors across a wide range of taxa are sufficiently homologous that treatment of oysters, crustaceans, and vertebrates with norepinephrine and epinephrine negatively impacts mortality, reproduction, and/or growth71–73, making these compounds effective antifouling agents against settlement and growth of bacteria, algae, plants, and animals in underwater structures. In UKL, the identification of L-Dopa, norepinephrine, and epinephrine as important predictors of microcystin concentration suggests they are sensitive signals of environmental stress, which can be directly used in monitoring and conservation practices.
Other m/z + 1 values in our models suggest that those compounds mediate interactions between cyanobacteria, microcystin toxicity, and the environment. For example, a sesquiterpene, m/z + 1 203.185, was retained with positive coefficients by three models predicting microcystin toxicity and in models predicting relative abundances of Phylum Cyanobacteria, Class Cyanophyceae, and Anabaena. Sesquiterpene synthases are present in Anabaena species74, and the recurrence of m/z + 1 203.185 in our models is consistent with the abundance of Anabaena in UKL and release of sesquiterpenes and microcystin during cyanobacterial senescence75.
β-ionone was assigned to m/z + 1 193.153 based on spectral similarity to a previous study using PTR-MS76. m/z + 1 193.153 was retained with negative coefficients in M1 and M7 predicting microcystin toxicity and three models predicting relative abundance of non-cyanobacterial taxonomic groups. m/z + 1 193.153 was positively correlated with phylum Cyanobacteria, class Cyanophyceae, and Anabaena (Fig. 5). β-ionone and other norcarotenoids are products of carotenoid oxidation during photo-oxidative stress and inhibit photosystem II37,76,77,78,79. Oxidative stress in UKL may have induced production of β-ionone in Anabaena80,81, thereby decreasing Microcystis and microcystin production. Nontoxic Microcystis strains employ peroxidases in response to oxidative stress, but toxic Microcystis strains may produce microcystin to combat mild, chronic oxidative stress82. The different pathways employed by cyanobacteria to tolerate oxidative stress point to β-ionone as a potentially important compound that mediates interactions within the cyanobacterial community, including microcystin production. β-ionone may also be a taste-odor compound in potable freshwater sources76 that could be rapidly identified using our approach.
Limonene, a monoterpene produced by planktonic and benthic cyanobacteria83, is the likely identity of m/z + 1 137.129. This m/z + 1 value was retained with a negative coefficient in M7 and a positive coefficient in the model predicting relative abundance of Aphanizomenon. m/z + 1 137.129 was also negatively correlated with Microcystis and Gloeotrichia (Fig. 5). Limonene can inhibit photosynthesis44,84 and lyse M. aeruginosa85. These UKL data suggest that limonene produced by Aphanizomenon was associated with lower Microcystis abundance and perhaps consequently, lower microcystin concentrations.