Lakes in this study are in Vermont, USA (Fig. 1) and were sampled by the VT DEC between 2007–2021 as a part of multiple long-term, ongoing regional and state efforts, including the National Lake Assessment (NLA), Vermont Next Generation Lake Assessment (VT NGLA), and Vermont Reference Sentinel Lakes program. All water quality sampling and sediment core collection occurred between April and September of respective sampling years. Land-use data for each watershed was obtained using the 2011 National Land Cover Database (NLCD) data at 30-meter resolution. Study lakes have diverse morphometry and span wide altitudinal, land-use, bathymetric, and trophic gradients (Table 1).
Table 1
Table of lake catchment land-use, morphometric, and altitudinal variables.
Variable | Min | Mean | Median | Max |
Max. Depth (m) | 0.3 | 12.3 | 9.9 | 50.9 |
Agriculture (%) | 0.0 | 3.2 | 0.0 | 39.3 |
Forest (%) | 34.9 | 79.2 | 81.2 | 94.9 |
Development (%) | 0.0 | 2.0 | 1.6 | 9.7 |
Elevation (m) | 113.1 | 393.2 | 390.0 | 728.5 |
Water chemistry
Water chemistry sample collection took place at the time of the sediment core collection and is assumed to be representative of the conditions that existed during the deposition of the diatom valves in the surface sediment. Water chemistry data was obtained for all 102 lakes (Table 2), including total phosphorus (TP, µg L− 1), total nitrogen (TN, mg L− 1), pH, and alkalinity (mg CaCO3 L− 1), as well as dissolved oxygen (DO, mg L− 1), temperature (℃), and Secchi depth (m). In the 14-year monitoring period, 15 lakes were sampled twice, 4 lakes were sampled 3 times, and 2 lakes were sampled 4 times. No lakes were sampled more than once in a single year. This method of sample collection yielded 131 water chemistry data points, with six missing TP and 26 missing pH values. Epilimnetic samples were collected with a 2-meter integrated sampler for deep lakes (maximum depth [Zmax] > 3m) or a grab sample for shallow lakes (Zmax < 3m). TP and TN were both measured colorimetrically with ascorbic acid (Standard Method [SM] 4500-P H) for TP or using a cadmium reduction column (SM 4500-N-C) for TN. Alkalinity was measured using manual or automatic titration (SM 2320 B or 4500-H + B) to pH 4.5 as determined using a pH electrode. Both pH, conductivity, dissolved oxygen, and water temperature were measured in the field using a multiparameter sonde (Hydrolab).
Table 2 Summary statistics of lakes in the study. Total of 131 observations (NA = missing samples).
Variable | n | Mean | Median | Min. | Max. | NA |
Alkalinity | 116 | 38.97 | 35.5 | 2 | 121.1 | 15 |
TN (mg L− 1) | 123 | 0.34 | 0.3 | 0.09 | 1.51 | 8 |
TP (µg L− 1) | 125 | 15.99 | 12.3 | 5.2 | 120 | 6 |
pH | 105 | 7.41 | 7.5 | 4.84 | 8.85 | 26 |
Elevation (ft) | 36 | 393 | 390 | 113 | 728 | 14 |
GR | 120 | 3.86 | 2.31 | 0.72 | 42.12 | 11 |
WA:LA | 118 | 27.5 | 14 | 1.4 | 350 | 13 |
Zmax (m) | 120 | 12.31 | 9.95 | 0.3 | 50.9 | 11 |
DO (mg L− 1) | 124 | 8.06 | 8.1 | 4.1 | 12.84 | 7 |
Secchi (m) | 128 | 3.61 | 3.15 | 0.25 | 10.7 | 3 |
Temp (℃) | 128 | 22.27 | 22.58 | 10.22 | 27.73 | 3 |
* Geometric Ratio (GR) = Lake Area0.25/ Zmax; Measurement of a stratification potential of a lake.
** WA:LA = watershed area / lake area
Sediment Cores and Sample Preparation
The sediment diatom dataset used in this study was provided by the VT DEC. Sediment cores were extracted from the deepest point in each lake using a gravity corer and were frozen shortly after extraction and sectioning. The top sample was taken from the upper 1–2 cm of the core. Bottom core samples were taken as a 1 cm section from the bottom 3 cm of the core based on the core integrity. Length of cores ranged from 22 cm (Little Averill Pond) to 57.1 cm (Little Hosmer Pond). Diatom valves were identified and counted by an individual taxonomist under an oil immersion objective (1400x) to at least 500 valves, identifying diatoms to species level whenever possible. Relative abundances of each species were calculated by dividing the valves of each species by the total number of valves counted in the sample.
Calibration set development
Diatom samples from surface sediment from ~ 80% of lakes in this study were incorporated into a training dataset and lake selection was randomized; the remaining 20% were retained as a test set. The resulting training set contains lakes with a wide TP gradient, with discrete values ranging from 5.3 µg/L to 67.8 µg/L, with roughly 50% of data below 12.30 µg/L. The pH gradient spans 4.84 to 8.85, with 10% of data below the pH of 6.44, reflecting the subset of 12 lakes in the state that are classified as acid impaired (Fig. 2). A TP outlier (TP = 120 µg/L) recorded in Walker Pond in 2014 was assumed to result from contamination and removed for the purpose of the analyses below. Only those species that occurred in at least three samples and at least three times at relative abundance of 1% or higher were kept for the analyses.
All analyses were performed using R software (R Core Team, 2013). All ten chemistry and physical watershed variables, except pH, were log transformed (TP, TN, alkalinity, temperature, Secchi depth, DO, Zmax, elevation, WA:LA and geometric ratio (GR)) and species relative abundance was square root transformed.to meet model assumptions of normality. Diatoms species response to environmental gradients was evaluated using detrended correspondence analysis (DCA; Hill and Gauch, 1980) in the R vegan package (Oksanen et al., 2013), following suggestions by Lepš & Šmilauer (2003). A DCA primary axis gradient greater than three standard deviations indicates a unimodal species response to an environmental variable (ter Braak, Cajo, & Prentice, 1988). Informed by the DCA, data were examined to identify patterns in species distribution along 11 environmental variables listed in Table 2 using a constrained correspondence analysis (CCA) following a unimodal-response model.
CCA
Variables used in the CCA were inspected for a variance inflation factor (VIF) to assess for collinearity, and those with high VIF (> 10) were removed (Akinwande, Dikko & Samson, 2015). We used ANOVA to test for significant axes (Legendre, Oksanen & tel Braak, 2011), and constraints were tested using an automatic forward-selection model (p < 0.05) with 999 permutations (Blanchet, Legendre & Borcard, 2008). Determining the fit of the chemistry and physical watershed data for transfer functions was informed by running multiple CCA models as described above constrained by only one of the 11 variables each time, with 999 permutations. Eigenvalue for Axis I was divided by Axis 2 (CCAλ1/CAλ,2) to confirm the robustness of the variable for reconstructions (Hall & Smol, 1992). We then applied a generalized additive model (GAM) to test the explanatory power of pH and TP for species response along these environmental gradients using a residual maximum likelihood (REML) technique in the mgcv and rioja packages (Juggins, 2020, Wood, 2023).
Transfer Functions
We chose to reconstruct TP and pH because they are known stressors to VT lakes (Matthews and Merrell, 2018; Driscoll et al., 2001) and have the greatest variability across lakes in our dataset. Weighted averaging partial least squares regression (WA-PLS) (ter Braak & Juggins, 1993; ter Braak et al., 1993) and weighted averaging (WA) methods (ter Braak & Looman, 1986; ter Braak & van Dam, 1989; Birks et al., 1990) were used to reconstruct historical diatom-inferred (DI) TP and pH. For the WA method, both inverse and classical de-shrinking were performed, using the training set to find the optimal model for the dataset. All variations of WA models were run with tolerance-downweighing. Both the pH and TP models were derived from the optima of 106 different species and the WA-PLS models were verified using the randomization t-test (van der Voet, 1994). The optimal model was selected based on the R2, root mean squared error (RMSE), and root mean squared error prediction (RMSEP), and average and maximum bias. Average RMSEP was calculated using a mean of bootstrapped standard error of prediction (SEP) for the test observations and this linear model-fit between model-predicted and observed values was tested for significance (n = 28 for pH, and n = 28 for TP). If a modern measurement of pH or TP was not taken at the time of the core collection, the measurement closest to that date was used. In some cases, this meant that spring TP or pH were used. We also utilized these test sets to assess the extent of difference between past and present pH (n = 13) and TP using the previously produced transfer function.
Previously enumerated diatom stratigraphy from a 210Pb-dated core from acid-impaired Beaver Pond (Zmax = 25.18 m; Diamond et al. 2022) was used to reconstruct pH and TP. Because the transfer function was run with log-transformed TP data, the data were back transformed with an exponent function for visualization. Long-term monitoring pH data spans back to 1986, while long-term monitoring TP data has been collected since 2007. We compared both monitored records against DI-pH and DI-TP to validate how closely our transfer function predicted the measured values for both variables. Accuracy of the reconstruction was estimated using a mean standard error of prediction (SEP).