2.1 Study area
The Archipelago Sea is a semi-enclosed archipelago at the southwest coast of Finland between the Baltic proper and the Bothnian Sea (59°45'-60°45'N and 21°00'-23°00'E) in the northern Baltic Sea. Depending on the definition of island, the area is estimated to contain up to 60,000 islands, of which some 41,000 are named in regional charts (Väänänen et al. 2020). In this respect, it is the biggest archipelago in the world with complex and variable topography and has mainly wind-driven water mass movement patterns. The total area of this brackish water sea is 9,436 km2 with a water volume of 213 km3 and salinity content of 4 to 6 PSU (Voipio 1981). The total catchment area of the Archipelago Sea is approximately 8,900 km2 (with a lake area of less than 2% and arable land 28%) (Hänninen et al. 2000). The average water depth is only 23 m as the deepest hollows reach 140 m. The wind-caused sea level variation is generally low with mostly ± 0.5 m variation compared to the theoretical mean level with the insignificant tidal fluctuation (Voipio 1981). The sea is characterized by strong seasonality with the summer temperature of seawater reaching 20°C and with a more than 90% probability of an annual 1-5-month ice cover during the winter (Leppäkoski & Bonsdorff 1989).
The special topographical characteristic of the Archipelago Sea is its zonation in accordance with the relative shares in the land or seascape areas. Originally Häyrén (1900), by using this biogeographical criterion, divided the archipelago into three zones. He showed that zonation ranges from the sheltered inner archipelago, where the landscape dominates; to the intermediate zone (even proportions) and finally to the more open, seascape dominating outer archipelago. The existence of these zones is caused by the slow postglacial uplift of the tilting coastal plain, around 0.5 cm annually, and this uplift has estimated to be proceed some 10,000 years, still (Johansson et al. 2002). Hänninen et al. (2000) proved that zonation could also be traced in water quality, i.e. in nutrient concentrations in seawater, which form hydrographical zonation comparable to those found in biogeographic studies. Kirkkala et al. (1998) and Erkkilä & Kirkkala (2000) showed that similar gradients could also be revealed through seawater Secchi depth transparency (i.e., a method to estimate the intensity or rate of biological primary production in the open sea environment) and microalgae concentration (chl-a). Later, Hänninen et al. (2007) confirmed that other pelagial marine biodiversity patterns and also biodiversity of benthic biota, generally followed transitional zones corresponding to presented hydrographical gradients.
2.2 Profiling buoy
The technical description of the profiling buoy and examples of produced time-series visualisations are presented in detail on the Seili Environmental monitoring programme webpages at: https://saaristomeri.utu.fi/home/, which is administered by the current authors from the Archipelago Research Institute, University of Turku and Turku University of Applied Sciences. Therefore, the text of this chapter is an amended update of captions presented therein under license CCBY-4.0. (Seili Environmental monitoring programme 2020a).
At-sea monitoring is important for the comprehension of the oceanographic processes and further development of the environmental state of the sea, and it could provide essential information for environmental management purposes. The database provider in the present study is an automated monitoring buoy located in the vicinity of island of Seili, in the northern Baltic Sea, SW Finland (Fig 1). The water depth in the study location is ca. 50 m. The intention of the buoy monitoring is to produce detailed data on seawater quality, vertical stratification as well as seasonal and inter-annual oceanographic changes occurring in the sea. Combined short-term (hours, days, weeks) and long-term observations (months, years) also reveal trends and patterns that can help interpret experimental results or provide new research hypotheses. Since 2006, the buoy has been in operation annually during the growing seasons from the early spring to the late autumn. This Seili Environmental monitoring program is carried out by the Turku University of Applied Sciences in close collaboration with the University of Turku (the Archipelago Research Institute), other Finnish Marine Research Infrastructure (FINMARI) consortium partners (the Finnish Environment Institute and the Finnish Meteorological Institute) and the visualisations of the buoy measurements are produced by the Archipelago Research Institute (Seili Environmental monitoring programme 2020a).
The monitoring station is composed of a YSI 6952 buoy platform with a YSI 6000 multiparameter probe (YSI Inc. 2012), which measures seawater salinity (PSU), temperature (°C), dissolved oxygen (O2 mg/l and saturation %), turbidity (NTU), chl-a (fluorescence, expressed as µg/l) and cyanobacteria or blue-green algae (BGA; phycocyanin fluorescence, expressed as cells/ml) concentrations (Fig. 1). The probe is operated vertically in the 2-40 water column by winch, and the station produces four vertical profiles per day with a measuring interval of every 0.5 sec, when the winch is active. The data are sent and stored to an external server twice a day via GSM, and profiles can be visualized and downloaded online at https://saaristomeri.utu.fi/odas_en/. The buoy is deployed to operate in the spring as soon as the ice breaks and is removed in the early winter before the sea ice forms. (Seili Environmental monitoring programme 2020a). Detailed technical information for the profiling buoy operations are given in Loisa et al. (2017).
In 2015, the station was equipped with a Vaisala weather station (WXT-520), which is intended to provide information of the closely linked interaction between the weather conditions just above the sea surface and uppermost sea-surface water layer. The weather station measures air temperature (°C), air pressure (mbar), humidity (%), precipitation (mm), wind speed (m/s) and wind direction (°). This is the first weather station in Finland to be installed on a floating automated monitoring platform. Similar to water quality measurements, the weather station data are sent and stored to an external server twice a day, and it can be visualized and downloaded online at https://saaristomeri.utu.fi/ buoyweather/. (Seili Environmental monitoring programme 2020b).
2.3 Data and modelling
For this study, we focused on two variables: chl-a and seawater temperature (Fig. 2). The temperature (hereafter, temp) is a reasonable and practical candidate predictor of, for example, chl-a concentration (e.g. Fox et al. 2005, Kavak & Karadoğan 2012), as its increase in spring is the main trigger that launches biological activities and accelerates the speed and rate of the biological processes in the Baltic Sea seawater after the cold and icy winter. The seawater temperature is closely connected by the air temperature above the sea surface – when we have a warm period, the heat energy is very rapidly transferred to the upper water column and mixed by the local winds, and during the cool period the dynamics will go in the same but opposite way (e.g. Hall 2016). Moreover, the temperature is one of the key variables in all environmental monitoring programs, and its future predicted values can be obtained reliably from various weather information services.
To avoid having to model the water depth as a continuous variable, we discretized it into two classes, being surface (depth < 20 m) and hypolimnion (depth ≥ 20 m) water columns, and worked with the daily means of the observed values of chl-a and temp over the two depth categories. The reason for choosing of these categories comes from the biological events occurring differently in the layers (see later in the text) and by the choosing of two separate layers we wanted to firstly, make difference between these layers, and secondly, decrease the variation in data due to different kind and speed of biological processes occurring in these layers. We selected daily means to compensate for occasional missing values and to decrease internal variation within data, as we used daily weather forecasts as predictors for the chl-a changes in the latter analysis. However, daily weather forecasts were used only in final predictions for the “Seili-index” and its visualization for the webpages (the index is illustrated detailed in Discussion) but not in the actual model construction. There, we used air temperature data from the Vaisala weather station, which was mounted on the buoy platform.
The biological grounds for the selected two water columns are the formation of thermocline during the summer season between the upper and lower water layers at the depth of around 20 m. Thermocline is a real physical boundary, in which the vertical seawater column has a strong temperature gradient within 1-2 m of depth difference. Its formation in early summer is due to increased solar energy due to the onset of the thermal growing season that warms up seawater in the upper layer, which is constantly mixed by local winds. Biologically, the thermocline is essential for the pelagic ecosystem functioning to support the majority of the primary production of microalgae during their growing season, which occurs within this photic zone above the thermocline (Fig. 2) (e.g. Voipio 1981, Hänninen et al. 2000).
Finally, as the biological year does not necessarily coincide with the calendar year, we needed to align the data from the different years based on some landmark in the yearly observations of these two variables. A useful choice for this is the vernal blooming of the diatom microalgae, called the “the spring bloom”, which occurs regularly at different intensities during the spring months of every year. Blooming is triggered not only by increased light, but also by fast-melting snow and ice in the catchment area of the Archipelago Sea resulting in nutrient-rich freshwater runoffs into the sea (Hänninen & Vuorinen 2011). We located the blooming peak value of chl-a for each year and aligned the data so that the peak always occurs on day 0. For years 2017 and 2018, the blooming had occurred already before the buoy measurements started, and we aligned these data using the blooming of year 2016 as a benchmark because of the similar spring conditions in the terms of snow and ice melting (Fig. 3).
Our model of choice is the Generalized Additive Mixed Model (GAMM) (Wood 2017), in which the conditional expected value of a normally distributed response, Y, is modelled as a linear combination of random effects and smooth functions of fixed effects:
E(Y│X) = β0 +f1(X1) + ... + fp(Xp) + β1Z1 + ... + βqZq,
where X1, …, Xp are the fixed effects, f1, …, fp are unknown smooth functions estimated using ten thin plate regression splines via restricted maximum likelihood (REML) and Z1, …, Zq are the random effects. We used the function gamm in the R-package mgcv (Wood 2017) to fit the model with depth as a categorical variable, depth-wise temperature and depth-wise day of the year as smoothed fixed effects and year as a random effect (shared by all observations of a given year, avoiding the oversaturation of the model with a large number of year and parameter interactions). GAMMs, we think, are good models for this kind of nonlinear data as they keep a balance between flexibility and interpretability, where we especially consider the later are desirable feature. The analyses below reflect, the from our opinion best models, and in the supplement further details about the models and alternative modelling approaches are provided. Additionally, experimentation revealed that making other distributional assumptions together with other link functions in the above modelling framework yields worse predictions.
Two separate analyses supporting each other were performed for the daily mean data. The first analysis, named as the Raw values model, was conducted with the original, untransformed data, depicted in Fig. 4, as the response variable, Y, was used to study temporal changes and dynamical features of chl-a concentrations in seawater during the season. The second, the Difference-day model, was used instead of the daily differences of the original values as the response. The first analysis was included due to its capability to accurately infer the effect of temperature on chl-a concentrations in seawater, which gave us valuable insight on the temporal differences of microalgae dynamics between years. The second analysis, while more difficult to interpret due to the modelling of differences instead of the raw values, was included from the viewpoint of local prediction. Namely, unlike the Raw values model, the Difference-day model allowed for aligning the predicted chl-a concentrations with the true observed values in a local time frame (see the Results section for a more detailed explanation). However, in order to compare the two models’ capabilities, we still considered prediction also with the Raw values model. Note that, whereas the GAMM-model itself already allows capturing the trend and large-scale behaviour of the response variable in time through the smoothed day of the year predictor, the difference-day model further takes the time dependency in the data into account through the differenced values.
The prediction study was conducted in a moving-window fashion. First, “training data” comprising of the measurements for the years 2011-2018 along with first 25 measurement days of the year 2019 were formed, as the latter set of data was needed to estimate the effect of the year 2019. These data were then used to fit the GAMM model described above, and the fitted model was then used to predict chl-a for the next 10 days of 2019 using the true air temperature data of the days obtained from Vaisala weather station. To make the predictions, the index required a forecast of the sea surface air temperature for the prediction period, which we obtained from Norwegian online weather services (https://www.yr.no/) (note that this “prediction using predicted values” naturally accumulates more error in the results than would be the case if the predictor values were fully accurate). Next, the days 26-35 were appended to the training data, and the process was repeated, which again predicted the next 10 days of 2019. This cycle was continued for a total of 18 times until the end of the measurement period of the year 2019.