Vibrio spp. abundance (Vibrio Total) and prevalence of V. parahaemolyticus (Vp).
In the study area, 12 sampling points were identified (Figure 1), at which 191 samples were collected over 16 months. These yielded 371 isolates from different species, 321 (86.5%) of which belonged to 16 species of the genus Vibrio. Six (37.5%) of these species are considered human pathogens; in descending order of prevalence, they were V. alginolyticus (88.5%), V. parahaemolyticus (26.2%), V. harveyi (11.5%), V. fluvialis (7.3%), V. furnissii (1%), and V. vulnificus (0.5%). The first two were isolated at all the sampling points (Supp.Table 1). Vibrio spp. abundance (Vibrio total) ranged from 0 to 2.48 log10 with a mean of 1.30 ± 0.55 log10 cfu x 500 ml-1, and V. parahaemolyticus ranged from 0 to 2.30 log10 with a mean of 1.34 ± 0.51 log10 cfu x 500 ml-1 (Table 1).
Spatio-temporal distribution.
Vibrio abundance (Vibrio total) showed a significant Spearman correlation with longitude (rho -0.41, p<0.05) and latitude (rho 0.29, p<0.05) (Supp.Figure 1) and was significantly associated with sampling_point (Kruskal-Wallis test, p<0.0001). The Wilcoxon test identified point 04ca as the only one that differed from the other sampling points. Since 04ca corresponds to a fishing port, we investigated whether the differences observed there were due to the quality of the seawater. To do this, we grouped the waters according to their origin into three categories (beach, mouth_port and fishing_port) and found that only fishing_port differed significantly from the other two categories. We therefore regrouped origin into a new variable, sea_quality, with two categories (clean, which groups beach and mouth_port, and no_clean, which corresponds to fishing_port), and the significant difference between the two was maintained (Figure 2). A spatial association was also observed for the occurrence of V. parahaemolyticus (Vp), although, unlike Vibrio total, Vp at 04ca was the lowest of all the locations sampled (Figure 2). Some 98% of the samples in which V. parahaemolyticus was detected were taken at locations categorized as clean seawater (OR 7.17, Pearson's Chi-squared test, p=0.029). Nevertheless, this result should be interpreted with caution, since the expected frequency in no_clean seawater was less than 5. Sampling point was also associated with SST (Supp.Figure 2), which showed a significant moderate correlation with latitude (rho 0.45, p<0.05) and a weak negative correlation with longitude (rho -0.29, p<0.51) (Supp.Figure 1), confirming that the waters are warmer further to the northwest of the study area and closer to the mouth of the Guadalquivir river (Figure 1). There were no significant differences in salinity or density between sampling points. This minor variation probably reflects the fact that estuaries were not included.
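The odds ratio and the caution about expected frequencies can be illustrated with a minimal sketch. The counts below are hypothetical (they are not the study data and do not reproduce OR 7.17), and the function names are illustrative only; the sketch simply shows how an odds ratio and the expected cell counts of a 2x2 table are obtained, and how a cell with an expected count below 5 is flagged.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# HYPOTHETICAL counts, for illustration only.
# Rows: clean, no_clean. Columns: Vp detected, Vp not detected.
table = [[45, 120],
         [1, 14]]

or_value = odds_ratio(table[0][0], table[0][1], table[1][0], table[1][1])
expected = expected_counts(table)
min_expected = min(min(row) for row in expected)

# A minimum expected count below 5 is the usual signal that the
# chi-squared approximation may be unreliable for this table.
small_cell = min_expected < 5
```

With a table of this shape, the smallest expected count falls in the no_clean row, which is the situation the text warns about.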
Both Vibrio total and Vp were significantly associated with year and season_year, but not with month, season or julian_day (Supp.Table 2). In 2018, Vibrio total was significantly higher than in 2017 (1.4 vs 1.2 log10; p=0.004), as was Vp (35.3% in 2018 vs 15.7% in 2017; OR=2.9, p=0.003) (Supp.Figure 3). Vp showed marked heterogeneity across the seasonal series, although it was always more prevalent in summer.
Environmental variables.
The effects of SST, salinity and density were analyzed. Density was not significantly associated with Vibrio total or Vp. SST varied between 12ºC and 28.2ºC, with a mean of 21.9 ± 3ºC. Salinity varied between 26 and 44 ppt, with 26 being an atypical value, and had a mean of 39.5 ± 1 ppt (Table 1). Vibrio total was significantly correlated with SST (rho=0.29; p<0.05) and inversely with salinity (rho=-0.15; p<0.05) (Supp.Figure 1). Vp was not associated with salinity or SST as continuous variables; however, when salinity was segmented into two groups (greater than and less than or equal to 39 ppt), a significant association was observed (ORs<=39; s>39: 2.5, 95%CI: 1.25-5.26) (Supp.Table 2). The coefficient of variation (CV: SD/mean) of salinity indicates that its SD represented only 4% of the mean, while that of SST was 14%. The parallel trend observed between the CV of SST and the mean abundance of Vibrio (Figure 3) suggests that the influence of SST on Vibrio total may be related more to variations in SST than to the mean SST.
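As a brief illustration of the coefficient of variation used above, the following sketch computes CV as SD/mean, expressed as a percentage. The use of the sample standard deviation and the example readings are assumptions made for illustration; only the SST summary values (mean 21.9, SD 3) come from the text.

```python
import statistics


def coefficient_of_variation(values):
    """CV as a percentage: sample SD divided by the mean (assumed definition)."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Check against the rounded SST summary reported in the text (21.9 +/- 3 ºC).
cv_sst_from_summary = 100 * 3 / 21.9  # about 13.7, i.e. roughly 14%

# HYPOTHETICAL SST readings (ºC), for illustration only.
sst_readings = [18.0, 20.0, 22.0, 24.0, 26.0]
cv_example = coefficient_of_variation(sst_readings)
```

Because CV is dimensionless, it allows the relative variability of SST and salinity to be compared even though they are measured in different units.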
Some authors have hypothesized that the statistical significance of the associations between SST and Vibrio abundance depends on the range of the data, since the strength of the correlation with temperature varies according to season (Oberbeckmann et al., 2012; Froelich et al., 2013). In our study, we also observed that the association between SST and Vibrio total was stronger in spring than in summer (R2 SSTspring: 0.18 vs R2 SSTsummer: 0.064; rho spring=0.40 vs rho summer=0.24; p<0.05), and that the greatest variation in SST occurred in spring (CVspring: 10.9% vs CVsummer: 9.8%). In autumn and winter, the association was not significant due to the scarcity of data.
Adjusted effect of all variables: Multivariate analysis.
Effects on Vibrio total. Multivariate linear regression models were developed using the log-transformed Vibrio total, log(cfu+0.5), as the dependent variable; the transformation contributed to compliance with the assumptions of regression. After eliminating variables that did not comply with these assumptions, two significant models were obtained (Table 2). Model 1 included longitude, sea_quality and season_year. This model satisfied the assumptions of regression analysis and explained 32% of the variability of Vibrio total. Model 2, on the other hand, used the SST*salinity interaction instead of season_year and explained 27% of the observed variability. It satisfied the assumptions of regression except for the normality of residuals. It should be noted, however, that the residuals of the bivariate regression model with the SST*salinity interaction were normally distributed, and that the interaction between salinity and temperature is so well known that it justifies keeping it in the model, despite the limitations this imposes on drawing inferences from our results.
The difference between models 1 and 2 therefore lies in the exclusive selection of either season_year or the SST*salinity interaction, since the two are collinear and the season_year coefficient already accounts for the effect of SST and salinity. We chose model 2 because using salinity and SST makes it easier to interpret and the interaction helps to understand the combined effect of both variables. The effects are explained in Figure 4. We found that, at the same SST, the effect on Vibrio total is greater at low salinities and decreases until salinity reaches 42 ppt, above which the effect is reversed (Supp.Figure 4).
Temperature and salinity interact in such a way that each degree of temperature variation modifies the average Vibrio total by a factor of (0.54 - 0.01 x salinity), and each unit of salinity variation modifies the average Vibrio total by a factor of (0.19 - 0.01 x SST). Therefore, the final effect depends on the interaction of both, and it is not possible to analyze the isolated effects of SST and salinity. This interaction explains why, although SST and salinity were lower in 2018 than in 2017, Vibrio total and Vp were higher (Figure 5). Polynomial regressions of different orders were tried, but none proved adequate. The standardized coefficients indicate that the variable with the greatest influence on the model is sea_quality, followed by SST, salinity and longitude.
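The interaction can be made explicit with a short derivation. This is a sketch under stated assumptions: it supposes that model 2 has the usual linear form with a product term, reads the two expressions quoted above as slopes on the log scale, and introduces the symbols \beta_0 to \beta_3 and S (salinity) purely as notation; the remaining model terms (longitude, sea_quality) are abbreviated by dots.

```latex
\begin{aligned}
\log(\mathrm{cfu}+0.5) &= \beta_0 + \beta_1\,\mathrm{SST} + \beta_2\,S
  + \beta_3\,(\mathrm{SST}\times S) + \dots \\[4pt]
\frac{\partial \log(\mathrm{cfu}+0.5)}{\partial\,\mathrm{SST}}
  &= \beta_1 + \beta_3\,S \;\approx\; 0.54 - 0.01\,S \\[4pt]
\frac{\partial \log(\mathrm{cfu}+0.5)}{\partial S}
  &= \beta_2 + \beta_3\,\mathrm{SST} \;\approx\; 0.19 - 0.01\,\mathrm{SST} \\[4pt]
\beta_1 + \beta_3\,S^{*} = 0 \;&\Longrightarrow\; S^{*} = -\frac{\beta_1}{\beta_3}
\end{aligned}
```

Under this reading, the slope for SST shrinks as salinity rises and changes sign at S*, which is why neither variable has a single isolated effect. The two-decimal coefficients quoted in the text are too coarse to recover the salinity threshold exactly, so S* should be taken from the fitted model rather than from these rounded values.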
Effects on V. parahaemolyticus occurrence (Vp). To explain the presence/absence of culturable V. parahaemolyticus, model 3 was developed using multivariate binary logistic regression with the variables nº_spp, origin, and s_seg2, the latter being the segmentation of salinity into two categories (>39 ppt and <=39 ppt), since the numerical variable was not significant. SST and its categories were ultimately not significant but were retained for explanatory purposes. The model explained 32% of the occurrence of Vp in the sampled seawaters (R²Nagelkerke = 0.32), satisfied the assumptions of logistic regression, and was well calibrated (Hosmer-Lemeshow test p=0.324) with good discriminatory power (AUC: 0.80, 95%CI: 0.74-0.86) (Table 3).
Figure 6 explains the effects. ROC curve, sensitivity and specificity are shown in Supp.Figure 5.
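To clarify what the AUC measures, the sketch below computes it as the probability that a randomly chosen Vp-positive sample receives a higher predicted probability than a randomly chosen Vp-negative one (ties count one half). The labels and scores are invented for illustration and are not outputs of model 3; the function name is likewise illustrative.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for presence, 0 for absence; scores: predicted probabilities.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# INVENTED labels and predicted probabilities, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
auc = auc_from_scores(labels, scores)
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, so values around 0.8 indicate that most positive/negative pairs are ordered correctly.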
Supervised machine-learning classification techniques were used to test ten different models: one for each of the nine algorithms described in the statistical analysis and one using a stacking ensemble method. Supp.Figure 6 shows the training and test accuracy obtained with the nine algorithms. Since none of the accuracies clearly exceeded either the baseline accuracy (0.738) or that of multivariate logistic regression model 3 (0.74), we finally chose model 3 as the best explanatory and predictive model of Vp.
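One plausible reading of the baseline accuracy, offered here as an assumption rather than as the authors' stated definition, is the no-information rate: the accuracy obtained by always predicting the majority class (Vp absent). The counts below are inferred from the reported 191 samples and 26.2% Vp prevalence (about 50 positives) and are not taken from the data tables.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# ASSUMED counts: 191 samples, of which about 50 are Vp-positive (26.2%).
labels = [1] * 50 + [0] * 141
baseline = majority_class_accuracy(labels)
```

Under this assumption the baseline is about 0.738, which matches the threshold quoted above; a classifier has to beat this value to add predictive information beyond the class frequencies.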