Here, we build on previous studies to develop a standardized workflow for identifying predated individuals in acoustic telemetry studies (Fig. 6). We used tag sensor technology, unsupervised machine learning, and supervised machine learning to address the issue of “predation bias” in the field of telemetry and showed that using data collected from tag sensors to train supervised models provides the greatest accuracy for fate classification of tagged fishes (Fig. 5).
When the assigned cluster fates were compared with the known fates of individuals within each cluster, as determined from detection data, 2019 was the most accurate year: the majority class in each cluster matched the assigned cluster fate (Table 2). However, almost all clusters in all years contained a mixture of individual fates. The mortality cluster (cluster 1) in 2019 contained an even split of mortalities and predations, but the predations in this cluster were identified in freshwater by mobile tracking. Their behavioural metrics therefore resembled mortalities more closely than did those of the predations detected in tidal water by stationary receivers. Mobile tracking downriver yields only a few detections of a given tag at a single location, which is insufficient to capture distinct behaviour. Additionally, the most likely freshwater predators, brown trout (Salmo trutta) or chain pickerel (Esox niger), are relatively stationary species, so their detection data resemble a dropped tag or dead smolt rather than the active striped bass behaviour we were testing for. The 2018 cluster assignments were also consistent with individual fates; however, two of the clusters were mostly composed of predations (Table 2). Cluster 1, which was assigned the predated fate, contained only two predated individuals, whereas cluster 3 contained the majority of predated individuals (15) but was assigned a mortality fate based on the behavioural metrics. The 2017 clusters were difficult to distinguish based on behaviour and individual fates because of the high number of predations that year; predated individuals were spread among all three clusters (Table 2). Compared with the tag sensor and detection data estimates, the cluster analysis reduced predation estimates by 30 percentage points in 2017, 30 percentage points in 2018, and 3.5 percentage points in 2019 (Table 3). Unsupervised clustering methods are capable of fate classification but are less accurate than supervised methods (Fig. 5).
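The comparison of assigned cluster fates with known fates amounts to a cross-tabulation of cluster labels against fates, followed by a check of whether each cluster's majority class matches the fate it was assigned. The sketch below illustrates this under stated assumptions: the function names, the single-letter fate codes, and the example values are hypothetical and are not the study's data. A tie between the two most common fates (like the even split described above) is reported as no majority.

```python
from collections import Counter, defaultdict

def majority_fate(counts):
    """Most common known fate in a cluster, or None if the top two are tied."""
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # e.g. an even split of mortalities and predations
    return ranked[0][0]

def cluster_summary(clusters, fates, assigned):
    """Cross-tabulate cluster labels against known fates.

    clusters: cluster label per individual
    fates: known fate per individual ("S", "M", or "P")
    assigned: fate assigned to each cluster from its behavioural metrics
    """
    table = defaultdict(Counter)
    for cluster, fate in zip(clusters, fates):
        table[cluster][fate] += 1
    summary = {}
    for cluster, counts in sorted(table.items()):
        majority = majority_fate(counts)
        summary[cluster] = {
            "counts": dict(counts),
            "majority": majority,
            "matches_assigned": majority == assigned[cluster],
            # share of the cluster made up by its most common fate
            "purity": max(counts.values()) / sum(counts.values()),
        }
    return summary

# Hypothetical example, not the study's data
clusters = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
fates = ["M", "M", "P", "P", "S", "S", "P", "P", "P", "M"]
assigned = {1: "M", 2: "S", 3: "P"}
for cluster, row in cluster_summary(clusters, fates, assigned).items():
    print(cluster, row)
```

A purity below 1.0 in every cluster corresponds to the mixture of individual fates noted above.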
Table 3
Percent of smolts belonging to each fate as determined by the V5D predation tag sensor and detection data, unsupervised cluster analysis (CA), and supervised random forest (RF).
| Fate | 2017 Tag sensor | 2017 CA | 2017 RF | 2018 Tag sensor | 2018 CA | 2018 RF | 2019 Tag sensor | 2019 CA | 2019 RF |
|---|---|---|---|---|---|---|---|---|---|
| Successful migrant (S) | 14% | 0% | 4% | 46% | 48% | 42% | 62.5% | 60.7% | 57.1% |
| Mortality (M) | 38% | 82% | 16% | 18% | 46% | 10% | 16.1% | 21.4% | 12.5% |
| Predation (P) | 48% | 18% | 80% | 36% | 6% | 48% | 21.4% | 17.9% | 30.4% |
Random forest algorithms consistently increased the percentage of individuals classified as predated and reduced the percentages classified as successful migrants and mortalities, compared with the estimates obtained from the predation tag sensor and detection data (Table 3). Predation rates increased by 32, 12, and 9 percentage points in 2017, 2018, and 2019, respectively. As with the cluster analysis, the 2019 random forest algorithm did not successfully differentiate the six freshwater predations from the mortalities.
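Because Table 3 reports percentages of smolts, the changes between methods are differences in percentage points rather than relative changes. The short calculation below, using only the Table 3 values, reproduces the differences quoted for the cluster analysis (CA) and the random forest (RF) relative to the tag sensor; the dictionary layout and function name are illustrative choices.

```python
# Table 3 values (percent of smolts), keyed by year, then method, then fate.
TABLE3 = {
    2017: {"tag": {"S": 14.0, "M": 38.0, "P": 48.0},
           "CA": {"S": 0.0, "M": 82.0, "P": 18.0},
           "RF": {"S": 4.0, "M": 16.0, "P": 80.0}},
    2018: {"tag": {"S": 46.0, "M": 18.0, "P": 36.0},
           "CA": {"S": 48.0, "M": 46.0, "P": 6.0},
           "RF": {"S": 42.0, "M": 10.0, "P": 48.0}},
    2019: {"tag": {"S": 62.5, "M": 16.1, "P": 21.4},
           "CA": {"S": 60.7, "M": 21.4, "P": 17.9},
           "RF": {"S": 57.1, "M": 12.5, "P": 30.4}},
}

def point_change(year, method, fate):
    """Model estimate minus tag-sensor estimate, in percentage points."""
    return round(TABLE3[year][method][fate] - TABLE3[year]["tag"][fate], 1)

for year in TABLE3:
    print(year,
          "RF - tag (P):", point_change(year, "RF", "P"),
          "| CA - tag (P):", point_change(year, "CA", "P"),
          "| RF - tag (S):", point_change(year, "RF", "S"))
```

The last column gives the drop in estimated survival under the random forest (10, 4, and 5.4 percentage points for 2017 to 2019).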
Data from 2017 showed the greatest disparity in fate assignments among the three classification methods (Table 3). In addition to overall classification accuracy, balancing accuracy among classes is important, especially for unbalanced data sets, because models tend to neglect minority classes to achieve greater overall accuracy (Chen et al., 2004; Brownscombe et al., 2020). The small number of successful migrants relative to mortalities and predations in 2017 made it difficult for either machine learning approach to recognize these individuals. The few successful migrants were masked in the cluster analysis by the behavioural characteristics of the other fate classes (Table 2), and despite the addition of class weights, the random forest model was still unable to classify successful migrants correctly (in-sample class error = 1.00). In contrast, the percentage of successful migrants was relatively stable among all three methods in 2018 and 2019, whereas the mortality and predation classes had larger disparities, especially in the 2018 cluster analysis (Table 3).
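To illustrate why a minority class can be ignored while overall accuracy stays high, the sketch below computes overall accuracy, per-class error, and balanced accuracy from a confusion matrix, plus inverse-frequency class weights. The confusion matrix is hypothetical, and the weighting formula n / (k × n_c) is the heuristic used by scikit-learn's `class_weight="balanced"`; it is shown as one common option, not as the weighting used in this study.

```python
def overall_accuracy(confusion):
    """confusion[true_class][predicted_class] -> count."""
    correct = sum(row.get(true, 0) for true, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total

def class_errors(confusion):
    """Per-class error = share of a class's individuals assigned to another class."""
    return {true: 1 - row.get(true, 0) / sum(row.values())
            for true, row in confusion.items()}

def balanced_accuracy(confusion):
    """Mean per-class accuracy, so each fate counts equally regardless of size."""
    errors = class_errors(confusion)
    return sum(1 - e for e in errors.values()) / len(errors)

def inverse_frequency_weights(counts):
    """n / (k * n_c): rarer classes receive larger weights."""
    n, k = sum(counts.values()), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

# Hypothetical confusion matrix: no successful migrant (S) is classified correctly.
confusion = {
    "S": {"M": 1, "P": 2},
    "M": {"M": 8, "P": 2},
    "P": {"P": 18, "M": 2},
}
print(f"overall accuracy:  {overall_accuracy(confusion):.3f}")
print(f"balanced accuracy: {balanced_accuracy(confusion):.3f}")
print("class errors:", class_errors(confusion))
print("weights:", inverse_frequency_weights({"S": 3, "M": 10, "P": 20}))
```

In this made-up case, overall accuracy is about 0.79 even though the class error for successful migrants is 1.0, while balanced accuracy falls to about 0.57.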
The length of time a tag is retained within a predator and continues to function can affect a model's ability to classify it correctly as a predation. The retention time of tags in the gastrointestinal tract of predatory fishes depends on several factors, including water temperature, predator size, prey size, and tag size (Romine et al., 2014; Halfyard et al., 2017; Daniels et al., 2019; Klinard et al., 2019). The longest known retention time of predation tags is over 149 days, observed in an acoustic telemetry study of bloater (Klinard et al., 2019). Additionally, acoustic tags from rainbow trout (Oncorhynchus mykiss) and yellow perch were retained in predatory largemouth bass (Micropterus salmoides) for 1.1–11.5 days (Halfyard et al., 2017). In species more comparable to this study, gut retention time of tags from juvenile Chinook salmon (Oncorhynchus tshawytscha) consumed by striped bass ranged from 1.2 to 2.7 days and was negatively related to water temperature (Schultz et al., 2015). Here, tags triggered as predated were detected for an average of 2.9 days (range 0–32.7 days). After this period, either the tag was evacuated through the gastrointestinal tract, the predator left the study area, or the tag ceased transmitting. The longer a tag remains in a predator, the easier it is to identify as a predation, because more detections track predator behaviour (Daniels et al., 2018). Predations in which the tag is ejected quickly, before distinct predator movements are captured, are therefore more likely to appear as mortalities. This pattern was evident in the 2018 cluster analysis, where 13 of the 15 predations in the mortality cluster (cluster 3) had retention times shorter than the 2.9-day average; the same was true for 15 of the 16 predations in mortality cluster 1 in 2017.
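The retention comparison above can be sketched as measuring, for each triggered tag, the span between its first and last detection after triggering and flagging those below the 2.9-day average. This is a minimal sketch: the tag IDs and timestamps are hypothetical, and the detection span is only a lower bound on true gut retention, since the predator may leave the array or the tag may stop transmitting before it is evacuated.

```python
from datetime import datetime

def retention_days(detection_times):
    """Days between the first and last detection after the predation trigger.

    A lower bound on retention: after the last detection the tag may have been
    evacuated, the predator may have left the array, or the tag may have failed.
    """
    if not detection_times:
        return 0.0
    span = max(detection_times) - min(detection_times)
    return span.total_seconds() / 86400

def short_retention(predations, threshold_days):
    """IDs of predated tags whose observed retention is below the threshold."""
    return sorted(tag for tag, times in predations.items()
                  if retention_days(times) < threshold_days)

# Hypothetical post-trigger detections for three predated tags
predations = {
    "A": [datetime(2018, 5, 1, 6), datetime(2018, 5, 1, 18)],  # 0.5 days
    "B": [datetime(2018, 5, 2), datetime(2018, 5, 8)],          # 6 days
    "C": [datetime(2018, 5, 3, 12)],                            # single detection
}
print(short_retention(predations, threshold_days=2.9))
```

Tags flagged this way carry few detections of predator behaviour and are the ones most likely to be grouped with mortalities.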
The supervised random forest was the most accurate of the three fate classification methods (Fig. 5). It increased predation rates well beyond the estimates from the tag pH sensor alone and from the unsupervised cluster analysis; however, total mortality showed a large increase only in 2017 (Table 3). Likewise, the cluster analysis substantially increased estimates of total mortality relative to tag sensor estimates only in 2017. Under the random forest estimates, predation accounted for most smolt mortalities (71–83%), whereas the predation tag estimates attributed just over half of all mortalities to predation (56–67%). Whether mortality was attributed to predation or to unknown causes, total mortality did not differ greatly between methods. Both predation rates and total mortality decreased from 2017 to 2019 under both the random forest and tag sensor methods (Table 3). The variation in migration mortality rates among years could be due to a number of factors, including changes in predator and prey abundance, changes in the timing of the striped bass spawning period, or differences in sampling methods.
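The share of mortalities attributed to predation is P / (M + P), and total mortality is M + P, using the Table 3 percentages. The sketch below applies these two formulas to the tag sensor and random forest rows of Table 3; the variable and function names are illustrative.

```python
# Percent of smolts classified as mortality (M) and predation (P), from Table 3.
ESTIMATES = {
    "tag": {2017: (38.0, 48.0), 2018: (18.0, 36.0), 2019: (16.1, 21.4)},
    "RF": {2017: (16.0, 80.0), 2018: (10.0, 48.0), 2019: (12.5, 30.4)},
}

def predation_share(mortality, predation):
    """Percentage of all deaths (M + P) attributed to predation."""
    return 100 * predation / (mortality + predation)

for method, by_year in ESTIMATES.items():
    for year, (m, p) in by_year.items():
        print(f"{method} {year}: total mortality {m + p:.1f}%, "
              f"predation share {predation_share(m, p):.0f}%")
```

This gives shares of 56%, 67%, and 57% for the tag sensor and 83%, 83%, and 71% for the random forest, with total mortality declining from 2017 to 2019 under both methods.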
We emphasize distinguishing predation from other forms of mortality because failing to do so introduces substantial bias into telemetry study results and their interpretation. Previous researchers who used classification algorithms to identify predation of tagged fish found that, without these analyses, the spatial and temporal movements of 81% of bonefish would have been biased (Moxham et al., 2019), mortality rates of salmon smolts in freshwater relative to the estuary would have been underestimated by 10% (Daniels et al., 2018), and survival estimates of salmon smolts would have been overestimated by 2.4–13.6% (Gibson et al., 2015). Here, even with predation sensor tags, the random forest results indicate that survival estimates were overestimated by 4–10 percentage points because of undetected predation events. Identifying predations in telemetry studies is therefore vital to management, not only for investigating sources of mortality in a population but also for ensuring that accurate conclusions are drawn about the ecology of the study species and population survival rates.
Here, we show that there is value in combining predation tags with modelling methods to identify predated individuals (Fig. 6). Including individuals whose fates have been determined from detection data and a pH or other tag sensor increases confidence in model results and improves model accuracy. The unsupervised cluster analysis had model accuracies of 38.2–83.4%, whereas the supervised random forest was 81.6–94.4% accurate at in-sample fate classification (Fig. 5). The k-means clustering method classified individuals based solely on behavioural metrics, but it can be difficult to discern which cluster represents which fate group, and this decision is likely to be subjective. Assigning fates to clusters depended on distinct and predictable predator and prey behaviour, with smolts moving downstream and striped bass exhibiting multiple reversals. However, upstream movement is also possible in smolt mortalities carried by the tides and in successful migrants as a response to osmotic stress. The random forest algorithms were trained on smolts of known fate and classified suspect smolts individually, whereas the cluster analysis classified smolts by group, leading to a mixture of fates in each cluster.
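The cluster-then-label step can be illustrated with a minimal k-means (Lloyd's algorithm) on two hypothetical behavioural metrics, net downstream distance (km) and number of direction reversals, followed by a rule that names clusters from their mean behaviour. Everything here is an assumption for illustration: the metrics, the example values, the fixed starting centroids, and the naming rule (most reversals → predation; of the rest, greater downstream distance → successful migrant) only mirror the reasoning in the text and are not the study's implementation.

```python
import math

def standardize(rows):
    """Convert each column to z-scores so both metrics weigh equally in distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def kmeans(points, centroids, iterations=100):
    """Lloyd's algorithm starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

def assign_fates(rows, labels, n_clusters=3):
    """Hypothetical naming rule applied to raw cluster means (assumes no empty cluster)."""
    means = {}
    for k in range(n_clusters):
        members = [r for r, lab in zip(rows, labels) if lab == k]
        means[k] = [sum(v) / len(members) for v in zip(*members)]
    pred = max(means, key=lambda k: means[k][1])          # most reversals
    rest = [k for k in means if k != pred]
    surv = max(rest, key=lambda k: means[k][0])           # furthest downstream
    mort = next(k for k in rest if k != surv)
    return {pred: "P", surv: "S", mort: "M"}

# Hypothetical [net downstream km, reversals] per smolt
rows = [[30, 0], [28, 1], [32, 0], [2, 0], [3, 1], [1, 0], [10, 8], [12, 10], [8, 9]]
points = standardize(rows)
labels, _ = kmeans(points, [points[0], points[3], points[6]])
fates = assign_fates(rows, labels)
print([fates[lab] for lab in labels])
```

Every member of a cluster inherits the cluster's fate, which is why a mixed cluster misclassifies some individuals, whereas a supervised classifier predicts each individual separately.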
Differences in model results and prediction accuracies among years highlight the importance of a large sample size, not only for greater power in model predictions but also to help balance classes among individuals of known fate. Random forests are among the classification algorithms least sensitive to reductions in sample size (Maxwell et al., 2018; Moghaddam et al., 2020); however, class imbalance and potentially unrepresentative data remain issues when training data sets are small (Chen et al., 2004; Brownscombe et al., 2020). A common rule of thumb in machine learning is a training sample size of ten times the number of predictor variables, but the minimum recommended sample size for classification algorithms specifically depends on the type of data and the algorithm (Indira et al., 2010; Maxwell et al., 2018).
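The ten-samples-per-predictor rule of thumb and a simple imbalance indicator can be checked before fitting a model. The sketch below is a hypothetical helper: the number of predictors (8) and the class counts are invented for illustration, and the minority-class share is only a rough signal, not a formal threshold.

```python
from collections import Counter

def training_set_check(labels, n_predictors, samples_per_predictor=10):
    """Compare a training set with the 'ten samples per predictor' rule of thumb
    and report the smallest class share as a rough imbalance indicator."""
    counts = Counter(labels)
    n = len(labels)
    minimum = samples_per_predictor * n_predictors
    return {
        "n": n,
        "recommended_min": minimum,
        "meets_rule": n >= minimum,
        "class_counts": dict(counts),
        "minority_share": min(counts.values()) / n,
    }

# Hypothetical known-fate training set with 8 behavioural predictors
labels = ["S"] * 3 + ["M"] * 10 + ["P"] * 20
print(training_set_check(labels, n_predictors=8))
```

Here 33 individuals fall short of the suggested 80, and successful migrants make up only about 9% of the training set.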
Other considerations for optimizing model performance are receiver configuration and coverage, which are vital to capturing the distinct behaviour needed to differentiate predator and prey species. The distances between receivers in a river system limit the accuracy of distance-travelled and speed calculations, because an individual's movement between receiver detection ranges is unknown. Large gaps between receivers are therefore not ideal, but the number of available receivers is often limited, especially for large study areas. The behavioural metrics required for machine learning approaches are context-specific and must be tailored to the prey and predator species of interest. Deciding on behavioural metrics before receiver deployment can aid array design by ensuring that receiver coverage is adequate for calculating the necessary metrics. However, a study system may contain multiple or unknown predatory species, and calculating metrics or concentrating receiver coverage for only one species could mask predation by another. Additionally, avian predation typically resembles mortality in detection data and therefore could not be identified here; other researchers have identified avian predation by searching colonies or nesting sites for evacuated tags (Evans et al., 2012). For tracking salmon smolts specifically, good upstream and downstream receiver coverage of the river is important for distinguishing predator and prey movement based on smolt migration behaviour. Predation tags are recommended when tracking smolts because of the high predation pressure from various species during out-migration.
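Because movement between receivers is unobserved, distance and speed derived from a linear river array are minimum estimates. The sketch below computes a few such metrics from a time-ordered detection history, assuming (hypothetically) that each receiver is located by river km measured upstream from the mouth, so downstream movement means decreasing km. The metric names and example tracks are illustrative and are not the study's metric set.

```python
def movement_metrics(detections):
    """detections: time-ordered list of (time_hours, river_km) per detection.

    Only moves between different receivers count; because the path between
    receivers is unknown, distances and leg speeds are minimum estimates.
    """
    moves = []  # (elapsed hours, change in river km) for each receiver-to-receiver leg
    for (t0, x0), (t1, x1) in zip(detections, detections[1:]):
        if x1 != x0:
            moves.append((t1 - t0, x1 - x0))
    reversals = sum(1 for (_, a), (_, b) in zip(moves, moves[1:])
                    if (a > 0) != (b > 0))
    speeds = [abs(dx) / dt for dt, dx in moves if dt > 0]
    return {
        "min_distance_km": sum(abs(dx) for _, dx in moves),
        "net_downstream_km": detections[0][1] - detections[-1][1],
        "reversals": reversals,
        "max_leg_speed_km_h": max(speeds, default=0.0),
    }

# Hypothetical tracks: a smolt migrating downstream, and a predator-like pattern
smolt = [(0, 20), (5, 15), (12, 10), (20, 0)]
predator_like = [(0, 10), (2, 15), (4, 10), (6, 15), (7, 15), (9, 10)]
print(movement_metrics(smolt))
print(movement_metrics(predator_like))
```

With widely spaced receivers, several back-and-forth movements within one gap would register as no reversal at all, which is why receiver spacing constrains these metrics.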
A limitation of the modelling approaches used here is that they do not provide a timestamp for the moment of predation. A benefit of using predation tags is that detection histories can be truncated to represent only movements of the live prey, based on the change in tag ID and the estimated signal lag time (Fig. 2). A fine-scale or gridded receiver array in which the position of the tagged fish can be triangulated allows more accurate calculation of speed and turning angle, which can be used for behavioural change point analysis. Behavioural change point analysis identifies significant changes in movement parameters across a time series (Gurarie et al., 2009), so it can be used not only to identify predated individuals based on behavioural anomalies but also to estimate when predation occurred. However, triangulation is difficult to achieve in rivers given their size and shape.
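Truncating a detection history with a predation tag can be sketched as: find the first detection of the post-trigger tag ID, move the cut-off back by the estimated signal lag, and keep only earlier detections of the live-prey ID. This is a minimal sketch under assumptions: the tag IDs ("A1", "A1P"), the 12-hour lag, and the timestamps are hypothetical, and the actual lag depends on the sensor and must be estimated separately.

```python
from datetime import datetime, timedelta

def truncate_at_predation(detections, predated_id, lag=timedelta(0)):
    """Keep only detections assumed to reflect the live prey.

    detections: list of (timestamp, tag_id) for one tagged smolt; the tag
    transmits predated_id once its sensor has been triggered. The cut-off is
    the first detection of predated_id minus the estimated signal lag.
    Returns (live_detections, cutoff), with cutoff None if never triggered.
    """
    trigger_times = [t for t, tag in detections if tag == predated_id]
    if not trigger_times:
        return list(detections), None  # never triggered: keep the full history
    cutoff = min(trigger_times) - lag
    live = [(t, tag) for t, tag in detections
            if t < cutoff and tag != predated_id]
    return live, cutoff

# Hypothetical history: ID switches from "A1" to "A1P" after predation
dets = [
    (datetime(2019, 5, 1, 0), "A1"),
    (datetime(2019, 5, 1, 10), "A1"),
    (datetime(2019, 5, 1, 20), "A1"),
    (datetime(2019, 5, 2, 2), "A1P"),
    (datetime(2019, 5, 2, 6), "A1P"),
]
live, cutoff = truncate_at_predation(dets, "A1P", lag=timedelta(hours=12))
print(cutoff, len(live))
```

The detection at 20:00 is dropped because it falls within the assumed lag window, when the smolt may already have been inside the predator.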
K-means clustering underestimated the number of predations, as did the tag sensor because of type II error (missed predation events). Random forest modelling, together with the example workflow we provide, allows predation to be studied using predation tags, removing the need to tag predators, while also accounting for sensor malfunctions. We recommend combining acoustic tag sensors with supervised machine learning approaches to identify mortalities and predations of tagged fishes, thereby increasing confidence in telemetry study results.