Species Distribution Model Predictions of the Critically Endangered Grey Nurse Shark in Australia

Species distribution models (SDMs) are commonly used to forecast how threatened species are influenced by climate change. The grey nurse shark (Carcharias taurus) is a critically endangered species inhabiting both the east and west coasts of Australia, with negligible genetic interchange between the two populations. I used Generalized Linear Models (GLM), Maximum Entropy (MaxEnt) models and Boosted Regression Trees (BRT) to predict the distribution of the grey nurse shark. The data were presence-only records derived from known grey nurse shark sighting locations on the east coast of Australia, with pseudo-absences generated and bootstrapped from a restricted background. I evaluated these models using leave-one-out cross validation and model metrics including AICc, BIC, percentage of deviance explained, leave-one-out cross-validated R², AUC, maximum Cohen’s Kappa, specificity and sensitivity. Cross-validated R² was used as an overall comparison method across model types. I performed out-of-source validation by comparing the model projection with the distributional range of the ragged tooth shark (Carcharias taurus) in South Africa. The prediction of the selected model was consistent with the current distributional range of the ragged tooth shark.


Introduction
Species distribution modelling has been widely used in conservation biology and ecology 1 for discovering new species 2 , predicting distributional ranges 3,4 , modelling the potential distribution of invasive species 5 , and detecting climate impacts on habitat suitability 6 . Species distribution models have been used in many domains, in combination with ecological niche theory, at both individual and community levels 7-10 .
These distribution models use different mechanisms to infer the characteristics of the preferred habitat of the target species 11 . Many methods have been compared and evaluated, and no single method is clearly best for all situations 12 . Model accuracy is difficult to quantify because the evaluation methods (threshold-dependent and threshold-independent criteria) each have weaknesses and there are typically too few data for independent, out-of-sample performance tests 13,14 .
The selection of pseudo-absence data can also exert a strong influence on subsequent predictions 15 . For instance, how large a background region to cover, and how many pseudo-absence points are optimal, are not settled questions. Some studies have shown that the relative importance of predictors and performance statistics are sensitive to scale and background 16,17 . However, the relationship between species response curve fitting and pseudo-absence data selection remains inadequately resolved 8 .
As an alternative to making assumptions about pseudo-absences, presence-only models such as MaxEnt 18 , GARP 19 , and ENFA 20 can be used when no absence data are available. They detect the marginality of the species in the entire background 21 . Again, however, how background scale and frequency distribution influence model performance across the entire environment is not well studied.
I developed a case study of the grey nurse shark (Carcharias taurus, Rafinesque 1810), a coastal marine predatory fish with a small presence sample size and a very wide distribution along the coast of Australia, to test the impact of abiotic factors on the distribution of this top marine predator. In Australia, several broad-scale surveys in the states of New South Wales, Queensland and Western Australia have been carried out to identify the species' aggregation sites and migration patterns; however, its west-coast distribution is poorly quantified and only one aggregation site has been identified 22 . Historically, the east and west coast populations have been separated and have negligible genetic interchange 23 . It has been argued that the model with better transferability is the better algorithm 24 .
As such, I used the east coast sighting locations for model fitting and projected the best fitted model onto the South African coast for model testing. Presence-absence and presence-only models (GLM, BRT and MaxEnt) were fitted. was used to calculate the bathymetric slope data set. The SST mean and variance predictors were generated from the 4 km Advanced Very High Resolution Radiometer (AVHRR) Pathfinder Version 5.0 data. The SST mean and variance were bilinearly interpolated so that all four predictors had the same resolution.

Species
A logarithmic transformation was applied to bathymetric slope and January SST mean. The square of the log-transformed January SST mean was calculated for GLM and BRT. I scaled bathymetric depth and interannual SST variance according to equation (1) (see Figure 1). I also eliminated the background data points lying in grid cells where the bathymetric slope was zero. The number of pseudo-absence points in each data set was set to 1350, 10 times the number of presence points.
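The transformation steps described above can be sketched as follows. This is a minimal illustration in Python (the study itself used R), not the code used in the analysis. The dictionary keys, the function name and the choice of the natural logarithm are assumptions made for the example. The scaling of depth and SST variance is left out because equation (1) is not reproduced in the text.

```python
import math


def transform_predictors(cells):
    """Drop zero-slope cells, then log-transform slope and January SST mean.

    `cells` is a list of dicts with the hypothetical keys
    'slope' and 'sst_jan_mean' (both assumed to be non-negative).
    """
    transformed = []
    for cell in cells:
        if cell["slope"] == 0:
            # Background points on zero-slope grid cells are removed.
            continue
        log_sst = math.log(cell["sst_jan_mean"])  # natural log is an assumption
        transformed.append({
            "log_slope": math.log(cell["slope"]),
            "log_sst": log_sst,
            "log_sst_sq": log_sst ** 2,  # quadratic term for GLM and BRT
        })
    return transformed
```

Removing the zero-slope cells before taking logarithms also avoids evaluating the logarithm of zero.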
I bootstrapped the pseudo-absences from the restricted background using random sampling with replacement in R statistical software (R Core Team).
I generated 500 bootstrap pseudo-absence samples to form 500 data sets. Each data set had 135 presences and 1350 pseudo-absences. I then randomly split each data set into 135 training data sets and 135 testing data sets to perform leave-one-out cross validation.
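The resampling scheme can be illustrated with a short Python sketch (the study used R). The function names and the fixed seed are hypothetical. The leave-one-out split shown here holds out one presence record per fold, which is one reading of the description above. How pseudo-absences were assigned to the training and testing sets is not stated, so that part is left out.

```python
import random

N_PRESENCE = 135          # presence records, from the text
N_PSEUDO_ABSENCE = 1350   # 10 times the number of presences
N_BOOTSTRAP = 500         # number of bootstrap data sets


def bootstrap_pseudo_absences(background, n_sets=N_BOOTSTRAP,
                              n_points=N_PSEUDO_ABSENCE, seed=0):
    """Draw `n_sets` pseudo-absence samples from the restricted background,
    each sampled with replacement."""
    rng = random.Random(seed)
    return [rng.choices(background, k=n_points) for _ in range(n_sets)]


def leave_one_out_splits(presences):
    """Yield (training presences, held-out presence) pairs, one per record."""
    for i, held_out in enumerate(presences):
        yield presences[:i] + presences[i + 1:], held_out
```

With 135 presences, `leave_one_out_splits` yields 135 folds, each training on 134 presence records.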

Model evaluation and out-of-source validation
37,39,41 were also calculated by leave-one-out cross validation. I then estimated the median, 5th and 95th percentiles for each model metric 42 .
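As an illustration of the threshold-dependent metrics and the percentile summary, the following Python sketch computes sensitivity, specificity and Cohen's Kappa at a given threshold, finds the maximum Kappa over candidate thresholds, and summarises a metric across bootstrap data sets. The function names are hypothetical. Using the observed scores as candidate thresholds and the linear-interpolation percentile rule are assumptions, and both classes are assumed to be present in the labels.

```python
import statistics


def confusion_metrics(scores, labels, threshold):
    """Sensitivity, specificity and Cohen's Kappa at one threshold.

    `labels` are 1 (presence) or 0 (pseudo-absence); a score at or above
    `threshold` counts as a predicted presence.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "kappa": kappa}


def max_kappa(scores, labels):
    """Largest Kappa over thresholds taken from the observed scores."""
    return max(confusion_metrics(scores, labels, t)["kappa"]
               for t in sorted(set(scores)))


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarise(values):
    """Median with 5th and 95th percentiles of one metric across data sets."""
    return {"p5": percentile(values, 5),
            "median": statistics.median(values),
            "p95": percentile(values, 95)}
```

Calling `summarise` on the 500 per-data-set values of a metric returns the three summary statistics reported for that metric.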
I projected the candidate model selected by median cross-validated R² onto the coastal area of South Africa as an out-of-source validation. In this validation, the bathymetric depth was restricted and the points where bathymetric slope values were zero were eliminated. All the variables in the out-of-source dataset were transformed and scaled using the methods described above.

Model evaluation
I used cross-validated R², which is asymptotically equivalent to AIC-type criteria, as a universal method for model selection across the three model types. The model ranking results are shown in Table 1, Table 2 and Table 3.
According to the median values of these model metrics, one of the GLM candidate models had the highest median cross-validated R².
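The selection criteria can be made concrete with a short Python sketch. The exact definition of cross-validated R² used in the study is not given in the text. The version below, one minus the ratio of the held-out prediction error sum of squares to the total sum of squares, is an assumption. AICc and BIC follow their standard formulas. All function names are hypothetical.

```python
import math


def cross_validated_r2(observed, held_out_predictions):
    """1 - PRESS / SS_total, where each prediction comes from a model
    fitted without the corresponding observation (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, held_out_predictions))
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - press / ss_total


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction; k parameters, n observations."""
    aic = 2 * k - 2 * log_likelihood
    return aic + 2 * k * (k + 1) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_likelihood
```

Unlike AICc and BIC, the cross-validated measure needs only observed values and held-out predictions, so it can be computed in the same way for GLM, BRT and MaxEnt outputs.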

Discussion
The predicted habitat suitability was mainly dependent on the multivariate response surface 44 , so it is important to examine this surface carefully. The response curve of January SST mean in the GLM model selected by cross-validated R² was a narrow bell-shaped curve, which may resemble the realised niche of the species. Grey nurse sharks' aggregation sites are always found around inshore islands and rocky reefs with pinnacles, gutters and rocky caves. Thus, the bathymetric slope might be used as a surrogate for the complex sea bed structure that the shark prefers. As for the bathymetric depth, the model indicated that shallow waters were preferred over deep waters by the species. However, although the predicted habitat suitability in waters deeper than 90 m was close to zero, the maximum depth to which the shark dives is deeper than 200 m, indicating that sampling bias influenced the model performance.
Various model evaluation methods were explored in this study. Overall, although the median cross-validated R² of the best fitted GLM model was not very high, the model's projections were consistent with the current distributional ranges of this species in South Africa and Australia, indicating that it captured the inherent responses to the environmental gradients 8 . The model selection results showed that SST mean, bathymetric slope and depth were the dominant variables. In this study, the interannual SST variance was highly correlated with the SST mean, which may result in model selection bias.

Data availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.