Database Compilation
As appropriate, we followed preferred reporting items for systematic reviews and meta-analyses (PRISMA) protocol50. We compiled data from studies that experimentally quantified thermal tolerance across populations by searching the published literature using the Web of Science (Clarivate Analytics), with the following search string: (Thermal OR temperature) AND (Lethal OR “Thermal Tolerance” OR “Thermal Limit” OR CTmax OR LT50 OR CTmin OR “freezing tolerance”) AND (“Local Adapt” OR “Latitud* Var*” OR Intraspecific). Literature searches were performed on August 24, 2019 and updated on July 28, 2020. We also included a small number of studies that we were aware of but were not returned in the search.
We screened papers based on several criteria for inclusion, retaining only studies that: reported upper or lower thermal limits in °C (e.g., rather than units of time), quantified thermal limits for at least two populations (as defined by authors), recorded organismal scale measurements of thermal limits (e.g. - CTmax or LD50, with the exception of electrolyte leakage methods for plants51), reported sample size for each population (as the number of thermal tolerance measurements made), and quantified tolerance in individuals that were acclimated to common conditions across all populations. We excluded studies that measured thermal limits in populations that arose from cultivars, domesticated species, non-native populations, or post-selection generations of experimental evolution studies.
For studies that met the above criteria, we extracted thermal tolerance values and metadata from the main or supplemental text, tables, and/or raw data associated with the study. When required, data was extracted from figures using WebPlotDigitizer52. In some cases, we contacted authors to acquire data or metadata that was not reported in the study. At the beginning of the data extraction process, a random subset of studies was processed by multiple authors to verify consistent data extraction. All errors reported in the studies were converted to standard deviations. Each thermal tolerance measurement was classified as either an upper or lower thermal limit. We also classified each study as examining “latitudinal” or “elevational” patterns, and each taxon as either motile or non-motile. We based this classification on an individual's ability to exploit thermal heterogeneity in the surrounding environment, which in turn has two components: 1) motility of the species, and 2) the scale of environmental heterogeneity relative to that motility. The number of thermal limits retained after the main filtering steps is summarized in Supplementary Fig. 1. A list of the studies included in the data set is provided in Supplementary Table 1.
Latitudinal Patterns
Using this thermal tolerance data set, we examined latitudinal patterns in thermal adaptation across the four major ecosystems (Fig. 1). To compare intra-specific patterns with inter-specific data, we estimated the change in thermal tolerance per degree latitude for each study by regressing thermal tolerance data against latitude. These regressions included no random effects or interaction terms. Separate regressions were estimated for each species examined in a study. We then compared these intra-specific patterns with the inter-specific values reported in Sunday et al.26 (i.e., the latitudinal slope estimates from the non-covariate model for critical thermal limits). These latitudinal patterns were examined for upper thermal limits of taxa from all four ecosystems, and for lower thermal limits of terrestrial taxa. There was substantial variation in thermal limits over elevational gradients, but data from these studies was excluded from the latitudinal slope estimates as the short horizontal distance covered by these studies resulted in inflated latitudinal slope values.
Differentiation across ecosystems
To examine differentiation across ecosystems and potential environmental drivers of divergence we used an unweighted and weighted approach. First, we used unweighted pairwise thermal limits within each study, only comparing within-study groupings (sex, life-stage, acclimation temperature). We used a common control design when generating these pairs, comparing all populations within a study to the population from the highest latitude sampling site. For each population pair, we calculated the difference in thermal limits (i.e., unweighted raw mean differences in thermal limits between populations), the linear distance in km between sites, and the difference in maximum annual temperatures. We modelled pairwise differences in thermal limits using a linear mixed effects model with the distance between populations and the difference in maximum temperatures between the sites as factors. The two-way interactions between these factors and environment were also included, along with random effects of species nested within phylum. Data was restricted to just population pairs that were <4000 km apart, as all marine population pairs were less than 4000 km apart and only a small proportion of terrestrial, freshwater, or intertidal population pairs fell beyond that threshold (n = 32 beyond compared to n = 371 below).
Second, we used inverse weighted meta-regression to account for varying levels of precision in tolerance estimates across studies34. This analysis included only studies that have sample size greater than one and reported a measure of spread (e.g. standard deviation, standard error, variance) and examined the relationship between thermal limit contrasts and two moderators, distance between sites and difference in maximum temperature. Note that these criteria result in the exclusion of studies that used thermal tolerance metrics like LD50, which was commonly estimated from a single survivorship curve per population in our data set. Intertidal and marine taxa were lumped together for this analysis to account for the smaller number of studies that met the necessary sample size criteria. Effect sizes were estimated as pairwise standardized mean differences (Hedges’ g) using the ‘metafor’ package in R53,54, using common-control pairwise contrasts within a study. Effect sizes were estimated for each within-study group (acclimation temperature, life stage, sex, etc.) separately. We account for the repeated use of the common control by implementing a variance covariance matrix to address non-independence.
To test for environmental drivers of differentiation among populations, we extracted climate data (mean annual temperature and annual maximum temperature) for each collection site. For marine species we used Bio-Oracle v2.055, which contains 2000-2014 monthly sea surface temperatures at 9.2 km spatial resolution sourced from the Global Observed Ocean Physics Reprocessing product (http://marine.copernicus.edu). For terrestrial, freshwater, and intertidal species we used CHELSA56, which contains 1979-2013 monthly temperature data at 1 km spatial resolution sourced from the ERA-Interim reanalysis57. Freshwater-specific climatologies58 closely matched the data extracted from CHELSA (Supplementary Fig. 5). Because the freshwater-specific data set returned environmental data for fewer sites, we used CHELSA derived temperatures for all freshwater sites. We recognize that intertidal species generally experience high body temperatures driven by multiple factors including aerial and water temperature, as well as conductive and convective heat transport mechanisms59. We used aerial temperature for intertidal sites as a proxy because there is little body temperature data derived from biomimetic loggers or mechanistic models for species in our dataset60,61. Temperature data was averaged over a 1 km region around coordinates for each site. If the 1 km region failed to return environmental data (e.g., coastal studies) we used a 2 km region.
Our analysis includes a full model with effect size as a function of ecosystem, maximum temperature difference, distance, and included all interactions and crossed random effects of study and Phylum (or Division for plants). Covariates were centered and scaled prior to analysis. We then used model selection to compare the full model and all possible iterations, which yielded a single best model (no other models had a ΔAIC value < 2). The best model excluded the two-way interaction between distance and maximum temperature difference and the three-way interaction between all moderators (Supplementary Table 2). We used this model to estimate the effects of distance and temperature difference on our effect size response. We note that a model averaging approach yielded the same conclusions. We used funnel plots to evaluate the possibility of publication bias. Funnel plots depict effect sizes as a function of precision (error) (Supp. Fig. 8). Asymmetrical funnel plots would suggest the possibility of publication bias36. Analyses with the entire data set indicated some skew (Supp. Fig. 8a) but removal of these outliers revealed a balanced funnel plot and no change in the analysis outcomes.
The effect of motility on the divergence of thermal limit measurements was examined using two separate analyses, one for unweighted raw mean differences and another for the Hedges’ d effect size estimates. In both cases, the absolute magnitude of divergence was compared between taxa classified as motile and non-motile using a one-way ANOVA. All divergences were examined together, rather than separated out by ecosystem because the proportion of studies dropped between the unweighted mean difference analysis and the Hedges’ d effect size analysis was not equal across ecosystems. No non-motile species are retained from terrestrial studies for example. Only for marine taxa was there a large enough sample size in both analyses for a robust comparison between motile and non-motile taxa. The results of this comparison did not differ from what was observed across the pooled data points.
Vulnerability to climate change
For each population, we estimated a warming tolerance, defined as the difference between upper thermal limits and the maximum temperature at the site of collection origin. To account for potential field acclimatization (phenotypic plasticity), we estimated a corrected thermal tolerance value that accounts for differences between the mean temperature at the site of collection and the acclimation temperature used before thermal tolerance measurements were made. If studies included thermal tolerance data for multiple acclimation temperatures, thermal tolerance in the field was predicted directly from the thermal tolerance reaction norm for each population. These norms were estimated by regressing thermal tolerance against acclimation temperature, and then using this regression to predict thermal tolerance at an acclimation temperature equal to the mean temperature at the site of collection. For studies that did not evaluate the potential for acclimation capacity to affect thermal tolerance, we used the reaction norms described above to predict Acclimation Response Ratios (ARRs) for each population. ARR values were estimated as the slope of each reaction norm, which were then modeled as a function of thermal tolerance and ecosystem as interacting factors32. This model was then used to predict an ARR value for each population bsed on its thermal tolerance and the ecosystem. This predicted ARR is then used to adjust thermal tolerance (TT) based on the difference between acclimation temperature and the mean temperature of each population’s collection location:
Adj. TT = Raw TT + (ARR * (Mean Field Temp. - Acclimation Temp))
These two approaches are illustrated in schematic form in Supplementary Fig. 6.
Predictions of near-term (2040-2050) environmental data (mean temperature and maximum temperature of the warmest month) were retrieved for an intermediate climate scenario (RCP 4.5 / SSP 245) from Bio-Oracle marine sites, and WorldClim for terrestrial, freshwater, and intertidal sites. We note that while the current and future temperature data for marine sites has the same spatial resolution, the predicted future air temperature data set has a spatial resolution of ~4 km (2.5 minutes), rather than the ~1 km resolution (30 arc sec) of the recent climate data. The resolution of the future air temperature data is still high enough to differentiate climates for different populations, as no populations from latitudinal studies were less than 10 km apart. Future mean and maximum temperatures were used to estimate a warming tolerance in the same way as for current temperatures. Briefly, thermal limits were adjusted to account for the difference between acclimation temperature and the mean temperature at the site of collection using either species-specific estimates of ARR or ecosystem-specific estimates of ARR based on the population’s thermal limit. This adjusted thermal limit was then compared with the predicted maximum temperature (max. temp. – adjusted limit) to estimate a warming tolerance under future conditions.