Are comparable studies really comparable? Suggestions from a problem-solving experiment on urban and rural great tits

doi:10.21203/rs.3.rs-4027997/v1

Research Article

This work is licensed under a CC BY 4.0 License

Performance in tests of various cognitive abilities has often been compared, both within and between species. In intraspecific comparisons, habitat effects on cognition have been a popular topic, frequently with an underlying assumption that urban animals should perform better than their rural conspecifics. In this study, we tested problem-solving ability in great tits Parus major, in a string-pulling and a plug-opening test. Our aim was to compare performance between urban and rural great tits, and to compare their performance with previously published problem-solving studies. Our great tits performed better in string-pulling than their conspecifics in previous studies (solving success: 54%), and better than their close relative, the mountain chickadee Poecile gambeli, in the plug-opening test (solving success: 70%). Solving latency became shorter over four repeated sessions, indicating learning abilities. However, solving ability did not differ between habitat types in either test, and showed only a weak among-individual correlation between the two tests. Somewhat unexpectedly, we found marked differences between study years even though we tried to keep conditions identical. These were probably due to small changes to the experimental protocol between years, for example the unavoidable changes of observers and changes in the size and material of test devices. This has an important implication: if small changes in an otherwise identical set-up can have strong effects, meaningful comparisons of cognitive performance between different labs must be extremely hard. In a wider perspective, this highlights the replicability problem often present in animal behaviour studies.

cognitive ability

urban and rural environment

string-pulling

plug-opening

experimental replicability

Replicability and generalizability are two important measures of the validity of scientific studies. While conceptual replications, i.e. studies investigating the same hypothesis with different methodology, are relatively common, direct replications of scientific experiments are rare (Brecht et al. 2021; Farrar et al. 2021). Both types of replications are used for looking at the generalizability of the results and for comparisons within and between species (Kabadayi et al. 2016; Kabadayi et al. 2017; Isaksson et al. 2018; Urhan et al. 2023). However, the validity of such generalizations and comparisons is often questionable, considering that replications can be challenging: even if experimenters try to make lab conditions as similar as possible, there will by necessity be large differences between different set-ups. Even in intraspecific, but especially in interspecific comparisons, most animals have been tested in different labs by different experimenters and with different apparatuses. In the study presented in this paper, which tested problem-solving abilities in great tits (Parus major), we found unexpected differences between study years and observers. This drew our attention to the problem, as relatively minor and apparently non-instrumental changes to the experimental set-up seemed to cause large differences in the responses of the tested birds.

The cognitive ability we tested was innovativeness, defined as the ability to solve new problems or to find new solutions to old problems and to remember these solutions so that they can be used to exploit new resources (Reader and Laland 2003). Innovation per se is very hard to study as it happens very infrequently. When a new foraging behaviour spreads in a population through social learning, the occasion when an individual first used it, i.e. the innovation, will rarely be known. Instead, the common way to test innovativeness is to present animals with problem-solving tasks they have not encountered before (Cole et al. 2011; Griffin and Guez 2014). In such studies, animals are typically required to solve a task in order to get a reward, frequently consisting of some desirable food (Benson-Amram and Holekamp 2012; Griffin and Guez 2014). However, solving success in these tasks may be affected by other characteristics besides innovativeness, such as experience, current motivational state, and other cognitive traits (Rowe and Healy 2014; Horik and Madden 2016). Furthermore, solving success may depend on experimental design; therefore, designing tests that adequately estimate innovativeness in animals in a standardised manner is a challenging task. Additionally, comparing the change in solving success and latency over repeated sessions can also be a test of individual learning ability: increasing success and/or decreasing latency may indicate that the tested animals are learning from experience (Boogert et al. 2006; Morand-Ferron et al. 2011).

Although most studies of avian cognition have been conducted in corvids and large parrots (Lambert et al. 2019), an increasing body of evidence suggests that many smaller passerines are also quite cognitively capable (Audet et al. 2023), with parids (tits, titmice and chickadees) being particularly good at solving cognitive challenges (reviewed by Urhan et al. in prep). Within the parid family, the great tit seems to outperform its relatives in cognitive tasks and learning (Sasvári 1979; Exernová et al. 2006; Johnsson and Brodin 2019; Urhan et al. 2023). In nature, the great tit is a generalist species known for its innovative foraging behaviour (Overington et al. 2009; Morand-Ferron et al. 2011; Johnsson and Brodin 2019), which may facilitate good problem-solving abilities.

One well-known cognition test that has been performed on many different animal species is the string-pulling test. In this test, a reward, typically food, is attached to a string. In order to get access to the reward, the animal has to pull the string. In tests of birds, the string is typically hanging vertically in a position where it is directly inaccessible to the bird. To reach the reward the bird has to hold the pulled-in loops of the string so that the reward does not fall down out of reach again (Jacobs and Osvath 2015). Before our experiment, three studies had brought wild great tits into the lab and tested them in cages in string-pulling experiments (Thorpe 1956; Vince 1956; Cole et al. 2011). However, solving success in these studies varied greatly, to an extent that is less likely to reflect natural variation between populations and more likely to reflect differences in methodology.

The original primary aim of our study was to test for differences between urban and rural birds, as it is not well understood how urbanization affects cognitive performance. It has been suggested that urban populations should perform better than rural ones, because innovativeness can be beneficial both for colonizing a novel habitat (Sol et al. 2002) and for exploiting novel, anthropogenic resources (Rodewald et al. 2011). Despite this, previous studies on various species yielded mixed results (Griffin et al. 2017; Lee and Thornton 2021; Vincze and Kovács 2022). The great tit is common in its original forest habitat as well as a successful colonizer of anthropogenic habitats such as city parks and suburban gardens. In this species, two studies demonstrated that birds in more urbanized habitats tend to perform better in problem-solving tasks compared to their less urbanized conspecifics (Preiszner et al. 2017; Grunst et al. 2020). However, both of these studies were performed on breeding great tits at their nests in the wild. Therefore, it is not well known how well urban and rural great tits perform in problem-solving tasks outside the breeding season, under controlled indoor conditions.

We aimed to address the following questions with the study: (i) How do great tits perform in the string-pulling test compared to earlier studies (Thorpe 1956; Vince 1956; Cole et al. 2011)? (ii) How do great tits perform in another problem-solving task, a plug-opening test, compared to their American relative, the mountain chickadee Poecile gambeli (Kozlovsky et al. 2015; Kozlovsky et al. 2017)? (iii) Do the birds show a learning effect in the above two tasks, i.e. does their problem-solving performance improve over repeated sessions? (iv) Is there a relationship between the performance in the two tasks? And most importantly: (v) Is there a difference in problem-solving performance between urban and rural great tits under controlled captive conditions? We also discuss the unexpected differences between study years; year was included in our models as a control variable.

Subjects and housing

We captured great tits (N = 66) in three urban areas (a city park in Malmö, 55.6001° 12.9899°, population density in 2020: 4 150 people/km²; and two sites in Lund, 55.7144° 13.2069° and 55.6976° 13.2472°, population density in 2020: 3 535 people/km², source: https://www.citypopulation.de/en/sweden/cities/) and eight rural sites (seven within 10 km from the town of Höör, 55.9346° 13.5278°, and one at Stensoffa in the Svedala region, 55.6947° 13.4494°; population density: <5 people/km²) in Scania, southernmost Sweden (Table S1), using mist nets placed next to bird feeders that we had set up beforehand. The Malmö and Lund sites consisted of an urban matrix of large buildings surrounded by major roads, pedestrian walkways, and lawns interspersed with a mix of native and non-native tree species (for details on species composition in Malmö, see Jensen et al. 2022). The rural sites were in forested areas, with no active farms or inhabited houses near the capture locations. The most common trees in these forest habitats were common oak (Quercus robur), lime (Tilia cordata), elm (Ulmus glabra), birch (Betula sp.), Norway spruce (Picea abies) and hazel (Corylus avellana).

We captured and tested 20 birds in 2015 (September to December) and 10 birds in 2016-2017 (December to February). As the results were inconclusive, we resumed the experiment by capturing and testing 36 additional birds in 2021-2022 (September to February). We did not capture birds from March to late August when great tits are breeding and moulting. After capture, we marked each bird with one unique numbered metal ring as well as one or two plastic colour rings for visual identification in the lab. We used plumage characteristics to age and sex them. We then transported the birds in individual cotton bags to an indoor animal facility at the Department of Biology, Lund University. The transport took a maximum of 30 minutes.

We housed the birds in individual 55 × 56 × 36 cm cages that we had positioned on shelves in an enclosed compartment along a wall in the room. The cages were placed two by two so that each bird had visual contact with one neighbour. The room had lighting with an outdoor light spectrum and computer-controlled light and temperature regimes. In mornings and evenings, an automatic one-hour dimming function simulated dawn and dusk, following outdoor day length patterns. We kept the temperature constant at 14°C, which is a temperature that works well in this type of experiment (Brodin and Urhan 2014; Brodin and Urhan 2015; Isaksson et al. 2018). Before we started any training or experimental sessions, we allowed the birds to get accustomed to the environment in the lab for at least two days (i.e. started the tests no earlier than the morning of their third day in captivity), which is sufficient according to our experiences from previous studies.

The birds had ad libitum access to a food mixture of seeds and nuts, a suet cake and water that was changed daily. The water was enriched with a commercial vitamin supplement for birds. We cleaned the cages every day. Before each testing session, we visually inspected the birds and made sure that they were in good condition. We avoided handling the birds during the experimental sessions to minimise stress. When we had finished all experimental sessions on a bird, we released it at the same location as it was originally captured, 10 to 26 (mean ± SD = 17.8 ± 4.6) days after capture. Before we released a bird, we checked whether it was in adequate body condition (i.e. no injuries or feather damage and sufficient fat reserve). The study complies with Swedish and EU animal welfare legislations and regulations.

Neophobia test

Animals may frequently be wary of new and unknown objects, a phenomenon known as neophobia, causing them to avoid novel objects. Hence, there is a risk that an animal’s inability to solve a task may depend on neophobia towards experimental objects rather than an inability to pass the test (Greenberg 2003; Audet et al. 2016). To control for this, we performed a neophobia test, two to five days after capture in 2015 and 2021-2022, and following another experiment in 2016-2017 (Isaksson et al. 2018). The test was performed on the birds in their home cages, visually separated from their neighbours. We started the test with a control stage in which we presented a mealworm (Tenebrio molitor) on a ceramic dish (diameter: 10 cm) that the birds had been familiarized with before the test. We repeated this procedure five times with each bird to control for within-individual variation in feeding latencies. If the bird refused to take the mealworm for over 30 minutes in this control stage, we terminated the experiment and repeated it the next day. All but one bird (an urban adult female from 2021) took the five mealworms in the control stage in either the first or the second neophobia test. As we could not calculate a neophobia score for this one bird, it is excluded from models in which neophobia is included as a covariate (see Statistical analyses).

After the fifth session, we presented the mealworm on an unfamiliar plastic plate (diameter: 35 mm in 2015-2017, 85 mm in 2021-2022) that was painted with broad red and green stripes, placed on top of the ceramic plate. A plate with such striking novel colouration should be well suited to eliciting neophobic behaviour. The most common neophobic response is that the bird takes longer to approach the novel plate than the familiar one. We observed the bird until it consumed the worm from the coloured, novel plate. One bird (an urban adult male, also from 2021) consumed all five mealworms in the control stage but did not consume the worm in the neophobia stage within 30 minutes; we terminated the test for this bird and set its latency to the maximum allowed time, 1800 seconds. We calculated neophobia in the same way as Audet et al. (2016), as the difference in seconds between the latency to take the mealworm from the new, brightly coloured plate and the latency to take it from the old, non-painted plate. For the latter, we used the mean of the five sessions in the control stage. We then log-transformed the neophobia score (after adding 400 seconds to all data points so that all values were positive) to bring the variable closer to a normal distribution.
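The neophobia score described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the function name and the example latencies are hypothetical, and the natural logarithm is an assumption (the text only says "log-transformed").

```python
import math
from statistics import mean

MAX_LATENCY_S = 1800  # cap for a bird that never took the worm (from the text)
OFFSET_S = 400        # constant added before the log transform (from the text)

def neophobia_score(control_latencies_s, novel_latency_s):
    """Return (raw difference in seconds, log-transformed score).

    Assumes natural log; the source does not state the base.
    """
    if len(control_latencies_s) != 5:
        raise ValueError("expected five control-stage latencies")
    novel = min(novel_latency_s, MAX_LATENCY_S)
    raw = novel - mean(control_latencies_s)
    # math.log raises ValueError if raw + OFFSET_S is not positive
    return raw, math.log(raw + OFFSET_S)

# Hypothetical bird: quick in the control stage, slower with the novel plate.
raw, score = neophobia_score([12, 20, 15, 9, 14], 95)
print(raw, round(score, 3))
```

A bird that approaches the novel plate faster than the familiar one gets a negative raw difference, which is why a constant is added before taking the logarithm.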

General experimental protocol

Following the neophobia test, each bird participated in four problem-solving experimental sessions, spaced at least 24 but no more than 48 hours apart. Each problem-solving session consisted of two 20-minute tests, the string-pulling test and the plug-opening test. For all birds in 2015 and half of the birds in 2021-2022, the string-pulling test always preceded the plug-opening test; for the other half of the birds in 2021-2022, the order was reversed, i.e. the plug-opening test always preceded the string-pulling test. In 2016-2017, the plug-opening and string-pulling tests were performed separately instead of directly after one another; multiple tests of either or both types were performed on the same day; and there could be gaps of several days (up to 12 days) between two tests of the same kind. In spite of these irregularities in the experimental regime, the 10 birds in this group showed learning patterns similar to those of the 56 birds tested under the stricter regime (see Results), so we opted to include them in our models.

At the start of the experimental sessions, the focal bird in its home cage was visually (but not acoustically) isolated from the other birds by moving the cage to a separate shelf (2015-2017) or a desk in the same room (2021-2022) and closing off the housing compartment. After turning off all lights, the observer removed all food from the cage and set up the experimental device for the first test. The lights were turned off so that the bird could not see the device being set up. In 2015 and 2022, the perches, except for the one next to the test device, were also removed from the cage; in 2016-2017 and 2021, all perches were kept in the cage. Following this, the observer moved to a booth covered by dark, one-way glass to get out of the bird’s sight and turned on the lights for the bird. The test ended when the bird had solved the task, had succeeded in eating the worm by other means (see below), had lost the worm by dropping it where it could not reach it, or when the maximum time of 20 minutes had elapsed. The observer then turned off the lights again, replaced the device for the first test with the device for the second test, and repeated the above protocol. Regardless of whether or not a bird solved a problem in the first session, we performed four experimental sessions on all birds to test whether their solving performance improved with each repeat, indicating learning. All neophobia and problem-solving sessions were recorded on camera (type: Toshiba Camileo S20); however, out of the 516 problem-solving sessions, 66 recordings are not available due to technical malfunctions during recording or file saving. See Online Resource 1 for a sample of these videos.

String-pulling

Our test device consisted of a small (35 mm diameter) petri dish (with a bottleneck-like plastic rim attached to it to reduce the risk of the reward accidentally falling out) attached like a hanging bucket to a 17 cm string, hanging inside a vertically positioned transparent plastic tube with the opening facing upwards (Figure 1a). In the dish, we placed the food reward (two mealworms in 2015, reduced to one mealworm from 2016 onward, as one seemed sufficient), which was visible but not directly accessible to the bird until it pulled up the string. We discarded the tests from three birds (two rural males and one rural female) in 2015 because they were presented with a test prototype that had no plastic tube around the string. The remaining 27 birds from 2015 to 2017 had the string hanging into a thin-walled plastic tube crafted from plastic cups and stabilised with a wooden frame. The 36 birds in 2021-2022 had a sturdier plastic tube (150 mm tall and 70 mm wide, with a 3 mm thick wall) around the string, mounted on an upside-down ceramic dish; in 2022, a thinner rim was added to this sturdy tube.

We considered a session as solved when a bird pulled up the string and took out the worm from the dish. Out of the 252 trials of 63 birds included in our analyses, 16 had to be terminated early because the mealworm fell out of the dish before the bird could pull up the string (in 12 cases because the bird was shaking the string, and in four cases because the worm crawled past the rim of the dish). These were counted as unsuccessful tests, and in the analyses these birds were given maximal latencies. In eight trials, the birds successfully pulled up the string but lost the worm, dropping it back into the tube. These were counted as successful despite the fact that they could not get the prey, because the birds still went through the right set of motions to get the prey. In five trials the birds pulled up the string, dropped it outside the tube, and took the worm from the hanging dish; in one trial the bird stretched downward to reach all the way down to the rim of the dish and pulled it up before taking the worm out. Although these were both unconventional solutions, we still counted them as successful because the bird pulled up the dish in some innovative way. However, in four trials, the bird dived into the tube and ate the worm while in there, then attempted to get out. These trials were counted as unsuccessful despite the bird getting the worm, because this “solution” did not require innovation; three out of four of these birds managed to solve the problem in the conventional way afterwards.

Plug-opening

In this test, we placed a mealworm inside a transparent tube that was closed by a cotton plug at its bottom end (Figure 1b). In 2015-2017 and 2022, we used a 75 mm long and 11 mm wide glass tube; in 2021 it was a slightly larger, 100 mm long and 15 mm wide plastic tube. At the start of each session, we introduced this test device to a bird’s home cage attached to the cage wall next to a perch. In Groups 2015 and 2022, the tube’s bottom was 41 cm above the cage’s floor, whereas in 2016-2017 and 2021 it was only 26 cm above the cage floor. If a bird removed the plug, the mealworm would fall to the bottom of the cage and become accessible to the bird.

We considered a test successful when the bird removed the plug so that the worm fell out. Out of the 264 trials of 66 birds, there were six trials where the bird pulled out the cotton plug but lost the worm before eating it: in five trials it fell outside the cage and in one the bird could not find it in the cotton. In a seventh trial, the bird pulled out the cotton, but the worm got stuck in the tube. We counted these trials as successful despite the birds not getting the food reward. In 6 trials, the bird, instead of pulling the cotton with its beak, grabbed it with its foot and pulled it out. These solutions were also counted as successful. However, in one trial, the bird pulled out the cotton with its foot clearly by accident, as it did not pay attention to the tube and did not eat the worm afterwards; this trial was counted as unsuccessful. In two trials, the cotton fell out of the tube without the bird touching it, and in two other trials, the worm escaped from the tube, squeezing by the cotton plug, before the bird could solve the task. These trials were also counted as unsuccessful.

Statistical analyses

For each task, we quantified problem-solving latency as the time (in seconds) from the start of the test until the bird solved the problem (took out the worm from the dish in the string-pulling test, pulled out the plug so that the worm fell out in the plug-opening test). We decided to use the start of the test rather than the first interaction with the test device because the bird was in a small enclosed space and could already inspect the feeder before interacting with it. For the unsuccessful sessions, we assigned a maximal latency value of 1201 seconds, even if they had to be terminated early due to the bird losing the worm. We assigned a separate latency value for each of the four tests of the same type; therefore, each bird was in the model with four trials.
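The latency rule above amounts to building one (latency, event) record per session, which is the input format survival models expect. The sketch below is illustrative only: the function name and the example bird are hypothetical, and the event flag anticipates the censoring used in the survival analysis.

```python
MAX_TEST_S = 1200          # 20-minute test window (from the text)
CENSORED_LATENCY_S = 1201  # value assigned to unsuccessful sessions (from the text)

def to_survival_record(solved, solve_time_s=None):
    """Map one session to (latency in seconds, event flag); 0 means censored."""
    if solved:
        if solve_time_s is None or not 0 <= solve_time_s <= MAX_TEST_S:
            raise ValueError("a solved session needs a time within the test window")
        return (solve_time_s, 1)
    # Unsuccessful sessions get the maximal latency, even if terminated early.
    return (CENSORED_LATENCY_S, 0)

# Hypothetical bird: fails session 1, then solves faster in each later session.
sessions = [(False, None), (True, 640), (True, 210), (True, 95)]
records = [to_survival_record(solved, t) for solved, t in sessions]
print(records)
```

Each bird contributes four such records per task, one per session.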

Base models

We ran all our statistical analyses in R (version 3.6.1). We analysed problem-solving latency with Cox mixed-effects proportional hazard models (separate models for string-pulling and plug-opening) using the “coxme” R package (Therneau 2012). Survival models like the Cox proportional hazard model simultaneously handle variation in the probability of an event (such as solving success) and variation in latencies, making them well-suited for analysing behavioural latency data when there are individuals who do not show the focal behaviour (e.g. solve the task), as long as the proportional hazard requirement is met (Jahn-Eimermacher et al. 2011; Andersen et al. 2021). Therefore, they are often used in problem-solving studies (e.g. Cook et al. 2017; Preiszner et al. 2017; Prasher et al. 2019). In these models, we used solving latency as the response variable, treating tests with maximal latencies (i.e. tests where the bird did not solve the test) as censored data. We included the following explanatory variables in our model: session number (1 to 4 for the four consecutive test sessions of the same type on the same individual) as a covariate, and habitat type (urban vs rural), sex (male vs female), age (first-year vs older) and year (four levels: 2015, 2016-2017, 2021 and 2022) as factors. The variable “year” also controls for the identity of the experimenter, as it was always the same person within a year but a different person each year except in 2021 and 2022. We treated 2021 and 2022 as separate years because we implemented changes in the methods between December 2021 and January 2022 (see above), whereas 2016-2017 was treated as a single year because there were no such changes in the protocol. We also included bird ID nested within capture site as random factors to control for autocorrelation within individual and within population, respectively. As stepwise model selection based on P-values, despite being frequently used, is also often criticized (Garamszegi et al. 2009), we opted to present the estimates both from the full models and from reduced models where explanatory variables with P-values over 0.1 were eliminated. We refer to explanatory variables with P-values below 0.05 as “statistically significant” and those with P-values between 0.05 and 0.1 as “tendencies” or “trends”. For pairwise comparisons between the four years, we extracted parameter estimates by using the ‘emmeans’ function of the ‘emmeans’ R package (Lenth et al. 2019); we opted to not use the P-value corrections built into the package, as these methods reduce the statistical power of the models (Nakagawa 2004).
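For readers less familiar with this model class, a Cox mixed-effects model with the fixed and random terms listed above can be written in its standard textbook form as follows. The notation is illustrative and is not taken from the original analysis: h is the instantaneous solving rate in session k of bird i from site j, h_0 is the unspecified baseline hazard, and u_j and v_ij are the site and bird-within-site random intercepts.

```latex
h_{ijk}(t) = h_0(t)\,\exp\!\Big(
    \beta_1\,\mathrm{session}_{ijk}
  + \beta_2\,\mathrm{habitat}_{ij}
  + \beta_3\,\mathrm{sex}_{ij}
  + \beta_4\,\mathrm{age}_{ij}
  + \boldsymbol{\beta}_5^{\top}\,\mathbf{year}_{ij}
  + u_j + v_{ij}
\Big),
\qquad
u_j \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{site}}\right),
\quad
v_{ij} \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{bird}}\right)
```

Under this form, a positive coefficient raises the solving rate and therefore corresponds to shorter latencies, and sessions with the maximal latency enter the likelihood as censored observations.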

As a sensitivity analysis, we also tested the effect of the above variables on solving success and solving latency in separate models, using the glmmPQL function of the MASS R package (Venables and Ripley 2002). For solving success, we built mixed-effects generalized linear models with binomial error distribution. The response variable in these models was binary, with successful and unsuccessful sessions coded as 1 and 0, respectively. For solving latency, we built mixed-effects linear models with latency as the response variable, which we log-transformed to bring it closer to a Gaussian distribution. In these models, we excluded unsuccessful sessions and included only the subset of sessions in which the bird solved the task (therefore, birds that were unsuccessful in all sessions were excluded, reducing the sample size). In both types of models, the fixed and random effect structure was identical to that of the above Cox models.
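The two response variables of this sensitivity analysis can be derived from the same session latencies, as the following minimal sketch shows. The function name and example values are hypothetical, the natural logarithm is an assumption, and solved latencies are assumed to be positive.

```python
import math

CENSORED_LATENCY_S = 1201  # value assigned to unsuccessful sessions (from the text)

def split_responses(latencies_s):
    """Return (binary success per session, log latency of solved sessions only)."""
    success = [0 if t >= CENSORED_LATENCY_S else 1 for t in latencies_s]
    # Unsuccessful sessions are dropped here, which shrinks the sample.
    log_latency = [math.log(t) for t in latencies_s if t < CENSORED_LATENCY_S]
    return success, log_latency

# Hypothetical bird: unsuccessful in session 1, successful afterwards.
success, log_latency = split_responses([1201, 640, 210, 95])
print(success)
print(len(log_latency))
```

A bird that never solves contributes four zeros to the success model and nothing to the latency model, which is why the latter has a smaller sample size.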

Neophobia

As we could not quantify neophobia for one individual (the one that did not take the food item in the control phase of the neophobia test, see methods), and excluding this individual would have led to data loss, we opted not to include neophobia in the above models. Instead, we tested for an effect of neophobia on problem-solving success by building separate, extended Cox mixed-effects proportional hazard models, including all the fixed and random variables from the above full models, plus log-transformed values of neophobia as a covariate. As the string-pulling model (but not the plug-opening model) was sensitive to variable order due to the relatively large number of censored data, we kept the variable order the same as in the base model and added neophobia as the last fixed term. As the other variables yielded estimates qualitatively similar to the base model, we do not report the full model output, only the neophobia estimate. To avoid multicollinearity, we also tested whether neophobia was affected by any of our tested factors in a single linear model with habitat type, sex, age and group as covariates; none of these variables had a significant effect on neophobia (Table S2).

Habitat, sex and age differences in learning speed

We tested whether learning speed (i.e. the change of latencies over the four sessions, included in the model as the variable “session number”) differed between habitat types, sexes and age groups by adding interaction terms between session number × habitat type, session number × sex or session number × age to our Cox models. We tested each of these interactions in separate models rather than all three in the same model to avoid over-parametrization. Like with the neophobia models, we only report the interaction estimates; the variables not included in the interaction yielded qualitatively similar results to the base model.

Relationship between performance in the two test types

We tested whether problem-solving performances in the string-pulling and the plug-opening tests are related to each other by adding solving latency from the test with the other test device (log-transformed for better model fit) as a covariate to our Cox models, in which non-solvers were given maximal latency values of log(1201) = 7.091. In this test, the sessions were paired by session number, i.e. the first string-pulling test with the first plug-opening test, the second string-pulling with the second plug-opening, and so forth. Like in our other extended models, we only report the estimates belonging to the effect of one solving latency on another.
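The pairing by session number can be illustrated with a short sketch. The function name and the example latencies are hypothetical; only the maximal latency of 1201 seconds and its natural-log value come from the text.

```python
import math

CENSORED_LATENCY_S = 1201  # latency given to non-solvers (from the text)

def pair_sessions(string_pulling_s, plug_opening_s):
    """Pair the k-th session of each task; return (session, log SP, log PO) rows."""
    if len(string_pulling_s) != len(plug_opening_s):
        raise ValueError("both tasks need the same number of sessions")
    rows = []
    for session, (sp, po) in enumerate(zip(string_pulling_s, plug_opening_s), start=1):
        rows.append((session, round(math.log(sp), 3), round(math.log(po), 3)))
    return rows

# Hypothetical bird with one unsuccessful session in each task.
rows = pair_sessions([1201, 640, 210, 95], [300, 1201, 80, 40])
for row in rows:
    print(row)
```

In each row, the log latency from one task would serve as the covariate in the model of the other task.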

We also investigated the relationship between the success rates of the plug-opening and string-pulling tests with Pearson’s chi-squared tests. One test compared overall solving success between the two problem-solving tests (each bird that solved a test at least once was counted as “successful” and only those that failed 4 out of 4 times were counted as “unsuccessful”). The other compared solving success in the first session only, treating only birds that solved in the first session as successful.
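For a 2 × 2 table of birds cross-classified by success in each task, Pearson’s chi-squared statistic has a closed form. The sketch below uses clearly hypothetical counts and omits the continuity correction; the source does not state whether a correction was applied (R’s chisq.test applies one by default for 2 × 2 tables), so this is an assumption.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]].

    Rows: solved / failed string-pulling; columns: solved / failed plug-opening.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one bird")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts, not data from the study.
chi2, p = chi2_2x2(20, 10, 10, 20)
print(round(chi2, 3), round(p, 4))
```

With only four cells, small expected counts can make the chi-squared approximation unreliable, which is one reason a correction or an exact test is sometimes preferred.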

String-pulling:

Overall, this test was solved by 34 out of 63 birds (54.0%) at least once; there were 9 successful solutions (14.3%) in the first, 25 (39.7%, including successful solutions by birds who solved in the previous sessions; 16 first solutions) in the second, 28 (44.4%; 5 first solutions) in the third and 32 (50.8%; 4 first solutions) in the fourth session. According to the full Cox model (Table 1a), solving latencies on average decreased over the four sessions, and females tended to solve faster than males, whereas there was no significant difference between either urban and rural or adult and juvenile birds. Furthermore, the birds in 2021 and 2022 solved the problem significantly faster than the birds in 2015, with the 2016-2017 birds having an intermediate value not significantly different from the other three (Table 1a, Figure 2a). After removing the non-significant effects of environment and age, the effect of session number (coef ± SE = 1.167 ± 0.127; Z = 9.180; P < 0.001), sex (coef ± SE = -1.730 ± 0.922; Z = -1.877; P = 0.061) and year (2021 vs 2015: coef ± SE = 2.444 ± 1.114; Z = 2.195; P = 0.028; 2022 vs 2015: coef ± SE = 2.411 ± 1.214; Z = 1.705; P = 0.088) remained qualitatively similar. The separate mixed-effects models showed that solving success was significantly affected by session number, sex (females being more successful) and year (birds from 2015 were less likely to solve (5 out of 17 birds, 29.4%) than those from 2021 (15 out of 24, 62.5%) and 2022 (9 out of 12, 75.0%), and tended to be less likely to solve than those from 2016-2017 (5 out of 10, 50.0%); Table S3a), plus a non-significant trend of rural birds solving faster (Table S3a); session number was the only variable that significantly affected the solving latency of successful birds (Table S4a). Neophobia had no significant effect on solving latency (coef ± SE = 0.405 ± 0.791; Z = 0.510; P = 0.610). Furthermore, learning speed did not differ between urban and rural (coef ± SE = 0.182 ± 0.217; Z = 0.840; P = 0.400), male and female (coef ± SE = -0.123 ± 0.223; Z = -0.550; P = 0.580), or juvenile and adult birds (coef ± SE = 0.155 ± 0.209; Z = 0.550; P = 0.580).
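The session counts reported for string-pulling can be cross-checked with a little arithmetic. The sketch below uses only the counts given in the text; the variable names are ours. It also shows that the number of birds that had solved at least once exceeds the per-session successes in sessions 3 and 4, which implies that a few earlier solvers failed in those sessions.

```python
from itertools import accumulate

N_BIRDS = 63
first_solutions = [9, 16, 5, 4]          # new solvers in sessions 1-4 (from the text)
per_session_successes = [9, 25, 28, 32]  # all successes in sessions 1-4 (from the text)

# Birds that had solved at least once by the end of each session.
cumulative = list(accumulate(first_solutions))
overall_pct = round(100 * cumulative[-1] / N_BIRDS, 1)

# Earlier solvers that did not solve again in a given session.
prior_solvers = [0] + cumulative[:-1]
repeat_solvers = [s - f for s, f in zip(per_session_successes, first_solutions)]
lapsed = [p - r for p, r in zip(prior_solvers, repeat_solvers)]

print(cumulative, overall_pct, lapsed)
```

The cumulative total of 34 birds matches the overall solving success of 54.0%.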

Table 1: Effects of our explanatory variables on problem-solving latency in the string-pulling test (a) and the plug-opening test (b), extracted from summary tables of our Cox mixed-effects models; pairwise comparisons between years are estimated marginal means. More positive values indicate increasingly faster solving (i.e. shorter latencies) for covariates (session number) and faster solving by the compared level (listed first) than the reference level (listed second) for factors. Statistically significant effects and trends are both marked bold.

(a) String-pulling
Fixed effects	Coefficient	± SE	Z	P
Session number	1.120	± 0.123	9.090	<0.001
Environment (urban vs rural)	-1.042	± 0.769	-1.354	0.180
Sex (male vs female)	-1.404	± 0.771	-1.822	0.068
Age (juvenile vs adult)	0.742	± 0.838	0.886	0.380
Year (2016-2017 vs 2015)	1.772	± 1.207	1.468	0.142
Year (2021 vs 2015)	2.185	± 0.908	2.406	0.016
Year (2022 vs 2015)	2.289	± 1.095	2.089	0.036
Year (2021 vs 2016-2017)	0.414	± 1.042	0.326	0.717
Year (2022 vs 2016-2017)	0.517	± 1.304	0.392	0.692
Year (2022 vs 2021)	0.103	± 1.047	0.099	0.921
Random effects	SD
Site	0.283
Bird ID nested in Site	2.128
(b) Plug-opening
Fixed effects	Coefficient	± SE	Z	P
Session number	0.778	± 0.093	8.370	<0.001
Environment (urban vs rural)	-0.158	± 0.594	-0.266	0.791
Sex (male vs female)	-0.266	± 0.611	-0.434	0.664
Age (juvenile vs adult)	0.456	± 0.661	0.690	0.490
Year (2016-2017 vs 2015)	0.131	± 0.861	0.152	0.879
Year (2021 vs 2015)	-2.158	± 0.707	-3.052	0.002
Year (2022 vs 2015)	0.584	± 0.851	0.686	0.493
Year (2021 vs 2016-2017)	-2.289	± 0.836	-2.739	0.006
Year (2022 vs 2016-2017)	0.453	± 1.026	0.441	0.659
Year (2022 vs 2021)	2.741	± 0.817	3.356	0.001
Random effects	SD
Site	0.192
Bird ID nested in Site	1.835
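
The columns of Table 1 are linked by simple arithmetic, which the following minimal sketch illustrates. It assumes that Z is the usual Wald statistic (coefficient divided by its SE) and that P is the corresponding two-sided normal p-value without any multiplicity adjustment; both are assumptions about how the table was produced, not something stated in the text. The function name `wald_summary` is hypothetical. In a Cox model the exponentiated coefficient is a hazard ratio, so here it can be read as the multiplicative change in the instantaneous rate of solving.

```python
import math

def wald_summary(coef, se):
    """Return (Z, two-sided P, hazard ratio) for one Cox coefficient.

    Assumes a plain Wald test with a standard normal reference
    distribution (an assumption, see the lead-in).
    """
    z = coef / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    hazard_ratio = math.exp(coef)
    return z, p, hazard_ratio

# Rows taken from Table 1 (string-pulling).
rows = {
    "Session number": (1.120, 0.123),
    "Year (2021 vs 2015)": (2.185, 0.908),
    "Sex (male vs female)": (-1.404, 0.771),
}

for name, (coef, se) in rows.items():
    z, p, hr = wald_summary(coef, se)
    print(f"{name}: Z = {z:.3f}, P = {p:.3f}, hazard ratio = {hr:.2f}")
```

Under these assumptions, the recomputed Z and P for the 2021 vs 2015 contrast agree with the tabulated values to rounding. For session number the recomputed Z differs slightly in the last digits, which is expected when the inputs are themselves rounded to three decimals.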

Plug-opening:

Altogether, 46 out of 66 birds (69.7%) solved the task at least once; there were 25 successful solutions (37.9%) in the first, 36 (54.5%, including successful solutions by birds that had solved in previous sessions; 13 first solutions) in the second, 39 (59.1%; 5 first solutions) in the third and 43 (65.2%; 3 first solutions) in the fourth session. Our full Cox model showed that solving latencies decreased over the four sessions, and there was no significant difference between urban and rural, male and female, or adult and juvenile birds (Table 1b). Year had a significant effect: birds in 2021 solved significantly slower than those from the other years, which were not significantly different from one another (Table 1b; Figure 2b). After removing the non-significant effects of environment, sex and age, the effects of session number (coef ± SE = 0.881 ± 0.093; Z = 8.390; P < 0.001) and year (2021 vs 2015: coef ± SE = -2.105 ± 0.709; Z = -2.969; P = 0.003; 2021 vs 2016-2017: coef ± SE = -2.163 ± 0.848; Z = -2.551; P = 0.011; 2021 vs 2022: coef ± SE = -2.737 ± 0.807; Z = -3.391; P = 0.001) remained qualitatively the same. The GLM models showed that solving success was significantly affected by session number and year (birds from 2021 were less likely to solve (10 out of 24 birds, 41.7%) than those from 2015 (16 out of 20, 80%), 2016-2017 (8 out of 10, 80%) and 2022 (12 out of 12 birds, 100%); Table S3b); solving latency of successful birds was affected by session number and by year (birds from 2021 being slower than those from 2015 or 2022; Table S4b). Neophobia had no statistically significant effect (coef ± SE = -0.041 ± 0.596; Z = -0.070; P = 0.940). Furthermore, learning speed did not differ between urban and rural (coef ± SE = 0.051 ± 0.166; Z = 0.310; P = 0.760), male and female (coef ± SE = 0.100 ± 0.167; Z = 0.600; P = 0.550), or juvenile and adult birds (coef ± SE = -0.197 ± 0.172; Z = -1.150; P = 0.250).
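
The counts reported for the two tests can be cross-checked against each other. The sketch below is only a bookkeeping illustration and not part of the original analysis: it accumulates the first solutions per session into the number of birds that had solved at least once, and sums the per-year successes. The variable and function names are hypothetical; all numbers are taken from the text.

```python
# First-time solvers in sessions 1 to 4, as reported in the text.
first_solutions = {
    "string-pulling": [9, 16, 5, 4],
    "plug-opening": [25, 13, 5, 3],
}
birds_tested = {"string-pulling": 63, "plug-opening": 66}

# (solvers, birds tested) per study year, as reported for the GLMs.
per_year = {
    "string-pulling": {"2015": (5, 17), "2016-2017": (5, 10),
                       "2021": (15, 24), "2022": (9, 12)},
    "plug-opening": {"2015": (16, 20), "2016-2017": (8, 10),
                     "2021": (10, 24), "2022": (12, 12)},
}

def cumulative_solvers(first_per_session):
    """Running total of birds that have solved at least once."""
    totals, running = [], 0
    for n in first_per_session:
        running += n
        totals.append(running)
    return totals

for task, firsts in first_solutions.items():
    cumulative = cumulative_solvers(firsts)
    percent = [round(100 * c / birds_tested[task], 1) for c in cumulative]
    solvers = sum(s for s, _ in per_year[task].values())
    tested = sum(n for _, n in per_year[task].values())
    print(task, cumulative, percent, f"per-year total: {solvers}/{tested}")
```

Note that the cumulative number of birds with at least one solution can exceed the number of successful solutions in a given session, because a bird that solved earlier may fail in a later session.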

Relationship between the two tests:

Out of the 63 birds that participated in both tests, 27 solved both tests at least once; 7 solved the string-pulling but not the plug-opening; 16 solved the plug-opening but not the string-pulling; and 13 did not solve either test (χ² = 3.199, df = 1, P = 0.074). If we look at only the first session of each bird, 6 solved both tests; 3 solved the string-pulling but not the plug-opening; 19 solved the plug-opening but not the string-pulling; and 35 did not solve either (χ² = 2.014, df = 1, P = 0.156). The Cox models revealed that birds with short string-pulling latencies also tended to be faster at solving the plug-opening test (Figure 3), whether string-pulling latency was the response variable and plug-opening latency was the explanatory variable (coefficient ± SE = -0.213 ± 0.112; Z = -1.910; P = 0.057) or the other way around (coefficient ± SE = -0.226 ± 0.112; Z = -2.010; P = 0.045); note that in both cases negative coefficients and Z-values mean positive correlations.
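
The two χ² statistics can be reproduced from the 2 × 2 tables of solvers and non-solvers. The following minimal sketch assumes Pearson's χ² test with Yates' continuity correction; the text does not name the correction, but the reported values match it, so this should be read as an inference rather than a description of the authors' code. The function name `chi2_2x2_yates` is hypothetical.

```python
import math

def chi2_2x2_yates(a, b, c, d):
    """Chi-squared test (df = 1) with Yates' correction for the table

        [[a, b],
         [c, d]]

    Returns the statistic and its p-value.
    """
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the chi-squared upper tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: solved string-pulling (yes, no); columns: solved plug-opening (yes, no).
all_sessions = chi2_2x2_yates(27, 7, 16, 13)
first_sessions = chi2_2x2_yates(6, 3, 19, 35)

print("all sessions: chi2 = %.3f, P = %.3f" % all_sessions)
print("first sessions: chi2 = %.3f, P = %.3f" % first_sessions)
```

Without the continuity correction the first table would give a noticeably larger statistic, so the choice of correction matters for how the weak association between the two tests is read.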

In our study, we investigated the problem-solving abilities of urban and rural great tits in a string-pulling and a plug-opening task. We found higher problem-solving success in the string-pulling test compared to earlier studies, and decreasing latencies over repeated sessions in both task types, indicating individual learning, as well as a weak relationship between solving successes and latencies in the two tasks. Furthermore, we did not find any significant difference in either mean solving latencies or the decrease of solving latencies over repeated sessions between urban and rural birds or between juveniles and adults, and found only a slight difference between males and females in the string-pulling task. However, we found marked differences between the solving latencies of birds from different years in both tasks, which are likely due to methodological factors (summarized in Table 2) and therefore highlight the issue of replicability of studies. We discuss these results below.

Table 2: Summary of methodological differences between the four years.

Year	2015	2016-2017	2021	2022
Experimenter	A	B	C	C
Sample size	20 (5 UM, 5 UF, 5 RM, 5 RF)	10 (2 UM, 2 UF, 4 RM, 2 RF)	24 (7 UM, 5 UF, 7 RM, 5 RF)	12 (2 UM, 4 UF, 2 RM, 4 RF)
Urban sites	Ekologihuset, Malmö	Ekologihuset	Ekologihuset, Linero	Linero
Rural sites	Backen, Orups sjukhus, Växsjön, Stensoffa	Gäddangen, Karlsund, Linekulsvägen, Växsjön	Gäddangen, Ormapumpan	Karlsund, Linekulsvägen
Period	September to December	December to March	September to December	January to February
Inter-trial intervals	24 to 48 hrs	1 to 288 hrs	24 to 48 hrs	24 to 48 hrs
Test order	Always string first	Random	Varied among birds (but not within)	Varied among birds (but not within)
Other perches	Removed	Not removed	Not removed	Removed
String cover tube material	Thin plastic	Thin plastic	Sturdy plastic	Sturdy plastic, thin rim
Plugged tube size and material	Small (75 × 11 mm) glass	Small (75 × 11 mm) glass	Large (100 × 15 mm) plastic	Small (75 × 11 mm) glass
Plugged tube place	High (41 cm)	Low (26 cm)	Low (26 cm)	High (41 cm)
Worms	2	1	1	1

String-pulling success and latency

Interestingly, the percentage of successful birds in the string-pulling test was higher than in previous string-pulling tests in this species (Thorpe 1956; Vince 1956; Cole et al. 2011). Vince (1956) performed the test on 12 birds, out of which only 1 solved the problem upon the first trial (8.3% solving success). Thorpe (1956) found that 4 out of 28 birds (14.3%) managed to pull up the string, whereas Cole et al. (2011), with a slightly different experimental set-up, found that 93 out of their 365 birds (25.5%) solved the test successfully. In our study, 14.3% of our birds solved the test in the first 20-minute trial, the same proportion as Thorpe (1956) found; however, by the end of the fourth 20-minute trial, 54.0% had solved the string-pulling task at least once, more than double the 25.5% that Cole et al. (2011) found. The difference between our results and previous ones is even more striking if we look at the years separately: in 2015, only 29.4% of the birds solved the task at least once, which is only slightly above what Cole et al. (2011) found, but by 2022, this number had increased to 75.0%.
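
The percentages compared in this paragraph follow directly from the reported counts. The sketch below simply recomputes them and the ratio between our overall success and that of Cole et al. (2011); the dictionary keys are descriptive labels chosen here, not terms used by the cited studies.

```python
# (successful birds, birds tested) as reported in the text.
string_pulling = {
    "Vince 1956, first trial": (1, 12),
    "Thorpe 1956": (4, 28),
    "Cole et al. 2011": (93, 365),
    "This study, first session": (9, 63),
    "This study, after four sessions": (34, 63),
}

def success_rate(solved, tested):
    """Proportion of tested birds that solved the task."""
    return solved / tested

for label, (solved, tested) in string_pulling.items():
    print(f"{label}: {solved}/{tested} = {100 * success_rate(solved, tested):.1f}%")

ratio = (success_rate(*string_pulling["This study, after four sessions"])
         / success_rate(*string_pulling["Cole et al. 2011"]))
print(f"Ratio to Cole et al. 2011: {ratio:.2f}")
```

The sample sizes differ by more than an order of magnitude between studies, so the raw percentages carry very different amounts of uncertainty.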

Several factors can explain these relatively high solving success rates. First, our birds may have been more acclimatised to captive conditions. Rather than being tested only a day after capture as in Cole et al. (2011), our birds had been in captivity for at least three days before the first session of the problem-solving tests. This extra time may have helped the birds become less stressed when interacting with the test device (but see Butler et al. 2006). Second, our birds may have been more motivated than those in previous studies. Motivation plays an important role in problem-solving success (van Horik and Madden 2016). Unlike Vince (1956), who baited the test device with seeds, Cole et al. (2011) and we both baited it with more attractive live insect prey (waxworms or mealworms). Instead of fasting the birds like Vince (1956) and Cole et al. (2011) did, we motivated them by keeping them on a seed diet prior to the experiments. Great tits are better at problem-solving tasks with live insect food as the reward when kept on a seed diet than when they are kept on an insect diet (Davidson et al. 2020). Third, rather than participating in a single trial of up to an hour like the birds in the previous studies (Vince 1956; Cole et al. 2011), our birds had their learning time broken down into four relatively short sessions (up to 20 minutes long). Usually, great tits will inspect new items quickly and then lose interest after a few unsuccessful attempts (Johnsson and Brodin 2019). Therefore, it is possible that our birds, after losing interest upon failing to solve for the first time, regained their motivation by the time of the next session. This idea is supported by the fact that 23 of our 34 successful first solutions (67.6%) happened in the first half of the successful session (median solving latency in this subset: 361 seconds).

Plug-opening success and latency

The plug-opening task has never been performed on great tits, but has been used multiple times on mountain chickadees (Kozlovsky et al. 2015; Kozlovsky et al. 2017). While most birds eventually solved the task in these studies, it usually took them several one-hour sessions. Compared to this, our great tits solved much faster, with 38% of them figuring out the solution by the end of the first 20-minute trial, and 70% of them having solved at least once by the fourth 20-minute trial. This could be a difference between species and could be explained by the differences in their ecology: mountain chickadees are specialized food-hoarders, making them specialized in spatial memory tasks (Croston et al. 2016; Sonnenberg et al. 2019), whereas great tits are non-hoarding opportunists that benefit more from innovativeness and exploiting novel food sources compared to other parid species (Sasvári 1979; Urhan et al. in prep.).

Learning effect

After the birds successfully solved one task, most of them consistently solved it afterwards, with their solving latency becoming shorter with each consecutive session. This suggests that the birds were learning over the four trials, memorizing the correct solution and improving with experience. Interestingly, even the birds tested in 2016-2017 showed this learning pattern, despite the fact that there were sometimes longer gaps (up to 12 days) between two trials of the same type. There were only three birds that failed to solve a string-pulling session after being successful in a previous session, due to trying to pull the string too vigorously and shaking the worm out. Similarly, there were three birds that lost the food reward after successfully solving the plug-opening test, and then did not solve the task in consecutive trials; presumably, losing the worm served as negative reinforcement in these cases. Other studies on the string-pulling ability of great tits found similar learning patterns. When Cole et al. (2011) recaptured 47 birds and performed the string-pulling test on them for a second time, 27 birds (57.4%) were successful. Vince (1956) trained some of the unsuccessful birds with strings of increasing length; this training was successful for 4 out of 9 individuals (44.4%). These results all support the idea that great tits have good learning abilities, which was also demonstrated in other types of cognitive tasks (Sasvári 1979; Brodin and Urhan 2015; Urhan et al. 2023).

Relationship between the two tests

We found only a weak tendency for birds that solved the string-pulling task to also be more likely to solve the plug-opening task. This means that although some birds in our sample were overall better and others overall worse at solving problems, the solving success in our tests appeared to be task-specific rather than a general cognitive ability. This is contrary to Cole et al. (2011), who found a relatively strong relationship between solving success in the string-pulling task and a cognitive test in which the great tits could access a food item on a platform by pulling a lever. This may be because the lever-pulling task requires motor and cognitive skills more similar to those of the string-pulling task than the plug-opening task does. By contrast, Preiszner et al. (2017) found no correlation between solving success in an obstacle-removal task and a lid-opening food acquisition task. This lack of correlation may be due to the very different rewards in the two tests (food items versus access to nestlings), which probably involved very different motivational drives, unlike in Cole et al. (2011) and our study, where the reward was food in all tests.

Effects of urbanization

Our initial prediction that urban individuals would be better problem-solvers than their rural conspecifics was not supported by our results. This appears to be contrary to a number of studies that found that urban animals perform better than rural animals in cognitive tasks (Liker and Bókony 2009; Sol et al. 2011; Audet et al. 2016; Kozlovsky et al. 2017; Solaro et al. 2019; Chow et al. 2021; Mazza and Guenther 2021), including two other studies on great tits (Preiszner et al. 2017; Grunst et al. 2020). A possible explanation for this discrepancy is that the latter two studies were performed on breeding pairs in the wild, whereas we performed our study indoors on wild-caught birds. Animals in captivity may perform either better (Morand-Ferron et al. 2011; Benson-Amram et al. 2013) or worse (McCune et al. 2019) in cognitive tasks compared to their wild conspecifics. An indoor environment is more standardized and easier to control, which may remove differences due to environmental conditions. For example, in forest habitats, a high abundance of insects may decrease the birds’ motivation to solve a food extraction task compared to urban habitats with lower insect abundance (Preiszner et al. 2017), whereas our captive birds were kept on an identical seed diet in order to make them equally motivated.

Alternatively, the lack of urban-rural differences in our and some other studies (Papp et al. 2015; Cook et al. 2017; Morton et al. 2023) and the better performance of rural animals in a few others (Prasher et al. 2019; Johnson Ulrich et al. 2021) can be explained by the poor nutritional conditions (Seress et al. 2018) and other forms of environmental stress that have been suggested to occur in urban habitats (Birnie-Gauvin et al. 2016). Physiological condition and stress can both affect problem-solving performance (Bókony et al. 2014; Cook et al. 2017; but see Grunst et al. 2020), counteracting the stronger need for cognitive abilities to cope with such habitats.

Differences between years

In both problem-solving tasks, the factor with the strongest effect was the year in which the experiment was performed. Besides a temporal effect, this variable also encompasses an observer effect (one observer in 2015, another one in 2016-2017 and a third one in 2021 and 2022) and differences in methodology (Table 2, Online Resource 1). String-pulling success was higher in each study year than in the previous one. While it is theoretically possible that our study populations have gradually become better at the string-pulling task over the course of the years, it is rather unlikely that they would have encountered problems similar to the string-pulling test in their natural habitats. Alternatively, the between-year differences could be explained by differences between the environmental (e.g. weather) conditions that the birds experienced before getting captured. However, these would have had the most apparent effect on juveniles, whereas the majority of the birds in our study (45 out of 66) were adults.

Instead, the most likely explanation is that problem-solving success was affected by the slight differences in experimental methodology between years. For example, the birds could solve the string-pulling task more easily if they could perch on the edge of the plastic tube into which the string was hanging rather than on the perch to which the string was tied. In 2015-2017, the string was hanging into a thin-walled plastic tube made from two plastic cups and mounted on a wooden frame, which was somewhat unstable and difficult to perch on. By contrast, in 2021-2022, it was hanging within a sturdy plastic cylinder, firmly mounted on a ceramic dish, providing a rather stable surface on which the bird could perch and hold the string (even after we added a thinner plastic rim in 2022), which may explain the greater problem-solving success in these years.

In the plug-opening task, the birds from 2021 had much lower problem-solving success (42%) than those from 2015 (80%), 2016-2017 (80%) or 2022 (100%). There were several differences in the experimental methods that could explain this between-year variation. Most notably, in 2021, the plugged tube was somewhat bigger and made of plastic, whereas in the other years it was a slightly smaller tube made of glass. This could have affected the visibility of the mealworm and thus the motivation of the birds, as well as the size and the resistance of the cotton plug and thus the effort required to remove it. Furthermore, in 2015 and 2022 all perches other than the one next to the plugged tube were removed, so the birds were physically forced to spend time in the proximity of the device, whereas in 2021 there were other perches in the cage, providing more opportunities for the birds not to interact with the device. However, other perches were also available in 2016-2017, meaning that this latter methodological mismatch, by itself, cannot explain all observed differences between years.

Replicability and comparability between studies

The fact that solving success varied with small methodological differences within our own lab highlights a more general problem about the replicability of behavioural experiments. Replicability is an important measure of the reliability of scientific studies, yet behavioural experiments are seldom repeated, and when they are, they often yield different results (Brecht et al. 2021; Farrar et al. 2021). It is important to differentiate between conceptual and direct replications. The former is when the same research question and hypothesis are tested with different methods. Despite the variation in methodology, these studies are often used for within- and between-species comparisons (Kabadayi et al. 2016; Kabadayi et al. 2017; Isaksson et al. 2018), and often form the basis of meta-analyses. It is perhaps unsurprising that these comparisons are often inconclusive: for example, Vincze and Kovács (2022) found large heterogeneity in their meta-analysis of studies comparing cognition of urban and rural conspecifics, but it is unclear whether that is due to differences between study species or differences between methodologies.

By contrast, a direct replication is when a study tries to replicate an earlier study’s exact experimental methods. In our case, reprising the 2015-2017 experiment in 2021-2022 could be seen as an attempt at a direct replication. These are uncommon, partly because they lack novelty, which reduces their publishability (Brecht et al. 2021), and partly because they are often difficult to perform, especially across labs but sometimes even within the same lab. This is often due to logistic constraints (e.g. it is difficult to sample the exact same populations, or the exact same equipment is not available) or inadequate communication (e.g. the methods are not described in enough detail). The differences we found between years in both problem-solving tests suggest that fine methodological details which are often overlooked, such as the size and material of the test devices, can affect the results.

On the other hand, slight changes to the experimental protocol can also work as a refinement of the methods. Regarding the string-pulling test, we improved upon the methods used by Vince (1956), Thorpe (1956) and Cole et al. (2011), at the same time as we tried to improve the experimental protocol over the years when we encountered problems. The increase in problem-solving success across years that we found in the string-pulling test indicates that we managed to improve the protocol of this particular test. This suggests that the cognitive performance of animals may be underestimated due to experimental methods less suitable for the species.

Overall, the strong difference between study years, despite our effort to keep the methodology consistent across years and experimenters, has an important implication: if experiments on the same populations with only small, seemingly inconsequential differences in methodology can yield such different results, then studies performed by different researchers at different labs, on different species or different populations of the same species, at different geographical locations, using non-identical experimental protocols must be even harder to compare. Therefore, we need to always be careful when drawing conclusions from such comparisons between studies.

Furthermore, we would like to emphasize the importance of a detailed description of the experimental protocol, and advise other authors not to exclude certain details from their methods description just because they are subjectively assumed to be irrelevant. In this digital age, video recordings are easy to make and share, which can be a helpful visual aid when replicating experiments. These methodological details must not be overlooked if we want to make meaningful generalizations and between- and within-species comparisons.

Competing interests: We declare that we have no competing financial or non-financial interests.

Acknowledgements: We thank Veronika Bókony, Hwei-Yen Chen, and Masahito Tsuboi for their advice on statistical analyses.

Author contributions: Conceptualization and design: Anders Brodin, Utku Urhan and Ineta Kačergytė; capturing the subjects: Anders Brodin, Utku Urhan; experiments, data collection: Ernő Vincze, Ineta Kačergytė, Juliane Gaviraghi Mussoi; statistical analysis: Ernő Vincze; figures: Ernő Vincze, Juliane Gaviraghi Mussoi; writing: Ernő Vincze, Ineta Kačergytė, Anders Brodin; editing and approval of final draft: all authors.

Ethical approval: The study complies with Swedish and EU animal welfare legislations and regulations. We performed the study under permit M-106-13 (2015-2017) and 4716/2018 (2021-2022) from the Malmö-Lund ethical permit board for animal experiments.

Funding: The project was funded by the following grants: MSCA-2019 SE 2021-01102 by the Swedish Innovation Agency (Vinnova), NKFI-PD-134958 by the National Research, Development and Innovation Office of Hungary, both awarded to EV, and 2020-00719 by the Swedish Research Council (Vetenskapsrådet) awarded to UU.

Andersen CR, Wolf J, Jennings K, Prough DS, Hawkins BE. 2021. Accelerated Failure Time Survival Model to Analyze Morris Water Maze Latency Data. J Neurotrauma. 38:435–445. doi:10.1089/neu.2020.7089.
Audet J-N, Couture M, Jarvis ED. 2023. Songbird species that display more-complex vocal learning are better problem-solvers and have larger brains. Science. 381(September):1170–1175. doi:10.1126/science.adh3428.
Audet J-N, Ducatez S, Lefebvre L. 2016. The town bird and the country bird: problem solving and immunocompetence vary with urbanization. Behav Ecol. 27(2):637–644. doi:10.1093/beheco/arv201.
Benson-Amram S, Holekamp KE. 2012. Innovative problem solving by wild spotted hyenas. Proc R Soc B. 279(August):4087–4095. doi:10.1098/rspb.2012.1450.
Benson-Amram S, Weldele ML, Holekamp KE. 2013. A comparison of innovative problem-solving abilities between wild and captive spotted hyaenas, Crocuta crocuta. Anim Behav. 85(2):349–356. doi:10.1016/j.anbehav.2012.11.003.
Birnie-Gauvin K, Peiman KS, Gallagher AJ, de Bruijn R, Cooke SJ. 2016. Sublethal consequences of urban life for wild vertebrates. Environ Rev. 24(4):416–425.
Bókony V, Lendvai ÁZ, Vágási CI, Pătraş L, Pap PL, Németh J, Vincze E, Papp S, Preiszner B, Seress G, et al. 2014. Necessity or capacity? Physiological state predicts problem-solving performance in house sparrows. Behav Ecol. 25(1):124–135. doi:10.1093/beheco/art094.
Boogert NJ, Reader SM, Laland KN. 2006. The relation between social rank, neophobia and individual learning in starlings. Anim Behav. 72:1229–1239. doi:10.1016/j.anbehav.2006.02.021.
Brecht KF, Legg EW, Nawroth C, Fraser H, Ostojić L. 2021. The status and value of replications in animal behavior science. Anim Behav Cogn. 8:97–106.
Brodin A, Urhan AU. 2014. Interspecific observational memory in a non-caching Parus species, the great tit Parus major. Behav Ecol Sociobiol. 68:649–656. doi:10.1007/s00265-013-1679-2.
Brodin A, Urhan AU. 2015. Sex differences in learning ability in a common songbird, the great tit — females are better observational learners than males. Behav Ecol Sociobiol. 69:237–241. doi:10.1007/s00265-014-1836-2.
Butler SJ, Whittingham MJ, Quinn JL, Cresswell W. 2006. Time in Captivity, Individual Differences and Foraging Behaviour in Wild-Caught Chaffinches. Behaviour. 143(4):535–548.
Chow PKY, Clayton NS, Steele MA, Ashton BJ. 2021. Cognitive performance of wild Eastern gray squirrels (Sciurus carolinensis) in rural and urban, native, and non-native environments. Front Ecol Evol. 9:80. doi:10.3389/fevo.2021.615899.
Cole EF, Cram DL, Quinn JL. 2011. Individual variation in spontaneous problem-solving performance among wild great tits. Anim Behav. 81(2):491–498. doi:10.1016/j.anbehav.2010.11.025.
Cook M, Weaver M, Hutton P, McGraw K. 2017. The effects of urbanization and human disturbance on problem solving in juvenile house finches (Haemorhous mexicanus). Behav Ecol Sociobiol. 71(5):85. doi:10.1007/s00265-017-2304-6.
Croston R, Kozlovsky DY, Branch CL, Parchman TL, Bridge ES, Pravosudov V V. 2016. Individual variation in spatial memory performance in wild mountain chickadees from different elevations. Anim Behav. 111:225–234. doi:10.1016/j.anbehav.2015.10.015.
Davidson GL, Wiley N, Cooke AC, Johnson CN, Fouhy F, Reichert MS, Hera I de la, Crane JMS, Kulahci IG, Ross RP, et al. 2020. Diet induces parallel changes to the gut microbiota and problem solving performance in a wild bird. Sci Rep. 10:20783. doi:10.1038/s41598-020-77256-y.
Exnerová A, Štys P, Fučíková E, Veselá S, Svádová K, Prokopová M, Jarošík V, Fuchs R, Landová E. 2006. Avoidance of aposematic prey in European tits (Paridae): learned or innate? Behav Ecol. 18(October):148–156. doi:10.1093/beheco/arl061.
Farrar BG, Voudouris K, Clayton NS. 2021. Replications, Comparisons, Sampling and the Problem of Representativeness in Animal Cognition Research. Anim Behav Cogn. 8(2):273–295.
Garamszegi LZ, Calhim S, Dochtermann N, Hegyi G, Hurd PL, Jørgensen C, Kutsukake N, Lajeunesse MJ, Pollard KA, Schielzeth H, et al. 2009. Changing philosophies and tools for statistical inferences in behavioral ecology. Behav Ecol. 20(6):1363–1375. doi:10.1093/beheco/arp137.
Greenberg R. 2003. The role of neophobia and neophilia in the development of innovative behaviour of birds. In: Reader SM, Laland KN, editors. Animal Innovation. Oxford University Press. p. 175–196.
Griffin AS, Guez D. 2014. Innovation and problem solving: A review of common mechanisms. Behav Processes. 109:121–134. doi:10.1016/j.beproc.2014.08.027.
Griffin AS, Netto K, Peneaux C. 2017. Neophilia, innovation and learning in an urbanized world: a critical evaluation of mixed findings. Curr Opin Behav Sci. 16:15–22. doi:10.1016/j.cobeha.2017.01.004.
Grunst AS, Grunst ML, Pinxten R, Eens M. 2020. Sources of individual variation in problem-solving performance in urban great tits (Parus major): Exploring effects of metal pollution, urban disturbance and personality. Sci Total Environ. 749:141436. doi:10.1016/j.scitotenv.2020.141436.
Horik JO Van, Madden JR. 2016. A problem with problem solving: motivational traits, but not cognition, predict success on novel operant foraging tasks. Anim Behav. 114:189–198. doi:10.1016/j.anbehav.2016.02.006.
Isaksson E, Urhan AU, Brodin A. 2018. High level of self-control ability in a small passerine bird. Behav Ecol Sociobiol. 72:118.
Jacobs IF, Osvath M. 2015. The String-Pulling Paradigm in Comparative Psychology. J Comp Psychol. 129(2):89–120.
Jahn-Eimermacher A, Lasarzik I, Raber J. 2011. Statistical analysis of latency outcomes in behavioral experiments. Behav Brain Res. 221(1):271–275. doi:10.1016/j.bbr.2011.03.007.
Jensen JK, Jayousi S, von Post M, Isaksson C, Persson AS. 2022. Contrasting effects of tree origin and urbanization on invertebrate abundance and tree phenology. Ecol Appl. 32(2):e02491. doi:10.1002/eap.2491.
Johnson Ulrich L, Yirga G, Strong RL, Holekamp KE. 2021. The effect of urbanization on innovation in spotted hyenas. Anim Cogn. 24:1027–1038. doi:10.1007/s10071-021-01494-4.
Johnsson RD, Brodin A. 2019. Wild‐caught great tits Parus major fail to use tools in a laboratory experiment, despite facilitation. Ethology. 125:324–331. doi:10.1111/eth.12857.
Kabadayi C, Krasheninnikova A, O'Neill L, van de Weijer J, Osvath M, von Bayern AMP. 2017. Are parrots poor at motor self-regulation or is the cylinder task poor at measuring it? Anim Cogn. 20(6):1137–1146. doi:10.1007/s10071-017-1131-5.
Kabadayi C, Taylor LA, von Bayern AMP, Osvath M. 2016. Ravens, New Caledonian crows and jackdaws parallel great apes in motor self-regulation despite smaller brains. R Soc Open Sci. 3:160104. doi:10.1098/rsos.160104.
Kozlovsky DY, Branch CL, Pravosudov V V. 2015. Problem-solving ability and response to novelty in mountain chickadees (Poecile gambeli) from different elevations. Behav Ecol Sociobiol. 69:635–643. doi:10.1007/s00265-015-1874-4.
Kozlovsky DY, Weissgerber EA, Pravosudov V V. 2017. What makes specialized food-caching mountain chickadees successful city slickers? Proc R Soc B. 284:20162613. doi:10.1098/rspb.2016.2613.
Lambert ML, Jacobs I, Osvath M, von Bayern AMP. 2019. Birds of a feather? Parrot and corvid cognition compared. Behaviour. 156:505–594. doi:10.1163/1568539X-00003527.
Lee VE, Thornton A. 2021. Animal cognition in an urbanised world. Front Ecol Evol. 9:633947. doi:10.3389/fevo.2021.633947.
Liker A, Bókony V. 2009. Larger groups are more successful in innovative problem solving in house sparrows. Proc Natl Acad Sci U S A. 106(19):7893–7898. doi:10.1073/pnas.0900042106.
Mazza V, Guenther A. 2021. City mice and country mice: innovative problem solving in rural and urban noncommensal rodents. Anim Behav. 172:197–210. doi:10.1016/j.anbehav.2020.12.007.
McCune KB, Jablonski P, Lee S, Ha RR. 2019. Captive jays exhibit reduced problem-solving performance compared to wild conspecifics. R Soc Open Sci. 6:181311.
Morand-Ferron J, Cole EF, Rawles JEC, Quinn JL. 2011. Who are the innovators? A field experiment with 2 passerine species. Behav Ecol. 22(6):1241–1248. doi:10.1093/beheco/arr120.
Morton FB, Gartner M, Norrie E-M, Haddou Y, Soulsbury CD, Adaway KA. 2023. Urban foxes are bolder but not more innovative than their rural conspecifics. Anim Behav. 203(July):101–113. doi:10.1016/j.anbehav.2023.07.003.
Nakagawa S. 2004. A farewell to Bonferroni: the problems of low statistical power and publication bias. Behav Ecol. 15(6):1044–1045. doi:10.1093/beheco/arh107.
Overington SE, Morand-Ferron J, Boogert NJ, Lefebvre L. 2009. Technical innovations drive the relationship between innovativeness and residual brain size in birds. Anim Behav. 78(4):1001–1010. doi:10.1016/j.anbehav.2009.06.033.
Papp S, Vincze E, Preiszner B, Liker A, Bókony V. 2015. A comparison of problem-solving success between urban and rural house sparrows. Behav Ecol Sociobiol. 69(3):471–480. doi:10.1007/s00265-014-1859-8.
Prasher S, Evans JC, Thompson MJ, Morand-Ferron J. 2019. Characterizing innovators: ecological and individual predictors of problem-solving performance. PLoS One. 14(6):e0217464. doi:10.1371/journal.pone.0217464.
Preiszner B, Papp S, Pipoly I, Seress G, Vincze E, Liker A, Bókony V. 2017. Problem-solving performance and reproductive success of great tits in urban and forest habitats. Anim Cogn. 20(1):53–63. doi:10.1007/s10071-016-1008-z.
Reader SM, Laland KN. 2003. Animal Innovation. Oxford University Press.
Rodewald AD, Kearns LJ, Shustack DP. 2011. Anthropogenic resource subsidies decouple predator-prey relationships. Ecol Appl. 21(3):936–943. doi:10.1890/10-0863.1.
Rowe C, Healy SD. 2014. Measuring variation in cognition. Behav Ecol. 25:1287–1292. doi:10.1093/beheco/aru090.
Sasvári L. 1979. Observational learning in great, blue and marsh tits. Anim Behav. 27:767–771. doi:10.1016/0003-3472(79)90012-5.
Seress G, Hammer T, Bókony V, Vincze E, Preiszner B, Pipoly I, Sinkovics C, Evans KL, Liker A. 2018. Impact of urbanization on abundance and phenology of caterpillars and consequences for breeding in an insectivorous bird. Ecol Appl. 28(5):1143–1156. doi:10.1002/eap.1730.
Sol D, Griffin AS, Bartomeus I, Boyce H. 2011. Exploring or avoiding novel food resources? The novelty conflict in an invasive bird. PLoS One. 6(5):e19535. doi:10.1371/journal.pone.0019535.
Sol D, Timmermans S, Lefebvre L. 2002. Behavioural flexibility and invasion success in birds. Anim Behav. 63(3):495–502. doi:10.1006/anbe.2001.1953.
Solaro C, Sarasola JH. 2019. Urban living predicts behavioural response in a neotropical raptor. Behav Processes. 169:103995. doi:10.1016/j.beproc.2019.103995.
Sonnenberg BR, Branch CL, Pitera AM, Bridge E, Pravosudov VV. 2019. Natural selection and spatial cognition in wild food-caching mountain chickadees. Curr Biol. 29(4):670–676.e3. doi:10.1016/j.cub.2019.01.006.
Therneau TM. 2012. coxme: mixed effects Cox models. R package version 2.2-3. Vienna R Found Stat Comput.
Thorpe W. 1956. Learning and Instinct in Animals. Harvard University Press.
Urhan U, Mårdberg M, Isaksson E, van Oers K, Brodin A. 2023. Blue tits are outperformed by great tits in a test of motor inhibition, and experience does not improve their performance. R Soc Open Sci. 10:221176.
Venables WN, Ripley BD. 2002. Generalized linear models. New York, NY: Springer New York (Statistics and Computing).
Vince MA. 1956. “String pulling” in birds. 1. Individual differences in wild adult great tits. Br J Anim Behav. 4(3):111–116.
Vincze E, Kovács B. 2022. Urbanization's effects on problem solving abilities: a meta-analysis. Front Ecol Evol. 10:834436. doi:10.3389/fevo.2022.834436.

No competing interests reported.

Editorial decision: Revision requested
27 Apr, 2024
Reviews received at journal
23 Apr, 2024
Reviews received at journal
17 Apr, 2024
Reviewers agreed at journal
20 Mar, 2024
Reviewers agreed at journal
20 Mar, 2024
Reviewers invited by journal
18 Mar, 2024
Editor assigned by journal
15 Mar, 2024
Submission checks completed at journal
09 Mar, 2024
First submitted to journal
07 Mar, 2024