Age-Related Memory and Novel Object Avoidance Differences in Family Dogs: Measuring the Validity and Reliability of a Rapid Behaviour Test Battery

The prolonged lifespan of companion dogs has resulted in an increased occurrence of behavioural and physical challenges linked to old age. The development of behavioural tests for identifying and monitoring age-related differences has begun. However, standardised testing requires validation. The present study aimed to assess external validity, interobserver reliability, and test-retest reliability of an indoor test battery for the rapid assessment of age-related behavioural differences in dogs. Two experimenters tested young and old dogs on a first occasion and again after two weeks. We found external validity for two subtests out of six. On both test occasions, old dogs committed more errors than young dogs in a memory test and showed more object avoidance when encountering a novel object. Interobserver reliability and test-retest reliability were high. We conclude that the Memory and Novel object tests are valid and reliable for monitoring age-related differences in memory performance and object neophobia in dogs.


Introduction
The lifespan of pet or companion dogs has been increasing over the years [1] and, with this, behavioural and physical deficits more common in old age have also become more frequent. In recent decades, research on canine ageing has grown exponentially because both researchers and the public have recognised pet dogs' emotional, economic, and scientific value, in the latter case as an animal model species [2][3][4][5]. According to these studies, owners often report a decline in the visual and auditory function, social behaviour [6][7][8][9][10][11][12], and the sleep/wake cycles [13,14] of ageing pet dogs.
To better understand these phenomena, researchers have developed various behavioural tests measuring the behavioural differences linked to old age in companion dogs. For instance, an open field test showed that the chronological age of the dogs is linked to their neophilic behaviour: specifically, young dogs (1-4 years) sniffed and played for a longer time with novel objects compared to older dogs (>9 years) [15]. In a similar study, authors have also observed that, when a stranger entered the experimental area, younger dogs (1-4 years) had more physical interactions with the person, compared to older dogs (>9 years) [16].
Discrimination and reversal learning tasks were developed for measuring memory, learning and cognitive flexibility, showing that younger dogs (1.5-6.5 years) were able to learn faster than older dogs (8.0-14.5 years) [4]. These results have been further validated using EEG, demonstrating a correlation between sleep spindle (non-REM bursts of activity in the sigma range) intrinsic frequency and the number of reversal learning training trials [17]. Sleep spindles predict learning in dogs, and vary with age [18,19]. A spatial memory task in which dogs had to rely on short-term memory to find food was also developed, showing that younger dogs (3-6 years) were more efficient than older dogs (9-11 years), committing fewer errors and finding the food more often at their first attempt. External validation was obtained by examining the gut microbiome of the dogs, demonstrating that worse memory performance (more errors) was associated with a higher proportion of Actinobacteria in their faeces [20], in agreement with the high abundance of some Actinobacteria in the gastrointestinal tract of patients with Alzheimer's disease [21].
Several authors observed a significant difference between young and old individuals during discrimination and reversal learning tasks [4,31,32].
Despite the widespread use and the importance of behavioural testing for ageing research, some shortcomings have been identified. Some of these tests require long training and cannot be repeated over a short time (e.g., [4]), which makes it impossible to use them for monitoring age-related behaviour changes in a longitudinal design [33]. Others rely on social interaction [16,34], which may be perceived differently by different dogs depending on the partner. For example, test accuracy may be undermined by the different responsiveness of dogs towards male and female experimenters. Previous research indicates that shelter dogs show a stronger decrease in defensive-aggressive behaviours (tendency to look, barking) towards women [35], lower levels of plasma cortisol and a more relaxed posture when petted by women [36], as well as more stress-related behaviours (tendency to look, shorter tail-high periods, lip-licking) when walked on a leash by men [37]. The influence of human gender on behaviour has been understudied in companion dogs [38,39] and, so far, has never been accounted for as a potential confounding factor in field tests aimed at assessing age-related behavioural differences. Finally, cognitive tests designed to measure positive affective states have replicability issues and may not be reliable in ageing dogs, due to the learning required: for example, studies based on the cognitive bias test, a test for mood based on discrimination choices, showed that older dogs may struggle to learn the discrimination and therefore it may not be possible to test them [4,40]. Clinicians still need standardised testing for positive emotions in senior animals.
This study aimed 1) to investigate the reliability (interobserver, inter-experimenter, test-retest) of the MMT [2], and 2) to adapt it to indoor settings to have a controlled environment with limited distractions. The MMT demonstrated content and construct validity (internal validity) and a good degree of external validity [2,20]. However, inter-observer and inter-experimenter agreement as well as test-retest reliability are yet to be studied. Therefore, we modified the protocol to include two experimenters (a woman and a man) and compared the dogs' behaviour in the two situations with the different experimenters. They and an independent observer also coded the dogs' behaviour to calculate inter-observer reliability. We added a new test to the battery for assessing spatial memory and neophilia, the Novel object recognition test (NOR) [41][42][43]. The test is widely used with murine models but, to our knowledge, has not yet been applied to dogs [41]. Finally, we tested both old and young dogs and compared the performance of the dogs on the first occasion (T0) and after one to two weeks (T1) to measure test-retest reliability.

Owners provided written consent to their voluntary participation. We took special care to ensure that the consent process was understood completely by the dog owners. In the Consent Form, participants were informed about the identity of the researchers, the aim, procedure, location, and expected time commitment of the experiment, the handling of personal and research data, and data reuse. The owners were not informed about the exact aim of the tests. The information included the participant's right to withdraw their consent at any time. Participants could decline to participate at any point and could request that their data not be used and/or be deleted after collection. Our Consent Form was based on the Ethical Codex of Hungarian Psychologists (2004).

Subjects
A total of 38 dogs were recruited through the Department of Ethology, ELTE's database of participants, social media, and word of mouth. Two groups of dogs were formed based on their age: 'young dogs' (N = 20, median age = 3 years, IQR = 2.50-4.00, females = 50%, neutered = 65%), and 'old dogs' (N = 18, median age = 11 years, IQR = 10.62-12.88, females = 33%, neutered = 78%). Age categories were based on previous findings regarding the onset of cognitive decline (see [44,45] for a review). The sample included 14 mixed-breed dogs and 24 purebred dogs from 16 different breeds (see Table S1 for full demographic information). In both groups, the dogs had to be free from overt signs of distress and/or pain during the test.

Procedure
The study was performed indoors at the Department of Ethology, ELTE. Two tablets (Samsung Galaxy Tab S2), positioned at opposite corners of the room, recorded the behavioural performances (Fig. 1).
The battery consisted of six indoor subtests (Fig. 1). An experimenter was present in the room, with the owner on his/her left-hand side and a coder, who coded some of the tests live, on his/her right-hand side. The owner kept the dog on the leash unless instructed differently.
The dogs underwent the same test twice (T0 = first test, T1 = second test, after 1 to 2 weeks). Different objects were used when the dogs had to be naïve to a certain object (see Supplementary Material). The same experimenter and coder performed the test on both occasions. Half of the dogs were tested by a male experimenter and the other half by a female experimenter. The allocation to each experimenter was counterbalanced across dogs (see Table S2).
The behavioural variables are presented in Table 1.

Exploration
The goal of this test was to measure dogs' activity level and interest in investigating a novel environment and spontaneous activity [2,46,47]. The owner walked in the room with the dog on the leash and stayed in a pre-determined position (Fig. 1a) for one minute, while reading a paper given by the experimenters, to avoid looking at or talking to the dog.

Greeting
We aimed to measure the sociability of dogs toward unfamiliar friendly people [2,47]. The experimenter entered the room and greeted the dog (Fig. 1b). If the dog approached the experimenter, the interaction continued in a standardised way (see Supplementary Material), up to a ball or tugging game.

Novel object recognition (NOR)
Our goal was to measure neophilic behaviour [48] and short-term memory. Dogs were presented, in a predetermined order (Table S2), with two pairs of containers with different shapes and colours (Fig. S1). After one minute to explore the first pair, the dog was taken out of the room (Fig. 1c). The experimenter swapped the containers for a new pair, in which one container was identical to the first one and the second container had a novel shape and colour. The dog-owner dyad re-entered the room and the dog had one more minute to explore the containers. The position of the novel container and the types of containers were pseudo-randomised and counterbalanced between dogs and between T0 and T1 (Table S2).

Problem box
We aimed to measure the dogs' persistence. The dog was presented with the food toy Kong wobbler (Fig. 1d), filled with 20 pieces of dry food, and had one minute to try to retrieve the food by pushing the toy around, which makes the food drop from a small hole ('solvable task'). Then the experimenter refilled the toy with one large piece of dry meat, which was too big to pass through the hole, so it was not possible to retrieve the food ('unsolvable task'). The dog was given the toy for one more minute.

Memory
The goal of this test was to detect differences in the dogs' short-term spatial memory. The dogs were presented with five identical containers (Fig. S2) placed in a semi-circle (Fig. 1e). The experimenter placed a piece of food in one of the containers, which the dog was allowed to retrieve after a break outside of the room, according to the procedure described in Piotti et al. [3]. The procedure was repeated five times, once per container, and the order of the baited container's location was counterbalanced and pseudo-randomised across participants, and varied between T0 and T1 (Table S2). In addition, at the end of T1 the dogs were presented with three additional trials ('Control Trials') in which the location of the baited container was changed while the dog could not see. This was done to exclude the possibility that the dogs follow odour cues in this test.
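The actual baiting orders used in this study are listed in Table S2 and are not reproduced here. As an illustration only, the sketch below shows one simple way such an order could be counterbalanced: a cyclic Latin square, in which each container position appears once in every trial slot across a block of five dogs. The function names, the row-shift rule for the retest, and the use of a cyclic square are all assumptions made for this example, not the authors' allocation scheme.

```python
def cyclic_latin_square(n_positions):
    """Build an n x n square; row r is the order 1..n rotated by r places."""
    base = list(range(1, n_positions + 1))
    return [base[r:] + base[:r] for r in range(n_positions)]


def baiting_order(dog_index, occasion, n_positions=5):
    """Hypothetical rule: pick a row by dog index, shifted one row at T1.

    The shift guarantees that a dog's T1 order differs from its T0 order.
    """
    square = cyclic_latin_square(n_positions)
    shift = 0 if occasion == "T0" else 1
    return square[(dog_index + shift) % n_positions]


if __name__ == "__main__":
    for dog in range(5):
        print(dog, baiting_order(dog, "T0"), baiting_order(dog, "T1"))
```

A cyclic square balances positions across trial slots but keeps the same neighbouring sequence in every row, so a real protocol would typically add a pseudo-random element on top of it.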

Novel object (toy dog)
The objective of this test was to measure dogs' neophilia and neophobia. The dogs were presented for 30 seconds with an electronic, moving toy dog (Fig. 1f) placed on the ground by the experimenter, according to the procedure described in Kubinyi and Iotchev [2]. Two toys, identical in shape and rough movement but different in colour and sound, were used at T0 and T1; the order was counterbalanced across dogs (Fig. S1 and Table S2).

Statistical analysis
Analyses were performed with the R statistical software [49]. The packages psych [50], ordinal [51], and lme4 [52] were used for the analysis. Cumulative link mixed models (CLMMs) were calculated to analyse score data. A cauchit link function was used for the activity level, a probit link function for the social interaction, and a log-log link function for the object manipulation variables.
Generalized linear mixed models (GLMMs) were used to analyse count data. The recognition index, memory errors, object avoidance, and object interaction variables had a Poisson error distribution, whereas the neophilic behaviour had a binomial error distribution.
For each model, we initially created a global model including all the variables of interest as fixed factors, with no interactions, and the dog as a random factor. Each global model included 'age group' (old vs young), 'test number' (test vs retest), and 'experimenter' (A vs B) as fixed factors. The model for the response variable 'object manipulation' also included the variable 'test phase' (solvable vs unsolvable). The models for the response variables 'errors' and 'spatial memory' included 'trial' (1 to 5). The main factors 'age group', 'test number' and 'experimenter' were maintained in all models as part of our main hypothesis, while for all other factors we adopted a stepwise approach to select the most parsimonious model to describe the variance of each response variable. Pairwise post-hoc comparisons with Tukey correction were then obtained.
We used a Wilcoxon signed rank test to compare the proportion of times the dogs chose the baited container in trials 1, 3, 5 at T1 with the proportion during the corresponding control trials in the Memory test.
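The analysis itself was run in R; the following is only a minimal, standard-library Python sketch of how a Wilcoxon signed-rank test on paired observations works. It drops zero differences, assigns average ranks to tied absolute differences, and obtains an exact two-sided p-value by enumerating every possible sign pattern. The function names and the example values are hypothetical and are not data from this study.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, exact two-sided p) for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    if n > 20:
        raise ValueError("exact enumeration is only practical for n <= 20")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w_obs = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every sign pattern is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            hits += 1
    return w_plus, w_minus, hits / 2 ** n


if __name__ == "__main__":
    # Hypothetical paired scores, for illustration only.
    witnessed = [5, 7, 9, 4, 8]
    control = [3, 8, 4, 1, 2]
    print(wilcoxon_signed_rank(witnessed, control))
```

For samples larger than about 20 non-zero pairs, statistical packages normally switch to a normal approximation instead of full enumeration.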
Finally, an independent coder (AS) coded 20% of the video material (16 tests out of 76, from both T0 and T1); interobserver reliability was assessed using inter-observer agreement (kappa) for scores, Cronbach's alpha for binary data, and Spearman correlations for count data.
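To make the agreement statistic for score data concrete, the sketch below computes an unweighted Cohen's kappa for two coders in plain Python. It is an illustrative assumption that the unweighted form applies here; the text does not state whether a weighted or unweighted kappa was used, and the scores in the example are invented.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical scores (0-3) from two coders, for illustration only.
    coder_1 = [0, 1, 2, 3, 2, 1, 0, 3]
    coder_2 = [0, 1, 2, 3, 2, 1, 0, 3]
    print(cohen_kappa(coder_1, coder_2))
```

Identical codings give a kappa of exactly 1, which is the value reported for perfect agreement; any disagreement pulls the statistic below 1 by an amount that depends on how often the coders would agree by chance.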

Age group
Young and old dogs differed in four variables of three tests (Table 2). Young dogs were more likely to interact with the novel object first during the NOR; they chose fewer incorrect containers in the Memory test (Fig. 2, Fig. S3); they interacted for a longer time with the toy dog and showed shorter avoidance behaviour in the Novel object (toy dog) test (Fig. 3, Fig. S4).

Test-retest
Dogs' behaviour differed in four variables of three tests. On the second occasion, social interaction scores were higher than in T0 in the Greeting test, i.e. the dogs went closer and sooner to the experimenter. The Recognition Index and the neophilic behaviour were lower compared to T0 in the NOR test, indicating that the dogs spent less time investigating the novel object as they were less interested in it. The spatial memory scores were also lower in the Memory test, i.e. the dogs found the baited container less frequently during the second occasion (Table 2).

Experimenter
There was no significant difference between dogs tested by the male and the female experimenter (all p > 0.05; Table 2).

Control trials in the Memory test
The dogs were more successful in choosing the baited container when it was in the location they had witnessed during T1 (Trial 1: p = 0.008; Trial 3: p < 0.001; Trial 5: p = 0.003), compared to the location it was moved to during the control trials (see Table S3 for statistical details).

Inter-observer agreement
Inter-observer agreement (kappa), Cronbach's alpha, and Spearman correlations indicated excellent agreement between coders, as all values were equal to 1 and all p < 0.001 (Table S4).

Table 2
The results of three cumulative link mixed models (CLMMs) and six generalized linear mixed models (GLMMs). Significant differences are bolded. a estimate is given as ratio; b estimate is given as odds ratio; † N = 1 dog (Irisz) was excluded due to a technical issue with the camera; asterisks indicate significant differences: * = p < 0.05, ** = p < 0.01, *** = p < 0.001.

Discussion
The first goal of this study was to measure the reliability of a battery of six indoor behavioural tests for the rapid assessment of behavioural differences between young and old companion dogs. Our results indicate that the variables object avoidance in the Novel object (toy dog) test and errors in the Memory test are reliable and can be used to monitor age-related modifications in companion dogs. Both measures were unaffected by the experimenter's identity or re-testing. Furthermore, these variables were associated with good inter-observer reliability (see Supplementary Material), confirming that the test coding was well standardised.
The second aim was to reproduce the results obtained in our previous studies [2,3] in an indoor setting.
The Novel object (toy dog) test confirmed the large effect of age previously observed [2]. Younger dogs were much less avoidant than older animals.
Avoidance is a behavioural manifestation of fear or anxiety [53], which is well known to increase in dogs as they age [13]. Age-related changes in the regulation of emotions in dogs are thought to depend on the degeneration of the amygdala, causing increased sensitivity to negative stimuli [7]. However, anxiety in senior dogs may have multiple causes, such as central or peripheral neuropathology, sensory decline [6], metabolic, gastrointestinal or urogenital disease, dermatological conditions, pain [13], or underlying behaviour problems which worsen with time [54][55][56].
During the same test, younger dogs also spent more time interacting with the toy dog than older dogs. Persistence in interaction with objects might depend on differences in motivation [57], which may decline in ageing dogs due to cognitive or physical changes. Moreover, dogs may have perceived the interaction with the object as a playing activity, which would indicate a stronger inclination to playfulness in younger individuals. Playing is a sign of positive emotional states [58-60], which are fundamental for the quality of life of the individual and should therefore be monitored in senior dogs [61]. Nevertheless, despite a significant difference between young and old dogs, the variable Object interaction in the Novel object test displays a re-test effect according to our results; therefore, this variable should not be coded over time.
We also reproduced previous findings related to the measurement of short-term spatial memory in dogs [2,3], thus confirming the efficiency of our test in detecting age-related differences in an independent population of companion dogs. Specifically, older dogs performed worse in the memory test, choosing the wrong locations more often than younger dogs. The control trials excluded the possibility that the dogs followed odour cues during the test; therefore, we can conclude that they relied on visual information they had gathered during the first part of the test to find the hidden food in the second part of the test. These findings make the Memory test a reliable and valid behavioural test, which could be used to monitor longitudinal changes in dogs' spatial memory.
The Memory test had already demonstrated both convergent and external validity: a correlation between errors in a memory test and the canine gut microbiome composition was observed [20]. This finding has a large practical impact on the welfare of dogs, allowing veterinarians and other animal professionals to perform a standardised, reliable, valid, practical test to monitor an important cognitive skill as the dogs age. Such tests are fundamental to distinguish between normal and pathological ageing [45].

Furthermore, for all the other tests, we did not reproduce the previous findings. Although Kubinyi and Iotchev [2] detected a small effect in the problem box task in an outdoor setting, the present study suggests that, even if the test appears to be consistent over time, it seems to have no construct validity for ageing. Young and old dogs manipulated the object similarly; thus the test may not be consistently effective in detecting age-related behavioural modifications in companion dogs. Similarly, during the exploration and greeting, we did not detect a significant difference in social interaction between young and old dogs, and this variable seems to be affected by the repetition of the test. Therefore, this variable should not be considered reliable and suitable for longitudinal evaluations of ageing, at least not in indoor tests. Activity levels were consistent between T0 and T1, but old and young dogs' performances did not vary significantly, meaning that dogs' exploratory behaviour is not an effective measure of ageing.
According to these findings, the Exploration and Greeting tasks should not be employed to monitor age-related differences in companion dogs.
This battery of tests presented for the first time a paradigm for novel object recognition (NOR) in family dogs. Contrary to what is largely observed in other species, such as murine models [41][42][43], we did not observe a difference between younger and older dogs in the standard measure of novel object recognition, the Recognition Index. However, the older dogs demonstrated a small decline in their neophilic behaviour (i.e. fewer dogs explored the novel object first). While this difference is not predictive of a cognitive decline, it is in line with the findings on object interaction, suggesting a decline in curiosity and motivation towards objects.
Although some of the tests of the battery have detected behavioural differences positively associated with ageing and were found to be reliable, it must be pointed out that the present results are restricted to a specific population of companion dogs. Firstly, the subjects tested in this study did not suffer from any overt medical conditions and, therefore, the present results reflect age-related behavioural modifications associated with a clinically healthy ageing process. Further investigations may help evaluate how and to what extent certain pathologies could influence senior dogs' behavioural performances during the tests, thus developing an assessment tool to diagnose medical conditions. Secondly, both groups of dogs were medium to large-sized. It is well known that the ageing process is strongly associated with body size, as small-sized dogs age slower than large-sized dogs [45]. It is not yet clear whether the subject's size is a confounder during the execution of tests aimed at assessing age-related behavioural modifications. Since dogs' body size has never been taken into account so far, future studies should also evaluate the potential effect of this factor to ensure high external validity.

Conclusions
It is often very difficult to clinically separate medical and behavioural conditions in senior dogs [13,56]. The presence of pathologies, such as cognitive impairment, is usually related to a modification of behaviours (i.e. disorientation, altered interactions, anxiety), often difficult to quantify both for the owners and clinicians [13,56]. Moreover, factors such as breed and individual differences may further confound the correlation between behavioural modifications and specific clinical conditions [13]. For these reasons, standardised behavioural tests are particularly useful as they may aid the diagnosis and monitoring of age-related changes in dogs, allowing a clearer distinction between healthy and pathological ageing processes.
Overall, the current findings indicate that two tests with two variables are suitable for the assessment of age-related differences in companion dogs, namely the 'errors' in the Memory test and 'object avoidance' in the Novel object (toy dog) test. These variables have good interobserver and inter-experimenter agreement, as well as test-retest reliability. Taking into account previous research, it is possible to say that the Memory test may be considered both valid and reliable. The Novel object (toy dog) test also appears to be reliable and demonstrates good external validity; further studies are warranted to investigate its internal validity.
Since the tests appear to be consistent over time, future research should focus on the longitudinal use of such tests for the monitoring of age-related changes in dogs.

Figure 1
Behavioural tests of the test battery. a) Exploration, b) Greeting, c) Novel object recognition, d) Problem box, e) Memory, f) Novel object (toy dog).

Figure 2
The number of errors in the memory test. On average, the old dogs made more errors in the memory test compared to the young dogs. A breakdown of the number of errors made in the memory test, divided by group, is presented in the figure. The middle line in the box plots represents the median number of errors, the extremes of the boxes represent the lower and upper quartiles, and the error bars represent the minimum and the maximum number of errors. The asterisks indicate a statistically significant difference between the groups (** = p < 0.01).