Human ageing is associated with more rigid concept spaces

Prevalence-induced concept change describes a cognitive mechanism by which someone's definition of a concept shifts as the prevalence of instances of that concept changes. While this phenomenon has been established in young adults, it is unclear how it affects older adults. In this study, we explore how prevalence-induced concept change affects older adults' lower-level (perceptual) and higher-order (ethical) judgements. We find that older adults are less sensitive to prevalence-induced concept change than younger adults across both domains. Using computational modeling, we demonstrate that these age-related changes in judgements reflect more cautious and deliberate responding in older adults. Based on these findings, we argue that while overly cautious responding by older adults may be maladaptive in some cognitive domains, in the case of prevalence-induced concept change, it might be protective against biased judgements.

By 2068, almost 30% of the Canadian population will be 65 years or older, and this number is expected to rise further over time across North America (U.S. Census Bureau, 2018; Statistics Canada, 2019). As adults age, their judgements and decisions will affect society more than ever and will largely influence our collective future. As such, it is critical to understand how the cognitive processes underlying judgement and decision-making change with age.
In the current study, we explore how changes in one's environment affect concept formation and judgements in younger and older adults. Specifically, we consider how judgements about perceptual and ethical concepts are affected by the prevalence of instances of those concepts in the environment. This phenomenon has been referred to as prevalence-induced concept change (Levari et al., 2018). Technically speaking, prevalence-induced concept change describes the empirical observation that as the number of instances of a given concept decreases in the environment, the boundaries for that concept expand. For example, Levari et al. (2018) asked participants to serially judge whether individual dots that varied on a spectrum between blue and purple were in fact blue or purple. When the relative frequency of objectively coloured dots in the environment was consistent across the task (50% blue dots, 50% purple dots), people's judgements were relatively stable: Blue dots were judged as blue and purple dots as purple. However, when the number of blue dots decreased, dots initially judged as purple were later (after the prevalence changed) categorized as blue. Put simply, when the prevalence of instances of a concept in the environment changed, so did the boundaries of that concept.
What these findings suggest is that, from judgement to judgement, people adjust their decisions about concepts to changes in the environment. From a theoretical perspective, however, there are compelling reasons to think that older adults differ cognitively from young adults in terms of how they make such decisions. Abundant evidence already suggests that older adults differ from young adults in terms of motivation, postponement of gratification, and the degree to which they value desired outcomes (Eppinger et al., 2011; Eppinger et al., 2017; Mather & Harley, 2016; Samanez-Larkin & Knutson, 2015). These differences in decision-making processes can in part be explained by age-related differences in cognitive ability, such as changes in executive function (Mayr et al., 2001), memory (Nyberg et al., 2012), and processing speed (Salthouse, 1996) that occur naturally with healthy ageing.
These age-related cognitive changes offer good reason to suspect that older adults might be differentially affected by prevalence-induced concept change compared with younger adults. Specifically, two lines of research paint opposing pictures of how older adults might make different concept judgements than younger adults when the prevalence of instances of a concept in the environment changes.
On the one hand, older adults have been shown to utilize a cautious and conservative decision style when making binary choices (Starns & Ratcliff, 2010; Theisen et al., 2021). This decision style often manifests in longer response times and increased decision consistency (i.e., higher accuracy in the context of choices that can be correct or incorrect; Starns & Ratcliff, 2010), or in more perseverative behaviour in nonbinary choice (i.e., a tendency to repeat previous responses despite changes in the environment; Bruckner et al., 2020; Bolenz et al., 2019; Nassar et al., 2016). With respect to the current study, these findings suggest that older adults should exhibit a decreased sensitivity to prevalence-induced concept change, for two reasons. First, the repetition of past choices would make it less likely that a rarer category will be chosen after a shift in prevalence. Second, given that contextual effects in sequential judgements are known to decay over time (Hammer, 1949; Krauskopf, 1954), older adults' propensity to trade speed for accuracy may reduce the impact past stimuli exert on subsequent judgements, above and beyond the effects of general slowing (Salthouse, 1996).
On the other hand, results from several recent studies suggest that older adults have difficulty converging on an accurate representation of the current state, particularly if doing so is difficult (i.e., if these states are latent, not directly observable, and need to be inferred from experience; Bolenz et al., 2019; Hämmerer et al., 2014; Hämmerer et al., 2019; Ruel et al., 2022). To help compensate for this difficulty in distinguishing task states, older adults may outsource control to the environment rather than relying on (sometimes inaccurate) internal representations (Lindenberger & Mayr, 2015; Mayr et al., 2015; Spieler et al., 2006). Such outsourcing might manifest as an increased reliance on the environment to dictate responses, which, in the context of the current study, would be reflected in a greater impact of past stimuli on current judgements and a decision bias away from internal representations of concepts (see Wilson, 2018), counteracting the caution described in the previous paragraph.
Taken together, the current literature points to two opposing hypotheses: (H1) older adults are less sensitive to prevalence-induced concept change than are younger adults; (H2) older adults are more sensitive to prevalence-induced concept change than are younger adults.
In this study, we aim to tease these hypotheses apart and gain a better understanding of the cognitive mechanisms underlying age-related changes in prevalence-induced concept change. We do so in two steps. First, we use an age-comparative study design to investigate age differences in prevalence-induced concept change in lower-level (perceptual) and higher-level (moral) judgements. We show that, across domains, older adults are less susceptible to prevalence-induced concept change than younger adults. Second, we fit participants' responses on the perceptual task with a hierarchical drift-diffusion model (HDDM; Wiecki et al., 2013) to explore the mechanism(s) that drives this reduced sensitivity to concept changes. Consistent with our first hypothesis, we find that age differences in prevalence-induced concept change are associated with more cautious and deliberate responding on the part of older adults. We conclude by arguing that while such cautious responding may be maladaptive in some cognitive domains, in the case of prevalence-induced concept change, it might be protective against biased decision-making.

Participants
We recruited 160 participants from the community and the university participation pool, 80 of whom were older adults (M age = 70.10 years, SD = 5.55; 69% women) and 80 of whom were younger adults (M age = 21.85 years, SD = 2.27; 69% women). All participants were English-speaking, free of neurological or psychiatric disorders, and free of any cognitive, motor, visual, or other condition(s) that would impede their performance, including but not limited to a history of head trauma with loss of consciousness, organic brain disorders, seizures or neurosurgical intervention, sensory deficits (i.e., deafness, blindness, colourblindness, intellectual disability), self-reported cognitive impairment, and a recent history of substance abuse. In each age group, 40 participants were randomly assigned to either the decreasing prevalence condition or the stable prevalence condition. In the former, participants experienced a decreasing prevalence of instances of the concept in both tasks detailed below; in the latter, the prevalence remained the same throughout the entire experiment. All participants were compensated $20 (CAD) or two participation pool credits for participating in the study. The study protocol was approved by the Concordia Human Research Ethics Committee (Certification No. 30011191). Sample sizes were based on the data of Levari et al. (2018), and the stopping rule for data collection was to collect until the end of the semester and until group sizes were even.

The dots task
In the dots task, participants had to judge the colour of an individual dot presented on the screen. The task began with a series of instruction screens explaining the task to the participant. These instructions were followed by a practice block consisting of 10 trials, in which participants could familiarize themselves with the task. These trials were identical to trials in the real task and consisted of 50% purple dots and 50% blue dots. Data from practice trials were not analyzed.
After the practice block, participants performed a total of 800 trials, divided into 16 blocks of 50 trials each. In the decreasing prevalence condition, the number of blue dots in the environment decreased across blocks in a predetermined fashion. Specifically, the proportion of blue relative to purple dots was as follows for each of the 16 blocks: .50, .50, .50, .50, .40, .28, .14, .06, .06, .06, .06, .06, .06, .06, .06, .06. In the stable prevalence condition, the proportion of blue dots in the environment remained the same (.50) across the experiment. In both cases, blue dots were defined as any dot whose RGB value fell between [0, 0, 254] and [49, 0, 205], and purple dots as any dot whose RGB value fell between [50, 0, 204] and [99, 0, 155]. Red and blue values were sampled uniformly and varied jointly (as the red value increased, the blue value decreased correspondingly) and green values were always zero. Dot colours were randomly chosen for each trial based on the number of trials per block (50) and the frequency with which blue and purple dots should appear in a given block.
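This colour scheme can be sketched in Python. The function names and the exact sampling routine are assumptions for illustration; only the RGB endpoints and the joint red-blue variation come from the task description above.

```python
import random

def sample_dot_colour(is_blue):
    """Sample an RGB triple for one dot (hypothetical reconstruction).

    As the red value rises from 0 to 99, the blue value falls from 254
    to 155, sweeping the blue-to-purple spectrum; green is always zero.
    Red values 0-49 yield blue dots, 50-99 yield purple dots.
    """
    r = random.randint(0, 49) if is_blue else random.randint(50, 99)
    return (r, 0, 254 - r)

def block_colours(n_trials=50, p_blue=0.5):
    """Generate one block of dot colours with a fixed count of blue dots."""
    n_blue = round(n_trials * p_blue)
    labels = [True] * n_blue + [False] * (n_trials - n_blue)
    random.shuffle(labels)
    return [sample_dot_colour(b) for b in labels]
```

Under this sketch, a block in the final decreasing-prevalence blocks would be generated with `block_colours(50, 0.06)`, yielding three blue dots per 50 trials.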
In each trial, participants judged the colour of the dot as being either blue or purple by pressing the 'A' or 'L' key on the keyboard. All stimuli were presented against a dark grey background. Each trial started with a dot presented on the screen for 500 ms, followed by a question mark that remained on the screen until participants made a choice, and finally a blank screen presented for 500 ms as an intertrial interval. Thus, all timing was fixed across participants, except that which would arise from differences in response times. After each block, text appeared indicating that the block was finished and which block the participant was now at, and offering them a short break should they choose to take one.

The ethics task
In the ethics task, participants had to take on the role of a member of an Ethics Review Board and judge whether fictitious research proposals were ethical or not (phrased as whether they would allow these research studies to be conducted or not). All research proposals were norm tested by Levari et al. (2018; see Supporting Online Material) to produce scores depicting how people rated the ethical quality of the 273 proposals. These scores were used to bin proposals as unethical (80 proposals), ethical (113 proposals), or ambiguous (80 proposals). These bins were used to calculate the proportion of proposals that appeared in each block (including the practice trial). Just as in the dots task, participants were first presented with instruction screens explaining the task. Following the instructions, participants completed a practice trial in which they judged a research proposal using the keyboard: They pressed 'A' when they would not allow a study to be conducted and 'L' when they would.
Following the practice trial, participants began the test trials. All proposals in the experiment were presented in black text against a dark grey background. The task consisted of 240 trials broken into 10 blocks. In the decreasing prevalence condition, the proportion of unethical, ethical, and ambiguous proposals varied across blocks. Specifically, across the 10 blocks of the study, the proportion of unethical proposals relative to ethical and ambiguous proposals (which shared the same proportion) was as follows: .33, .33, .33, .33, .25, .17, .08, .04, .04, .04 (rounded to the nearest second decimal). In the stable prevalence condition, each of the three types of proposals appeared with the same proportion (.33) throughout the task.
On each trial, participants read a proposal and pressed 'A' or 'L' on the keyboard to indicate whether they thought the research should be allowed to be conducted on people. There was no time limit on this choice. Following the choice, a fixation cross appeared on the screen for 500 ms, followed by the next proposal. Between blocks, text appeared indicating that the block was finished and which block the participant was now at, and offering them a short break should they choose to take one.
Both the dots and ethics tasks described above were taken from Levari et al. (2018). Both tasks were programmed in Python using the PsychoPy libraries.

Procedure
Participants were recruited from the community through online or paper advertisements or from Concordia's participation pool. Participants were contacted by telephone or email and were asked basic demographic information to determine initial eligibility. If eligible at this stage, they were invited for a single two-hour session in the lab.
Once at the lab, participants were asked to fill out a consent form and complete the Richmond HRR pseudoisochromatic test for colour vision (Cole et al., 2006; see Supplement for more details). Participants were then asked to complete the dots task and ethics task, back to back. The order of these tasks was counterbalanced across participants. The counterbalancing did not impact the results presented below. They were told that they would be free to take short breaks during the tasks (between blocks) and a longer break between the tasks, should they choose to. After completing both tasks, participants were debriefed and paid $20 for participating or were given their participation credits.

Descriptive analyses
Choice and response time data were analyzed in R (Version 3.6.1). For the ethics task, normed scores were reversed so that lower normed scores represented more ethical scenarios, making the plots run in the same direction as those for the dots task. Choice data were analyzed using six binomial mixed-effects models, predicting binary choice (0 = purple/ethical; 1 = blue/unethical) from trial (scaled to be between 0 and 1), stimulus strength (scaled to be between 0 and 1, with larger values corresponding to bluer and more unethical stimuli), experimental condition (−1 = stable, 1 = decreasing prevalence), and age group (−1 = young adults, 1 = older adults), with random slopes of trial and random intercepts per participant. Fixed effects were tested for statistical significance via likelihood ratio tests. Response times were analyzed using two linear regressions, predicting mean reaction time from age group and condition in both tasks.
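The predictor coding can be illustrated with a minimal sketch. The function name and the exact rescaling formulas are assumptions (the original analysis was run in R); only the coding scheme itself follows the description above.

```python
def code_predictors(trial, n_trials, strength, strength_max, condition, age_group):
    """Code one trial's predictors as in the mixed-effects models.

    Trial and stimulus strength are rescaled to the [0, 1] interval
    (this particular rescaling is assumed for illustration); condition
    and age group use -1/+1 effect coding as described in the text.
    """
    return {
        "trial": (trial - 1) / (n_trials - 1),    # 0 = first trial, 1 = last
        "strength": strength / strength_max,      # 1 = bluest / most unethical
        "condition": 1 if condition == "decreasing" else -1,
        "age_group": 1 if age_group == "older" else -1,
    }
```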

Drift-diffusion model
The drift-diffusion model assumes that people's decisions are the result of a noisy evidence-accumulation process over time (Ratcliff & McKoon, 2008). The process begins at some starting point z and accumulates evidence over time towards one of two decision boundaries at a rate v, known as the drift rate. This accumulation process is subject to random perturbations at each time step, which follow a distribution N(0, s). The process continues until one of the two boundaries is hit, each of which corresponds to one of the options in the task (blue or purple in the dots task). The separation between these boundaries is defined by the parameter a, such that more evidence is required to reach a decision when a is larger and less when a is smaller. The direction in which the evidence accumulation process heads (e.g., towards blue or purple) depends on the sign of the drift rate v, where positive values of v indicate evidence heading for one boundary (blue, here) and negative values indicate evidence heading towards the other (purple). Once a boundary is hit, a response is initiated, which takes some nonzero amount of time to encode and execute, t0.
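The accumulation process described above can be sketched with a minimal simulator. This is an illustration of the model's mechanics, not the estimation procedure used in the paper; the default parameter values, the step size, and the function name are assumptions.

```python
import random

def simulate_ddm(v, a, z=0.5, t0=0.3, s=1.0, dt=0.001, max_t=10.0):
    """Simulate a single drift-diffusion trial (illustrative sketch).

    Evidence starts at z * a (relative starting point z) and drifts at
    rate v with Gaussian noise of standard deviation s per unit time,
    until it crosses a ('blue') or 0 ('purple'). Returns the choice and
    the response time, including the nondecision time t0.
    """
    x = z * a
    t = 0.0
    noise_sd = s * dt ** 0.5  # within-step noise scales with sqrt(dt)
    while 0.0 < x < a and t < max_t:
        x += v * dt + random.gauss(0.0, noise_sd)
        t += dt
    choice = "blue" if x >= a else "purple"
    return choice, t0 + t
```

With a strongly positive drift rate (e.g., `simulate_ddm(v=5.0, a=1.0)`), the vast majority of simulated trials terminate at the upper ("blue") boundary.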
Parameters in the DDM have previously been shown to be sensitive to age, in particular decision boundaries (a) and nondecision time (t0) (Theisen et al., 2021). This makes the DDM a good candidate model for exploring age-related differences in decision processes in prevalence-induced concept change. However, according to Ratcliff and McKoon (2008), "the [DDM] should be applied only to relatively fast two-choice decisions (mean RTs less than about 1000 to 1500 ms)" (p. 875). In the case of the ethics task, this assumption was not met (mean reaction times ranged between 0.68 s and 16.21 s in younger adults and between 3.19 s and 21.32 s in older adults) and, as such, the model was not fit to data from the ethics task. We discuss this limitation in more detail in the Discussion section.
Parameters in the DDM were estimated using the HDDM package in Python (Wiecki et al., 2013), which uses a hierarchical Bayesian framework. Decision thresholds, drift rates, and nondecision times were estimated on a trial-by-trial basis as a linear combination of age group, condition, trial number, and stimulus intensity. All other parameters were fixed at the default values set by HDDM (see Wiecki et al., 2013). All reported coefficients (b) are mean posterior values; credible intervals (CI, i.e., highest density intervals) are at the 95% level. Bayesian p values (P) represent one minus the proportion of the posterior that falls above or below zero (depending on the sign of the median posterior value: below zero if b < 0 and above if b > 0). In line with the traditional interpretation of frequentist p values, Bayesian p values can be interpreted probabilistically as "there is a (P × 100) percent chance that the effect is zero or a reversal of the central tendency." Twenty-five hundred samples were drawn from the posterior for each parameter, discarding the first 500 samples as burn-in. Convergence was assessed via visual inspection of trace plots and Geweke statistics, which are reported in Table S2.
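The Bayesian p value defined above can be written out directly. The function below is an illustrative sketch of that definition, not part of the HDDM package.

```python
def bayesian_p(posterior_samples):
    """Proportion of posterior mass on the opposite side of zero from
    the median: one minus the proportion of the posterior falling below
    zero when the median is negative, or above zero when it is positive.
    Small values mean the sign of the effect is credible."""
    samples = sorted(posterior_samples)
    median = samples[len(samples) // 2]
    if median < 0:
        opposite = sum(1 for x in samples if x >= 0)
    else:
        opposite = sum(1 for x in samples if x <= 0)
    return opposite / len(samples)
```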

Choice data
Statistically speaking, prevalence-induced concept change is reflected in a three-way interaction between condition, trial, and stimulus strength, predicting responses. The effect size of this interaction reflects the degree to which a participant's choice to categorize a given exemplar (dot or research proposal) as one concept or another is influenced by a combined effect of three factors: (a) the prevalence of instances in the environment (i.e., the effect of condition), (b) the amount of time that has passed (i.e., the effect of trial), and (c) the strength of the stimulus (i.e., blueness or ethicality). Thus, if younger and older adults differ in their sensitivity to prevalence-induced concept change, we would expect to see a four-way interaction between these three terms and age group, where age group moderates the three-way interaction that represents concept change.
Indeed, this is exactly what we observe. Results are represented in Figs. 1 and 2 and summarized in Table S1. In both tasks, adding age group to the model significantly improved fit (dots task: χ2(8) = 68.31, p < .001; ethics task: χ2(8) = 100.33, p < .001) and the three-way interaction between condition, trial, and stimulus strength was significant according to a Wald test (dots task: χ2(1) = 272.85, p < .001; ethics task: χ2(1) = 15.16, p < .001). Critically, we also observed a significant four-way interaction between age group, condition, trial, and stimulus strength, showing that older adults were less sensitive to concept change in both tasks (dots task: b = −1.28, p < .001, CI [−1.70, −0.86]; ethics task: b = 1.22, p < .001, CI [0.54, 1.90]). To illustrate this effect, consider a dot that is precisely halfway between purple and blue in the dots task (see Fig. 1B). Over the course of the task, when judging such a dot, both young and older adults in the stable condition remained relatively consistent in their judgements: Young adults judged this dot to be blue only 2.5% more often at the end of the task than at the beginning, and older adults judged it to be blue 9.9% less often by the end of the task. In stark contrast, participants in the decreasing condition exhibited a marked shift in their judgements over the course of the task: Young adults judged this same dot to be blue 43% more often than at the beginning of the task, whereas older adults exhibited a shift of only 29%. It is worth noting that, while still significant, overall prevalence effects, irrespective of age group, were notably smaller in the ethics task than in the dots task (consistent with Levari et al., 2018; see Fig. 2).

Response times
Response time data across age groups are presented in Fig. 3A and B. We found a significant main effect of age group on response time in both tasks (dots task: b = 0.14, p < .001, CI [0.11, 0.17], mean difference = 0.28 seconds; ethics task: b = 1.16, p < .001, CI [0.73, 1.60], mean difference = 2.32 seconds), but no statistically significant main effect of condition (ps > .54) or interaction between condition and age group (ps > .27). These findings suggest that older adults made slower responses in both tasks, but that neither group differed with regard to response speed between conditions.

Computational modeling results
Results from the full regression model of latent DDM parameters are presented in Table S2. Because responses in the DDM were coded according to category judgements in the dots task, the strength and sign of the drift rate represent, respectively, the tendency and direction of a dot judgement (towards blue or purple). Accordingly, changes in drift rates should directly parallel age differences in choices in the dots task. Indeed, mirroring the results of the descriptive analysis, we found a robust four-way interaction between age group, condition, trial, and colour on drift rates (b = −1.32, CI [−2.03, −0.64], P = 0). Thus, the tendency to judge an ambiguously coloured dot as blue was stronger in later trials than in earlier ones in the decreasing condition, and this effect was weaker for older adults.
The question remains: What differentiates older adults' decisions from those of younger adults and drives these differences in drift rates and, ultimately, responses? As detailed in our descriptive analyses, the key behavioural distinction between young and older adults was in their response times. In the DDM framework employed here, these differences can arise in two ways: (1) from a greater requirement on evidence quality before making a decision, as indexed by the decision threshold parameter, or (2) from slower motor and encoding times, as indexed by the nondecision time parameter. Here, we found that older adults had larger decision thresholds (b = 0.26, CI [0.16, 0.36], P = 0) and nondecision times (b = 0.001, CI [0.001, 0.001], P = 0) than younger adults. However, as seen in Fig. 3C-E, age differences were most pronounced in decision thresholds, whereas differences in nondecision time, though statistically reliable, were negligible in magnitude (on the order of 1-2 ms on average). The precision of the nondecision time posterior was likely driven in part by (1) the large number of observations included in the hierarchical model (128,000 observations across 160 participants), (2) the constraint that the nondecision time be greater than zero (Wiecki et al., 2013), and (3) the relative simplicity of the task, leading to very narrow credible intervals despite minuscule age differences in parameter values.
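The link between wider decision thresholds and slower but more consistent responding can be demonstrated with a small standalone simulation: under a positive drift towards the "correct" boundary, raising the threshold a increases both mean decision time and the proportion of drift-consistent choices. This is an illustrative sketch under assumed parameter values, not the fitted model.

```python
import random

def ddm_trial(v, a, dt=0.001):
    """One drift-diffusion trial, starting midway between the boundaries."""
    x, t = a / 2.0, 0.0
    while 0.0 < x < a:
        x += v * dt + random.gauss(0.0, dt ** 0.5)
        t += dt
    return x >= a, t  # (upper boundary chosen, decision time)

def summarize(v, a, n=400):
    """Proportion of upper-boundary (drift-consistent) choices and mean decision time."""
    trials = [ddm_trial(v, a) for _ in range(n)]
    consistency = sum(upper for upper, _ in trials) / n
    mean_rt = sum(t for _, t in trials) / n
    return consistency, mean_rt
```

Comparing `summarize(1.0, 1.0)` against `summarize(1.0, 2.0)` shows the wider threshold producing both longer decision times and a higher proportion of drift-consistent choices, mirroring the speed-for-consistency trade-off attributed to older adults above.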
Together, these results suggest that younger and older adults differed primarily in terms of response caution (i.e., the amount of evidence required before a decision was made) rather than motor and encoding times. This account is bolstered by the presence of a strong interaction between age group and trial on decision thresholds (b = 0.26, CI [0.24, 0.29], P = 0), suggesting that older adults not only exhibited wider thresholds overall, but maintained this elevated evidence requirement over the course of the task, in contrast to young adults, who became less cautious towards the end of the experiment (see Fig. S1).

Discussion
In this experiment, we aimed to investigate whether older adults differ from younger adults in their sensitivity to prevalence-induced concept change. Based on previous findings (e.g., Lindenberger & Mayr, 2015; Nassar et al., 2016), we hypothesized that older adults would be either less (H1) or more (H2) sensitive to prevalence-induced concept change than younger adults. Our results support H1: They demonstrate that older adults were less sensitive to prevalence-induced concept change than young adults in their judgements about the colours of dots and in their ethical judgements about fictitious research proposals. Thus, we argue that older adults' judgements are more stable (or rigid) than younger adults' when faced with a changing task environment.
This finding dovetails nicely with a body of research demonstrating that older adults have greater difficulty than younger adults updating behaviour in response to changes in the environment (Eppinger et al., 2011; Hämmerer et al., 2019; Nassar et al., 2016). Recent work has demonstrated that older adults may have more difficulty learning the latent states of stimuli and engage in a more consistent style of responding as a result (Bruckner et al., 2020; Nassar et al., 2016). Notably, in most task environments these impairments in inference about latent states are associated with performance deficits. Conversely, in the current task, older adults' reduced sensitivity to environmental statistics was protective against some of the negative consequences that could be associated with prevalence-induced concept change (e.g., claiming something is ethical when you previously said it was not).
But where do these age differences come from? We hypothesized that if age differences existed, they could be explained by differences between young and older adults in estimated parameter values from a drift-diffusion model (Ratcliff & McKoon, 2008; Wiecki et al., 2013). Consistent with our hypothesis, and in line with past work (Theisen et al., 2021), we found that in the dots task older adults had markedly higher decision thresholds than young adults, above and beyond the perceptual and motor slowing associated with healthy ageing (Salthouse, 1996). Furthermore, these elevated decision thresholds persisted over the course of the whole task, whereas younger adults became less cautious (their thresholds decreased) towards the end of the experiment (i.e., when the prevalence of blue dots had shifted). These results support the view that older adults made more cautious judgements in the task: They required a higher level of evidence than younger adults before making decisions, which in turn may have curtailed the effects of prevalence-induced concept change.
Age-related increases in decision boundaries are often associated with suboptimal performance in cognitive tasks (e.g., Starns & Ratcliff, 2010). However, in the current study we show that increased caution can also be protective against biases in judgement. Together with recent work (Ruel et al., 2021), these behavioural and computational results suggest that, while cognitive ageing may have negative consequences in many contexts that require flexible updating of representations (such as learning, cognitive control, and memory), it may have unexpected benefits for judgement and decision-making, such as in the case of prevalence-induced concept change.

Limitations and future directions
It is important to comment on some limitations of the current study. Chiefly, the present experiments were cross-sectional in nature, comparing groups of young adults and old adults on their susceptibility to concept change. While this allowed for a straightforward and well-balanced experimental design, it limits our ability to make broad claims about the trajectory and potential causes of concept change across the adult life span.
Moreover, while the DDM is well suited for the analysis of fast responses (such as in simple cognitive tasks), it is inappropriate for tasks where response times are long (Ratcliff & McKoon, 2008). Nevertheless, it is interesting to note that, at a purely descriptive level, older adults spent more time on the ethics task than young adults and did not exhibit concept change, a pattern of behaviour consistent with increased response caution. Of course, whether response caution is the particular mechanism underlying reduced concept change in the ethics task remains speculative and needs to be established in future studies.
Finally, and critically, our principal conclusion about the mechanism underlying older adults' reduced concept change (increased decision thresholds) is descriptive in nature: Larger decision thresholds coincided with reduced concept change in older adults. However, it is unclear whether larger thresholds cause the greater choice consistency in older adults. This could be tested in two ways in the future. First, decision thresholds could be manipulated experimentally. Similar experimental manipulations of latent diffusion parameters have been achieved before by, for instance, providing performance feedback (Cohen Hoffing et al., 2018). Recent work has shown that feedback can reduce concept change (Lyu et al., 2021), a finding that fits nicely with the current results, given that feedback has also been shown to increase decision boundaries (Cohen Hoffing et al., 2018). Second, additional computational modeling work could be done to better relate in-task choices to response time dynamics. Levari (2022) recently proposed a computational model of prevalence-induced concept change based solely on response data. Combining such a purely choice-based model with the DDM employed here, for instance by using the DDM as a choice rule (see Pedersen et al., 2017), would yield strong mechanistic and normative predictions about the nature and time course of biased choice under prevalence-induced concept change. While the present results lay the groundwork for such a model, its generation and validation are outside the scope of this paper. We hope to pursue this question in forthcoming work.

Conclusion
The current study shows that, as we age, our judgements about concepts become more rigid, even as we face a changing world. While older adults are still generally susceptible to prevalence-induced concept change, the effect is reduced. This reduced sensitivity to concept change may be driven by a more cautious decision-making strategy that is reflected in increased decision thresholds.
While overly cautious responding may be maladaptive in some cognitive domains, in the case of prevalence-induced concept change, it can be protective against biased decision-making.
These results have some real-world relevance when considering the degree to which older adults' use of concepts will come to affect the future direction of our society. As we age, it seems our concepts remain more stable, even if the world around us presents us with continued reason to change them. It is in this sense that the quote at the beginning of this paper earns its relevance: The more things (our age and our environment) change, the more they (our concepts) stay the same.