We used Pavlovian eyeblink conditioning to test stimulus generalization following a differential training paradigm. First, mice were randomly divided into three groups. For each group during differential training, we used the same reinforced tone frequency always paired with an air puff (CS+) and one different non-reinforced tone never paired with an air puff (CS-). The non-reinforced CS was 4kHz for mice in Grp.10CS + 4CS-, 9kHz for mice in Grp.10CS + 9CS-, and 9.5kHz for Grp.10CS + 9.5CS-. Before the differential training started, mice underwent auditory brainstem responses (ABRs) and a period of baseline to check their sensitivity in response to tones used during differential training and generalization probe.
Auditory brainstem and auditory startle responses
We tested both hearing threshold and startle response threshold of single subjects using auditory brainstem responses (ABRs) and auditory startle responses (ASRs) before the start of differential training with Pavlovian eyeblink conditioning. ABRs were recorded following tone pips presented at 4, 8, 16 and 32 kHz (see: standardized ABRs protocol Willott et al., 2006; Akil et al., 2016). Our B6CBAF1/J mice showed on average ABR responses at the lowest frequency of 4kHz to tones with a sound pressure level (SPL) of around 38 dB. For 8 and 16kHz, ABR peaks were elicited with the lowest sound pressure level of 22 dB SPL and 13 dB SPL respectively. We found the highest thresholds of around 43 dB SPL on average in response to tone pips at 32kHz (Figure Supplementary 1, Table Supplementary 1). We found no effect of group on ABR thresholds (F(23,2) = 0.36,p = 0.69; Figure Supplementary 1; Table Supplementary 1). Our results are in line with previous measurements on the CBA/CaJ strain at 9 weeks of age done by Zheng and colleagues (1999) who reported ABR thresholds of 34, 23, 15, and 40 dB SPL for the 4kHz, 8kHz, 16kHz, and 32kHz stimuli, respectively. Since we found comparable ABR thresholds between our B6CBAF1/J mice and the CBA/CaJ strain, we conclude that hearing was intact in our animals.
Subsequently, we tested for each mouse the sound levels at which an auditory startle response (ASR) occurred across the frequency range. Auditory startle responses in rodents are characterized by a rapid contraction of skeletal and facial muscles (Pantoni et al., 2020), which also produces a partial eyelid closure. Sometimes a late component of these eyelid startle responses, called B-startle, can mask or mimic a cerebellar CR (Boele et al., 2010). It was therefore essential to identify precisely, for each mouse, the SPL and frequency that would elicit minimal or no B-startle responses. All sound stimuli had the same duration and ramp/decay pattern, and were never reinforced with an air puff to the eye during baseline sessions. The baseline sessions were repeated for ten days, each day consisting of 20 trials. During baseline, all animals received exactly the same number of tone-only trials, so as to avoid the potentially differentiating effect of latent inhibition (Lubow, 1973; Lubow & Moore, 1959). Measurements on the last day of baseline showed considerable variation in startle threshold within each group of mice in response to tone-only trials, although response thresholds within each mouse were uniform, so that more sensitive mice tended to be more sensitive to all frequencies (Figure Supplementary 2, Table Supplementary 2). In general, our B6CBAF1/J mice showed similar sensitivity to all tested tone frequencies, as 61 dB was the lowest SPL detected when all dB SPL were averaged across all animals, compared to the previously measured 60 dB for C57Bl6/J from our laboratory (Fiocchi et al., 2022). Interestingly, animals from Grp.10CS + 4CS- and Grp.10CS + 9CS- were less sensitive than Grp.10CS + 9.5CS- (Figure Supplementary 2; Table Supplementary 2).
At the end of the baseline, we established proper SPLs for each mouse and for each stimulus, which were used for the whole duration of the eyeblink conditioning differential training (day 1–18) and generalization test (day 19–22).
Eyeblink conditioning – differential training
After baseline, all animals were conditioned for 18 consecutive days (1 session/day) using a 10kHz tone (CS+), which was always reinforced with an air puff directed to the eye (US), as well as a second, group-specific tone (CS-), which was never reinforced with the US. Animals with a learning rate of less than 5%, quantified as the change in CR percentage between day 1 and day 18 of training, were excluded from the analysis. Our main question focuses on stimulus generalization in animals that learned the discrimination task properly. For this reason, two animals from Grp.10CS + 9.5CS- were excluded from the analysis of the tone generalization test sessions and were not considered in any of the presented plots (therefore, for Grp.10CS + 9.5CS- n = 6; for other groups, n = 8).
We analyzed average traces of eyelid responses in both CS + and CS- trials for each group over 18 days of training. For CS + trials, we measured the eyelid response before the onset of the air puff and noticed that this was progressively increasing over the course of sessions for all groups (Fig. 2A-C, top). In a similar way, average eyelid responses to CS- trials increased over the course of training. However, CS- eyelid responses grew less and not as rapidly as the CS + in all groups (Fig. 2A-C, bottom).
CR percentage
We found a statistically significant effect of session on the average increase of CR percentage in response to the CS + for all groups: Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS- (Fig. 3A-C; Table Supplementary 3, respectively F(17,119) = 6.65, p < .0001, F(17,119) = 5.05, p < .0001 and F(17,107) = 1.96, p = 0.019, ANOVA on LME). In contrast, the CR percentage to the CS- significantly increased during differential training only for Grp.10CS + 4CS- and Grp.10CS + 9CS- (Fig. 3A-C; Table Supplementary 3, respectively F(17,119) = 2.62, p = 0.0012 and F(17,119) = 3.93, p < .0001, ANOVA on LME). We did not find a significant effect of session for the CS- of Grp.10CS + 9.5CS- (Fig. 3A-C; Table Supplementary 3, F(17,107) = 1.33, p = 0.186, ANOVA on LME).
The main purpose of our differential training was to establish how well mice could discriminate between sounds when one was positively reinforced (CS+) and another one was not (CS-). For this reason, we compared CR probability between CS + and CS- for each group of mice and found that this was always significantly higher for CS+ (Fig. 3A-C; Table Supplementary 3; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(1,245) = 55.49, p < .0001; F(1,245) = 25.04, p < .0001; F(1,221) = 34.91, p < .0001, ANOVA on LME). However, post-hoc analysis only revealed a significant difference between CS + and CS- for Grp.10CS + 4CS- starting around session 13, while there was never an effect for Grp.10CS + 9CS- and Grp.10CS + 9.5CS- (Fig. 3A-C; Table Supplementary 3). On the last day of training (day 18), animals from Grp.10CS + 4CS- reached the highest CR percentage in response to the reinforced CS + of 70.29(± 10.0), while for Grp.10CS + 9CS- this was about 65.64(± 13.5) and for Grp.10CS + 9.5CS- about 67.08(± 11.3). The CS- percentages were respectively 33.13(± 13.2) for Grp.10CS + 4CS-, 56.26(± 13.8) for Grp.10CS + 9CS-, and 51.25(± 13.1) for Grp.10CS + 9.5CS-. These data indicate that on the last day of training the CR percentage differed between CS + and CS- trials by around 37 percentage points for animals of Grp.10CS + 4CS-, while this difference was less pronounced, around 9–16 percentage points, for Grp.10CS + 9CS- and Grp.10CS + 9.5CS- (Table Supplementary 3, all values: mean ± 95% CI).
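As a quick arithmetic check, the day-18 differences between CS + and CS- CR percentages can be recomputed from the group means reported above. The snippet is only an illustrative sketch: the dictionary layout and variable names are our own, and only the means themselves are taken from the text.

```python
# Day-18 mean CR percentages as reported in the text: (CS+, CS-).
day18_cr_percentage = {
    "Grp.10CS + 4CS-": (70.29, 33.13),
    "Grp.10CS + 9CS-": (65.64, 56.26),
    "Grp.10CS + 9.5CS-": (67.08, 51.25),
}

# Difference between CS+ and CS- in percentage points, per group.
cr_difference = {
    group: round(cs_plus - cs_minus, 2)
    for group, (cs_plus, cs_minus) in day18_cr_percentage.items()
}

for group, diff in cr_difference.items():
    print(f"{group}: {diff:.2f} percentage points")
```

These are differences between group means only; they say nothing about significance, which is addressed by the post-hoc tests in Table Supplementary 3.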
To determine whether there was a statistically significant difference between groups in CS + response probabilities, we ran a linear mixed-model (LME) using group, session, and group*session as fixed effects and mouse as random effect. This analysis revealed a main effect of session on CR probability in response to the CS+ (Table Supplementary 3, F(17,345) = 12.33, p < .0001, ANOVA on LME), but no significant effect of group (Table Supplementary 3, F(2,21) = 0.713, p = 0.501, ANOVA on LME) nor an interaction effect of group*session (Table Supplementary 3, F(34,345) = 1.01, p = 0.324, ANOVA on LME).
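For readers less familiar with this type of analysis, the model structure described above can be written out as follows. This is a sketch under assumptions: the text specifies only the fixed effects and that mouse is the random effect, so the random-intercept form, the Gaussian error terms, and all symbols are our own notation and not taken from the Methods.

```latex
% y_{ms}: CR percentage of mouse m in session s
% g(m):   group of mouse m (assumed notation)
y_{ms} = \beta_0 + \alpha_{g(m)} + \gamma_s + (\alpha\gamma)_{g(m),s} + u_m + \varepsilon_{ms},
\qquad u_m \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ms} \sim \mathcal{N}(0, \sigma^2)
```

Here \alpha, \gamma and (\alpha\gamma) correspond to the group, session, and group*session fixed effects tested in the ANOVA on LME, and u_m is the per-mouse random intercept.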
FEC amplitude
Further quantification of fraction eyelid closure (FEC) across all trials revealed a significant increase across sessions in response to CS + for all groups (Fig. 3D-F, Table Supplementary 3; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(17,13811) = 227.52, p < .0001; F(17,13277) = 188.30, p < .0001; F(17,12459) = 88.90, p < .0001, ANOVA on LME). The average amplitude of CS- trials also increased during differential training within each group of animals (Fig. 3D-F, Table Supplementary 3; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(17,27569) = 227.59, p < .0001; F(17,13128) = 106.20, p < .0001; F(17,12356) = 51.38, p < .0001, ANOVA on LME). Fraction eyelid closure was always significantly higher in CS + trials compared to CS- trials in all groups (Fig. 3D-F, Table Supplementary 3; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(17,27551) = 3011.13, p < .0001; F(1,26362) = 1292.91, p < .0001; F(1,24822) = 1883.90, p < .0001, ANOVA on LME).
Post-hoc analysis revealed that the average FEC of the CS + compared to the CS- showed a pattern of progressively earlier significant difference the more the CS + and CS- were similar. Indeed, the CS + started to be significantly higher around session 7 for Grp.10CS + 4CS- and session 5 for Grp.10CS + 9CS-, while it already showed significance on the first day of training for Grp.10CS + 9.5CS-. Interestingly, we found that fraction eyelid closure (FEC) grew differently in response to CS + and CS-. Indeed, the CS + on the last day of training (session 18) in all our groups measured between 0.43 and 0.46, but the CS- for Grp.10CS + 4CS- was lower (0.14(± 0.05)) compared to the amplitudes of Grp.10CS + 9CS- (0.24(± 0.08)) and Grp.10CS + 9.5CS- (0.18(± 0.06)).
Similar to CR percentage, we compared CS + amplitude of FEC responses by running a linear mixed-model (LME) using group, session, and group*session as fixed effects and mouse as random effect. This analysis showed that there was no significant effect of group on the CS + FEC amplitude (Table Supplementary 3, F(2,21) = 0.050, p = 0.951, ANOVA on LME), while there was an effect of session (Table Supplementary 3, F(17,39497) = 464.27, p < .0001, ANOVA on LME) and of the session*group interaction (Table Supplementary 3, F(34,34497) = 21.06, p < .0001, ANOVA on LME). Similarly to the CR percentage of the CS+, the FEC amplitude of animals from Grp.10CS + 9.5CS- was less pronounced at the end of training compared to the other two groups, although this difference was not significant.
CR amplitude
The amplitude of fraction eyelid closure computed on all trials is not a measure of the actual amplitude of the CR to tone only trials. For this reason, we also computed average amplitude considering only trials which show a CR, which we called CR amplitude. We calculated CR amplitude in CS + trials and found a significant effect of session for each group (Fig. 3G-I, Table Supplementary 3; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(17,5766) = 73.54,p < .0001; F(17,6479) = 103.97,p < .0001; F(17,6148) = 77.56,p < .0001, ANOVA on LME). This was true also for CS- trials (Fig. 3G-I, Table Supplementary 3; respectively: F(17,9044) = 87.97,p < .0001; F(17,5167) = 45.17,p < .0001; F(17,4553) = 42.69,p < .0001, ANOVA on LME). We also noticed that within each group, the CS + eyelid responses which we could consider CRs were on average about 0.25–0.30 higher compared to CS- on the last day of training (day 18) (Table Supplementary 3).
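The distinction between FEC amplitude (all trials) and CR amplitude (CR trials only) can be made concrete with a small sketch. It assumes that each trial is summarized by a single peak FEC value and that a trial counts as a CR when this value exceeds the 0.05 threshold; the function name, the input format, and the example values are hypothetical, and the actual CR detection criteria are those given in the Methods.

```python
CR_THRESHOLD = 0.05  # CR detection threshold on the FEC scale (0 = open, 1 = closed)


def summarize_trials(peak_fec, threshold=CR_THRESHOLD):
    """Summarize one set of trials given one peak FEC value per trial.

    Assumption for illustration: a trial is a CR if its peak exceeds `threshold`.
    """
    if not peak_fec:
        raise ValueError("at least one trial is required")
    cr_trials = [x for x in peak_fec if x > threshold]
    n_trials = len(peak_fec)
    return {
        # Share of trials that contain a CR.
        "cr_percentage": 100.0 * len(cr_trials) / n_trials,
        # Mean over all trials, including trials without a CR.
        "fec_amplitude": sum(peak_fec) / n_trials,
        # Mean over CR trials only; undefined if there are none.
        "cr_amplitude": sum(cr_trials) / len(cr_trials) if cr_trials else None,
    }


# Hypothetical peak FEC values for eight trials.
example = summarize_trials([0.00, 0.02, 0.40, 0.60, 0.01, 0.50, 0.03, 0.04])
print(example)
```

In this made-up example the many non-CR trials pull the all-trial mean well below the CR-only mean, which is why the two measures are reported separately.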
Our results showed that CR amplitudes in CS + were consistently higher compared to CS- in all groups (Fig. 3G-I, Table Supplementary 3; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(1,9026) = 984.17, p < .0001; F(1,11653) = 1192.1423, p < .0001; F(1,10708) = 1606.12, p < .0001, ANOVA on LME). Similar to FEC amplitude, following post-hoc analysis we found that CR amplitude for Grp.10CS + 4CS- and Grp.10CS + 9CS- started to show significant differences between CS + and CS- at session 6, while Grp.10CS + 9.5CS- already showed a significant difference around session 2 during differential training (Table Supplementary 3).
Next, we computed a linear-mixed effect model (LME) using session, group, and their interaction as fixed effects and mouse as random effect. We did not find a significant effect of the group factor on the CS + amplitude across CR only trials (Fig. 3G-I, Table Supplementary 3, F(1,21) = 0.44, p = 0.645, ANOVA on LME). However, similar to the FEC amplitude there was a significant effect of session (Fig. 3G-I, Table Supplementary 3, F(17,18393) = 230.78, p < .0001, ANOVA on LME) and interaction of group*session (Fig. 3G-I, Table Supplementary 3, F(34,18393) = 12.26, p < .0001, ANOVA on LME).
Behavioral test: generalization test
The day after the last session of training (day 18), we tested the generalization of the CS + for four consecutive days (1 session/day) on the three groups of mice (in Fig. 1B, days 19 to 22). During generalization test sessions, mice were exposed to tone frequencies never paired with an air puff (2, 4, 6, 8, 9, 9.5, 10.5, 11, 12, 14, 16, 18, 20 kHz), as well as to the 10kHz tone only and to the 10kHz CS + paired with an air puff US. All the stimuli presented during the generalization test sessions had the exact same duration of 280 ms and ramp/decay times of 25 ms as for the CS + and the 4, 9, and 9.5 kHz CS- during differential training. For all groups, n = 8. Note that two animals died during training for Grp.10CS + 9.5CS- so that during generalization test sessions n = 6.
CR percentage
We found a significant effect of tone generalization on the average of eyelid CRs for Grp.10CS + 4CS- (Fig. 5A; Table Supplementary 4; F(13,91) = 4.31, p < .0001, ANOVA on LME), but not for Grp.10CS + 9CS- and Grp.10CS + 9.5CS- (Fig. 5A, Table Supplementary 4, respectively: F(13,65) = 1.13, p = 0.346, ANOVA on LME). More specifically, animals from Grp.10CS + 4CS- showed a downward gradient in the direction of lower frequencies than the CS+ (from 72.60(± 12.0)% for the 10kHz to 30.39(± 18.2)% for the 2kHz), which was less evident for higher frequencies (71.22(± 8.8)% for 10.5kHz and 67.66(± 13.8)% for 20kHz) (Fig. 5A, Table Supplementary 4, all values: mean ± 95% CI). On the other hand, animals from Grp.10CS + 9CS- and Grp.10CS + 9.5CS- showed a less steep decreasing gradient than Grp.10CS + 4CS-. Following post-hoc analysis, none of the comparisons between CS + and tone frequencies tested during generalization showed significance within Grp.10CS + 9CS- and Grp.10CS + 9.5CS-. In addition, our results show that CR probability in Grp.10CS + 4CS- dropped off almost 40% from the CS + to the 2kHz, while Grp.10CS + 9CS- and Grp.10CS + 9.5CS- animals’ responses from the CS + trials to the 2kHz tone decreased around 10% in Grp.10CS + 9CS- and almost 15% in Grp.10CS + 9.5CS- (Table Supplementary 4).
Next, we ran a linear-mixed model (LME) using group, tone frequency, and group*tone frequency as fixed effect and mouse as random effect. We found that there was a statistically significant main effect of the tone frequency used on CR percentage (Table Supplementary 4, F(13,247) = 5.22,p < .0001, ANOVA on LME), but no effect of group (Table Supplementary 4, F(2,19) = 0.576,p = 0.571, ANOVA on LME), nor group*tone frequency interaction (Table Supplementary 4, F(26,247) = 1.35,p = 0.123, ANOVA on LME).
FEC amplitude
We found a similar effect of tone frequencies on the eye closure amplitude considering all trials in all groups (Table Supplementary 4; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(13,1841) = 21.36, p < .0001; F(13,1737) = 6.78, p < .0001; F(13,1375) = 3.64, p < .0001, ANOVA on LME), as our data showed a decreasing gradient in the direction of lower frequencies than the CS+ (Figs. 4A-C, 5B, Table Supplementary 4). As for the CR percentage, averaged FEC amplitude peaked at the CS +, with an amplitude of 0.53(± 0.16) for Grp.10CS + 4CS- (Fig. 4A), 0.46(± 0.17) for Grp.10CS + 9CS- (Fig. 4B), and 0.43(± 0.12) for Grp.10CS + 9.5CS- (Fig. 4C). In response to a 2kHz tone, this amplitude decreased to around 0.12(± 0.07) for Grp.10CS + 4CS-, 0.30(± 0.12) for Grp.10CS + 9CS-, and 0.23(± 0.08) for Grp.10CS + 9.5CS-. We also found a similar decrease in response to tone frequencies higher than the CS + in each group of mice; for instance, the 20kHz tone elicited an FEC amplitude of 0.38(± 0.11) in Grp.10CS + 4CS-, of 0.39(± 0.16) in Grp.10CS + 9CS-, and of 0.32(± 0.07) in Grp.10CS + 9.5CS-.
These data indicate that for Grp.10CS + 4CS- there was a steeper downward gradient in the direction of both lower and higher tones than the CS+, compared to Grp.10CS + 9CS- and Grp.10CS + 9.5CS- (Table Supplementary 4, Fig. 5B, all values: mean ± 95% CI). Post-hoc comparisons revealed that FEC amplitudes for tone frequencies at least 10% higher and lower than the CS + were significantly different from the 10kHz used to test stimulus generalization. Interestingly, there was a significant difference in the FEC amplitude between the CS + and the respective CS- in Grp.10CS + 4CS- and Grp.10CS + 9CS- (Table Supplementary 4, respectively p < .0001 and p = 0.003), but not for Grp.10CS + 9.5CS- (Table Supplementary 4, p = 0.83).
Linear-mixed effect models (LMEs) using group, tone frequency, and group*tone frequency as fixed effects and mouse as random effect, showed that there was a significant effect of tone frequency (Table Supplementary 4, F(13,4953) = 26.68,p < .0001, ANOVA on LME) and interaction between group*tone frequency (Table Supplementary 4, F(26,4953) = 3.48,p < .0001, ANOVA on LME). However, there was not a statistically significant effect of the group factor (Table Supplementary 4, F(2,19) = 0.21,p = 0.805, ANOVA on LME).
Comparison of the cumulative distributions of FEC amplitudes for each group revealed significant effects for many of the sound frequencies used to test generalization only in Grp.10CS + 4CS- (Fig. 6A, D; for p values, see Table Supplementary 5; all Kolmogorov-Smirnov tests with correction for multiple comparisons using FDR). Cumulative distributions from Grp.10CS + 9CS- and Grp.10CS + 9.5CS- instead (Fig. 6B, E and C, F) show a decreasing pattern with some significant differences for tones lower than the CS+ (for p values, see Table Supplementary 5) but not for higher tones.
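The logic of comparing cumulative distributions and then correcting for multiple comparisons can be sketched as follows. This is an illustration under assumptions rather than the analysis code: it computes only the two-sample Kolmogorov-Smirnov statistic D (not its p value), it assumes that the FDR correction is the Benjamini-Hochberg procedure, and all function names and numbers are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d_max = 0.0
    for value in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical FEC amplitudes for the CS+ and for one probe tone.
cs_plus = [0.3, 0.4, 0.5, 0.6]
probe_tone = [0.1, 0.2, 0.3, 0.4]
print(ks_statistic(cs_plus, probe_tone))

# Hypothetical raw p values from four CS+ versus probe comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In practice, a statistics library would supply both the KS p value and the correction; the point here is only that each probe frequency is compared with the CS + separately and the resulting p values are then adjusted jointly.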
CR amplitude
We found a significant effect of tone generalization on the amplitude of trials showing a CR (both CS + and CS-) in all groups (Table Supplementary 4, Fig. 5C; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(13,1066) = 9.28,p < .0001; F(13,1228) = 4.22, p < .0001; F(13,839) = 2.29,p = 0.005, ANOVA on LME). More specifically, the average CR amplitude for Grp.10CS + 4CS- clearly peaked at the CS+ (0.66(± 0.11)) and showed a downward gradient in both directions of higher and lower frequency tones (Table Supplementary 4). On the other hand, Grp.10CS + 9CS- and Grp.10CS + 9.5CS- did not show a clear generalization gradient, although the CR only amplitude peaked at the CS + for animals in both these groups (Table Supplementary 4, respectively 0.55(± 0.16) and 0.57(± 0.09), all values: mean ± 95% CI).
Post-hoc analysis revealed that for Grp.10CS + 4CS- and Grp.10CS + 9CS- there was a significant difference between some of the tone-only trials used to test stimulus generalization and the CS+, while there was no significance between CS + and stimulus generalization test tones in Grp.10CS + 9.5CS- (Table Supplementary 4, Fig. 5C).
We also analyzed this data using group, tone frequency, and group*tone frequency as fixed effects and mouse as random effect in a linear-mixed effect model (LME). We found a significant main effect of tone frequency (Table Supplementary 4, F(13,3133) = 12.07,p < .0001, ANOVA on LME) and also of the interaction tone frequency*group (Table Supplementary 4, F(26,3133) = 1.99,p = 0.001, ANOVA on LME), while there was no effect of group on the amplitude of CR only trials (Table Supplementary 4, F(2,19) = 0.128,p = 0.880, ANOVA on LME).
In addition, we looked at the cumulative distribution of CR amplitude and found a significant difference between the CS + and many of the sound frequencies used to test stimulus generalization in Grp.10CS + 4CS- (Fig. 6G, J; for p values, see Table Supplementary 5; all Kolmogorov-Smirnov tests with correction for multiple comparisons using FDR), while none of the comparisons in Grp.10CS + 9CS- and Grp.10CS + 9.5CS- were significant (Fig. 6H, K and I, L).
Previous work in mice showed that both CR probability and the eyelid closure amplitude of CR only trials depended on the degree of similarity between the tone frequency tested and the reinforced CS (CS + in this experiment) (Fiocchi et al., 2022). However, this same phenomenon was not found in rabbits (Khilkevich et al., 2019), which instead show a constant amplitude of eyelid closure CRs irrespective of the tone frequency tested. For this reason, we also looked at CR thresholds higher than the 0.05 that we used to detect CRs across training and generalization tests. More specifically, we plotted the cumulative distribution using higher CR thresholds of 0.10, 0.15 and 0.20 (Fig. 7B-D). All in all, animals from Grp.10CS + 4CS- showed significant differences between tone frequencies even when higher CR thresholds were tested (Fig. 7A-D). Meanwhile, Grp.10CS + 9CS- and Grp.10CS + 9.5CS- did not show any significant comparison with any of the thresholds that were used (Fig. 7A-D).
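How raising the CR threshold changes the set of trials that enter the cumulative distribution can be illustrated with a short sketch. It assumes one peak FEC value per trial and a simple "peak above threshold" rule for CR detection; the function names and the example amplitudes are hypothetical and serve only to show the mechanics.

```python
def ecdf(values):
    """Empirical CDF as (value, cumulative fraction) pairs, in ascending order.

    Tied values appear as separate points with increasing fractions.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def cr_only_ecdf(peak_fec, threshold):
    """ECDF of the trials whose peak FEC exceeds `threshold` (assumed CR rule)."""
    return ecdf([x for x in peak_fec if x > threshold])


# Hypothetical peak FEC values for six trials at one tone frequency.
example_peaks = [0.02, 0.07, 0.12, 0.18, 0.30, 0.55]

for threshold in (0.05, 0.10, 0.15, 0.20):
    points = cr_only_ecdf(example_peaks, threshold)
    print(f"threshold {threshold:.2f}: {len(points)} CR trials")
```

With a stricter threshold, small eyelid movements drop out, so the remaining distribution reflects only larger responses; if amplitude were constant across tone frequencies, the distributions for different tones would be expected to converge as the threshold rises.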
CR peaktime
One major advantage of the new technologies used to measure eyeblink conditioning is that they allow the latency to the onset and to the peak of the conditioned eyelid responses to be measured very precisely. For this reason, we also analyzed measures of timing related to the onset and the peak latency of the eyelid CRs. We found a significant effect of tone frequency on the latency to CR peak for all our groups (Table Supplementary 4; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(13,1066) = 2.12, p = 0.011; F(13,1228) = 2.0, p = 0.0177; F(13,839) = 3.49, p < .0001, ANOVA on LME). On average, it appeared that lower and higher frequencies resulted in longer latencies to CR peak for animals in all groups (Table Supplementary 4, Fig. 5E). However, post-hoc comparisons between the CS + and the tones used to test stimulus generalization did not reveal any significantly different latencies to CR peak (Table Supplementary 4, Fig. 5E).
All in all, our linear-mixed effect model (LME) revealed that there was a main effect of tone frequency on the latency to CR peak (Table Supplementary 4, F(13,3133) = 3.35,p < .0001, ANOVA on LME) and of the interaction between tone frequency*group (Table Supplementary 4, F(26,3133) = 1.96,p = 0.002, ANOVA on LME), while comparison between groups did not reveal a significant difference (Table Supplementary 4, F(2,19) = 0.39,p = 0.681, ANOVA on LME).
CR onset
In general, latency to CR onset was between 150 and 180 ms after the onset of the tone CS in all groups and there was an effect of generalization test tones (Table Supplementary 4, Fig. 5D; respectively for Grp.10CS + 4CS-, Grp.10CS + 9CS- and Grp.10CS + 9.5CS-: F(13,209) = 2.23, p = 0.009; F(13,261) = 1.99, p = 0.022; F(13,243) = 2.33, p = 0.006, ANOVA on LME). Importantly, CR onset was computed across CR only trials that did not show a startle response (see: Methods, Eyeblink conditioning data analysis).
Next, from our linear-mixed effect model (LME) we analyzed the effect of tone frequency, group, and the interaction between tone frequency*group as fixed effects and mouse as random effect. Results from our LME showed that there was no significant effect of group (Table Supplementary 4, F(2,19) = 1.65, p = 0.217, ANOVA on LME), but there was a significant effect for the interaction between tone frequency*group (Table Supplementary 4, F(26,713) = 2.15, p = 0.0008, ANOVA on LME) and for the tone frequency (Table Supplementary 4, F(13,713) = 2.28, p = 0.005, ANOVA on LME).