Animal ethics, welfare and surgical implantation
We used two adult male rhesus monkeys (Macaca mulatta; Monkey V: 11 kg and Monkey U: 17.5 kg). This research has been ethically reviewed, approved, regulated and supervised by the following UK and University of Cambridge (UCam) institutions and individuals: UK Home Office, implementing the Animals (Scientific Procedures) Act 1986, Regulations 2012, and represented by the local UK Home Office Inspector, UK Animals in Science Committee, UK National Centre for Replacement, Refinement and Reduction of Animal Experiments (NC3Rs), UCam Animal Welfare and Ethical Review Body (AWERB), UCam Biomedical Service (UBS) Certificate Holder, UCam Welfare Officer, UCam Governance and Strategy Committee, UCam Named Veterinary Surgeon (NVS), and UCam Named Animal Care and Welfare Officer (NACWO).
The two monkeys were housed in adjoining cages and placed on a restricted water regimen calibrated by body weight. Behavioral data were acquired from both animals for a previous publication (Al-Mohammad & Schultz 2022). Each weekday, we transported the animals to the experimental laboratory in an individually adjusted primate chair (Crist Instruments). Animals sat in this chair for the duration of the daily tests, which never exceeded 5 hours. We provided animals with fruit and vegetable enrichment on Friday evening and ad-libitum access to water throughout Friday evening and Saturday.
We implanted Monkey U with a titanium headpost (Crist Instruments) for head fixation and, after the headpost had integrated with the bone, implanted a recording chamber. For Monkey V, we implanted the head-fixation hardware together with the recording chamber. Chambers were centered on the skull laterally using a stereotaxic head holder and a Kopf stereotaxic manipulator. After recovery from surgery, we drilled craniotomies above the recording sites, and the chambers were monitored and cleaned daily. Once experiments were completed, recording sites were marked with electrolytic lesions (15–20 µA, 20–60 s). We then sacrificed the animals with an overdose of sodium pentobarbital (90 mg/kg, IV) and perfused them with 4% formalin in 0.1 M phosphate-buffered saline. We confirmed recording positions histologically from 40 µm slices stained with cresyl violet.
Experimental setup
During experiments, animals were head-fixed while seated in a primate chair. All experiments were performed in a dimly lit experimental isolation booth (Crist Instruments) to minimize disruption. The monkeys were positioned so that their eyes were ~70 cm from a computer monitor. The joystick-lever was attached to the chair and made accessible via a ~15 cm² opening in the front of the chair. Water and juice were delivered through separate spouts positioned ~5 mm from the animal’s mouth. Fluid delivery was controlled by gravity-fed solenoid valves connected to 1 L beakers using silicone tubing. Juice and water delivery valves were calibrated to deliver precise volumes (SD < 0.01 ml). The monkeys were trained to use a custom-made touch-sensitive joystick (Biotronix Workshop, University of Cambridge) to interact with the task displayed on a computer monitor as previously described (Al-Mohammad & Schultz 2022). The joystick was only movable in the x and y directions.
BDM elicitation of subjective value
The BDM is a second-price, sealed-bid, auction-like mechanism that has been shown to elicit truthful estimates of internal subjective value on a single-trial basis. Typically, a BDM bidder garners the highest payoff by bidding exactly the value they place on the good. Economists refer to this as incentive compatibility: the optimal strategy is to bid one’s true subjective value. Bidding too high (overbidding) increases the risk of overpaying, and bidding too low (underbidding) increases the risk of not obtaining the desired item (Fig. 1a, b) (Lusk & Shogren 2007). The following features are key to the BDM’s incentive compatibility: (1) the second-price nature is essential for revealing the true subjective value because the opponent’s bid is unknown: it discourages overbidding because the unknown bid of the opponent may exceed the internal value and thus result in overpaying, and it discourages underbidding because the opponent might outbid the bidder; and (2) the bids are hidden until all bids have been submitted (sealed-bid auction). Thus, the BDM is akin to a private-value auction, as subjects are not able to infer the value other bidders place on the good. As opponent bids are drawn randomly from a uniform distribution, a common value cannot be surmised, even over consecutive trials. Many details of the bidding performance of the animals used in this study have been presented before (Al-Mohammad & Schultz 2022), and only the behavioral results relevant for the current neuronal analyses are described here.
The described characteristics explain our rationale for using the BDM. Any study of internally determined reward value requires that the measured events, namely the bids given by the animal, reflect the true internal subjective value at each moment. The incentive-compatible nature of the BDM provides exactly that assurance. Further, the BDM does not require a biological opponent, which makes the experimentation less complicated and simplifies the interpretation of neuronal data by avoiding confounds from an opponent’s behavior.
Recent experiments demonstrated that rhesus monkeys can show meaningful performance in behavioral tasks implementing a BDM (Al-Mohammad & Schultz 2022). On every trial, the monkey bid against a computer opponent using a joystick; the animal paid from a water endowment of the same amount that was allocated on every trial. If the animal’s bid equaled or exceeded the computer bid, the animal won the auction and paid the price defined by the computer bid (second-price nature of BDM); thus, the animal received the juice it had bid for, plus the rest of the water endowment after subtraction of the computer bid. If the animal’s bid was below the computer bid, it lost the auction and received the full water endowment (Fig. 1b). To increase the number of trials per reward magnitude, monkeys bid for only 3 reward magnitudes. The range of reward magnitudes was calibrated to each animal’s preferences so that the full range of bids was well represented. In addition, monkeys bid for fixed goods rather than for lotteries, which avoided confounds from the animal’s risk attitude. Our task also circumvented the endowment effect (a tendency to over-value previously acquired goods), as monkeys do not ‘pay’ an amount already acquired but rather indicate the amount of water they are willing to forgo from water paid out at the end of the trial. Specifically, monkeys receive a water payout on every non-error trial (see below) regardless of whether they win or lose the auction; their bids reflect how much water they are willing to forgo on each trial to get the juice reward.
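The payoff rule described above, and the reason truthful bidding is optimal under it, can be illustrated with a minimal sketch. It assumes a 1.2 ml water endowment, computer bids spread evenly over the bid space, and a hypothetical parameter `juice_value_ml` that expresses the animal's subjective value of the juice in ml of water. The function names are illustrative and are not taken from the task software.

```python
def bdm_outcome(bid_ml, computer_bid_ml, endowment_ml=1.2):
    """Return (won, water_payout_ml) for a single BDM trial."""
    if bid_ml >= computer_bid_ml:
        # Win: pay the computer bid (second price) out of the water endowment.
        return True, endowment_ml - computer_bid_ml
    # Loss: no juice, full water endowment is paid out.
    return False, endowment_ml


def expected_payoff(bid_ml, juice_value_ml, endowment_ml=1.2, n_grid=1201):
    """Mean payoff in water equivalents over an even grid of computer bids.

    juice_value_ml is a hypothetical subjective value of the juice,
    expressed in ml of water (an assumption made for illustration).
    """
    total = 0.0
    for i in range(n_grid):
        computer_bid = endowment_ml * i / (n_grid - 1)
        won, water = bdm_outcome(bid_ml, computer_bid, endowment_ml)
        total += water + (juice_value_ml if won else 0.0)
    return total / n_grid


# Among candidate bids, the one equal to the subjective value pays best.
candidate_bids = [round(0.1 * i, 1) for i in range(13)]  # 0.0 ... 1.2 ml
best_bid = max(candidate_bids, key=lambda b: expected_payoff(b, juice_value_ml=0.6))
```

In this sketch, bidding above the subjective value adds winning trials on which the price exceeds the value of the juice, and bidding below it forfeits winning trials on which the price would have been lower than that value.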
BDM Task
BDM trials were initiated with a yellow cross at the center of the screen (Fig. 1a). After 0.5 s, a fractal representing one of three different juice volumes appeared. After 1.0 s, a vertically oriented rectangle appeared that denoted the bid space, which was defined by the smallest and largest reward amount the animal could bid for (0 and 1.2 ml; hashed black lines on white background). A bid cursor (magenta) overlaid the bid space rectangle. Forward and backward movement of the joystick generated up and down movements of the cursor within the bid space. Bidding had to be initiated within 0.5 s and stabilized within 5 s; otherwise the trial was terminated, the screen briefly flashed red to indicate a failed trial, and a wait penalty equal to the remaining trial time plus 2 s was imposed. Letting go of the joystick at any point during the trial or moving it during any period except the bidding epoch also resulted in trial termination. The total bid space represented 1.2 ml of water; monkeys’ bids indicated how much water they were willing to sacrifice for a given fractal. Once the monkey’s bid was stable for more than 0.5 s, the bid of the computer opponent appeared and the direction of the hashed lines below the computer bid reversed, indicating the amount of water to be ‘paid’ for obtaining the juice if the animal won the BDM (second price). If the monkey won the BDM, the juice was paid out and the fractal disappeared from the screen after 1 s, followed by a water payout that lasted up to 1 s (1.2 ml minus the computer bid measured in ml, which reflected the second-price character of BDM) and removal of the bid space from the screen. If the monkey lost the BDM, the fractal disappeared and the water was paid out in full (1.2 ml).
Behavioral analysis
BDM bids reflect subjective value and change from trial to trial. This can be demonstrated formally for expected-utility maximizers (see Lusk & Shogren 2007) and has been supported empirically in rhesus monkeys in previous work from our laboratory (Al-Mohammad & Schultz 2022). Here we sought to further test whether changes in bidding resulted from changes in value or from other value-irrelevant task features. To test which elements of the task contributed most to bid variance, we first used a cross-validated lasso regression (lasso function, Matlab) to identify variables that contributed to bid variability. Lambda (the tuning parameter) was selected as the value corresponding to a mean squared error one standard error above the minimum, based on 2000-fold cross-validation (figure S2d). We used the following 29 regressors in the lasso model: 1) reward value, 2) starting bid, 3) previous total liquid, 4) day of week, 5) session number, 6) previous trial failure, 7) previous trial result, 8) previous result from trial with same reward magnitude, 9) trial number, 10) competing bid t-1, 11) competing bid t-2, 12) competing bid t-3, 13) competing bid t-5, 14) competing bid t-7, 15) competing bid same reward magnitude t-1, 16) competing bid same reward magnitude t-2, 17) competing bid same reward magnitude t-3, 18) competing bid same reward magnitude t-4, 19) competing bid same reward magnitude t-5, 20) competing bid same reward magnitude t-6, 21) competing bid same reward magnitude t-7, 22) competing bid same reward magnitude t-8, 23) competing bid same reward magnitude t-9, 24) competing bid same reward magnitude t-10, 25) average of competing bid same reward magnitude t-2 to t-1, 26) average of competing bid same reward magnitude t-3 to t-1, 27) average of competing bid same reward magnitude t-4 to t-1, 28) average of competing bid same reward magnitude t-5 to t-1, 29) average of competing bid same reward magnitude t-6 to t-1.
Variables remaining in the resulting model were then used in a mixed-effects model to identify which had the largest impact on bidding behavior (Fig. 1e).
Because reward magnitude is highly correlated with the subjective value and therefore with the bid, we sought to determine what other factors, besides reward, most prominently predicted changes in bids. For this we used a reduced mixed-effects model with reward magnitude included as a random effect (figure S2e).
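The one-standard-error rule for choosing the lasso tuning parameter can be sketched as follows. This is an illustration of the general rule (take the largest lambda whose cross-validated error is within one standard error of the minimum), not the analysis code itself. The function name and the example numbers are hypothetical.

```python
def select_lambda_one_se(lambdas, cv_mse_mean, cv_mse_se):
    """Largest lambda whose mean CV error is within one SE of the minimum.

    lambdas, cv_mse_mean and cv_mse_se are parallel sequences: one candidate
    penalty, its mean cross-validated MSE, and the standard error of that MSE.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mse_mean[i])
    threshold = cv_mse_mean[best] + cv_mse_se[best]
    eligible = [lam for lam, mse in zip(lambdas, cv_mse_mean) if mse <= threshold]
    return max(eligible)


# Hypothetical cross-validation summary for four candidate penalties.
chosen = select_lambda_one_se(
    lambdas=[0.001, 0.01, 0.1, 1.0],
    cv_mse_mean=[0.50, 0.48, 0.52, 0.90],
    cv_mse_se=[0.05, 0.05, 0.05, 0.05],
)
```

Choosing the larger penalty favors a sparser model, so that only regressors with a robust contribution to bid variability survive into the subsequent mixed-effects model.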
If changes in individual bids reflect changes in subjective value across trials, then these changes should be apparent in bids across all three reward magnitudes. We tested this by measuring bid coherence within and between sessions. For within-session coherence, bids were interpolated for each reward magnitude to create three equally populated vectors of bids that retain trial-by-trial temporal fidelity. Each vector was then correlated with the others in three separate tests comparing low reward magnitudes to mid magnitudes (L:M), mid magnitudes to high magnitudes (M:H), and low magnitudes to high magnitudes (L:H) (figure S2b). The resulting rho values provide a relative estimate of coherence, with values above zero indicating positive coherence. Note that because values were interpolated for trials where a given reward magnitude was not represented, this analysis can only provide a lower bound for the estimated coherence.
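A minimal sketch of the within-session coherence measure follows. It rests on two assumptions that the text does not specify: linear interpolation between observed trials (with the nearest observed bid used at the session edges) and a Pearson coefficient standing in for the reported rho. All names and the example session are hypothetical.

```python
from statistics import mean


def interpolate_bids(trial_idx, bids, n_trials):
    """Interpolate one magnitude's bids onto every trial index 0..n_trials-1.

    trial_idx must be sorted; trials outside the observed range take the
    nearest observed bid (an assumption of this sketch).
    """
    out = []
    j = 0
    for t in range(n_trials):
        while j + 1 < len(trial_idx) and trial_idx[j + 1] <= t:
            j += 1
        if t <= trial_idx[0]:
            out.append(bids[0])
        elif j + 1 == len(trial_idx):
            out.append(bids[-1])
        else:
            t0, t1 = trial_idx[j], trial_idx[j + 1]
            w = (t - t0) / (t1 - t0)
            out.append(bids[j] + w * (bids[j + 1] - bids[j]))
    return out


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


# Hypothetical 12-trial session in which bids for both magnitudes drift upward.
low = interpolate_bids([0, 3, 6, 9], [0.20, 0.25, 0.30, 0.35], n_trials=12)
mid = interpolate_bids([1, 4, 7, 10], [0.50, 0.55, 0.60, 0.65], n_trials=12)
coherence_lm = pearson(low, mid)  # L:M comparison
```

The same two steps would be repeated for the M:H and L:H comparisons.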
Electrophysiological recording and analysis
Electrophysiological signals were recorded using electrodes made in house or ordered from Alpha Omega (125 µm diameter, 60-degree bevel). Electrodes were loaded into a sterile 23-gauge stainless steel cannula, which was used to pierce the meninges and to stabilize the electrode’s path through the brain. Electrodes were lowered into the midbrain using an electrode micromanipulator from NAN Instruments (model: CMS) or Narishige (model: MO-97). Recordings were amplified and band-pass filtered from 100 to 5000 Hz (custom hardware and Bak Electronics). Recordings were digitized using a National Instruments data acquisition card and visualized with custom Matlab (MathWorks) software. Neuronal impulses were sorted off-line using Spike2 version 7.8 (Cambridge Electronic Design).
Ventral midbrain localization
Coordinates for the recording sites in the ventral midbrain were determined using sagittal radiograph images of the head in a stereotaxic frame and electrophysiological signatures of surrounding cell groups. Specifically, animals were placed in a stereotaxic frame and a cannula was inserted at the center-most position of the x-y plane of the recording chamber. Bone features (interaural origin and the clinoid process of the sphenoid bone) were used to determine the approximate anteroposterior and dorsoventral positions of familiar nuclei. Using these positions as anchors, stereotypical electrophysiological responses from the red nucleus and the ventral posteromedial nucleus of the thalamus guided the localization of the recording sites of dopamine neurons in the ventral midbrain.
Statistical analysis of dopamine neuron responses
Dopamine neurons were identified using canonical criteria: a wide impulse waveform (> 1.8 ms), low baseline impulse activity (< 10 Hz), and consistent responses to unpredicted reward delivery. All neuronal impulse data were binned in 1 ms bins for analyses. Population analyses were performed with all dopamine neurons; analyses of bid-encoding neurons were performed using only dopamine neurons that exhibited a positive correlation with the monkeys’ bids (see below). All statistical analyses were performed on raw activity (for single neurons) or z-normalized activity from time-windows defined by gray boxes in figures (for all bid-encoding neurons and population analyses). Average traces shown in figures were smoothed with an 80 ms or 100 ms moving average. For group-level analyses, data were z-scored to account for variance among neuronal activity. Bids were discretized into bins for individual neuron analyses and for group-level analyses to obtain more accurate estimates of average responses for a given bid-range (e.g. by splitting the overall bid range into tenths and averaging the neural responses within bins; for specifics, see Results).
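The bid discretization and z-normalization steps can be sketched as below. The sketch assumes equal-width bins spanning a 0 to 1.2 ml bid space and a population standard deviation for the z-score; the actual bin edges and normalization details are given in Results and may differ. Function names and example values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-normalize a sequence using its own mean and population SD."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def mean_response_by_bid_bin(bids, responses, n_bins=10, bid_max=1.2):
    """Average response within equal-width bid bins; empty bins give None."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for bid, resp in zip(bids, responses):
        # A bid equal to bid_max falls into the last bin.
        k = min(int(bid / bid_max * n_bins), n_bins - 1)
        sums[k] += resp
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical trials: bids in ml and responses in impulses/s.
binned = mean_response_by_bid_bin([0.05, 0.07, 0.65, 1.2], [2.0, 4.0, 6.0, 8.0])
normalized = zscore([1.0, 2.0, 3.0])
```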
Correlations between bids and neuronal responses were assessed using the following linear regression:
To test whether subjective value, as expressed by the bids, was driving changes in activity independent of reward magnitude, responses were correlated with bids when reward magnitude was held constant (Fig. 3a-h). This was further tested by comparing neuronal responses for matched bids between reward magnitude levels for individual sessions (Fig. 4). For this analysis, for each experimental session, responses corresponding to similar bids (within 5%) made for two different reward magnitudes (medium vs. low, high vs. medium, high vs. low) were pooled across all neurons and compared using a paired Wilcoxon signed-rank test. A significant difference in neuronal activity in this test would indicate responses driven by reward magnitude independent of bid. No difference would indicate that neuronal responses were driven by bids independent of reward magnitude.
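The bid-matching step that precedes the paired test can be sketched as follows. The pairing procedure is not detailed in the text, so this sketch assumes a greedy nearest-bid pairing in which each trial is used at most once, and it interprets "within 5%" as 5% of a 1.2 ml bid space. Both choices, and all names and values, are assumptions.

```python
def match_bids(trials_a, trials_b, tolerance=0.05 * 1.2):
    """Pair (bid, response) trials from two reward magnitudes by bid.

    Each trial in trials_a is paired with the closest unused trial in
    trials_b whose bid differs by at most `tolerance` (in ml).
    Returns a list of (response_a, response_b) pairs.
    """
    remaining = sorted(trials_b)
    pairs = []
    for bid_a, resp_a in sorted(trials_a):
        best = None
        for i, (bid_b, _) in enumerate(remaining):
            diff = abs(bid_a - bid_b)
            if diff <= tolerance and (
                best is None or diff < abs(bid_a - remaining[best][0])
            ):
                best = i
        if best is not None:
            _, resp_b = remaining.pop(best)
            pairs.append((resp_a, resp_b))
    return pairs


# Hypothetical (bid in ml, response) trials for low and medium magnitudes.
low_trials = [(0.30, 4.0), (0.50, 6.0), (0.90, 9.0)]
mid_trials = [(0.32, 4.5), (0.52, 5.5), (0.70, 7.0)]
matched = match_bids(low_trials, mid_trials)
```

The paired responses returned here would then be submitted to a signed-rank test, for example with `scipy.stats.wilcoxon` in Python.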
Neural decoding with Support Vector Regression (SVR)
We used linear SVR to predict the monkeys’ bids based on the responses of dopamine neurons to the fractals. The continuous bids were discretized into 10 non-overlapping bid ranges from [0-0.1] to [0.91-1.0]. The model was subsequently trained on neuronal responses from each bid range. Neurons with fewer than 10 trials in any bid range were excluded. For neurons with > 10 trials in a bid range, 10 trials were selected at random, providing 100 randomly selected trials (figure S7). The SVR model was adapted from methods used by Glaser et al. (2020) and implemented with custom-written software in Matlab 2021b using the ‘fitrsvm’ and ‘predict’ functions for model training and testing.
We used three separate SVR models to assess how well dopamine neurons could predict animals’ bids. Neurons were added to these models randomly or in order of explained variance for monkeys’ bids. Neurons were added to model 1 from highest to lowest correlation, to model 2 from lowest to highest correlation, and to model 3 randomly. Models 1 and 2 thus provide an upper and a lower limit of decoding accuracy, respectively, and model 3 allows for comparison with similar analyses in previous works. Model performance was assessed with the coefficient of determination R² (explained variance). Each model was tested against shuffled data (bids and responses shuffled 1,000 times). The difference between the R² values obtained for real data and those obtained for shuffled data was assessed with a Wilcoxon rank-sum test (p < 0.01).
Each SVR model was trained and tested using five-fold cross-validation (80% / 20%): eight trials from every bid category (8 trials × 10 categories) were used for training, and the remaining two trials from every bid category were used for testing (2 trials × 10 categories; figure S8). This procedure was repeated five times, thus providing five R² estimates for each set of 100 randomly selected trials. The reported explained variance R² for model-predicted versus actual bid data was calculated by averaging the R² values from 150 iterations of the 100-trial random selection procedure described above.
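The stratified five-fold split and the R² score can be sketched in Python as below; the SVR fit itself is omitted. The fold assignment scheme, the random seed, and the use of R² = 1 - SS_res / SS_tot are assumptions made for illustration, and the function names do not correspond to the Matlab implementation.

```python
import random


def stratified_folds(n_categories=10, trials_per_category=10, n_folds=5, seed=0):
    """Yield (train, test) lists of (category, trial) index pairs.

    Every fold holds out trials_per_category // n_folds trials from each
    bid category, so each trial is tested exactly once across folds.
    """
    rng = random.Random(seed)
    per_fold = trials_per_category // n_folds
    order = []
    for _ in range(n_categories):
        idx = list(range(trials_per_category))
        rng.shuffle(idx)
        order.append(idx)
    for f in range(n_folds):
        held = range(f * per_fold, (f + 1) * per_fold)
        test = [(c, order[c][k]) for c in range(n_categories) for k in held]
        test_set = set(test)
        train = [
            (c, t)
            for c in range(n_categories)
            for t in range(trials_per_category)
            if (c, t) not in test_set
        ]
        yield train, test


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


folds = list(stratified_folds())
```

With the default arguments, each of the five folds contains 80 training and 20 test trials, matching the 8 × 10 and 2 × 10 split described above. Averaging `r_squared` over the five folds, and then over repeated random 100-trial selections, would give one summary value per model.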