Is human irrationality due to irregularities in dopaminergic neuron firing? Notes from Stauffer, Lak, and Schultz 2014 – Dopamine Reward Prediction Error Responses Reflect Marginal Utility
People and animals do not always make ‘rational’ choices that maximize the amount of reward they receive. For example, deviations from ‘rational’ economic decision making include risk aversion and the decreasing marginal utility of a given reward. But what determines this utility? And how do we know which choice will give us the most utility? A likely candidate is the firing of dopaminergic neurons, which have previously been shown to code reward prediction error. A new paper from the Schultz lab shows that these neurons do not code reward prediction error directly, but rather utility prediction error. Since optogenetic studies have shown that DA neuron firing causes associative learning through reward prediction error, it is likely that these neurons are responsible for our learning the utility of different rewards, and the expected utility of different choices. So is the economic ‘irrationality’ we display in our choices due to erroneous nonlinear responses of our dopaminergic neurons to rewards? Or is our ‘irrationality’ still driven by dopaminergic neurons, but adaptive, and the product of evolution?
Schultz’s lab measured two rhesus monkeys’ economic rationality during tasks where their choices affected how much juice they got, and used mathematics from economic theory to construct a utility function showing how each monkey valued various rewards. They then tested these utility functions using new tasks, and additionally recorded from dopaminergic neurons, showing that their responses reflect marginal utility prediction errors. Finally, they showed that data from dopaminergic neurons could be fed into a temporal difference machine learning algorithm, teaching the algorithm to prefer the higher utility option over the lower utility option.
If you’re curious to read more details about the study, check out the full text, and follow along with my notes about each figure.
Hypothesis: If DA neurons are coding utility, their firing should reflect the nonlinearities of ‘irrational’ utility functions. Additionally, the reverse may be true: perhaps deviations from ‘rational’ behavior are due to the nonlinear responses of DA neurons.
Methods: These experiments were performed on two male rhesus monkeys, who performed >10,000 trials for each gamble. During behavioral tests, monkeys made choices between safe and risky rewards by fixating on different stimuli, and based on their choices they were given different amounts of juice. A ‘non-choice task’ was used for the neural recordings. Experimenters measured eye movements with infrared eye tracking, and measured lick duration after rewards were given.
They recorded 120 DA neurons during reward prediction tasks and while animals were receiving unpredicted rewards.
Figure 1 – Generating utility functions. (A) Schematic of the choice task. (B) Lick duration correlates with the expected value of the reward received. (C) For certain stimuli (and therefore values of safe vs risky rewards), monkeys showed a clear preference for the safe choice; for other stimuli they preferred the risky choice. These results were confirmed by logistic regression. (D) For low values of reward, animals are risk taking; for high values of reward, they are risk averse. So, for low levels of reward they will choose a risky option with a lower expected value than the certain option, and for high levels of reward they will choose a certain option whose value is lower than the expected value of the risky option. (E&F) Utility functions generated for the two monkeys, based on their choices.
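The link between risk attitude and the shape of the utility function can be sketched with a certainty equivalent calculation. This is my own toy illustration, not the paper’s fitted functions: the power utility form and its exponents are assumptions chosen just to show that a convex utility puts the certainty equivalent above the gamble’s expected value (risk taking) while a concave one puts it below (risk averse).

```python
def certainty_equivalent(u, u_inv, low, high, p=0.5):
    """CE of a binary gamble: the safe amount whose utility equals the
    gamble's expected utility, CE = u^-1(p*u(high) + (1-p)*u(low))."""
    eu = p * u(high) + (1 - p) * u(low)
    return u_inv(eu)

# Hypothetical power utility u(x) = x**rho over mL of juice:
# rho > 1 is convex (risk taking), rho < 1 is concave (risk averse).
def make_power(rho):
    return (lambda x: x ** rho), (lambda y: y ** (1 / rho))

u_seek, inv_seek = make_power(2.0)    # risk-taking agent
u_avert, inv_avert = make_power(0.5)  # risk-averse agent

gamble = (0.1, 0.4)                   # 50/50 gamble, expected value 0.25 mL
ev = sum(gamble) / 2
ce_seek = certainty_equivalent(u_seek, inv_seek, *gamble)
ce_avert = certainty_equivalent(u_avert, inv_avert, *gamble)
# Convex utility: CE above EV (prefers the gamble to a safe 0.25 mL);
# concave utility: CE below EV (prefers the safe 0.25 mL).
```

Measuring certainty equivalents across many such gambles is, roughly, how choice data constrain the utility curves in panels E&F.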
Figure 2 – Testing utility functions
Using the utility functions generated in Figure 1, the experimenters had the animals perform 12 new gambles. (A&B) Show predictions based on the utility functions from Fig 1 E&F on the x axis, and experimental results from the 12 new gambles on the y axis. (C&D) To show that this relationship truly reflects the utility functions, and not just the similarity of the sigmoid utility functions to expected value, they subtracted the expected value from both the predictions and the experimental results, and show that their model is still predictive. (It’s annoying that they don’t use the same scale on the x and y axes of C and D, which would let you compare directly.)
Figure 3 DA prediction error responses reflect the marginal utility of reward.
(A) Top, three stimuli that animals were shown in a ‘non-choice’ task and the ‘risky’ rewards they represent. Note that the difference between the two outcomes of each ‘risky’ gamble is .3mL. Middle, a single-unit response to the stimuli. Bottom, population average response to the cues for monkey A.
(B) Top, utility function, with points showing the larger reward in each of the three conditions. Bottom, marginal utility function, with points showing the larger reward from each gamble.
(C) Top, a single-unit response showing the ‘positive prediction error’ that occurs when the animals receive the larger reward. Bottom, population response from monkey A when the animal receives the larger reward. Note that the larger the marginal utility, the larger the response: because there is a bigger difference in utility between .8mL and .5mL than in the other conditions, receiving .8mL causes a larger prediction error, as shown by the larger DA response.
(D) The larger response for the .8mL reward is statistically significant (p<.01).
(E) DA firing decreases when the animals receive the smaller reward, which the authors interpret as a negative prediction error. However, these responses are not significantly modulated across conditions (and therefore by marginal utility).
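The logic of Figure 3 can be made concrete with a few lines of arithmetic. This is a sketch under assumptions: the sigmoid below (its midpoint and slope included) is a hypothetical stand-in for the fitted utility functions of Fig 1 E&F, not the paper’s actual parameters. The point it illustrates is that for three 50/50 gambles whose outcomes all differ by .3mL, the utility prediction error on receiving the larger reward is biggest for the gamble straddling the steep part of the curve.

```python
import math

def u(x, x0=0.6, k=12.0):
    """Hypothetical sigmoid utility over juice volume (mL)."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

def positive_pe(low, high, p=0.5):
    """Utility prediction error when the larger outcome of a 50/50
    gamble is delivered: u(high) minus the gamble's expected utility."""
    expected_u = p * u(low) + (1 - p) * u(high)
    return u(high) - expected_u

# Three gambles, each spanning .3mL, as in Fig 3A.
gambles = [(0.1, 0.4), (0.5, 0.8), (0.9, 1.2)]
pes = {g: positive_pe(*g) for g in gambles}
# The 0.5/0.8 mL gamble sits on the steep (high marginal utility) part
# of u, so delivering 0.8 mL produces the largest prediction error.
```

Equal .3mL steps in juice thus produce unequal prediction errors, which is exactly the marginal-utility signature the DA responses show.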
Figure 4. Responses to unpredicted reward also reflect marginal utility
Animals were given a reward (unconditioned stimulus) that was not paired with a cue (conditioned stimulus). (A) Following an initial peak of firing that was similar for all amounts of juice, there was a period of DA firing (pink bar) that varied depending on the reward given. (B&C) The firing of DA neurons was modulated in a sigmoid fashion similar to the utility functions generated in Figure 1.
Figure 5 – They tested the idea that the animal learns using prediction errors coming from DA neurons. They created a temporal difference (TD) reinforcement learning model using the activity of DA neurons, to see if it could reproduce the expected-utility-based behavior they observed in the animals. They trained two models, one receiving .5 or .8mL rewards, the other receiving .1 or 1.2mL rewards (with p=.5 for each reward in both cases). Each learning simulation was run for 1,000 trials, and each simulation was repeated 2,000 times to account for the pseudorandom outcome schedule.
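The training scheme can be sketched as a one-step delta rule, which is a simplification of the paper’s TD model: here I plug outcomes through an assumed convex utility function, whereas the paper drives learning with the measured DA responses themselves. The learning rate, trial counts, and utility form below are all my own illustrative choices.

```python
import random

def u(x):
    # Hypothetical convex utility (risk seeking over this range),
    # standing in for the DA-derived utility signal used in the paper.
    return x ** 2

def learn_cue_value(outcomes, alpha=0.1, n_trials=1000, n_avg=200, seed=0):
    """One-step TD/delta rule: the cue's value V is nudged by the
    utility prediction error u(r) - V on each trial. Returns the mean
    of V over the last n_avg trials (the 'stable prediction')."""
    rng = random.Random(seed)
    V, tail = 0.0, []
    for t in range(n_trials):
        r = rng.choice(outcomes)      # 50/50 gamble resolved
        V += alpha * (u(r) - V)       # delta rule on utility, not raw volume
        if t >= n_trials - n_avg:
            tail.append(V)
    return sum(tail) / len(tail)

v_narrow = learn_cue_value((0.5, 0.8))  # EV .65mL, low variance
v_wide = learn_cue_value((0.1, 1.2))    # EV .65mL, high variance
# With convex u, the riskier gamble acquires the higher learned cue
# value even though the expected juice volume is identical.
```

In this toy version, the gap between the two learned values comes entirely from the curvature of u, mirroring the paper’s claim that equal expected value can carry unequal expected utility.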
(A) The two gambling tests they trained their model on. The .1 and 1.2mL condition has a higher expected utility (as predicted by animal behavior and DA response) but equal expected value (absolute value of juice received).
(B) With their model they show that a stable utility prediction emerges at the time of the cue, during the last 200 trials of each simulation.
(C) Using the 2000 simulations from each task, they generate a histogram of the ‘stable prediction’ they showed in B. The TD models show a higher stable prediction of expected utility for the .1 and 1.2mL condition as predicted.
(D) Animals showed a higher certainty equivalent for the risky .1 and 1.2mL condition, than they did for the .5 and .8mL condition, indicating they had a higher expected utility for the risky condition.
The authors mention this was consistent with second-order stochastic dominance: “Roughly speaking, for two gambles A and B, gamble A has second-order stochastic dominance over gamble B if the former is more predictable (i.e. involves less risk) and has at least as high a mean.” By that definition the safer .5/.8mL gamble dominates the riskier one, so a risk-averse chooser should prefer it; because the utility functions showed both animals were risk preferring over this reward range, their preference for the riskier gamble is instead consistent with their measured utility.
(E and F) Animals showed higher dopamine responses to the cue associated with the higher utility gamble.
Figure 6 – control to show that the higher DA firing really reflects higher expected utility and not just a higher response reflecting the ‘better’ possible outcome.
(A) Two gambles that have the same ‘better’ outcome, 1.2mL of juice, but different ‘worse’ outcomes, .9 or .1mL of juice. Animals show a preference for the gamble with the higher ‘worse’ outcome.
(B) Dopamine responses to the cues show a greater response for the higher expected utility gamble. This indicates that the greater response in Fig 5 (E&F) was not just due to the higher value of the best possible outcome.
They frame this figure in terms of first-order stochastic dominance, defined as a situation where “Gamble A has first-order stochastic dominance over gamble B if for any good outcome x, A gives at least as high a probability of receiving at least x as does B, and for some x, A gives a higher probability of receiving at least x.” So in this case, the preferred gamble gives an equal probability of receiving at least 1.2mL, and a higher probability of receiving at least .9mL.
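The dominance condition quoted above is easy to check directly. This sketch uses the two 50/50 gambles from the figure description; the helper name is my own, and checking only the outcome values as thresholds suffices here because the survival function P(outcome ≥ x) of a discrete gamble only changes at its outcomes.

```python
def p_at_least(outcomes, x, p=0.5):
    """P(outcome >= x) for a two-outcome gamble with probability p each."""
    return sum(p for o in outcomes if o >= x)

a = (0.9, 1.2)   # gamble with the better 'worse' outcome (mL)
b = (0.1, 1.2)
thresholds = [0.1, 0.9, 1.2]

# First-order stochastic dominance of A over B:
# weakly higher P(>= x) everywhere, strictly higher somewhere.
weak = all(p_at_least(a, x) >= p_at_least(b, x) for x in thresholds)
strict = any(p_at_least(a, x) > p_at_least(b, x) for x in thresholds)
```

Here both conditions hold (at x = .9mL, gamble A pays at least that much with probability 1 versus .5), so any reward-maximizing chooser should prefer A regardless of utility curvature, which is what makes it a clean control.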