Decision making under uncertainty: exploration and exploitation
Junior Researcher, Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
PhD, Senior Research Fellow, Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
Junior Researcher, Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
PhD in Engineering, Junior Researcher, Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
PhD in Biology, Head of the Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
Decision-making under insufficient information involves constructing, verifying, and refining hypotheses. In a novel environment, uncertainty is high, so behavior needs to be variable and aimed at sampling the range of available options; such variability allows the subject to acquire information about the environment and to find the most beneficial options. This type of behavior is referred to as exploration. Once an internal model of the environment has been formed, another strategy, known as exploitation, becomes preferable: the subject uses the profitable options that have already been discovered. In a changing or complex (probabilistic) environment, the two strategies must be combined: exploration to detect changes in the environment and exploitation to benefit from familiar options. The exploration-exploitation balance is an active research topic in psychology, neurobiology, and neuroeconomics. In this review, we discuss the factors that influence the exploration-exploitation balance and its neurophysiological basis, the mechanisms of decision-making under uncertainty, and the switching between the two strategies. We address the roles of the major brain areas involved in these processes, such as the locus coeruleus, the anterior cingulate cortex, and the frontopolar cortex, and describe the functions of several important neurotransmitters involved: dopamine, norepinephrine, and acetylcholine.
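The trade-off described above is commonly formalized as a multi-armed bandit task. The following is a minimal sketch, assuming a two-armed bandit whose reward probabilities swap halfway through the session (a stand-in for a "changing environment"), a delta-rule learner that updates value estimates, and a softmax choice rule whose inverse temperature `beta` sets the balance: low values spread choices across options (exploration), high values concentrate them on the currently best-valued option (exploitation). The function names (`softmax`, `update`, `simulate`), the reward probabilities, and all parameter values are hypothetical choices for illustration, not the method of any study reviewed here.

```python
import math
import random


def softmax(values, beta):
    """Choice probabilities under a softmax (Boltzmann) rule.

    beta is the inverse temperature: small beta spreads choices across
    options (exploration), large beta concentrates them on the option
    with the highest estimated value (exploitation).
    """
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def update(value, reward, alpha):
    """Delta-rule update: move the estimate toward the observed reward."""
    return value + alpha * (reward - value)


def simulate(beta, n_trials=400, alpha=0.2, switch_at=200, seed=0):
    """Two-armed bandit whose reward probabilities swap at trial switch_at.

    Returns the fraction of rewarded trials. All parameter values are
    illustrative assumptions, not taken from any particular study.
    """
    rng = random.Random(seed)
    p_reward = [0.8, 0.2]  # hypothetical reward probabilities per arm
    values = [0.5, 0.5]    # initial value estimates
    total_reward = 0
    for t in range(n_trials):
        if t == switch_at:
            p_reward.reverse()  # the environment changes: arms swap
        probs = softmax(values, beta)
        choice = 0 if rng.random() < probs[0] else 1
        reward = 1 if rng.random() < p_reward[choice] else 0
        values[choice] = update(values[choice], reward, alpha)
        total_reward += reward
    return total_reward / n_trials


if __name__ == "__main__":
    for beta in (0.5, 3.0, 10.0, 50.0):
        print(f"beta={beta:5.1f}  reward rate={simulate(beta):.3f}")
```

Comparing `simulate` across several `beta` values shows how the reward rate depends on the level of exploration in this toy setting; the specific rates vary with the seed and parameters. Note that a softmax rule captures only random (undirected) exploration; directed exploration, in which uncertain options receive an explicit bonus, would require an additional term in the choice rule.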
The reported study was funded by the Russian Science Foundation (RSF), project number 14-06-14029.
The authors are grateful to T.A. Stroganova for her substantial contribution to research on the neurocognitive mechanisms of decision-making at the Moscow MEG Center.