Reinforcement learning in probabilistic environment and its role in human adaptive and maladaptive behavior


Abstract

The article discusses human learning under conditions in which the outcomes of one's actions are partly uncertain, a situation that models one of the mechanisms of adaptive behavior in the natural environment. Basic learning mechanisms have been studied in detail by modelling conditioned reflexes in animal experiments, where a given behavior is reinforced consistently, immediately, and repeatedly. At the same time, the neurophysiological foundations of human learning under irregular or delayed reinforcement, despite growing interest in recent years, remain poorly understood. Research on mental and neuropsychiatric disorders has contributed substantially to this problem: specific changes in particular aspects of learning with probabilistic reinforcement have been found in patients with Parkinson's disease, Tourette's syndrome, schizophrenia, depression, and anxiety disorders. In particular, it has been shown that sensitivity to positive and to negative reinforcement can be impaired independently of each other. Taking the pathogenetic mechanisms of these conditions into account, it can be concluded that the key structures for this type of learning are the cingulate and orbitofrontal cortex, engaged in bidirectional interaction with the underlying structures of the striatal system, the limbic system, and the nuclei of the reticular formation of the brain stem.
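The independently impaired sensitivity to positive and to negative reinforcement mentioned in the abstract is commonly formalized with separate learning rates for positive and negative prediction errors. The following is a minimal illustrative sketch in Python, not code from the article; the function name, parameter values, and two-armed bandit setup are assumptions introduced for illustration:

```python
import math
import random

def simulate_probabilistic_learning(p_reward=(0.8, 0.2), alpha_pos=0.3,
                                    alpha_neg=0.3, beta=3.0, n_trials=500,
                                    seed=0):
    """Two-armed bandit with probabilistic reward.

    Separate learning rates (alpha_pos, alpha_neg) are applied to
    positive and negative prediction errors, so asymmetric sensitivity
    to reward and punishment can be modelled by making them unequal.
    Returns the learned values and the fraction of choices of the
    objectively better option (option 0).
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]  # initial value estimates for the two options
    better_choices = 0
    for _ in range(n_trials):
        # Softmax choice rule: higher beta -> more deterministic choices
        w0 = math.exp(beta * q[0])
        w1 = math.exp(beta * q[1])
        choice = 0 if rng.random() < w0 / (w0 + w1) else 1
        # Probabilistic reinforcement of the chosen option
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        delta = reward - q[choice]  # reward prediction error
        alpha = alpha_pos if delta > 0 else alpha_neg
        q[choice] += alpha * delta  # value update
        better_choices += (choice == 0)
    return q, better_choices / n_trials
```

Lowering `alpha_neg` relative to `alpha_pos` (or vice versa) qualitatively reproduces the asymmetric learning profiles reported for some of the patient groups discussed in the article.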

General Information

Keywords: reinforcement learning, uncertainty, prediction error, frontal cortex, dopamine, serotonin, norepinephrine, mental disorders

Journal rubric: Educational Psychology and Pedagogical Psychology

DOI: https://doi.org/10.17759/jmfp.2016050409

For citation: Kozunova G.L. Reinforcement learning in probabilistic environment and its role in human adaptive and maladaptive behavior [Elektronnyi resurs]. Sovremennaia zarubezhnaia psikhologiia = Journal of Modern Foreign Psychology, 2016. Vol. 5, no. 4, pp. 85–96. DOI: 10.17759/jmfp.2016050409. (In Russ., abstr. in Engl.)

References

  1. Sagvolden T. et al. A dynamic developmental theory of attention-deficit/hyperactivity disorder (ADHD) predominantly hyperactive/impulsive and combined subtypes. Behavioral and Brain Sciences, 2005. Vol. 28, no. 3, pp. 397–418. doi: 10.1017/S0140525X05000075
  2. Steinberg E.E. et al. A causal link between prediction errors, dopamine neurons and learning. Nature neuroscience, 2013. Vol. 16, no. 3, pp. 966–973. doi: 10.1038/nn.3413
  3. Qi J. et al. A glutamatergic reward input from the dorsal raphe to ventral tegmental area dopamine neurons. Nature communications, 2014. Vol. 5, art. 5390. doi: 10.1038/ncomms6390
  4. Alloy L.B., Tabachnik N. Assessment of covariation by humans and animals: The joint influence of prior expectations and current situational information. Psychological review, 1984. Vol. 91, no. 1, pp. 112–149. doi: 10.1037/0033-295X.91.1.112
  5. Der-Avakian A. et al. Assessment of reward responsiveness in the response bias probabilistic reward task in rats: implications for cross-species translational research. Translational psychiatry, 2013. Vol. 3, no. 8. doi: 10.1038/tp.2013.74
  6. Aston-Jones G., Cohen J.D. An integrative theory of locus coeruleus-norepinephrine function: Adaptive gain and optimal performance. Annual Review of Neuroscience, 2005. Vol. 28, pp. 403–450. doi: 10.1146/annurev.neuro.28.061604.135709
  7. Balsam P.D., Drew M.R., Yang C. Timing at the start of associative learning. Learning and Motivation, 2002. Vol. 33, no. 1, pp. 141–155. doi: 10.1006/lmot.2001.1104
  8. Bayer H.M., Glimcher P.W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 2005. Vol. 47, no. 1, pp. 129–141. doi: 10.1016/j.neuron.2005.05.020
  9. Bayer H.M., Lau B., Glimcher P.W. Statistics of midbrain dopamine neuron spike trains in the awake primate. Journal of Neurophysiology, 2007. Vol. 98, no. 3, pp. 1428–1439. doi: 10.1152/jn.01140.2006
  10. Bouret S., Richmond B.J. Sensitivity of locus ceruleus neurons to reward value for goal-directed actions. The Journal of Neuroscience, 2015. Vol. 35, no. 9, pp. 4005–4014. doi: 10.1523/JNEUROSCI.4553-14.2015
  11. Bourgeois A., Chelazzi L., Vuilleumier P. How motivation and reward learning modulate selective attention. Progress in Brain Research, 2016. Vol. 229, pp. 325–342. doi: 10.1016/bs.pbr.2016.06.004
  12. Cartoni E., Puglisi-Allegra S., Baldassarre G. The three principles of action: A Pavlovian-instrumental transfer hypothesis. Frontiers in behavioral neuroscience, 2013. Vol. 7, pp. 1–11. doi: 10.3389/fnbeh.2013.00153
  13. Conway C.M., Christiansen M.H. Sequential learning in non-human primates. Trends in cognitive sciences, 2001. Vol. 5, no. 12, pp. 539–546. doi: 10.1016/S1364-6613(00)01800-3
  14. Corbetta M., Patel G., Shulman G.L. The reorienting system of the human brain: From environment to theory of mind. Neuron, 2008. Vol. 58, no. 3, pp. 306–324. doi: 10.1016/j.neuron.2008.04.017
  15. Cytawa J., Trojniar W. The state of pleasure and its role in instrumental conditioning. Activitas nervosa superior, 1976. Vol. 18, no. 1–2, pp. 92–96.
  16. Dayan P., Berridge K.C. Model-based and model-free Pavlovian reward learning: Revaluation, revision, and revelation. Cognitive, Affective, & Behavioral Neuroscience, 2014. Vol. 14, no. 2, pp. 473–492. doi: 10.3758/s13415-014-0277-8
  17. Dickinson A., Watt A., Griffiths W.J.H. Free-operant acquisition with delayed reinforcement. Comparative and Physiological Psychology, 1992. Vol. 45, no. 3, pp. 241–258.
  18. Heinz A. et al. Dimensional psychiatry: Mental disorders as dysfunctions of basic learning mechanisms. Journal of Neural Transmission, 2016. Vol. 123, no. 8, pp. 809–821. doi: 10.1007/s00702-016-1561-2
  19. Roiser J.P. et al. Do patients with schizophrenia exhibit aberrant salience? Psychological medicine, 2009. Vol. 39, no. 2, pp. 199–209. doi: 10.1017/s0033291708003863
  20. Liu Z. et al. Dorsal raphe neurons signal reward through 5-HT and glutamate. Neuron, 2014. Vol. 81, no. 6, pp. 1360–1374. doi: 10.1016/j.neuron.2014.02.010
  21. Frank M.J., Seeberger L.C., O'reilly R.C. By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science, 2004. Vol. 306, no. 5703, pp. 1940–1943. doi: 10.1126/science.1102941
  22. VanElzakker M.B. et al. From Pavlov to PTSD: The extinction of conditioned fear in rodents, humans, and anxiety disorders. Neurobiology of learning and memory, 2014. Vol. 113, pp. 3–18. doi: 10.1016/j.nlm.2013.11.014
  23. Gallistel C.R., Fairhurst S., Balsam P. The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences of the United States of America, 2004. Vol. 101, no. 36, pp. 13124–13131. doi: 10.1073/pnas.0404965101
  24. Gershman S.J. A Unifying Probabilistic View of Associative Learning. PLoS Computational Biology, 2015. Vol. 11, no. 11, pp. 1–20. doi: 10.1371/journal.pcbi.1004567
  25. Guillin O., Abi‐Dargham A., Laruelle M. Neurobiology of dopamine in schizophrenia. International review of neurobiology, 2007. Vol. 78, pp. 1–39. doi: 10.1016/S0074-7742(06)78001-1
  26. Hinson J.M., Staddon J.E.R. Matching, maximizing, and hill‐climbing. Journal of the experimental analysis of behavior, 1983. Vol. 40, no. 3, pp. 321–331. doi: 10.1901/jeab.1983.40-321
  27. Hofmeister J., Sterpenich V. A role for the locus ceruleus in reward processing: encoding behavioral energy required for goal-directed actions. The Journal of Neuroscience, 2015. Vol. 35, no. 29, pp. 10387–10389. doi: 10.1523/JNEUROSCI.1734-15.2015
  28. Holroyd C.B., Coles M.G.H. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychological review, 2002. Vol. 109, no. 4, pp. 679–709. doi: 10.1037/0033-295X.109.4.679
  29. Homberg J.R. Serotonin and decision making processes. Neuroscience & Biobehavioral Reviews, 2012. Vol. 36, no. 1, pp. 218–236. doi: 10.1016/j.neubiorev.2011.06.001
  30. Kirkpatrick K., Balsam P.D. Associative learning and timing. Current opinion in behavioral sciences, 2016. Vol. 8, pp. 181–185. doi: 10.1016/j.cobeha.2016.02.023
  31. Ma W.J., Jazayeri M. Neural coding of uncertainty and probability. Annual review of neuroscience, 2014. Vol. 37, pp. 205–220. doi: 10.1146/annurev-neuro-071013-014017
  32. Maia T.V., Frank M.J. From reinforcement learning models to psychiatric and neurological disorders. Nature neuroscience, 2011. Vol. 14, no. 2, pp. 154–162. doi: 10.1038/nn.2723
  33. Molet M., Miller R.R. Timing: An attribute of associative learning. Behavioural processes, 2014. Vol. 101, pp. 4–14. doi: 10.1016/j.beproc.2013.05.015
  34. Crone E.A. et al. Neural mechanisms supporting flexible performance adjustment during development. Cognitive, Affective, & Behavioral Neuroscience, 2008. Vol. 8, no. 2, pp. 165–177. doi: 10.3758/CABN.8.2.165
  35. Garbusow M. et al. Pavlovian-to-instrumental transfer in alcohol dependence: A pilot study. Neuropsychobiology, 2014. Vol. 70, no. 2, pp. 111–121. doi: 10.1159/000363507
  36. Palminteri S. et al. Pharmacological modulation of subliminal learning in Parkinson's and Tourette's syndromes. Proceedings of the National Academy of Sciences, 2009. Vol. 106, no. 45, pp. 19179–19184. doi: 10.1073/pnas.0904035106
  37. Reddy L.F. et al. Probabilistic reversal learning in schizophrenia: Stability of deficits and potential causal mechanisms. Schizophrenia bulletin, 2016. Vol. 42, no. 4, pp. 942–951. doi: 10.1093/schbul/sbv226
  38. Nieuwenhuis S. et al. Reinforcement-related brain potentials from medial frontal cortex: Origins and functional significance. Neuroscience & Biobehavioral Reviews, 2004. Vol. 28, no. 4, pp. 441–448. doi: 10.1016/j.neubiorev.2004.05.003
  39. Robinson J.S. Stimulus substitution and response learning in the earthworm. Journal of comparative and physiological psychology, 1953. Vol. 46, no. 4, pp. 262–266. doi: 10.1037/h0056151
  40. Saffran J.R., Aslin R.N., Newport E.L. Statistical learning by 8-month-old infants. Science, 1996. Vol. 274, no. 5294, pp. 1926–1928.
  41. Schultz W. Predictive reward signal of dopamine neurons. Journal of neurophysiology, 1998. Vol. 80, no. 1, pp. 1–27.
  42. Izquierdo A. et al. The neural basis of reversal learning: An updated perspective. Neuroscience, 2016. doi: 10.1016/j.neuroscience.2016.03.021
  43. Ferdinand N.K. et al. The processing of unexpected positive response outcomes in the mediofrontal cortex. The Journal of Neuroscience, 2012. Vol. 32, no. 35, pp. 12087–12092. doi: 10.1523/JNEUROSCI.1410-12.2012
  44. Thorndike E.L. Animal intelligence: Experimental studies. Transaction Publishers, 1965.
  45. Walsh M.M., Anderson J.R. Learning from delayed feedback: Neural responses in temporal credit assignment. Cognitive, Affective, & Behavioral Neuroscience, 2011. Vol. 11, no. 2, pp. 131–143. doi: 10.3758/s13415-011-0027-0
  46. Weismüller B., Bellebaum C. Expectancy affects the feedback‐related negativity (FRN) for delayed feedback in probabilistic learning. Psychophysiology, 2016. Vol. 53, no. 11, pp. 1739–1750. doi: 10.1111/psyp.12738
  47. Wolford G., Miller M.B., Gazzaniga M. The left hemisphere’s role in hypothesis formation [Electronic resource]. Journal of Neuroscience, 2000. Vol. 20, no. 6, pp. 1–4. URL: http://www.jneurosci.org/content/jneuro/20/6/RC64.full.pdf (Accessed 27.12.2016).
  48. Yellott J.I. Probability learning with noncontingent success. Journal of mathematical psychology, 1969. Vol. 6, no. 3, pp. 541–575. doi: 10.1016/0022-2496(69)90023-6

Information About the Authors

Galina L. Kozunova, PhD in Psychology, Centre for Neuro-Cognitive Studies (MEG-center), Moscow State University of Psychology and Education, Moscow, Russia, ORCID: https://orcid.org/0000-0002-1286-8654, e-mail: kozunovagl@mgppu.ru
