Society for Mathematical Psychology

MathPsych/ICCM 2020 Donatello Learning

...

Authors

UC Berkeley ~ Helen Wills Neuroscience Institute & Dept. of Psychology

UC Berkeley, United States of America

Abstract

Most reinforcement learning (RL) experiments use familiar reinforcers, such as food or money, whose reward value is relatively objective. In everyday life, however, teaching signals are rarely so straightforward: often we must learn from the achievement of subgoals (e.g., high heat must be achieved before cooking), or from feedback that we have been instructed to perceive as reinforcement yet is not intrinsically rewarding (e.g., grades). Investigating how closely the well-studied dynamics of learning from familiar rewards resemble the dynamics of learning from subgoals and instructed rewards, which are more realistic, can therefore help us understand the ecological validity of laboratory reinforcement learning research.

In this talk, we discuss our recent work investigating these potential similarities using computational modeling, with an emphasis on individual differences. In our experiment, participants completed a probabilistic RL task, comprising multiple interleaved two-armed bandit problems, and an N-back task. Some bandits were learned using points, a familiar reward, while others were learned based on whether their selection led to a “goal image” unique to each trial, an instructed reward. In the instructed condition, participants tended to learn more slowly, and each participant’s performance correlated with their working memory ability. Hierarchical Bayesian model comparison revealed that differences in behavior due to feedback type were best explained by a lower learning rate for instructed rewards, although this effect was reversed or absent for some participants. These strong individual differences suggest that differences in learning dynamics between familiar and instructed rewards may not be universally applicable.
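To make the role of the learning rate concrete, the following is a minimal sketch of a standard delta-rule learner with softmax choice on a single two-armed bandit. It is not the authors' model or fitting procedure (the abstract reports hierarchical Bayesian model comparison, which is not shown here), and every parameter value below (the learning rates, the inverse temperature `beta`, the reward probabilities, the trial count) is a hypothetical choice for illustration only.

```python
import math
import random


def softmax_choice(q_values, beta, rng):
    """Choose arm 0 or 1 with a logistic (two-option softmax) rule."""
    p_arm1 = 1.0 / (1.0 + math.exp(-beta * (q_values[1] - q_values[0])))
    return 1 if rng.random() < p_arm1 else 0


def simulate_bandit(alpha, beta=5.0, p_reward=(0.2, 0.8), n_trials=100, seed=0):
    """Simulate one two-armed bandit learned with a delta rule.

    alpha is the learning rate; all defaults are hypothetical values.
    Returns the fraction of trials on which the better arm (arm 1) was chosen.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]  # initial value estimates for the two arms
    n_correct = 0
    for _ in range(n_trials):
        arm = softmax_choice(q, beta, rng)
        outcome = 1.0 if rng.random() < p_reward[arm] else 0.0
        # Delta rule: move the chosen arm's value toward the observed outcome.
        q[arm] += alpha * (outcome - q[arm])
        n_correct += 1 if arm == 1 else 0
    return n_correct / n_trials


def mean_accuracy(alpha, n_sims=500):
    """Average accuracy over independently seeded simulated learners."""
    return sum(simulate_bandit(alpha, seed=s) for s in range(n_sims)) / n_sims


if __name__ == "__main__":
    # Hypothetical "higher" and "lower" learning rates, for comparison only.
    print("alpha = 0.40:", mean_accuracy(0.40))
    print("alpha = 0.02:", mean_accuracy(0.02))
```

In a sketch like this, a lower `alpha` means each outcome shifts the value estimate less, so the preference for the better arm builds up more slowly over a fixed number of trials. This is the sense in which a lower learning rate could account for slower learning in one feedback condition.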

Differences in learning process dynamics when rewards are familiar versus instructed
