Differences in learning process dynamics when rewards are familiar versus instructed
Most reinforcement learning (RL) experiments use familiar reinforcers, such as food or money, which are rewarding in a relatively objective sense. However, in everyday life, teaching signals are rarely so straightforward: often we must learn from the achievement of subgoals (e.g., high heat must be achieved before cooking), or from feedback that we have been instructed to perceive as reinforcement but that is not intrinsically rewarding (e.g., grades). Learning from familiar rewards is well studied, whereas learning from subgoals and instructed rewards is more realistic. Investigating how similar the dynamics of these two kinds of learning are can therefore help us to understand the ecological validity of laboratory reinforcement learning research.
In this talk, we discuss our recent work investigating these potential similarities using computational modeling, with an emphasis on individual differences. In our experiment, participants completed a probabilistic RL task, comprising multiple interleaved two-armed bandit problems, and an N-back task. Some bandits were learned using points, a familiar reward, while others were learned based on whether their selection led to a “goal image” unique to each trial, an instructed reward. In the instructed condition, participants tended to learn more slowly, and each participant’s performance correlated with their working memory ability. Hierarchical Bayesian model comparison revealed that differences in behavior due to feedback type were best explained by a lower learning rate for instructed rewards, although this effect was reversed or absent for some participants. These strong individual differences suggest that differences in learning dynamics between familiar and instructed rewards may not hold universally.
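To make the idea of a condition-specific learning rate concrete, here is a minimal sketch of a two-armed bandit learner. The abstract does not specify the model, so everything here is an assumption for illustration: a standard delta-rule (Rescorla-Wagner style) value update, a softmax choice rule, and the hypothetical names `update_value`, `choice_prob_arm0`, `simulate_bandit`, and `ALPHA`, along with all parameter values. It is not the hierarchical Bayesian model used in the study.

```python
import math
import random

# Hypothetical learning rates, chosen only for illustration (not fitted values).
ALPHA = {"familiar": 0.3, "instructed": 0.1}


def update_value(q, reward, alpha):
    """Delta-rule update: move the value estimate toward the observed reward."""
    return q + alpha * (reward - q)


def choice_prob_arm0(q_values, beta):
    """Softmax probability of choosing arm 0 out of two arms."""
    return 1.0 / (1.0 + math.exp(-beta * (q_values[0] - q_values[1])))


def simulate_bandit(alpha, beta=5.0, p_reward=(0.8, 0.2), n_trials=100, seed=0):
    """Simulate one two-armed bandit; return (proportion of arm-0 choices, final values)."""
    rng = random.Random(seed)
    q = [0.5, 0.5]  # assumed neutral starting values
    n_best = 0
    for _ in range(n_trials):
        choice = 0 if rng.random() < choice_prob_arm0(q, beta) else 1
        # Binary feedback: points (familiar) or goal image reached (instructed).
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        q[choice] = update_value(q[choice], reward, alpha)
        n_best += choice == 0  # arm 0 is the better arm under the default p_reward
    return n_best / n_trials, q


if __name__ == "__main__":
    for condition, alpha in ALPHA.items():
        accuracy, q = simulate_bandit(alpha)
        print(condition, round(accuracy, 2), [round(v, 2) for v in q])
```

In this sketch, the only difference between conditions is the value of `alpha`, so slower learning in the instructed condition would show up as value estimates that move less after each outcome. A participant for whom the effect is reversed or absent would correspond to an instructed `alpha` that is larger than or equal to the familiar one.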
Dear Dr. Beth, thank you very much for your presentation; it was excellent. It is a fascinating, enlightening, and very relevant topic. I have a couple of questions. The first question has to do with what you mentioned about the interactions of different processes. I agree with you that study of the interaction of cognitive processes (simultaneously) i...
Excellent talk, very interesting. Just curious whether you also considered response bias in the N-back task, e.g., bias in the signal detection theory (SDT) sense, in your model. You mentioned a model using keystrokes, so maybe the tendency to be a yes-sayer or a no-sayer (liberal vs. conservative bias) in the N-back task will also affect performance in the bandit task.