A rate-distortion theory analysis of human bounded rational learning
In recent years, computational reinforcement learning (RL) has become an influential normative framework for understanding human learning and decision-making. However, unlike the RL algorithms developed in machine learning, human learners face strict limits on information-processing capacity. For example, human learning performance decreases as the number of possible states of the environment increases, even when the amount of experience with each environmental state is held constant. Collins and Frank [2012; European Journal of Neuroscience, 35(7), 1024-1035] demonstrated this experimentally in a simple instrumental learning task: different conditions manipulated the “set size” of visual stimuli to which subjects had to respond, and learning efficiency decreased monotonically with set size in a manner incompatible with standard RL algorithms. They interpreted this suboptimality of human learning performance in terms of decay in working memory. Our work proposes an alternative explanation based on bounded rationality: human learners navigate a trade-off between maximizing task performance and minimizing the complexity of the learned action policy, where policy complexity is formalized in information-theoretic terms. Fitting an RL model that incorporates this trade-off to the Collins and Frank dataset yields a fit comparable to their models, and the results are consistent with our hypothesis: human learners trade part of their expected utility for a simpler action policy because of their own information-processing limitations.
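As a minimal illustration (our notation, assuming the standard rate-distortion / policy-compression formulation rather than the authors' exact model), policy complexity can be quantified as the mutual information between states S and actions A, and the bounded-rational learner is assumed to optimize

\[
\max_{\pi}\; \mathbb{E}_{\pi}\big[r(s,a)\big] \;-\; \frac{1}{\beta}\, I(S;A),
\qquad
I(S;A) \;=\; \sum_{s} p(s) \sum_{a} \pi(a \mid s)\, \log \frac{\pi(a \mid s)}{p(a)},
\]

where \(1/\beta\) is the price paid per bit of policy complexity: as \(\beta\) decreases, the learner gives up expected reward in exchange for a simpler, less state-dependent policy, which is the trade-off described above.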
I have two questions: (1) I am very interested to see how your hypothesis, that with many more stimulus iterations the information-theoretic account's predictions might be superior, fares in a new experiment. Are there any other effects that you believe might distinguish the information-theoretic account of RLWM task data from Collins & ...