
Effects of Decision Complexity in Goal-seeking Gridworlds: A Comparison of Instance-Based Learning and Reinforcement Learning Agents

Authors
Thuy Ngoc Nguyen
Carnegie Mellon University ~ Dynamic Decision Making Lab
Cleotilde (Coty) Gonzalez
Carnegie Mellon University ~ Social and Decision Sciences Department
Abstract

Decisions under uncertainty are often made by weighing the expected costs and benefits of the available options. The tradeoffs between costs and benefits make some decisions easy and some difficult, particularly when these costs and rewards are uncertain. In this research, we evaluate how a cognitive model based on Instance-Based Learning Theory (IBLT) and two well-known reinforcement learning (RL) algorithms learn to make better choices in a goal-seeking gridworld task under uncertainty and at increasing degrees of decision complexity. We also use a random agent as a baseline comparison. Our results suggest that the IBL and RL models are comparable in their accuracy levels in simple settings, but the RL models are more efficient than the IBL model. However, as decision complexity increases, the IBL model is not only more accurate but also more efficient than the RL models. Our results suggest that the IBL model is able to pursue highly rewarding targets even when the costs increase, while the RL models seem to get "distracted" by lower costs, reaching lower-reward targets.
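For readers unfamiliar with IBLT, the core choice mechanism can be summarized as: past instances (outcomes of decisions) are retrieved from memory with a probability governed by their activation, which decays with time, and an option's value is the "blended" average of its stored outcomes weighted by those retrieval probabilities. The sketch below is illustrative only, not the authors' implementation; the parameter values (decay, noise, temperature) and the example memory contents are made up for demonstration.

```python
import math
import random

def activation(timestamps, t_now, decay=0.5, noise_sd=0.25):
    """ACT-R style activation: base-level learning term over past
    retrieval times, plus Gaussian noise (a common simplification)."""
    base = math.log(sum((t_now - t) ** -decay for t in timestamps))
    return base + random.gauss(0.0, noise_sd)

def blended_value(instances, t_now, temperature=0.25):
    """Blend the outcomes of past instances, weighted by a softmax
    of their activations (the retrieval probabilities in IBLT)."""
    acts = [activation(ts, t_now) for _, ts in instances]
    m = max(acts)  # subtract the max for numerical stability
    weights = [math.exp((a - m) / temperature) for a in acts]
    total = sum(weights)
    return sum((w / total) * outcome
               for (outcome, _), w in zip(instances, weights))

# Hypothetical memory for one gridworld action:
# (observed outcome, list of times it was experienced)
memory = [(10.0, [1, 4]),     # high-reward target reached twice
          (-1.0, [2, 3, 5])]  # step cost observed three times
print(blended_value(memory, t_now=6))
```

Because the blended value is a convex combination of stored outcomes, it always lies between the worst and best outcome in memory; recency (via decay) and frequency shift the weight between them, which is one intuition for why IBL agents can keep pursuing high-reward targets despite intervening costs.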

Discussion

Cool stuff! I was wondering about two things:
- Which component of IBL is especially useful for improving performance in a complex world: bias, blending, or decay?
- If the agent does not find the highest reward, does it tend to find the second-highest reward, and does this differ between algorithms?

Marieke Van Vugt
Learning in grid world

My former student and I also examined learning in a grid world. We used random start positions and random goal positions. The person could see the start and goal, and then had to learn to find the optimal route. They learned this well. During a test phase, we suddenly blocked the optimal route to see if they could use the second optimum, which they...

Jerome Busemeyer