Deep generative models as perceptual front-ends for decision-making
Evidence-integration models such as the drift-diffusion model (DDM) are extremely successful in accounting for reaction-time (RT) distributions and error rates in decision-making. However, these models do not explain how the evidence, represented by the drift, is extracted from the stimuli. Models of low-level vision, such as template-matching models, propose mechanisms by which evidence is generated but do not account for RT distributions. We propose a model of the perceptual front-end, implemented as a deep generative model, that learns to represent visual inputs in a low-dimensional latent space. Evidence in favour of different choices can be gathered by sampling from these latent variables and feeding the samples to an integration-to-threshold model. Under weak assumptions, this architecture implements a sequential probability ratio test (SPRT) and can therefore provide an end-to-end computational account of RT distributions as well as error rates. In contrast to DDMs, this model explains how drift and diffusion rates arise rather than inferring them from behavioural data. We show how to generate predictions with this model for perceptual decisions in visual noise and how these predictions depend on architectural constraints and the learning history. The model thus explains both how evidence is generated from any given input and how architectural constraints and learning affect this process; these effects can then be measured through the observed error rates and RT distributions. We expect this approach to bridge the gap between the complementary, yet rarely interacting, literatures on decision-making, visual perceptual learning, and low-level vision/psychophysics.
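The sampling-and-integration scheme described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes, purely for illustration, that a single latent variable has Gaussian class-conditional distributions under the two choices, so each sample contributes a closed-form log-likelihood-ratio increment that is accumulated to a fixed bound, as in an SPRT. All parameter values (means, noise level, threshold) are arbitrary placeholders.

```python
import random

def sprt_trial(rng, true_mu, mu0=-0.5, mu1=0.5, sigma=1.0,
               threshold=3.0, max_steps=1000):
    """Accumulate log-likelihood-ratio evidence from latent samples
    until one of the two decision bounds is crossed.

    Hypothetical setup: the latent sample z is Gaussian with mean mu0
    under choice 0 and mu1 under choice 1 (illustrative values only).
    Returns (choice, number_of_samples), i.e. decision and a proxy RT.
    """
    llr = 0.0
    for t in range(1, max_steps + 1):
        # One sample from the generative front-end's latent variable.
        z = rng.gauss(true_mu, sigma)
        # log N(z; mu1, sigma) - log N(z; mu0, sigma), simplified.
        llr += (z * (mu1 - mu0) - 0.5 * (mu1**2 - mu0**2)) / sigma**2
        if llr >= threshold:
            return 1, t
        if llr <= -threshold:
            return 0, t
    return int(llr > 0), max_steps  # forced choice if no bound reached

# Simulate many trials in which choice 1 is correct; the resulting
# choices and sample counts give error rates and an RT distribution.
rng = random.Random(0)
choices, rts = zip(*(sprt_trial(rng, true_mu=0.5) for _ in range(500)))
error_rate = 1 - sum(choices) / len(choices)
mean_rt = sum(rts) / len(rts)
```

In this toy version the drift of the equivalent DDM is fixed by the separation of the latent means relative to the noise, which is the sense in which the front-end, rather than a fitted parameter, determines drift and diffusion.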