Posters: Statistics & Bayesian Models
Ms. Ruslana Tymchyk
Dr. Henrik Singmann
Logistic regression models are often recommended for analysing binary response variables, such as accuracy, that commonly arise in psychological research designs. One of the main reasons for this recommendation is simulation studies showing that binomial logistic models outperform ordinary ANOVA models when the data are simulated from a binomial logistic model. However, such a simulation setup is at risk of circularity, as the logistic model is both the data-generating model and the winning candidate model. To overcome this limitation, we compared different candidate models when simulating from two data-generating models – a binomial logistic model and the Ratcliff diffusion model. For each simulation study, we simulated a two-group between-participants design with 30 participants per condition and varied the number of observations per simulated participant (either 1 or 100 observations). We then compared the type I error rates (i.e., the proportion of false positives) of three popular candidate methods: linear regression (ANOVA), generalised binomial logistic regression (GLM), and generalised binomial logistic mixed models (GLMM). Our results suggested that ANOVA shows the best performance in terms of type I errors across the different simulation setups and for data generated by both the logistic and the diffusion model. For the GLM, the type I error rate was around 0.05 only with 1 observation per participant and severely anti-conservative (i.e., too high) with 100 observations. The GLMM yielded acceptable type I error rates with 100 observations per participant, but with 1 observation per participant its type I error rates depended on the data-generating model: acceptable when simulating from the logistic model, but too high when simulating from the diffusion model. Additionally, the type I error rates of the GLMM with 1 observation per participant increased as the overall performance level approached the boundary of the parameter space. Overall, our results suggest that in terms of type I error rates, ANOVA generally performs better than logistic models, and the performance of logistic models depends heavily on the simulation setup.
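As a minimal sketch of how one cell of such a simulation can be set up in R (shown here for the 100-observations-per-participant case; the logistic data-generating process, the amount of participant-level variability, and the number of replicates are illustrative assumptions, not the exact settings of the study reported above):

library(lme4)

simulate_once <- function(n_per_group = 30, n_trials = 100) {
  id    <- factor(seq_len(2 * n_per_group))
  group <- rep(c("A", "B"), each = n_per_group)          # no true group effect (null hypothesis)
  theta <- rnorm(2 * n_per_group, 0, 0.5)                # assumed participant-level variability
  k     <- rbinom(2 * n_per_group, n_trials, plogis(theta))
  dat   <- data.frame(id, group, k, n = n_trials, acc = k / n_trials)
  p_lm   <- summary(lm(acc ~ group, dat))$coefficients["groupB", "Pr(>|t|)"]
  p_glm  <- summary(glm(cbind(k, n - k) ~ group, dat,
                        family = binomial))$coefficients["groupB", "Pr(>|z|)"]
  p_glmm <- summary(glmer(cbind(k, n - k) ~ group + (1 | id), dat,
                          family = binomial))$coefficients["groupB", "Pr(>|z|)"]
  c(ANOVA = p_lm, GLM = p_glm, GLMM = p_glmm)
}

pvals <- replicate(500, simulate_once())   # number of replicates chosen for illustration
rowMeans(pvals < .05)                      # empirical type I error rate of each method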
Dr. Henrik Singmann
Mr. Max Maier
Currently, researchers need to choose between two different statistical frameworks, a frequentist or a Bayesian approach. Frequentist inference – null hypothesis significance testing – is the de facto standard. It is computationally relatively cheap and comparatively convenient, as it does not require the researcher to specify a prior on the effect to be tested. Bayesian inference is becoming increasingly popular, in large part due to easy-to-use software such as brms that makes it possible to estimate complex models with little programming. However, whereas Bayesian estimation is convenient even for quite complex models, Bayesian testing via Bayes factors is computationally expensive and rather cumbersome, as it requires the specification of a prior that can strongly influence the results. We evaluate a compromise approach that combines Bayesian estimation with frequentist testing: Bayesian-frequentist p-values, in which Bayesian model estimation is combined with frequentist Wald-based p-values. To assess this combination, we examine the type I error rates of Bayesian-frequentist p-values across three different settings: regular analysis of variance (ANOVA), logistic regression, and logistic mixed-model designs. Our results showed that Bayesian models with improper flat priors produced nominal type I error rates, mirroring the behaviour of frequentist models across all designs. However, non-zero-centred priors resulted in too-high (i.e., anti-conservative) type I error rates, and zero-centred priors produced low (i.e., conservative) type I error rates, with the degree of conservativeness depending on the width of the prior. Overall, our results indicate that frequentist testing can be combined with Bayesian estimation if the prior is relatively non-informative. Bayesian-frequentist p-values thus offer an attractive alternative to researchers, combining the ease of frequentist testing with the convenience and flexibility of Bayesian estimation.
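A minimal sketch of how such a Bayesian-frequentist p-value can be computed in practice (the toy data, model formula, and the particular informative prior below are illustrative assumptions; brms places an improper flat prior on regression coefficients by default):

library(brms)

dat <- data.frame(y = rnorm(60), group = rep(c("A", "B"), each = 30))   # assumed toy data, null effect

# Bayesian estimation with the default (improper flat) prior on the group effect
fit_flat <- brm(y ~ group, data = dat, family = gaussian())

# Wald-based p-value from the posterior mean and posterior standard deviation
est <- fixef(fit_flat)["groupB", "Estimate"]
se  <- fixef(fit_flat)["groupB", "Est.Error"]
p_wald <- 2 * pnorm(-abs(est / se))
p_wald

# The same test under a zero-centred informative prior (width chosen for illustration)
fit_inf <- brm(y ~ group, data = dat, family = gaussian(),
               prior = set_prior("normal(0, 1)", class = "b"))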
Lucas Castillo
Johanna Falben
Prof. Adam Sanborn
The Autocorrelated Bayesian Sampler (ABS; Zhu et al., 2021) is a sequential sampling model which assumes that people draw autocorrelated samples of hypotheses from memory according to their posterior beliefs, producing choices, response times, confidence judgments, estimates, confidence intervals, and probability judgments. Decisional evidence accumulation times are exponentially distributed, and samples are aggregated until those in favour of one response category exceed those in favour of the other, at which point the favoured option is chosen. While this mechanism qualitatively accounts for a range of accuracy and response-time effects (e.g., fast and slow errors), it has never been quantitatively evaluated. Therefore, we compared the ABS with the well-established and widely used Drift Diffusion Model (DDM; Ratcliff, 1978; Ratcliff & McKoon, 2008; Ratcliff & Rouder, 1998) to investigate the strengths and limitations of the ABS. We fit both models to data from Murphy et al. (2014), a random-dot motion task, using a Bayesian form of quantile maximum likelihood estimation (Heathcote et al., 2002) to evaluate how well the models account for the data. Comparing the two models will illustrate how differences in their assumptions and approaches affect their performance in different scenarios, and point to what is necessary to make the ABS competitive with the best models of accuracy and response times.
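A minimal, single-response-category sketch of the quantile-based likelihood underlying this fitting approach (the stand-in model CDF below is a hypothetical shifted lognormal used only for illustration; the actual analysis uses the defective response-time distributions implied by the ABS and the DDM within a Bayesian scheme):

# Quantile-based (QMPE-style) log-likelihood for one response category:
# counts of observations between the observed RT quantiles are scored against
# the model-predicted probabilities of those inter-quantile bins.
qmp_loglik <- function(rt, model_cdf, probs = c(.1, .3, .5, .7, .9)) {
  q      <- quantile(rt, probs)                   # observed RT quantiles
  counts <- diff(c(0, probs, 1)) * length(rt)     # observations per inter-quantile bin
  p_bins <- diff(c(0, model_cdf(q), 1))           # model-predicted bin probabilities
  sum(counts * log(pmax(p_bins, 1e-10)))          # multinomial log-likelihood (up to a constant)
}

# Example with a hypothetical shifted-lognormal stand-in for the model CDF
rt <- 0.3 + rlnorm(500, meanlog = -0.7, sdlog = 0.4)
qmp_loglik(rt, function(q) plnorm(q - 0.3, meanlog = -0.7, sdlog = 0.4))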
Dr. Jelmer Borst
Jacolien van Rij
Pupil dilation time courses are assumed to be a slow and indirect reflection of the latent cognitive events involved in task performance. Additive models of pupil dilation can be used to recover these events through deconvolution, promising a more precise study of cognitive processes. To this end, the conventional deconvolution method assumes that cognitive events all trigger a delayed pupil response. The weighted sum of these individual responses is then believed to be reflected in the pupil dilation time course. Importantly, the conventional method typically assumes the same shape for the pupil responses elicited by all events. Additionally, the method is usually applied to averaged time courses. Thus, it neglects the possibility that the timing between events and the shape of the response differ not just between subjects but also between trials and even between different cognitive events. However, accounting for trial- and event-level variability is crucial to achieve precise recovery of latent events and thereby a detailed understanding of cognitive processing. Moreover, accounting for trial-level variability is necessary when investigating how trial-level predictors (e.g., continuous word frequency) influence cognitive processes involved in task performance. To ensure a precise recovery of latent cognitive events, we propose an extended model that combines generalized additive mixed models with hidden semi-Markov models. We will show that, despite the added complexity, the model recovers parameters accurately and that the risk of overfitting is minimized through efficient and automatic regularization. Finally, we will apply this model to data from a lexical decision experiment in which participants processed words and two types of non-words that differed in their frequency (approximated with Google result counts), to investigate the cognitive events involved in lexical decisions and how they are affected by word type and frequency manipulations.
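For reference, a minimal sketch of the conventional forward model described above, in which each event elicits a fixed-shape, delayed pupil response and the predicted time course is their weighted sum; the Erlang-shaped response function and its parameters (cf. Hoeks & Levelt, 1993), as well as the example event times and weights, are assumptions for illustration and not part of the extended model proposed here:

# Erlang-shaped pupil response function, peak-normalised at t = t_max (ms);
# parameter values follow Hoeks & Levelt (1993) and are assumptions here
pupil_response <- function(dt, n = 10.1, t_max = 930) {
  h <- numeric(length(dt))
  pos <- dt > 0
  h[pos] <- (dt[pos] / t_max)^n * exp(n - n * dt[pos] / t_max)
  h
}

# Predicted pupil time course: weighted sum of responses to latent events
predict_pupil <- function(event_times, weights, t_grid) {
  rowSums(sapply(seq_along(event_times), function(i)
    weights[i] * pupil_response(t_grid - event_times[i])))
}

t_grid <- seq(0, 4000, by = 20)                       # one 4-s trial sampled at 50 Hz
pred   <- predict_pupil(event_times = c(200, 1200),   # hypothetical latent events (ms)
                        weights     = c(1, 0.6), t_grid)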
Juergen Heller
'Make the light as bright as the sound is loud.' This is a typical instruction in experiments dealing with the cross-modal matching of stimuli. According to Luce's theory of global psychophysics (Luce, Steingrimsson, & Narens, 2010), in such a cross-modal task the perceived stimulus intensities are judged against respondent-generated internal reference intensities, all represented on a common psychological scale. Heller (2021) generalizes Luce's theory by distinguishing the internal references with respect to their role in the experimental setup, that is, whether they pertain to the standard or to the variable stimulus in the matching task. By testing Heller's generalization of Luce's theory of global psychophysics on cross-modal data, the present study aims at thoroughly investigating the role-sensitivity of the internal reference intensities. To this end, it replicates a classical experiment by Stevens and Marks (1965), who had participants adjust the brightness of a light to the perceived loudness of a noise sound and vice versa. This allows for complementing the traditional group-level analysis by evaluating the data at the individual level, and for fitting the global psychophysical model to the data in a cognitive modeling approach. We find that, on the individual level, the cross-modal matching curves differ in slope and show a regression effect as reported in the classical literature. This implies role-dependent reference intensities, as suggested by Heller's model. In order to experimentally manipulate the role-(in)dependence of the internal references, an alternative psychophysical method is discussed. When an adaptive staircase procedure is embedded within the method of constant stimuli and participants are simply instructed to choose the more intense stimulus, they are not aware which of the stimuli is the standard and which is the variable stimulus. Under these conditions the internal references are expected to be role-independent, and the regression effect should vanish.
Heller, J. (2021). Internal references in cross-modal judgments: A global psychophysical perspective. Psychological Review, 128(3), 509–524. https://doi.org/10.1037/rev0000280
Luce, R. D., Steingrimsson, R., & Narens, L. (2010). Are psychophysical scales of intensities the same or different when stimuli vary on other dimensions? Theory with experiments varying loudness and pitch. Psychological Review, 117(4), 1247–1258. https://doi.org/10.1037/a0020174
Stevens, J. C., & Marks, L. E. (1965). Cross-modality matching of brightness and loudness. Proceedings of the National Academy of Sciences of the United States of America, 54(2), 407.
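As an illustrative aside to the abstract above, and under the simplifying assumption of Stevens-type power-function representations (the full global-psychophysical model is richer than this), the predicted cross-modal matching relation is linear in log–log coordinates, where x denotes sound intensity, y light intensity, and \beta_L, \beta_B the loudness and brightness exponents:

\psi_L(x) = a\,x^{\beta_L}, \qquad \psi_B(y) = b\,y^{\beta_B}, \qquad
\psi_B(y) = \psi_L(x) \;\Longrightarrow\;
\log y = \frac{\beta_L}{\beta_B}\,\log x + \frac{1}{\beta_B}\log\frac{a}{b}.

A single slope \beta_L/\beta_B would then describe the matches regardless of which modality is adjusted; the regression effect, i.e., systematically different slopes depending on whether brightness or loudness serves as the variable stimulus, is what role-dependent internal references in Heller's (2021) generalization can capture.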
Frank E Ritter
For both designing interfaces and understanding learning, it is important to include error analysis to understand where time goes and how learning happens. Using data from a previous study (Ritter et al., 2022), this paper examines the errors that participants make while doing a broken-component-finding task. The analysis uses data from the testing session that followed one or more training sessions. Errors were analyzed for each task and each participant, including the misreplaced components for each task. Unlike the text-editing tasks in which errors have previously been analyzed, this fault-finding task only required participants to move the mouse and click on the broken component in an interface, so we developed an error categorization that is similar to, but distinct from, those in the previous literature. We also present an updated strategy model that generates and corrects errors for participant 421, yielding a better correlation with that participant's performance.
Asli Kilic
In the memory literature, paired associates and list recall have been studied separately. In paired associates, the probabilities of forward and backward recall have been found to be approximately equal, whereas in free recall subjects tend to successively recall words studied in nearby positions – the contiguity effect – favoring the following word over the preceding one. The Temporal Context Model (TCM) proposes that items studied in nearby positions have similar study contexts and that recalling an item activates its study context along with that of its neighbors, which results in the contiguity effect and its forward asymmetry. Kılıç et al. (2013) developed a probed recall task to test the contiguity effect by interrupting the linearity of the experimental procedure. In the current study, we employed their probed recall task with paired associates: participants studied multiple lists of pairs and, at test, were given a pair to recognize and were required to go back to the list in which it had been presented and recall another word from that list. Conditional response probability (CRP) curves indicated both within-list and between-list contiguity with a forward asymmetry; however, symmetric retrieval was observed within paired associates. These two patterns of recall from the probed recall task are in line with previous findings in the paired-associate and list-recall literatures and fit the contextual coding mechanism of TCM.
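A minimal sketch of how a lag-CRP curve of this kind can be computed for a single recall sequence (the function and the toy example below are assumptions for illustration; the actual analysis additionally distinguishes within- from between-list transitions and aggregates over lists and participants):

# Lag-CRP for one list: for each lag, actual transitions / possible transitions
lag_crp <- function(study_pos, list_length) {
  lags   <- -(list_length - 1):(list_length - 1)
  actual <- possible <- setNames(numeric(length(lags)), lags)
  recalled <- study_pos[1]
  for (i in seq_along(study_pos)[-1]) {
    from <- study_pos[i - 1]
    open <- setdiff(seq_len(list_length), recalled)       # not-yet-recalled positions
    possible[as.character(open - from)] <-
      possible[as.character(open - from)] + 1
    actual[as.character(study_pos[i] - from)] <-
      actual[as.character(study_pos[i] - from)] + 1
    recalled <- c(recalled, study_pos[i])
  }
  actual / possible                                        # lags never possible yield NaN
}

# Toy example: recalls from an 8-item list, coded by study position
lag_crp(study_pos = c(3, 4, 6, 5, 1), list_length = 8)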
Dr. Raphael Hartmann
Prof. Christoph Klauer
Signal detection theory (SDT) is one of the most influential modeling frameworks in psychological research. One of its main contributions is the possibility to disentangle two central components in decisions under uncertainty: sensitivity, the ability to differentiate between signal and noise, and response bias, a tendency to favor one decision over the other. When applying such models to common psychological data comprising multiple trials of multiple participants, multilevel modeling is considered the state-of-the-art in psychological research. While the estimation of non-linear multilevel models such as SDT models is usually done in a Bayesian framework, this is not necessary to benefit from the advantages of this modeling approach: Multilevel SDT models can, in principle, also be fitted using maximum likelihood (ML) – although this is rarely done in practice, presumably due to the lack of appropriate software for doing so. We present our work on an R package that is aimed at supporting the straightforward application of this approach for researchers applying SDT. To fit multilevel SDT models using ML, we exploit the equivalence of SDT models and a subclass of generalized linear models (GLMs; DeCarlo, 1998). GLMs can easily be extended to multilevel models by including random effects in the model, yielding generalized linear mixed models (GLMMs). Thereby, multilevel SDT models can be fitted with ML by using commonly-known software packages for fitting GLMMs. Our R package allows one to fit different variants of multilevel SDT models with sensitivity and response bias parameters that can vary according to user-specified predictor variables and different sources of random variation. It "translates" the given SDT model to a GLMM, selects an appropriate random-effects structure, estimates the parameters, and transforms the parameter estimates for both population and subject level back to the SDT framework. In addition, likelihood ratio tests for given predictors can be calculated. We demonstrate the validity of our implementation through simulation studies.
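A minimal sketch of the GLM(M) equivalence that the package exploits, written directly in lme4 (the simulated data, variable names, and random-effects structure are illustrative assumptions; the package itself selects the random-effects structure and back-transforms the estimates automatically):

library(lme4)

# Simulate an equal-variance SDT recognition experiment (all values are assumptions)
set.seed(1)
n_subj <- 30; n_trials <- 100
dat <- expand.grid(subject = factor(seq_len(n_subj)), trial = seq_len(n_trials))
dat$is_old <- rbinom(nrow(dat), 1, 0.5)        # 1 = signal (old), 0 = noise (new)
dp <- rnorm(n_subj, 1.0, 0.3)                  # subject-level sensitivity d'
k  <- rnorm(n_subj, 0.5, 0.3)                  # subject-level criterion
dat$say_old <- rbinom(nrow(dat), 1,
                      pnorm(dp[dat$subject] * dat$is_old - k[dat$subject]))

# The SDT model expressed as a probit GLMM (DeCarlo, 1998)
fit <- glmer(say_old ~ is_old + (is_old | subject),
             data = dat, family = binomial(link = "probit"))

b <- fixef(fit)
dprime    <- b["is_old"]         # population-level sensitivity d'
criterion <- -b["(Intercept)"]   # criterion relative to the noise mean; bias c = criterion - dprime / 2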