Language & AI
Paul Ungermann
Chris Donkin
We introduce a novel theoretical framework that merges verbal reports with computational cognitive models. This approach leverages the context-sensitive capabilities of advanced large language models (LLMs) to analyse participants’ verbal descriptions of how they approach a task. To facilitate such an analysis, we have developed a JavaScript-based plugin for jsPsych that enables real-time speech recognition: spoken reports are automatically converted into text and subsequently analysed using LLMs. This tool significantly reduces the typical burden of analysing self-report data. The talk will start by outlining the theoretical framework, emphasising its potential to enrich our understanding of cognitive processes through verbal data. It will then discuss the functionalities of the plugin, including voice-to-text transcription, vectorisation, and analytical techniques such as keyword extraction, clustering, and labelling. These features are pivotal for quantitatively assessing verbal data. Additionally, we will share insights from a pilot study conducted to evaluate the efficacy of our software, providing a comprehensive overview of its potential to enhance cognitive modelling research. In summary, this talk provides a generalisable roadmap for researchers interested in collecting verbal reports.
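A minimal sketch of the kind of analysis stage described in this abstract, assuming transcripts have already been produced by the plugin's speech recognition. The library choices (sentence-transformers, scikit-learn), the model name, and the sample transcripts are illustrative stand-ins, not the authors' actual pipeline:

```python
# Hypothetical sketch: embed transcribed verbal reports, cluster them,
# and extract keywords. All model/library choices are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

transcripts = [
    "I repeated the words in my head over and over.",
    "I made up a story linking the items together.",
    "I grouped the words by category, like animals and tools.",
]

# Vectorisation: one dense embedding per verbal report
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(transcripts)

# Clustering: group reports by the strategy they describe
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# Keyword extraction: highest-weighted TF-IDF terms per report
tfidf = TfidfVectorizer(stop_words="english")
weights = tfidf.fit_transform(transcripts)
terms = tfidf.get_feature_names_out()
for i, row in enumerate(weights.toarray()):
    top = terms[row.argsort()[-3:][::-1]]
    print(f"report {i}: cluster {labels[i]}, keywords {list(top)}")
```

The browser-side plugin itself is JavaScript; the sketch above covers only the downstream text analysis in Python.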
This is an in-person presentation on July 20, 2024 (14:00 ~ 14:20 CEST).
Dr. Tehilla Ostrovsky
Chris Donkin
In this presentation, we argue that verbal reports, often overlooked due to their perceived subjectivity and the inefficiency of analyzing them at scale, can be invaluable for understanding decision-making processes. Drawing on Mechera-Ostrovsky’s framework, we demonstrate that such reports can validate the formal components of cognitive models, as well as probe their more implicit assumptions. To make the collection and analysis of verbal reports more efficient, we introduce a new, user-friendly platform, integrated into the jsPsych library, that uses advanced machine-learning methods to capture and automatically analyze verbal reports collected during an experiment. We demonstrate the capabilities of this platform through a case study on a memory task, providing a detailed explanation of our data-evaluation process: speech recognition, auto-summarization, and text vectorization. We will show how the text-vectorization step converts text summaries into high-dimensional vectors, which in turn enables numerical methods such as clustering, hypothesis testing, and visualization. Our case study serves as a foundational illustration, presenting a flexible structure that can be conveniently adapted to different pipelines, tasks, and applications. Overall, our approach provides a scalable and accessible way to translate qualitative data into quantitative data, opening up new options for how verbal reports can be used in cognitive research.
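A sketch of the numerical stage this abstract describes, assuming summaries have already been produced by the speech-recognition and auto-summarization steps. The embedding model, the two-condition example data, and the choice of PCA plus a t-test are illustrative assumptions, not the authors' method:

```python
# Hypothetical sketch: vectorize report summaries, visualize them in 2D,
# and test whether two groups of reports separate. All specifics assumed.
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

summaries_a = ["Rehearsed the list silently.", "Repeated each word twice."]
summaries_b = ["Built a mental image for each item.", "Imagined a scene with the objects."]

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(summaries_a + summaries_b)

# Visualization: project the high-dimensional vectors into 2D
xy = PCA(n_components=2).fit_transform(vecs)
plt.scatter(xy[:2, 0], xy[:2, 1], label="rehearsal reports")
plt.scatter(xy[2:, 0], xy[2:, 1], label="imagery reports")
plt.legend()
plt.show()

# Hypothesis testing: do the groups differ along the first component?
t, p = ttest_ind(xy[:2, 0], xy[2:, 0])
print(f"t = {t:.2f}, p = {p:.3f}")
```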
This is an in-person presentation on July 20, 2024 (14:20 ~ 14:40 CEST).
Dr. Lukasz Walasek
Prof. Gordon Brown
People care about their status in society. But how do they define status? What features and attributes does a high-status individual have? Traditional self-report methods are not well suited for uncovering the multidimensional and often implicit meaning, or meanings, of status. We therefore used natural language processing (NLP) techniques to address these questions. We used Word2Vec embeddings to predict status ratings (obtained from 161 participants) for 350 globally recognized names from the Pantheon 1.0 dataset, achieving a correlation of .65 between actual and predicted status ratings. We also explored the personality traits associated with perceived high status by multiplying the model's weight vector with the embedding representations of trait words. The “Intellect” construct (Goldberg, 1990) had the highest similarity, suggesting that knowledge and intelligence are key perceived indicators of status for our (student) participants. Moreover, using a bottom-up approach that measured the similarity between the weight vector and 10,000 common English words, we extracted the 100 adjectives most semantically related to status. Three clusters were identified: culture and art (e.g., cultural, musical, and classic), math and technology (e.g., mathematical, technological, and physical), and nationalities/races (e.g., Indian, Asian, and Brazilian). We will also report results from an ongoing study that recruits a broader group of participants from Prolific, who will be asked to describe "high status" individuals in their own words rather than rating given names. This study aims to explore how status is evaluated more generally.
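A hedged sketch of the embedding-based analysis this abstract outlines: fit a linear model from name embeddings to status ratings, then score trait words against the learned weight vector. The pretrained embeddings are real (gensim's Google News Word2Vec), but the names, ratings, and regression choice below are placeholder assumptions, not the study's data:

```python
# Hypothetical sketch of predicting status ratings from Word2Vec embeddings.
# Names, ratings, and the Ridge regressor are illustrative stand-ins.
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import Ridge

w2v = api.load("word2vec-google-news-300")  # pretrained Word2Vec embeddings

# Placeholder stand-ins for the 350 Pantheon names and their mean ratings
names = ["Einstein", "Beethoven", "Shakespeare", "Napoleon"]
ratings = np.array([9.1, 8.4, 8.8, 7.9])

X = np.stack([w2v[n] for n in names])  # assumes each name is in vocabulary
model = Ridge().fit(X, ratings)

# Project trait words onto the learned "status direction" (weight vector)
for trait in ["intelligent", "creative", "wealthy", "kind"]:
    print(trait, float(model.coef_ @ w2v[trait]))
```

The same weight vector can be scored against a large word list (here it would be the 10,000 common English words) to recover the adjectives most related to status, as the abstract describes.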
This is an in-person presentation on July 20, 2024 (14:40 ~ 15:00 CEST).
Shuning Jin
Qiong Zhang
We compare storytelling in GPT-3.5, a recent large language model, with human storytelling. Although GPT models can solve novel and challenging tasks and match human-level performance, it is not well understood whether GPT processes information in the same way humans do. We hypothesized that GPT differs from humans in the kind of memories it possesses, and thus could perform differently on memory-influenced tasks such as storytelling. Storytelling is an important task for comparison as GPT becomes an increasingly popular writing and narrative tool. We used an existing dataset of human stories, either recalled or imagined (Sap et al., 2022), and generated GPT stories with prompts designed to align with the human instructions. We found that GPT’s stories followed the common narrative flow of the story prompt (analogous to semantic memory in humans) more than details occurring in the specific context of the event (analogous to episodic memory in humans). Furthermore, despite lacking episodic details, GPT-generated stories exhibited language with greater word affect (valence, arousal, and dominance). When provided with examples of human stories (through few-shot prompting), GPT was unable to align its stories’ narrative flow with human recalled stories, nor did its stories match the affective qualities of either imagined or recalled human stories. We discuss these results in relation to GPT’s training data as well as the way it was trained.
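An illustrative few-shot prompting setup of the general kind described here, using the OpenAI chat API. The example stories, prompt wording, and temperature are placeholders, not the study's materials:

```python
# Hypothetical few-shot prompt: show the model human recalled stories,
# then ask for a story in the same style. Prompt content is assumed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

examples = [
    "Example story (recalled): Last summer my sister's wedding was rained out...",
    "Example story (recalled): The day I got my first job offer, I was sitting...",
]
prompt = (
    "Here are stories people wrote about events they experienced:\n\n"
    + "\n\n".join(examples)
    + "\n\nWrite a similar story about a memorable dinner with friends."
)

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,
)
print(resp.choices[0].message.content)
```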
This is an in-person presentation on July 20, 2024 (15:00 ~ 15:20 CEST).