Abstract: Multiple-choice question answering (MCQA) is easy to evaluate but adds a meta-task: models must both solve the problem and output the symbol that *represents* the answer, conflating reasoning errors with symbol-binding failures. We study how language models implement MCQA internally using representational analyses (PCA, linear probes) as well as causal interventions. We find that option-boundary (newline) residual states often contain strong, linearly decodable signals related to per-option correctness. Winner-identity probing reveals a two-stage progression: the winning *content position* becomes decodable immediately after the final option is processed, while the *output symbol* is represented closer to the answer emission position. Tests under symbol and content permutations support a two-stage mechanism in which models first select a winner in content space and then bind or route that winner to the appropriate symbol to emit.
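
A minimal sketch of the kind of per-option correctness probe this abstract describes, assuming residual-stream activations at each option's newline position have already been cached; the file names, shapes, and labels below are illustrative placeholders, not the paper's actual setup.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X: (n_options_total, d_model) residual states at option-boundary newlines (hypothetical cache)
# y: (n_options_total,) binary labels, 1 if that option is the correct answer (hypothetical labels)
X = np.load("newline_residuals.npy")
y = np.load("option_correct.npy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit a linear probe and report held-out accuracy as a measure of linear decodability
probe = LogisticRegression(max_iter=1000)
probe.fit(X_tr, y_tr)
print(f"held-out probe accuracy: {probe.score(X_te, y_te):.3f}")
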
Abstract: Our goal is to study how LLMs represent and interpret plural reference in ambiguous and unambiguous contexts. We ask the following research questions: (1) Do LLMs exhibit human-like preferences in representing plural reference? (2) Are LLMs able to detect ambiguity in plural anaphoric expressions and identify possible referents? To address these questions, we design a set of experiments examining pronoun production via next-token prediction tasks, and pronoun interpretation and ambiguity detection via different prompting strategies. We then assess how comparable LLMs are to humans in formulating and interpreting plural reference. We find that LLMs are sometimes aware of possible referents of ambiguous pronouns. However, they do not always match human preferences when choosing between interpretations, especially when a possible interpretation is not explicitly mentioned. In addition, they struggle to identify ambiguity without direct instruction. Our findings also reveal inconsistencies in the results across different types of experiments.
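
A minimal sketch of a next-token pronoun-production measurement of the sort this abstract mentions, assuming a HuggingFace causal LM; the model name, prompt, and pronoun candidates are illustrative, not those used in the study.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "John met Mary at the station, and then"
candidates = [" they", " he", " she"]  # leading space matters for BPE tokenizers

with torch.no_grad():
    logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

for cand in candidates:
    ids = tok.encode(cand)
    # score only single-token candidates to keep the comparison simple
    if len(ids) == 1:
        print(f"P({cand!r} | prompt) = {probs[ids[0]].item():.4f}")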

Abstract: Vague quantifiers such as "a few" and "many" are influenced by many contextual factors, including how many objects are present in a given context. In this work, we evaluate the extent to which vision-and-language models (VLMs) are compatible with humans when producing or judging the appropriateness of vague quantifiers in visual contexts. We release a novel dataset, VAQUUM, containing 20,300 human ratings on quantified statements across a total of 1,089 images. Using this dataset, we compare human judgments and VLM predictions using three different evaluation methods. Our findings show that VLMs, like humans, are influenced by object counts in vague quantifier use. However, we find significant inconsistencies across models in different evaluation settings, suggesting that judging and producing vague quantifiers rely on two different processes.
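
A minimal sketch of one plausible human-model comparison of the kind this abstract alludes to: correlating mean human appropriateness ratings with per-statement model scores. The file names and score definitions are assumptions for illustration, not the paper's actual evaluation pipeline.

import numpy as np
from scipy.stats import spearmanr

# Hypothetical arrays, one entry per (image, quantified statement) pair
human_ratings = np.load("vaquum_mean_human_ratings.npy")  # mean human appropriateness ratings
vlm_scores = np.load("vlm_statement_scores.npy")          # model appropriateness scores

rho, p = spearmanr(human_ratings, vlm_scores)
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")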