Raffaella Bernardi

CIMeC - Center for Mind/Brain Sciences, University of Trento

All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark
Feb 24, 2025

Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests
Feb 20, 2025

The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It
Feb 17, 2025

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
Jun 26, 2024

Learning to Ask Informative Questions: Enhancing LLMs with Preference Optimization and Expected Information Gain
Jun 25, 2024

A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences
Jun 17, 2024

Looking for Confirmations: An Effective and Human-Like Visual Dialogue Strategy
Sep 11, 2021

Overprotective Training Environments Fall Short at Testing Time: Let Models Contribute to Their Own Training
Mar 30, 2021

The Interplay of Task Success and Dialogue Quality: An in-depth Evaluation in Task-Oriented Visual Dialogues
Mar 20, 2021

Psycholinguistics meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering
Jun 10, 2019