David Schlangen

Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests

Feb 20, 2025

Plant in Cupboard, Orange on Table, Book on Shelf. Benchmarking Practical Reasoning and Situation Modelling in a Text-Simulated Situated Environment

Feb 17, 2025

Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models

Feb 17, 2025

Incremental Dialogue Management: Survey, Discussion, and Implications for HRI

Jan 01, 2025

Towards No-Code Programming of Cobots: Experiments with Code Synthesis by Large Code Models for Conversational Programming

Sep 18, 2024

The Unreasonable Ineffectiveness of Nucleus Sampling on Mitigating Text Memorization

Aug 29, 2024

LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation

Jul 18, 2024

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Jun 26, 2024

Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models

Jun 20, 2024

How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics

Jun 20, 2024