Philipp Mondorf

Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination

Oct 24, 2024
Via arXiv

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models

Oct 02, 2024
Via arXiv

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Jun 26, 2024
Via arXiv

Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models

Jun 18, 2024
Via arXiv

Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey

Apr 02, 2024
Via arXiv

Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning

Feb 20, 2024
Via arXiv