Lucy G. Cheke

A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment

Oct 30, 2024
Via arXiv

Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers

Oct 15, 2024
Via arXiv

100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances

Sep 05, 2024
Via arXiv

Animal-AI 3: What's New & Why You Should Care

Dec 18, 2023
Via arXiv