Elizabeth Barnes

Shammie

RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts

Nov 22, 2024
Via arXiv

Evaluating Language-Model Agents on Realistic Autonomous Tasks

Jan 04, 2024
Via arXiv

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022
Via arXiv

Evaluating Large Language Models Trained on Code

Jul 14, 2021
Via arXiv

Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings

Mar 12, 2019
Via arXiv