Kyle Richardson

Shammie

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

Sep 11, 2024
Via arXiv

SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

Jun 07, 2024
Via arXiv

TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation

Feb 08, 2024
Via arXiv

OLMo: Accelerating the Science of Language Models

Feb 07, 2024
Via arXiv

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Jan 31, 2024
Via arXiv

Paloma: A Benchmark for Evaluating Language Model Fit

Dec 16, 2023
Via arXiv

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dec 15, 2023
Via arXiv

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena

Oct 09, 2023
Via arXiv

Language Models with Rationality

May 23, 2023
Via arXiv

DISCO: Distilling Phrasal Counterfactuals with Large Language Models

Dec 20, 2022
Via arXiv