Kyle Richardson

Shammie

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Feb 03, 2025
Via arXiv

Understanding the Logic of Direct Preference Alignment through Logic

Dec 23, 2024
Via arXiv

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

Sep 11, 2024
Via arXiv

SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

Jun 07, 2024
Via arXiv

TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation

Feb 08, 2024
Via arXiv

OLMo: Accelerating the Science of Language Models

Feb 07, 2024
Via arXiv

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Jan 31, 2024
Via arXiv

Paloma: A Benchmark for Evaluating Language Model Fit

Dec 16, 2023
Via arXiv

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dec 15, 2023
Via arXiv

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena

Oct 09, 2023
Via arXiv