Erik Jones

How Do Large Language Monkeys Get Their Power (Laws)?

Feb 24, 2025
Via arXiv

Forecasting Rare Language Model Behaviors

Feb 24, 2025
Via arXiv

Best-of-N Jailbreaking

Dec 04, 2024
Via arXiv

Adversaries Can Misuse Combinations of Safe Models

Jun 20, 2024
Via arXiv

Feedback Loops With Language Models Drive In-Context Reward Hacking

Feb 09, 2024
Via arXiv

Orca 2: Teaching Small Language Models How to Reason

Nov 21, 2023
Via arXiv

Teaching Language Models to Hallucinate Less with Synthetic Tasks

Oct 10, 2023
Via arXiv

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Sep 26, 2023
Via arXiv

Mass-Producing Failures of Multimodal Systems with Language Models

Jun 21, 2023
Via arXiv

Automatically Auditing Large Language Models via Discrete Optimization

Mar 08, 2023
Via arXiv