Johannes Heidecke

Tony

PaperBench: Evaluating AI's Ability to Replicate AI Research

Apr 02, 2025
Via arXiv

SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

Feb 19, 2025
Via arXiv

Trading Inference-Time Compute for Adversarial Robustness

Jan 31, 2025
Via arXiv

Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning

Dec 24, 2024
Via arXiv

OpenAI o1 System Card

Dec 21, 2024
Via arXiv

Deliberative Alignment: Reasoning Enables Safer Language Models

Dec 20, 2024
Via arXiv

Rule Based Rewards for Language Model Safety

Nov 02, 2024
Via arXiv

GPT-4o System Card

Oct 25, 2024
Via arXiv

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Apr 19, 2024
Via arXiv

Text and Code Embeddings by Contrastive Pre-Training

Jan 24, 2022
Via arXiv