Boaz Barak

Stress Testing Deliberative Alignment for Anti-Scheming Training

Sep 19, 2025
Via arXiv

Trading Inference-Time Compute for Adversarial Robustness

Jan 31, 2025
Via arXiv

OpenAI o1 System Card

Dec 21, 2024
Via arXiv

Deliberative Alignment: Reasoning Enables Safer Language Models

Dec 20, 2024
Via arXiv

An Economic Solution to Copyright Challenges of Generative AI

Apr 24, 2024
Via arXiv

Distinguishing the Knowable from the Unknowable with Language Models

Feb 05, 2024
Via arXiv

Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models

Nov 15, 2023
Via arXiv

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

Jun 14, 2023
Via arXiv

Scaling Data-Constrained Language Models

May 25, 2023
Via arXiv

Provable Copyright Protection for Generative Models

Feb 21, 2023
Via arXiv