Nicholas Carlini

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Jan 17, 2026
Via arXiv

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

Oct 08, 2025
Via arXiv

Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks

Oct 02, 2025
Via arXiv

LLMs unlock new paths to monetizing exploits

May 16, 2025
Via arXiv

Defeating Prompt Injections by Design

Mar 24, 2025
Via arXiv

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses

Mar 03, 2025
Via arXiv

Adversarial ML Problems Are Getting Harder to Solve and to Evaluate

Feb 04, 2025
Via arXiv

Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

Jan 13, 2025
Via arXiv

On Evaluating the Durability of Safeguards for Open-Weight LLMs

Dec 10, 2024
Via arXiv

SoK: Watermarking for AI-Generated Content

Nov 27, 2024
Via arXiv