Nicholas Carlini

Defeating Prompt Injections by Design

Mar 24, 2025

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses

Mar 03, 2025

Adversarial ML Problems Are Getting Harder to Solve and to Evaluate

Feb 04, 2025

Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

Jan 13, 2025

On Evaluating the Durability of Safeguards for Open-Weight LLMs

Dec 10, 2024

SoK: Watermarking for AI-Generated Content

Nov 27, 2024

Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust

Nov 22, 2024

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models

Nov 15, 2024

Stealing User Prompts from Mixture of Experts

Oct 30, 2024

Remote Timing Attacks on Efficient Language Model Inference

Oct 22, 2024