Florian Tramèr

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses

Mar 03, 2025

Adversarial ML Problems Are Getting Harder to Solve and to Evaluate

Feb 04, 2025

International AI Safety Report

Jan 29, 2025

Consistency Checks for Language Model Forecasters

Dec 24, 2024

Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust

Nov 22, 2024

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models

Nov 15, 2024

Persistent Pre-Training Poisoning of LLMs

Oct 17, 2024

Gradient-based Jailbreak Images for Multimodal Fusion Models

Oct 04, 2024

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Sep 29, 2024

An Adversarial Perspective on Machine Unlearning for AI Safety

Sep 26, 2024