Picture for Florian Tramèr

Florian Tramèr

Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust

Add code
Nov 22, 2024
Viaarxiv icon

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models

Add code
Nov 15, 2024
Viaarxiv icon

Persistent Pre-Training Poisoning of LLMs

Add code
Oct 17, 2024
Viaarxiv icon

Gradient-based Jailbreak Images for Multimodal Fusion Models

Add code
Oct 04, 2024
Viaarxiv icon

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Add code
Sep 29, 2024
Viaarxiv icon

An Adversarial Perspective on Machine Unlearning for AI Safety

Add code
Sep 26, 2024
Viaarxiv icon

Extracting Training Data from Document-Based VQA Models

Add code
Jul 11, 2024
Viaarxiv icon

Adversarial Search Engine Optimization for Large Language Models

Add code
Jun 26, 2024
Viaarxiv icon

Blind Baselines Beat Membership Inference Attacks for Foundation Models

Add code
Jun 23, 2024
Viaarxiv icon

AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents

Add code
Jun 19, 2024
Viaarxiv icon