Florian Tramèr

Consistency Checks for Language Model Forecasters

Dec 24, 2024
Via arXiv

Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust

Nov 22, 2024
Via arXiv

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models

Nov 15, 2024
Via arXiv

Persistent Pre-Training Poisoning of LLMs

Oct 17, 2024
Via arXiv

Gradient-based Jailbreak Images for Multimodal Fusion Models

Oct 04, 2024
Via arXiv

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Sep 29, 2024
Via arXiv

An Adversarial Perspective on Machine Unlearning for AI Safety

Sep 26, 2024
Via arXiv

Extracting Training Data from Document-Based VQA Models

Jul 11, 2024
Via arXiv

Adversarial Search Engine Optimization for Large Language Models

Jun 26, 2024
Via arXiv

Blind Baselines Beat Membership Inference Attacks for Foundation Models

Jun 23, 2024
Via arXiv