Florian Tramèr

Design Patterns for Securing LLM Agents against Prompt Injections

Jun 11, 2025
Via arXiv

Membership Inference Attacks on Sequence Models

Jun 05, 2025
Via arXiv

RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics

May 18, 2025
Via arXiv

LLMs unlock new paths to monetizing exploits

May 16, 2025
Via arXiv

The Jailbreak Tax: How Useful are Your Jailbreak Outputs?

Apr 14, 2025
Via arXiv

Defeating Prompt Injections by Design

Mar 24, 2025
Via arXiv

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses

Mar 03, 2025
Via arXiv

Adversarial ML Problems Are Getting Harder to Solve and to Evaluate

Feb 04, 2025
Via arXiv

International AI Safety Report

Jan 29, 2025
Via arXiv

Consistency Checks for Language Model Forecasters

Dec 24, 2024
Via arXiv