Jonas Geiping

A Realistic Threat Model for Large Language Model Jailbreaks

Oct 21, 2024

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

Jun 14, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024

Transformers Can Do Arithmetic with the Right Embeddings

May 27, 2024

LMD3: Language Model Data Density Dependence

May 10, 2024

Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

Apr 01, 2024

Measuring Style Similarity in Diffusion Models

Apr 01, 2024

Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion

Mar 25, 2024

What do we learn from inverting CLIP models?

Mar 05, 2024

Coercing LLMs to do and reveal anything

Feb 21, 2024