Jonathan Nöther

Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints

Jan 14, 2025
Via arXiv

Implicit Poisoning Attacks in Two-Agent Reinforcement Learning: Adversarial Policies for Training-Time Attacks

Feb 27, 2023
Via arXiv