Jonathan Nöther

MaMa: A Game-Theoretic Approach for Designing Safe Agentic Systems

Feb 04, 2026
Via arXiv

AgenticRed: Optimizing Agentic Systems for Automated Red-teaming

Jan 20, 2026
Via arXiv

Policy Teaching via Data Poisoning in Learning from Human Preferences

Mar 13, 2025
Via arXiv

Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints

Jan 14, 2025
Via arXiv

Implicit Poisoning Attacks in Two-Agent Reinforcement Learning: Adversarial Policies for Training-Time Attacks

Feb 27, 2023
Via arXiv