
Patrick McDaniel

On the Robustness Tradeoff in Fine-Tuning

Mar 19, 2025
Via arXiv

Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning

Mar 03, 2025
Via arXiv

Alignment and Adversarial Robustness: Are More Human-Like Models More Secure?

Feb 17, 2025
Via arXiv

Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs

Jan 27, 2025
Via arXiv

Err on the Side of Texture: Texture Bias on Real Data

Dec 13, 2024
Via arXiv

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

Oct 14, 2024
Via arXiv

On Synthetic Texture Datasets: Challenges, Creation, and Curation

Sep 16, 2024
Via arXiv

Explorations in Texture Learning

Mar 14, 2024
Via arXiv

A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems

Feb 28, 2024
Via arXiv

Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

Feb 27, 2024
Via arXiv