Patrick McDaniel

Alignment and Adversarial Robustness: Are More Human-Like Models More Secure?

Feb 17, 2025
Via arXiv

Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs

Jan 27, 2025
Via arXiv

Err on the Side of Texture: Texture Bias on Real Data

Dec 13, 2024
Via arXiv

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

Oct 14, 2024
Via arXiv

On Synthetic Texture Datasets: Challenges, Creation, and Curation

Sep 16, 2024
Via arXiv

Explorations in Texture Learning

Mar 14, 2024
Via arXiv

A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems

Feb 28, 2024
Via arXiv

Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

Feb 27, 2024
Via arXiv

The Efficacy of Transformer-based Adversarial Attacks in Security Domains

Oct 17, 2023
Via arXiv

The Space of Adversarial Strategies

Sep 09, 2022
Via arXiv