Matt Fredrikson

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Oct 11, 2024
Via arXiv

Improving Alignment and Robustness with Circuit Breakers

Jun 10, 2024
Via arXiv

Sales Whisperer: A Human-Inconspicuous Attack on LLM Brand Recommendations

Jun 07, 2024
Via arXiv

Improving Alignment and Robustness with Short Circuiting

Jun 06, 2024
Via arXiv

VeriSplit: Secure and Practical Offloading of Machine Learning Inferences across IoT Devices

Jun 02, 2024
Via arXiv

Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

May 15, 2024
Via arXiv

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

Nov 22, 2023
Via arXiv

Is Certifying $\ell_p$ Robustness Still Worthwhile?

Oct 13, 2023
Via arXiv

Representation Engineering: A Top-Down Approach to AI Transparency

Oct 10, 2023
Via arXiv

A Recipe for Improved Certifiable Robustness: Capacity and Data

Oct 04, 2023
Via arXiv