Alexander Robey

Jailbreaking LLM-Controlled Robots

Oct 17, 2024
Via arXiv

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024
Via arXiv

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Mar 28, 2024
Via arXiv

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024
Via arXiv

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Feb 28, 2024
Via arXiv

Data-Driven Modeling and Verification of Perception-Based Autonomous Systems

Dec 11, 2023
Via arXiv

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Oct 13, 2023
Via arXiv

Jailbreaking Black Box Large Language Models in Twenty Queries

Oct 13, 2023
Via arXiv

Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Jun 19, 2023
Via arXiv

Probable Domain Generalization via Quantile Risk Minimization

Jul 20, 2022
Via arXiv