Javier Rando

Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations
Nov 15, 2024

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
Nov 15, 2024

Persistent Pre-Training Poisoning of LLMs
Oct 17, 2024

Gradient-based Jailbreak Images for Multimodal Fusion Models
Oct 04, 2024

An Adversarial Perspective on Machine Unlearning for AI Safety
Sep 26, 2024

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Jun 12, 2024

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
Apr 22, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Apr 15, 2024

Universal Jailbreak Backdoors from Poisoned Human Feedback
Nov 24, 2023

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation
Nov 06, 2023