Rylan Schaeffer

No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data

Feb 26, 2025
Via arXiv

How Do Large Language Monkeys Get Their Power (Laws)?

Feb 24, 2025
Via arXiv

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks

Feb 24, 2025
Via arXiv

Best-of-N Jailbreaking

Dec 04, 2024
Via arXiv

Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach

Dec 03, 2024
Via arXiv

ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment

Oct 23, 2024
Via arXiv

Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World

Oct 22, 2024
Via arXiv

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

Jul 21, 2024
Via arXiv

Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models

Jun 20, 2024
Via arXiv

In-Context Learning of Energy Functions

Jun 18, 2024
Via arXiv