Rylan Schaeffer

Best-of-N Jailbreaking

Dec 04, 2024
Via arXiv

Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach

Dec 03, 2024
Via arXiv

ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment

Oct 23, 2024
Via arXiv

Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World

Oct 22, 2024
Via arXiv

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

Jul 21, 2024
Via arXiv

Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models

Jun 20, 2024
Via arXiv

In-Context Learning of Energy Functions

Jun 18, 2024
Via arXiv

Quantifying Variance in Evaluation Benchmarks

Jun 14, 2024
Via arXiv

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

Jun 13, 2024
Via arXiv

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Jun 06, 2024
Via arXiv