Valentina Pyatkin

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Oct 24, 2024

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

Oct 22, 2024

Diverging Preferences: When do Annotators Disagree and do Models Know?

Oct 18, 2024

Explicating the Implicit: Argument Detection Beyond Sentence Boundaries

Aug 08, 2024

Self-Directed Synthetic Dialogues and Revisions Technical Report

Jul 25, 2024

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Jun 13, 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Jun 07, 2024

Superlatives in Context: Explicit and Implicit Domain Restrictions for Superlative Frames

May 31, 2024

RewardBench: Evaluating Reward Models for Language Modeling

Mar 20, 2024

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

Feb 26, 2024