Sachin Kumar

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Oct 24, 2024

ComPO: Community Preferences for Language Model Personalization

Oct 21, 2024

Overriding Safety Protections of Open-Source Models

Sep 28, 2024

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

Jul 11, 2024

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Jun 26, 2024

RewardBench: Evaluating Reward Models for Language Modeling

Mar 20, 2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Jan 31, 2024

What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization

Nov 16, 2023

Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions

Nov 13, 2023

Minding Language Models' Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker

Jun 01, 2023