Sachin Kumar

Automated Testing of COBOL to Java Transformation

Apr 14, 2025

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

Apr 09, 2025

Steering off Course: Reliability Challenges in Steering Language Models

Apr 06, 2025

TESS 2: A Large-Scale Generalist Diffusion Language Model

Feb 19, 2025

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Oct 24, 2024

ComPO: Community Preferences for Language Model Personalization

Oct 21, 2024

Overriding Safety protections of Open-source Models

Sep 28, 2024

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

Jul 11, 2024

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Jun 26, 2024

RewardBench: Evaluating Reward Models for Language Modeling

Mar 20, 2024