
Faeze Brahman

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Nov 22, 2024

RESTOR: Knowledge Recovery through Machine Unlearning

Oct 31, 2024

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Oct 24, 2024

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

Sep 26, 2024

AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

Sep 13, 2024

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

Jul 25, 2024

How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models

Jun 29, 2024

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Jun 26, 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Jun 07, 2024

Information-Theoretic Distillation for Reference-less Summarization

Mar 20, 2024