Nino Vieillard

On Teacher Hacking in Language Model Distillation

Feb 04, 2025
Via arXiv

Loss Functions and Operators Generated by f-Divergences

Jan 30, 2025
Via arXiv

Imitating Language via Scalable Inverse Reinforcement Learning

Sep 02, 2024
Via arXiv

Gemma 2: Improving Open Language Models at a Practical Size

Aug 02, 2024
Via arXiv

BOND: Aligning LLMs with Best-of-N Distillation

Jul 19, 2024
Via arXiv

WARP: On the Benefits of Weight Averaged Rewarded Policies

Jun 24, 2024
Via arXiv

WARM: On the Benefits of Weight Averaged Reward Models

Jan 22, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models

Jun 23, 2023
Via arXiv

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

May 31, 2023
Via arXiv