Picture for Sertan Girgin

Sertan Girgin

Diversity-Rewarded CFG Distillation

Add code
Oct 08, 2024
Viaarxiv icon

Gemma 2: Improving Open Language Models at a Practical Size

Add code
Aug 02, 2024
Figure 1 for Gemma 2: Improving Open Language Models at a Practical Size
Figure 2 for Gemma 2: Improving Open Language Models at a Practical Size
Figure 3 for Gemma 2: Improving Open Language Models at a Practical Size
Figure 4 for Gemma 2: Improving Open Language Models at a Practical Size
Viaarxiv icon

BOND: Aligning LLMs with Best-of-N Distillation

Add code
Jul 19, 2024
Figure 1 for BOND: Aligning LLMs with Best-of-N Distillation
Figure 2 for BOND: Aligning LLMs with Best-of-N Distillation
Figure 3 for BOND: Aligning LLMs with Best-of-N Distillation
Figure 4 for BOND: Aligning LLMs with Best-of-N Distillation
Viaarxiv icon

WARP: On the Benefits of Weight Averaged Rewarded Policies

Add code
Jun 24, 2024
Figure 1 for WARP: On the Benefits of Weight Averaged Rewarded Policies
Figure 2 for WARP: On the Benefits of Weight Averaged Rewarded Policies
Figure 3 for WARP: On the Benefits of Weight Averaged Rewarded Policies
Figure 4 for WARP: On the Benefits of Weight Averaged Rewarded Policies
Viaarxiv icon

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Add code
Apr 11, 2024
Figure 1 for RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Figure 2 for RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Figure 3 for RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Figure 4 for RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Viaarxiv icon

Gemma: Open Models Based on Gemini Research and Technology

Add code
Mar 13, 2024
Figure 1 for Gemma: Open Models Based on Gemini Research and Technology
Figure 2 for Gemma: Open Models Based on Gemini Research and Technology
Figure 3 for Gemma: Open Models Based on Gemini Research and Technology
Figure 4 for Gemma: Open Models Based on Gemini Research and Technology
Viaarxiv icon

MusicRL: Aligning Music Generation to Human Preferences

Add code
Feb 06, 2024
Viaarxiv icon

Gemini: A Family of Highly Capable Multimodal Models

Add code
Dec 19, 2023
Viaarxiv icon

Nash Learning from Human Feedback

Add code
Dec 06, 2023
Viaarxiv icon

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

Add code
May 31, 2023
Viaarxiv icon