Picture for Sabela Ramos

Sabela Ramos

Gemma 2: Improving Open Language Models at a Practical Size

Add code
Aug 02, 2024
Figure 1 for Gemma 2: Improving Open Language Models at a Practical Size
Figure 2 for Gemma 2: Improving Open Language Models at a Practical Size
Figure 3 for Gemma 2: Improving Open Language Models at a Practical Size
Figure 4 for Gemma 2: Improving Open Language Models at a Practical Size
Viaarxiv icon

BOND: Aligning LLMs with Best-of-N Distillation

Add code
Jul 19, 2024
Figure 1 for BOND: Aligning LLMs with Best-of-N Distillation
Figure 2 for BOND: Aligning LLMs with Best-of-N Distillation
Figure 3 for BOND: Aligning LLMs with Best-of-N Distillation
Figure 4 for BOND: Aligning LLMs with Best-of-N Distillation
Viaarxiv icon

Gemini: A Family of Highly Capable Multimodal Models

Add code
Dec 19, 2023
Viaarxiv icon

GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models

Add code
Jun 23, 2023
Viaarxiv icon

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

Add code
May 31, 2023
Viaarxiv icon

RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning

Add code
Nov 04, 2021
Figure 1 for RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
Figure 2 for RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
Figure 3 for RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
Figure 4 for RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
Viaarxiv icon

Hyperparameter Selection for Imitation Learning

Add code
May 25, 2021
Figure 1 for Hyperparameter Selection for Imitation Learning
Figure 2 for Hyperparameter Selection for Imitation Learning
Figure 3 for Hyperparameter Selection for Imitation Learning
Figure 4 for Hyperparameter Selection for Imitation Learning
Viaarxiv icon

Reverb: A Framework For Experience Replay

Add code
Feb 09, 2021
Figure 1 for Reverb: A Framework For Experience Replay
Figure 2 for Reverb: A Framework For Experience Replay
Figure 3 for Reverb: A Framework For Experience Replay
Figure 4 for Reverb: A Framework For Experience Replay
Viaarxiv icon