Yannis Flet-Berliac

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier

Dec 05, 2024
Via arXiv

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Jun 27, 2024
Via arXiv

Averaging log-likelihoods in direct alignment

Jun 27, 2024
Via arXiv

OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators

May 27, 2024
Via arXiv

PASTA: Pretrained Action-State Transformer Agents

Jul 20, 2023
Via arXiv

Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

Jun 24, 2023
Via arXiv

Model-based Offline Reinforcement Learning with Local Misspecification

Jan 26, 2023
Via arXiv

Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data

Oct 16, 2022
Via arXiv

Offline Policy Optimization with Eligible Actions

Jul 01, 2022
Via arXiv

SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics

Apr 20, 2022
Via arXiv