Rishabh Joshi

Evolving Alignment via Asymmetric Self-Play

Oct 31, 2024

Preference Optimization as Probabilistic Inference

Oct 05, 2024

Building Math Agents with Multi-Turn Iterative Preference Learning

Sep 04, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

LiPO: Listwise Preference Optimization through Learning-to-Rank

Feb 02, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Calibrating Likelihoods towards Consistency in Summarization Models

Oct 12, 2023

Statistical Rejection Sampling Improves Preference Optimization

Sep 13, 2023