Milad Aghajohari

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Oct 02, 2024
Via arXiv

Advantage Alignment Algorithms

Jun 20, 2024
Via arXiv

LOQA: Learning with Opponent Q-Learning Awareness

May 02, 2024
Via arXiv

Best Response Shaping

Apr 05, 2024
Via arXiv

Meta-Value Learning: a General Framework for Learning with Learning Awareness

Jul 17, 2023
Via arXiv

Riemannian Diffusion Models

Aug 16, 2022
Via arXiv