Yunhao Tang

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

Ministral 3

Jan 13, 2026
Via arXiv

Voxtral

Jul 17, 2025
Via arXiv

Magistral

Jun 12, 2025
Via arXiv

On a few pitfalls in KL divergence gradient estimation for RL

Jun 11, 2025
Via arXiv

LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training

May 29, 2025
Via arXiv

RL-finetuning LLMs from on- and off-policy data with a single algorithm

Mar 25, 2025
Via arXiv

Learning to chain-of-thought with Jensen's evidence lower bound

Mar 25, 2025
Via arXiv

Optimizing Language Models for Inference Time Objectives using Reinforcement Learning

Mar 25, 2025
Via arXiv

Soft Policy Optimization: Online Off-Policy RL for Sequence Models

Mar 07, 2025
Via arXiv