
Johan Obando-Ceron

A Comedy of Estimators: On KL Regularization in RL Training of LLMs

Dec 26, 2025
Via arXiv

Grounding Computer Use Agents on Human Demonstrations

Nov 10, 2025
Via arXiv

Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning

Oct 02, 2025
Via arXiv

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Sep 30, 2025
Via arXiv

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Jun 18, 2025
Via arXiv

The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning

Jun 16, 2025
Via arXiv

Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

May 29, 2025
Via arXiv

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

Apr 09, 2025
Via arXiv

Adaptive Computation Pruning for the Forgetting Transformer

Apr 09, 2025
Via arXiv

Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

Mar 24, 2025
Via arXiv