Shaohan Huang

The Era of Agentic Organization: Learning to Organize with Language Models

Oct 30, 2025

Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts

Oct 27, 2025

VibeVoice Technical Report

Aug 26, 2025

VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models

Aug 13, 2025

Geometric-Mean Policy Optimization

Jul 28, 2025

Reasoning with Exploration: An Entropy Perspective

Jun 17, 2025

On-Policy RL with Optimal Reward Baseline

May 29, 2025

Think Only When You Need with Large Hybrid-Reasoning Models

May 21, 2025

Reward Reasoning Model

May 20, 2025

Efficient RL Training for Reasoning Models via Length-Aware Optimization

May 18, 2025