Haosheng Zou

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

Mar 13, 2025
Via arXiv

Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision

Feb 28, 2025
Via arXiv

Reward Shaping via Meta-Learning

Jan 27, 2019
Via arXiv

Understanding Human Behaviors in Crowds by Imitating the Decision-Making Process

Jan 25, 2018
Via arXiv

The YouTube-8M Kaggle Competition: Challenges and Methods

Jul 13, 2017
Via arXiv