Dian Yu

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning

Jan 26, 2026

Stable and Efficient Single-Rollout RL for Multimodal Reasoning

Dec 20, 2025

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Oct 02, 2025

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

Oct 01, 2025

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Sep 18, 2025

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

Sep 11, 2025

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Aug 27, 2025

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Apr 15, 2025

Safe Flow Matching: Robot Motion Planning with Control Barrier Functions

Apr 11, 2025

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Apr 01, 2025