Dongbin Zhao

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Jun 24, 2025

DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Jun 11, 2025

TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning

May 26, 2025

ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving

May 26, 2025

Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL

May 16, 2025

UncAD: Towards Safe End-to-end Autonomous Driving via Online Map Uncertainty

Apr 17, 2025

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Mar 17, 2025

FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real

Feb 25, 2025

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy

Feb 08, 2025

Dream to Drive with Predictive Individual World Model

Jan 28, 2025