Dongbin Zhao

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Mar 17, 2025
Via arXiv

FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real

Feb 25, 2025
Via arXiv

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy

Feb 08, 2025
Via arXiv

Dream to Drive with Predictive Individual World Model

Jan 28, 2025
Via arXiv

Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model

Dec 22, 2024
Via arXiv

In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning

Dec 12, 2024
Via arXiv

Preliminary Investigation into Data Scaling Laws for Imitation Learning-Based End-to-End Autonomous Driving

Dec 03, 2024
Via arXiv

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

Oct 15, 2024
Via arXiv

SELU: Self-Learning Embodied MLLMs in Unknown Environments

Oct 04, 2024
Via arXiv

Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning

Aug 01, 2024
Via arXiv