Songjun Tu

Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model

Dec 22, 2024
Via arXiv

In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning

Dec 12, 2024
Via arXiv