Picture for Songjun Tu

Songjun Tu

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Add code
Mar 17, 2025
Viaarxiv icon

Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model

Add code
Dec 22, 2024
Viaarxiv icon

In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning

Add code
Dec 12, 2024
Viaarxiv icon