
David Zhu

Rethinking Data Synthesis: A Teacher Model Training Recipe with Interpretation

Oct 27, 2024
Via arXiv

Optimal Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning

Jun 14, 2024
Via arXiv