Picture for David D. Yao

David D. Yao

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Add code
Oct 05, 2024
Viaarxiv icon

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Add code
Sep 17, 2024
Viaarxiv icon

Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

Add code
Sep 12, 2024
Viaarxiv icon

Policy Optimization for Continuous Reinforcement Learning

Add code
Jun 02, 2023
Viaarxiv icon