Kaihui Chen

TSO: Self-Training with Scaled Preference Optimization

Aug 31, 2024
Via arXiv

Towards Comprehensive Preference Data Collection for Reward Modeling

Jun 24, 2024
Via arXiv