Josef Dai

PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models

Jun 20, 2024

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

Jun 20, 2024

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

Feb 20, 2024

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Oct 19, 2023