Rong Bao

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment

Oct 13, 2024

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

Jun 17, 2024

Mitigating Reward Hacking via Information-Theoretic Reward Modeling

Feb 16, 2024

Orthogonal Subspace Learning for Language Model Continual Learning

Oct 22, 2023

Robust Lottery Tickets for Pre-trained Language Models

Nov 06, 2022