Banghua Zhu

How to Evaluate Reward Models for RLHF

Oct 18, 2024
Via arXiv

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Oct 13, 2024
Via arXiv

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

Jun 17, 2024
Via arXiv

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Mar 07, 2024
Via arXiv

Generative AI Security: Challenges and Countermeasures

Feb 20, 2024
Via arXiv

Efficient Prompt Caching via Embedding Similarity

Feb 02, 2024
Via arXiv

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Jan 29, 2024
Via arXiv

Fairness in Serving Large Language Models

Dec 31, 2023
Via arXiv

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

Dec 13, 2023
Via arXiv

Towards Optimal Statistical Watermarking

Dec 13, 2023
Via arXiv