Title: AlignBench: Benchmarking Chinese Alignment of Large Language Models

Dec 05, 2023

Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam (+7 more)

Abstract: Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants. However, effective evaluation of alignment for emerging Chinese LLMs is still significantly lacking, calling for evaluations tailored to alignment that are grounded in real scenarios, open-ended, challenging, and automatic. To fill this gap, we introduce AlignBench, a comprehensive multi-dimensional benchmark for evaluating LLMs' alignment in Chinese. Equipped with a human-in-the-loop data curation pipeline, our benchmark employs a rule-calibrated, multi-dimensional LLM-as-Judge with Chain-of-Thought to generate explanations and final ratings as evaluations, ensuring high reliability and interpretability. Furthermore, we report AlignBench results evaluated by CritiqueLLM, a dedicated Chinese evaluator LLM that recovers 95% of GPT-4's evaluation ability. We will provide public APIs for evaluating AlignBench with CritiqueLLM to facilitate the evaluation of LLMs' Chinese alignment. All evaluation code, data, and LLM generations are available at https://github.com/THUDM/AlignBench.
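To make the "explanations and final ratings" idea concrete, here is a minimal Python sketch of how a multi-dimensional judge reply might be post-processed. It is an illustration only, not the AlignBench implementation. The reply format (free-text explanation followed by a JSON object), the dimension names `correctness` and `clarity`, the `final` key, and the helpers `parse_judgment` and `average_final_rating` are all assumptions made for this example.

```python
import json
import re
from statistics import mean


def parse_judgment(judge_output: str) -> dict:
    """Split a judge reply into its explanation and its ratings.

    Assumes (hypothetically) that the reply is free text followed by a
    single JSON object of per-dimension scores plus a "final" score.
    """
    # Greedy match from the first "{" to the last "}" in the reply.
    # This breaks if the explanation itself contains braces.
    match = re.search(r"\{.*\}", judge_output, flags=re.DOTALL)
    if match is None:
        raise ValueError("no rating object found in judge output")
    ratings = json.loads(match.group(0))
    explanation = judge_output[: match.start()].strip()
    return {"explanation": explanation, "ratings": ratings}


def average_final_rating(judgments: list) -> float:
    """Average the (hypothetical) "final" score over parsed judgments."""
    return mean(j["ratings"]["final"] for j in judgments)


# Hypothetical judge reply: reasoning first, then the scores.
reply = (
    "The answer is accurate but brief.\n"
    '{"correctness": 8, "clarity": 6, "final": 7}'
)
parsed = parse_judgment(reply)
```

Keeping the explanation next to the scores is what lets a reader check why a given rating was assigned, which is the interpretability benefit the abstract refers to.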
