Zhen-Hui Liu

Peer-review-in-LLMs: Automatic Evaluation Method for LLMs in Open-environment

Feb 02, 2024

Kun-Peng Ning, Shuo Yang, Yu-Yang Liu, Jia-Yu Yao, Zhen-Hui Liu, Yu Wang, Ming Pang, Li Yuan

Abstract: Existing evaluation methods for large language models (LLMs) typically test performance on closed-environment, domain-specific benchmarks with human annotations. In this paper, we explore a novel unsupervised evaluation direction that uses peer-review mechanisms to measure LLMs automatically. In this setting, open-source and closed-source LLMs share the same environment, where they answer unlabeled questions and evaluate each other, and each LLM's response score is jointly determined by the other, anonymous models. To obtain the ability hierarchy among these models, we assign each LLM a learnable capability parameter to adjust the final ranking. We formalize this as a constrained optimization problem that aims to maximize the consistency between each LLM's capability and its score. The key assumption is that a higher-level LLM can evaluate others' answers more accurately than a lower-level one, and that a higher-level LLM also achieves higher response scores. Moreover, we propose three metrics, PEN, CIN, and LIS, to evaluate the gap between the resulting rankings and human rankings. We perform experiments on multiple datasets with these metrics, validating the effectiveness of the proposed approach.
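
The consistency idea in the abstract can be illustrated with a small toy sketch. This is not the authors' optimizer: the abstract only states that capabilities are learnable and that the objective is consistency between capability and score. The fixed-point update below (set each capability proportional to the score that model receives), the function names, and the 3x3 review matrix are all assumptions made up for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation, used here as a stand-in consistency measure."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def weighted_scores(review, weights):
    """Score of model j = capability-weighted mean of the scores that the
    OTHER models gave it (self-review is excluded)."""
    n = len(review)
    total = sum(weights)
    return [
        sum(weights[i] * review[i][j] for i in range(n) if i != j)
        / (total - weights[j])
        for j in range(n)
    ]


def fit_capabilities(review, steps=50):
    """Hypothetical fixed-point iteration: start from equal capabilities,
    then repeatedly set each capability proportional to the model's score.
    Assumes all off-diagonal review scores are positive."""
    n = len(review)
    weights = [1.0 / n] * n
    for _ in range(steps):
        scores = weighted_scores(review, weights)
        total = sum(scores)
        weights = [s / total for s in scores]
    return weights, weighted_scores(review, weights)


# Made-up data: review[i][j] is the mean score reviewer i gives to model j.
# Diagonal entries are ignored because models do not review themselves.
review = [
    [0.0, 0.6, 0.3],
    [0.9, 0.0, 0.4],
    [0.7, 0.7, 0.0],
]

weights, scores = fit_capabilities(review)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
consistency = pearson(weights, scores)
print("ranking (best first):", ranking)
print("capability/score consistency:", round(consistency, 3))
```

In this toy setup, reviewers with higher capability have more influence on everyone else's score, which mirrors the stated assumption that stronger models are more reliable judges. A faithful implementation would replace the fixed-point update with the paper's constrained optimization.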

LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples

Oct 04, 2023

Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Li Yuan

Abstract: Large Language Models (LLMs), including GPT-3.5, LLaMA, and PaLM, seem to be knowledgeable and able to adapt to many tasks. However, we still cannot completely trust their answers, since LLMs suffer from hallucination: fabricating non-existent facts in a way that deceives users without their noticing. The reasons for the existence and pervasiveness of hallucinations remain unclear. In this paper, we demonstrate that nonsense prompts composed of random tokens can also elicit hallucinated responses from LLMs. This phenomenon leads us to revisit hallucination as possibly another view of adversarial examples, one that shares similar features with conventional adversarial examples as a basic feature of LLMs. Therefore, we formalize an automatic hallucination-triggering method, the hallucination attack, in an adversarial way. Finally, we explore the basic features of the attacked adversarial prompts and propose a simple yet effective defense strategy. Our code is released on GitHub.
