Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Guanqun Bi

CharacterBench: Benchmarking Character Customization of Large Language Models

Dec 16, 2024

Jinfeng Zhou, Yongkang Huang, Bosi Wen, Guanqun Bi, Yuxuan Chen, Pei Ke, Zhuang Chen, Xiyao Xiao, Libiao Peng, Kuntian Tang(+6 more)

Figure 1 for CharacterBench: Benchmarking Character Customization of Large Language Models

Figure 2 for CharacterBench: Benchmarking Character Customization of Large Language Models

Figure 3 for CharacterBench: Benchmarking Character Customization of Large Language Models

Figure 4 for CharacterBench: Benchmarking Character Customization of Large Language Models

Abstract:Character-based dialogue (aka role-playing) enables users to freely customize characters for interaction, which often relies on LLMs, raising the need to evaluate LLMs' character customization capability. However, existing benchmarks fail to ensure a robust evaluation as they often only involve a single character category or evaluate limited dimensions. Moreover, the sparsity of character features in responses makes feature-focused generative evaluation both ineffective and inefficient. To address these issues, we propose CharacterBench, the largest bilingual generative benchmark, with 22,859 human-annotated samples covering 3,956 characters from 25 detailed character categories. We define 11 dimensions of 6 aspects, classified as sparse and dense dimensions based on whether character features evaluated by specific dimensions manifest in each response. We enable effective and efficient evaluation by crafting tailored queries for each dimension to induce characters' responses related to specific dimensions. Further, we develop CharacterJudge model for cost-effective and stable evaluations. Experiments show its superiority over SOTA automatic judges (e.g., GPT-4) and our benchmark's potential to optimize LLMs' character customization. Our repository is at https://github.com/thu-coai/CharacterBench.

* AAAI 2025

Via

Access Paper or Ask Questions

ToMBench: Benchmarking Theory of Mind in Large Language Models

Feb 23, 2024

Zhuang Chen, Jincenzi Wu, Jinfeng Zhou, Bosi Wen, Guanqun Bi, Gongyao Jiang, Yaru Cao, Mengting Hu, Yunghwei Lai, Zexuan Xiong(+1 more)

Abstract:Theory of Mind (ToM) is the cognitive capability to perceive and ascribe mental states to oneself and others. Recent research has sparked a debate over whether large language models (LLMs) exhibit a form of ToM. However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this gap, we introduce ToMBench with three key characteristics: a systematic evaluation framework encompassing 8 tasks and 31 abilities in social cognition, a multiple-choice question format to support automated and unbiased evaluation, and a build-from-scratch bilingual inventory to strictly avoid data leakage. Based on ToMBench, we conduct extensive experiments to evaluate the ToM performance of 10 popular LLMs across tasks and abilities. We find that even the most advanced LLMs like GPT-4 lag behind human performance by over 10% points, indicating that LLMs have not achieved a human-level theory of mind yet. Our aim with ToMBench is to enable an efficient and effective evaluation of LLMs' ToM capabilities, thereby facilitating the development of LLMs with inherent social intelligence.

* Under review

Via

Access Paper or Ask Questions

A Group Fairness Lens for Large Language Models

Dec 24, 2023

Guanqun Bi, Lei Shen, Yuqiang Xie, Yanan Cao, Tiangang Zhu, Xiaodong He

Abstract:The rapid advancement of large language models has revolutionized various applications but also raised crucial concerns about their potential to perpetuate biases and unfairness when deployed in social media contexts. Evaluating LLMs' potential biases and fairness has become crucial, as existing methods rely on limited prompts focusing on just a few groups, lacking a comprehensive categorical perspective. In this paper, we propose evaluating LLM biases from a group fairness lens using a novel hierarchical schema characterizing diverse social groups. Specifically, we construct a dataset, GFair, encapsulating target-attribute combinations across multiple dimensions. In addition, we introduce statement organization, a new open-ended text generation task, to uncover complex biases in LLMs. Extensive evaluations of popular LLMs reveal inherent safety concerns. To mitigate the biases of LLM from a group fairness perspective, we pioneer a novel chain-of-thought method GF-Think to mitigate biases of LLMs from a group fairness perspective. Experimental results demonstrate its efficacy in mitigating bias in LLMs to achieve fairness.

* Work in progress

Via

Access Paper or Ask Questions

DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation

Jun 02, 2023

Guanqun Bi, Lei Shen, Yanan Cao, Meng Chen, Yuqiang Xie, Zheng Lin, Xiaodong He

Figure 1 for DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation

Figure 2 for DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation

Figure 3 for DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation

Figure 4 for DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation

Abstract:Empathy is a crucial factor in open-domain conversations, which naturally shows one's caring and understanding to others. Though several methods have been proposed to generate empathetic responses, existing works often lead to monotonous empathy that refers to generic and safe expressions. In this paper, we propose to use explicit control to guide the empathy expression and design a framework DiffusEmp based on conditional diffusion language model to unify the utilization of dialogue context and attribute-oriented control signals. Specifically, communication mechanism, intent, and semantic frame are imported as multi-grained signals that control the empathy realization from coarse to fine levels. We then design a specific masking strategy to reflect the relationship between multi-grained signals and response tokens, and integrate it into the diffusion model to influence the generative process. Experimental results on a benchmark dataset EmpatheticDialogue show that our framework outperforms competitive baselines in terms of controllability, informativeness, and diversity without the loss of context-relatedness.

* accepted by ACL 2023 main conference (Oral)

Via

Access Paper or Ask Questions

Psychology-guided Controllable Story Generation

Oct 14, 2022

Yuqiang Xie, Yue Hu, Yunpeng Li, Guanqun Bi, Luxi Xing, Wei Peng

Figure 1 for Psychology-guided Controllable Story Generation

Figure 2 for Psychology-guided Controllable Story Generation

Figure 3 for Psychology-guided Controllable Story Generation

Figure 4 for Psychology-guided Controllable Story Generation

Abstract:Controllable story generation is a challenging task in the field of NLP, which has attracted increasing research interest in recent years. However, most existing works generate a whole story conditioned on the appointed keywords or emotions, ignoring the psychological changes of the protagonist. Inspired by psychology theories, we introduce global psychological state chains, which include the needs and emotions of the protagonists, to help a story generation system create more controllable and well-planned stories. In this paper, we propose a Psychology-guIded Controllable Story Generation System (PICS) to generate stories that adhere to the given leading context and desired psychological state chains for the protagonist. Specifically, psychological state trackers are employed to memorize the protagonist's local psychological states to capture their inner temporal relationships. In addition, psychological state planners are adopted to gain the protagonist's global psychological states for story planning. Eventually, a psychology controller is designed to integrate the local and global psychological states into the story context representation for composing psychology-guided stories. Automatic and manual evaluations demonstrate that PICS outperforms baselines, and each part of PICS shows effectiveness for writing stories with more consistent psychological changes.

* Accepted by COLING 2022

Via

Access Paper or Ask Questions

COMMA: Modeling Relationship among Motivations, Emotions and Actions in Language-based Human Activities

Sep 14, 2022

Yuqiang Xie, Yue Hu, Wei Peng, Guanqun Bi, Luxi Xing

Figure 1 for COMMA: Modeling Relationship among Motivations, Emotions and Actions in Language-based Human Activities

Figure 2 for COMMA: Modeling Relationship among Motivations, Emotions and Actions in Language-based Human Activities

Figure 3 for COMMA: Modeling Relationship among Motivations, Emotions and Actions in Language-based Human Activities

Figure 4 for COMMA: Modeling Relationship among Motivations, Emotions and Actions in Language-based Human Activities

Abstract:Motivations, emotions, and actions are inter-related essential factors in human activities. While motivations and emotions have long been considered at the core of exploring how people take actions in human activities, there has been relatively little research supporting analyzing the relationship between human mental states and actions. We present the first study that investigates the viability of modeling motivations, emotions, and actions in language-based human activities, named COMMA (Cognitive Framework of Human Activities). Guided by COMMA, we define three natural language processing tasks (emotion understanding, motivation understanding and conditioned action generation), and build a challenging dataset Hail through automatically extracting samples from Story Commonsense. Experimental results on NLP applications prove the effectiveness of modeling the relationship. Furthermore, our models inspired by COMMA can better reveal the essential relationship among motivations, emotions and actions than existing methods.

* Accepted to COLING 2022

Via

Access Paper or Ask Questions

How Does Knowledge Graph Embedding Extrapolate to Unseen Data: a Semantic Evidence View

Sep 24, 2021

Ren Li, Yanan Cao, Qiannan Zhu, Guanqun Bi, Fang Fang, Yi Liu, Qian Li

Figure 1 for How Does Knowledge Graph Embedding Extrapolate to Unseen Data: a Semantic Evidence View

Figure 2 for How Does Knowledge Graph Embedding Extrapolate to Unseen Data: a Semantic Evidence View

Figure 3 for How Does Knowledge Graph Embedding Extrapolate to Unseen Data: a Semantic Evidence View

Figure 4 for How Does Knowledge Graph Embedding Extrapolate to Unseen Data: a Semantic Evidence View

Abstract:Knowledge Graph Embedding (KGE) aims to learn representations for entities and relations. Most KGE models have gained great success, especially on extrapolation scenarios. Specifically, given an unseen triple (h, r, t), a trained model can still correctly predict t from (h, r, ?), or h from (?, r, t), such extrapolation ability is impressive. However, most existing KGE works focus on the design of delicate triple modeling function, which mainly tell us how to measure the plausibility of observed triples, but we have limited understanding of why the methods can extrapolate to unseen data, and what are the important factors to help KGE extrapolate. Therefore in this work, we attempt to, from a data relevant view, study KGE extrapolation of two problems: 1. How does KGE extrapolate to unseen data? 2. How to design the KGE model with better extrapolation ability? For the problem 1, we first discuss the impact factors for extrapolation and from relation, entity and triple level respectively, propose three Semantic Evidences (SEs), which can be observed from training set and provide important semantic information for extrapolation to unseen data. Then we verify the effectiveness of SEs through extensive experiments on several typical KGE methods, and demonstrate that SEs serve as an important role for understanding the extrapolation ability of KGE. For the problem 2, to make better use of the SE information for more extrapolative knowledge representation, we propose a novel GNN-based KGE model, called Semantic Evidence aware Graph Neural Network (SE-GNN). Finally, through extensive experiments on FB15k-237 and WN18RR datasets, we show that SE-GNN achieves state-of-the-art performance on Knowledge Graph Completion task and perform a better extrapolation ability.

* Main paper: 7 pages, References: 2 pages, Appendix: 2 pages

Via

Access Paper or Ask Questions