Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jijun Zhang

RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems

Sep 25, 2024

Yihong Tang, Bo Wang, Xu Wang, Dongming Zhao, Jing Liu, Jijun Zhang, Ruifang He, Yuexian Hou

Figure 1 for RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems

Figure 2 for RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems

Figure 3 for RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems

Figure 4 for RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems

Abstract:Role-playing systems powered by large language models (LLMs) have become increasingly influential in emotional communication applications. However, these systems are susceptible to character hallucinations, where the model deviates from predefined character roles and generates responses that are inconsistent with the intended persona. This paper presents the first systematic analysis of character hallucination from an attack perspective, introducing the RoleBreak framework. Our framework identifies two core mechanisms-query sparsity and role-query conflict-as key factors driving character hallucination. Leveraging these insights, we construct a novel dataset, RoleBreakEval, to evaluate existing hallucination mitigation techniques. Our experiments reveal that even enhanced models trained to minimize hallucination remain vulnerable to attacks. To address these vulnerabilities, we propose a novel defence strategy, the Narrator Mode, which generates supplemental context through narration to mitigate role-query conflicts and improve query generalization. Experimental results demonstrate that Narrator Mode significantly outperforms traditional refusal-based strategies by reducing hallucinations, enhancing fidelity to character roles and queries, and improving overall narrative coherence.

Via

Access Paper or Ask Questions

MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space

Jul 02, 2024

Yihong Tang, Bo Wang, Dongming Zhao, Xiaojia Jin, Jijun Zhang, Ruifang He, Yuexian Hou

Figure 1 for MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space

Figure 2 for MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space

Figure 3 for MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space

Figure 4 for MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space

Abstract:Personalized Dialogue Generation (PDG) aims to create coherent responses according to roles or personas. Traditional PDG relies on external role data, which can be scarce and raise privacy concerns. Approaches address these issues by extracting role information from dialogue history, which often fail to generically model roles in continuous space. To overcome these limitations, we introduce a novel framework \textbf{MO}dels \textbf{R}oles from \textbf{P}ersonalized Dialogue \textbf{H}istory by \textbf{E}xploring and \textbf{U}tilizing Latent \textbf{S}pace (MORPHEUS) through a three-stage training process. Specifically, we create a persona codebook to represent roles in latent space compactly, and this codebook is used to construct a posterior distribution of role information. This method enables the model to generalize across roles, allowing the generation of personalized dialogues even for unseen roles. Experiments on both Chinese and English datasets demonstrate that MORPHEUS enhances the extraction of role information, and improves response generation without external role data. Additionally, MORPHEUS can be considered an efficient fine-tuning for large language models.

Via

Access Paper or Ask Questions