Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jens Grossklags

PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics

Apr 06, 2024

Derui Zhu, Dingfan Chen, Qing Li, Zongxiong Chen, Lei Ma, Jens Grossklags, Mario Fritz

Figure 1 for PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics

Figure 2 for PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics

Figure 3 for PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics

Figure 4 for PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics

Abstract:Despite tremendous advancements in large language models (LLMs) over recent years, a notably urgent challenge for their practical deployment is the phenomenon of hallucination, where the model fabricates facts and produces non-factual statements. In response, we propose PoLLMgraph, a Polygraph for LLMs, as an effective model-based white-box detection and forecasting approach. PoLLMgraph distinctly differs from the large body of existing research that concentrates on addressing such challenges through black-box evaluations. In particular, we demonstrate that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics during generation via tractable probabilistic models. Experimental results on various open-source LLMs confirm the efficacy of PoLLMgraph, outperforming state-of-the-art methods by a considerable margin, evidenced by over 20% improvement in AUC-ROC on common benchmarking datasets like TruthfulQA. Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.

* 15 pages

Via

Access Paper or Ask Questions

Data Forensics in Diffusion Models: A Systematic Analysis of Membership Privacy

Feb 15, 2023

Derui Zhu, Dingfan Chen, Jens Grossklags, Mario Fritz

Abstract:In recent years, diffusion models have achieved tremendous success in the field of image generation, becoming the stateof-the-art technology for AI-based image processing applications. Despite the numerous benefits brought by recent advances in diffusion models, there are also concerns about their potential misuse, specifically in terms of privacy breaches and intellectual property infringement. In particular, some of their unique characteristics open up new attack surfaces when considering the real-world deployment of such models. With a thorough investigation of the attack vectors, we develop a systematic analysis of membership inference attacks on diffusion models and propose novel attack methods tailored to each attack scenario specifically relevant to diffusion models. Our approach exploits easily obtainable quantities and is highly effective, achieving near-perfect attack performance (>0.9 AUCROC) in realistic scenarios. Our extensive experiments demonstrate the effectiveness of our method, highlighting the importance of considering privacy and intellectual property risks when using diffusion models in image generation tasks.

Via

Access Paper or Ask Questions