Title: Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis

Oct 17, 2024

Yiyi Chen, Qiongxiu Li, Russa Biswas, Johannes Bjerva

Abstract: Language Confusion is a phenomenon in which Large Language Models (LLMs) generate text that is neither in the desired language nor in a contextually appropriate language. This phenomenon presents a critical challenge in text generation by LLMs and often appears as erratic, unpredictable behavior. We hypothesize that this inherent vulnerability of LLMs follows linguistic regularities, and we shed light on patterns of language confusion across LLMs. We introduce a novel metric, Language Confusion Entropy, designed to directly measure and quantify this confusion based on language distributions informed by linguistic typology and lexical variation. Comprehensive comparisons with the Language Confusion Benchmark (Marchisio et al., 2024) confirm the effectiveness of our metric and reveal patterns of language confusion across LLMs. We further link language confusion to LLM security and find patterns in the case of multilingual embedding inversion attacks. Our analysis demonstrates that linguistic typology offers a theoretically grounded interpretation, as well as valuable insights into leveraging language similarities as a prior for LLM alignment and security.
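The abstract does not spell out the formula for Language Confusion Entropy; it only states that the metric is built on language distributions informed by linguistic typology and lexical variation. As a minimal sketch of the underlying intuition only, the snippet below computes plain Shannon entropy over the language labels detected in a set of model outputs: a model that always answers in one language scores 0, and the score rises as outputs scatter across languages. The function name `language_entropy` and the example labels are hypothetical, and this is not the authors' metric, which additionally weights distributions using typological and lexical information.

```python
import math
from collections import Counter


def language_entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of language labels.

    Illustrative only: this is plain entropy over detected languages, not the
    typology-informed Language Confusion Entropy described in the paper.
    """
    if not labels:
        return 0.0
    counts = Counter(labels)
    total = len(labels)
    # H = sum_i p_i * log2(1 / p_i); written this way so a single language gives +0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical language-ID labels for five responses to a French prompt.
consistent = ["fr", "fr", "fr", "fr", "fr"]
confused = ["fr", "en", "fr", "es", "en"]

print(language_entropy(consistent))  # 0.0
print(round(language_entropy(confused), 3))  # 1.522
```

Higher values indicate that responses are spread over more languages; a typology-aware variant would presumably treat confusion between closely related languages differently from confusion between distant ones, which plain entropy cannot capture.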

* 17 pages, 6 figures, 14 tables
