Title: Characterizing Large Language Model Geometry Solves Toxicity Detection and Generation

Dec 04, 2023

Randall Balestriero, Romain Cosentino, Sarath Shekkizhar

Abstract: Large Language Models (LLMs) drive current AI breakthroughs despite very little being known about their internal representations, e.g., how to extract a few informative features to solve various downstream tasks. To provide a practical and principled answer, we propose to characterize LLMs from a geometric perspective. We obtain in closed form (i) the intrinsic dimension in which the Multi-Head Attention embeddings are constrained to exist and (ii) the partition and per-region affine mappings of the per-layer feedforward networks. Our results are informative, do not rely on approximations, and are actionable. First, we show that, motivated by our geometric interpretation, we can bypass Llama2's RLHF by controlling its embedding's intrinsic dimension through informed prompt manipulation. Second, we derive 7 interpretable spline features that can be extracted from any (pre-trained) LLM layer, providing a rich abstract representation of their inputs. Those features alone (224 for Mistral-7B and Llama2-7B) are sufficient to help solve toxicity detection, infer the domain of the prompt, and even tackle the Jigsaw challenge, which aims at characterizing the type of toxicity of various prompts. Our results demonstrate how, even in large-scale regimes, exact theoretical results can answer practical questions in language models. Code: https://github.com/RandallBalestriero/SplineLLM
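To make the phrase "partition and per-region affine mappings" concrete, here is a minimal sketch on a toy one-hidden-layer ReLU network. It is not the authors' code and does not reproduce their seven spline features. The plain ReLU activation, the layer sizes, and every name below (`region_code`, `region_affine_map`, and so on) are assumptions made for illustration. The idea is that the on/off pattern of the hidden units identifies the region of the input-space partition containing an input, and inside that region the network reduces exactly to a single affine map.

```python
import random

random.seed(0)

# Toy sizes, chosen arbitrarily for illustration (not taken from the paper).
D_IN, D_HIDDEN, D_OUT = 4, 8, 3


def rand_vector(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


def rand_matrix(rows, cols):
    return [rand_vector(cols) for _ in range(rows)]


W1, b1 = rand_matrix(D_HIDDEN, D_IN), rand_vector(D_HIDDEN)
W2, b2 = rand_matrix(D_OUT, D_HIDDEN), rand_vector(D_OUT)


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def pre_activation(x):
    return [h + b for h, b in zip(matvec(W1, x), b1)]


def mlp(x):
    """Ordinary forward pass: W2 @ relu(W1 @ x + b1) + b2."""
    hidden = [max(h, 0.0) for h in pre_activation(x)]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


def region_code(x):
    """Binary activation pattern; it identifies the partition region containing x."""
    return [1.0 if h > 0 else 0.0 for h in pre_activation(x)]


def region_affine_map(x):
    """Return (A, c) with mlp(z) == A @ z + c for every z in the same region as x."""
    q = region_code(x)
    # A = W2 @ diag(q) @ W1 and c = W2 @ diag(q) @ b1 + b2
    A = [[sum(W2[i][k] * q[k] * W1[k][j] for k in range(D_HIDDEN))
          for j in range(D_IN)] for i in range(D_OUT)]
    c = [sum(W2[i][k] * q[k] * b1[k] for k in range(D_HIDDEN)) + b2[i]
         for i in range(D_OUT)]
    return A, c


x = rand_vector(D_IN)
A, c = region_affine_map(x)
direct = mlp(x)
via_affine = [a + ci for a, ci in zip(matvec(A, x), c)]
# Largest difference between the forward pass and the region's affine map.
max_gap = max(abs(u - v) for u, v in zip(direct, via_affine))
```

In this toy setting `max_gap` is zero up to floating-point rounding, which is the sense in which such a description involves no approximation. The feedforward blocks in Llama2 and Mistral use a gated activation rather than plain ReLU, so the paper's own derivation should be consulted for how the partition is defined there.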
