Daniel Scalena

A gentle push funziona benissimo: making instructed models in Italian via contrastive activation steering

Nov 27, 2024

Daniel Scalena, Elisabetta Fersini, Malvina Nissim

Abstract: Adapting models to a language that was only partially present in the pre-training data requires fine-tuning, which is expensive in terms of both data and computational resources. As an alternative to fine-tuning, we explore the potential of activation steering-based techniques to enhance model performance on Italian tasks. Through our experiments we show that Italian steering (i) can be successfully applied to different models, (ii) achieves performance comparable to, or even better than, that of models fine-tuned for Italian, and (iii) yields higher quality and consistency in Italian generations. We also discuss the utility of steering and fine-tuning in the contemporary LLM landscape, where models already achieve high performance in Italian even when not explicitly trained on this language.

Multi-property Steering of Large Language Models with Dynamic Activation Composition

Jun 25, 2024

Daniel Scalena, Gabriele Sarti, Malvina Nissim

Abstract: Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting the property-dependent nature of optimal parameters to ensure a robust effect throughout generation. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method successfully maintains high conditioning while minimizing the impact of conditioning on generation fluency.
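
To make "additively intervening over intermediate representations" concrete, here is a minimal sketch in plain Python. It assumes one common form of contrastive steering: the steering vector is the mean difference between activations collected with and without the target property, and it is added to a hidden state scaled by a fixed intensity `alpha`. The function names, the toy numbers, and the fixed `alpha` are illustrative assumptions; this is not the paper's implementation and it does not model Dynamic Activation Composition, which varies the intensity during generation.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of equally sized activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(positive_acts, negative_acts):
    """Mean activation difference between two contrastive prompt sets.

    Assumption: a simple mean-difference vector; other constructions exist.
    """
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - n for p, n in zip(pos, neg)]


def steer(hidden_state, vector, alpha=1.0):
    """Additive intervention: shift the hidden state by alpha * vector."""
    return [h + alpha * v for h, v in zip(hidden_state, vector)]


# Toy 3-dimensional activations (made-up numbers, not taken from any model).
with_property = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
without_property = [[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]]

v = steering_vector(with_property, without_property)
steered = steer([0.5, 0.5, 0.5], v, alpha=2.0)
```

In a real model the same addition would be applied to a chosen layer's output at each generation step; a larger `alpha` pushes harder toward the property, which is the trade-off against fluency that the abstract mentions.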

Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence

Sep 01, 2023

Daniel Scalena, Gabriele Sarti, Malvina Nissim, Elisabetta Fersini

Abstract: Due to language models' propensity to generate toxic or hateful responses, several techniques were developed to align model generations with users' preferences. Despite the effectiveness of such methods in improving the safety of model interactions, their impact on models' internal processes is still poorly understood. In this work, we apply popular detoxification approaches to several language models and quantify their impact on the resulting models' prompt dependence using feature attribution methods. We evaluate the effectiveness of counter-narrative fine-tuning and compare it with reinforcement learning-driven detoxification, observing differences in prompt reliance between the two methods despite their similar detoxification performances.

* 4 pages
