Title: pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Mar 12, 2024

Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts



Abstract: Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce $\textbf{pyvene}$, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. $\textbf{pyvene}$ supports complex intervention schemes with an intuitive configuration format, and its interventions can be static or include trainable parameters. We show how $\textbf{pyvene}$ provides a unified and extensible framework for performing interventions on neural models and sharing the intervened-upon models with others. We illustrate the power of the library via interpretability analyses using causal abstraction and knowledge localization. We publish our library through the Python Package Index (PyPI) and provide code, documentation, and tutorials at https://github.com/stanfordnlp/pyvene.
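To make the idea of an intervention on a model-internal state concrete, here is a minimal, dependency-free sketch of an interchange intervention, the kind of operation used in causal-abstraction analyses. It does not use pyvene or PyTorch, and every name in it (`TinyModel`, `run`, `interchange`) is a hypothetical stand-in chosen for illustration, not part of the library's API.

```python
# Toy sketch of interventions on a model's internal state.
# All names are hypothetical; this is NOT pyvene's API.

class TinyModel:
    """Two-step model: hidden = [a + b, a - b], output = hidden[0] * hidden[1]."""

    def hidden(self, a, b):
        return [a + b, a - b]

    def readout(self, h):
        return h[0] * h[1]

    def run(self, a, b, intervention=None):
        h = self.hidden(a, b)
        if intervention is not None:
            h = intervention(h)  # overwrite (part of) the internal state
        return self.readout(h)


def interchange(model, base, source, index):
    """Run on `base`, but with hidden[index] taken from a run on `source`."""
    source_h = model.hidden(*source)

    def swap(h):
        patched = list(h)
        patched[index] = source_h[index]
        return patched

    return model.run(*base, intervention=swap)


model = TinyModel()
base, source = (3, 1), (5, 2)

clean = model.run(*base)                        # (3 + 1) * (3 - 1) = 8
patched = interchange(model, base, source, 0)   # (5 + 2) * (3 - 1) = 14
zeroed = model.run(*base, intervention=lambda h: [0, h[1]])  # static intervention
```

Comparing `clean` with `patched` shows how much of the output is carried by the swapped component. A static intervention such as the zeroing example overwrites the state with a fixed value instead of one taken from another run.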

* 8 pages, 3 figures
