Leonard Bereska

Mechanistic Interpretability for AI Safety -- A Review

Apr 22, 2024

Leonard Bereska, Efstratios Gavves

Abstract: Understanding AI systems' inner workings is critical for ensuring value alignment and safety. This review explores mechanistic interpretability: reverse-engineering the computational mechanisms and representations learned by neural networks into human-understandable algorithms and concepts to provide a granular, causal understanding. We establish foundational concepts such as features encoding knowledge within neural activations and hypotheses about their representation and computation. We survey methodologies for causally dissecting model behaviors and assess the relevance of mechanistic interpretability to AI safety. We investigate challenges surrounding scalability, automation, and comprehensive interpretation. We advocate for clarifying concepts, setting standards, and scaling techniques to handle complex models and behaviors and expand to domains such as vision and reinforcement learning. Mechanistic interpretability could help prevent catastrophic outcomes as AI systems become more powerful and inscrutable.

Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems

Jul 06, 2022

Manuel Brenner, Florian Hess, Jonas M. Mikhaeil, Leonard Bereska, Zahra Monfared, Po-Chen Kuo, Daniel Durstewitz

Abstract: In many scientific disciplines, we are interested in inferring the nonlinear dynamical system underlying a set of observed time series, a challenging task in the face of chaotic behavior and noise. Previous deep learning approaches toward this goal often suffered from a lack of interpretability and tractability. In particular, the high-dimensional latent spaces often required for a faithful embedding, even when the underlying dynamics lives on a lower-dimensional manifold, can hamper theoretical analysis. Motivated by the emerging principles of dendritic computation, we augment a dynamically interpretable and mathematically tractable piecewise-linear (PL) recurrent neural network (RNN) by a linear spline basis expansion. We show that this approach retains all the theoretically appealing properties of the simple PLRNN, yet boosts its capacity for approximating arbitrary nonlinear dynamical systems in comparatively low dimensions. We employ two frameworks for training the system, one combining back-propagation-through-time (BPTT) with teacher forcing, and another based on fast and scalable variational inference. We show that the dendritically expanded PLRNN achieves better reconstructions with fewer parameters and dimensions on various dynamical systems benchmarks and compares favorably to other methods, while retaining a tractable and interpretable structure.

* To be published in the Proceedings of the 39th International Conference on Machine Learning (ICML 2022)
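
The linear spline basis expansion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the latent update takes the form z_next = A z + W * sum_b alpha_b * max(0, z - h_b) + h0, with a diagonal A; this form, the function name `dend_plrnn_step`, and all parameter names are assumptions made for illustration and are not taken from the listing or confirmed against the authors' code.

```python
from typing import List, Sequence


def dend_plrnn_step(
    z: Sequence[float],
    a_diag: Sequence[float],
    w: Sequence[Sequence[float]],
    alphas: Sequence[float],
    thresholds: Sequence[Sequence[float]],
    h0: Sequence[float],
) -> List[float]:
    """One latent update of a (hypothetical) dendritically expanded PLRNN.

    Assumed form: z_next = A z + W * phi(z) + h0, where
    phi(z) = sum_b alpha_b * max(0, z - h_b) is a weighted sum of shifted ReLUs.
    """
    dim = len(z)
    # Spline basis expansion: each latent unit passes through B shifted ReLUs.
    phi = [
        sum(alpha * max(0.0, z[i] - h_b[i]) for alpha, h_b in zip(alphas, thresholds))
        for i in range(dim)
    ]
    # Diagonal linear term plus coupling through the expanded nonlinearity.
    return [
        a_diag[i] * z[i] + sum(w[i][j] * phi[j] for j in range(dim)) + h0[i]
        for i in range(dim)
    ]


# With one basis function (alpha = 1, threshold = 0) the update reduces to a
# plain ReLU-based PLRNN step.
plain = dend_plrnn_step(
    z=[1.0, -2.0],
    a_diag=[0.5, 0.5],
    w=[[0.0, 1.0], [1.0, 0.0]],
    alphas=[1.0],
    thresholds=[[0.0, 0.0]],
    h0=[0.0, 0.0],
)
```

Because every term is piecewise linear in z, the map stays piecewise linear however many basis functions are added, which is the property the abstract points to when it says the expansion keeps the model tractable.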

Continual Learning of Dynamical Systems with Competitive Federated Reservoir Computing

Jun 27, 2022

Leonard Bereska, Efstratios Gavves

Abstract: Machine learning recently proved efficient in learning differential equations and dynamical systems from data. However, the data is commonly assumed to originate from a single never-changing system. In contrast, when modeling real-world dynamical processes, the data distribution often shifts due to changes in the underlying system dynamics. Continual learning of these processes aims to rapidly adapt to abrupt system changes without forgetting previous dynamical regimes. This work proposes an approach to continual learning based on reservoir computing, a state-of-the-art method for training recurrent neural networks on complex spatiotemporal dynamical systems. Reservoir computing fixes the recurrent network weights - hence these cannot be forgotten - and only updates linear projection heads to the output. We propose to train multiple competitive prediction heads concurrently. Inspired by neuroscience's predictive coding, only the most predictive heads activate, laterally inhibiting and thus protecting the inactive heads from forgetting induced by interfering parameter updates. We show that this multi-head reservoir minimizes interference and catastrophic forgetting on several dynamical systems, including the Van-der-Pol oscillator, the chaotic Lorenz attractor, and the high-dimensional Lorenz-96 weather model. Our results suggest that reservoir computing is a promising candidate framework for the continual learning of dynamical systems. We provide our code for data generation, method, and comparisons at https://github.com/leonardbereska/multiheadreservoir.

* CoLLAs 2022
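
The idea of a fixed reservoir with competing linear heads, as described in the abstract above, can be sketched roughly as follows. This is a dependency-free toy, not the authors' implementation (which is in the repository linked in the abstract): the class name `MultiHeadReservoir`, the hard winner-take-all rule, the least-mean-squares update, and all hyperparameters are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Tuple


class MultiHeadReservoir:
    """Toy echo-state network with several competing linear readout heads."""

    def __init__(self, n_res: int = 30, n_heads: int = 2, lr: float = 0.02, seed: int = 0):
        rng = random.Random(seed)
        self.lr = lr
        # Fixed random input and recurrent weights; these are never trained.
        self.w_in = [rng.uniform(-0.5, 0.5) for _ in range(n_res)]
        # Each row's absolute sum stays below 0.9, so the state update is a contraction.
        self.w_res = [
            [rng.uniform(-1.0, 1.0) * 0.9 / n_res for _ in range(n_res)]
            for _ in range(n_res)
        ]
        self.state = [0.0] * n_res
        # Only these linear readout heads are trainable.
        self.heads = [[0.0] * n_res for _ in range(n_heads)]

    def step(self, u: float, target: float) -> Tuple[int, float]:
        """Advance the reservoir by one input and update only the best head."""
        n = len(self.state)
        self.state = [
            math.tanh(
                self.w_in[i] * u
                + sum(self.w_res[i][j] * self.state[j] for j in range(n))
            )
            for i in range(n)
        ]
        preds = [sum(w * s for w, s in zip(head, self.state)) for head in self.heads]
        errors = [(p - target) ** 2 for p in preds]
        # Competition: the most predictive head wins; ties go to the lowest index.
        winner = min(range(len(errors)), key=errors.__getitem__)
        # Least-mean-squares step on the winner; losing heads are left untouched.
        gain = self.lr * (target - preds[winner])
        self.heads[winner] = [w + gain * s for w, s in zip(self.heads[winner], self.state)]
        return winner, preds[winner]


model = MultiHeadReservoir(n_res=10, n_heads=3, seed=1)
heads_before: List[List[float]] = [list(h) for h in model.heads]
winner, prediction = model.step(u=0.5, target=1.0)
```

Since losing heads receive no gradient step, whatever they learned in an earlier regime is left intact, which is the protection against interference that the abstract describes.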

Unsupervised Part-Based Disentangling of Object Shape and Appearance

Mar 16, 2019

Dominik Lorenz, Leonard Bereska, Timo Milbich, Björn Ommer

Abstract: Large intra-class variation is the result of changes in multiple object characteristics. Images, however, only show the superposition of different variable factors such as appearance or shape. Therefore, learning to disentangle and represent these different characteristics poses a great challenge, especially in the unsupervised case. Moreover, large object articulation calls for a flexible part-based model. We present an unsupervised approach for disentangling appearance and shape by learning parts consistently over all instances of a category. Our model for learning an object representation is trained by simultaneously exploiting invariance and equivariance constraints between synthetically transformed images. Since no part annotation or prior information on an object class is required, the approach is applicable to arbitrary classes. We evaluate our approach on a wide range of object categories and diverse tasks including pose prediction, disentangled image synthesis, and video-to-video translation. The approach outperforms the state-of-the-art on unsupervised keypoint prediction and compares favorably even against supervised approaches on the task of shape and appearance transfer.

* CVPR 2019 Oral
