Michael Oesterle

Evolution of SAE Features Across Layers in LLMs

Oct 11, 2024

Daniel Balcells, Benjamin Lerner, Michael Oesterle, Ediz Ucar, Stefan Heimersheim

Abstract: Sparse Autoencoders for transformer-based language models are typically defined independently per layer. In this work we analyze statistical relationships between features in adjacent layers to understand how features evolve through a forward pass. We provide a graph visualization interface for features and their most similar next-layer neighbors, and build communities of related features across layers. We find that a considerable number of features are passed through from a previous layer, some features can be expressed as quasi-boolean combinations of previous features, and some features become more specialized in later layers.
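As a rough illustration of what "most similar next-layer neighbors" could mean in practice, the sketch below ranks next-layer features by the Pearson correlation of their activations with each current-layer feature. The choice of Pearson correlation, the function name `top_next_layer_neighbors`, and the synthetic activations are all assumptions made for this example; the abstract only speaks of "statistical relationships" and does not name a metric.

```python
import numpy as np

def top_next_layer_neighbors(acts_a, acts_b, k=3):
    """For each feature in layer A, return the k most correlated features in layer B.

    acts_a: (n_tokens, n_features_a) feature activations at one layer
    acts_b: (n_tokens, n_features_b) feature activations at the next layer
    """
    eps = 1e-8  # avoids division by zero for features that never activate
    za = (acts_a - acts_a.mean(axis=0)) / (acts_a.std(axis=0) + eps)
    zb = (acts_b - acts_b.mean(axis=0)) / (acts_b.std(axis=0) + eps)
    corr = za.T @ zb / acts_a.shape[0]       # (n_features_a, n_features_b)
    idx = np.argsort(-corr, axis=1)[:, :k]   # highest correlation first
    scores = np.take_along_axis(corr, idx, axis=1)
    return idx, scores

# Synthetic data (hypothetical, not real SAE activations).
rng = np.random.default_rng(0)
layer_a = rng.random((1000, 4))
layer_b = np.column_stack([
    layer_a[:, 2],                                   # pass-through of A's feature 2
    (layer_a[:, 0] > 0.5) & (layer_a[:, 1] > 0.5),   # AND-like combination of A0 and A1
    rng.random(1000),                                # unrelated noise feature
]).astype(float)

idx, scores = top_next_layer_neighbors(layer_a, layer_b, k=1)
```

In this toy setup, the pass-through feature shows up as a near-perfect correlation, while the AND-like feature correlates only moderately with each of its inputs, which hints at why a single pairwise score may not be enough to detect boolean-style combinations.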

GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems

Apr 14, 2024

Ashish Rana, Michael Oesterle, Jannik Brinkmann

Abstract: For multi-agent reinforcement learning systems (MARLS), the problem formulation generally involves investing massive reward engineering effort specific to a given problem. However, this effort often cannot be translated to other problems; worse, it gets wasted when system dynamics change drastically. This problem is further exacerbated in sparse reward scenarios, where a meaningful heuristic can assist in the policy convergence task. We propose GOVerned Reward Engineering Kernels (GOV-REK), which dynamically assign reward distributions to agents in MARLS during the learning stage. We also introduce governance kernels, which exploit the underlying structure in either the state or the joint action space to assign meaningful agent reward distributions. During the agent learning stage, GOV-REK iteratively explores different reward distribution configurations with a Hyperband-like algorithm to learn ideal agent reward models in a problem-agnostic manner. Our experiments demonstrate that our meaningful reward priors robustly jump-start the learning process across different MARL problems.

* Extended abstract accepted at the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2024)
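To give a feel for the "Hyperband-like" search over reward configurations mentioned in the abstract, here is a minimal sketch of successive halving, the elimination loop at the core of Hyperband. It is not the GOV-REK procedure itself: the names `successive_halving` and `toy_score`, the scalar "configuration", and the scoring function are hypothetical stand-ins for training agents under a candidate reward distribution and measuring their return.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the candidates each round while growing the budget.

    configs:  candidate reward configurations (any objects)
    evaluate: callable (config, budget) -> score, higher is better
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta  # surviving candidates get more training budget next round
    return survivors[0]

# Toy stand-in for "train with this reward shaping for `budget` steps":
# the score peaks at a hypothetical best shaping weight of 0.7.
def toy_score(weight, budget):
    return -(weight - 0.7) ** 2

candidates = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
best = successive_halving(candidates, toy_score)
```

The appeal of this family of methods is that poor configurations are discarded after only a small training budget, so most compute goes to the promising ones.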

Beyond Single-Feature Importance with ICECREAM

Jul 19, 2023

Michael Oesterle, Patrick Blöbaum, Atalanti A. Mastakouri, Elke Kirschbaum

Abstract: Which set of features was responsible for a certain output of a machine learning model? Which components caused the failure of a cloud computing application? These are just two examples of the questions we address in this work by Identifying Coalition-based Explanations for Common and Rare Events in Any Model (ICECREAM). Specifically, we propose an information-theoretic quantitative measure for the influence of a coalition of variables on the distribution of a target variable. This allows us to identify which set of factors is essential to obtain a certain outcome, as opposed to well-established explainability and causal contribution analysis methods, which can assign contributions only to individual factors and rank them by their importance. In experiments with synthetic and real-world data, we show that ICECREAM outperforms state-of-the-art methods for explainability and root cause analysis, and achieves impressive accuracy in both tasks.
