Su-In Lee

Ensembling Sparse Autoencoders

May 21, 2025

Soham Gadgil, Chris Lin, Su-In Lee

Abstract: Sparse autoencoders (SAEs) are used to decompose neural network activations into human-interpretable features. Typically, features learned by a single SAE are used for downstream applications. However, it has recently been shown that SAEs trained with different initial weights can learn different features, demonstrating that a single SAE captures only a limited subset of features that can be extracted from the activation space. Motivated by this limitation, we propose to ensemble multiple SAEs through naive bagging and boosting. Specifically, SAEs trained with different weight initializations are ensembled in naive bagging, whereas SAEs sequentially trained to minimize the residual error are ensembled in boosting. We evaluate our ensemble approaches with three settings of language models and SAE architectures. Our empirical results demonstrate that ensembling SAEs can improve the reconstruction of language model activations, diversity of features, and SAE stability. Furthermore, ensembling SAEs performs better than applying a single SAE on downstream tasks such as concept detection and spurious correlation removal, showing improved practical utility.
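
The two ensembling schemes in the abstract can be illustrated with a minimal sketch of how already-trained models might be combined at reconstruction time. Everything here is an assumption made for illustration: the abstract does not state the aggregation rule, so averaging is assumed for bagging and summation over residual fits for boosting, and `half` is a hypothetical stand-in for a trained SAE, not an actual one.

```python
def bagging_reconstruct(models, a):
    """Average the reconstructions of independently trained models.

    Averaging is an assumed aggregation rule, not taken from the abstract.
    """
    recons = [m(a) for m in models]
    return [sum(vals) / len(models) for vals in zip(*recons)]


def boosting_reconstruct(models, a):
    """Sum reconstructions where each model sees the residual left by earlier ones."""
    residual = list(a)
    total = [0.0] * len(a)
    for m in models:
        step = m(residual)  # this model's reconstruction of the current residual
        total = [t + s for t, s in zip(total, step)]
        residual = [r - s for r, s in zip(residual, step)]
    return total


def half(v):
    """Hypothetical stand-in for a trained SAE: recovers half of its input."""
    return [0.5 * x for x in v]


activation = [4.0, 8.0]
bagged = bagging_reconstruct([half, half, half], activation)
boosted = boosting_reconstruct([half, half, half], activation)
```

With three identical stand-ins, averaging cannot improve on a single model, while each boosting stage halves the remaining residual. This only illustrates the mechanics of the two schemes and says nothing about the paper's empirical results.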

* Preprint

Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution

Jan 29, 2024

Ian Covert, Chanwoo Kim, Su-In Lee, James Zou, Tatsunori Hashimoto

Abstract: Many tasks in explainable machine learning, such as data valuation and feature attribution, perform expensive computation for each data point and can be intractable for large datasets. These methods require efficient approximations, and learning a network that directly predicts the desired output, which is commonly known as amortization, is a promising solution. However, training such models with exact labels is often intractable; we therefore explore training with noisy labels and find that this is inexpensive and surprisingly effective. Through theoretical analysis of the label noise and experiments with various models and datasets, we show that this approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
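
The idea of training on noisy labels can be illustrated with a toy regression. This is a minimal sketch, not the paper's setup: it assumes the label noise is zero-mean (unbiased), and it uses a hypothetical one-parameter linear model in place of a neural network. The target function `2.0 * x` and the noise level are invented for illustration.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope for a no-intercept linear model y ~ w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]

# Hypothetical "expensive" exact labels and cheap, zero-mean noisy versions of them.
exact_labels = [2.0 * x for x in xs]
noisy_labels = [t + rng.gauss(0.0, 1.0) for t in exact_labels]

w_exact = fit_slope(xs, exact_labels)
w_noisy = fit_slope(xs, noisy_labels)
```

Under the zero-mean assumption, the model fitted to noisy labels lands close to the one fitted to exact labels, because the noise averages out across many training points.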

Feature Selection in the Contrastive Analysis Setting

Oct 27, 2023

Ethan Weinberger, Ian Covert, Su-In Lee

Abstract: Contrastive analysis (CA) refers to the exploration of variations uniquely enriched in a target dataset as compared to a corresponding background dataset generated from sources of variation that are irrelevant to a given task. For example, a biomedical data analyst may wish to find a small set of genes to use as a proxy for variations in genomic data only present among patients with a given disease (target) as opposed to healthy control subjects (background). However, as of yet the problem of feature selection in the CA setting has received little attention from the machine learning community. In this work we present contrastive feature selection (CFS), a method for performing feature selection in the CA setting. We motivate our approach with a novel information-theoretic analysis of representation learning in the CA setting, and we empirically validate CFS on a semi-synthetic dataset and four real-world biomedical datasets. We find that our method consistently outperforms previously proposed state-of-the-art supervised and fully unsupervised feature selection methods not designed for the CA setting. An open-source implementation of our method is available at https://github.com/suinleelab/CFS.

* NeurIPS 2023

On the Robustness of Removal-Based Feature Attributions

Jun 12, 2023

Chris Lin, Ian Covert, Su-In Lee

Abstract: To explain complex models based on their inputs, many feature attribution methods have been developed that assign importance scores to input features. However, some recent work challenges the robustness of feature attributions by showing that these methods are sensitive to input and model perturbations, while other work addresses this robustness issue by proposing robust attribution methods and model modifications. Nevertheless, previous work on attribution robustness has focused primarily on gradient-based feature attributions. In contrast, the robustness properties of removal-based attribution methods are not well understood. To bridge this gap, we theoretically characterize the robustness of removal-based feature attributions. Specifically, we provide a unified analysis of such methods and prove upper bounds for the difference between intact and perturbed attributions, under settings of both input and model perturbations. Our empirical experiments on synthetic and real-world data validate our theoretical results and demonstrate their practical implications.

Estimating Conditional Mutual Information for Dynamic Feature Selection

Jun 05, 2023

Soham Gadgil, Ian Covert, Su-In Lee

Abstract: Dynamic feature selection, where we sequentially query features to make accurate predictions with a minimal budget, is a promising paradigm to reduce feature acquisition costs and provide transparency into the prediction process. The problem is challenging, however, as it requires both making predictions with arbitrary feature sets and learning a policy to identify the most valuable selections. Here, we take an information-theoretic perspective and prioritize features based on their mutual information with the response variable. The main challenge is learning this selection policy, and we design a straightforward new modeling approach that estimates the mutual information in a discriminative rather than generative fashion. Building on our learning approach, we introduce several further improvements: allowing variable feature budgets across samples, enabling non-uniform costs between features, incorporating prior information, and exploring modern architectures to handle partial input information. We find that our method provides consistent gains over recent state-of-the-art methods across a variety of datasets.

Learning to Maximize Mutual Information for Dynamic Feature Selection

Jan 02, 2023

Ian Covert, Wei Qiu, Mingyu Lu, Nayoon Kim, Nathan White, Su-In Lee

Abstract: Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
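
The greedy conditional-mutual-information policy mentioned in the abstract can be sketched for a tiny discrete problem where the joint distribution is fully known. This illustrates only the oracle policy, not the paper's amortized learning approach. The toy distribution (three binary features, with `x[0]` deciding whether `x[1]` or `x[2]` matters) and all function names are hypothetical.

```python
from itertools import product
from math import log


def cond_mutual_info(joint, i, observed):
    """I(y; x_i | observed features), with joint mapping (x_tuple, y) -> probability."""
    # Keep only outcomes consistent with the feature values observed so far.
    consistent = {k: p for k, p in joint.items()
                  if all(k[0][j] == v for j, v in observed.items())}
    total = sum(consistent.values())
    p_xy, p_x, p_y = {}, {}, {}
    for (x, y), p in consistent.items():
        q = p / total
        p_xy[(x[i], y)] = p_xy.get((x[i], y), 0.0) + q
        p_x[x[i]] = p_x.get(x[i], 0.0) + q
        p_y[y] = p_y.get(y, 0.0) + q
    return sum(q * log(q / (p_x[a] * p_y[b])) for (a, b), q in p_xy.items())


def greedy_select(joint, x, budget):
    """Query `budget` features of sample x, each time taking the most informative one."""
    observed = {}
    for _ in range(budget):
        candidates = [i for i in range(len(x)) if i not in observed]
        best = max(candidates, key=lambda i: cond_mutual_info(joint, i, observed))
        observed[best] = x[best]
    return list(observed)  # indices in the order they were queried


# Hypothetical toy distribution: uniform binary features, label depends on a "switch".
joint = {}
for x in product([0, 1], repeat=3):
    y = 2 * x[0] + (x[1] if x[0] == 0 else x[2])
    joint[(x, y)] = 1 / 8

order_a = greedy_select(joint, (0, 1, 1), budget=2)
order_b = greedy_select(joint, (1, 0, 1), budget=2)
```

In this toy problem the policy always queries the switch feature first, and the second query depends on the switch value. That sample-dependent behaviour is what separates dynamic from static feature selection.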

Contrastive Corpus Attribution for Explaining Representations

Sep 30, 2022

Chris Lin, Hugh Chen, Chanwoo Kim, Su-In Lee

Abstract: Despite the widespread use of unsupervised models, very few methods are designed to explain them. Most explanation methods explain a scalar model output. However, unsupervised models output representation vectors, the elements of which are not good candidates to explain because they lack semantic meaning. To bridge this gap, recent works defined a scalar explanation output: a dot product-based similarity in the representation space to the sample being explained (i.e., an explicand). Although this enabled explanations of unsupervised models, the interpretation of this approach can still be opaque because similarity to the explicand's representation may not be meaningful to humans. To address this, we propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output based on a reference corpus and a contrasting foil set of samples. We demonstrate that contrastive corpus similarity is compatible with many post-hoc feature attribution methods to generate COntrastive COrpus Attributions (COCOA) and quantitatively verify that features important to the corpus are identified. We showcase the utility of COCOA in two ways: (i) we draw insights by explaining augmentations of the same image in a contrastive learning setting (SimCLR); and (ii) we perform zero-shot object localization by explaining the similarity of image representations to jointly learned text representations (CLIP).

Algorithms to estimate Shapley value feature attributions

Jul 15, 2022

Hugh Chen, Ian C. Covert, Scott M. Lundberg, Su-In Lee

Abstract: Feature attributions based on the Shapley value are popular for explaining machine learning models; however, their estimation is complex from both a theoretical and computational standpoint. We disentangle this complexity into two factors: (1) the approach to removing feature information, and (2) the tractable estimation strategy. These two factors provide a natural lens through which we can better understand and compare 24 distinct algorithms. Based on the various feature removal approaches, we describe the multiple types of Shapley value feature attributions and methods to calculate each one. Then, based on the tractable estimation strategies, we characterize two distinct families of approaches: model-agnostic and model-specific approximations. For the model-agnostic approximations, we benchmark a wide class of estimation approaches and tie them to alternative yet equivalent characterizations of the Shapley value. For the model-specific approximations, we clarify the assumptions crucial to each method's tractability for linear, tree, and deep models. Finally, we identify gaps in the literature and promising future research directions.
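
To make the estimation problem concrete, here is a minimal sketch of two standard ways to compute Shapley values for a small cooperative game: exact subset enumeration and permutation sampling, one common model-agnostic estimator. It is not drawn from the paper's benchmark. The game `toy_value` is hypothetical, and in a real attribution setting the value function would itself depend on the chosen feature removal approach.

```python
import random
from itertools import combinations
from math import factorial


def exact_shapley(value, n):
    """Exact Shapley values by enumerating every coalition (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def permutation_shapley(value, n, num_perms, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(num_perms):
        rng.shuffle(players)
        coalition = frozenset()
        prev = value(coalition)
        for i in players:
            coalition = coalition | {i}
            cur = value(coalition)
            phi[i] += (cur - prev) / num_perms
            prev = cur
    return phi


def toy_value(s):
    """Hypothetical game: player 0 adds 1 alone; players 1 and 2 add 2 only together."""
    return (1.0 if 0 in s else 0.0) + (2.0 if 1 in s and 2 in s else 0.0)


exact = exact_shapley(toy_value, 3)
approx = permutation_shapley(toy_value, 3, num_perms=2000)
```

In this game the interaction payoff of 2 is split evenly between players 1 and 2, so every player receives an exact value of 1. The sampled estimate keeps the efficiency property (the values sum to the full-coalition payoff) because each ordering's marginal contributions telescope.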

Learning to Estimate Shapley Values with Vision Transformers

Jun 10, 2022

Ian Covert, Chanwoo Kim, Su-In Lee

Abstract: Transformers have become a default architecture in computer vision, but understanding what drives their predictions remains a challenging problem. Current explanation approaches rely on attention values or input gradients, but these give a limited understanding of a model's dependencies. Shapley values offer a theoretically sound alternative, but their computational cost makes them impractical for large, high-dimensional models. In this work, we aim to make Shapley values practical for vision transformers (ViTs). To do so, we first leverage an attention masking approach to evaluate ViTs with partial information, and we then develop a procedure for generating Shapley value explanations via a separate, learned explainer model. Our experiments compare Shapley values to many baseline methods (e.g., attention rollout, GradCAM, LRP), and we find that our approach provides more accurate explanations than any existing method for ViTs.

A Deep Bayesian Bandits Approach for Anticancer Therapy: Exploration via Functional Prior

May 05, 2022

Mingyu Lu, Yifang Chen, Su-In Lee

Abstract: Learning personalized cancer treatment with machine learning holds great promise to improve cancer patients' chance of survival. Despite recent advances in machine learning and precision oncology, this approach remains challenging as collecting data in preclinical/clinical studies for modeling multiple treatment efficacies is often an expensive, time-consuming process. Moreover, the randomization in treatment allocation proves to be suboptimal since some participants/samples are not receiving the most appropriate treatments during the trial. To address this challenge, we formulate the drug screening study as a "contextual bandit" problem, in which an algorithm selects anticancer therapeutics based on contextual information about cancer cell lines while adapting its treatment strategy to maximize treatment response in an "online" fashion. We propose using a novel deep Bayesian bandits framework that uses a functional prior to approximate the posterior for drug response prediction based on multi-modal information consisting of genomic features and drug structure. We empirically evaluate our method on three large-scale in vitro pharmacogenomic datasets and show that our approach outperforms several benchmarks in identifying the optimal treatment for a given cell line.
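
The "online" contextual bandit loop described in the abstract can be illustrated with a much simpler stand-in: Beta-Bernoulli Thompson sampling with one independent posterior per (context, arm) pair. This is not the paper's deep Bayesian method with a functional prior. The cell line names, the response probabilities, and the binary reward are all hypothetical.

```python
import random


def thompson_sampling(response_prob, num_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling with a separate posterior per context.

    response_prob[c][a] is the (hypothetical) chance that drug a works on cell line c.
    Returns how often each drug was chosen for each cell line.
    """
    rng = random.Random(seed)
    contexts = list(response_prob)
    num_arms = len(response_prob[contexts[0]])
    # Beta(1, 1) prior for every (context, arm) pair.
    alpha = {c: [1.0] * num_arms for c in contexts}
    beta = {c: [1.0] * num_arms for c in contexts}
    pulls = {c: [0] * num_arms for c in contexts}
    for _ in range(num_rounds):
        c = rng.choice(contexts)  # a cell line arrives
        # Sample a plausible response rate per drug and act greedily on the samples.
        samples = [rng.betavariate(alpha[c][a], beta[c][a]) for a in range(num_arms)]
        arm = max(range(num_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < response_prob[c][arm] else 0
        alpha[c][arm] += reward
        beta[c][arm] += 1 - reward
        pulls[c][arm] += 1
    return pulls


# Hypothetical response probabilities for two cell lines and two drugs.
toy_response = {"line_A": [0.9, 0.1], "line_B": [0.1, 0.9]}
pulls = thompson_sampling(toy_response, num_rounds=2000)
```

Unlike uniformly randomized allocation, the sampler concentrates its choices on the drug that appears to work best for each cell line as evidence accumulates. That adaptive behaviour is the motivation for the bandit formulation.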
