Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Levent Sagun

A Differentiable Rank-Based Objective For Better Feature Learning

Feb 13, 2025

Krunoslav Lehman Pavasovic, David Lopez-Paz, Giulio Biroli, Levent Sagun

Abstract:In this paper, we leverage existing statistical methods to better understand feature learning from data. We tackle this by modifying the model-free variable selection method, Feature Ordering by Conditional Independence (FOCI), which is introduced in \cite{azadkia2021simple}. While FOCI is based on a non-parametric coefficient of conditional dependence, we introduce its parametric, differentiable approximation. With this approximate coefficient of correlation, we present a new algorithm called difFOCI, which is applicable to a wider range of machine learning problems thanks to its differentiable nature and learnable parameters. We present difFOCI in three contexts: (1) as a variable selection method with baseline comparisons to FOCI, (2) as a trainable model parametrized with a neural network, and (3) as a generic, widely applicable neural network regularizer, one that improves feature learning with better management of spurious correlations. We evaluate difFOCI on increasingly complex problems ranging from basic variable selection in toy examples to saliency map comparisons in convolutional networks. We then show how difFOCI can be incorporated in the context of fairness to facilitate classifications without relying on sensitive data.

Via

Access Paper or Ask Questions

Chained Tuning Leads to Biased Forgetting

Dec 21, 2024

Megan Ung, Alicia Sun, Samuel J. Bell, Bhaktipriya Radharapu, Levent Sagun, Adina Williams

Figure 1 for Chained Tuning Leads to Biased Forgetting

Figure 2 for Chained Tuning Leads to Biased Forgetting

Figure 3 for Chained Tuning Leads to Biased Forgetting

Figure 4 for Chained Tuning Leads to Biased Forgetting

Abstract:Large language models (LLMs) are often fine-tuned for use on downstream tasks, though this can degrade capabilities learned during previous training. This phenomenon, often referred to as catastrophic forgetting, has important potential implications for the safety of deployed models. In this work, we first show that models trained on downstream tasks forget their safety tuning to a greater extent than models trained in the opposite order.Second, we show that forgetting disproportionately impacts safety information about certain groups. To quantify this phenomenon, we define a new metric we term biased forgetting. We conduct a systematic evaluation of the effects of task ordering on forgetting and apply mitigations that can help the model recover from the forgetting observed. We hope our findings can better inform methods for chaining the finetuning of LLMs in continual learning settings to enable training of safer and less toxic models.

Via

Access Paper or Ask Questions

On the Role of Speech Data in Reducing Toxicity Detection Bias

Nov 12, 2024

Samuel J. Bell, Mariano Coria Meglioli, Megan Richards, Eduardo Sánchez, Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà

Figure 1 for On the Role of Speech Data in Reducing Toxicity Detection Bias

Figure 2 for On the Role of Speech Data in Reducing Toxicity Detection Bias

Figure 3 for On the Role of Speech Data in Reducing Toxicity Detection Bias

Figure 4 for On the Role of Speech Data in Reducing Toxicity Detection Bias

Abstract:Text toxicity detection systems exhibit significant biases, producing disproportionate rates of false positives on samples mentioning demographic groups. But what about toxicity detection in speech? To investigate the extent to which text-based biases are mitigated by speech-based systems, we produce a set of high-quality group annotations for the multilingual MuTox dataset, and then leverage these annotations to systematically compare speech- and text-based toxicity classifiers. Our findings indicate that access to speech data during inference supports reduced bias against group mentions, particularly for ambiguous and disagreement-inducing samples. Our results also suggest that improving classifiers, rather than transcription pipelines, is more helpful for reducing group bias. We publicly release our annotations and provide recommendations for future toxicity dataset construction.

Via

Access Paper or Ask Questions

The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models

Nov 06, 2024

Anaelia Ovalle, Krunoslav Lehman Pavasovic, Louis Martin, Luke Zettlemoyer, Eric Michael Smith, Adina Williams, Levent Sagun

Figure 1 for The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models

Figure 2 for The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models

Figure 3 for The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models

Figure 4 for The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models

Abstract:Natural-language assistants are designed to provide users with helpful responses while avoiding harmful outputs, largely achieved through alignment to human preferences. Yet there is limited understanding of whether alignment techniques may inadvertently perpetuate or even amplify harmful biases inherited from their pre-aligned base models. This issue is compounded by the choice of bias evaluation benchmarks in popular preference-finetuned models, which predominantly focus on dominant social categories, such as binary gender, thereby limiting insights into biases affecting underrepresented groups. Towards addressing this gap, we center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias in LLMs. Our key contributions include: 1) a comprehensive survey of bias evaluation modalities across leading preference-finetuned LLMs, highlighting critical gaps in gender-diverse representation, 2) systematic evaluation of gender-diverse biases across 12 models spanning Direct Preference Optimization (DPO) stages, uncovering harms popular bias benchmarks fail to detect, and 3) a flexible framework for measuring harmful biases in implicit reward signals applicable to other social contexts. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning (SFT), and can amplify two forms of real-world gender-diverse harms from their base models: stigmatization and gender non-affirmative language. We conclude with recommendations tailored to DPO and broader alignment practices, advocating for the adoption of community-informed bias evaluation frameworks to more effectively identify and address underrepresented harms in LLMs.

* Accepted to 2024 Neurips Queer in AI Workshop

Via

Access Paper or Ask Questions

Reassessing the Validity of Spurious Correlations Benchmarks

Sep 06, 2024

Samuel J. Bell, Diane Bouchacourt, Levent Sagun

Figure 1 for Reassessing the Validity of Spurious Correlations Benchmarks

Figure 2 for Reassessing the Validity of Spurious Correlations Benchmarks

Figure 3 for Reassessing the Validity of Spurious Correlations Benchmarks

Figure 4 for Reassessing the Validity of Spurious Correlations Benchmarks

Abstract:Neural networks can fail when the data contains spurious correlations. To understand this phenomenon, researchers have proposed numerous spurious correlations benchmarks upon which to evaluate mitigation methods. However, we observe that these benchmarks exhibit substantial disagreement, with the best methods on one benchmark performing poorly on another. We explore this disagreement, and examine benchmark validity by defining three desiderata that a benchmark should satisfy in order to meaningfully evaluate methods. Our results have implications for both benchmarks and mitigations: we find that certain benchmarks are not meaningful measures of method performance, and that several methods are not sufficiently robust for widespread use. We present a simple recipe for practitioners to choose methods using the most similar benchmark to their given problem.

Via

Access Paper or Ask Questions

Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction

Sep 29, 2023

Arjun Subramonian, Levent Sagun, Yizhou Sun

Figure 1 for Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction

Figure 2 for Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction

Figure 3 for Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction

Figure 4 for Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction

Abstract:Graph neural network (GNN) link prediction is increasingly deployed in citation, collaboration, and online social networks to recommend academic literature, collaborators, and friends. While prior research has investigated the dyadic fairness of GNN link prediction, the within-group fairness and ``rich get richer'' dynamics of link prediction remain underexplored. However, these aspects have significant consequences for degree and power imbalances in networks. In this paper, we shed light on how degree bias in networks affects Graph Convolutional Network (GCN) link prediction. In particular, we theoretically uncover that GCNs with a symmetric normalized graph filter have a within-group preferential attachment bias. We validate our theoretical analysis on real-world citation, collaboration, and online social networks. We further bridge GCN's preferential attachment bias with unfairness in link prediction and propose a new within-group fairness metric. This metric quantifies disparities in link prediction scores between social groups, towards combating the amplification of degree and power disparities. Finally, we propose a simple training-time strategy to alleviate within-group unfairness, and we show that it is effective on citation, online social, and credit networks.

Via

Access Paper or Ask Questions

Weisfeiler and Lehman Go Measurement Modeling: Probing the Validity of the WL Test

Jul 11, 2023

Arjun Subramonian, Adina Williams, Maximilian Nickel, Yizhou Sun, Levent Sagun

Abstract:The expressive power of graph neural networks is usually measured by comparing how many pairs of graphs or nodes an architecture can possibly distinguish as non-isomorphic to those distinguishable by the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test. In this paper, we uncover misalignments between practitioners' conceptualizations of expressive power and $k$-WL through a systematic analysis of the reliability and validity of $k$-WL. We further conduct a survey ($n = 18$) of practitioners to surface their conceptualizations of expressive power and their assumptions about $k$-WL. In contrast to practitioners' opinions, our analysis (which draws from graph theory and benchmark auditing) reveals that $k$-WL does not guarantee isometry, can be irrelevant to real-world graph tasks, and may not promote generalization or trustworthiness. We argue for extensional definitions and measurement of expressive power based on benchmarks; we further contribute guiding questions for constructing such benchmarks, which is critical for progress in graph machine learning.

Via

Access Paper or Ask Questions

Simplicity Bias Leads to Amplified Performance Disparities

Dec 13, 2022

Samuel J. Bell, Levent Sagun

Figure 1 for Simplicity Bias Leads to Amplified Performance Disparities

Figure 2 for Simplicity Bias Leads to Amplified Performance Disparities

Figure 3 for Simplicity Bias Leads to Amplified Performance Disparities

Figure 4 for Simplicity Bias Leads to Amplified Performance Disparities

Abstract:The simple idea that not all things are equally difficult has surprising implications when applied in a fairness context. In this work we explore how "difficulty" is model-specific, such that different models find different parts of a dataset challenging. When difficulty correlates with group information, we term this difficulty disparity. Drawing a connection with recent work exploring the inductive bias towards simplicity of SGD-trained models, we show that when such a disparity exists, it is further amplified by commonly-used models. We quantify this amplification factor across a range of settings aiming towards a fuller understanding of the role of model bias. We also present a challenge to the simplifying assumption that "fixing" a dataset is sufficient to ensure unbiased performance.

Via

Access Paper or Ask Questions

Measuring and signing fairness as performance under multiple stakeholder distributions

Jul 20, 2022

David Lopez-Paz, Diane Bouchacourt, Levent Sagun, Nicolas Usunier

Abstract:As learning machines increase their influence on decisions concerning human lives, analyzing their fairness properties becomes a subject of central importance. Yet, our best tools for measuring the fairness of learning systems are rigid fairness metrics encapsulated as mathematical one-liners, offer limited power to the stakeholders involved in the prediction task, and are easy to manipulate when we exhort excessive pressure to optimize them. To advance these issues, we propose to shift focus from shaping fairness metrics to curating the distributions of examples under which these are computed. In particular, we posit that every claim about fairness should be immediately followed by the tagline "Fair under what examples, and collected by whom?". By highlighting connections to the literature in domain generalization, we propose to measure fairness as the ability of the system to generalize under multiple stress tests -- distributions of examples with social relevance. We encourage each stakeholder to curate one or multiple stress tests containing examples reflecting their (possibly conflicting) interests. The machine passes or fails each stress test by falling short of or exceeding a pre-defined metric value. The test results involve all stakeholders in a discussion about how to improve the learning system, and provide flexible assessments of fairness dependent on context and based on interpretable data. We provide full implementation guidelines for stress testing, illustrate both the benefits and shortcomings of this framework, and introduce a cryptographic scheme to enable a degree of prediction accountability from system providers.

Via

Access Paper or Ask Questions

Understanding out-of-distribution accuracies through quantifying difficulty of test samples

Mar 28, 2022

Berfin Simsek, Melissa Hall, Levent Sagun

Figure 1 for Understanding out-of-distribution accuracies through quantifying difficulty of test samples

Figure 2 for Understanding out-of-distribution accuracies through quantifying difficulty of test samples

Figure 3 for Understanding out-of-distribution accuracies through quantifying difficulty of test samples

Figure 4 for Understanding out-of-distribution accuracies through quantifying difficulty of test samples

Abstract:Existing works show that although modern neural networks achieve remarkable generalization performance on the in-distribution (ID) dataset, the accuracy drops significantly on the out-of-distribution (OOD) datasets \cite{recht2018cifar, recht2019imagenet}. To understand why a variety of models consistently make more mistakes in the OOD datasets, we propose a new metric to quantify the difficulty of the test images (either ID or OOD) that depends on the interaction of the training dataset and the model. In particular, we introduce \textit{confusion score} as a label-free measure of image difficulty which quantifies the amount of disagreement on a given test image based on the class conditional probabilities estimated by an ensemble of trained models. Using the confusion score, we investigate CIFAR-10 and its OOD derivatives. Next, by partitioning test and OOD datasets via their confusion scores, we predict the relationship between ID and OOD accuracies for various architectures. This allows us to obtain an estimator of the OOD accuracy of a given model only using ID test labels. Our observations indicate that the biggest contribution to the accuracy drop comes from images with high confusion scores. Upon further inspection, we report on the nature of the misclassified images grouped by their confusion scores: \textit{(i)} images with high confusion scores contain \textit{weak spurious correlations} that appear in multiple classes in the training data and lack clear \textit{class-specific features}, and \textit{(ii)} images with low confusion scores exhibit spurious correlations that belong to another class, namely \textit{class-specific spurious correlations}.

* 18 pages, 15 figures

Via

Access Paper or Ask Questions