Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Avi Mendelson

Silenced Biases: The Dark Side LLMs Learned to Refuse

Nov 05, 2025

Rom Himelstein, Amit LeVi, Brit Youngmann, Yaniv Nemcovsky, Avi Mendelson

Abstract:Safety-aligned large language models (LLMs) are becoming increasingly widespread, especially in sensitive applications where fairness is essential and biased outputs can cause significant harm. However, evaluating the fairness of models is a complex challenge, and approaches that do so typically utilize standard question-answer (QA) styled schemes. Such methods often overlook deeper issues by interpreting the model's refusal responses as positive fairness measurements, which creates a false sense of fairness. In this work, we introduce the concept of silenced biases, which are unfair preferences encoded within models' latent space and are effectively concealed by safety-alignment. Previous approaches that considered similar indirect biases often relied on prompt manipulation or handcrafted implicit queries, which present limited scalability and risk contaminating the evaluation process with additional biases. We propose the Silenced Bias Benchmark (SBB), which aims to uncover these biases by employing activation steering to reduce model refusals during QA. SBB supports easy expansion to new demographic groups and subjects, presenting a fairness evaluation framework that encourages the future development of fair models and tools beyond the masking effects of alignment training. We demonstrate our approach over multiple LLMs, where our findings expose an alarming distinction between models' direct responses and their underlying fairness issues.

Via

Access Paper or Ask Questions

Representing LLMs in Prompt Semantic Task Space

Sep 26, 2025

Idan Kashani, Avi Mendelson, Yaniv Nemcovsky

Abstract:Large language models (LLMs) achieve impressive results over various tasks, and ever-expanding public repositories contain an abundance of pre-trained models. Therefore, identifying the best-performing LLM for a given task is a significant challenge. Previous works have suggested learning LLM representations to address this. However, these approaches present limited scalability and require costly retraining to encompass additional models and datasets. Moreover, the produced representation utilizes distinct spaces that cannot be easily interpreted. This work presents an efficient, training-free approach to representing LLMs as linear operators within the prompts' semantic task space, thus providing a highly interpretable representation of the models' application. Our method utilizes closed-form computation of geometrical properties and ensures exceptional scalability and real-time adaptability to dynamically expanding repositories. We demonstrate our approach on success prediction and model selection tasks, achieving competitive or state-of-the-art results with notable performance in out-of-sample scenarios.

* Accepted to Findings of the Association for Computational Linguistics: EMNLP 2025

Via

Access Paper or Ask Questions

Sparse patches adversarial attacks via extrapolating point-wise information

Nov 25, 2024

Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin

Abstract:Sparse and patch adversarial attacks were previously shown to be applicable in realistic settings and are considered a security risk to autonomous systems. Sparse adversarial perturbations constitute a setting in which the adversarial perturbations are limited to affecting a relatively small number of points in the input. Patch adversarial attacks denote the setting where the sparse attacks are limited to a given structure, i.e., sparse patches with a given shape and number. However, previous patch adversarial attacks do not simultaneously optimize multiple patches' locations and perturbations. This work suggests a novel approach for sparse patches adversarial attacks via point-wise trimming dense adversarial perturbations. Our approach enables simultaneous optimization of multiple sparse patches' locations and perturbations for any given number and shape. Moreover, our approach is also applicable for standard sparse adversarial attacks, where we show that it significantly improves the state-of-the-art over multiple extensive settings. A reference implementation of the proposed method and the reported experiments is provided at \url{https://github.com/yanemcovsky/SparsePatches.git}

* AdvML-Frontiers 24: The 3nd Workshop on New Frontiers in Adversarial Machine Learning, NeurIPS 24

Via

Access Paper or Ask Questions

Hysteresis Activation Function for Efficient Inference

Nov 15, 2024

Moshe Kimhi, Idan Kashani, Avi Mendelson, Chaim Baskin

Abstract:The widely used ReLU is favored for its hardware efficiency, {as the implementation at inference is a one bit sign case,} yet suffers from issues such as the ``dying ReLU'' problem, where during training, neurons fail to activate and constantly remain at zero, as highlighted by Lu et al. Traditional approaches to mitigate this issue often introduce more complex and less hardware-friendly activation functions. In this work, we propose a Hysteresis Rectified Linear Unit (HeLU), an efficient activation function designed to address the ``dying ReLU'' problem with minimal complexity. Unlike traditional activation functions with fixed thresholds for training and inference, HeLU employs a variable threshold that refines the backpropagation. This refined mechanism allows simpler activation functions to achieve competitive performance comparable to their more complex counterparts without introducing unnecessary complexity or requiring inductive biases. Empirical evaluations demonstrate that HeLU enhances model generalization across diverse datasets, offering a promising solution for efficient and effective inference suitable for a wide range of neural network architectures.

* Accepted to 4th NeurIPS Efficient Natural Language and Speech Processing Workshop (ENLSP-IV 2024)

Via

Access Paper or Ask Questions

FBM: Fast-Bit Allocation for Mixed-Precision Quantization

May 30, 2022

Moshe Kimhi, Tal Rozen, Tal Kopetz, Olya Sirkin, Avi Mendelson, Chaim Baskin

Figure 1 for FBM: Fast-Bit Allocation for Mixed-Precision Quantization

Figure 2 for FBM: Fast-Bit Allocation for Mixed-Precision Quantization

Figure 3 for FBM: Fast-Bit Allocation for Mixed-Precision Quantization

Figure 4 for FBM: Fast-Bit Allocation for Mixed-Precision Quantization

Abstract:Quantized neural networks are well known for reducing latency, power consumption, and model size without significant degradation in accuracy, making them highly applicable for systems with limited resources and low power requirements. Mixed precision quantization offers better utilization of customized hardware that supports arithmetic operations at different bitwidths. Existing mixed-precision schemes rely on having a high exploration space, resulting in a large carbon footprint. In addition, these bit allocation strategies mostly induce constraints on the model size rather than utilizing the performance of neural network deployment on specific hardware. Our work proposes Fast-Bit Allocation for Mixed-Precision Quantization (FBM), which finds an optimal bitwidth allocation by measuring desired behaviors through a simulation of a specific device, or even on a physical one. While dynamic transitions of bit allocation in mixed precision quantization with ultra-low bitwidth are known to suffer from performance degradation, we present a fast recovery solution from such transitions. A comprehensive evaluation of the proposed method on CIFAR-10 and ImageNet demonstrates our method's superiority over current state-of-the-art schemes in terms of the trade-off between neural network accuracy and hardware efficiency. Our source code, experimental settings and quantized models are available at https://github.com/RamorayDrake/FBM/

Via

Access Paper or Ask Questions

Bimodal Distributed Binarized Neural Networks

Apr 05, 2022

Tal Rozen, Moshe Kimhi, Brian Chmiel, Avi Mendelson, Chaim Baskin

Figure 1 for Bimodal Distributed Binarized Neural Networks

Figure 2 for Bimodal Distributed Binarized Neural Networks

Figure 3 for Bimodal Distributed Binarized Neural Networks

Figure 4 for Bimodal Distributed Binarized Neural Networks

Abstract:Binary Neural Networks (BNNs) are an extremely promising method to reduce deep neural networks' complexity and power consumption massively. Binarization techniques, however, suffer from ineligible performance degradation compared to their full-precision counterparts. Prior work mainly focused on strategies for sign function approximation during forward and backward phases to reduce the quantization error during the binarization process. In this work, we propose a Bi-Modal Distributed binarization method (\methodname{}). That imposes bi-modal distribution of the network weights by kurtosis regularization. The proposed method consists of a training scheme that we call Weight Distribution Mimicking (WDM), which efficiently imitates the full-precision network weight distribution to their binary counterpart. Preserving this distribution during binarization-aware training creates robust and informative binary feature maps and significantly reduces the generalization error of the BNN. Extensive evaluations on CIFAR-10 and ImageNet demonstrate the superiority of our method over current state-of-the-art schemes. Our source code, experimental settings, training logs, and binary models are available at \url{https://github.com/BlueAnon/BD-BNN}.

Via

Access Paper or Ask Questions

Weisfeiler and Leman Go Infinite: Spectral and Combinatorial Pre-Colorings

Jan 31, 2022

Or Feldman, Amit Boyarski, Shai Feldman, Dani Kogan, Avi Mendelson, Chaim Baskin

Figure 1 for Weisfeiler and Leman Go Infinite: Spectral and Combinatorial Pre-Colorings

Figure 2 for Weisfeiler and Leman Go Infinite: Spectral and Combinatorial Pre-Colorings

Figure 3 for Weisfeiler and Leman Go Infinite: Spectral and Combinatorial Pre-Colorings

Figure 4 for Weisfeiler and Leman Go Infinite: Spectral and Combinatorial Pre-Colorings

Abstract:Graph isomorphism testing is usually approached via the comparison of graph invariants. Two popular alternatives that offer a good trade-off between expressive power and computational efficiency are combinatorial (i.e., obtained via the Weisfeiler-Leman (WL) test) and spectral invariants. While the exact power of the latter is still an open question, the former is regularly criticized for its limited power, when a standard configuration of uniform pre-coloring is used. This drawback hinders the applicability of Message Passing Graph Neural Networks (MPGNNs), whose expressive power is upper bounded by the WL test. Relaxing the assumption of uniform pre-coloring, we show that one can increase the expressive power of the WL test ad infinitum. Following that, we propose an efficient pre-coloring based on spectral features that provably increase the expressive power of the vanilla WL test. The above claims are accompanied by extensive synthetic and real data experiments. The code to reproduce our experiments is available at https://github.com/TPFI22/Spectral-and-Combinatorial

Via

Access Paper or Ask Questions

On Recoverability of Graph Neural Network Representations

Jan 30, 2022

Maxim Fishman, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Avi Mendelson

Figure 1 for On Recoverability of Graph Neural Network Representations

Figure 2 for On Recoverability of Graph Neural Network Representations

Figure 3 for On Recoverability of Graph Neural Network Representations

Figure 4 for On Recoverability of Graph Neural Network Representations

Abstract:Despite their growing popularity, graph neural networks (GNNs) still have multiple unsolved problems, including finding more expressive aggregation methods, propagation of information to distant nodes, and training on large-scale graphs. Understanding and solving such problems require developing analytic tools and techniques. In this work, we propose the notion of recoverability, which is tightly related to information aggregation in GNNs, and based on this concept, develop the method for GNN embedding analysis. We define recoverability theoretically and propose a method for its efficient empirical estimation. We demonstrate, through extensive experimental results on various datasets and different GNN architectures, that estimated recoverability correlates with aggregation method expressivity and graph sparsification quality. Therefore, we believe that the proposed method could provide an essential tool for understanding the roots of the aforementioned problems, and potentially lead to a GNN design that overcomes them. The code to reproduce our experiments is available at https://github.com/Anonymous1252022/Recoverability

Via

Access Paper or Ask Questions

Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels

Mar 25, 2021

Evgenii Zheltonozhskii, Chaim Baskin, Avi Mendelson, Alex M. Bronstein, Or Litany

Figure 1 for Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels

Figure 2 for Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels

Figure 3 for Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels

Figure 4 for Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels

Abstract:The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D), a simple framework that solves this problem by pre-training the feature extractor in a self-supervised fashion. Using self-supervised pre-training boosts the performance of existing LNL approaches by drastically reducing the warm-up stage's susceptibility to noise level, shortening its duration, and increasing extracted feature quality. C2D works out of the box with existing methods and demonstrates markedly improved performance, especially in the high noise regime, where we get a boost of more than 27% for CIFAR-100 with 90% noise over the previous state of the art. In real-life noise settings, C2D trained on mini-WebVision outperforms previous works both in WebVision and ImageNet validation sets by 3% top-1 accuracy. We perform an in-depth analysis of the framework, including investigating the performance of different pre-training approaches and estimating the effective upper bound of the LNL performance with semi-supervised learning. Code for reproducing our experiments is available at https://github.com/ContrastToDivide/C2D

Via

Access Paper or Ask Questions

Self-Supervised Learning for Large-Scale Unsupervised Image Clustering

Aug 24, 2020

Evgenii Zheltonozhskii, Chaim Baskin, Alex M. Bronstein, Avi Mendelson

Figure 1 for Self-Supervised Learning for Large-Scale Unsupervised Image Clustering

Figure 2 for Self-Supervised Learning for Large-Scale Unsupervised Image Clustering

Figure 3 for Self-Supervised Learning for Large-Scale Unsupervised Image Clustering

Figure 4 for Self-Supervised Learning for Large-Scale Unsupervised Image Clustering

Abstract:Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid an expensive and complicated process of labeling the data. However, unsupervised learning of complex data is challenging, and even the best approaches show much weaker performance than their supervised counterparts. Self-supervised deep learning has become a strong instrument for representation learning in computer vision. However, those methods have not been evaluated in a fully unsupervised setting. In this paper, we propose a simple scheme for unsupervised classification based on self-supervised representations. We evaluate the proposed approach with several recent self-supervised methods showing that it achieves competitive results for ImageNet classification (39% accuracy on ImageNet with 1000 clusters and 46% with overclustering). We suggest adding the unsupervised evaluation to a set of standard benchmarks for self-supervised learning. The code is available at https://github.com/Randl/kmeans_selfsuper

Via

Access Paper or Ask Questions