Yehonatan Elisha

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon

Feb 11, 2025

Nurit Cohen-Inger, Yehonatan Elisha, Bracha Shapira, Lior Rokach, Seffi Cohen

Abstract: Large language models (LLMs) often appear to excel on public benchmarks, but these high scores may mask an overreliance on dataset-specific surface cues rather than true language understanding. We introduce the Chameleon Benchmark Overfit Detector (C-BOD), a meta-evaluation framework that systematically distorts benchmark prompts via a parametric transformation and detects overfitting of LLMs. By rephrasing inputs while preserving their semantic content and labels, C-BOD exposes whether a model's performance is driven by memorized patterns. Evaluated on the MMLU benchmark using 26 leading LLMs, our method reveals an average performance degradation of 2.15% under modest perturbations, with 20 out of 26 models exhibiting statistically significant differences. Notably, models with higher baseline accuracy exhibit larger performance differences under perturbation, and larger LLMs tend to be more sensitive to rephrasings, indicating that both groups may over-rely on fixed prompt patterns. In contrast, the Llama family and models with lower baseline accuracy show insignificant degradation, suggesting reduced dependency on superficial cues. Moreover, C-BOD's dataset- and model-agnostic design allows easy integration into training pipelines to promote more robust language understanding. Our findings challenge the community to look beyond leaderboard scores and prioritize resilience and generalization in LLM evaluation.
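
The comparison the abstract describes, scoring the same model on original and rephrased prompts and asking whether the accuracy gap is statistically significant, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the statistical test, so the exact McNemar test on paired outcomes is an assumption, and the function name `paired_overfit_check` and its inputs (per-question correctness flags) are hypothetical.

```python
from math import comb


def paired_overfit_check(correct_original, correct_perturbed, alpha=0.05):
    """Compare per-question correctness on original vs. rephrased prompts.

    Assumption: an exact McNemar test is used for significance; the
    abstract only says the differences are "statistically significant".
    """
    if len(correct_original) != len(correct_perturbed):
        raise ValueError("both runs must cover the same questions")
    n = len(correct_original)
    if n == 0:
        raise ValueError("need at least one question")

    # Discordant pairs: questions where the two runs disagree.
    only_original = sum(1 for o, p in zip(correct_original, correct_perturbed) if o and not p)
    only_perturbed = sum(1 for o, p in zip(correct_original, correct_perturbed) if p and not o)

    acc_original = sum(correct_original) / n
    acc_perturbed = sum(correct_perturbed) / n

    # Exact two-sided McNemar test: under the null hypothesis, each
    # discordant pair is equally likely to favor either run.
    discordant = only_original + only_perturbed
    if discordant == 0:
        p_value = 1.0
    else:
        k = min(only_original, only_perturbed)
        tail = sum(comb(discordant, i) for i in range(k + 1)) / 2 ** discordant
        p_value = min(1.0, 2 * tail)

    return {
        "delta": acc_original - acc_perturbed,  # positive = degradation
        "p_value": p_value,
        "significant": p_value < alpha,
    }
```

A paired test fits here because both runs answer the same questions, so only the questions whose outcome flips between the original and rephrased prompt carry information about sensitivity to phrasing.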

BEE: Metric-Adapted Explanations via Baseline Exploration-Exploitation

Dec 23, 2024

Oren Barkan, Yehonatan Elisha, Jonathan Weill, Noam Koenigstein

Abstract: Two prominent challenges in explainability research involve 1) the nuanced evaluation of explanations and 2) the modeling of missing information through baseline representations. The existing literature introduces diverse evaluation metrics, each scrutinizing the quality of explanations through distinct lenses. Additionally, various baseline representations have been proposed, each modeling the notion of missingness differently. Yet, a consensus on the ultimate evaluation metric and baseline representation remains elusive. This work acknowledges the diversity in explanation metrics and baselines, demonstrating that different metrics exhibit preferences for distinct explanation maps resulting from the utilization of different baseline representations and distributions. To address the diversity in metrics and accommodate the variety of baseline representations in a unified manner, we propose Baseline Exploration-Exploitation (BEE) - a path-integration method that introduces randomness to the integration process by modeling the baseline as a learned random tensor. This tensor follows a learned mixture of baseline distributions optimized through a contextual exploration-exploitation procedure to enhance performance on the specific metric of interest. By resampling the baseline from the learned distribution, BEE generates a comprehensive set of explanation maps, facilitating the selection of the best-performing explanation map in this broad set for the given metric. Extensive evaluations across various model architectures showcase the superior performance of BEE in comparison to state-of-the-art explanation methods on a variety of objective evaluation metrics.

* AAAI 2025
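
The core loop in the BEE abstract (path integration from a randomly sampled baseline, repeated to build a set of explanation maps, then selecting the map that scores best on a chosen metric) can be illustrated with a toy example. Everything below is a simplified sketch under stated assumptions: the model is a hypothetical one-layer tanh scorer, the baseline mixture is fixed and sampled uniformly rather than learned through exploration-exploitation, and `deletion_drop` is a stand-in metric invented for illustration, not one of the paper's metrics.

```python
import numpy as np


def model(x, w):
    """Hypothetical toy model: a scalar score tanh(w . x)."""
    return float(np.tanh(w @ x))


def model_grad(x, w):
    """Analytic gradient of the toy model with respect to x."""
    return (1.0 - np.tanh(w @ x) ** 2) * w


def integrated_gradients(x, baseline, w, steps=64):
    """Path integration along the straight line from baseline to x (midpoint rule)."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([model_grad(baseline + a * (x - baseline), w) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


# Assumption: a fixed, uniformly weighted mixture. BEE learns these weights.
BASELINE_KINDS = ("zeros", "normal", "uniform")


def sample_baseline(rng, shape):
    kind = BASELINE_KINDS[rng.integers(len(BASELINE_KINDS))]
    if kind == "zeros":
        return np.zeros(shape)
    if kind == "normal":
        return rng.normal(size=shape)
    return rng.uniform(-1.0, 1.0, size=shape)


def deletion_drop(x, attribution, w):
    """Stand-in metric: score drop after zeroing the top-attributed feature."""
    x_deleted = x.copy()
    x_deleted[np.argmax(attribution)] = 0.0
    return model(x, w) - model(x_deleted, w)


def best_explanation(x, w, n_samples=16, seed=0):
    """Resample baselines, build one map per sample, keep the best-scoring map."""
    rng = np.random.default_rng(seed)
    maps = [integrated_gradients(x, sample_baseline(rng, x.shape), w)
            for _ in range(n_samples)]
    scores = [deletion_drop(x, m, w) for m in maps]
    best = int(np.argmax(scores))
    return maps[best], scores[best]
```

Because each sampled baseline yields a different map, the selection step is what adapts the explanation to the metric. A learned mixture, as the abstract describes, would shift sampling toward baseline distributions that tend to score well instead of drawing them uniformly.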

Deep Integrated Explanations

Oct 28, 2023

Oren Barkan, Yehonatan Elisha, Jonathan Weill, Yuval Asher, Amit Eshel, Noam Koenigstein

Abstract: This paper presents Deep Integrated Explanations (DIX) - a universal method for explaining vision models. DIX generates explanation maps by integrating information from the intermediate representations of the model, coupled with their corresponding gradients. Through an extensive array of both objective and subjective evaluations spanning diverse tasks, datasets, and model configurations, we showcase the efficacy of DIX in generating faithful and accurate explanation maps, while surpassing current state-of-the-art methods.

* CIKM 2023

Visual Explanations via Iterated Integrated Attributions

Oct 28, 2023

Oren Barkan, Yehonatan Elisha, Yuval Asher, Amit Eshel, Noam Koenigstein

Abstract: We introduce Iterated Integrated Attributions (IIA) - a generic method for explaining the predictions of vision models. IIA employs iterative integration across the input image, the internal representations generated by the model, and their gradients, yielding precise and focused explanation maps. We demonstrate the effectiveness of IIA through comprehensive evaluations across various tasks, datasets, and network architectures. Our results showcase that IIA produces accurate explanation maps, outperforming other state-of-the-art explanation techniques.

* ICCV 2023

Learning to Explain: A Model-Agnostic Framework for Explaining Black Box Models

Oct 25, 2023

Oren Barkan, Yuval Asher, Amit Eshel, Yehonatan Elisha, Noam Koenigstein

Abstract: We present Learning to Explain (LTX), a model-agnostic framework designed for providing post-hoc explanations for vision models. The LTX framework introduces an "explainer" model that generates explanation maps, highlighting the crucial regions that justify the predictions made by the model being explained. To train the explainer, we employ a two-stage process consisting of initial pretraining followed by per-instance finetuning. During both stages of training, we utilize a unique configuration where we compare the explained model's prediction for a masked input with its original prediction for the unmasked input. This approach enables the use of a novel counterfactual objective, which aims to anticipate the model's output using masked versions of the input image. Importantly, the LTX framework is not restricted to a specific model architecture and can provide explanations for both Transformer-based and convolutional models. Through our evaluations, we demonstrate that LTX significantly outperforms the current state-of-the-art in explainability across various metrics.
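
The training signal described in the LTX abstract, comparing the explained model's prediction on a masked input with its prediction on the unmasked input, can be illustrated with a small loss function. This is only a sketch under assumptions: the abstract does not give the form of the objective, so the cross-entropy fidelity term, the sparsity penalty, its weight, and the toy linear classifier are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def toy_classifier(x, W):
    """Hypothetical explained model: a linear layer followed by softmax."""
    return softmax(W @ x)


def masked_prediction_loss(x, mask, W, sparsity_weight=0.1):
    """Score a candidate explanation mask (values in [0, 1]).

    Assumed form: cross-entropy between the prediction on the masked
    input and the class predicted on the unmasked input, plus a penalty
    that favors small masks.
    """
    target = int(np.argmax(toy_classifier(x, W)))  # original prediction
    p_masked = toy_classifier(x * mask, W)         # prediction on masked input
    fidelity = -np.log(p_masked[target] + 1e-12)
    sparsity = float(mask.mean())
    return float(fidelity + sparsity_weight * sparsity)
```

A mask that keeps the features the model actually relies on should preserve the original prediction and therefore receive a lower loss than a mask of the same size that keeps irrelevant features. In LTX the mask would be produced by the trained explainer rather than chosen by hand.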
