Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Marie-Jeanne Lesot

LFI

Did I Faithfully Say What I Thought? Bridging the Gap Between Neural Activity and Self-Explanations in Large Language Models

Jun 12, 2025

Milan Bhan, Jean-Noel Vittaut, Nicolas Chesneau, Sarath Chandar, Marie-Jeanne Lesot

Abstract:Large Language Models (LLM) have demonstrated the capability of generating free text self Natural Language Explanation (self-NLE) to justify their answers. Despite their logical appearance, self-NLE do not necessarily reflect the LLM actual decision-making process, making such explanations unfaithful. While existing methods for measuring self-NLE faithfulness mostly rely on behavioral tests or computational block identification, none of them examines the neural activity underlying the model's reasoning. This work introduces a novel flexible framework for quantitatively measuring the faithfulness of LLM-generated self-NLE by directly comparing the latter with interpretations of the model's internal hidden states. The proposed framework is versatile and provides deep insights into self-NLE faithfulness by establishing a direct connection between self-NLE and model reasoning. This approach advances the understanding of self-NLE faithfulness and provides building blocks for generating more faithful self-NLE.

Via

Access Paper or Ask Questions

An action language-based formalisation of an abstract argumentation framework

Sep 29, 2024

Yann Munro, Camilo Sarmiento, Isabelle Bloch, Gauvain Bourgne, Catherine Pelachaud, Marie-Jeanne Lesot

Figure 1 for An action language-based formalisation of an abstract argumentation framework

Abstract:An abstract argumentation framework is a commonly used formalism to provide a static representation of a dialogue. However, the order of enunciation of the arguments in an argumentative dialogue is very important and can affect the outcome of this dialogue. In this paper, we propose a new framework for modelling abstract argumentation graphs, a model that incorporates the order of enunciation of arguments. By taking this order into account, we have the means to deduce a unique outcome for each dialogue, called an extension. We also establish several properties, such as termination and correctness, and discuss two notions of completeness. In particular, we propose a modification of the previous transformation based on a "last enunciated last updated" strategy, which verifies the second form of completeness.

* To be published in The 25th International Conference on Principles and Practice of Multi-Agent Systems

Via

Access Paper or Ask Questions

Why do explanations fail? A typology and discussion on failures in XAI

May 22, 2024

Clara Bove, Thibault Laugel, Marie-Jeanne Lesot, Charles Tijus, Marcin Detyniecki

Figure 1 for Why do explanations fail? A typology and discussion on failures in XAI

Figure 2 for Why do explanations fail? A typology and discussion on failures in XAI

Abstract:As Machine Learning (ML) models achieve unprecedented levels of performance, the XAI domain aims at making these models understandable by presenting end-users with intelligible explanations. Yet, some existing XAI approaches fail to meet expectations: several issues have been reported in the literature, generally pointing out either technical limitations or misinterpretations by users. In this paper, we argue that the resulting harms arise from a complex overlap of multiple failures in XAI, which existing ad-hoc studies fail to capture. This work therefore advocates for a holistic perspective, presenting a systematic investigation of limitations of current XAI methods and their impact on the interpretation of explanations. By distinguishing between system-specific and user-specific failures, we propose a typological framework that helps revealing the nuanced complexities of explanation failures. Leveraging this typology, we also discuss some research directions to help AI practitioners better understand the limitations of XAI systems and enhance the quality of ML explanations.

Via

Access Paper or Ask Questions

Mitigating Text Toxicity with Counterfactual Generation

May 16, 2024

Milan Bhan, Jean-Noel Vittaut, Nina Achache, Victor Legrand, Nicolas Chesneau, Annabelle Blangero, Juliette Murris, Marie-Jeanne Lesot

Figure 1 for Mitigating Text Toxicity with Counterfactual Generation

Figure 2 for Mitigating Text Toxicity with Counterfactual Generation

Figure 3 for Mitigating Text Toxicity with Counterfactual Generation

Figure 4 for Mitigating Text Toxicity with Counterfactual Generation

Abstract:Toxicity mitigation consists in rephrasing text in order to remove offensive or harmful meaning. Neural natural language processing (NLP) models have been widely used to target and mitigate textual toxicity. However, existing methods fail to detoxify text while preserving the initial non-toxic meaning at the same time. In this work, we propose to apply counterfactual generation methods from the eXplainable AI (XAI) field to target and mitigate textual toxicity. In particular, we perform text detoxification by applying local feature importance and counterfactual generation methods to a toxicity classifier distinguishing between toxic and non-toxic texts. We carry out text detoxification through counterfactual generation on three datasets and compare our approach to three competitors. Automatic and human evaluations show that recently developed NLP counterfactual generators can mitigate toxicity accurately while better preserving the meaning of the initial text as compared to classical detoxification methods. Finally, we take a step back from using automated detoxification tools, and discuss how to manage the polysemous nature of toxicity and the risk of malicious use of detoxification tools. This work is the first to bridge the gap between counterfactual generation and text detoxification and paves the way towards more practical application of XAI methods.

Via

Access Paper or Ask Questions

Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations

Feb 19, 2024

Milan Bhan, Jean-Noel Vittaut, Nicolas Chesneau, Marie-Jeanne Lesot

Abstract:Incorporating natural language rationales in the prompt and In-Context Learning (ICL) has led to a significant improvement of Large Language Models (LLMs) performance. However, rationales currently require human-annotation or the use of auxiliary proxy models to target promising samples or generate high-quality rationales. In this work, we propose Self-AMPLIFY to generate automatically rationales from post hoc explanation methods applied to Small Language Models (SLMs) to improve their own performance. Self-AMPLIFY is a 3-step method that targets samples, generates rationales and builds a final prompt to leverage ICL. Self-AMPLIFY performance is evaluated on two SLMs and two datasets requiring reasoning abilities: these experiments show that Self-AMPLIFY achieves good results against competitors. Self-AMPLIFY is the first method to apply post hoc explanation methods to SLM to generate rationales to improve their own performance in a fully automated manner.

Via

Access Paper or Ask Questions

Dynamic Interpretability for Model Comparison via Decision Rules

Sep 29, 2023

Adam Rida, Marie-Jeanne Lesot, Xavier Renard, Christophe Marsala

Abstract:Explainable AI (XAI) methods have mostly been built to investigate and shed light on single machine learning models and are not designed to capture and explain differences between multiple models effectively. This paper addresses the challenge of understanding and explaining differences between machine learning models, which is crucial for model selection, monitoring and lifecycle management in real-world applications. We propose DeltaXplainer, a model-agnostic method for generating rule-based explanations describing the differences between two binary classifiers. To assess the effectiveness of DeltaXplainer, we conduct experiments on synthetic and real-world datasets, covering various model comparison scenarios involving different types of concept drift.

Via

Access Paper or Ask Questions

Achieving Diversity in Counterfactual Explanations: a Review and Discussion

May 10, 2023

Thibault Laugel, Adulam Jeyasothy, Marie-Jeanne Lesot, Christophe Marsala, Marcin Detyniecki

Abstract:In the field of Explainable Artificial Intelligence (XAI), counterfactual examples explain to a user the predictions of a trained decision model by indicating the modifications to be made to the instance so as to change its associated prediction. These counterfactual examples are generally defined as solutions to an optimization problem whose cost function combines several criteria that quantify desiderata for a good explanation meeting user needs. A large variety of such appropriate properties can be considered, as the user needs are generally unknown and differ from one user to another; their selection and formalization is difficult. To circumvent this issue, several approaches propose to generate, rather than a single one, a set of diverse counterfactual examples to explain a prediction. This paper proposes a review of the numerous, sometimes conflicting, definitions that have been proposed for this notion of diversity. It discusses their underlying principles as well as the hypotheses on the user needs they rely on and proposes to categorize them along several dimensions (explicit vs implicit, universe in which they are defined, level at which they apply), leading to the identification of further research challenges on this topic.

Via

Access Paper or Ask Questions

TIGTEC : Token Importance Guided TExt Counterfactuals

Apr 24, 2023

Milan Bhan, Jean-Noel Vittaut, Nicolas Chesneau, Marie-Jeanne Lesot

Figure 1 for TIGTEC : Token Importance Guided TExt Counterfactuals

Figure 2 for TIGTEC : Token Importance Guided TExt Counterfactuals

Figure 3 for TIGTEC : Token Importance Guided TExt Counterfactuals

Figure 4 for TIGTEC : Token Importance Guided TExt Counterfactuals

Abstract:Counterfactual examples explain a prediction by highlighting changes of instance that flip the outcome of a classifier. This paper proposes TIGTEC, an efficient and modular method for generating sparse, plausible and diverse counterfactual explanations for textual data. TIGTEC is a text editing heuristic that targets and modifies words with high contribution using local feature importance. A new attention-based local feature importance is proposed. Counterfactual candidates are generated and assessed with a cost function integrating semantic distance, while the solution space is efficiently explored in a beam search fashion. The conducted experiments show the relevance of TIGTEC in terms of success rate, sparsity, diversity and plausibility. This method can be used in both model-specific or model-agnostic way, which makes it very convenient for generating counterfactual explanations.

Via

Access Paper or Ask Questions

Integrating Prior Knowledge in Post-hoc Explanations

Apr 25, 2022

Adulam Jeyasothy, Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Marcin Detyniecki

Figure 1 for Integrating Prior Knowledge in Post-hoc Explanations

Figure 2 for Integrating Prior Knowledge in Post-hoc Explanations

Figure 3 for Integrating Prior Knowledge in Post-hoc Explanations

Figure 4 for Integrating Prior Knowledge in Post-hoc Explanations

Abstract:In the field of eXplainable Artificial Intelligence (XAI), post-hoc interpretability methods aim at explaining to a user the predictions of a trained decision model. Integrating prior knowledge into such interpretability methods aims at improving the explanation understandability and allowing for personalised explanations adapted to each user. In this paper, we propose to define a cost function that explicitly integrates prior knowledge into the interpretability objectives: we present a general framework for the optimization problem of post-hoc interpretability methods, and show that user knowledge can thus be integrated to any method by adding a compatibility term in the cost function. We instantiate the proposed formalization in the case of counterfactual explanations and propose a new interpretability method called Knowledge Integration in Counterfactual Explanation (KICE) to optimize it. The paper performs an experimental study on several benchmark data sets to characterize the counterfactual instances generated by KICE, as compared to reference methods.

* preprint

Via

Access Paper or Ask Questions

Fuzzy Conceptual Graphs: a comparative discussion

Oct 26, 2021

Adam Faci, Marie-Jeanne Lesot, Claire Laudy

Figure 1 for Fuzzy Conceptual Graphs: a comparative discussion

Figure 2 for Fuzzy Conceptual Graphs: a comparative discussion

Abstract:Conceptual Graphs (CG) are a graph-based knowledge representation and reasoning formalism; fuzzy Conceptual Graphs (fCG) constitute an extension that enriches their expressiveness, exploiting the fuzzy set theory so as to relax their constraints at various levels. This paper proposes a comparative study of existing approaches over their respective advantages and possible limitations. The discussion revolves around three axes: (a) Critical view of each approach and comparison with previous propositions from the state of the art; (b) Presentation of the many possible interpretations of each definition to illustrate its potential and its limits; (c) Clarification of the part of CG impacted by the definition as well as the relaxed constraint.

* Proc. of the Int. Conf. of the European Society for Fuzzy Logic and Technology (EUSFLAT2021), Sep 2021, Bratislava, Slovakia

Via

Access Paper or Ask Questions