François Portet

GETALP

Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?

Feb 28, 2025

Maxime Méloux, Silviu Maniu, François Portet, Maxime Peyrard

Abstract: As AI systems are used in high-stakes applications, ensuring interpretability is crucial. Mechanistic Interpretability (MI) aims to reverse-engineer neural networks by extracting human-understandable algorithms to explain their behavior. This work examines a key question: for a given behavior, and under MI's criteria, does a unique explanation exist? Drawing on identifiability in statistics, where parameters are uniquely inferred under specific assumptions, we explore the identifiability of MI explanations. We identify two main MI strategies: (1) "where-then-what," which isolates a circuit replicating model behavior before interpreting it, and (2) "what-then-where," which starts with candidate algorithms and searches for neural activation subspaces implementing them, using causal alignment. We test both strategies on Boolean functions and small multi-layer perceptrons, fully enumerating candidate explanations. Our experiments reveal systematic non-identifiability: multiple circuits can replicate behavior, a circuit can have multiple interpretations, several algorithms can align with the network, and one algorithm can align with different subspaces. Is uniqueness necessary? A pragmatic approach may require only predictive and manipulability standards. If uniqueness is essential for understanding, stricter criteria may be needed. We also reference the inner interpretability framework, which validates explanations through multiple criteria. This work contributes to defining explanation standards in AI.
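
To make the first non-identifiability finding concrete, here is a minimal sketch of the "where-then-what" search on a toy network: enumerate every subset of hidden units and keep the subsets that reproduce the full network's Boolean behavior. The hand-set weights, the XOR target, and zero-ablation of units left out of a circuit are illustrative assumptions, not the paper's experimental setup.

```python
from itertools import combinations, product

import numpy as np

# Hand-set toy MLP (illustrative): 2 inputs -> 3 ReLU hidden units -> thresholded output.
# Hidden units 1 and 2 are exact duplicates, so the network is deliberately redundant.
W1 = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0, -1.0])
w2 = np.array([1.0, -2.0, -2.0])


def forward(x, keep):
    """Boolean output with only the hidden units in `keep` active (the rest zero-ablated)."""
    h = np.maximum(W1 @ x + b1, 0.0)
    mask = np.zeros_like(h)
    mask[list(keep)] = 1.0
    return int(w2 @ (h * mask) > 0.5)


inputs = [np.array(bits, dtype=float) for bits in product([0, 1], repeat=2)]
target = [forward(x, range(3)) for x in inputs]  # full network computes XOR

# "Where" step: every subset of hidden units that reproduces the full behavior is a valid circuit.
circuits = [set(c) for r in range(1, 4) for c in combinations(range(3), r)
            if [forward(x, c) for x in inputs] == target]
# Minimal circuits: valid circuits with no valid proper subset.
minimal = [c for c in circuits if not any(other < c for other in circuits)]
print(target)   # [0, 1, 1, 0]
print(minimal)  # [{0, 1}, {0, 2}]
```

Two distinct minimal circuits replicate the behavior exactly because hidden units 1 and 2 are interchangeable, so the circuit-replication criterion alone cannot single out one explanation.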

FedAli: Personalized Federated Learning with Aligned Prototypes through Optimal Transport

Nov 15, 2024

Sannara Ek, Kaile Wang, François Portet, Philippe Lalanda, Jiannong Cao

Abstract: Federated Learning (FL) enables collaborative, personalized model training across multiple devices without sharing raw data, making it ideal for pervasive computing applications that optimize user-centric performance in diverse environments. However, data heterogeneity among clients poses a significant challenge, leading to inconsistencies among trained client models and reduced performance. To address this, we introduce the Alignment with Prototypes (ALP) layers, which align incoming embeddings closer to learnable prototypes through an optimal transport plan. During local training, the ALP layer updates local prototypes and aligns embeddings toward global prototypes aggregated from all clients using our novel FL framework, Federated Alignment (FedAli). For model inference, embeddings are guided toward local prototypes to better reflect the client's local data distribution. We evaluate FedAli on heterogeneous sensor-based human activity recognition and vision benchmark datasets, demonstrating that it outperforms existing FL strategies. We publicly release our source code to facilitate reproducibility and further research.

* Pre-print version 1
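
The abstract does not specify the transport cost, the marginals, or how far embeddings are moved. The sketch below assumes squared Euclidean cost, uniform marginals, entropy-regularized optimal transport solved with plain Sinkhorn iterations, and a hypothetical mixing coefficient `alpha` that pulls each embedding toward the barycentric projection of the transport plan onto the prototypes. It illustrates aligning embeddings to prototypes through an optimal transport plan in general; it is not the ALP layer itself.

```python
import numpy as np


def sinkhorn(cost, reg=0.5, n_iters=200):
    """Entropy-regularized OT plan between uniform marginals (plain Sinkhorn iterations)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # mass carried by each embedding
    b = np.full(m, 1.0 / m)  # mass carried by each prototype
    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def align_to_prototypes(embeddings, prototypes, alpha=0.5, reg=0.5):
    """Pull each embedding toward the prototype mixture given by the transport plan."""
    # Squared Euclidean cost, rescaled to [0, 1] so exp(-cost / reg) stays well conditioned.
    cost = ((embeddings[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)
    plan = sinkhorn(cost, reg)
    # Barycentric projection: each normalized row of the plan weights the prototypes.
    targets = (plan / plan.sum(axis=1, keepdims=True)) @ prototypes
    aligned = (1.0 - alpha) * embeddings + alpha * targets
    return aligned, plan


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))  # 6 hypothetical embeddings of dimension 4
prototypes = rng.normal(size=(3, 4))  # 3 hypothetical prototypes
aligned, plan = align_to_prototypes(embeddings, prototypes)
print(plan.sum(axis=1))  # each row sums to 1/6
print(plan.sum(axis=0))  # each column sums to ~1/3
```

A smaller `reg` makes the plan closer to a hard assignment of embeddings to prototypes but needs more iterations to converge; `alpha` controls how strongly embeddings are pulled.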

Spoken Dialogue System for Medical Prescription Acquisition on Smartphone: Development, Corpus and Evaluation

Nov 06, 2023

Ali Can Kocabiyikoglu, François Portet, Jean-Marc Babouchkine, Prudence Gibert, Hervé Blanchon, Gaëtan Gavazzi

Abstract: Hospital information systems (HIS) have become an essential part of healthcare institutions and now incorporate prescribing support software. Prescription support software allows for structured information capture, which improves the safety, appropriateness and efficiency of prescriptions and reduces the number of adverse drug events (ADEs). However, such a system increases the amount of time physicians spend at a computer entering information instead of providing medical care. In addition, any new visiting clinician must learn to manage complex interfaces, since each HIS has its own interfaces. In this paper, we present a natural language interface for e-prescribing software in the form of a spoken dialogue system accessible on a smartphone. This system allows prescribers to record their prescriptions verbally, a form of interaction closer to their usual practice. The system extracts the formal representation of the prescription, ready to be checked by the prescribing software, and uses the dialogue to request mandatory information, correct errors or warn of particular situations. Since, to the best of our knowledge, there is no existing voice-based prescription dialogue system, we present a system developed in a low-resource environment, focusing on dialogue modeling, semantic extraction and data augmentation. The system was evaluated in the wild with 55 participants. This evaluation showed that our system has an average prescription time of 66.15 seconds for physicians and 35.64 seconds for other experts, and a task success rate of 76% for physicians and 72% for other experts. All evaluation data were recorded and annotated to form PxCorpus, the first spoken drug prescription corpus that has been made fully available to the community (https://doi.org/10.5281/zenodo.6524162).
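
The step where the dialogue requests mandatory information can be pictured as a slot-completeness check on the extracted prescription. A minimal sketch, assuming semantic extraction has already produced a dictionary of slots; the slot names in `MANDATORY` and the prompt wording are hypothetical, not the system's actual schema.

```python
# Hypothetical mandatory fields; the real prescription schema is not given in the abstract.
MANDATORY = ("drug", "dose", "frequency", "duration")


def missing_slots(slots):
    """Return the mandatory fields that are absent or empty, in a fixed order."""
    return [name for name in MANDATORY if not slots.get(name)]


def next_system_turn(slots):
    """Ask for the first missing mandatory field, or read the prescription back."""
    missing = missing_slots(slots)
    if missing:
        return f"Please specify the {missing[0]}."
    summary = ", ".join(f"{name}: {slots[name]}" for name in MANDATORY)
    return f"Prescription complete ({summary}). Confirm?"


slots = {"drug": "paracetamol", "dose": "1 g"}  # output of a hypothetical extractor
print(next_system_turn(slots))  # Please specify the frequency.
slots.update(frequency="3 times a day", duration="5 days")
print(next_system_turn(slots))
```

Asking for one field per turn keeps each system prompt short; a real system would also handle error correction and warnings, which this sketch omits.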

Can GPT models Follow Human Summarization Guidelines? Evaluating ChatGPT and GPT-4 for Dialogue Summarization

Oct 25, 2023

Yongxin Zhou, Fabien Ringeval, François Portet

Abstract: This study explores the capabilities of prompt-driven Large Language Models (LLMs) like ChatGPT and GPT-4 in adhering to human guidelines for dialogue summarization. Experiments employed DialogSum (English social conversations) and DECODA (French call center interactions), testing various prompts, including prompts from existing literature and prompts derived from human summarization guidelines, as well as a two-step prompt approach. Our findings indicate that GPT models often produce lengthy summaries and deviate from human summarization guidelines. However, using human guidelines as an intermediate step shows promise, outperforming direct word-length constraint prompts in some cases. The results reveal that GPT models exhibit unique stylistic tendencies in their summaries. While BERTScores did not dramatically decrease for GPT outputs, suggesting semantic similarity to human references and to specialised pre-trained models, ROUGE scores reveal grammatical and lexical disparities between GPT-generated and human-written summaries. These findings shed light on the capabilities and limitations of GPT models in following human instructions for dialogue summarization.
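
ROUGE rewards exact lexical overlap, which is why a paraphrased summary can keep a reasonable BERTScore while scoring low on ROUGE. A simplified ROUGE-1 F1 sketch, assuming only lowercasing and whitespace tokenization; standard implementations (for example the `rouge-score` package) add their own tokenization and optional stemming, so their numbers will differ. The two example sentences are made up.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 with clipped counts (simplified ROUGE-1, whitespace tokens)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the customer asks about a late delivery"
candidate = "the customer calls because the delivery is late"
print(round(rouge1_f1(candidate, reference), 4))  # 0.5333
```

Only 4 of the 8 candidate tokens match (clipping limits "the" to one match), so P = 1/2, R = 4/7 and F1 = 8/15, even though both sentences describe the same call.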

FRACAS: A FRench Annotated Corpus of Attribution relations in newS

Sep 19, 2023

Ange Richard, Laura Alonzo-Canul, François Portet

Abstract: Quotation extraction is a widely useful task from both a sociological and a Natural Language Processing perspective. However, very little data is available to study this task in languages other than English. In this paper, we present a manually annotated corpus of 1676 newswire texts in French for quotation extraction and source attribution. We first describe the composition of our corpus and the choices that were made in selecting the data. We then detail the annotation guidelines and annotation process, as well as a few statistics about the final corpus and the obtained balance between quote types (direct, indirect and mixed, which are particularly challenging). We end by detailing the inter-annotator agreement among the 8 annotators who worked on manual labelling, which is substantially high for such a difficult linguistic phenomenon.
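
The abstract reports agreement among 8 annotators but does not name the coefficient used. For more than two annotators, one common choice is Fleiss' kappa, sketched below under the assumption that every item is labelled by the same number of annotators; the counts in the example are made up.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of annotators assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of annotators")
    # Observed agreement: share of agreeing annotator pairs per item, averaged over items.
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    p_cats = [sum(col) / (n_items * n_raters) for col in zip(*counts)]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical counts for 4 quotes, 8 annotators, categories (direct, indirect, mixed).
counts = [[8, 0, 0], [6, 2, 0], [0, 7, 1], [1, 1, 6]]
print(round(fleiss_kappa(counts), 3))
```

Kappa is undefined when every annotation falls into a single category (chance agreement is then 1), so real corpora usually report it per label set with enough variety.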

Evaluating Emotional Nuances in Dialogue Summarization

Jul 23, 2023

Yongxin Zhou, Fabien Ringeval, François Portet

Abstract: Automatic dialogue summarization is a well-established task that aims to identify the most important content from human conversations to create a short textual summary. Despite recent progress in the field, we show that most of the research has focused on summarizing factual information, leaving aside affective content, which can nonetheless convey useful information to analyse, monitor, or support human interactions. In this paper, we propose and evaluate a set of measures, PEmo, to quantify how much emotion is preserved in dialogue summaries. Results show that state-of-the-art summarization models do not preserve the emotional content well in their summaries. We also show that by reducing the training set to only emotional dialogues, the emotional content is better preserved in the generated summaries, while the most salient factual information is retained.
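
The abstract does not define PEmo. As a hypothetical illustration of the kind of quantity involved, and not the paper's measure, one could compare emotion labels assigned (by some emotion classifier, assumed here) to the dialogue and to its summary, and report the share of the dialogue's non-neutral emotions that survive in the summary.

```python
def emotion_recall(dialogue_emotions, summary_emotions, neutral="neutral"):
    """Fraction of distinct non-neutral dialogue emotions also present in the summary."""
    source = {e for e in dialogue_emotions if e != neutral}
    if not source:
        return 1.0  # nothing emotional to preserve
    return len(source & set(summary_emotions)) / len(source)


# Hypothetical per-utterance labels for a dialogue and per-sentence labels for its summary.
dialogue = ["neutral", "anger", "anger", "neutral", "relief"]
summary = ["neutral", "anger"]
print(emotion_recall(dialogue, summary))  # 0.5
```

This set-based version ignores how often or how strongly an emotion is expressed; a count-weighted variant would instead penalize summaries that keep only rare emotions.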

Combining Public Human Activity Recognition Datasets to Mitigate Labeled Data Scarcity

Jun 23, 2023

Riccardo Presotto, Sannara Ek, Gabriele Civitarese, François Portet, Philippe Lalanda, Claudio Bettini

Abstract: The use of supervised learning for Human Activity Recognition (HAR) on mobile devices yields strong classification performance. Such an approach, however, requires large amounts of labeled data, both for the initial training of the models and for their customization on specific clients (whose data often differ greatly from the training data). Such data are impractical to obtain because data annotation is costly, intrusive, and time-consuming. Moreover, even with a significant amount of labeled data, models deployed on heterogeneous clients struggle to generalize to unseen data. Other domains, like Computer Vision or Natural Language Processing, have proposed the notion of pre-trained models, leveraging large corpora, to reduce the need for annotated data and better manage heterogeneity. This promising approach has not been implemented in the HAR domain so far because of the lack of public datasets of sufficient size. In this paper, we propose a novel strategy to combine publicly available datasets with the goal of learning a generalized HAR model that can be fine-tuned using a limited amount of labeled data on an unseen target domain. Our experimental evaluation, which includes experiments with different state-of-the-art neural network architectures, shows that combining public datasets can significantly reduce the number of labeled samples required to achieve satisfactory performance on an unseen target domain.

* IEEE SMARTCOMP 2023

Evaluation and comparison of federated learning algorithms for Human Activity Recognition on smartphones

Oct 30, 2022

Sannara Ek, François Portet, Philippe Lalanda, German Vega

Abstract: Pervasive computing promotes the integration of smart devices in our living spaces to develop services providing assistance to people. Such smart devices increasingly rely on cloud-based Machine Learning, which raises questions in terms of security (data privacy), reliance (latency), and communication costs. In this context, Federated Learning (FL) has been introduced as a new machine learning paradigm enhancing the use of local devices. At the server level, FL aggregates models learned locally on distributed clients to obtain a more general model. In this way, no private data is sent over the network, and the communication cost is reduced. However, the most popular federated learning algorithms have been shown not to be adapted to some highly heterogeneous pervasive computing environments. In this paper, we propose a new FL algorithm, termed FedDist, which can modify models (here, deep neural networks) during training by identifying dissimilarities between neurons among the clients. This makes it possible to account for clients' specificity without impairing generalization. FedDist was evaluated against three state-of-the-art federated learning algorithms on three large heterogeneous mobile Human Activity Recognition datasets. Results show the ability of FedDist to adapt to heterogeneous data and the capability of FL to deal with asynchronous situations.

* Pervasive and Mobile Computing Volume 87, December 2022
* arXiv admin note: substantial text overlap with arXiv:2110.10223
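
The server-side aggregation described in this abstract is, in its most common form (FedAvg), a dataset-size-weighted average of client parameters. The sketch below shows that baseline for a single layer and adds a per-neuron distance between each client and the aggregate as a hypothetical stand-in for "dissimilarities between neurons"; FedDist's actual criterion, and how it then modifies the network, are not described in the abstract and are not reproduced here.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Average one layer's weight matrix over clients, weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape: (clients, neurons, inputs)
    return np.tensordot(sizes / sizes.sum(), stacked, axes=1)


def neuron_distances(client_weights, global_weights):
    """Distance from each client's neuron (row of incoming weights) to the aggregated neuron."""
    return np.stack([np.linalg.norm(w - global_weights, axis=1) for w in client_weights])


# Two hypothetical clients sharing one 2-neuron layer with 3 inputs.
w_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
w_b = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
global_w = fedavg([w_a, w_b], client_sizes=[300, 100])
print(global_w)                                # neuron 1 is averaged to [0, 0.5, 0]
print(neuron_distances([w_a, w_b], global_w))  # neuron 0 agrees; neuron 1 is off by 0.5 and 1.5
```

Neuron 0 is identical across clients, while neuron 1 has opposite signs, so its averaged value represents neither client well; that per-neuron disagreement is the kind of signal a heterogeneity-aware method could act on.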

Lightweight Transformers for Human Activity Recognition on Mobile Devices

Sep 22, 2022

Sannara Ek, François Portet, Philippe Lalanda

Abstract: Human Activity Recognition (HAR) on mobile devices has been shown to be achievable with lightweight neural models learned from data generated by the user's inertial measurement units (IMUs). Most approaches for instance-based HAR have used Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), or a combination of the two to achieve state-of-the-art results with real-time performance. Recently, the Transformer architecture, first in language processing and then in vision, has pushed the state of the art beyond classical architectures. However, such Transformer architectures are heavyweight in computing resources, which is not well suited for the embedded HAR applications found in the pervasive computing domain. In this study, we present Human Activity Recognition Transformer (HART), a lightweight, sensor-wise transformer architecture that has been specifically adapted to the domain of the IMUs embedded on mobile devices. Our experiments on HAR tasks with several publicly available datasets show that HART uses fewer floating-point operations (FLOPs) and parameters while outperforming current state-of-the-art results. Furthermore, we present evaluations across various architectures on their performance in heterogeneous environments and show that our models generalize better across different sensing devices and on-body positions.

Effectiveness of French Language Models on Abstractive Dialogue Summarization Task

Jul 17, 2022

Yongxin Zhou, François Portet, Fabien Ringeval

Abstract: Pre-trained language models have established the state of the art on various natural language processing tasks, including dialogue summarization, which allows the reader to quickly access key information from long conversations in meetings, interviews or phone calls. However, such dialogues are still difficult to handle with current models because the spontaneity of the language involves expressions that are rarely present in the corpora used for pre-training the language models. Moreover, the vast majority of the work accomplished in this field has focused on English. In this work, we present a study on the summarization of spontaneous oral dialogues in French using several language-specific pre-trained models, BARThez and BelGPT-2, as well as multilingual pre-trained models: mBART, mBARThez, and mT5. Experiments were performed on the DECODA (Call Center) dialogue corpus, whose task is to generate abstractive synopses from call center conversations between a caller and one or several agents depending on the situation. Results show that the BARThez models offer the best performance, far above the previous state of the art on DECODA. We further discuss the limits of such pre-trained models and the challenges that must be addressed for summarizing spontaneous dialogues.

* Yongxin Zhou, François Portet, Fabien Ringeval. Effectiveness of French Language Models on Abstractive Dialogue Summarization Task. LREC 2022, Marseille, France, 21-23 June 2022