Florian Yger

LITIS

A probabilistic view on Riemannian machine learning models for SPD matrices

May 05, 2025

Thibault de Surrel, Florian Yger, Fabien Lotte, Sylvain Chevallier

Abstract:The goal of this paper is to show how different machine learning tools on the Riemannian manifold $\mathcal{P}_d$ of Symmetric Positive Definite (SPD) matrices can be united under a probabilistic framework. For this, we will need several Gaussian distributions defined on $\mathcal{P}_d$. We will show how popular classifiers on $\mathcal{P}_d$ can be reinterpreted as Bayes Classifiers using these Gaussian distributions. These distributions will also be used for outlier detection and dimension reduction. By showing that those distributions are pervasive in the tools used on $\mathcal{P}_d$, we allow for other machine learning tools to be extended to $\mathcal{P}_d$.

Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code

Apr 02, 2025

Roxane Cohen, Robin David, Florian Yger, Fabrice Rossi

Abstract:Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection using graph-based approaches, comparing algorithms, from elementary baselines to promising techniques like GNN (Graph Neural Networks), on different feature choices. We consider various obfuscation types and obfuscators, resulting in two complex datasets. Our findings demonstrate that GNNs need meaningful features that capture aspects of function semantics to outperform baselines. Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example.

* The 13th International Conference on Complex Networks and their Applications, Dec 2024, Istanbul, Turkey

Wrapped Gaussian on the manifold of Symmetric Positive Definite Matrices

Feb 03, 2025

Thibault de Surrel, Fabien Lotte, Sylvain Chevallier, Florian Yger

Abstract:Circular and non-flat data distributions are prevalent across diverse domains of data science, yet their specific geometric structures often remain underutilized in machine learning frameworks. A principled approach to accounting for the underlying geometry of such data is pivotal, particularly when extending statistical models, like the pervasive Gaussian distribution. In this work, we tackle this issue by focusing on the manifold of symmetric positive definite matrices, a key focus in information geometry. We introduce a non-isotropic wrapped Gaussian by leveraging the exponential map, derive theoretical properties of this distribution, and propose a maximum likelihood framework for parameter estimation. Furthermore, we reinterpret established classifiers on SPD through a probabilistic lens and introduce new classifiers based on the wrapped Gaussian model. Experiments on synthetic and real-world datasets demonstrate the robustness and flexibility of this geometry-aware distribution, underscoring its potential to advance manifold-based data analysis. This work lays the groundwork for extending classical machine learning and statistical methods to more complex and structured data.

Meta-survey on outlier and anomaly detection

Dec 12, 2023

Madalina Olteanu, Fabrice Rossi, Florian Yger

Abstract:The impact of outliers and anomalies on model estimation and data processing is of paramount importance, as evidenced by the extensive body of research spanning various fields over several decades: thousands of research papers have been published on the subject. As a consequence, numerous reviews, surveys, and textbooks have sought to summarize the existing literature, encompassing a wide range of methods from both the statistical and data mining communities. While these endeavors to organize and summarize the research are invaluable, they face inherent challenges due to the pervasive nature of outliers and anomalies in all data-intensive applications, irrespective of the specific application field or scientific discipline. As a result, the resulting collection of papers remains voluminous and somewhat heterogeneous. To address the need for knowledge organization in this domain, this paper implements the first systematic meta-survey of general surveys and reviews on outlier and anomaly detection. Employing a classical systematic survey approach, the study collects nearly 500 papers using two specialized scientific search engines. From this comprehensive collection, a subset of 56 papers that claim to be general surveys on outlier detection is selected using a snowball search technique to enhance field coverage. A meticulous quality assessment phase further refines the selection to a subset of 25 high-quality general surveys. Using this curated collection, the paper investigates the evolution of the outlier detection field over a 20-year period, revealing emerging themes and methods. Furthermore, an analysis of the surveys sheds light on the survey writing practices adopted by scholars from different communities who have contributed to this field. Finally, the paper delves into several topics where consensus has emerged from the literature. 
These include taxonomies of outlier types, challenges posed by high-dimensional data, the importance of anomaly scores, the impact of learning conditions, difficulties in benchmarking, and the significance of neural networks. Non-consensual aspects are also discussed, particularly the distinction between local and global outliers and the challenges in organizing detection methods into meaningful taxonomies.

* Neurocomputing, 2023, 555, pp.126634

Structure-Preserving Transformers for Sequences of SPD Matrices

Sep 25, 2023

Mathieu Seraphim, Alexis Lechervy, Florian Yger, Luc Brun, Olivier Etard

Abstract:In recent years, Transformer-based auto-attention mechanisms have been successfully applied to the analysis of a variety of context-reliant data types, from texts to images and beyond, including data from non-Euclidean geometries. In this paper, we present such a mechanism, designed to classify sequences of Symmetric Positive Definite matrices while preserving their Riemannian geometry throughout the analysis. We apply our method to automatic sleep staging on timeseries of EEG-derived covariance matrices from a standard dataset, obtaining high levels of stage-wise performance.

* Submitted to the ICASSP 2024 Conference. v2: error correction relative to v1 - Section 1, changed "less anisotropic" to "less isotropic". v3: updated citation 15 (has since been published)

Challenges in anomaly and change point detection

Dec 27, 2022

Madalina Olteanu, Fabrice Rossi, Florian Yger

Abstract:This paper presents an introduction to the state-of-the-art in anomaly and change-point detection. On the one hand, the main concepts needed to understand the vast scientific literature on those subjects are introduced. On the other, a selection of important surveys and books, as well as two selected active research topics in the field, are presented.

* 30th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2022), Oct 2022, Bruges, Belgium

Is the U-Net Directional-Relationship Aware?

Jul 06, 2022

Mateus Riva, Pietro Gori, Florian Yger, Isabelle Bloch

Abstract:CNNs are often assumed to be capable of using contextual information about distinct objects (such as their directional relations) inside their receptive field. However, the nature and limits of this capacity have never been explored in full. We explore a specific type of relationship, the directional one, using a standard U-Net trained to optimize a cross-entropy loss function for segmentation. We train this network on a pretext segmentation task requiring directional relation reasoning for success and state that, with enough data and a sufficiently large receptive field, it succeeds in learning the proposed task. We further explore what the network has learned by analysing scenarios where the directional relationships are perturbed, and show that the network has learned to reason using these relationships.

* Accepted at ICIP 2022

Multi-winner Approval Voting Goes Epistemic

Jan 17, 2022

Tahar Allouche, Jérôme Lang, Florian Yger

Abstract:Epistemic voting interprets votes as noisy signals about a ground truth. We consider contexts where the truth consists of a set of objective winners, knowing a lower and upper bound on its cardinality. A prototypical problem for this setting is the aggregation of multi-label annotations with prior knowledge on the size of the ground truth. We posit noise models, for which we define rules that output an optimal set of winners. We report on experiments on multi-label annotations (which we collected).

Non parametric estimation of causal populations in a counterfactual scenario

Dec 08, 2021

Celine Beji, Florian Yger, Jamal Atif

Abstract:In causality, estimating the effect of a treatment without confounding inference remains a major issue because it requires assessing the outcome in both cases, with and without treatment. Since both cannot be observed simultaneously, the estimation of potential outcomes remains a challenging task. We propose an innovative approach where the problem is reformulated as a missing data model. The aim is to estimate the hidden distribution of causal populations, defined as a function of treatment and outcome. A Causal Auto-Encoder (CAE), enhanced by a prior dependent on treatment and outcome information, assimilates the latent space to the probability distribution of the target populations. The features are reconstructed after being reduced to a latent space and constrained by a mask introduced in the intermediate layer of the network, containing treatment and outcome information.

Truth-tracking via Approval Voting: Size Matters

Dec 07, 2021

Tahar Allouche, Jérôme Lang, Florian Yger

Abstract:Epistemic social choice aims at unveiling a hidden ground truth given votes, which are interpreted as noisy signals about it. We consider here a simple setting where votes consist of approval ballots: each voter approves a set of alternatives which they believe can possibly be the ground truth. Based on the intuitive idea that more reliable votes contain fewer alternatives, we define several noise models that are approval voting variants of the Mallows model. The likelihood-maximizing alternative is then characterized as the winner of a weighted approval rule, where the weight of a ballot decreases with its cardinality. We conducted an experiment on three image annotation datasets; the results indicate that rules based on our noise model outperform standard approval voting, and that the best performance is obtained by a variant of the Condorcet noise model.

* Accepted in the 36th AAAI Conference on Artificial Intelligence (AAAI 2022)
