Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sébastien Gambs

UQAM

On the Usage of Gaussian Process for Efficient Data Valuation

Jun 04, 2025

Clément Bénesse, Patrick Mesana, Athénaïs Gautier, Sébastien Gambs

Abstract:In machine learning, knowing the impact of a given datum on model training is a fundamental task referred to as Data Valuation. Building on previous works from the literature, we have designed a novel canonical decomposition allowing practitioners to analyze any data valuation method as the combination of two parts: a utility function that captures characteristics from a given model and an aggregation procedure that merges such information. We also propose to use Gaussian Processes as a means to easily access the utility function on ``sub-models'', which are models trained on a subset of the training set. The strength of our approach stems from both its theoretical grounding in Bayesian theory, and its practical reach, by enabling fast estimation of valuations thanks to efficient update formulae.

Via

Access Paper or Ask Questions

P2NIA: Privacy-Preserving Non-Iterative Auditing

Apr 01, 2025

Jade Garcia Bourrée, Hadrien Lautraite, Sébastien Gambs, Gilles Tredan, Erwan Le Merrer, Benoît Rottembourg

Figure 1 for P2NIA: Privacy-Preserving Non-Iterative Auditing

Figure 2 for P2NIA: Privacy-Preserving Non-Iterative Auditing

Figure 3 for P2NIA: Privacy-Preserving Non-Iterative Auditing

Figure 4 for P2NIA: Privacy-Preserving Non-Iterative Auditing

Abstract:The emergence of AI legislation has increased the need to assess the ethical compliance of high-risk AI systems. Traditional auditing methods rely on platforms' application programming interfaces (APIs), where responses to queries are examined through the lens of fairness requirements. However, such approaches put a significant burden on platforms, as they are forced to maintain APIs while ensuring privacy, facing the possibility of data leaks. This lack of proper collaboration between the two parties, in turn, causes a significant challenge to the auditor, who is subject to estimation bias as they are unaware of the data distribution of the platform. To address these two issues, we present P2NIA, a novel auditing scheme that proposes a mutually beneficial collaboration for both the auditor and the platform. Extensive experiments demonstrate P2NIA's effectiveness in addressing both issues. In summary, our work introduces a privacy-preserving and non-iterative audit scheme that enhances fairness assessments using synthetic or local data, avoiding the challenges associated with traditional API-based audits.

* 19 pages, 8 figures

Via

Access Paper or Ask Questions

Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

Feb 07, 2025

Alice Gorgé, Julien Ferry, Sébastien Gambs, Thibaut Vidal

Figure 1 for Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

Figure 2 for Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

Figure 3 for Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

Figure 4 for Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

Abstract:Recent research has shown that machine learning models are vulnerable to privacy attacks targeting their training data. Differential privacy (DP) has become a widely adopted countermeasure, as it offers rigorous privacy protections. In this paper, we introduce a reconstruction attack targeting state-of-the-art $\varepsilon$-DP random forests. By leveraging a constraint programming model that incorporates knowledge of the forest's structure and DP mechanism characteristics, our approach formally reconstructs the most likely dataset that could have produced a given forest. Through extensive computational experiments, we examine the interplay between model utility, privacy guarantees, and reconstruction accuracy across various configurations. Our results reveal that random forests trained with meaningful DP guarantees can still leak substantial portions of their training data. Specifically, while DP reduces the success of reconstruction attacks, the only forests fully robust to our attack exhibit predictive performance no better than a constant classifier. Building on these insights, we provide practical recommendations for the construction of DP random forests that are more resilient to reconstruction attacks and maintain non-trivial predictive performance.

Via

Access Paper or Ask Questions

WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

Nov 02, 2024

Patrick Mesana, Clément Bénesse, Hadrien Lautraite, Gilles Caporossi, Sébastien Gambs

Figure 1 for WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

Figure 2 for WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

Figure 3 for WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

Figure 4 for WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

Abstract:In this paper, we introduce WaKA (Wasserstein K-nearest neighbors Attribution), a novel attribution method that leverages principles from the LiRA (Likelihood Ratio Attack) framework and applies them to $ k $-nearest neighbors classifiers ($ k $-NN). WaKA efficiently measures the contribution of individual data points to the model's loss distribution, analyzing every possible $ k $-NN that can be constructed using the training set, without requiring sampling or shadow model training. WaKA can be used \emph{a posteriori} as a membership inference attack (MIA) to assess privacy risks, and \emph{a priori} for data minimization and privacy influence measurement. Thus, WaKA can be seen as bridging the gap between data attribution and membership inference attack (MIA) literature by distinguishing between the value of a data point and its privacy risk. For instance, we show that self-attribution values are more strongly correlated with the attack success rate than the contribution of a point to model generalization. WaKA's different usages were also evaluated across diverse real-world datasets, demonstrating performance very close to LiRA when used as an MIA on $ k $-NN classifiers, but with greater computational efficiency.

Via

Access Paper or Ask Questions

Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists

Mar 18, 2024

Timothée Ly, Julien Ferry, Marie-José Huguet, Sébastien Gambs, Ulrich Aivodji

Figure 1 for Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists

Figure 2 for Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists

Figure 3 for Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists

Figure 4 for Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists

Abstract:Differentially-private (DP) mechanisms can be embedded into the design of a machine learningalgorithm to protect the resulting model against privacy leakage, although this often comes with asignificant loss of accuracy. In this paper, we aim at improving this trade-off for rule lists modelsby establishing the smooth sensitivity of the Gini impurity and leveraging it to propose a DP greedyrule list algorithm. In particular, our theoretical analysis and experimental results demonstrate thatthe DP rule lists models integrating smooth sensitivity have higher accuracy that those using otherDP frameworks based on global sensitivity.

Via

Access Paper or Ask Questions

PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining

Feb 12, 2024

Mishaal Kazmi, Hadrien Lautraite, Alireza Akbari, Mauricio Soroco, Qiaoyue Tang, Tao Wang, Sébastien Gambs, Mathias Lécuyer

Figure 1 for PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining

Figure 2 for PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining

Figure 3 for PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining

Figure 4 for PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining

Abstract:We introduce a privacy auditing scheme for ML models that relies on membership inference attacks using generated data as "non-members". This scheme, which we call PANORAMIA, quantifies the privacy leakage for large-scale ML models without control of the training process or model re-training and only requires access to a subset of the training data. To demonstrate its applicability, we evaluate our auditing scheme across multiple ML domains, ranging from image and tabular data classification to large-scale language models.

* 19 pages

Via

Access Paper or Ask Questions

SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning

Dec 22, 2023

Julien Ferry, Ulrich Aïvodji, Sébastien Gambs, Marie-José Huguet, Mohamed Siala

Figure 1 for SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning

Figure 2 for SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning

Abstract:Machine learning techniques are increasingly used for high-stakes decision-making, such as college admissions, loan attribution or recidivism prediction. Thus, it is crucial to ensure that the models learnt can be audited or understood by human users, do not create or reproduce discrimination or bias, and do not leak sensitive information regarding their training data. Indeed, interpretability, fairness and privacy are key requirements for the development of responsible machine learning, and all three have been studied extensively during the last decade. However, they were mainly considered in isolation, while in practice they interplay with each other, either positively or negatively. In this Systematization of Knowledge (SoK) paper, we survey the literature on the interactions between these three desiderata. More precisely, for each pairwise interaction, we summarize the identified synergies and tensions. These findings highlight several fundamental theoretical and empirical conflicts, while also demonstrating that jointly considering these different requirements is challenging when one aims at preserving a high level of utility. To solve this issue, we also discuss possible conciliation mechanisms, showing that a careful design can enable to successfully handle these different concerns in practice.

Via

Access Paper or Ask Questions

Crypto'Graph: Leveraging Privacy-Preserving Distributed Link Prediction for Robust Graph Learning

Sep 19, 2023

Sofiane Azogagh, Zelma Aubin Birba, Sébastien Gambs, Marc-Olivier Killijian

Abstract:Graphs are a widely used data structure for collecting and analyzing relational data. However, when the graph structure is distributed across several parties, its analysis is particularly challenging. In particular, due to the sensitivity of the data each party might want to keep their partial knowledge of the graph private, while still willing to collaborate with the other parties for tasks of mutual benefit, such as data curation or the removal of poisoned data. To address this challenge, we propose Crypto'Graph, an efficient protocol for privacy-preserving link prediction on distributed graphs. More precisely, it allows parties partially sharing a graph with distributed links to infer the likelihood of formation of new links in the future. Through the use of cryptographic primitives, Crypto'Graph is able to compute the likelihood of these new links on the joint network without revealing the structure of the private individual graph of each party, even though they know the number of nodes they have, since they share the same graph but not the same links. Crypto'Graph improves on previous works by enabling the computation of a certain number of similarity metrics without any additional cost. The use of Crypto'Graph is illustrated for defense against graph poisoning attacks, in which it is possible to identify potential adversarial links without compromising the privacy of the graphs of individual parties. The effectiveness of Crypto'Graph in mitigating graph poisoning attacks and achieving high prediction accuracy on a graph neural network node classification task is demonstrated through extensive experimentation on a real-world dataset.

Via

Access Paper or Ask Questions

Probabilistic Dataset Reconstruction from Interpretable Models

Aug 29, 2023

Julien Ferry, Ulrich Aïvodji, Sébastien Gambs, Marie-José Huguet, Mohamed Siala

Figure 1 for Probabilistic Dataset Reconstruction from Interpretable Models

Figure 2 for Probabilistic Dataset Reconstruction from Interpretable Models

Figure 3 for Probabilistic Dataset Reconstruction from Interpretable Models

Figure 4 for Probabilistic Dataset Reconstruction from Interpretable Models

Abstract:Interpretability is often pointed out as a key requirement for trustworthy machine learning. However, learning and releasing models that are inherently interpretable leaks information regarding the underlying training data. As such disclosure may directly conflict with privacy, a precise quantification of the privacy impact of such breach is a fundamental problem. For instance, previous work have shown that the structure of a decision tree can be leveraged to build a probabilistic reconstruction of its training dataset, with the uncertainty of the reconstruction being a relevant metric for the information leak. In this paper, we propose of a novel framework generalizing these probabilistic reconstructions in the sense that it can handle other forms of interpretable models and more generic types of knowledge. In addition, we demonstrate that under realistic assumptions regarding the interpretable models' structure, the uncertainty of the reconstruction can be computed efficiently. Finally, we illustrate the applicability of our approach on both decision trees and rule lists, by comparing the theoretical information leak associated to either exact or heuristic learning algorithms. Our results suggest that optimal interpretable models are often more compact and leak less information regarding their training data than greedily-built ones, for a given accuracy level.

Via

Access Paper or Ask Questions

Membership Inference Attack for Beluga Whales Discrimination

Feb 28, 2023

Voncarlos Marcelo Araújo, Sébastien Gambs, Clément Chion, Robert Michaud, Léo Schneider, Hadrien Lautraite

Abstract:To efficiently monitor the growth and evolution of a particular wildlife population, one of the main fundamental challenges to address in animal ecology is the re-identification of individuals that have been previously encountered but also the discrimination between known and unknown individuals (the so-called "open-set problem"), which is the first step to realize before re-identification. In particular, in this work, we are interested in the discrimination within digital photos of beluga whales, which are known to be among the most challenging marine species to discriminate due to their lack of distinctive features. To tackle this problem, we propose a novel approach based on the use of Membership Inference Attacks (MIAs), which are normally used to assess the privacy risks associated with releasing a particular machine learning model. More precisely, we demonstrate that the problem of discriminating between known and unknown individuals can be solved efficiently using state-of-the-art approaches for MIAs. Extensive experiments on three benchmark datasets related to whales, two different neural network architectures, and three MIA clearly demonstrate the performance of the approach. In addition, we have also designed a novel MIA strategy that we coined as ensemble MIA, which combines the outputs of different MIAs to increase the attack accuracy while diminishing the false positive rate. Overall, one of our main objectives is also to show that the research on privacy attacks can also be leveraged "for good" by helping to address practical challenges encountered in animal ecology.

* 15 pages

Via

Access Paper or Ask Questions