Matthieu Boussard

Privacy Amplification Through Synthetic Data: Insights from Linear Regression

Jun 05, 2025

Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard

Abstract: Synthetic data inherits the differential privacy guarantees of the model used to generate it. Additionally, synthetic data may benefit from privacy amplification when the generative model is kept hidden. While empirical studies suggest this phenomenon, a rigorous theoretical understanding is still lacking. In this paper, we investigate this question through the well-understood framework of linear regression. First, we establish negative results showing that if an adversary controls the seed of the generative model, a single synthetic data point can leak as much information as releasing the model itself. Conversely, we show that when synthetic data is generated from random inputs, releasing a limited number of synthetic data points amplifies privacy beyond the model's inherent guarantees. We believe our findings in linear regression can serve as a foundation for deriving more general bounds in the future.

* 26 pages, ICML 2025
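
A toy illustration of why the number of released points matters, built on our own assumptions rather than the paper's construction: suppose the hidden generator maps a random Gaussian seed x to the noiseless label y = ⟨θ, x⟩. Then d points from random inputs generically determine θ exactly, so releasing d or more of them gives away everything the model itself would, and no amplification beyond the model's guarantee is possible. The generator `synthetic_points`, the dimension, and the seed below are hypothetical.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def synthetic_points(theta, n, rng):
    """Hypothetical generator: random Gaussian seed x, noiseless label y = <theta, x>."""
    pts = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta]
        y = sum(t * xi for t, xi in zip(theta, x))
        pts.append((x, y))
    return pts

rng = random.Random(0)
d = 5
theta = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stands in for the hidden, privately trained model
pts = synthetic_points(theta, d, rng)            # release exactly d synthetic points
theta_hat = solve([x for x, _ in pts], [y for _, y in pts])
print(max(abs(a - b) for a, b in zip(theta, theta_hat)))  # reconstruction error, ~ floating-point precision
```

With fewer than d points the linear system is underdetermined, which is the regime where hiding the model can plausibly help.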

The Impact of LoRA on the Emergence of Clusters in Transformers

Feb 23, 2024

Hugo Koubbi, Matthieu Boussard, Louis Hernandez

Abstract: In this paper, we employ the mathematical framework for Transformers developed by Sander et al. (2022) and Geshkovski et al. (2023a, 2023b) to explore how variations in attention parameters and initial token values affect the structural dynamics of token clusters. Our analysis shows that, while the clusters arising under modified attention-matrix dynamics can diverge significantly from the original ones over long time horizons, they remain close over shorter intervals, depending on the parameter differences. This work contributes to the fine-tuning literature through practical applications to the LoRA algorithm (Hu et al., 2021; PEFT), improving our understanding of how LoRA-enhanced Transformer models behave.
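
The kind of comparison described above can be sketched numerically. We assume a commonly used form of these self-attention dynamics, in which tokens are particles on the unit sphere evolving by ẋ_i = P⊥_{x_i}(Σ_j softmax_j(β⟨Q x_i, K x_j⟩) V x_j), with P⊥_x the projection onto the tangent space at x. The sketch integrates this with explicit Euler steps plus renormalization, uses Q = K = V = I as the baseline, and replaces V with a hypothetical rank-one, LoRA-style update V' = I + δ a bᵀ. All parameter values are illustrative assumptions; the code only measures the largest gap between corresponding tokens at a short and a long horizon and does not reproduce the paper's bounds.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_step(X, Q, K, V, beta, dt):
    """One Euler step of x_i' = P_perp(x_i)(sum_j softmax_j(beta <Q x_i, K x_j>) V x_j),
    followed by renormalization back onto the unit sphere."""
    QX, KX, VX = [matvec(Q, x) for x in X], [matvec(K, x) for x in X], [matvec(V, x) for x in X]
    new = []
    for i, x in enumerate(X):
        scores = [beta * dot(QX[i], kx) for kx in KX]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        Z = sum(w)
        drift = [sum(w[j] * VX[j][c] for j in range(len(X))) / Z for c in range(len(x))]
        proj = dot(drift, x)  # remove the radial component: drift - <drift, x> x
        tangent = [dc - proj * xc for dc, xc in zip(drift, x)]
        new.append(normalize([xc + dt * t for xc, t in zip(x, tangent)]))
    return new

def simulate(X0, Q, K, V, beta, dt, steps):
    X = [x[:] for x in X0]
    for _ in range(steps):
        X = attention_step(X, Q, K, V, beta, dt)
    return X

def max_gap(X, Y):
    """Largest Euclidean distance between corresponding tokens."""
    return max(math.dist(x, y) for x, y in zip(X, Y))

d, n, beta, dt = 3, 8, 1.0, 0.05
rng = random.Random(1)
X0 = [normalize([rng.gauss(0, 1) for _ in range(d)]) for _ in range(n)]
I = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
# Hypothetical rank-one, LoRA-style update of the value matrix: V' = I + delta * a b^T
delta, a, b = 0.1, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
V_lora = [[I[r][c] + delta * a[r] * b[c] for c in range(d)] for r in range(d)]

for T in (1.0, 20.0):
    steps = int(T / dt)
    gap = max_gap(simulate(X0, I, I, I, beta, dt, steps),
                  simulate(X0, I, I, V_lora, beta, dt, steps))
    print(f"T={T}: max token gap = {gap:.4f}")
```

Whether the long-horizon gap ends up much larger than the short-horizon one depends on the chosen β, δ, and initial tokens; the sketch is only a way to probe that question empirically.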

Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration

Dec 21, 2023

Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard

Abstract: Pufferfish privacy is a flexible generalization of differential privacy that allows one to model arbitrary secrets and the adversary's prior knowledge about the data. Unfortunately, designing general and tractable Pufferfish mechanisms that do not compromise utility is challenging. Furthermore, this framework does not provide the composition guarantees needed for direct use in iterative machine learning algorithms. To mitigate these issues, we introduce a Rényi divergence-based variant of Pufferfish and show that it extends the applicability of the Pufferfish framework. We first generalize the Wasserstein mechanism to cover a wide range of noise distributions and introduce several ways to improve its utility. We also derive stronger guarantees against out-of-distribution adversaries. Finally, as an alternative to composition, we prove privacy amplification results for contractive noisy iterations and showcase the first use of Pufferfish in private convex optimization. A common ingredient underlying our results is the use and extension of shift reduction lemmas.
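
A minimal sketch of a Wasserstein-style calibration with Gaussian noise, under stated assumptions: as in the original Wasserstein mechanism, we assume the noise is scaled to the ∞-Wasserstein distance W between the conditional distributions of a scalar query output under two competing secrets, and we use the standard Rényi divergence between equal-variance Gaussians, D_α(N(m1, σ²) ‖ N(m2, σ²)) = α(m1 − m2)²/(2σ²). Requiring this to be at most ε for |m1 − m2| ≤ W gives σ = W·sqrt(α/(2ε)). The exact calibration in the paper may differ; the function names and sample values are hypothetical.

```python
import math

def w_inf_1d(xs, ys):
    """Infinity-Wasserstein distance between two uniform empirical distributions on the real
    line with the same number of atoms. In 1-D the sorted (quantile) coupling is optimal."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return max(abs(a - b) for a, b in zip(sorted(xs), sorted(ys)))

def gaussian_sigma(w, alpha, eps):
    """Smallest sigma such that alpha * w**2 / (2 * sigma**2) <= eps, i.e. the order-alpha
    Renyi divergence between N(m1, sigma^2) and N(m2, sigma^2) with |m1 - m2| <= w is <= eps."""
    return w * math.sqrt(alpha / (2.0 * eps))

# Hypothetical samples of a query's output under two competing secrets
under_s1 = [0.0, 1.0, 2.0]
under_s2 = [0.5, 1.2, 2.1]
W = w_inf_1d(under_s1, under_s2)
sigma = gaussian_sigma(W, alpha=2.0, eps=0.5)
print(W, sigma)  # W = 0.5, sigma ≈ 0.707
```

The point of using W∞ rather than a worst-case sensitivity is that only pairs of secrets the adversary cares about, and only their conditional output distributions, drive the noise scale.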

Information gain ratio correction: Improving prediction with more balanced decision tree splits

Jan 25, 2018

Antonin Leroux, Matthieu Boussard, Remi Dès

Abstract: Decision tree algorithms use a gain function to select the best split during tree induction. This function is crucial for obtaining trees with high predictive accuracy. Some gain functions suffer from a bias when they compare splits of different arities. Quinlan proposed the gain ratio in C4.5 to correct this bias in the information gain function. In this paper, we present an updated version of the gain ratio that performs better because it tries to correct the gain ratio's own bias toward unbalanced trees and toward some splits with low predictive interest.

* 7 pages
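
For reference, a sketch of the standard C4.5-style gain ratio that the abstract starts from (information gain divided by split information), not the corrected version the paper proposes, whose formula is not given here. The function names are our own.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, partition):
    """Gain ratio of splitting `labels` into the subsets listed in `partition`."""
    n = len(labels)
    children = sum(len(p) / n * entropy(p) for p in partition if p)
    gain = entropy(labels) - children  # information gain
    # Split information: entropy of the branch sizes, which grows with the split's arity
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in partition if p)
    return gain / split_info if split_info > 0 else 0.0

labels = ["yes", "yes", "no", "no"]
binary = [["yes", "yes"], ["no", "no"]]
singletons = [[y] for y in labels]
print(gain_ratio(labels, binary))      # 1.0
print(gain_ratio(labels, singletons))  # 0.5
```

Both splits have the same information gain (1 bit), so plain information gain cannot tell them apart; dividing by the split information penalizes the 4-way split, which is the arity bias the gain ratio was designed to address.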
