Tanguy Marchand

SRATTA : Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning

Jun 13, 2023

Tanguy Marchand, Régis Loeb, Ulysse Marteau-Ferey, Jean Ogier du Terrail, Arthur Pignet

Abstract: We consider a cross-silo federated learning (FL) setting where a machine learning model with a fully connected first layer is trained between different clients and a central server using FedAvg, and where the aggregation step can be performed with secure aggregation (SA). We present SRATTA, an attack relying only on aggregated models which, under realistic assumptions, (i) recovers data samples from the different clients, and (ii) groups data samples coming from the same client together. While sample recovery has already been explored in an FL setting, the ability to group samples per client, despite the use of SA, is novel. This poses a significant unforeseen security threat to FL and effectively breaks SA. We show that SRATTA is both theoretically grounded and can be used in practice on realistic models and datasets. We also propose counter-measures, and claim that clients should play an active role to guarantee their privacy during training.

* Accepted to ICML 2023
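
To make the setting in the abstract above concrete, here is a minimal sketch of what "the server only sees aggregated models" means under secure aggregation. It is a toy illustration, not the SRATTA attack and not the protocol used in the paper. The function names `masked_updates` and `aggregate`, the pairwise Gaussian masks, and the unweighted average are all assumptions made for this example; practical SA schemes work in finite fields with key agreement, and FedAvg usually weights clients by sample count.

```python
import numpy as np


def masked_updates(updates, seed=0):
    """Hide each client's update behind pairwise masks that cancel in the sum.

    Hypothetical helper: every pair of clients (i, j) shares one random mask;
    client i adds it and client j subtracts it.
    """
    rng = np.random.default_rng(seed)
    masked = [np.asarray(u, dtype=float).copy() for u in updates]
    n_clients = len(masked)
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            mask = rng.normal(size=masked[0].shape)
            masked[i] += mask
            masked[j] -= mask
    return masked


def aggregate(masked):
    """Server-side step: the masks cancel, leaving the plain average update."""
    return sum(masked) / len(masked)


# Three clients, each holding a small model update.
updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masked = masked_updates(updates)
average = aggregate(masked)
```

In this sketch the server can compute `average`, but each entry of `masked` on its own looks random. The abstract's claim is that aggregated models alone can still leak samples and their client grouping.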

FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings

Oct 10, 2022

Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq(+14 more)

Abstract: Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models, without centralizing data. The cross-silo FL setting corresponds to the case of few (2 to 50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, thereby slowing algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied by baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results and re-use the different components for their research. FLamby is available at www.github.com/owkin/flamby.

* Accepted to NeurIPS, Datasets and Benchmarks Track

SecureFedYJ: a safe feature Gaussianization protocol for Federated Learning

Oct 04, 2022

Tanguy Marchand, Boris Muzellec, Constance Beguier, Jean Ogier du Terrail, Mathieu Andreux

Abstract: The Yeo-Johnson (YJ) transformation is a standard parametrized per-feature unidimensional transformation often used to Gaussianize features in machine learning. In this paper, we investigate the problem of applying the YJ transformation in a cross-silo Federated Learning setting under privacy constraints. For the first time, we prove that the YJ negative log-likelihood is in fact convex, which allows us to optimize it with exponential search. We numerically show that the resulting algorithm is more stable than the state-of-the-art approach based on the Brent minimization method. Building on this simple algorithm and Secure Multiparty Computation routines, we propose SecureFedYJ, a federated algorithm that performs a pooled-equivalent YJ transformation without leaking more information than the final fitted parameters do. Quantitative experiments on real data demonstrate that, in addition to being secure, our approach reliably normalizes features across silos as well as if data were pooled, making it a viable approach for safe federated feature Gaussianization.

* Accepted to NeurIPS 2022
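
The abstract above rests on one idea: because the YJ negative log-likelihood is convex in the parameter λ, its minimum can be bracketed by doubling steps and then refined. The following is a rough single-machine sketch of that idea, not the SecureFedYJ protocol and not the authors' implementation. The names `yeo_johnson`, `yj_nll`, and `fit_lambda`, the finite-difference slope, the starting point λ = 1, and the `max_step` cap are assumptions made for this example.

```python
import numpy as np


def yeo_johnson(x, lam):
    """Apply the Yeo-Johnson transformation with parameter lam."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lam) > 1e-12:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])
    if abs(lam - 2.0) > 1e-12:
        out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


def yj_nll(x, lam):
    """Profile negative log-likelihood of lam, up to an additive constant."""
    x = np.asarray(x, dtype=float)
    y = yeo_johnson(x, lam)
    jacobian = np.sum(np.sign(x) * np.log1p(np.abs(x)))
    return 0.5 * x.size * np.log(y.var()) - (lam - 1.0) * jacobian


def fit_lambda(x, tol=1e-6, eps=1e-5, max_step=64.0):
    """Bracket the minimum by doubling steps, then bisect on the slope sign."""
    def slope(lam):
        # Sign of the derivative, estimated by a central finite difference.
        return yj_nll(x, lam + eps) - yj_nll(x, lam - eps)

    step = 1.0
    if slope(1.0) > 0:  # the minimum lies to the left of lam = 1
        hi, lo = 1.0, 0.0
        while slope(lo) > 0 and step < max_step:
            step *= 2.0
            hi, lo = lo, lo - step
    else:  # the minimum lies to the right of lam = 1
        lo, hi = 1.0, 2.0
        while slope(hi) < 0 and step < max_step:
            step *= 2.0
            lo, hi = hi, hi + step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)
data = rng.lognormal(size=200)  # skewed sample to Gaussianize
best_lam = fit_lambda(data)
transformed = yeo_johnson(data, best_lam)
```

The search only ever needs the sign of the slope at a candidate λ, which is a much smaller piece of information than the data itself. This sketch computes that sign in the clear on pooled data.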

Wavelet Conditional Renormalization Group

Jul 11, 2022

Tanguy Marchand, Misaki Ozawa, Giulio Biroli, Stéphane Mallat

Abstract: We develop a multiscale approach to estimate high-dimensional probability distributions from a dataset of physical fields or configurations observed in experiments or simulations. In this way we can estimate energy functions (or Hamiltonians) and efficiently generate new samples of many-body systems in various domains, from statistical physics to cosmology. Our method, the Wavelet Conditional Renormalization Group (WC-RG), proceeds scale by scale, estimating models for the conditional probabilities of "fast degrees of freedom" conditioned by coarse-grained fields. These probability distributions are modeled by energy functions associated with scale interactions, and are represented in an orthogonal wavelet basis. WC-RG decomposes the microscopic energy function as a sum of interaction energies at all scales and can efficiently generate new samples by going from coarse to fine scales. Near phase transitions, it avoids the "critical slowing down" of direct estimation and sampling algorithms. This is explained theoretically by combining results from RG and wavelet theories, and verified numerically for the Gaussian and $\varphi^4$ field theories. We show that multiscale WC-RG energy-based models are more general than local potential models and can capture the physics of complex many-body interacting systems at all length scales. This is demonstrated for weak-gravitational-lensing fields reflecting dark matter distributions in cosmology, which include long-range interactions with long-tail probability distributions. WC-RG has a large number of potential applications in non-equilibrium systems, where the underlying distribution is not known a priori. Finally, we discuss the connection between WC-RG and deep network architectures.

* 36 pages, 21 figures

Self supervised learning improves dMMR/MSI detection from histology slides across multiple cancers

Sep 13, 2021

Charlie Saillard, Olivier Dehaene, Tanguy Marchand, Olivier Moindrot, Aurélie Kamoun, Benoit Schmauch, Simon Jegou

Abstract: Microsatellite instability (MSI) is a tumor phenotype whose diagnosis largely impacts patient care in colorectal cancers (CRC), and is associated with response to immunotherapy in all solid tumors. Deep learning models detecting MSI tumors directly from H&E stained slides have shown promise in improving diagnosis of MSI patients. Prior deep learning models for MSI detection have relied on neural networks pretrained on the ImageNet dataset, which does not contain any medical images. In this study, we leverage recent advances in self-supervised learning by training neural networks on histology images from the TCGA dataset using MoCo V2. We show that these networks consistently outperform their counterparts pretrained using ImageNet and obtain state-of-the-art results for MSI detection with AUCs of 0.92 and 0.83 for CRC and gastric tumors, respectively. These models generalize well on an external CRC cohort (0.97 AUC on PAIP) and improve transfer from one organ to another. Finally, we show that predictive image regions exhibit meaningful histological patterns, and that the use of MoCo features highlighted more relevant patterns according to an expert pathologist.

* Accepted for poster and oral presentation at the MICCAI 2021 COMPAY Workshop (submitted the 19th of July 2021)
