Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jan Ramon

CRIStAL, MAGNET

Dropout-Robust Mechanisms for Differentially Private and Fully Decentralized Mean Estimation

Jun 04, 2025

César Sabater, Sonia Ben Mokhtar, Jan Ramon

Abstract:Achieving differentially private computations in decentralized settings poses significant challenges, particularly regarding accuracy, communication cost, and robustness against information leakage. While cryptographic solutions offer promise, they often suffer from high communication overhead or require centralization in the presence of network failures. Conversely, existing fully decentralized approaches typically rely on relaxed adversarial models or pairwise noise cancellation, the latter suffering from substantial accuracy degradation if parties unexpectedly disconnect. In this work, we propose IncA, a new protocol for fully decentralized mean estimation, a widely used primitive in data-intensive processing. Our protocol, which enforces differential privacy, requires no central orchestration and employs low-variance correlated noise, achieved by incrementally injecting sensitive information into the computation. First, we theoretically demonstrate that, when no parties permanently disconnect, our protocol achieves accuracy comparable to that of a centralized setting-already an improvement over most existing decentralized differentially private techniques. Second, we empirically show that our use of low-variance correlated noise significantly mitigates the accuracy loss experienced by existing techniques in the presence of dropouts.

* 23 pages, 4 figures

Via

Access Paper or Ask Questions

SoK: Verifiable Cross-Silo FL

Oct 11, 2024

Aleksei Korneev, Jan Ramon

Figure 1 for SoK: Verifiable Cross-Silo FL

Figure 2 for SoK: Verifiable Cross-Silo FL

Figure 3 for SoK: Verifiable Cross-Silo FL

Figure 4 for SoK: Verifiable Cross-Silo FL

Abstract:Federated Learning (FL) is a widespread approach that allows training machine learning (ML) models with data distributed across multiple devices. In cross-silo FL, which often appears in domains like healthcare or finance, the number of participants is moderate, and each party typically represents a well-known organization. For instance, in medicine data owners are often hospitals or data hubs which are well-established entities. However, malicious parties may still attempt to disturb the training procedure in order to obtain certain benefits, for example, a biased result or a reduction in computational load. While one can easily detect a malicious agent when data used for training is public, the problem becomes much more acute when it is necessary to maintain the privacy of the training dataset. To address this issue, there is recently growing interest in developing verifiable protocols, where one can check that parties do not deviate from the training procedure and perform computations correctly. In this paper, we present a systematization of knowledge on verifiable cross-silo FL. We analyze various protocols, fit them in a taxonomy, and compare their efficiency and threat models. We also analyze Zero-Knowledge Proof (ZKP) schemes and discuss how their overall cost in a FL context can be minimized. Lastly, we identify research gaps and discuss potential directions for future scientific work.

Via

Access Paper or Ask Questions

DP-SGD with weight clipping

Oct 27, 2023

Antoine Barczewski, Jan Ramon

Abstract:Recently, due to the popularity of deep neural networks and other methods whose training typically relies on the optimization of an objective function, and due to concerns for data privacy, there is a lot of interest in differentially private gradient descent methods. To achieve differential privacy guarantees with a minimum amount of noise, it is important to be able to bound precisely the sensitivity of the information which the participants will observe. In this study, we present a novel approach that mitigates the bias arising from traditional gradient clipping. By leveraging public information concerning the current global model and its location within the search domain, we can achieve improved gradient bounds, leading to enhanced sensitivity determinations and refined noise level adjustments. We extend the state of the art algorithms, present improved differential privacy guarantees requiring less noise and present an empirical evaluation.

Via

Access Paper or Ask Questions

Distributed Differentially Private Averaging with Improved Utility and Robustness to Malicious Parties

Jun 12, 2020

César Sabater, Aurélien Bellet, Jan Ramon

Figure 1 for Distributed Differentially Private Averaging with Improved Utility and Robustness to Malicious Parties

Figure 2 for Distributed Differentially Private Averaging with Improved Utility and Robustness to Malicious Parties

Abstract:Learning from data owned by several parties, as in federated learning, raises challenges regarding the privacy guarantees provided to participants and the correctness of the computation in the presence of malicious parties. We tackle these challenges in the context of distributed averaging, an essential building block of distributed and federated learning. Our first contribution is a novel distributed differentially private protocol which naturally scales with the number of parties. The key idea underlying our protocol is to exchange correlated Gaussian noise along the edges of a network graph, complemented by independent noise added by each party. We analyze the differential privacy guarantees of our protocol and the impact of the graph topology, showing that we can match the accuracy of the trusted curator model even when each party communicates with only a logarithmic number of other parties chosen at random. This is in contrast with protocols in the local model of privacy (with lower accuracy) or based on secure aggregation (where all pairs of users need to exchange messages). Our second contribution is to enable users to prove the correctness of their computations without compromising the efficiency and privacy guarantees of the protocol. Our construction relies on standard cryptographic primitives like commitment schemes and zero knowledge proofs.

* 39 pages

Via

Access Paper or Ask Questions

Hiding in the Crowd: A Massively Distributed Algorithm for Private Averaging with Malicious Adversaries

Mar 27, 2018

Pierre Dellenbach, Aurélien Bellet, Jan Ramon

Figure 1 for Hiding in the Crowd: A Massively Distributed Algorithm for Private Averaging with Malicious Adversaries

Figure 2 for Hiding in the Crowd: A Massively Distributed Algorithm for Private Averaging with Malicious Adversaries

Abstract:The amount of personal data collected in our everyday interactions with connected devices offers great opportunities for innovative services fueled by machine learning, as well as raises serious concerns for the privacy of individuals. In this paper, we propose a massively distributed protocol for a large set of users to privately compute averages over their joint data, which can then be used to learn predictive models. Our protocol can find a solution of arbitrary accuracy, does not rely on a third party and preserves the privacy of users throughout the execution in both the honest-but-curious and malicious adversary models. Specifically, we prove that the information observed by the adversary (the set of maliciours users) does not significantly reduce the uncertainty in its prediction of private values compared to its prior belief. The level of privacy protection depends on a quantity related to the Laplacian matrix of the network graph and generally improves with the size of the graph. Furthermore, we design a verification procedure which offers protection against malicious users joining the service with the goal of manipulating the outcome of the algorithm.

* 17 pages

Via

Access Paper or Ask Questions

Learning from networked examples

Jun 03, 2017

Yuyi Wang, Jan Ramon, Zheng-Chu Guo

Figure 1 for Learning from networked examples

Figure 2 for Learning from networked examples

Figure 3 for Learning from networked examples

Figure 4 for Learning from networked examples

Abstract:Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption does not hold anymore when learning from a networked sample because two or more training examples may share some common objects, and hence share the features of these shared objects. We show that the classic approach of ignoring this problem potentially can have a harmful effect on the accuracy of statistics, and then consider alternatives. One of these is to only use independent examples, discarding other information. However, this is clearly suboptimal. We analyze sample error bounds in this networked setting, providing significantly improved results. An important component of our approach is formed by efficient sample weighting schemes, which leads to novel concentration inequalities.

Via

Access Paper or Ask Questions

Learning from networked examples in a k-partite graph

Feb 18, 2017

Yuyi Wang, Jan Ramon, Zheng-Chu Guo

Figure 1 for Learning from networked examples in a k-partite graph

Abstract:Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption does not hold anymore when learning from a networked sample where two or more training examples may share common features. We propose an efficient weighting method for learning from networked examples and show the sample error bound which is better than previous work.

* a special case

Via

Access Paper or Ask Questions

Predicate Logic as a Modeling Language: Modeling and Solving some Machine Learning and Data Mining Problems with IDP3

Mar 28, 2014

Maurice Bruynooghe, Hendrik Blockeel, Bart Bogaerts, Broes De Cat, Stef De Pooter, Joachim Jansen, Anthony Labarre, Jan Ramon, Marc Denecker, Sicco Verwer

Figure 1 for Predicate Logic as a Modeling Language: Modeling and Solving some Machine Learning and Data Mining Problems with IDP3

Figure 2 for Predicate Logic as a Modeling Language: Modeling and Solving some Machine Learning and Data Mining Problems with IDP3

Figure 3 for Predicate Logic as a Modeling Language: Modeling and Solving some Machine Learning and Data Mining Problems with IDP3

Figure 4 for Predicate Logic as a Modeling Language: Modeling and Solving some Machine Learning and Data Mining Problems with IDP3

Abstract:This paper provides a gentle introduction to problem solving with the IDP3 system. The core of IDP3 is a finite model generator that supports first order logic enriched with types, inductive definitions, aggregates and partial functions. It offers its users a modeling language that is a slight extension of predicate logic and allows them to solve a wide range of search problems. Apart from a small introductory example, applications are selected from problems that arose within machine learning and data mining research. These research areas have recently shown a strong interest in declarative modeling and constraint solving as opposed to algorithmic approaches. The paper illustrates that the IDP3 system can be a valuable tool for researchers with such an interest. The first problem is in the domain of stemmatology, a domain of philology concerned with the relationship between surviving variant versions of text. The second problem is about a somewhat related problem within biology where phylogenetic trees are used to represent the evolution of species. The third and final problem concerns the classical problem of learning a minimal automaton consistent with a given set of strings. For this last problem, we show that the performance of our solution comes very close to that of a state-of-the art solution. For each of these applications, we analyze the problem, illustrate the development of a logic-based model and explore how alternatives can affect the performance.

* Theory and Practice of Logic Programming 15 (2014) 783-817
* To appear in Theory and Practice of Logic Programming (TPLP)

Via

Access Paper or Ask Questions

Top-down induction of clustering trees

Nov 21, 2000

Hendrik Blockeel, Luc De Raedt, Jan Ramon

Figure 1 for Top-down induction of clustering trees

Figure 2 for Top-down induction of clustering trees

Figure 3 for Top-down induction of clustering trees

Figure 4 for Top-down induction of clustering trees

Abstract:An approach to clustering is presented that adapts the basic top-down induction of decision trees method towards clustering. To this aim, it employs the principles of instance based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for first order clustering. The TIC system employs the first order logical decision tree representation of the inductive logic programming system Tilde. Various experiments with TIC are presented, in both propositional and relational domains.

* Machine Learning, Proceedings of the 15th International Conference (J. Shavlik, ed.), Morgan Kaufmann, 1998, pp. 55-63
* 9 pages, 3 figures

Via

Access Paper or Ask Questions