Kushal Chakrabarti

On Model Protection in Federated Learning against Eavesdropping Attacks

Apr 02, 2025

Dipankar Maity, Kushal Chakrabarti

Abstract:In this study, we investigate the protection offered by federated learning algorithms against eavesdropping adversaries. In our model, the adversary is capable of intercepting model updates transmitted from clients to the server, enabling it to create its own estimate of the model. Unlike previous research, which predominantly focuses on safeguarding client data, our work shifts attention to protecting the client model itself. Through a theoretical analysis, we examine how various factors, such as the probability of client selection, the structure of local objective functions, global aggregation at the server, and the eavesdropper's capabilities, impact the overall level of protection. We further validate our findings through numerical experiments, assessing the protection by evaluating the model accuracy achieved by the adversary. Finally, we compare our results with methods based on differential privacy, underscoring their limitations in this specific context.

Distributed Optimization via Energy Conservation Laws in Dilated Coordinates

Sep 28, 2024

Mayank Baranwal, Kushal Chakrabarti

Abstract:Optimizing problems in a distributed manner is critical for systems involving multiple agents with private data. Despite substantial interest, a unified method for analyzing the convergence rates of distributed optimization algorithms is lacking. This paper introduces an energy conservation approach for analyzing continuous-time dynamical systems in dilated coordinates. Instead of directly analyzing dynamics in the original coordinate system, we establish a conserved quantity, akin to physical energy, in the dilated coordinate system. Consequently, convergence rates can be explicitly expressed in terms of the inverse time-dilation factor. Leveraging this generalized approach, we formulate a novel second-order distributed accelerated gradient flow with a convergence rate of $O\left(1/t^{2-\epsilon}\right)$ in time $t$ for $\epsilon>0$. We then employ a semi second-order symplectic Euler discretization to derive a rate-matching algorithm with a convergence rate of $O\left(1/k^{2-\epsilon}\right)$ in $k$ iterations. To the best of our knowledge, this represents the most favorable convergence rate for any distributed optimization algorithm designed for smooth convex optimization. Its accelerated convergence behavior is benchmarked against various state-of-the-art distributed optimization algorithms on practical, large-scale problems.

* 10 pages; (Near) optimal convergence rate

A Methodology Establishing Linear Convergence of Adaptive Gradient Methods under PL Inequality

Jul 17, 2024

Kushal Chakrabarti, Mayank Baranwal

Abstract:Adaptive gradient-descent optimizers are the standard choice for training neural network models. Despite their faster convergence than gradient-descent and remarkable performance in practice, the adaptive optimizers are not as well understood as vanilla gradient-descent. One reason is that the dynamic update of the learning rate that helps in faster convergence of these methods also makes their analysis intricate. In particular, the simple gradient-descent method converges at a linear rate for a class of optimization problems, whereas the practically faster adaptive gradient methods lack such a theoretical guarantee. The Polyak-Łojasiewicz (PL) inequality is the weakest known class, for which linear convergence of gradient-descent and its momentum variants has been proved. Therefore, in this paper, we prove that AdaGrad and Adam, two well-known adaptive gradient methods, converge linearly when the cost function is smooth and satisfies the PL inequality. Our theoretical framework follows a simple and unified approach, applicable to both batch and stochastic gradients, which can potentially be utilized in analyzing linear convergence of other variants of Adam.

* Accepted for publication at the main track of 27th European Conference on Artificial Intelligence (ECAI-2024)
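
To make the setting of the abstract above concrete, here is a minimal sketch (not the paper's analysis or experiments) that runs the textbook AdaGrad update on a toy cost satisfying the PL inequality without being strongly convex. The cost function, the step size `lr`, the starting point, and the iteration count are all illustrative assumptions.

```python
import math

def loss(x):
    # Toy cost (an assumption): f(x) = 0.5 * (x1 + x2 - 1)^2.
    # Its minimizers form a whole line, so f is not strongly convex,
    # yet 0.5 * ||grad f||^2 = 2 * f, so the PL inequality holds with mu = 2.
    r = x[0] + x[1] - 1.0
    return 0.5 * r * r

def grad(x):
    r = x[0] + x[1] - 1.0
    return [r, r]

def adagrad(x0, lr=0.5, eps=1e-12, iters=200):
    """Textbook AdaGrad with per-coordinate accumulated squared gradients."""
    x = list(x0)
    accum = [0.0] * len(x)          # running sum of squared gradients
    history = [loss(x)]
    for _ in range(iters):
        g = grad(x)
        for i in range(len(x)):
            accum[i] += g[i] * g[i]
            x[i] -= lr * g[i] / (math.sqrt(accum[i]) + eps)
        history.append(loss(x))
    return x, history

x_final, history = adagrad([2.0, 2.0])
# Successive loss ratios settle to a constant below 1, which is what a
# linear (geometric) convergence rate looks like numerically.
ratios = [history[k + 1] / history[k] for k in range(10, 15)]
```

On this toy problem the accumulated squared gradients stay bounded, so the effective step size stops shrinking and the loss contracts by a roughly constant factor per iteration.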

Linear Convergence of Pre-Conditioned PI Consensus Algorithm under Restricted Strong Convexity

Sep 30, 2023

Kushal Chakrabarti, Mayank Baranwal

Abstract:This paper considers solving distributed convex optimization problems in peer-to-peer multi-agent networks. The network is assumed to be synchronous and connected. By using the proportional-integral (PI) control strategy, various algorithms with fixed stepsize have been developed. The earliest among them is the PI consensus algorithm. Using Lyapunov theory, we guarantee exponential convergence of the PI consensus algorithm for restricted strongly convex functions with rate-matching discretization, without requiring convexity of individual local cost functions, for the first time. In order to accelerate the PI consensus algorithm, we incorporate local pre-conditioning in the form of constant positive definite matrices and numerically validate its efficiency compared to the prominent distributed convex optimization algorithms. Unlike classical pre-conditioning, where only the gradients are multiplied by a pre-conditioner, the proposed pre-conditioning modifies both the gradients and the consensus terms, thereby controlling the effect of the communication graph between the agents on the PI consensus algorithm.

Iteratively Preconditioned Gradient-Descent Approach for Moving Horizon Estimation Problems

Jun 22, 2023

Tianchen Liu, Kushal Chakrabarti, Nikhil Chopra

Abstract:Moving horizon estimation (MHE) is a widely studied state estimation approach in several practical applications. In the MHE problem, the state estimates are obtained via the solution of an approximated nonlinear optimization problem. However, this optimization step is known to be computationally complex. Given this limitation, this paper investigates the idea of iteratively preconditioned gradient-descent (IPG) to solve the MHE problem, with the aim of improving performance over existing solution techniques. To our knowledge, the preconditioning technique is used for the first time in this paper to reduce the computational cost and accelerate the crucial optimization step for MHE. The convergence guarantee of the proposed iterative approach for a class of MHE problems is presented. Additionally, sufficient conditions for the MHE problem to be convex are derived. Finally, the proposed method is implemented on a unicycle localization example. The simulation results demonstrate that the proposed approach can achieve better accuracy with reduced computational costs.

A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning

Jun 04, 2022

Kushal Chakrabarti, Nikhil Chopra

Abstract:Adaptive gradient methods have become popular in optimizing deep neural networks; recent examples include AdaGrad and Adam. Although Adam usually converges faster, variations of Adam, for instance, the AdaBelief algorithm, have been proposed to improve on Adam's poor generalization ability compared to the classical stochastic gradient method. This paper develops a generic framework for adaptive gradient methods that solve non-convex optimization problems. We first model the adaptive gradient methods in a state-space framework, which allows us to present simpler convergence proofs of adaptive optimizers such as AdaGrad, Adam, and AdaBelief. We then utilize the transfer function paradigm from classical control theory to propose a new variant of Adam, coined AdamSSM. We add an appropriate pole-zero pair in the transfer function from squared gradients to the second moment estimate. We prove the convergence of the proposed AdamSSM algorithm. Applications on benchmark machine learning tasks of image classification using CNN architectures and language modeling using LSTM architecture demonstrate that the AdamSSM algorithm achieves a better trade-off between generalization accuracy and convergence speed than recent adaptive gradient methods.
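
The filter view mentioned in the abstract above can be illustrated with standard Adam, where the second moment estimate is a first-order low-pass filter driven by the squared gradients. The sketch below is plain textbook Adam with its usual default hyperparameters; it does not reproduce AdamSSM, whose added pole-zero pair is not specified on this page. The constant-gradient input is a hypothetical test signal.

```python
import math

def adam_step(x, g, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update on lists of floats; t is the 1-based step count.

    The states (m, v) evolve as linear filters:
      m <- beta1 * m + (1 - beta1) * g      (input: gradient)
      v <- beta2 * v + (1 - beta2) * g^2    (input: squared gradient)
    """
    new_x, new_m, new_v = [], [], []
    for xi, gi, mi, vi in zip(x, g, m, v):
        mi = beta1 * mi + (1.0 - beta1) * gi
        vi = beta2 * vi + (1.0 - beta2) * gi * gi
        m_hat = mi / (1.0 - beta1 ** t)   # bias-corrected first moment
        v_hat = vi / (1.0 - beta2 ** t)   # bias-corrected second moment
        new_x.append(xi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_x, new_m, new_v

# Hypothetical test signal: a constant gradient of 2.0 for 10 steps.
x, m, v = [0.0], [0.0], [0.0]
for t in range(1, 11):
    x, m, v = adam_step(x, [2.0], m, v, t)
v_hat_final = v[0] / (1.0 - 0.999 ** 10)
```

With a constant input, the bias-corrected filter output equals the squared gradient at every step, so each update moves the parameter by almost exactly `lr`. Changing the filter from squared gradients to `v` changes how quickly the effective step size reacts to gradient changes.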

On Accelerating Distributed Convex Optimizations

Aug 19, 2021

Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra

Abstract:This paper studies a distributed multi-agent convex optimization problem. In this problem, the system comprises multiple agents, each with a set of local data points and an associated local cost function. The agents are connected to a server, and there is no inter-agent communication. The agents' goal is to learn a parameter vector that optimizes the aggregate of their local costs without revealing their local data points. In principle, the agents can solve this problem by collaborating with the server using the traditional distributed gradient-descent method. However, when the aggregate cost is ill-conditioned, the gradient-descent method (i) requires a large number of iterations to converge, and (ii) is highly unstable against process noise. We propose an iterative pre-conditioning technique to mitigate the deleterious effects of the cost function's conditioning on the convergence rate of distributed gradient-descent. Unlike the conventional pre-conditioning techniques, the pre-conditioner matrix in our proposed technique updates iteratively to facilitate implementation on the distributed network. In the distributed setting, we prove that the proposed algorithm converges linearly with a better rate of convergence than the traditional and adaptive gradient-descent methods. Additionally, for the special case when the minimizer of the aggregate cost is unique, our algorithm converges superlinearly. We demonstrate our algorithm's superior performance compared to prominent distributed algorithms for solving real logistic regression problems and emulating neural network training via a noisy quadratic model, thereby signifying the proposed algorithm's efficiency for distributively solving non-convex optimization. Moreover, we empirically show that the proposed algorithm results in faster training without compromising the generalization performance.
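
The idea of a pre-conditioner that is itself updated iteratively can be sketched on a small ill-conditioned least-squares problem. The sketch below is a single-machine illustration only: the update form (a Richardson-style iteration driving `K` toward the inverse of `H + beta*I`, followed by a pre-conditioned gradient step), the parameters `alpha`, `beta`, `delta`, and the toy data are all assumptions, and the code omits the server-agent communication that the abstract describes.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def ipg_least_squares(A, b, alpha, beta, delta, iters):
    """Assumed iteratively pre-conditioned gradient descent for 0.5*||Ax - b||^2."""
    n = len(A[0])
    At = [list(col) for col in zip(*A)]
    H = matmul(At, A)                    # Hessian A^T A
    Atb = matvec(At, b)
    x = [0.0] * n
    K = [[0.0] * n for _ in range(n)]    # pre-conditioner, starts at zero
    for _ in range(iters):
        g = [hx - c for hx, c in zip(matvec(H, x), Atb)]   # gradient at x
        HK = matmul(H, K)
        # K <- K - alpha * ((H + beta*I) K - I): moves K toward (H + beta*I)^{-1}
        K = [[K[i][j] - alpha * (HK[i][j] + beta * K[i][j] - (1.0 if i == j else 0.0))
              for j in range(n)] for i in range(n)]
        x = [xi - delta * d for xi, d in zip(x, matvec(K, g))]
    return x

def gradient_descent(A, b, step, iters):
    """Plain gradient descent on the same cost, for comparison."""
    At = [list(col) for col in zip(*A)]
    H, Atb = matmul(At, A), matvec(At, b)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        g = [hx - c for hx, c in zip(matvec(H, x), Atb)]
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Toy data (an assumption): consistent system with solution [2, -3];
# the Hessian A^T A has a condition number above 100.
A = [[1.0, 0.0], [0.0, 0.1], [1.0, 0.1]]
b = [2.0, -0.3, 1.7]
x_ipg = ipg_least_squares(A, b, alpha=0.5, beta=0.0, delta=1.0, iters=200)
x_gd = gradient_descent(A, b, step=0.5, iters=200)
```

In this toy run, the pre-conditioned iterate reaches the solution to high precision within the 200 iterations, while plain gradient descent with the same step size is still far away along the poorly conditioned direction.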

Generalized AdaGrad and Adam: A State-Space Perspective

May 31, 2021

Kushal Chakrabarti, Nikhil Chopra

Abstract:Accelerated gradient-based methods are being extensively used for solving non-convex machine learning problems, especially when the data points are abundant or the available data is distributed across several agents. Two of the prominent accelerated gradient algorithms are AdaGrad and Adam. AdaGrad is the simplest accelerated gradient method, which is particularly effective for sparse data. Adam has been shown to perform favorably in deep learning problems compared to other methods. In this paper, we propose a new fast optimizer, Generalized AdaGrad (G-AdaGrad), for accelerating the solution of potentially non-convex machine learning problems. Specifically, we adopt a state-space perspective for analyzing the convergence of gradient acceleration algorithms, namely G-AdaGrad and Adam, in machine learning. Our proposed state-space models are governed by ordinary differential equations. We present simple convergence proofs of these two algorithms in the deterministic setting with minimal assumptions. Our analysis also provides intuition behind improving upon AdaGrad's convergence rate. We provide empirical results on the MNIST dataset to reinforce our claims on the convergence and performance of G-AdaGrad and Adam.

Robustness of Iteratively Pre-Conditioned Gradient-Descent Method: The Case of Distributed Linear Regression Problem

Jan 26, 2021

Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra

Abstract:This paper considers the problem of multi-agent distributed linear regression in the presence of system noise. In this problem, the system comprises multiple agents wherein each agent locally observes a set of data points, and the agents' goal is to compute a linear model that best fits the collective data points observed by all the agents. We consider a server-based distributed architecture where the agents interact with a common server to solve the problem; however, the server cannot access the agents' data points. We consider a practical scenario wherein the system either has observation noise, i.e., the data points observed by the agents are corrupted, or has process noise, i.e., the computations performed by the server and the agents are corrupted. In noise-free systems, the recently proposed distributed linear regression algorithm, named the Iteratively Pre-conditioned Gradient-descent (IPG) method, has been claimed to converge faster than related methods. In this paper, we study the robustness of the IPG method against both the observation noise and the process noise. We empirically show that the robustness of the IPG method compares favorably to the state-of-the-art algorithms.

* in IEEE Control Systems Letters. Related articles: arXiv:2003.07180v2 [math.OC], arXiv:2008.02856v1 [math.OC], and arXiv:2011.07595v2 [math.OC]

Accelerating Distributed SGD for Linear Regression using Iterative Pre-Conditioning

Nov 28, 2020

Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra

Abstract:This paper considers the multi-agent distributed linear least-squares problem. The system comprises multiple agents, each agent with a locally observed set of data points, and a common server with whom the agents can interact. The agents' goal is to compute a linear model that best fits the collective data points observed by all the agents. In the server-based distributed setting, the server cannot access the data points held by the agents. The recently proposed Iteratively Pre-conditioned Gradient-descent (IPG) method has been shown to converge faster than other existing distributed algorithms that solve this problem. In the IPG algorithm, the server and the agents perform numerous iterative computations. Each of these iterations relies on the entire batch of data points observed by the agents for updating the current estimate of the solution. Here, we extend the idea of iterative pre-conditioning to the stochastic setting, where the server updates the estimate and the iterative pre-conditioning matrix based on a single randomly selected data point at every iteration. We show that our proposed Iteratively Pre-conditioned Stochastic Gradient-descent (IPSG) method converges linearly in expectation to a neighborhood of the solution. Importantly, we empirically show that the proposed IPSG method's convergence rate compares favorably to prominent stochastic algorithms for solving the linear least-squares problem in server-based networks.

* Changes in the replacement: Application to distributed state estimation problem has been added in Appendix B. Related articles: arXiv:2003.07180v2 [math.OC] and arXiv:2008.02856v1 [math.OC]
