Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Saurav Prakash

Embracing Federated Learning: Enabling Weak Client Participation via Partial Model Training

Jun 21, 2024

Sunwoo Lee, Tuo Zhang, Saurav Prakash, Yue Niu, Salman Avestimehr

Abstract:In Federated Learning (FL), clients may have weak devices that cannot train the full model or even hold it in their memory space. To implement large-scale FL applications, thus, it is crucial to develop a distributed learning method that enables the participation of such weak clients. We propose EmbracingFL, a general FL framework that allows all available clients to join the distributed training regardless of their system resource capacity. The framework is built upon a novel form of partial model training method in which each client trains as many consecutive output-side layers as its system resources allow. Our study demonstrates that EmbracingFL encourages each layer to have similar data representations across clients, improving FL efficiency. The proposed partial model training method guarantees convergence to a neighbor of stationary points for non-convex and smooth problems. We evaluate the efficacy of EmbracingFL under a variety of settings with a mixed number of strong, moderate (~40% memory), and weak (~15% memory) clients, datasets (CIFAR-10, FEMNIST, and IMDB), and models (ResNet20, CNN, and LSTM). Our empirical study shows that EmbracingFL consistently achieves high accuracy as like all clients are strong, outperforming the state-of-the-art width reduction methods (i.e. HeteroFL and FjORD).

* IEEE Transactions on Mobile Computing, Early Access, (2024)

Via

Access Paper or Ask Questions

ATP: Enabling Fast LLM Serving via Attention on Top Principal Keys

Mar 01, 2024

Yue Niu, Saurav Prakash, Salman Avestimehr

Abstract:We propose a new attention mechanism with linear complexity, ATP, that fixates \textbf{A}ttention on \textbf{T}op \textbf{P}rincipal keys, rather than on each individual token. Particularly, ATP is driven by an important observation that input sequences are typically low-rank, i.e., input sequences can be represented by a few principal bases. Therefore, instead of directly iterating over all the input tokens, ATP transforms inputs into an orthogonal space and computes attention only on the top principal bases (keys). Owing to the observed low-rank structure in input sequences, ATP is able to capture semantic relationships in input sequences with a few principal keys. Furthermore, the attention complexity is reduced from \emph{quadratic} to \emph{linear} without incurring a noticeable performance drop. ATP further reduces complexity for other linear layers with low-rank inputs, leading to more speedup compared to prior works that solely target the attention module. Our evaluations on various models (e.g., BERT and Llama) demonstrate that ATP achieves comparable accuracy with much lower computation and memory complexity than the standard attention mechanism. In particular, ATP barely loses accuracy with only $1/2$ principal keys, and only incurs around $2\%$ accuracy drops with $1/4$ principal keys.

* 10 pages, 7 figures, 8 tables

Via

Access Paper or Ask Questions

All Rivers Run to the Sea: Private Learning with Asymmetric Flows

Dec 05, 2023

Yue Niu, Ramy E. Ali, Saurav Prakash, Salman Avestimehr

Abstract:Data privacy is of great concern in cloud machine-learning service platforms, when sensitive data are exposed to service providers. While private computing environments (e.g., secure enclaves), and cryptographic approaches (e.g., homomorphic encryption) provide strong privacy protection, their computing performance still falls short compared to cloud GPUs. To achieve privacy protection with high computing performance, we propose Delta, a new private training and inference framework, with comparable model performance as non-private centralized training. Delta features two asymmetric data flows: the main information-sensitive flow and the residual flow. The main part flows into a small model while the residuals are offloaded to a large model. Specifically, Delta embeds the information-sensitive representations into a low-dimensional space while pushing the information-insensitive part into high-dimension residuals. To ensure privacy protection, the low-dimensional information-sensitive part is secured and fed to a small model in a private environment. On the other hand, the residual part is sent to fast cloud GPUs, and processed by a large model. To further enhance privacy and reduce the communication cost, Delta applies a random binary quantization technique along with a DP-based technique to the residuals before sharing them with the public platform. We theoretically show that Delta guarantees differential privacy in the public environment and greatly reduces the complexity in the private environment. We conduct empirical analyses on CIFAR-10, CIFAR-100 and ImageNet datasets and ResNet-18 and ResNet-34, showing that Delta achieves strong privacy protection, fast training, and inference without significantly compromising the model utility.

* 9 pages, 7 figures, 6 tables

Via

Access Paper or Ask Questions

Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls

Aug 14, 2023

Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic

Figure 1 for Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls

Figure 2 for Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls

Figure 3 for Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls

Figure 4 for Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls

Abstract:Hierarchical and tree-like data sets arise in many applications, including language processing, graph data mining, phylogeny and genomics. It is known that tree-like data cannot be embedded into Euclidean spaces of finite dimension with small distortion. This problem can be mitigated through the use of hyperbolic spaces. When such data also has to be processed in a distributed and privatized setting, it becomes necessary to work with new federated learning methods tailored to hyperbolic spaces. As an initial step towards the development of the field of federated learning in hyperbolic spaces, we propose the first known approach to federated classification in hyperbolic spaces. Our contributions are as follows. First, we develop distributed versions of convex SVM classifiers for Poincar\'e discs. In this setting, the information conveyed from clients to the global classifier are convex hulls of clusters present in individual client data. Second, to avoid label switching issues, we introduce a number-theoretic approach for label recovery based on the so-called integer $B_h$ sequences. Third, we compute the complexity of the convex hulls in hyperbolic spaces to assess the extent of data leakage; at the same time, in order to limit the communication cost for the hulls, we propose a new quantization method for the Poincar\'e disc coupled with Reed-Solomon-like encoding. Fourth, at server level, we introduce a new approach for aggregating convex hulls of the clients based on balanced graph partitioning. We test our method on a collection of diverse data sets, including hierarchical single-cell RNA-seq data from different patients distributed across different repositories that have stringent privacy constraints. The classification accuracy of our method is up to $\sim 11\%$ better than its Euclidean counterpart, demonstrating the importance of privacy-preserving learning in hyperbolic spaces.

Via

Access Paper or Ask Questions

Machine Unlearning of Federated Clusters

Oct 28, 2022

Chao Pan, Jin Sima, Saurav Prakash, Vishal Rana, Olgica Milenkovic

Figure 1 for Machine Unlearning of Federated Clusters

Figure 2 for Machine Unlearning of Federated Clusters

Figure 3 for Machine Unlearning of Federated Clusters

Figure 4 for Machine Unlearning of Federated Clusters

Abstract:Federated clustering is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for federated clustering methods has become of significant importance. This work proposes the first known unlearning mechanism for federated clustering with privacy criteria that support simple, provable, and efficient data removal at the client and server level. The gist of our approach is to combine special initialization procedures with quantization methods that allow for secure aggregation of estimated local cluster counts at the server unit. As part of our platform, we introduce secure compressed multiset aggregation (SCMA), which is of independent interest for secure sparse model aggregation. In order to simultaneously facilitate low communication complexity and secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into the new SCMA pipeline and derive bounds on the time and communication complexity of different components of the scheme. Compared to completely retraining K-means++ locally and globally for each removal request, we obtain an average speed-up of roughly 84x across seven datasets, two of which contain biological and medical information that is subject to frequent unlearning requests.

Via

Access Paper or Ask Questions

Federated Learning of Large Models at the Edge via Principal Sub-Model Training

Aug 28, 2022

Yue Niu, Saurav Prakash, Souvik Kundu, Sunwoo Lee, Salman Avestimehr

Figure 1 for Federated Learning of Large Models at the Edge via Principal Sub-Model Training

Figure 2 for Federated Learning of Large Models at the Edge via Principal Sub-Model Training

Figure 3 for Federated Learning of Large Models at the Edge via Principal Sub-Model Training

Figure 4 for Federated Learning of Large Models at the Edge via Principal Sub-Model Training

Abstract:Limited compute and communication capabilities of edge users create a significant bottleneck for federated learning (FL) of large models. We consider a realistic, but much less explored, cross-device FL setting in which no client has the capacity to train a full large model nor is willing to share any intermediate activations with the server. To this end, we present Principal Sub-Model (PriSM) training methodology, which leverages models low-rank structure and kernel orthogonality to train sub-models in the orthogonal kernel space. More specifically, by applying singular value decomposition (SVD) to original kernels in the server model, PriSM first obtains a set of principal orthogonal kernels in which each one is weighed by its singular value. Thereafter, PriSM utilizes our novel sampling strategy that selects different subsets of the principal kernels independently to create sub-models for clients. Importantly, a kernel with a large singular value is assigned with a high sampling probability. Thus, each sub-model is a low-rank approximation of the full large model, and all clients together achieve the near full-model training. Our extensive evaluations on multiple datasets in various resource-constrained settings show that PriSM can yield an improved performance of up to 10% compared to existing alternatives, with only around 20% sub-model training.

Via

Access Paper or Ask Questions

Federated Sparse Training: Lottery Aware Model Compression for Resource Constrained Edge

Aug 27, 2022

Sara Babakniya, Souvik Kundu, Saurav Prakash, Yue Niu, Salman Avestimehr

Figure 1 for Federated Sparse Training: Lottery Aware Model Compression for Resource Constrained Edge

Figure 2 for Federated Sparse Training: Lottery Aware Model Compression for Resource Constrained Edge

Figure 3 for Federated Sparse Training: Lottery Aware Model Compression for Resource Constrained Edge

Figure 4 for Federated Sparse Training: Lottery Aware Model Compression for Resource Constrained Edge

Abstract:Limited computation and communication capabilities of clients pose significant challenges in federated learning (FL) over resource-limited edge nodes. A potential solution to this problem is to deploy off-the-shelf sparse learning algorithms that train a binary sparse mask on each client with the expectation of training a consistent sparse server mask. However, as we investigate in this paper, such naive deployments result in a significant accuracy drop compared to FL with dense models, especially under low client's resource budget. In particular, our investigations reveal a serious lack of consensus among the trained masks on clients, which prevents convergence on the server mask and potentially leads to a substantial drop in model performance. Based on such key observations, we propose federated lottery aware sparsity hunting (FLASH), a unified sparse learning framework to make the server win a lottery in terms of a sparse sub-model, which can greatly improve performance under highly resource-limited client settings. Moreover, to address the issue of device heterogeneity, we leverage our findings to propose hetero-FLASH, where clients can have different target sparsity budgets based on their device resource limits. Extensive experimental evaluations with multiple models on various datasets (both IID and non-IID) show superiority of our models in yielding up to $\mathord{\sim}10.1\%$ improved accuracy with $\mathord{\sim}10.26\times$ fewer communication costs, compared to existing alternatives, at similar hyperparameter settings.

* 9 pages, 2 figures, 4 tables

Via

Access Paper or Ask Questions

Coded Computing for Low-Latency Federated Learning over Wireless Edge Networks

Nov 12, 2020

Saurav Prakash, Sagar Dhakal, Mustafa Akdeniz, Yair Yona, Shilpa Talwar, Salman Avestimehr, Nageen Himayat

Figure 1 for Coded Computing for Low-Latency Federated Learning over Wireless Edge Networks

Figure 2 for Coded Computing for Low-Latency Federated Learning over Wireless Edge Networks

Figure 3 for Coded Computing for Low-Latency Federated Learning over Wireless Edge Networks

Figure 4 for Coded Computing for Low-Latency Federated Learning over Wireless Edge Networks

Abstract:Federated learning enables training a global model from data located at the client nodes, without data sharing and moving client data to a centralized server. Performance of federated learning in a multi-access edge computing (MEC) network suffers from slow convergence due to heterogeneity and stochastic fluctuations in compute power and communication link qualities across clients. We propose a novel coded computing framework, CodedFedL, that injects structured coding redundancy into federated learning for mitigating stragglers and speeding up the training procedure. CodedFedL enables coded computing for non-linear federated learning by efficiently exploiting distributed kernel embedding via random Fourier features that transforms the training task into computationally favourable distributed linear regression. Furthermore, clients generate local parity datasets by coding over their local datasets, while the server combines them to obtain the global parity dataset. Gradient from the global parity dataset compensates for straggling gradients during training, and thereby speeds up convergence. For minimizing the epoch deadline time at the MEC server, we provide a tractable approach for finding the amount of coding redundancy and the number of local data points that a client processes during training, by exploiting the statistical properties of compute as well as communication delays. We also characterize the leakage in data privacy when clients share their local parity datasets with the server. We analyze the convergence rate and iteration complexity of CodedFedL under simplifying assumptions, by treating CodedFedL as a stochastic gradient descent algorithm. Furthermore, we conduct numerical experiments using practical network parameters and benchmark datasets, where CodedFedL speeds up the overall training time by up to $15\times$ in comparison to the benchmark schemes.

* Final version to appear in the first issue of the IEEE JSAC Series on Machine Learning for Communications and Networks

Via

Access Paper or Ask Questions

Coded Computing for Federated Learning at the Edge

Jul 14, 2020

Saurav Prakash, Sagar Dhakal, Mustafa Akdeniz, A. Salman Avestimehr, Nageen Himayat

Figure 1 for Coded Computing for Federated Learning at the Edge

Figure 2 for Coded Computing for Federated Learning at the Edge

Figure 3 for Coded Computing for Federated Learning at the Edge

Figure 4 for Coded Computing for Federated Learning at the Edge

Abstract:Federated Learning (FL) is an exciting new paradigm that enables training a global model from data generated locally at the client nodes, without moving client data to a centralized server. Performance of FL in a multi-access edge computing (MEC) network suffers from slow convergence due to heterogeneity and stochastic fluctuations in compute power and communication link qualities across clients. A recent work, Coded Federated Learning (CFL), proposes to mitigate stragglers and speed up training for linear regression tasks by assigning redundant computations at the MEC server. Coding redundancy in CFL is computed by exploiting statistical properties of compute and communication delays. We develop CodedFedL that addresses the difficult task of extending CFL to distributed non-linear regression and classification problems with multioutput labels. The key innovation of our work is to exploit distributed kernel embedding using random Fourier features that transforms the training task into distributed linear regression. We provide an analytical solution for load allocation, and demonstrate significant performance gains for CodedFedL through experiments over benchmark datasets using practical network parameters.

* Work accepted for presentation at the International Workshop on Federated Learning for User Privacy and Data Confidentiality, in Conjunction with ICML 2020 (FL-ICML'20). This work was part of Saurav Prakash's internship projects at Intel

Via

Access Paper or Ask Questions

Coded Federated Learning

Feb 21, 2020

Sagar Dhakal, Saurav Prakash, Yair Yona, Shilpa Talwar, Nageen Himayat

Abstract:Federated learning is a method of training a global model from decentralized data distributed across client devices. Here, model parameters are computed locally by each client device and exchanged with a central server, which aggregates the local models for a global view, without requiring sharing of training data. The convergence performance of federated learning is severely impacted in heterogeneous computing platforms such as those at the wireless edge, where straggling computations and communication links can significantly limit timely model parameter updates. This paper develops a novel coded computing technique for federated learning to mitigate the impact of stragglers. In the proposed Coded Federated Learning (CFL) scheme, each client device privately generates parity training data and shares it with the central server only once at the start of the training phase. The central server can then preemptively perform redundant gradient computations on the composite parity data to compensate for the erased or delayed parameter updates. Our results show that CFL allows the global model to converge nearly four times faster when compared to an uncoded approach

* Presented at the Wireless Edge Intelligence Workshop, IEEE GLOBECOM 2019

Via

Access Paper or Ask Questions