Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tahseen Rabbani

Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs

Feb 07, 2025

Thierry Bossy, Julien Vignoud, Tahseen Rabbani, Juan R. Troncoso Pastoriza, Martin Jaggi

Abstract:Federated learning (FL) is a popular paradigm for collaborative training which avoids direct data exposure between clients. However, data privacy issues still remain: FL-trained large language models are capable of memorizing and completing phrases and sentences contained in training data when given with their prefixes. Thus, it is possible for adversarial and honest-but-curious clients to recover training data of other participants simply through targeted prompting. In this work, we demonstrate that a popular and simple fine-tuning strategy, low-rank adaptation (LoRA), reduces memorization during FL up to a factor of 10. We study this effect by performing a medical question-answering fine-tuning task and injecting multiple replicas of out-of-distribution sensitive sequences drawn from an external clinical dataset. We observe a reduction in memorization for a wide variety of Llama 2 and 3 models, and find that LoRA can reduce memorization in centralized learning as well. Furthermore, we show that LoRA can be combined with other privacy-preserving techniques such as gradient clipping and Gaussian noising, secure aggregation, and Goldfish loss to further improve record-level privacy while maintaining performance.

Via

Access Paper or Ask Questions

HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing

Dec 24, 2024

Minghui Liu, Tahseen Rabbani, Tony O'Halloran, Ananth Sankaralingam, Mary-Anne Hartley, Brian Gravelle, Furong Huang, Cornelia Fermüller, Yiannis Aloimonos

Figure 1 for HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing

Figure 2 for HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing

Figure 3 for HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing

Figure 4 for HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing

Abstract:Transformer-based large language models (LLMs) use the key-value (KV) cache to significantly accelerate inference by storing the key and value embeddings of past tokens. However, this cache consumes significant GPU memory. In this work, we introduce HashEvict, an algorithm that uses locality-sensitive hashing (LSH) to compress the KV cache. HashEvict quickly locates tokens in the cache that are cosine dissimilar to the current query token. This is achieved by computing the Hamming distance between binarized Gaussian projections of the current token query and cached token keys, with a projection length much smaller than the embedding dimension. We maintain a lightweight binary structure in GPU memory to facilitate these calculations. Unlike existing compression strategies that compute attention to determine token retention, HashEvict makes these decisions pre-attention, thereby reducing computational costs. Additionally, HashEvict is dynamic - at every decoding step, the key and value of the current token replace the embeddings of a token expected to produce the lowest attention score. We demonstrate that HashEvict can compress the KV cache by 30%-70% while maintaining high performance across reasoning, multiple-choice, long-context retrieval and summarization tasks.

* 10 pages, 6 figures, 2 tables

Via

Access Paper or Ask Questions

Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Jun 21, 2024

Mucong Ding, Tahseen Rabbani, Bang An, Evan Z Wang, Furong Huang

Figure 1 for Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Figure 2 for Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Figure 3 for Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Abstract:Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in exponentially growing computational complexities with respect to the number of GNN layers). Various sampling-based and historical-embedding-based methods are proposed to avoid this exponential growth of complexities. However, none of these solutions eliminates the linear dependence on graph size. This paper proposes a sketch-based algorithm whose training time and memory grow sublinearly with respect to graph size by training GNNs atop a few compact sketches of graph adjacency and node embeddings. Based on polynomial tensor-sketch (PTS) theory, our framework provides a novel protocol for sketching non-linear activations and graph convolution matrices in GNNs, as opposed to existing methods that sketch linear weights or gradients in neural networks. In addition, we develop a locality-sensitive hashing (LSH) technique that can be trained to improve the quality of sketches. Experiments on large-graph benchmarks demonstrate the scalability and competitive performance of our Sketch-GNNs versus their full-size GNN counterparts.

* NeurIPS 2022

Via

Access Paper or Ask Questions

Calibrated Dataset Condensation for Faster Hyperparameter Search

May 27, 2024

Mucong Ding, Yuancheng Xu, Tahseen Rabbani, Xiaoyu Liu, Brian Gravelle, Teresa Ranadive, Tai-Ching Tuan, Furong Huang

Abstract:Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generalizes poorly across hyperparameters/architectures in practice. This paper considers a different condensation objective specifically geared toward hyperparameter search. We aim to generate a synthetic validation dataset so that the validation-performance rankings of the models, with different hyperparameters, on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation dataset by matching the hyperparameter gradients computed via implicit differentiation and efficient inverse Hessian approximation. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of models and speeds up hyperparameter/architecture search for tasks on both images and graphs.

Via

Access Paper or Ask Questions

A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture

Apr 02, 2024

Dehao Yuan, Cornelia Fermüller, Tahseen Rabbani, Furong Huang, Yiannis Aloimonos

Abstract:We propose VecKM, a novel local point cloud geometry encoder that is descriptive, efficient and robust to noise. VecKM leverages a unique approach by vectorizing a kernel mixture to represent the local point clouds. Such representation is descriptive and robust to noise, which is supported by two theorems that confirm its ability to reconstruct and preserve the similarity of the local shape. Moreover, VecKM is the first successful attempt to reduce the computation and memory costs from $O(n^2+nKd)$ to $O(nd)$ by sacrificing a marginal constant factor, where $n$ is the size of the point cloud and $K$ is neighborhood size. The efficiency is primarily due to VecKM's unique factorizable property that eliminates the need of explicitly grouping points into neighborhoods. In the normal estimation task, VecKM demonstrates not only 100x faster inference speed but also strongest descriptiveness and robustness compared with existing popular encoders. In classification and segmentation tasks, integrating VecKM as a preprocessing module achieves consistently better performance than the PointNet, PointNet++, and point transformer baselines, and runs consistently faster by up to 10x.

Via

Access Paper or Ask Questions

Benchmarking the Robustness of Image Watermarks

Jan 22, 2024

Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein(+1 more)

Abstract:This paper investigates the weaknesses of image watermarking techniques. We present WAVES (Watermark Analysis Via Enhanced Stress-testing), a novel benchmark for assessing watermark robustness, overcoming the limitations of current evaluation methods.WAVES integrates detection and identification tasks, and establishes a standardized evaluation protocol comprised of a diverse range of stress tests. The attacks in WAVES range from traditional image distortions to advanced and novel variations of diffusive, and adversarial attacks. Our evaluation examines two pivotal dimensions: the degree of image quality degradation and the efficacy of watermark detection after attacks. We develop a series of Performance vs. Quality 2D plots, varying over several prominent image similarity metrics, which are then aggregated in a heuristically novel manner to paint an overall picture of watermark robustness and attack potency. Our comprehensive evaluation reveals previously undetected vulnerabilities of several modern watermarking algorithms. We envision WAVES as a toolkit for the future development of robust watermarking systems. The project is available at https://wavesbench.github.io/

Via

Access Paper or Ask Questions

conv_einsum: A Framework for Representation and Fast Evaluation of Multilinear Operations in Convolutional Tensorial Neural Networks

Jan 07, 2024

Tahseen Rabbani, Jiahao Su, Xiaoyu Liu, David Chan, Geoffrey Sangston, Furong Huang

Figure 1 for conv_einsum: A Framework for Representation and Fast Evaluation of Multilinear Operations in Convolutional Tensorial Neural Networks

Figure 2 for conv_einsum: A Framework for Representation and Fast Evaluation of Multilinear Operations in Convolutional Tensorial Neural Networks

Figure 3 for conv_einsum: A Framework for Representation and Fast Evaluation of Multilinear Operations in Convolutional Tensorial Neural Networks

Figure 4 for conv_einsum: A Framework for Representation and Fast Evaluation of Multilinear Operations in Convolutional Tensorial Neural Networks

Abstract:Modern ConvNets continue to achieve state-of-the-art results over a vast array of vision and image classification tasks, but at the cost of increasing parameters. One strategy for compactifying a network without sacrificing much expressive power is to reshape it into a tensorial neural network (TNN), which is a higher-order tensorization of its layers, followed by a factorization, such as a CP-decomposition, which strips a weight down to its critical basis components. Passes through TNNs can be represented as sequences of multilinear operations (MLOs), where the evaluation path can greatly affect the number of floating point operations (FLOPs) incurred. While functions such as the popular einsum can evaluate simple MLOs such as contractions, existing implementations cannot process multi-way convolutions, resulting in scant assessments of how optimal evaluation paths through tensorized convolutional layers can improve training speed. In this paper, we develop a unifying framework for representing tensorial convolution layers as einsum-like strings and a meta-algorithm conv_einsum which is able to evaluate these strings in a FLOPs-minimizing manner. Comprehensive experiments, using our open-source implementation, over a wide range of models, tensor decompositions, and diverse tasks, demonstrate that conv_einsum significantly increases both computational and memory-efficiency of convolutional TNNs.

Via

Access Paper or Ask Questions

Large-Scale Distributed Learning via Private On-Device Locality-Sensitive Hashing

Jun 05, 2023

Tahseen Rabbani, Marco Bornstein, Furong Huang

Figure 1 for Large-Scale Distributed Learning via Private On-Device Locality-Sensitive Hashing

Figure 2 for Large-Scale Distributed Learning via Private On-Device Locality-Sensitive Hashing

Figure 3 for Large-Scale Distributed Learning via Private On-Device Locality-Sensitive Hashing

Figure 4 for Large-Scale Distributed Learning via Private On-Device Locality-Sensitive Hashing

Abstract:Locality-sensitive hashing (LSH) based frameworks have been used efficiently to select weight vectors in a dense hidden layer with high cosine similarity to an input, enabling dynamic pruning. While this type of scheme has been shown to improve computational training efficiency, existing algorithms require repeated randomized projection of the full layer weight, which is impractical for computational- and memory-constrained devices. In a distributed setting, deferring LSH analysis to a centralized host is (i) slow if the device cluster is large and (ii) requires access to input data which is forbidden in a federated context. Using a new family of hash functions, we develop one of the first private, personalized, and memory-efficient on-device LSH frameworks. Our framework enables privacy and personalization by allowing each device to generate hash tables, without the help of a central host, using device-specific hashing hyper-parameters (e.g. number of hash tables or hash length). Hash tables are generated with a compressed set of the full weights, and can be serially generated and discarded if the process is memory-intensive. This allows devices to avoid maintaining (i) the fully-sized model and (ii) large amounts of hash tables in local memory for LSH analysis. We prove several statistical and sensitivity properties of our hash functions, and experimentally demonstrate that our framework is competitive in training large-scale recommender networks compared to other LSH frameworks which assume unrestricted on-device capacity.

* 18 pages, 10 figures

Via

Access Paper or Ask Questions

SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Oct 25, 2022

Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, Furong Huang

Figure 1 for SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Figure 2 for SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Figure 3 for SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Figure 4 for SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Abstract:The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models where the speed of synchronization depends upon the slowest client. In this work, we propose SWIFT: a novel wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. Theoretically, we prove that SWIFT matches the gold-standard iteration convergence rate $\mathcal{O}(1/\sqrt{T})$ of parallel stochastic gradient descent for convex and non-convex smooth optimization (total iterations $T$). Furthermore, we provide theoretical results for IID and non-IID settings without any bounded-delay assumption for slow clients which is required by other asynchronous decentralized FL algorithms. Although SWIFT achieves the same iteration convergence rate with respect to $T$ as other state-of-the-art (SOTA) parallel stochastic algorithms, it converges faster with respect to run-time due to its wait-free structure. Our experimental results demonstrate that SWIFT's run-time is reduced due to a large reduction in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms.

* 30 pages, 9 figures

Via

Access Paper or Ask Questions

Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

Sep 17, 2021

Tahseen Rabbani, Brandon Feng, Yifan Yang, Arjun Rajkumar, Amitabh Varshney, Furong Huang

Figure 1 for Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

Figure 2 for Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

Figure 3 for Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

Figure 4 for Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

Abstract:A popular application of federated learning is using many clients to train a deep neural network, the parameters of which are maintained on a central server. While recent efforts have focused on reducing communication complexity, existing algorithms assume that each participating client is able to download the current and full set of parameters, which may not be a practical assumption depending on the memory constraints of clients such as mobile devices. In this work, we propose a novel algorithm Comfetch, which allows clients to train large networks using compressed versions of the global architecture via Count Sketch, thereby reducing communication and local memory costs. We provide a theoretical convergence guarantee and experimentally demonstrate that it is possible to learn large networks, such as a deep convolutional network and an LSTM, through federated agents training on their sketched counterparts. The resulting global models exhibit competitive test accuracy when compared against the state-of-the-art FetchSGD and the classical FedAvg, both of which require clients to download the full architecture.

Via

Access Paper or Ask Questions