Abstract: Large language models (LLMs) represent a groundbreaking advancement in natural language processing owing to their impressive reasoning abilities. Recently, there has been considerable interest in increasing the context length of these models to enhance their applicability to complex tasks. However, at long context lengths and large batch sizes, the key-value (KV) cache, which stores the attention keys and values, emerges as the new bottleneck in memory usage during inference. To address this, we propose Eigen Attention, which performs the attention operation in a low-rank space, thereby reducing the KV cache memory overhead. Our proposed approach is orthogonal to existing KV cache compression techniques and can be used synergistically with them. Through extensive experiments over the OPT, MPT, and Llama model families, we demonstrate that Eigen Attention results in up to a 40% reduction in KV cache size and up to a 60% reduction in attention operation latency with minimal drop in performance.
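To make the core idea concrete, below is a minimal PyTorch sketch of attention computed in a low-rank space: an orthonormal basis is estimated from calibration keys and values (via an SVD here), only the low-dimensional projections are cached, and attention is evaluated in that reduced space. The shapes, the SVD-based basis construction, and the variable names are illustrative assumptions rather than the paper's exact implementation.

```python
# Sketch: attention in a low-rank space to shrink the KV cache (illustrative only).
import torch

torch.manual_seed(0)
d, r, n_calib, n_ctx = 64, 16, 512, 128   # head dim, rank, calibration tokens, context length

# Calibration keys/values (e.g., collected from a small held-out set).
K_calib = torch.randn(n_calib, d)
V_calib = torch.randn(n_calib, d)

# Leading right-singular vectors give an orthonormal low-rank basis.
Uk = torch.linalg.svd(K_calib, full_matrices=False).Vh[:r].T   # (d, r)
Uv = torch.linalg.svd(V_calib, full_matrices=False).Vh[:r].T   # (d, r)

# At inference, only the r-dimensional projections are cached.
K, V = torch.randn(n_ctx, d), torch.randn(n_ctx, d)
K_low, V_low = K @ Uk, V @ Uv                                  # (n_ctx, r) each

# Attention for one query, computed entirely in the low-rank space.
q = torch.randn(1, d)
scores = (q @ Uk) @ K_low.T / d**0.5                           # query projected once
out_low = torch.softmax(scores, dim=-1) @ V_low                # (1, r)
out = out_low @ Uv.T                                           # lift back to (1, d)
print(out.shape, K_low.shape)  # torch.Size([1, 64]) torch.Size([128, 16])
```

Under these assumptions, caching K_low and V_low instead of K and V reduces the per-token cache from d to r floats per head, which is where the memory savings would come from.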
Abstract: Decentralized training enables learning with distributed datasets generated at different locations without relying on a central server. In realistic scenarios, the data distribution across these sparsely connected learning agents can be significantly heterogeneous, leading to local model over-fitting and poor global model generalization. Another challenge is the high communication cost of training models in such a peer-to-peer fashion without any central coordination. In this paper, we jointly tackle these two practical challenges by proposing SADDLe, a set of sharpness-aware decentralized deep learning algorithms. SADDLe leverages Sharpness-Aware Minimization (SAM) to seek a flatter loss landscape during training, resulting in better model generalization as well as enhanced robustness to communication compression. We present two versions of our approach and conduct extensive experiments to show that SADDLe leads to a 1-20% improvement in test accuracy compared to existing techniques. Additionally, our proposed approach is robust to communication compression, with an average accuracy drop of only 1% in the presence of up to 4x compression.
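As a rough illustration of how sharpness-aware minimization can be combined with decentralized training, the sketch below has each agent take a standard SAM step (perturb the weights along the normalized gradient, then descend using the gradient at the perturbed point) followed by a gossip average of parameters with its neighbors. The toy models, the mixing matrix, and the hyperparameters are assumptions for illustration and do not reproduce SADDLe's specific algorithms.

```python
# Sketch: a SAM step per agent followed by gossip averaging (illustrative only).
import torch

def sam_local_step(model, loss_fn, batch, lr=0.05, rho=0.05):
    """One Sharpness-Aware Minimization step on a single agent."""
    x, y = batch
    # 1) Gradient at the current weights.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    # 2) Perturb weights toward the (scaled) ascent direction.
    with torch.no_grad():
        eps = [rho * g / grad_norm for g in grads]
        for p, e in zip(model.parameters(), eps):
            p.add_(e)
    # 3) Gradient at the perturbed point, then undo the perturbation and descend.
    loss = loss_fn(model(x), y)
    sam_grads = torch.autograd.grad(loss, list(model.parameters()))
    with torch.no_grad():
        for p, e, g in zip(model.parameters(), eps, sam_grads):
            p.sub_(e)          # back to the original weights
            p.sub_(lr * g)     # descend along the sharpness-aware gradient

def gossip_average(models, mixing):
    """Average parameters with neighbors according to a doubly-stochastic matrix."""
    snapshots = [[p.detach().clone() for p in m.parameters()] for m in models]
    with torch.no_grad():
        for i, m in enumerate(models):
            for k, p in enumerate(m.parameters()):
                p.copy_(sum(mixing[i][j] * snapshots[j][k] for j in range(len(models))))

# Toy run: 3 agents with heterogeneous local batches and a doubly-stochastic mixing matrix.
agents = [torch.nn.Linear(10, 2) for _ in range(3)]
mixing = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
loss_fn = torch.nn.CrossEntropyLoss()
for step in range(5):
    for m in agents:
        batch = (torch.randn(8, 10), torch.randint(0, 2, (8,)))
        sam_local_step(m, loss_fn, batch)
    gossip_average(agents, mixing)
```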
Abstract: State-of-the-art decentralized learning algorithms typically require the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the data distribution across the agents can be significantly heterogeneous. In this work, we propose averaging rate scheduling as a simple yet effective way to reduce the impact of heterogeneity in decentralized learning. Our experiments illustrate the superiority of the proposed method (~3% improvement in test accuracy) compared to the conventional approach of employing a constant averaging rate.
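A small sketch of what scheduling the averaging rate can look like in a gossip-averaging step of the form x_i <- x_i + gamma(t) * sum_j w_ij (x_j - x_i); the linear ramp used for gamma(t) below is an illustrative placeholder, not the schedule proposed in the paper.

```python
# Sketch: gossip averaging with a scheduled (rather than constant) averaging rate.
import torch

def scheduled_gamma(step, total_steps, gamma_min=0.1, gamma_max=1.0):
    """Example schedule (assumption): ramp the averaging rate up linearly over training."""
    return gamma_min + (gamma_max - gamma_min) * step / max(total_steps - 1, 1)

def gossip_step(params, mixing, gamma):
    """x_i <- x_i + gamma * sum_j w_ij * (x_j - x_i) for every agent i."""
    old = [p.clone() for p in params]
    for i in range(len(params)):
        correction = sum(mixing[i][j] * (old[j] - old[i]) for j in range(len(params)))
        params[i] = old[i] + gamma * correction
    return params

# Toy example: 4 agents holding scalar "models", averaged over 10 rounds.
params = [torch.tensor(float(v)) for v in (1.0, 4.0, -2.0, 7.0)]
mixing = [[0.5 if i == j else 1 / 6 for j in range(4)] for i in range(4)]
for t in range(10):
    params = gossip_step(params, mixing, scheduled_gamma(t, 10))
print([round(p.item(), 3) for p in params])  # values contract toward the mean 2.5
```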
Abstract: Training at the edge utilizes continuously evolving data generated at different locations. Privacy concerns prohibit the co-location of this spatially as well as temporally distributed data, making it crucial to design training algorithms that enable efficient continual learning over decentralized private data. Decentralized learning allows serverless training with spatially distributed data. A fundamental barrier in such distributed learning is the high bandwidth cost of communicating model updates between agents. Moreover, existing works under this training paradigm are not inherently suitable for learning a temporal sequence of tasks while retaining previously acquired knowledge. In this work, we propose CoDeC, a novel communication-efficient decentralized continual learning algorithm that addresses these challenges. We mitigate catastrophic forgetting while learning a task sequence in a decentralized learning setup by combining orthogonal gradient projection with gossip averaging across decentralized agents. Further, CoDeC includes a novel lossless communication compression scheme based on the gradient subspaces. We express layer-wise gradients as a linear combination of the basis vectors of these gradient subspaces and communicate the associated coefficients. We theoretically analyze the convergence rate of our algorithm and demonstrate through an extensive set of experiments that CoDeC successfully learns distributed continual tasks with minimal forgetting. The proposed compression scheme results in up to a 4.8x reduction in communication costs while matching the performance of the full-communication baseline.
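The sketch below illustrates the two mechanisms described above on a single flattened layer gradient: (1) projecting the gradient onto the orthogonal complement of a subspace associated with earlier tasks, and (2) transmitting the gradient as coefficients in a shared subspace basis, which is exactly recoverable when the gradient lies in that subspace. The basis construction (QR of random matrices), the dimensions, and the names are assumptions for illustration, not CoDeC's actual procedure.

```python
# Sketch: orthogonal gradient projection + coefficient-based communication (illustrative only).
import torch

torch.manual_seed(0)
d = 256                                                 # flattened layer-gradient dimension

# (1) Orthogonal gradient projection: remove the component of the new gradient
#     that lies in the subspace deemed important for previous tasks.
M_prev = torch.linalg.qr(torch.randn(d, 20)).Q          # orthonormal basis of old-task subspace, (d, 20)
raw_grad = torch.randn(d)
safe_grad = raw_grad - M_prev @ (M_prev.T @ raw_grad)   # orthogonal to span(M_prev)
print(torch.allclose(M_prev.T @ safe_grad, torch.zeros(20), atol=1e-5))  # True

# (2) Coefficient communication: if every agent holds the same basis B of the
#     current gradient subspace and the gradient lies in span(B), sending the
#     coefficients B^T g (32 numbers) instead of g (256 numbers) is lossless.
B = torch.linalg.qr(torch.randn(d, 32)).Q               # shared orthonormal basis, (d, 32)
g = B @ torch.randn(32)                                 # a gradient inside the subspace
coeffs = B.T @ g                                        # transmitted payload, (32,)
g_rx = B @ coeffs                                       # receiver-side reconstruction
print(torch.allclose(g_rx, g, atol=1e-5))               # True: exact reconstruction
print(f"sent {coeffs.numel()} values instead of {g.numel()}")
```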