Shivam Sahni

Liger Kernel: Efficient Triton Kernels for LLM Training

Oct 14, 2024

Byron Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning (+1 more)

Abstract: Training Large Language Models (LLMs) efficiently at scale presents a formidable challenge, driven by their ever-increasing computational demands and the need for enhanced performance. In this work, we introduce Liger-Kernel, an open-sourced set of Triton kernels developed specifically for LLM training. With kernel optimization techniques like kernel operation fusing and input chunking, our kernels achieve on average a 20% increase in training throughput and a 60% reduction in GPU memory usage for popular LLMs compared to HuggingFace implementations. In addition, Liger-Kernel is designed with modularity, accessibility, and adaptability in mind, catering to both casual and expert users. Comprehensive benchmarks and integration tests are built in to ensure compatibility, performance, correctness, and convergence across diverse computing environments and model architectures. The source code is available under a permissive license at: github.com/linkedin/Liger-Kernel.

* 17 pages, 12 figures
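
The abstract names input chunking as one of the memory-saving techniques. The sketch below illustrates the general idea only, in plain Python rather than Triton: a cross-entropy loss is computed chunk by chunk so that only a few rows of logits exist at a time. It is not Liger-Kernel's implementation; the function names (`logits_for`, `cross_entropy`, `full_loss`, `chunked_loss`) and the toy data layout are assumptions made for illustration.

```python
import math

def logits_for(hidden_row, weight):
    # One row of logits: dot product of the hidden vector with each vocabulary row.
    return [sum(h * w for h, w in zip(hidden_row, w_row)) for w_row in weight]

def cross_entropy(logits, target):
    # Numerically stable -log(softmax(logits)[target]).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def full_loss(hidden, weight, targets):
    # Baseline: materialize logits for every token, then reduce to a mean loss.
    all_logits = [logits_for(h, weight) for h in hidden]
    return sum(cross_entropy(l, t) for l, t in zip(all_logits, targets)) / len(targets)

def chunked_loss(hidden, weight, targets, chunk_size):
    # Chunked: at most `chunk_size` rows of logits are alive at any time.
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        end = start + chunk_size
        chunk_logits = [logits_for(h, weight) for h in hidden[start:end]]
        total += sum(cross_entropy(l, t) for l, t in zip(chunk_logits, targets[start:end]))
    return total / len(targets)
```

Both functions return the same mean loss up to floating-point rounding; the chunked version trades one large intermediate buffer for several small ones, which is where the memory saving would come from when the vocabulary is large.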

Exploring Multilingual Text Data Distillation

Aug 09, 2023

Shivam Sahni, Harsh Patel

Abstract: With the rise of deep learning, large datasets and complex models have become common, requiring significant computing power. To address this, data distillation has emerged as a technique to quickly train models with lower memory and time requirements. However, data distillation on text-based datasets has not been explored much because of the challenges arising from the discrete nature of text. Additionally, existing dataset distillation methods often struggle to generalize to new architectures. In this paper, we propose several data distillation techniques for multilingual text classification datasets using language-model-based learning methods. We conduct experiments to analyze their performance in terms of classification strength and cross-architecture generalization. Furthermore, we investigate the language-specific fairness of the data summaries generated by these methods. Our approach builds upon existing techniques, enhancing cross-architecture generalization in the text data distillation domain.

Exploring Explainability Methods for Graph Neural Networks

Nov 03, 2022

Harsh Patel, Shivam Sahni

Abstract: With the growing use of deep learning methods for a variety of real tasks, particularly graph neural networks, which encode intricate interconnectedness information, there is a need for explainability in such settings. In this paper, we demonstrate the applicability of popular explainability approaches on Graph Attention Networks (GAT) for a graph-based super-pixel image classification task. We assess the qualitative and quantitative performance of these techniques on three different datasets and describe our findings. The results shed fresh light on the notion of explainability in GNNs, particularly GATs.

Image Inpainting using Partial Convolution

Aug 19, 2021

Harsh Patel, Amey Kulkarni, Shivam Sahni, Udit Vyas

Abstract: Image inpainting is a popular task in the field of image processing with broad applications in computer vision. In various practical applications, images are often degraded by corrupted, lost, or undesirable information. Various restoration techniques, both classical and deep-learning-based, have been used in the past to handle such issues. Some traditional methods restore an image by filling gap pixels using the nearby known pixels or a moving average over them. The aim of this paper is to perform image inpainting using robust deep learning methods that use partial convolution layers.
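
For readers unfamiliar with partial convolution, the following is a minimal single-channel sketch of the operation as it is commonly defined: the kernel is applied only to valid (unmasked) pixels, the result is rescaled by the ratio of window size to valid-pixel count, and the mask is updated so that any window containing at least one valid pixel becomes valid. The function name `partial_conv2d`, the list-of-lists layout, and the absence of padding are assumptions for illustration; this is not the paper's code.

```python
def partial_conv2d(image, mask, kernel, bias=0.0):
    # image, mask: 2D lists of equal shape; mask holds 1 for valid pixels, 0 for holes.
    # kernel: 2D list of weights. No padding, stride 1.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    window_size = kh * kw
    out = [[0.0] * out_w for _ in range(out_h)]
    new_mask = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            valid = 0   # number of valid pixels under the window
            acc = 0.0   # kernel response restricted to valid pixels
            for di in range(kh):
                for dj in range(kw):
                    m = mask[i + di][j + dj]
                    valid += m
                    acc += kernel[di][dj] * image[i + di][j + dj] * m
            if valid > 0:
                # Rescale to compensate for the missing pixels, then mark as valid.
                out[i][j] = acc * (window_size / valid) + bias
                new_mask[i][j] = 1
            # Otherwise the output stays 0 and the location remains a hole.
    return out, new_mask
```

With an averaging kernel over a constant image that has a single hole, the rescaling recovers the constant value, and the updated mask marks the output as valid. Stacking such layers shrinks the hole step by step.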
