Hsu

Liger Kernel: Efficient Triton Kernels for LLM Training

Oct 14, 2024

Byron Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning (+1 more)

Abstract: Training Large Language Models (LLMs) efficiently at scale presents a formidable challenge, driven by their ever-increasing computational demands and the need for enhanced performance. In this work, we introduce Liger-Kernel, an open-sourced set of Triton kernels developed specifically for LLM training. With kernel optimization techniques like kernel operation fusing and input chunking, our kernels achieve on average a 20% increase in training throughput and a 60% reduction in GPU memory usage for popular LLMs compared to HuggingFace implementations. In addition, Liger-Kernel is designed with modularity, accessibility, and adaptability in mind, catering to both casual and expert users. Comprehensive benchmarks and integration tests are built in to ensure compatibility, performance, correctness, and convergence across diverse computing environments and model architectures. The source code is available under a permissive license at: github.com/linkedin/Liger-Kernel.

* 17 pages, 12 figures

Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition

Aug 31, 2017

Gee-Sern Hsu, Hung-Cheng Shie, Cheng-Hua Hsieh

Abstract: Two approaches are proposed for cross-pose face recognition: one is based on the 3D reconstruction of facial components, and the other on a deep Convolutional Neural Network (CNN). Unlike most 3D approaches, which consider holistic faces, the proposed 3D approach considers facial components. It segments a 2D gallery face into components, reconstructs the 3D surface for each component, and recognizes a probe face by component features. The segmentation is based on landmarks located by a hierarchical algorithm that combines the Faster R-CNN for face detection and the Reduced Tree Structured Model for landmark localization. The core part of the CNN-based approach is a revised VGG network. We study its performance with different settings on the training set, including synthesized data from 3D reconstruction, real-life data from an in-the-wild database, and both types of data combined. We also investigate the performance of the network when it is employed as a classifier or designed as a feature extractor. The two recognition approaches and the fast landmark localization are evaluated in extensive experiments and compared to state-of-the-art methods to demonstrate their efficacy.

* 14 pages, 12 figures, 4 tables
