Abstract: How can we compress language models without sacrificing accuracy? The number of compression algorithms for language models is growing rapidly, driven by the desire to benefit from the remarkable advances of recent language models while avoiding the side effects of their gigantic size, such as increased carbon emissions and expensive maintenance costs. While numerous compression algorithms have shown remarkable progress in compressing language models, the sheer number of algorithms ironically makes it challenging to capture emerging trends and identify the fundamental concepts underlying them. In this paper, we survey and summarize diverse compression algorithms, including pruning, quantization, knowledge distillation, low-rank approximation, parameter sharing, and efficient architecture design. We not only summarize the overall trends of diverse compression algorithms but also select representative algorithms and provide in-depth analyses of them. We discuss the value of each category of compression algorithms and the desired properties of low-cost compression algorithms, which have gained significant importance with the emergence of large language models. Finally, we introduce promising future research topics based on our survey results.
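As a concrete illustration of one of the surveyed categories, the sketch below compresses a single weight matrix by low-rank approximation via truncated SVD. The rank, matrix shapes, and function name are illustrative assumptions, not values taken from the survey.

```python
import torch

def low_rank_compress(weight: torch.Tensor, rank: int):
    """Approximate a weight matrix with two smaller factors via truncated SVD.

    Keeping only the top `rank` singular values replaces one (out_dim x in_dim)
    matrix by (out_dim x rank) and (rank x in_dim) factors, which cuts both
    parameters and multiply-adds when rank << min(out_dim, in_dim).
    """
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_dim, rank), columns scaled by singular values
    B = Vh[:rank, :]             # (rank, in_dim)
    return A, B

# Illustrative usage: a 768x768 linear layer compressed to rank 64.
W = torch.randn(768, 768)
A, B = low_rank_compress(W, rank=64)
print(W.numel(), A.numel() + B.numel())  # 589824 vs. 98304 parameters
```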
Abstract: Given a pre-trained language model, how can we efficiently compress it without retraining? Retraining-free structured pruning algorithms are crucial for compressing pre-trained language models due to their significantly reduced pruning cost and their ability to prune large language models. However, existing retraining-free algorithms suffer severe accuracy degradation because they fail to preserve the useful knowledge of pre-trained models. In this paper, we propose K-pruning (Knowledge-preserving pruning), an accurate retraining-free structured pruning algorithm for pre-trained language models. K-pruning identifies and prunes attention heads and neurons deemed superfluous based on the amount of their inherent knowledge. K-pruning applies an iterative process of pruning followed by knowledge reconstruction for each sub-layer to preserve the knowledge of the pre-trained model. Consequently, K-pruning achieves up to 58.02%p higher F1 score than existing retraining-free pruning algorithms under a high compression rate of 80% on the SQuAD benchmark.
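A minimal sketch of a prune-then-reconstruct step on one feed-forward sub-layer is shown below. The neuron scores, shapes, and least-squares reconstruction are assumptions chosen for illustration; they stand in for, and do not reproduce, the exact scoring and knowledge-reconstruction procedure of K-pruning.

```python
import torch
import torch.nn.functional as F

def prune_ffn_and_reconstruct(W1, W2, X, scores, keep_ratio=0.2):
    """Illustrative prune-then-reconstruct step for one feed-forward sub-layer.

    The FFN computes  out = gelu(X @ W1.T) @ W2.T.  We drop the intermediate
    neurons with the lowest scores, then refit the output projection W2 by
    least squares so the pruned sub-layer matches the original outputs on the
    calibration inputs X (a stand-in for knowledge reconstruction).
    """
    n_keep = max(1, int(W1.shape[0] * keep_ratio))
    keep = torch.topk(scores, n_keep).indices        # neurons to retain

    target = F.gelu(X @ W1.T) @ W2.T                 # original sub-layer output
    H = F.gelu(X @ W1[keep].T)                       # activations of kept neurons
    # Solve  H @ W2_new.T ~= target  in the least-squares sense.
    W2_new = torch.linalg.lstsq(H, target).solution.T
    return W1[keep], W2_new

# Hypothetical usage (shapes and scores are illustrative, not from the paper).
W1 = torch.randn(3072, 768)          # intermediate x hidden
W2 = torch.randn(768, 3072)          # hidden x intermediate
X = torch.randn(256, 768)            # calibration activations
scores = torch.rand(3072)            # assumed knowledge-based neuron scores
W1_p, W2_p = prune_ffn_and_reconstruct(W1, W2, X, scores, keep_ratio=0.2)
```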
Abstract: Given a set of source datasets with pre-trained classification models, how can we quickly and accurately select the most useful source data to improve the performance of a target task? We address the problem of measuring transferability between heterogeneous domains, where the source and the target data have different feature spaces and distributions. We propose Transmeter, a novel method that efficiently and accurately measures the transferability between two datasets. Transmeter utilizes a pre-trained source classifier and a reconstruction loss to increase its efficiency and performance. Furthermore, Transmeter uses feature transformation layers, label-wise discriminators, and a mean distance loss to learn common representations for the source and target domains. As a result, Transmeter and its variant measure transferability most accurately while running in time comparable to that of competitors.
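The sketch below illustrates how the components named above could fit together: per-domain feature transformation layers with a reconstruction term, label-wise domain discriminators, and a mean distance loss between domain centroids. Layer sizes, loss weighting, and all names are illustrative assumptions rather than Transmeter's exact design.

```python
import torch
import torch.nn as nn

class FeatureTransform(nn.Module):
    """Maps a domain-specific feature space into a shared latent space and
    reconstructs the input from it (used by the reconstruction loss)."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU(),
                                    nn.Linear(latent_dim, latent_dim))
        self.decode = nn.Linear(latent_dim, in_dim)

    def forward(self, x):
        z = self.encode(x)
        return z, self.decode(z)

def transfer_losses(z_src, z_tgt, x_src, x_tgt, rec_src, rec_tgt,
                    y_src, y_tgt, discriminators):
    """Illustrative combination of the losses named in the abstract."""
    rec = (nn.functional.mse_loss(rec_src, x_src) +
           nn.functional.mse_loss(rec_tgt, x_tgt))

    # Label-wise discriminators: one domain classifier per shared class label.
    # In adversarial training the transformation layers would be optimized to
    # fool these discriminators (e.g., via a gradient-reversal layer).
    adv = 0.0
    for label, disc in discriminators.items():
        zs, zt = z_src[y_src == label], z_tgt[y_tgt == label]
        if len(zs) and len(zt):
            logits = disc(torch.cat([zs, zt]))
            domains = torch.cat([torch.zeros(len(zs)), torch.ones(len(zt))]).long()
            adv = adv + nn.functional.cross_entropy(logits, domains)

    # Mean distance loss: pull the domain centroids together in latent space.
    mean_dist = (z_src.mean(0) - z_tgt.mean(0)).pow(2).sum()
    return rec + adv + mean_dist

# Hypothetical setup: 3 shared labels, 2 logits (source vs. target) per discriminator.
latent_dim = 64
discriminators = {label: nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                        nn.Linear(32, 2)) for label in range(3)}
src_net, tgt_net = FeatureTransform(100, latent_dim), FeatureTransform(50, latent_dim)
x_src, y_src = torch.randn(32, 100), torch.randint(0, 3, (32,))
x_tgt, y_tgt = torch.randn(32, 50), torch.randint(0, 3, (32,))
z_src, rec_src = src_net(x_src)
z_tgt, rec_tgt = tgt_net(x_tgt)
loss = transfer_losses(z_src, z_tgt, x_src, x_tgt, rec_src, rec_tgt,
                       y_src, y_tgt, discriminators)
```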