Junhan Kim

Attention-aware Post-training Quantization without Backpropagation

Jun 19, 2024

Junhan Kim, Ho-young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon

Figures 1-4 for Attention-aware Post-training Quantization without Backpropagation

Abstract: Quantization is a promising solution for deploying large-scale language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, whether they are post-training quantization (PTQ) or quantization-aware training (QAT), which becomes problematic for hyper-scale LLMs with billions of parameters. This overhead can be alleviated with recently proposed backpropagation-free PTQ methods; however, their performance is somewhat limited because they do not consider inter-layer dependencies. In this paper, we propose a novel PTQ algorithm that considers inter-layer dependencies without relying on backpropagation. The key idea is to develop attention-aware Hessian matrices, which make it possible to account for inter-layer dependencies within the attention module. Extensive experiments demonstrate that the proposed algorithm significantly outperforms conventional PTQ methods, particularly at low bit-widths.

* 20 pages, under review
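For readers less familiar with Hessian-based PTQ, the sketch below illustrates the standard layer-wise objective that backpropagation-free methods build on; it is not the authors' algorithm. For a linear layer with weights W, quantized weights Q, and calibration inputs X, the output error ||(W - Q) X||_F^2 equals tr(dW H dW^T) with dW = W - Q and H = X X^T, so the error can be evaluated from H alone, without gradients. The attention-aware Hessian proposed in the paper is not reproduced here; the round-to-nearest quantizer, the step size 0.25, the toy matrices, and every helper name are hypothetical.

```python
# Generic layer-wise PTQ objective (illustrative only, hypothetical helper names).
# This is NOT the paper's attention-aware Hessian.

def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def quantize_rtn(w, scale):
    """Uniform round-to-nearest quantization (Python round: ties to even, no clipping)."""
    return [[round(v / scale) * scale for v in row] for row in w]

def output_error(w, q, x):
    """||(W - Q) X||_F^2, computed directly from the layer outputs."""
    d = [[wv - qv for wv, qv in zip(wr, qr)] for wr, qr in zip(w, q)]
    return sum(v * v for row in matmul(d, x) for v in row)

def hessian_error(w, q, x):
    """The same quantity written as tr(dW H dW^T) with H = X X^T (no gradients needed)."""
    d = [[wv - qv for wv, qv in zip(wr, qr)] for wr, qr in zip(w, q)]
    h = matmul(x, transpose(x))  # in_features x in_features
    dh = matmul(d, h)
    # tr(D H D^T) = sum_i sum_k (D H)_ik * D_ik
    return sum(dh[i][k] * d[i][k] for i in range(len(d)) for k in range(len(d[0])))

W = [[0.31, -0.72], [0.05, 0.48]]         # out_features x in_features
X = [[1.0, 0.5, -1.0], [0.2, -0.3, 0.8]]  # in_features x calibration samples
Q = quantize_rtn(W, 0.25)
print(Q)
print(output_error(W, Q, X), hessian_error(W, Q, X))
```

Both functions return the same value up to floating-point rounding. Methods of this family therefore only need H, accumulated from calibration data, to compare candidate quantized weights.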

Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Feb 14, 2024

Junhan Kim, Kyungphil Park, Chungman Lee, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon

Figures 1-4 for Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Abstract: With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile phones and TVs. Existing PTQ schemes, however, consume considerable time and resources, which can become a bottleneck in practice when frequent model updates and repeated hyperparameter tuning are required. One-shot PTQ schemes have been proposed as a cost-effective alternative, but their performance is somewhat limited because they cannot account for the inter-layer dependency within the attention module, a key feature of Transformers. In this paper, we propose a novel PTQ algorithm that balances accuracy and efficiency. The key idea of the proposed algorithm, called aespa, is to perform quantization layer-wise for efficiency while considering cross-layer dependency to preserve the attention score. Through extensive experiments on various language models and a complexity analysis, we demonstrate that aespa quantizes Transformer models accurately and efficiently.

* 17 pages, under review
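The abstract does not state aespa's objective, so the sketch below only illustrates why the attention score is a natural quantization target. It quantizes the query and key projections W_Q and W_K with a naive round-to-nearest quantizer and measures how far the attention map softmax(Q K^T / sqrt(d)) moves. Because both projections enter the same score, their quantization errors interact; a purely per-layer objective ignores this cross-layer dependency. All names, the quantizer, and the toy shapes are assumptions for illustration, not the paper's method.

```python
import math

# Illustrative only: measures attention-score distortion; not the aespa algorithm.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def quantize_rtn(w, scale):
    """Hypothetical uniform round-to-nearest quantizer."""
    return [[round(v / scale) * scale for v in row] for row in w]

def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)  # subtract the row max for numerical stability
        ex = [math.exp(v - top) for v in row]
        z = sum(ex)
        out.append([e / z for e in ex])
    return out

def attention_scores(x, wq, wk):
    """softmax(Q K^T / sqrt(d)) with Q = X Wq and K = X Wk (tokens are rows of X)."""
    q, k = matmul(x, wq), matmul(x, wk)
    d = len(wq[0])
    logits = matmul(q, transpose(k))
    return softmax_rows([[v / math.sqrt(d) for v in row] for row in logits])

def attention_score_error(x, wq, wk, scale):
    """Squared distortion of the attention map when Wq and Wk are quantized together."""
    a = attention_scores(x, wq, wk)
    a_hat = attention_scores(x, quantize_rtn(wq, scale), quantize_rtn(wk, scale))
    return sum((u - v) ** 2 for ru, rv in zip(a, a_hat) for u, v in zip(ru, rv))

X = [[1.0, 0.2], [-0.4, 0.9], [0.3, -0.7]]  # 3 tokens, hidden size 2
Wq = [[0.43, -0.18], [0.27, 0.91]]
Wk = [[-0.36, 0.52], [0.64, 0.11]]
print(attention_score_error(X, Wq, Wk, 0.25))
```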

Vision Transformer-based Feature Extraction for Generalized Zero-Shot Learning

Feb 02, 2023

Jiseob Kim, Kyuhong Shim, Junhan Kim, Byonghyo Shim

Abstract: Generalized zero-shot learning (GZSL) is a technique for training a deep learning model to identify unseen classes using image attributes. In this paper, we put forth a new GZSL approach that exploits the Vision Transformer (ViT) to maximize the attribute-related information contained in the image feature. In ViT, the entire image region is processed without degrading the image resolution, and local image information is preserved in patch features. To fully exploit these benefits of ViT, we use patch features as well as the CLS feature to extract the attribute-related image feature. In particular, we propose a novel attention-based module, called the attribute attention module (AAM), to aggregate the attribute-related information in patch features. In AAM, the correlation between each patch feature and the synthetic image attribute is used as the importance weight for that patch. From extensive experiments on benchmark datasets, we demonstrate that the proposed technique outperforms state-of-the-art GZSL approaches by a large margin.

* 21 pages, 10 figures
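A minimal sketch of the patch-weighting idea described for AAM. It assumes that patch features and the attribute embedding share a dimension, that "correlation" is a plain dot product, and that the resulting scores are normalized with a softmax before pooling. The paper's actual projections and normalization may differ, and the function name and toy vectors are hypothetical.

```python
import math

def attribute_attention(patches, attribute):
    """Hypothetical AAM-style pooling: weight patches by softmax(patch . attribute)."""
    scores = [sum(p * a for p, a in zip(patch, attribute)) for patch in patches]
    top = max(scores)  # stabilize the softmax
    ex = [math.exp(s - top) for s in scores]
    z = sum(ex)
    weights = [e / z for e in ex]
    dim = len(patches[0])
    # Weighted sum of patch features, one output value per feature dimension
    pooled = [sum(w * patch[j] for w, patch in zip(weights, patches)) for j in range(dim)]
    return weights, pooled

patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical patch features
attribute = [2.0, 0.0]                          # hypothetical attribute embedding
weights, pooled = attribute_attention(patches, attribute)
print(weights, pooled)
```

The patch most aligned with the attribute receives the largest weight, so the pooled feature leans toward attribute-related regions of the image.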

Semantic Feature Extraction for Generalized Zero-shot Learning

Dec 29, 2021

Junhan Kim, Kyuhong Shim, Byonghyo Shim

Figures 1-4 for Semantic Feature Extraction for Generalized Zero-shot Learning

Abstract: Generalized zero-shot learning (GZSL) is a technique for training a deep learning model to identify unseen classes using attributes. In this paper, we put forth a new GZSL technique that greatly improves GZSL classification performance. The key idea of the proposed approach, henceforth referred to as semantic feature extraction-based GZSL (SE-GZSL), is to use a semantic feature containing only attribute-related information when learning the relationship between the image and the attribute. In doing so, we can remove any interference caused by attribute-irrelevant information contained in the image feature. To train a network that extracts the semantic feature, we present two novel loss functions: 1) a mutual information-based loss that captures all the attribute-related information in the image feature and 2) a similarity-based loss that removes unwanted attribute-irrelevant information. From extensive experiments on various datasets, we show that the proposed SE-GZSL technique outperforms conventional GZSL approaches by a large margin.

* Accepted at AAAI 2022

Gradual Federated Learning with Simulated Annealing

Oct 11, 2021

Luong Trung Nguyen, Junhan Kim, Byonghyo Shim

Figures 1-4 for Gradual Federated Learning with Simulated Annealing

Abstract: Federated averaging (FedAvg) is a popular federated learning (FL) technique that updates the global model by averaging local models and then transmits the updated global model to devices for their local model updates. One main limitation of FedAvg is that, in the early stage of training, the averaged global model is not necessarily better than the local models, so FedAvg may diverge in realistic scenarios, especially when the data are non-identically distributed across devices and the number of data samples varies significantly from device to device. In this paper, we propose a new FL technique based on simulated annealing. The key idea of the proposed technique, henceforth referred to as simulated annealing-based FL (SAFL), is to allow a device to choose its local model when the global model is immature. Specifically, by exploiting the simulated annealing strategy, we make each device choose its local model with high probability in early iterations, when the global model is still immature. From extensive numerical experiments on various benchmark datasets, we demonstrate that SAFL outperforms conventional FedAvg in terms of convergence speed and classification accuracy.
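The sketch below contrasts plain FedAvg aggregation with an annealing-style choice of starting model. The exponential schedule exp(-t / tau), which is near 1 in early rounds and decays toward 0, is an assumption chosen only to mimic the behavior the abstract describes (devices favor their local model while the global model is immature). It is not SAFL's published selection rule, and all function names and toy values are hypothetical.

```python
import math
import random

def fedavg(models, sizes):
    """FedAvg aggregation: average of parameter vectors weighted by local dataset size."""
    total = sum(sizes)
    dim = len(models[0])
    return [sum(n * m[j] for n, m in zip(sizes, models)) / total for j in range(dim)]

def keep_local_probability(t, tau):
    """Hypothetical cooling schedule: 1.0 at round 0, decaying toward 0 as t grows."""
    return math.exp(-t / tau)

def starting_model(local, global_model, t, tau, rng):
    """A device keeps its local model with probability keep_local_probability(t, tau)."""
    return local if rng.random() < keep_local_probability(t, tau) else global_model

rng = random.Random(0)
local_models = [[1.0, 2.0], [3.0, 4.0]]
global_model = fedavg(local_models, sizes=[1, 3])
for t in (0, 5, 50):
    picks = [starting_model(m, global_model, t, tau=10.0, rng=rng) for m in local_models]
    print(t, picks)
```

In this sketch, a larger tau keeps devices on their local models for more rounds, while a smaller tau hands control to the averaged global model sooner.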
