Simyung Chang

Chain-of-Rank: Enhancing Large Language Models for Domain-Specific RAG in Edge Device

Feb 21, 2025

Juntae Lee, Jihwan Bang, Seunghan Yang, Kyuhong Shim, Simyung Chang

Abstract: Retrieval-augmented generation (RAG) with large language models (LLMs) is especially valuable in specialized domains, where precision is critical. To further specialize LLMs for a target domain, domain-specific RAG has recently been developed, which exposes the LLM to the target domain in advance via fine-tuning. Domain-specific RAG makes particular sense in resource-constrained environments like edge devices, which must perform a specific task (e.g., personalization) reliably using only small-scale LLMs. While domain-specific RAG is well aligned with edge devices in this respect, it often relies on widely used reasoning techniques like chain-of-thought (CoT). The reasoning step helps the model understand the given external knowledge, yet it is computationally expensive and difficult for small-scale LLMs to learn. To tackle this, we propose Chain of Rank (CoR), which shifts the focus from intricate, lengthy reasoning to simply ranking the reliability of the input external documents. CoR thereby reduces computational complexity while maintaining high accuracy, making it particularly suited to resource-constrained environments. We attain state-of-the-art (SOTA) results on benchmarks and analyze the method's efficacy.

* NAACL 2025 (Findings)
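
As a rough illustration of the ranking idea in the abstract above, the following sketch builds a fine-tuning target that lists document IDs from most to least reliable before the answer, instead of a long reasoning trace. The prompt layout, the `Rank:` / `Answer:` format, the function names, and the reliability labels are all hypothetical; the abstract does not specify them.

```python
def build_prompt(question, documents):
    """Number the retrieved documents and append the question."""
    lines = [f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def build_rank_target(reliability, answer):
    """Emit document IDs from most to least reliable, then the answer."""
    order = sorted(range(1, len(reliability) + 1),
                   key=lambda i: reliability[i - 1], reverse=True)
    ranking = " > ".join(f"[{i}]" for i in order)
    return f"Rank: {ranking}\nAnswer: {answer}"


documents = [
    "The warranty covers battery defects for 12 months.",
    "Our office is closed on public holidays.",
    "Battery replacements are free within the warranty period.",
]
# Hypothetical supervision labels; how reliability is obtained is not
# described in the abstract.
reliability = [0.9, 0.1, 0.7]

prompt = build_prompt("How long is the battery warranty?", documents)
target = build_rank_target(reliability, "12 months")
```

Here `target` is a two-line string whose first line is `Rank: [1] > [3] > [2]`, which is far shorter than a typical chain-of-thought trace and so cheaper to generate on a small model.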

Unlocking Transfer Learning for Open-World Few-Shot Recognition

Nov 15, 2024

Byeonggeun Kim, Juntae Lee, Kyuhong Shim, Simyung Chang

Abstract: Few-Shot Open-Set Recognition (FSOSR) targets a critical real-world challenge, aiming to categorize inputs into known categories, termed closed-set classes, while identifying open-set inputs that fall outside these classes. Although transfer learning, where a model is tuned to a given few-shot task, has become a prominent paradigm in the closed-world setting, we observe that it fails to extend to the open-world setting. To address this challenge, we propose a two-stage method that combines open-set aware meta-learning with open-set free transfer learning. In the open-set aware meta-learning stage, a model is trained to establish a metric space that serves as a beneficial starting point for the subsequent stage. During the open-set free transfer learning stage, the model is further adapted to a specific target task through transfer learning. Additionally, we introduce a strategy to simulate open-set examples by modifying the training dataset or generating pseudo open-set examples. The proposed method achieves state-of-the-art performance on two widely recognized benchmarks, miniImageNet and tieredImageNet, with only a 1.5% increase in training effort. Our work demonstrates the effectiveness of transfer learning in FSOSR.

Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP

Oct 11, 2024

Eunji Kim, Kyuhong Shim, Simyung Chang, Sungroh Yoon

Abstract: A text encoder within Vision-Language Models (VLMs) like CLIP plays a crucial role in translating textual input into an embedding space shared with images, thereby facilitating the interpretative analysis of vision tasks through natural language. Despite the varying significance of different textual elements within a sentence depending on the context, efforts to account for variation of importance in constructing text embeddings have been lacking. We propose a framework of Semantic Token Reweighting to build Interpretable text embeddings (SToRI), which incorporates controllability as well. SToRI refines the text encoding process in CLIP by differentially weighting semantic elements based on contextual importance, enabling finer control over emphasis responsive to data-driven insights and user preferences. The efficacy of SToRI is demonstrated through comprehensive experiments on few-shot image classification and image retrieval tailored to user preferences.

* Accepted at EMNLP 2024 Findings

InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

Oct 02, 2024

Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang

Abstract: Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work aims to address this limitation by introducing InfiniPot, a novel KV cache control framework designed to enable pre-trained LLMs to manage extensive sequences within fixed memory constraints efficiently, without requiring additional training. InfiniPot leverages Continual Context Distillation (CCD), an iterative process that compresses and retains essential information through novel importance metrics, effectively maintaining critical data even without access to future context. Our comprehensive evaluations indicate that InfiniPot significantly outperforms models trained for long contexts in various NLP tasks, establishing its efficacy and versatility. This work represents a substantial advancement toward making LLMs applicable to a broader range of real-world scenarios.

* EMNLP 2024 Main
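
The fixed-memory idea in the abstract above can be pictured with a toy cache that is compressed every time it overflows, keeping only the highest-scoring entries seen so far. This is a minimal sketch, not the paper's CCD procedure: the `BoundedCache` class, the `keep` parameter, and the length-based `toy_importance` score are hypothetical stand-ins for the importance metrics the abstract mentions.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedCache:
    """Toy fixed-budget cache that is compressed whenever it overflows."""
    capacity: int  # maximum number of entries held after any append
    keep: int      # entries retained by each compression pass
    entries: list = field(default_factory=list)  # (position, token, score)

    def append(self, position, token, score):
        self.entries.append((position, token, score))
        if len(self.entries) > self.capacity:
            self._compress()

    def _compress(self):
        # Keep the `keep` highest-scoring entries, then restore stream order.
        top = sorted(self.entries, key=lambda e: e[2], reverse=True)[: self.keep]
        self.entries = sorted(top, key=lambda e: e[0])


def toy_importance(token):
    # Hypothetical score computed from the current token only, so no
    # future context is needed: longer tokens count as more important.
    return len(token)


cache = BoundedCache(capacity=4, keep=2)
stream = ["the", "quarterly", "report", "is", "due", "on", "Wednesday", "at", "noon"]
for pos, tok in enumerate(stream):
    cache.append(pos, tok, toy_importance(tok))

kept_tokens = [tok for _, tok, _ in cache.entries]
```

After the nine-token stream, the cache holds `quarterly`, `Wednesday`, and `noon`, and it never held more than four entries once an append had returned. A real KV cache would store key and value tensors per layer rather than tokens.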

Feature Diversification and Adaptation for Federated Domain Generalization

Jul 11, 2024

Seunghan Yang, Seokeon Choi, Hyunsin Park, Sungha Choi, Simyung Chang, Sungrack Yun

Abstract: Federated learning, a distributed learning paradigm, utilizes multiple clients to build a robust global model. In real-world applications, local clients often operate within their limited domains, leading to a 'domain shift' across clients. Privacy concerns limit each client's learning to its own domain data, which increases the risk of overfitting. Moreover, aggregating models that were each trained on their own limited domain can potentially lead to a significant degradation in the global model's performance. To deal with these challenges, we introduce the concept of federated feature diversification. Each client diversifies its own limited domain data by leveraging global feature statistics, i.e., the aggregated average statistics over all participating clients, shared through the global model's parameters. This data diversification helps local models learn client-invariant representations while preserving privacy. Our resultant global model shows robust performance on unseen test domain data. To enhance performance further, we develop an instance-adaptive inference approach tailored for test domain data. Our proposed instance feature adapter dynamically adjusts feature statistics to align with the test input, thereby reducing the domain gap between the test and training domains. We show that our method achieves state-of-the-art performance on several domain generalization benchmarks within a federated learning setting.

* Accepted to ECCV 2024
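
One way to picture the feature diversification described above is as a re-styling step: a client normalizes a feature channel with its own statistics, then rescales it with a blend of local and global statistics. The sketch below assumes this interpolation form; the `diversify` function, the mixing coefficient `lam`, and the numeric values are illustrative assumptions, not the paper's exact formulation.

```python
import random
from statistics import fmean, pstdev


def diversify(features, global_mean, global_std, lam, eps=1e-6):
    """Re-style one feature channel by mixing local and global statistics."""
    local_mean = fmean(features)
    local_std = pstdev(features) + eps
    # Interpolate between the client's own statistics and the global ones.
    mixed_mean = lam * local_mean + (1.0 - lam) * global_mean
    mixed_std = lam * local_std + (1.0 - lam) * global_std
    # Normalize with local statistics, then rescale with the mixed ones.
    return [(x - local_mean) / local_std * mixed_std + mixed_mean
            for x in features]


rng = random.Random(0)
# One client's narrow domain: features clustered tightly around 5.0.
local_features = [rng.gauss(5.0, 0.5) for _ in range(1000)]
# Assumed aggregated statistics over all clients (hypothetical values).
global_mean, global_std = 0.0, 2.0

lam = 0.25
augmented = diversify(local_features, global_mean, global_std, lam)
```

With `lam = 1.0` the features are returned unchanged, and smaller values pull the channel statistics toward the global ones, so the client sees feature styles beyond its own domain without receiving any other client's raw data.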

Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

Jun 11, 2024

Jihwan Bang, Juntae Lee, Kyuhong Shim, Seunghan Yang, Simyung Chang

Abstract: The customization of large language models (LLMs) for user-specified tasks is becoming increasingly important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overheads, and uploading user data can also lead to privacy concerns. On-device LLMs can offer a promising solution by mitigating these issues. Yet, the performance of on-device LLMs is inherently constrained by the limitations of small-scale models. To overcome these restrictions, we first propose Crayon, a novel approach for on-device LLM customization. Crayon begins by constructing a pool of diverse base adapters, and then instantly blends them into a customized adapter without extra training. In addition, we develop a device-server hybrid inference strategy, which deftly allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server. This ensures optimal performance without sacrificing the benefits of on-device customization. We carefully craft a novel benchmark from multiple question-answer datasets and show the efficacy of our method in LLM customization.

* ACL 2024 Main
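
The two ideas in the abstract above, training-free adapter blending and device-server routing, can be sketched as follows. The softmax weighting over relevance scores, the confidence threshold, and all function names are hypothetical; the abstract does not say how blending weights or routing decisions are computed.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_adapters(base_adapters, scores):
    """Return the weighted sum of base adapter parameters (no training step)."""
    weights = softmax(scores)
    size = len(base_adapters[0])
    return [sum(w * adapter[i] for w, adapter in zip(weights, base_adapters))
            for i in range(size)]


def route(confidence, threshold=0.5):
    """Keep confident queries on the device, send the rest to the server."""
    return "device" if confidence >= threshold else "server"


# Three toy base adapters, each flattened to a short parameter vector.
base_adapters = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical relevance of each base adapter to the user's task.
scores = [2.0, 0.0, 0.0]
custom = blend_adapters(base_adapters, scores)
```

Because blending is a single weighted sum, producing `custom` costs no gradient steps, which is what makes the customization "instant" in this toy setting. Calling `route(0.9)` returns `"device"` and `route(0.1)` returns `"server"`.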

Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data

Aug 31, 2023

Seunghan Yang, Byeonggeun Kim, Kyuhong Shim, Simyung Chang

Abstract: Few-shot keyword spotting (FS-KWS) models usually require large-scale annotated datasets to generalize to unseen target keywords. However, existing KWS datasets are limited in scale, and gathering keyword-like labeled data is a costly undertaking. To mitigate this issue, we propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source. Self-supervised learning has been widely adopted for learning representations from unlabeled data; however, it is known to be suitable for large models with enough capacity and is not practical for training a small-footprint FS-KWS model. Instead, we automatically annotate and filter the data to construct a keyword-like dataset, LibriWord, enabling supervision on auxiliary data. We then adopt multi-task learning, which helps the model enhance its representation power using out-of-domain auxiliary data. Our method notably improves performance over competitive methods on the FS-KWS benchmark.

* Interspeech 2023

Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer

Aug 31, 2023

Kyuhong Shim, Jinkyu Lee, Simyung Chang, Kyuwoong Hwang

Abstract: Streaming automatic speech recognition (ASR) models are restricted from accessing future context, which results in worse performance compared to non-streaming models. To improve the performance of streaming ASR, knowledge distillation (KD) from the non-streaming to the streaming model has been studied, mainly focusing on aligning the output token probabilities. In this paper, we propose a layer-to-layer KD from the teacher encoder to the student encoder. To ensure that features are extracted using the same context, we insert auxiliary non-streaming branches into the student and perform KD from the non-streaming teacher layer to the non-streaming auxiliary layer. We design a special KD loss that leverages the autoregressive predictive coding (APC) mechanism to encourage the streaming model to predict unseen future contexts. Experimental results show that the proposed method can significantly reduce the word error rate compared to previous token probability distillation methods.

* Accepted to Interspeech 2023

Scalable Weight Reparametrization for Efficient Transfer Learning

Feb 26, 2023

Byeonggeun Kim, Jun-Tae Lee, Seunghan Yang, Simyung Chang

Abstract: This paper proposes a novel transfer learning method, called Scalable Weight Reparametrization (SWR), that is efficient and effective for multiple downstream tasks. Efficient transfer learning involves utilizing a pre-trained model trained on a larger dataset and repurposing it for downstream tasks with the aim of maximizing the reuse of the pre-trained model. However, previous works have led to an increase in updated parameters and task-specific modules, resulting in more computations, especially for tiny models. Additionally, there has been no practical consideration for controlling the number of updated parameters. To address these issues, we suggest learning a policy network that can decide where to reparametrize the pre-trained model, while adhering to a given constraint on the number of updated parameters. The policy network is only used during the transfer learning process and not afterward. As a result, our approach attains state-of-the-art performance on a proposed multi-lingual keyword spotting benchmark and a standard benchmark, ImageNet-to-Sketch, while requiring zero additional computations and significantly fewer additional parameters.

* Accepted to ICASSP 2023

Quadapter: Adapter for GPT-2 Quantization

Nov 30, 2022

Minseop Park, Jaeseong You, Markus Nagel, Simyung Chang

Abstract: Transformer language models such as GPT-2 are difficult to quantize because outliers in activations lead to a large quantization error. To adapt to the error, one must use quantization-aware training, which entails a fine-tuning process based on a dataset and training pipeline identical to those used for the original model. Pretrained language models, however, often do not grant access to their datasets and training pipelines, forcing us to rely on arbitrary ones for fine-tuning. In that case, it is observed that quantization-aware training overfits the model to the fine-tuning data. For quantization without overfitting, we introduce a quantization adapter (Quadapter), a small set of parameters that are learned to make activations quantization-friendly by scaling them channel-wise. It keeps the model parameters unchanged. By applying our method to the challenging task of quantizing GPT-2, we demonstrate that it effectively prevents overfitting and improves quantization performance.
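
To see why channel-wise scaling can make activations quantization-friendly, consider a toy tensor with one outlier channel. The sketch below divides each channel by a scale before quantizing and multiplies it back afterward. It is only an illustration: the scales here are hand-picked, whereas the abstract describes them as learned parameters, and the function names and 4-bit symmetric quantizer are assumptions.

```python
def quantize(values, num_bits=8):
    """Symmetric per-tensor fake quantization: round to a uniform grid and back."""
    levels = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    step = max_abs / levels if max_abs > 0 else 1.0
    return [round(v / step) * step for v in values]


def quantize_with_scaling(activations, scales, num_bits=8):
    """Divide each channel by its scale, quantize, then multiply the scale back."""
    scaled = [a / s for a, s in zip(activations, scales)]
    dequantized = quantize(scaled, num_bits)
    return [q * s for q, s in zip(dequantized, scales)]


def mean_sq_error(reference, approx):
    """Average squared difference between two equally long sequences."""
    return sum((r - a) ** 2 for r, a in zip(reference, approx)) / len(reference)


# One value per channel; the last channel is an outlier.
activations = [0.11, -0.23, 0.37, 64.0]
# Hand-picked scales (hypothetical); only the outlier channel is shrunk.
scales = [1.0, 1.0, 1.0, 128.0]

plain_error = mean_sq_error(activations, quantize(activations, num_bits=4))
scaled_error = mean_sq_error(
    activations, quantize_with_scaling(activations, scales, num_bits=4))
```

Without scaling, the outlier stretches the quantization grid so far that the three small channels all round to zero. With scaling, the grid is fine enough to resolve them, so `scaled_error` comes out smaller than `plain_error`, while the model weights themselves are never touched.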
