Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

YoungBin Kim

Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models

Jun 12, 2025

Sangmin Song, Juhwan Choi, JungMin Yun, YoungBin Kim

Abstract:Large language models (LLMs) have demonstrated remarkable performance in zero-shot dialogue state tracking (DST), reducing the need for task-specific training. However, conventional DST benchmarks primarily focus on structured user-agent conversations, failing to capture the complexities of real-world multi-user interactions. In this study, we assess the robustness of LLMs in multi-user DST while minimizing dataset construction costs. Inspired by recent advances in LLM-based data annotation, we extend an existing DST dataset by generating utterances of a second user based on speech act theory. Our methodology systematically incorporates a second user's utterances into conversations, enabling a controlled evaluation of LLMs in multi-user settings. Experimental results reveal a significant performance drop compared to single-user DST, highlighting the limitations of current LLMs in extracting and tracking dialogue states amidst multiple speakers. Our findings emphasize the need for future research to enhance LLMs for multi-user DST scenarios, paving the way for more realistic and robust DST models.

Via

Access Paper or Ask Questions

Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models

Jun 09, 2025

Kyeonghyun Kim, Jinhee Jang, Juhwan Choi, Yoonji Lee, Kyohoon Jin, YoungBin Kim

Abstract:Large language models (LLMs) are renowned for their extensive linguistic knowledge and strong generalization capabilities, but their high computational demands make them unsuitable for resource-constrained environments. In contrast, small language models (SLMs) are computationally efficient but often lack the broad generalization capacity of LLMs. To bridge this gap, we propose PiFi, a novel framework that combines the strengths of both LLMs and SLMs to achieve high performance while maintaining efficiency. PiFi integrates a single frozen layer from an LLM into a SLM and fine-tunes the combined model for specific tasks, boosting performance without a significant increase in computational cost. We show that PiFi delivers consistent performance improvements across a range of natural language processing tasks, including both natural language understanding and generation. Moreover, our findings demonstrate PiFi's ability to effectively leverage LLM knowledge, enhancing generalization to unseen domains and facilitating the transfer of linguistic abilities.

* ACL 2025 main conference

Via

Access Paper or Ask Questions

Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Apr 28, 2025

Mouad Abrini, Omri Abend, Dina Acklin, Henny Admoni, Gregor Aichinger, Nitay Alon, Zahra Ashktorab, Ashish Atreja, Moises Auron, Alexander Aufreiter(+98 more)

Abstract:This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind held at AAAI 2025 in Philadelphia US on 3rd March 2025. The purpose of this volume is to provide an open access and curated anthology for the ToM and AI research community.

* workshop proceedings

Via

Access Paper or Ask Questions

See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias

Mar 18, 2025

JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, Juhwan Choi, YoungBin Kim

Figure 1 for See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias

Figure 2 for See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias

Figure 3 for See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias

Figure 4 for See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias

Abstract:Vision-language (VL) models have demonstrated strong performance across various tasks. However, these models often rely on a specific modality for predictions, leading to "dominant modality bias.'' This bias significantly hurts performance, especially when one modality is impaired. In this study, we analyze model behavior under dominant modality bias and theoretically show that unaligned gradients or differences in gradient magnitudes prevent balanced convergence of the loss. Based on these findings, we propose a novel framework, BalGrad to mitigate dominant modality bias. Our approach includes inter-modality gradient reweighting, adjusting the gradient of KL divergence based on each modality's contribution, and inter-task gradient projection to align task directions in a non-conflicting manner. Experiments on UPMC Food-101, Hateful Memes, and MM-IMDb datasets confirm that BalGrad effectively alleviates over-reliance on specific modalities when making predictions.

* Accepted to NAACL 2025 Main

Via

Access Paper or Ask Questions

Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games

Sep 10, 2024

Juhwan Choi, YoungBin Kim

Abstract:Large language models (LLMs) have become a dominant approach in natural language processing, yet their internal knowledge structures remain largely unexplored. In this paper, we analyze the internal knowledge structures of LLMs using historical medal tallies from the Olympic Games. We task the models with providing the medal counts for each team and identifying which teams achieved specific rankings. Our results reveal that while state-of-the-art LLMs perform remarkably well in reporting medal counts for individual teams, they struggle significantly with questions about specific rankings. This suggests that the internal knowledge structures of LLMs are fundamentally different from those of humans, who can easily infer rankings from known medal counts. To support further research, we publicly release our code, dataset, and model outputs.

Via

Access Paper or Ask Questions

VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

Jul 29, 2024

Juhwan Choi, Junehyoung Kwon, JungMin Yun, Seunguk Yu, YoungBin Kim

Figure 1 for VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

Figure 2 for VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

Figure 3 for VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

Figure 4 for VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

Abstract:Domain generalizability is a crucial aspect of a deep learning model since it determines the capability of the model to perform well on data from unseen domains. However, research on the domain generalizability of deep learning models for vision-language tasks remains limited, primarily because of the lack of required datasets. To address these challenges, we propose VolDoGer: Vision-Language Dataset for Domain Generalization, a dedicated dataset designed for domain generalization that addresses three vision-language tasks: image captioning, visual question answering, and visual entailment. We constructed VolDoGer by extending LLM-based data annotation techniques to vision-language tasks, thereby alleviating the burden of recruiting human annotators. We evaluated the domain generalizability of various models, ranging from fine-tuned models to a recent multimodal large language model, through VolDoGer.

* 31 pages, 5 figures, 20 tables

Via

Access Paper or Ask Questions

Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation

May 28, 2024

JuneHyoung Kwon, Eunju Lee, Yunsung Cho, YoungBin Kim

Abstract:Weakly supervised semantic segmentation (WSSS) employing weak forms of labels has been actively studied to alleviate the annotation cost of acquiring pixel-level labels. However, classifiers trained on biased datasets tend to exploit shortcut features and make predictions based on spurious correlations between certain backgrounds and objects, leading to a poor generalization performance. In this paper, we propose shortcut mitigating augmentation (SMA) for WSSS, which generates synthetic representations of object-background combinations not seen in the training data to reduce the use of shortcut features. Our approach disentangles the object-relevant and background features. We then shuffle and combine the disentangled representations to create synthetic features of diverse object-background combinations. SMA-trained classifier depends less on contexts and focuses more on the target object when making predictions. In addition, we analyzed the behavior of the classifier on shortcut usage after applying our augmentation using an attribution method-based metric. The proposed method achieved the improved performance of semantic segmentation result on PASCAL VOC 2012 and MS COCO 2014 datasets.

* Accepted to WACV 2024

Via

Access Paper or Ask Questions

UniGen: Universal Domain Generalization for Sentiment Classification via Zero-shot Dataset Generation

May 02, 2024

Juhwan Choi, Yeonghwa Kim, Seunguk Yu, JungMin Yun, YoungBin Kim

Abstract:Although pre-trained language models have exhibited great flexibility and versatility with prompt-based few-shot learning, they suffer from the extensive parameter size and limited applicability for inference. Recent studies have suggested that PLMs be used as dataset generators and a tiny task-specific model be trained to achieve efficient inference. However, their applicability to various domains is limited because they tend to generate domain-specific datasets. In this work, we propose a novel approach to universal domain generalization that generates a dataset regardless of the target domain. This allows for generalization of the tiny task model to any domain that shares the label space, thus enhancing the real-world applicability of the dataset generation paradigm. Our experiments indicate that the proposed method accomplishes generalizability across various domains while using a parameter set that is orders of magnitude smaller than PLMs.

Via

Access Paper or Ask Questions

Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation

Apr 15, 2024

Juhwan Choi, Jungmin Yun, Kyohoon Jin, YoungBin Kim

Figure 1 for Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation

Figure 2 for Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation

Figure 3 for Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation

Figure 4 for Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation

Abstract:The quality of the dataset is crucial for ensuring optimal performance and reliability of downstream task models. However, datasets often contain noisy data inadvertently included during the construction process. Numerous attempts have been made to correct this issue through human annotators. However, hiring and managing human annotators is expensive and time-consuming. As an alternative, recent studies are exploring the use of large language models (LLMs) for data annotation. In this study, we present a case study that extends the application of LLM-based data annotation to enhance the quality of existing datasets through a cleansing strategy. Specifically, we leverage approaches such as chain-of-thought (CoT) and majority voting to imitate human annotation and classify unrelated documents from the Multi-News dataset, which is widely used for the multi-document summarization task. Through our proposed cleansing method, we introduce an enhanced Multi-News+. By employing LLMs for data cleansing, we demonstrate an efficient and effective approach to improving dataset quality without relying on expensive human annotation efforts.

Via

Access Paper or Ask Questions

Adverb Is the Key: Simple Text Data Augmentation with Adverb Deletion

Mar 29, 2024

Juhwan Choi, YoungBin Kim

Abstract:In the field of text data augmentation, rule-based methods are widely adopted for real-world applications owing to their cost-efficiency. However, conventional rule-based approaches suffer from the possibility of losing the original semantics of the given text. We propose a novel text data augmentation strategy that avoids such phenomena through a straightforward deletion of adverbs, which play a subsidiary role in the sentence. Our comprehensive experiments demonstrate the efficiency and effectiveness of our proposed approach for not just single text classification, but also natural language inference that requires semantic preservation. We publicly released our source code for reproducibility.

* ICLR 2024 Tiny Papers

Via

Access Paper or Ask Questions