Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Seo Hyun Kim

KLASS: KL-Guided Fast Inference in Masked Diffusion Models

Nov 07, 2025

Seo Hyun Kim, Sunwoo Hong, Hojung Jung, Youngrok Park, Se-Young Yun

Abstract:Masked diffusion models have demonstrated competitive results on various tasks including language generation. However, due to its iterative refinement process, the inference is often bottlenecked by slow and static sampling speed. To overcome this problem, we introduce `KL-Adaptive Stability Sampling' (KLASS), a fast yet effective sampling method that exploits token-level KL divergence to identify stable, high-confidence predictions. By unmasking multiple tokens in each iteration without any additional model training, our approach speeds up generation significantly while maintaining sample quality. On reasoning benchmarks, KLASS achieves up to $2.78\times$ wall-clock speedups while improving performance over standard greedy decoding, attaining state-of-the-art results among diffusion-based samplers. We further validate KLASS across diverse domains, including text, image, and molecular generation, showing its effectiveness as a broadly applicable sampler across different models.

* NeurIPS 2025 Spotlight. Code: https://github.com/shkim0116/KLASS

Via

Access Paper or Ask Questions

Self-Training Elicits Concise Reasoning in Large Language Models

Feb 28, 2025

Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun

Abstract:Chain-of-thought (CoT) reasoning has enabled large language models (LLMs) to utilize additional computation through intermediate tokens to solve complex tasks. However, we posit that typical reasoning traces contain many redundant tokens, incurring extraneous inference costs. Upon examination of the output distribution of current LLMs, we find evidence on their latent ability to reason more concisely, relative to their default behavior. To elicit this capability, we propose simple fine-tuning methods which leverage self-generated concise reasoning paths obtained by best-of-N sampling and few-shot conditioning, in task-specific settings. Our combined method achieves a 30% reduction in output tokens on average, across five model families on GSM8K and MATH, while maintaining average accuracy. By exploiting the fundamental stochasticity and in-context learning capabilities of LLMs, our self-training approach robustly elicits concise reasoning on a wide range of models, including those with extensive post-training. Code is available at https://github.com/TergelMunkhbat/concise-reasoning

* 23 pages, 10 figures, 18 tables

Via

Access Paper or Ask Questions

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

Jun 16, 2024

Seo Hyun Kim, Kai Tzu-iunn Ong, Taeyoon Kwon, Namyoung Kim, Keummin Ka, SeongHyeon Bae, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo

Figure 1 for THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

Figure 2 for THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

Figure 3 for THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

Figure 4 for THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

Abstract:Large language models (LLMs) are capable of processing lengthy dialogue histories during prolonged interaction with users without additional memory modules; however, their responses tend to overlook or incorrectly recall information from the past. In this paper, we revisit memory-augmented response generation in the era of LLMs. While prior work focuses on getting rid of outdated memories, we argue that such memories can provide contextual cues that help dialogue systems understand the development of past events and, therefore, benefit response generation. We present Theanine, a framework that augments LLMs' response generation with memory timelines -- series of memories that demonstrate the development and causality of relevant past events. Along with Theanine, we introduce TeaFarm, a counterfactual-driven question-answering pipeline addressing the limitation of G-Eval in long-term conversations. Supplementary videos of our methods and the TeaBag dataset for TeaFarm evaluation are in https://theanine-693b0.web.app/.

* Under Review

Via

Access Paper or Ask Questions

Ever-Evolving Memory by Blending and Refining the Past

Mar 03, 2024

Seo Hyun Kim, Keummin Ka, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo

Figure 1 for Ever-Evolving Memory by Blending and Refining the Past

Figure 2 for Ever-Evolving Memory by Blending and Refining the Past

Figure 3 for Ever-Evolving Memory by Blending and Refining the Past

Figure 4 for Ever-Evolving Memory by Blending and Refining the Past

Abstract:For a human-like chatbot, constructing a long-term memory is crucial. A naive approach for making a memory could be simply listing the summarized dialogue. However, this can lead to problems when the speaker's status change over time and contradictory information gets accumulated. It is important that the memory stays organized to lower the confusion for the response generator. In this paper, we propose a novel memory scheme for long-term conversation, CREEM. Unlike existing approaches that construct memory based solely on current sessions, our proposed model blending past memories during memory formation. Additionally, we introduce refining process to handle redundant or outdated information. This innovative approach seeks for overall improvement and coherence of chatbot responses by ensuring a more informed and dynamically evolving long-term memory.

* 9 pages, 2 figures

Via

Access Paper or Ask Questions