Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jeonghoon Kim

Laboratory for Natural and Artificial Kinästhese, Convergence Research Center for Artificial Intelligence, Department of Artificial Intelligence, Dongguk University, Seoul, South Korea

Enhancing Hallucination Detection via Future Context

Jul 28, 2025

Joosung Lee, Cheonbok Park, Hwiyeol Jo, Jeonghoon Kim, Joonsuk Park, Kang Min Yoo

Abstract:Large Language Models (LLMs) are widely used to generate plausible text on online platforms, without revealing the generation process. As users increasingly encounter such black-box outputs, detecting hallucinations has become a critical challenge. To address this challenge, we focus on developing a hallucination detection framework for black-box generators. Motivated by the observation that hallucinations, once introduced, tend to persist, we sample future contexts. The sampled future contexts provide valuable clues for hallucination detection and can be effectively integrated with various sampling-based methods. We extensively demonstrate performance improvements across multiple methods using our proposed sampling approach.

Via

Access Paper or Ask Questions

Cross-lingual Collapse: How Language-Centric Foundation Models Shape Reasoning in Large Language Models

Jun 06, 2025

Cheonbok Park, Jeonghoon Kim, Joosung Lee, Sanghwan Bae, Jaegul Choo, Kangmin Yoo

Abstract:We identify \textbf{Cross-lingual Collapse}, a systematic drift in which the chain-of-thought (CoT) of a multilingual language model reverts to its dominant pre-training language even when the prompt is expressed in a different language. Recent large language models (LLMs) with reinforcement learning with verifiable reward (RLVR) have achieved strong logical reasoning performances by exposing their intermediate reasoning traces, giving rise to large reasoning models (LRMs). However, the mechanism behind multilingual reasoning in LRMs is not yet fully explored. To investigate the issue, we fine-tune multilingual LRMs with Group-Relative Policy Optimization (GRPO) on translated versions of the GSM$8$K and SimpleRL-Zoo datasets in three different languages: Chinese, Korean, and Ukrainian. During training, we monitor both task accuracy and language consistency of the reasoning chains. Our experiments reveal three key findings: (i) GRPO rapidly amplifies pre-training language imbalances, leading to the erosion of low-resource languages within just a few hundred updates; (ii) language consistency reward mitigates this drift but does so at the expense of an almost 5 - 10 pp drop in accuracy. and (iii) the resulting language collapse is severely damaging and largely irreversible, as subsequent fine-tuning struggles to steer the model back toward its original target-language reasoning capabilities. Together, these findings point to a remarkable conclusion: \textit{not all languages are trained equally for reasoning}. Furthermore, our paper sheds light on the roles of reward shaping, data difficulty, and pre-training priors in eliciting multilingual reasoning.

* Preprint

Via

Access Paper or Ask Questions

ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search

May 21, 2025

Hyunseok Lee, Jeonghoon Kim, Beomjun Kim, Jihoon Tack, Chansong Jo, Jaehong Lee, Cheonbok Park, Sookyo In, Jinwoo Shin, Kang Min Yoo

Abstract:Recent advances in Multimodal Large Language Models (MLLMs) have enabled autonomous agents to interact with computers via Graphical User Interfaces (GUIs), where accurately localizing the coordinates of interface elements (e.g., buttons) is often required for fine-grained actions. However, this remains significantly challenging, leading prior works to rely on large-scale web datasets to improve the grounding accuracy. In this work, we propose Reasoning Graphical User Interface Grounding for Data Efficiency (ReGUIDE), a novel and effective framework for web grounding that enables MLLMs to learn data efficiently through self-generated reasoning and spatial-aware criticism. More specifically, ReGUIDE learns to (i) self-generate a language reasoning process for the localization via online reinforcement learning, and (ii) criticize the prediction using spatial priors that enforce equivariance under input transformations. At inference time, ReGUIDE further boosts performance through a test-time scaling strategy, which combines spatial search with coordinate aggregation. Our experiments demonstrate that ReGUIDE significantly advances web grounding performance across multiple benchmarks, outperforming baselines with substantially fewer training data points (e.g., only 0.2% samples compared to the best open-sourced baselines).

Via

Access Paper or Ask Questions

SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture

Mar 04, 2025

Hocheol Lim, Hyein Cho, Jeonghoon Kim

Figure 1 for SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture

Figure 2 for SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture

Figure 3 for SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture

Figure 4 for SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture

Abstract:Efficient CO2 capture is vital for mitigating climate change, with amine-based solvents being widely used due to their strong reactivity with CO2. However, optimizing key properties such as basicity, viscosity, and absorption capacity remains challenging, as traditional methods rely on labor-intensive experimentation and predefined chemical databases, limiting the exploration of novel solutions. Here, SAGE-Amine was introduced, a generative modeling approach that integrates Scoring-Assisted Generative Exploration (SAGE) with quantitative structure-property relationship models to design new amines tailored for CO2 capture. Unlike conventional virtual screening restricted to existing compounds, SAGE-Amine generates novel amines by leveraging autoregressive natural language processing models trained on amine datasets. SAGE-Amine identified known amines for CO2 capture from scratch and successfully performed single-property optimization, increasing basicity or reducing viscosity or vapor pressure. Furthermore, it facilitated multi-property optimization, simultaneously achieving high basicity with low viscosity and vapor pressure. The 10 top-ranked amines were suggested using SAGE-Amine and their thermodynamic properties were further assessed using COSMO-RS simulations, confirming their potential for CO2 capture. These results highlight the potential of generative modeling in accelerating the discovery of amine solvents and expanding the possibilities for industrial CO2 capture applications.

* 33 pages, 5 figures

Via

Access Paper or Ask Questions

Peri-LN: Revisiting Layer Normalization in the Transformer Architecture

Feb 04, 2025

Jeonghoon Kim, Byeongchan Lee, Cheonbok Park, Yeontaek Oh, Beomjun Kim, Taehwan Yoo, Seongjin Shin, Dongyoon Han, Jinwoo Shin, Kang Min Yoo

Figure 1 for Peri-LN: Revisiting Layer Normalization in the Transformer Architecture

Figure 2 for Peri-LN: Revisiting Layer Normalization in the Transformer Architecture

Figure 3 for Peri-LN: Revisiting Layer Normalization in the Transformer Architecture

Figure 4 for Peri-LN: Revisiting Layer Normalization in the Transformer Architecture

Abstract:Designing Transformer architectures with the optimal layer normalization (LN) strategy that ensures large-scale training stability and expedite convergence has remained elusive, even in this era of large language models (LLMs). To this end, we present a comprehensive analytical foundation for understanding how different LN strategies influence training dynamics in large-scale Transformer training. Until recently, Pre-LN and Post-LN have long dominated standard practices despite their limitations in large-scale training. However, several open-source large-scale models have recently begun silently adopting a third strategy without much explanation. This strategy places layer normalization (LN) peripherally around sublayers, a design we term Peri-LN. While Peri-LN has demonstrated promising empirical performance, its precise mechanisms and benefits remain almost unexplored. Our in-depth analysis shows that Peri-LN strikes an ideal balance in variance growth -- unlike Pre-LN and Post-LN, which are prone to vanishing gradients and ``massive activations.'' To validate our theoretical insight, we conduct large-scale experiments on Transformers up to 3.2B parameters, showing that Peri-LN consistently achieves more balanced variance growth, steadier gradient flow, and convergence stability. Our results suggest that Peri-LN warrants broader consideration for large-scale Transformer architectures, providing renewed insights into the optimal placement and application of LN.

* Preprint

Via

Access Paper or Ask Questions

LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

Jul 16, 2024

Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee

Figure 1 for LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

Figure 2 for LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

Figure 3 for LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

Figure 4 for LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

Abstract:With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization (PTQ) techniques for quantizing weights and activations of LLMs still suffer from non-negligible accuracy drops, especially on massive multitask language understanding. To address this issue, we propose Low-Rank Quantization (LRQ) $-$ a simple yet effective post-training weight quantization method for LLMs that reconstructs the outputs of an intermediate Transformer block by leveraging low-rank weight-scaling matrices, replacing the conventional full weight-scaling matrices that entail as many learnable scales as their associated weights. Thanks to parameter sharing via low-rank structure, LRQ only needs to learn significantly fewer parameters while enabling the individual scaling of weights, thus boosting the generalization capability of quantized LLMs. We show the superiority of LRQ over prior LLM PTQ works under (i) $8$-bit weight and per-tensor activation quantization, (ii) $4$-bit weight and $8$-bit per-token activation quantization, and (iii) low-bit weight-only quantization schemes. Our code is available at \url{https://github.com/onliwad101/FlexRound_LRQ} to inspire LLM researchers and engineers.

* Preprint

Via

Access Paper or Ask Questions

Improving Multi-hop Logical Reasoning in Knowledge Graphs with Context-Aware Query Representation Learning

Jun 11, 2024

Jeonghoon Kim, Heesoo Jung, Hyeju Jang, Hogun Park

Abstract:Multi-hop logical reasoning on knowledge graphs is a pivotal task in natural language processing, with numerous approaches aiming to answer First-Order Logic (FOL) queries. Recent geometry (e.g., box, cone) and probability (e.g., beta distribution)-based methodologies have effectively addressed complex FOL queries. However, a common challenge across these methods lies in determining accurate geometric bounds or probability parameters for these queries. The challenge arises because existing methods rely on linear sequential operations within their computation graphs, overlooking the logical structure of the query and the relation-induced information that can be gleaned from the relations of the query, which we call the context of the query. To address the problem, we propose a model-agnostic methodology that enhances the effectiveness of existing multi-hop logical reasoning approaches by fully integrating the context of the FOL query graph. Our approach distinctively discerns (1) the structural context inherent to the query structure and (2) the relation-induced context unique to each node in the query graph as delineated in the corresponding knowledge graph. This dual-context paradigm helps nodes within a query graph attain refined internal representations throughout the multi-hop reasoning steps. Through experiments on two datasets, our method consistently enhances the three multi-hop reasoning foundation models, achieving performance improvements of up to 19.5%. Our code is available at https://github.com/kjh9503/caqr.

* Accepted to ACL 2024 Findings

Via

Access Paper or Ask Questions

HyperCLOVA X Technical Report

Apr 13, 2024

Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim(+386 more)

Abstract:We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

* 44 pages; updated authors list and fixed author names

Via

Access Paper or Ask Questions

Domain Generalization in LiDAR Semantic Segmentation Leveraged by Density Discriminative Feature Embedding

Dec 19, 2023

Jaeyeul Kim, Jungwan Woo, Jeonghoon Kim, Sunghoon Im

Figure 1 for Domain Generalization in LiDAR Semantic Segmentation Leveraged by Density Discriminative Feature Embedding

Figure 2 for Domain Generalization in LiDAR Semantic Segmentation Leveraged by Density Discriminative Feature Embedding

Figure 3 for Domain Generalization in LiDAR Semantic Segmentation Leveraged by Density Discriminative Feature Embedding

Figure 4 for Domain Generalization in LiDAR Semantic Segmentation Leveraged by Density Discriminative Feature Embedding

Abstract:While significant progress has been achieved in LiDAR-based perception, domain generalization continues to present challenges, often resulting in reduced performance when encountering unfamiliar datasets due to domain discrepancies. One of the primary hurdles stems from the variability of LiDAR sensors, leading to inconsistencies in point cloud density distribution. Such inconsistencies can undermine the effectiveness of perception models. We address this challenge by introducing a new approach that acknowledges a fundamental characteristic of LiDAR: the variation in point density due to the distance from the LiDAR to the scene, and the number of beams relative to the field of view. Understanding this, we view each LiDAR's point cloud at various distances as having distinct density distributions, which can be consistent across different LiDAR models. With this insight, we propose the Density Discriminative Feature Embedding (DDFE) module, crafted to specifically extract features related to density while ensuring domain invariance across different LiDAR sensors. In addition, we introduce a straightforward but effective density augmentation technique, designed to broaden the density spectrum and enhance the capabilities of the DDFE. The proposed DDFE stands out as a versatile and lightweight domain generalization module. It can be seamlessly integrated into various 3D backbone networks, consistently outperforming existing state-of-the-art domain generalization approaches. We commit to releasing the source code publicly to foster community collaboration and advancement.

* under review

Via

Access Paper or Ask Questions

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

Sep 27, 2023

Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee

Abstract:Large Language Models (LLMs) have recently demonstrated a remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to its large memory bottleneck, specifically in small batch inference settings (e.g. mobile devices). Weight-only quantization can be a promising approach, but sub-4 bit quantization remains a challenge due to large-magnitude activation outliers. To mitigate the undesirable outlier effect, we first propose per-IC quantization, a simple yet effective method that creates quantization groups within each input channel (IC) rather than the conventional per-output channel (OC). Our method is motivated by the observation that activation outliers affect the input dimension of the weight matrix, so similarly grouping the weights in the IC direction can isolate outliers to be within a group. We also find that activation outliers do not dictate quantization difficulty, and inherent weight sensitivities also exist. With per-IC quantization as a new outlier-friendly scheme, we then propose Adaptive Dimensions (AdaDim), a versatile quantization framework that can adapt to various weight sensitivity patterns. We demonstrate the effectiveness of AdaDim by augmenting prior methods such as Round-To-Nearest and GPTQ, showing significant improvements across various language modeling benchmarks for both base (up to +4.7% on MMLU) and instruction-tuned (up to +10% on HumanEval) LLMs.

* 17 pages, 10 figures, 7 tables

Via

Access Paper or Ask Questions