Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Minsung Kim

Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning

Sep 26, 2025

Nakyeong Yang, Dong-Kyum Kim, Jea Kwon, Minsung Kim, Kyomin Jung, Meeyoung Cha

Abstract:Large language models trained on web-scale data can memorize private or sensitive knowledge, raising significant privacy risks. Although some unlearning methods mitigate these risks, they remain vulnerable to "relearning" during subsequent training, allowing a substantial portion of forgotten knowledge to resurface. In this paper, we show that widely used unlearning methods cause shallow alignment: instead of faithfully erasing target knowledge, they generate spurious unlearning neurons that amplify negative influence to hide it. To overcome this limitation, we introduce Ssiuu, a new class of unlearning methods that employs attribution-guided regularization to prevent spurious negative influence and faithfully remove target knowledge. Experimental results confirm that our method reliably erases target knowledge and outperforms strong baselines across two practical retraining scenarios: (1) adversarial injection of private data, and (2) benign attack using an instruction-following benchmark. Our findings highlight the necessity of robust and faithful unlearning methods for safe deployment of language models.

* 15 pages

Via

Access Paper or Ask Questions

Bilinear relational structure fixes reversal curse and enables consistent model editing

Sep 26, 2025

Dong-Kyum Kim, Minsung Kim, Jea Kwon, Nakyeong Yang, Meeyoung Cha

Abstract:The reversal curse -- a language model's (LM) inability to infer an unseen fact ``B is A'' from a learned fact ``A is B'' -- is widely considered a fundamental limitation. We show that this is not an inherent failure but an artifact of how models encode knowledge. By training LMs from scratch on a synthetic dataset of relational knowledge graphs, we demonstrate that bilinear relational structure emerges in their hidden representations. This structure substantially alleviates the reversal curse, enabling LMs to infer unseen reverse facts. Crucially, we also find that this bilinear structure plays a key role in consistent model editing. When a fact is updated in a LM with this structure, the edit correctly propagates to its reverse and other logically dependent facts. In contrast, models lacking this representation not only suffer from the reversal curse but also fail to generalize edits, further introducing logical inconsistencies. Our results establish that training on a relational knowledge dataset induces the emergence of bilinear internal representations, which in turn enable LMs to behave in a logically consistent manner after editing. This implies that the success of model editing depends critically not just on editing algorithms but on the underlying representational geometry of the knowledge being modified.

* 9 pages

Via

Access Paper or Ask Questions

FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

Feb 26, 2025

Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung

Figure 1 for FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

Figure 2 for FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

Figure 3 for FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

Figure 4 for FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

Abstract:Various studies have attempted to remove sensitive or private knowledge from a language model to prevent its unauthorized exposure. However, prior studies have overlooked the complex and interconnected nature of knowledge, where related knowledge must be carefully examined. Specifically, they have failed to evaluate whether an unlearning method faithfully erases interconnected knowledge that should be removed, retaining knowledge that appears relevant but exists in a completely different context. To resolve this problem, we first define a new concept called superficial unlearning, which refers to the phenomenon where an unlearning method either fails to erase the interconnected knowledge it should remove or unintentionally erases irrelevant knowledge. Based on the definition, we introduce a new benchmark, FaithUn, to analyze and evaluate the faithfulness of unlearning in real-world knowledge QA settings. Furthermore, we propose a novel unlearning method, KLUE, which updates only knowledge-related neurons to achieve faithful unlearning. KLUE identifies knowledge neurons using an explainability method and updates only those neurons using selected unforgotten samples. Experimental results demonstrate that widely-used unlearning methods fail to ensure faithful unlearning, while our method shows significant effectiveness in real-world QA unlearning.

* 16 pages

Via

Access Paper or Ask Questions

Generating Diverse Hypotheses for Inductive Reasoning

Dec 18, 2024

Kang-il Lee, Hyukhun Koh, Dongryeol Lee, Seunghyun Yoon, Minsung Kim, Kyomin Jung

Abstract:Inductive reasoning - the process of inferring general rules from a small number of observations - is a fundamental aspect of human intelligence. Recent works suggest that large language models (LLMs) can engage in inductive reasoning by sampling multiple hypotheses about the rules and selecting the one that best explains the observations. However, due to the IID sampling, semantically redundant hypotheses are frequently generated, leading to significant wastage of compute. In this paper, we 1) demonstrate that increasing the temperature to enhance the diversity is limited due to text degeneration issue, and 2) propose a novel method to improve the diversity while maintaining text quality. We first analyze the effect of increasing the temperature parameter, which is regarded as the LLM's diversity control, on IID hypotheses. Our analysis shows that as temperature rises, diversity and accuracy of hypotheses increase up to a certain point, but this trend saturates due to text degeneration. To generate hypotheses that are more semantically diverse and of higher quality, we propose a novel approach inspired by human inductive reasoning, which we call Mixture of Concepts (MoC). When applied to several inductive reasoning benchmarks, MoC demonstrated significant performance improvements compared to standard IID sampling and other approaches.

* 14 pages

Via

Access Paper or Ask Questions

Fine-grained Gender Control in Machine Translation with Large Language Models

Jul 21, 2024

Minwoo Lee, Hyukhun Koh, Minsung Kim, Kyomin Jung

Figure 1 for Fine-grained Gender Control in Machine Translation with Large Language Models

Figure 2 for Fine-grained Gender Control in Machine Translation with Large Language Models

Figure 3 for Fine-grained Gender Control in Machine Translation with Large Language Models

Figure 4 for Fine-grained Gender Control in Machine Translation with Large Language Models

Abstract:In machine translation, the problem of ambiguously gendered input has been pointed out, where the gender of an entity is not available in the source sentence. To address this ambiguity issue, the task of controlled translation that takes the gender of the ambiguous entity as additional input have been proposed. However, most existing works have only considered a simplified setup of one target gender for input. In this paper, we tackle controlled translation in a more realistic setting of inputs with multiple entities and propose Gender-of-Entity (GoE) prompting method for LLMs. Our proposed method instructs the model with fine-grained entity-level gender information to translate with correct gender inflections. By utilizing four evaluation benchmarks, we investigate the controlled translation capability of LLMs in multiple dimensions and find that LLMs reach state-of-the-art performance in controlled translation. Furthermore, we discover an emergence of gender interference phenomenon when controlling the gender of multiple entities. Finally, we address the limitations of existing gender accuracy evaluation metrics and propose leveraging LLMs as an evaluator for gender inflection in machine translation.

* NAACL 2024 Main track long paper

Via

Access Paper or Ask Questions

VLind-Bench: Measuring Language Priors in Large Vision-Language Models

Jun 17, 2024

Kang-il Lee, Minbeom Kim, Minsung Kim, Dongryeol Lee, Hyukhun Koh, Kyomin Jung

Abstract:Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing the issue of language prior is crucial, as it can lead to undesirable biases or hallucinations when dealing with images that are out of training distribution. Despite its importance, current methods for accurately measuring language priors in LVLMs are poorly studied. Although existing benchmarks based on counterfactual or out-of-distribution images can partially be used to measure language priors, they fail to disentangle language priors from other confounding factors. To this end, we propose a new benchmark called VLind-Bench, which is the first benchmark specifically designed to measure the language priors, or blindness, of LVLMs. It not only includes tests on counterfactual images to assess language priors but also involves a series of tests to evaluate more basic capabilities such as commonsense knowledge, visual perception, and commonsense biases. For each instance in our benchmark, we ensure that all these basic tests are passed before evaluating the language priors, thereby minimizing the influence of other factors on the assessment. The evaluation and analysis of recent LVLMs in our benchmark reveal that almost all models exhibit a significant reliance on language priors, presenting a strong challenge in the field.

Via

Access Paper or Ask Questions

Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

May 23, 2023

Minwoo Lee, Hyukhun Koh, Kang-il Lee, Dongdong Zhang, Minsung Kim, Kyomin Jung

Figure 1 for Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

Figure 2 for Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

Figure 3 for Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

Figure 4 for Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

Abstract:Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques. However, most works focus on debiasing of bilingual models without consideration for multilingual systems. In this paper, we specifically target the unambiguous gender bias issue of multilingual machine translation models and propose a new mitigation method based on a novel perspective on the problem. We hypothesize that the gender bias in unambiguous settings is due to the lack of gender information encoded into the non-explicit gender words and devise a scheme to encode correct gender information into their latent embeddings. Specifically, we employ Gender-Aware Contrastive Learning, GACL, based on gender pseudo-labels to encode gender information on the encoder embeddings. Our method is target-language-agnostic and applicable to already trained multilingual machine translation models through post-fine-tuning. Through multilingual evaluation, we show that our approach improves gender accuracy by a wide margin without hampering translation performance. We also observe that incorporated gender information transfers and benefits other target languages regarding gender accuracy. Finally, we demonstrate that our method is applicable and beneficial to models of various sizes.

Via

Access Paper or Ask Questions

Bag of Tricks for Developing Diabetic Retinopathy Analysis Framework to Overcome Data Scarcity

Oct 18, 2022

Gitaek Kwon, Eunjin Kim, Sunho Kim, Seongwon Bak, Minsung Kim, Jaeyoung Kim

Figure 1 for Bag of Tricks for Developing Diabetic Retinopathy Analysis Framework to Overcome Data Scarcity

Figure 2 for Bag of Tricks for Developing Diabetic Retinopathy Analysis Framework to Overcome Data Scarcity

Figure 3 for Bag of Tricks for Developing Diabetic Retinopathy Analysis Framework to Overcome Data Scarcity

Figure 4 for Bag of Tricks for Developing Diabetic Retinopathy Analysis Framework to Overcome Data Scarcity

Abstract:Recently, diabetic retinopathy (DR) screening utilizing ultra-wide optical coherence tomography angiography (UW-OCTA) has been used in clinical practices to detect signs of early DR. However, developing a deep learning-based DR analysis system using UW-OCTA images is not trivial due to the difficulty of data collection and the absence of public datasets. By realistic constraints, a model trained on small datasets may obtain sub-par performance. Therefore, to help ophthalmologists be less confused about models' incorrect decisions, the models should be robust even in data scarcity settings. To address the above practical challenging, we present a comprehensive empirical study for DR analysis tasks, including lesion segmentation, image quality assessment, and DR grading. For each task, we introduce a robust training scheme by leveraging ensemble learning, data augmentation, and semi-supervised learning. Furthermore, we propose reliable pseudo labeling that excludes uncertain pseudo-labels based on the model's confidence scores to reduce the negative effect of noisy pseudo-labels. By exploiting the proposed approaches, we achieved 1st place in the Diabetic Retinopathy Analysis Challenge.

Via

Access Paper or Ask Questions

Predictive Modeling of Charge Levels for Battery Electric Vehicles using CNN EfficientNet and IGTD Algorithm

Jun 07, 2022

Seongwoo Choi, Chongzhou Fang, David Haddad, Minsung Kim

Figure 1 for Predictive Modeling of Charge Levels for Battery Electric Vehicles using CNN EfficientNet and IGTD Algorithm

Figure 2 for Predictive Modeling of Charge Levels for Battery Electric Vehicles using CNN EfficientNet and IGTD Algorithm

Figure 3 for Predictive Modeling of Charge Levels for Battery Electric Vehicles using CNN EfficientNet and IGTD Algorithm

Figure 4 for Predictive Modeling of Charge Levels for Battery Electric Vehicles using CNN EfficientNet and IGTD Algorithm

Abstract:Convolutional Neural Networks (CNN) have been a good solution for understanding a vast image dataset. As the increased number of battery-equipped electric vehicles is flourishing globally, there has been much research on understanding which charge levels electric vehicle drivers would choose to charge their vehicles to get to their destination without any prevention. We implemented deep learning approaches to analyze the tabular datasets to understand their state of charge and which charge levels they would choose. In addition, we implemented the Image Generator for Tabular Dataset algorithm to utilize tabular datasets as image datasets to train convolutional neural networks. Also, we integrated other CNN architecture such as EfficientNet to prove that CNN is a great learner for reading information from images that were converted from the tabular dataset, and able to predict charge levels for battery-equipped electric vehicles. We also evaluated several optimization methods to enhance the learning rate of the models and examined further analysis on improving the model architecture.

Via

Access Paper or Ask Questions