Minseon Kim

Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings

Jul 09, 2025

Minseon Kim, Jean-Philippe Corbeil, Alessandro Sordoni, Francois Beaulieu, Paul Vozila

Abstract: As the performance of large language models (LLMs) continues to advance, their adoption is expanding across a wide range of domains, including the medical field. The integration of LLMs into medical applications raises critical safety concerns, particularly because they are used by people in diverse roles, e.g. patients and clinicians, and because model outputs can directly affect human health. Despite the domain-specific capabilities of medical LLMs, prior safety evaluations have largely focused only on general safety benchmarks. In this paper, we introduce a safety evaluation protocol tailored to the medical domain from both the patient and clinician perspectives, alongside general safety assessments, and quantitatively analyze the safety of medical LLMs. We bridge a gap in the literature by building PatientSafetyBench, which contains 466 samples over 5 critical categories, to measure safety from the perspective of the patient. We apply our red-teaming protocols to the MediPhi model collection as a case study. To our knowledge, this is the first work to define safety evaluation criteria for medical LLMs through targeted red-teaming from three different points of view (patient, clinician, and general user), establishing a foundation for safer deployment in medical domains.

Enhancing Variational Autoencoders with Smooth Robust Latent Encoding

Apr 24, 2025

Hyomin Lee, Minseon Kim, Sangwon Jang, Jongheon Jeong, Sung Ju Hwang

Abstract: Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models, as in Stable Diffusion, yet questions regarding their robustness remain largely underexplored. Although adversarial training has been an established technique for enhancing robustness in predictive models, it has been overlooked for generative models due to concerns about potential fidelity degradation by the nature of trade-offs between performance and robustness. In this work, we challenge this presumption, introducing Smooth Robust Latent VAE (SRL-VAE), a novel adversarial training framework that boosts both generation quality and robustness. In contrast to conventional adversarial training, which focuses on robustness only, our approach smooths the latent space via adversarial perturbations, promoting more generalizable representations while regularizing with originality representation to sustain original fidelity. Applied as a post-training step on pre-trained VAEs, SRL-VAE improves image robustness and fidelity with minimal computational overhead. Experiments show that SRL-VAE improves both generation quality, in image reconstruction and text-guided image editing, and robustness, against Nightshade attacks and image editing attacks. These results establish a new paradigm, showing that adversarial training, once thought to be detrimental to generative models, can instead enhance both fidelity and robustness.

* Under review

debug-gym: A Text-Based Environment for Interactive Debugging

Mar 27, 2025

Xingdi Yuan, Morgane M Moss, Charbel El Feghali, Chinmay Singh, Darya Moldavskaya, Drew MacPhee, Lucas Caccia, Matheus Pereira, Minseon Kim, Alessandro Sordoni (+1 more)

Abstract: Large Language Models (LLMs) are increasingly relied upon for coding tasks, yet in most scenarios it is assumed that all relevant information can be either accessed in context or matches their training data. We posit that LLMs can benefit from the ability to interactively explore a codebase to gather the information relevant to their task. To achieve this, we present a textual environment, namely debug-gym, for developing LLM-based agents in an interactive coding setting. Our environment is lightweight and provides a preset of useful tools, such as a Python debugger (pdb), designed to facilitate an LLM-based agent's interactive debugging. Beyond coding and debugging tasks, this approach can be generalized to other tasks that would benefit from information-seeking behavior by an LLM agent.

Optimizing Query Generation for Enhanced Document Retrieval in RAG

Jul 17, 2024

Hamin Koo, Minseon Kim, Sung Ju Hwang

Abstract: Large Language Models (LLMs) excel in various language tasks but they often generate incorrect information, a phenomenon known as "hallucinations". Retrieval-Augmented Generation (RAG) aims to mitigate this by using document retrieval for accurate responses. However, RAG still faces hallucinations due to vague queries. This study aims to improve RAG by optimizing query generation with a query-document alignment score, refining queries using LLMs for better precision and efficiency of document retrieval. Experiments have shown that our approach improves document retrieval, resulting in an average accuracy gain of 1.6%.
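
The idea of refining a vague query by scoring it against retrieved documents can be illustrated with a minimal sketch. This is not the authors' method: the Jaccard token overlap used as `alignment_score`, the helper names `retrieve` and `refine_query`, the toy corpus, and the hard-coded rewrite (which in practice would come from an LLM) are all assumptions made for illustration.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def alignment_score(query, document):
    """Jaccard token overlap; an assumed stand-in for an alignment score."""
    q, d = tokens(query), tokens(document)
    return len(q & d) / len(q | d) if q | d else 0.0

def retrieve(query, corpus, k=1):
    """Return the k documents that score highest against the query."""
    return sorted(corpus, key=lambda doc: alignment_score(query, doc), reverse=True)[:k]

def refine_query(query, candidates, corpus):
    """Keep whichever query (original or rewrite) aligns best with its top document."""
    def best_score(q):
        top = retrieve(q, corpus, k=1)
        return alignment_score(q, top[0]) if top else 0.0
    return max([query, *candidates], key=best_score)

# Toy data (hypothetical): one vague query and one candidate rewrite.
corpus = [
    "Aspirin reduces fever and relieves mild pain.",
    "The Eiffel Tower is located in Paris, France.",
]
vague = "What does it do?"
rewrites = ["What does aspirin do for fever and pain?"]  # would come from an LLM
best = refine_query(vague, rewrites, corpus)
print(best)
```

In this toy setting the vague query shares no tokens with either document, so the rewrite is kept and retrieves the aspirin document. A real system would replace the overlap score with a learned or embedding-based alignment measure.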

Automatic Jailbreaking of the Text-to-Image Generative AI Systems

May 28, 2024

Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang

Abstract: Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious content by circumventing the alignment in LLMs, which is often referred to as jailbreaking. However, most previous works have focused only on text-based jailbreaking of LLMs, and the jailbreaking of text-to-image (T2I) generation systems has been relatively overlooked. In this paper, we first evaluate the safety of commercial T2I generation systems, such as ChatGPT, Copilot, and Gemini, on copyright infringement with naive prompts. From this empirical study, we find that Copilot and Gemini block only 12% and 17% of the attacks with naive prompts, respectively, while ChatGPT blocks 84% of them. Then, we further propose a stronger automated jailbreaking pipeline for T2I generation systems, which produces prompts that bypass their safety guards. Our automated jailbreaking framework leverages an LLM optimizer to generate prompts that maximize the degree of violation in the generated images without any weight updates or gradient computation. Surprisingly, our simple yet effective approach successfully jailbreaks ChatGPT with an 11.0% block rate, making it generate copyrighted content 76% of the time. Finally, we explore various defense strategies, such as post-generation filtering and machine unlearning techniques, but find that they are inadequate, which suggests the necessity of stronger defense mechanisms.

* Under review

Protein Representation Learning by Capturing Protein Sequence-Structure-Function Relationship

Apr 29, 2024

Eunji Ko, Seul Lee, Minseon Kim, Dongki Kim

Abstract: The goal of protein representation learning is to extract knowledge from protein databases that can be applied to various protein-related downstream tasks. Although protein sequence, structure, and function are the three key modalities for a comprehensive understanding of proteins, existing methods for protein representation learning have utilized only one or two of these modalities due to the difficulty of capturing the asymmetric interrelationships between them. To account for this asymmetry, we introduce our novel asymmetric multi-modal masked autoencoder (AMMA). AMMA adopts (1) a unified multi-modal encoder to integrate all three modalities into a unified representation space and (2) asymmetric decoders to ensure that sequence latent features reflect structural and functional information. The experiments demonstrate that the proposed AMMA is highly effective in learning protein representations that exhibit well-aligned inter-modal relationships, which in turn makes it effective for various downstream protein-related tasks.

* ICLR 2024 MLGenX Workshop (Spotlight)

Context-dependent Instruction Tuning for Dialogue Response Generation

Nov 13, 2023

Jin Myung Kwak, Minseon Kim, Sung Ju Hwang

Abstract: Recent language models have achieved impressive performance in natural language tasks by incorporating instructions with task input during fine-tuning. Since all samples in the same natural language task can be explained with the same task instructions, many instruction datasets provide only a few instructions for the entire task, without considering the input of each example in the task. However, this approach becomes ineffective in complex multi-turn dialogue generation tasks, where the input varies greatly with each turn as the dialogue context changes, so that simple task instructions cannot improve the generation performance. To address this limitation, we introduce a context-based instruction fine-tuning framework for multi-turn dialogue that generates both responses and instructions based on the previous context as input. During evaluation, the model generates instructions based on the previous context to self-guide the response. In quantitative evaluations on dialogue benchmark datasets, the proposed framework produces results comparable or even superior to the baselines with a reduced computation budget, by aligning instructions to the input during fine-tuning.

* Work in Progress

Generalizable Lightweight Proxy for Robust NAS against Diverse Perturbations

Jun 08, 2023

Hyeonjeong Ha, Minseon Kim, Sung Ju Hwang

Abstract: Recent neural architecture search (NAS) frameworks have been successful in finding optimal architectures for given conditions (e.g., performance or latency). However, they search for optimal architectures in terms of their performance on clean images only, while robustness against various types of perturbations or corruptions is crucial in practice. Although several robust NAS frameworks tackle this issue by integrating adversarial training into one-shot NAS, they are limited in that they consider only robustness against adversarial attacks and require significant computational resources to discover optimal architectures for a single task, which makes them impractical in real-world scenarios. To address these challenges, we propose a novel lightweight robust zero-cost proxy that considers the consistency across features, parameters, and gradients of both clean and perturbed images at the initialization state. Our approach facilitates an efficient and rapid search for neural architectures capable of learning generalizable features that exhibit robustness across diverse perturbations. The experimental results demonstrate that our proxy can rapidly and efficiently search for neural architectures that are consistently robust against various perturbations on multiple benchmark datasets and diverse search spaces, largely outperforming existing clean zero-shot NAS and robust NAS with reduced search cost.
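
To make the notion of a consistency-based zero-cost proxy concrete, the sketch below scores randomly initialised candidates by how similar their features stay when the input is perturbed, with no training involved. It is a heavily simplified illustration and not the paper's proxy: it uses a single ReLU layer, Gaussian input noise, and feature consistency only (the abstract also mentions parameters and gradients). The names `init_layer`, `consistency_proxy`, and the "narrow"/"wide" candidates are hypothetical.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Randomly initialised weight matrix; no training is performed."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]

def features(weights, x):
    """ReLU features of a single linear layer."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / (na * nb) if na and nb else 0.0

def consistency_proxy(weights, inputs, noise_std, rng):
    """Mean cosine similarity between clean and noise-perturbed features."""
    total = 0.0
    for x in inputs:
        x_pert = [v + rng.gauss(0.0, noise_std) for v in x]
        total += cosine(features(weights, x), features(weights, x_pert))
    return total / len(inputs)

rng = random.Random(0)
inputs = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(8)]
# Two hypothetical candidate "architectures" that differ only in width.
candidates = {"narrow": init_layer(16, 4, rng), "wide": init_layer(16, 64, rng)}
scores = {name: consistency_proxy(w, inputs, 0.1, rng) for name, w in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Candidates are ranked by the score alone, which is what makes such a proxy cheap: each evaluation needs only a few forward passes at initialisation rather than a training run.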

Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets

May 26, 2023

Hayeon Lee, Sohyun An, Minseon Kim, Sung Ju Hwang

Abstract: Distillation-aware Neural Architecture Search (DaNAS) aims to search for an optimal student architecture that obtains the best performance and/or efficiency when distilling the knowledge from a given teacher model. Previous DaNAS methods have mostly tackled the search for the neural architecture for fixed datasets and a fixed teacher; they do not generalize well to a new task consisting of an unseen dataset and an unseen teacher, and thus need to perform a costly search for any new combination of datasets and teachers. For standard NAS tasks without knowledge distillation (KD), meta-learning-based computationally efficient NAS methods have been proposed, which learn the generalized search process over multiple tasks (datasets) and transfer the knowledge obtained over those tasks to a new task. However, since they assume learning from scratch without KD from a teacher, they might not be ideal for DaNAS scenarios. To eliminate the excessive computational cost of DaNAS methods and the sub-optimality of rapid NAS methods, we propose a distillation-aware meta accuracy prediction model, DaSS (Distillation-aware Student Search), which can predict a given architecture's final performance on a dataset when performing KD with a given teacher, without actually having to train it on the target task. The experimental results demonstrate that our proposed meta-prediction model successfully generalizes to multiple unseen datasets for DaNAS tasks, largely outperforming existing meta-NAS methods and rapid NAS baselines. Code is available at https://github.com/CownowAn/DaSS

* ICLR 2023 (Notable-top-25%)

Language Detoxification with Attribute-Discriminative Latent Space

Oct 19, 2022

Jin Myung Kwak, Minseon Kim, Sung Ju Hwang

Abstract: Transformer-based Language Models (LMs) achieve remarkable performance on a variety of NLU tasks, but are also prone to generating toxic texts such as insults, threats, and profanities, which limit their adaptation to real-world applications. To overcome this issue, a few text generation approaches aim to detoxify toxic texts with additional LMs or perturbations. However, previous methods require excessive memory, computation, and time, which are serious bottlenecks in their real-world application. To address such limitations, we propose an effective yet efficient method for language detoxification using an attribute-discriminative latent space. Specifically, we project the latent space of an original Transformer LM to a discriminative latent space on which the texts are well-separated by their attributes, with the help of a projection block and a discriminator. This allows the LM to control the text generation to be non-toxic with minimal memory and computation overhead. We validate our model, the Attribute-Discriminative Language Model (ADLM), on detoxified language and dialogue generation tasks, on which our method significantly outperforms baselines in both performance and efficiency.
