Kwanyoung Kim

Single-Step Bidirectional Unpaired Image Translation Using Implicit Bridge Consistency Distillation

Mar 19, 2025

Suhyeon Lee, Kwanyoung Kim, Jong Chul Ye

Abstract:Unpaired image-to-image translation has seen significant progress since the introduction of CycleGAN. However, methods based on diffusion models or Schrödinger bridges have yet to be widely adopted in real-world applications due to their iterative sampling nature. To address this challenge, we propose a novel framework, Implicit Bridge Consistency Distillation (IBCD), which enables single-step bidirectional unpaired translation without using adversarial loss. IBCD extends consistency distillation by using a diffusion implicit bridge model that connects PF-ODE trajectories between distributions. Additionally, we introduce two key improvements: 1) distribution matching for consistency distillation and 2) adaptive weighting method based on distillation difficulty. Experimental results demonstrate that IBCD achieves state-of-the-art performance on benchmark datasets in a single generation step. Project page available at https://hyn2028.github.io/project_page/IBCD/index.html

* 25 pages, 16 figures

PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity

Mar 10, 2025

Kwanyoung Kim, Byeongsu Sim

Abstract:Diffusion models have shown impressive results in generating high-quality conditional samples using guidance techniques such as Classifier-Free Guidance (CFG). However, existing methods often require additional training or neural function evaluations (NFEs), making them incompatible with guidance-distilled models. Also, they rely on heuristic approaches that need identifying target layers. In this work, we propose a novel and efficient method, termed PLADIS, which boosts pre-trained models (U-Net/Transformer) by leveraging sparse attention. Specifically, we extrapolate query-key correlations using softmax and its sparse counterpart in the cross-attention layer during inference, without requiring extra training or NFEs. By leveraging the noise robustness of sparse attention, our PLADIS unleashes the latent potential of text-to-image diffusion models, enabling them to excel in areas where they once struggled with newfound effectiveness. It integrates seamlessly with guidance techniques, including guidance-distilled models. Extensive experiments show notable improvements in text alignment and human preference, offering a highly efficient and universally applicable solution.

* 29 pages, 19 figures

OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation

Mar 21, 2024

Kwanyoung Kim, Yujin Oh, Jong Chul Ye

Abstract:The recent success of CLIP has demonstrated promising results in zero-shot semantic segmentation by transferring multimodal knowledge to pixel-level classification. However, leveraging pre-trained CLIP knowledge to closely align text embeddings with pixel embeddings still has limitations in existing approaches. To address this issue, we propose OTSeg, a novel multimodal attention mechanism aimed at enhancing the potential of multiple text prompts for matching associated pixel embeddings. We first propose Multi-Prompts Sinkhorn (MPS) based on the Optimal Transport (OT) algorithm, which leads multiple text prompts to selectively focus on various semantic features within image pixels. Moreover, inspired by the success of Sinkformers in unimodal settings, we introduce the extension of MPS, called Multi-Prompts Sinkhorn Attention (MPSA), which effectively replaces cross-attention mechanisms within the Transformer framework in multimodal settings. Through extensive experiments, we demonstrate that OTSeg achieves state-of-the-art (SOTA) performance with significant gains on Zero-Shot Semantic Segmentation (ZS3) tasks across three benchmark datasets.
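
For readers unfamiliar with the Sinkhorn algorithm that the abstract builds on, the following is a minimal sketch of generic entropy-regularized optimal transport between a few text prompts and a few pixels. It is not the authors' MPS or MPSA implementation: the function name `sinkhorn`, the toy cost matrix, the uniform marginals, and the regularization strength `eps` are all assumptions chosen for illustration.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Generic Sinkhorn iterations (illustrative; not the OTSeg code).

    cost: n x m list of transport costs (here: prompts x pixels, hypothetical).
    a, b: target row and column marginals, each summing to 1.
    Returns an n x m transport plan whose marginals approximate a and b.
    """
    n, m = len(cost), len(cost[0])
    # Gibbs kernel of the entropy-regularized problem.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example: 2 prompts, 3 pixels, with made-up dissimilarity costs.
cost = [[0.1, 0.9, 0.5],
        [0.8, 0.2, 0.4]]
a = [0.5, 0.5]                 # each prompt carries equal mass
b = [1.0 / 3.0] * 3            # each pixel receives equal mass
plan = sinkhorn(cost, a, b)
```

Each row of `plan` can be read as how strongly one prompt is matched to each pixel; the marginal constraints discourage every prompt from collapsing onto the same pixels, which is consistent with the abstract's description of prompts focusing on different semantic features.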

* 22 pages, 7 figures

UNICORN: Ultrasound Nakagami Imaging via Score Matching and Adaptation

Mar 10, 2024

Kwanyoung Kim, Jaa-Yeon Lee, Jong Chul Ye

Abstract:Nakagami imaging holds promise for visualizing and quantifying tissue scattering in ultrasound waves, with potential applications in tumor diagnosis and fat fraction estimation which are challenging to discern by conventional ultrasound B-mode images. Existing methods struggle with optimal window size selection and suffer from estimator instability, leading to degraded resolution images. To address this, here we propose a novel method called UNICORN (Ultrasound Nakagami Imaging via Score Matching and Adaptation), that offers an accurate, closed-form estimator for Nakagami parameter estimation in terms of the score function of ultrasonic envelope. Extensive experiments using simulation and real ultrasound RF data demonstrate UNICORN's superiority over conventional approaches in accuracy and resolution quality.
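
As background for the estimator instability mentioned above, here is a minimal sketch of the classical moment-based Nakagami estimator, which is a conventional baseline and not the score-based UNICORN estimator. The function name `nakagami_moment_estimate` and the toy envelope samples are assumptions for illustration; in windowed Nakagami imaging this kind of estimate would be computed over a local window of envelope samples.

```python
def nakagami_moment_estimate(envelope):
    """Classical moment-based Nakagami estimate (illustrative baseline).

    envelope: list of non-negative envelope amplitudes R from one window.
    Returns (m, omega) with omega = E[R^2] and m = E[R^2]^2 / Var(R^2).
    """
    n = len(envelope)
    power = [r * r for r in envelope]
    omega = sum(power) / n
    # Population variance of the squared envelope.
    var = sum((p - omega) ** 2 for p in power) / n
    if var == 0.0:
        raise ValueError("constant envelope: shape parameter is undefined")
    return omega ** 2 / var, omega


# Toy window of three samples whose squared values are 1, 2, and 3.
m, omega = nakagami_moment_estimate([1.0, 2.0 ** 0.5, 3.0 ** 0.5])
```

Because `m` divides by a sample variance, small windows make the estimate noisy while large windows blur spatial detail, which is the window-size trade-off the abstract refers to.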

* 12 pages, 5 figures

RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization

Nov 27, 2023

Kwanyoung Kim, Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Jin Sung Kim, Yong Bae Kim, Jong Chul Ye

Abstract:Recent advancements in Artificial Intelligence (AI) have profoundly influenced medical fields, by providing tools to reduce clinical workloads. However, most AI models are constrained to execute uni-modal tasks, in stark contrast to the comprehensive approaches utilized by medical professionals. To address this, here we present RO-LLaMA, a versatile generalist large language model (LLM) tailored for the field of radiation oncology. This model seamlessly covers a wide range of the workflow of radiation oncologists, adept at various tasks such as clinical report summarization, radiation therapy plan suggestion, and plan-guided therapy target volume segmentation. In particular, to maximize the end-to-end performance, we further present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts LLM's robustness to additional errors at the intermediates while preserving the capability of handling clean inputs, and creatively transform this concept into LLM-driven segmentation framework as Consistency Embedding Segmentation (CESEG). Experimental results on multi-centre cohort sets demonstrate our proposed RO-LLaMA's promising performance for diverse tasks with generalization capabilities.

Unpaired Image-to-Image Translation via Neural Schrödinger Bridge

May 24, 2023

Beomsu Kim, Gihyun Kwon, Kwanyoung Kim, Jong Chul Ye

Abstract:Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. Although diffusion models have achieved remarkable progress in recent years, they have limitations in the unpaired image-to-image translation tasks due to the Gaussian prior assumption. Schrödinger Bridge (SB), which learns an SDE to translate between two arbitrary distributions, has risen as an attractive solution to this problem. However, none of the SB models so far have been successful at unpaired translation between high-resolution images. In this work, we propose the Unpaired Neural Schrödinger Bridge (UNSB), which combines SB with adversarial training and regularization to learn an SB between unpaired data. We demonstrate that UNSB is scalable, and that it successfully solves various unpaired image-to-image translation tasks. Code: https://github.com/cyclomon/UNSB

ZegOT: Zero-shot Segmentation Through Optimal Transport of Text Prompts

Jan 28, 2023

Kwanyoung Kim, Yujin Oh, Jong Chul Ye

Abstract:Recent success of large-scale Contrastive Language-Image Pre-training (CLIP) has led to great promise in zero-shot semantic segmentation by transferring image-text aligned knowledge to pixel-level classification. However, existing methods usually require an additional image encoder or retraining/tuning the CLIP module. Here, we present a cost-effective strategy using text-prompt learning that keeps the entire CLIP module frozen while fully leveraging its rich information. Specifically, we propose a novel Zero-shot segmentation with Optimal Transport (ZegOT) method that matches multiple text prompts with frozen image embeddings through optimal transport, which allows each text prompt to efficiently focus on specific semantic attributes. Additionally, we propose Deep Local Feature Alignment (DLFA) that deeply aligns the text prompts with intermediate local features of the frozen image encoder layers, which significantly boosts the zero-shot segmentation performance. Through extensive experiments on benchmark datasets, we show that our method achieves the state-of-the-art (SOTA) performance with only x7 lighter parameters compared to previous SOTA approaches.

* 16 pages, 9 figures

Noise Distribution Adaptive Self-Supervised Image Denoising using Tweedie Distribution and Score Matching

Dec 05, 2021

Kwanyoung Kim, Taesung Kwon, Jong Chul Ye

Abstract:Tweedie distributions are a special case of exponential dispersion models, which are often used in classical statistics as distributions for generalized linear models. Here, we reveal that Tweedie distributions also play key roles in the modern deep learning era, leading to a distribution-independent self-supervised image denoising formula without clean reference images. Specifically, by combining the recent Noise2Score self-supervised image denoising approach with the saddle point approximation of the Tweedie distribution, we can provide a general closed-form denoising formula that can be used for large classes of noise distributions without ever knowing the underlying noise distribution. Similar to the original Noise2Score, the new approach is composed of two successive steps: score matching using perturbed noisy images, followed by a closed-form image denoising formula via distribution-independent Tweedie's formula. This also suggests a systematic algorithm to estimate the noise model and noise parameters for a given noisy image data set. Through extensive experiments, we demonstrate that the proposed method can accurately estimate noise models and parameters, and provide the state-of-the-art self-supervised image denoising performance in the benchmark dataset and real-world dataset.

Noise2Score: Tweedie's Approach to Self-Supervised Image Denoising without Clean Images

Jun 13, 2021

Kwanyoung Kim, Jong Chul Ye

Abstract:Recently, there has been extensive research interest in training deep networks to denoise images without clean reference. However, the representative approaches such as Noise2Noise, Noise2Void, Stein's unbiased risk estimator (SURE), etc. seem to differ from one another and it is difficult to find the coherent mathematical structure. To address this, here we present a novel approach, called Noise2Score, which reveals a missing link in order to unite these seemingly different approaches. Specifically, we show that image denoising problems without clean images can be addressed by finding the mode of the posterior distribution and that Tweedie's formula offers an explicit solution through the score function (i.e. the gradient of log likelihood). Our method then uses the recent finding that the score function can be stably estimated from the noisy images using the amortized residual denoising autoencoder, the method of which is closely related to Noise2Noise or Noise2Void. Our Noise2Score approach is so universal that the same network training can be used to remove noises from images that are corrupted by any exponential family distributions and noise parameters. Using extensive experiments with Gaussian, Poisson, and Gamma noises, we show that Noise2Score significantly outperforms the state-of-the-art self-supervised denoising methods in the benchmark data set such as (C)BSD68, Set12, and Kodak, etc.
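
To make the role of the score function concrete, the sketch below applies the standard form of Tweedie's formula for additive Gaussian noise, x_hat = y + sigma^2 * score(y), where score(y) is the gradient of the log density of the noisy observation. This is not the Noise2Score code: the function names are hypothetical, and a toy analytic score (zero-mean Gaussian prior with standard deviation `tau`) stands in for the learned score network so that the result can be checked against the known closed-form answer.

```python
def tweedie_denoise_gaussian(y, sigma, score_fn):
    """Tweedie's formula for additive Gaussian noise with std sigma.

    score_fn(y) must return d/dy log p(y) for the noisy observation y.
    In practice this would be a trained network; here it is any callable.
    """
    return y + sigma ** 2 * score_fn(y)


# Toy setting (assumed for illustration): x ~ N(0, tau^2), y = x + N(0, sigma^2).
tau, sigma = 2.0, 1.0


def toy_score(y):
    # The marginal of y is N(0, tau^2 + sigma^2), so its score is linear in y.
    return -y / (tau ** 2 + sigma ** 2)


y = 3.0
x_hat = tweedie_denoise_gaussian(y, sigma, toy_score)

# Closed-form estimate for this Gaussian toy model, used as a sanity check.
expected = y * tau ** 2 / (tau ** 2 + sigma ** 2)
```

In this toy model `x_hat` agrees with the closed-form shrinkage estimate `expected`, which shows why an accurate score estimate alone is enough to denoise without clean images. Other exponential-family noise models mentioned in the abstract lead to different closed-form expressions that are not shown here.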

Task-Aware Variational Adversarial Active Learning

Feb 11, 2020

Kwanyoung Kim, Dongwon Park, Kwang In Kim, Se Young Chun

Abstract:Deep learning has achieved remarkable performance in various tasks thanks to massive labeled datasets. However, there are often cases where labeling a large amount of data is challenging or infeasible due to high labeling cost, such as labeling by experts or long labeling time per large-scale data sample (e.g., video, very large image). Active learning is one of the ways to query the most informative samples to be annotated from a massive unlabeled pool. Two promising directions for active learning that have been recently explored are the data distribution-based approach, which selects data points that are far from the current labeled pool, and the model uncertainty-based approach, which relies on the perspective of the task model. Unfortunately, the former does not exploit structures from tasks and the latter does not seem to utilize the overall data distribution well. Here, we propose methods that simultaneously take advantage of both data distribution and model uncertainty approaches. Our proposed methods exploit variational adversarial active learning (VAAL), which considers the data distribution of both labeled and unlabeled pools, by incorporating a learning loss prediction module and the RankCGAN concept into VAAL, modeling loss prediction as a ranker. We demonstrate that our proposed methods outperform recent state-of-the-art active learning methods on various balanced and imbalanced benchmark datasets.

* 10 pages, 7 figures, 1 table
