Serin Yang

ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

Oct 08, 2024

Serin Yang, Taesung Kwon, Jong Chul Ye

Abstract: Recent progress in large-scale text-to-video (T2V) and image-to-video (I2V) diffusion models has greatly enhanced video generation, especially in terms of keyframe interpolation. However, current image-to-video diffusion models, while powerful in generating videos from a single conditioning frame, need adaptation for two-frame (start & end) conditioned generation, which is essential for effective bounded interpolation. Unfortunately, existing approaches that fuse temporally forward and backward paths in parallel often suffer from off-manifold issues, leading to artifacts or requiring multiple iterative re-noising steps. In this work, we introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning. Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames. Additionally, we incorporate advanced guidance techniques, CFG++ and DDS, to further enhance the interpolation process. By integrating these, our method achieves state-of-the-art performance, efficiently generating high-quality, smooth videos between keyframes. On a single 3090 GPU, our method can interpolate 25 frames at 1024 x 576 resolution in just 195 seconds, establishing it as a leading solution for keyframe interpolation.
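
The control flow of sequential (rather than parallel) forward and backward sampling can be illustrated with a toy sketch. This is not the authors' implementation: frames are plain floats, `toy_denoise_step` is a hypothetical stand-in for one conditioned diffusion step, and the names `bidirectional_sample`, `strength`, and `steps` are assumptions made for illustration only. CFG++ and DDS guidance are not modeled.

```python
from typing import List

Video = List[float]  # toy stand-in: one float per latent frame


def toy_denoise_step(video: Video, cond: float, strength: float) -> Video:
    """Hypothetical stand-in for one denoising step conditioned on frame 0.

    Each frame is pulled toward `cond`; the pull is strongest at index 0
    and fades linearly to zero at the last index.
    """
    n = len(video)
    out = []
    for i, x in enumerate(video):
        w = strength * (1.0 - i / (n - 1))
        out.append((1.0 - w) * x + w * cond)
    return out


def bidirectional_sample(video: Video, start: float, end: float,
                         steps: int, strength: float = 0.5) -> Video:
    """Alternate a start-conditioned pass with an end-conditioned pass."""
    for _ in range(steps):
        # Forward path: condition on the start frame.
        video = toy_denoise_step(video, start, strength)
        # Backward path: flip time so the end frame sits at index 0,
        # condition on it, then flip back.
        flipped = toy_denoise_step(video[::-1], end, strength)
        video = flipped[::-1]
    return video


if __name__ == "__main__":
    init = [0.3, -0.2, 0.5, 0.1, -0.4]  # arbitrary "noisy" starting latents
    print(bidirectional_sample(init, start=0.0, end=1.0, steps=10))
```

Because the two passes run one after the other on the same sample, each pass starts from the other's output instead of averaging two independently generated paths, which is the distinction the abstract draws against parallel fusion.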

* Project page: https://vibid.github.io/

Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer

Mar 15, 2023

Serin Yang, Hyunmin Hwang, Jong Chul Ye

Abstract: Diffusion models have shown great promise in text-guided image style transfer, but there is a trade-off between style transformation and content preservation due to their stochastic nature. Existing methods require computationally expensive fine-tuning of diffusion models or additional neural networks. To address this, we propose a zero-shot contrastive loss for diffusion models that does not require additional fine-tuning or auxiliary networks. By leveraging a patch-wise contrastive loss between generated samples and original image embeddings in the pre-trained diffusion model, our method can generate images with the same semantic content as the source image in a zero-shot manner. Our approach outperforms existing methods while preserving content and requiring no additional training, not only for image style transfer but also for image-to-image translation and manipulation. Our experimental results validate the effectiveness of our proposed method.
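
A patch-wise contrastive loss of the kind the abstract mentions is commonly written as an InfoNCE objective over patch locations. The sketch below is a minimal pure-Python version under that assumption: the function name `patch_nce_loss`, the temperature `tau`, and the use of cosine similarity are illustrative choices, not details confirmed by the abstract, and the feature vectors would in practice come from the pre-trained diffusion model rather than hand-written lists.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def patch_nce_loss(gen_feats: List[Sequence[float]],
                   src_feats: List[Sequence[float]],
                   tau: float = 0.07) -> float:
    """InfoNCE over patches.

    For generated patch i, the positive is source patch i (same location);
    every other source patch acts as a negative.
    """
    total = 0.0
    for i, q in enumerate(gen_feats):
        logits = [cosine(q, k) / tau for k in src_feats]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # cross-entropy with target index i
    return total / len(gen_feats)


if __name__ == "__main__":
    src = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    aligned = patch_nce_loss(src, src)
    shifted = patch_nce_loss(src[1:] + src[:1], src)
    print(aligned, shifted)
```

The loss is small when each generated patch is most similar to the source patch at the same location, and grows when spatial correspondence is lost, which is how such a term can encourage content preservation.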

Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion

Mar 15, 2023

Inhwa Han, Serin Yang, Taesung Kwon, Jong Chul Ye

Abstract: Diffusion models have shown superior performance in image generation and manipulation, but the inherent stochasticity presents challenges in preserving and manipulating image content and identity. While previous approaches like DreamBooth and Textual Inversion have proposed model or latent representation personalization to maintain the content, their reliance on multiple reference images and complex training limits their practicality. In this paper, we present a simple yet highly effective approach to personalization using highly personalized (HiPer) text embedding by decomposing the CLIP embedding space for personalization and content manipulation. Our method does not require model fine-tuning or identifiers, yet still enables manipulation of background, texture, and motion with just a single image and target text. Through experiments on diverse target texts, we demonstrate that our approach produces highly personalized and complex semantic image edits across a wide range of tasks. We believe that the novel understanding of the text embedding space presented in this work has the potential to inspire further research across various tasks.

Continuous Conversion of CT Kernel using Switchable CycleGAN with AdaIN

Nov 26, 2020

Serin Yang, Eung Yeop Kim, Jong Chul Ye

Abstract: In X-ray computed tomography (CT) reconstruction, different filter kernels are used depending on which structures are to be emphasized. Since the raw sinogram data is usually removed after reconstruction, the patient may need to be scanned again if reconstructed images with other kernels that were not previously generated are later required. Accordingly, there is increasing demand for post-hoc image-domain conversion from one kernel to another without sacrificing the image content. In this paper, we propose a novel unsupervised kernel conversion method using a cycle-consistent generative adversarial network (cycleGAN) with adaptive instance normalization (AdaIN). In contrast to the existing deep learning approaches for kernel conversion, our method does not require a paired dataset for training. In addition, our network can not only translate the images between two different kernels but also generate images on every interpolating path along an optimal transport between the two kernel image domains, enabling a synergistic combination of the two filter kernels. Experimental results confirm the advantages of the proposed algorithm.
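
The standard AdaIN operation renormalizes a feature map to a target mean and standard deviation: AdaIN(x) = sigma_t * (x - mu(x)) / sigma(x) + mu_t. The sketch below shows that formula together with one plausible way to obtain intermediate outputs, assuming the interpolation is done by linearly blending the (mean, std) codes of the two kernel domains with a weight `beta` in [0, 1]. The helper names `adain` and `interpolate_code`, the example codes, and the blending rule are assumptions for illustration, not details taken from the abstract.

```python
import math
from typing import List, Tuple

Code = Tuple[float, float]  # (target mean, target std) for one channel


def adain(x: List[float], code: Code, eps: float = 1e-5) -> List[float]:
    """Adaptive instance normalization of one channel to the target code."""
    mean_t, std_t = code
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    sigma = math.sqrt(var + eps)
    return [std_t * (v - mu) / sigma + mean_t for v in x]


def interpolate_code(code_a: Code, code_b: Code, beta: float) -> Code:
    """Linear blend of two AdaIN codes: beta=0 gives code_a, beta=1 gives code_b."""
    return ((1.0 - beta) * code_a[0] + beta * code_b[0],
            (1.0 - beta) * code_a[1] + beta * code_b[1])


if __name__ == "__main__":
    features = [1.0, 2.0, 3.0, 4.0]   # toy single-channel feature values
    soft_kernel = (0.0, 1.0)          # hypothetical code for kernel A
    sharp_kernel = (10.0, 2.0)        # hypothetical code for kernel B
    for beta in (0.0, 0.5, 1.0):
        code = interpolate_code(soft_kernel, sharp_kernel, beta)
        print(beta, adain(features, code))
```

Sweeping `beta` from 0 to 1 moves the output statistics continuously from one code to the other, which gives a rough picture of how a single generator could produce a continuum of kernel styles.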
