Abstract: We present Text-driven Object-Centric Style Transfer (TEXTOC), a novel method that guides style transfer at an object-centric level using textual inputs. The core of TEXTOC is our Patch-wise Co-Directional (PCD) loss, meticulously designed for precise object-centric transformations that are closely aligned with the input text. This loss combines a patch directional loss for text-guided style direction with a patch distribution consistency loss for an even CLIP embedding distribution across object regions, ensuring a seamless and harmonious style transfer over those regions. Key to our method are the Text-Matched Patch Selection (TMPS) and Pre-fixed Region Selection (PRS) modules, which identify object locations via text and eliminate the need for segmentation masks. Lastly, we introduce an Adaptive Background Preservation (ABP) loss to maintain the original style and structural essence of the image's background; this loss is applied to dynamically identified background areas. Extensive experiments underline the effectiveness of our approach in creating visually coherent and textually aligned style transfers.
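The patch directional component of the PCD loss follows the general form of a per-patch CLIP directional loss: the direction each stylized patch moves in CLIP space should match the direction from a source text to the target style text. The sketch below is a hypothetical illustration of that idea, not the paper's implementation; the function name, tensor shapes, and the use of precomputed embeddings are assumptions.

```python
# Hypothetical sketch of a patch-wise directional CLIP loss (illustrative only).
import torch
import torch.nn.functional as F

def patch_directional_loss(content_patch_emb, stylized_patch_emb,
                           src_text_emb, tgt_text_emb):
    """Align the CLIP-space direction of each stylized patch (relative to its
    content patch) with the direction from the source text to the target text.

    content_patch_emb, stylized_patch_emb: (N, D) CLIP embeddings of selected patches
    src_text_emb, tgt_text_emb:            (D,) CLIP embeddings of the text prompts
    """
    # Direction induced by the prompt pair, e.g. "a photo" -> "a watercolor painting".
    text_dir = F.normalize(tgt_text_emb - src_text_emb, dim=-1)            # (D,)
    # Direction each patch moved in CLIP space after stylization.
    img_dir = F.normalize(stylized_patch_emb - content_patch_emb, dim=-1)  # (N, D)
    # 1 - cosine similarity, averaged over the selected (object) patches.
    return (1.0 - img_dir @ text_dir).mean()

# Toy usage with random embeddings (D = 512, as in CLIP ViT-B/32).
N, D = 16, 512
loss = patch_directional_loss(torch.randn(N, D), torch.randn(N, D),
                              torch.randn(D), torch.randn(D))
```

In TEXTOC this direction matching would only be applied to patches selected as object regions (via TMPS/PRS), which is what makes the transformation object-centric rather than global.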
Abstract: Recently, many single-stage detectors using multi-scale features have been proposed. They are much faster than two-stage detectors that use region proposal networks (RPNs), with little degradation in detection performance. However, in a single-stage detector the feature maps in the lower layers close to the input, which are responsible for detecting small objects, suffer from insufficient representation power because they are too shallow. There is also a structural contradiction: these feature maps must deliver low-level information to subsequent layers while also providing high-level abstraction for prediction. In this paper, we propose a method to enrich the representation power of feature maps using Resblocks and deconvolution layers. In addition, a unified prediction module is applied to generalize output results and boost the earlier layers' representation power for prediction. The proposed method enables more precise prediction, achieving higher scores than SSD on PASCAL VOC and MS COCO, while retaining the fast computation of a single-stage detector and requiring much less computation than other detectors with similar performance. Code is available at https://github.com/kmlee-snu/run
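As a rough illustration of how a shallow feature map can be enriched with a residual block plus a deconvolution path from a deeper layer, consider the sketch below. It is not the authors' released implementation; the channel counts, layer choices, and fusion by summation are assumptions for illustration.

```python
# Hypothetical sketch: enrich a shallow SSD-style feature map with a residual
# block and an upsampled (deconvolved) deeper feature map.
import torch
import torch.nn as nn

class EnrichedFeature(nn.Module):
    def __init__(self, shallow_ch, deep_ch, out_ch):
        super().__init__()
        # Residual block on the shallow (high-resolution) feature map.
        self.res = nn.Sequential(
            nn.Conv2d(shallow_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(shallow_ch, out_ch, 1)   # identity-style shortcut
        # Deconvolution path: upsample the deeper, more abstract feature map.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(deep_ch, out_ch, kernel_size=2, stride=2),
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, shallow, deep):
        # Fuse residual-enhanced shallow features with upsampled deep features,
        # so the prediction layer sees both fine detail and high-level abstraction.
        return self.relu(self.res(shallow) + self.skip(shallow) + self.deconv(deep))

# Toy usage: a 38x38 shallow map (e.g. conv4_3) and a 19x19 deeper map (e.g. conv7).
block = EnrichedFeature(shallow_ch=512, deep_ch=1024, out_ch=256)
out = block(torch.randn(1, 512, 38, 38), torch.randn(1, 1024, 19, 19))
print(out.shape)  # torch.Size([1, 256, 38, 38])
```

Producing every enriched map with the same output channel count is also what makes a single, unified prediction module across scales straightforward to apply.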