Minho Park

Concept-Aware LoRA for Domain-Aligned Segmentation Dataset Generation

Mar 28, 2025

Minho Park, Sunghyun Park, Jungsoo Lee, Hyojin Park, Kyuwoong Hwang, Fatih Porikli, Jaegul Choo, Sungha Choi

Abstract: This paper addresses the challenge of data scarcity in semantic segmentation by generating datasets with text-to-image (T2I) generation models, reducing image acquisition and labeling costs. Segmentation dataset generation faces two key challenges: 1) aligning generated samples with the target domain and 2) producing informative samples beyond the training data. Fine-tuning T2I models can help generate samples aligned with the target domain. However, such fine-tuning often overfits to and memorizes the training data, limiting the model's ability to generate diverse and well-aligned samples. To overcome these issues, we propose Concept-Aware LoRA (CA-LoRA), a novel fine-tuning approach that selectively identifies and updates only the weights associated with the concepts needed for domain alignment (e.g., style or viewpoint) while preserving the pretrained knowledge of the T2I model to produce informative samples. We demonstrate its effectiveness in generating datasets for urban-scene segmentation, outperforming baseline and state-of-the-art methods in in-domain (few-shot and fully supervised) settings as well as in domain generalization tasks, especially under challenging conditions such as adverse weather and varying illumination.
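As background on the LoRA mechanism named above, here is a minimal sketch of a standard LoRA weight update, W' = W + (alpha / r) B A, applied only to a chosen subset of layers. It is not CA-LoRA itself: the abstract does not say how concept-relevant weights are identified, so the `selected` set is a hypothetical placeholder for that step, and the layer names, `alpha`, and plain-list matrices are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(w: Matrix, down: Matrix, up: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * (up @ down), the standard LoRA update.

    w:    frozen pretrained weight, shape (d_out, d_in)
    down: low-rank factor A, shape (r, d_in)
    up:   low-rank factor B, shape (d_out, r)
    """
    r = len(down)
    delta = matmul(up, down)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def selective_lora(weights: Dict[str, Matrix],
                   adapters: Dict[str, Tuple[Matrix, Matrix]],
                   selected: Set[str],
                   alpha: float) -> Dict[str, Matrix]:
    """Apply LoRA only to layers in `selected`; every other layer keeps its pretrained weight.

    `selected` is a hypothetical stand-in for a concept-identification step.
    Each adapter is a (down, up) pair, i.e. (A, B).
    """
    out = {}
    for name, w in weights.items():
        if name in selected:
            down, up = adapters[name]
            out[name] = lora_merge(w, down, up, alpha)
        else:
            out[name] = w  # untouched: pretrained knowledge preserved
    return out
```

Leaving unselected layers at their pretrained values is what allows the rest of the model's knowledge to pass through unchanged, which is the property the abstract relies on for producing informative samples.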

Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling

Nov 21, 2024

Daehoon Gwak, Junwoo Park, Minho Park, Chaehun Park, Hyunchan Lee, Edward Choi, Jaegul Choo

Abstract: Predicting future international events from textual information, such as news articles, has tremendous potential for applications in global policy, strategic decision-making, and geopolitics. However, existing datasets for this task are often limited in quality, hindering progress in related research. In this paper, we introduce WORLDREP (WORLD Relationship and Event Prediction), a novel dataset designed to address these limitations by leveraging the advanced reasoning capabilities of large language models (LLMs). Our dataset features high-quality scoring labels generated through advanced prompt modeling and rigorously validated by domain experts in political science. We showcase the quality and utility of WORLDREP for real-world event prediction tasks, demonstrating its effectiveness through extensive experiments and analysis. Furthermore, we publicly release our dataset along with the full automation source code for data collection, labeling, and benchmarking, aiming to support and advance research in text-based event prediction.

* EMNLP 2024 Findings

Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

Jun 08, 2024

Minho Park, Sunghyun Park, Jooyeol Yun, Jaegul Choo

Abstract: Recent advancements in text-to-image generation have inspired researchers to use generative models to create datasets tailored for perception models, which is particularly valuable in scenarios where real-world data is limited. In this study, our goal is to address the challenges that arise when fine-tuning vision-language models (e.g., CLIP) on generated datasets. Specifically, we aim to fine-tune vision-language models for a specific classification task without access to any real images, a setting known as name-only transfer. However, despite the high fidelity of generated images, we observed a significant performance degradation when fine-tuning the model on generated datasets, due to the domain gap between real and generated images. To overcome this domain gap, we propose two regularization methods, one applied after training and one during training. First, we leverage the domain-agnostic knowledge of the original pre-trained vision-language model by performing a post-training weight-space ensemble of the model fine-tuned on the generated dataset with the original pre-trained model. Second, we show that fine-tuned models with high feature diversity achieve high performance in the real domain, which suggests that increasing feature diversity keeps the model from learning knowledge specific to the generated domain. We therefore encourage feature diversity with an additional regularization term at training time. Extensive experiments on various classification datasets and text-to-image generation models demonstrate that our analysis and regularization techniques effectively mitigate this long-overlooked domain gap and enable state-of-the-art performance when training with generated images. Code is available at https://github.com/pmh9960/regft-for-gen

* Preprint. Under review
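The weight-space ensemble described in this abstract can be illustrated with a minimal sketch of linear interpolation between two parameter sets of the same architecture, theta = (1 - alpha) * theta_0 + alpha * theta_ft. The single global mixing coefficient `alpha`, the dictionary-of-lists parameter format, and the function name are assumptions for illustration; the abstract does not state how the paper weights the two models.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def weight_space_ensemble(pretrained: Params, finetuned: Params, alpha: float) -> Params:
    """Interpolate parameters: theta = (1 - alpha) * theta_0 + alpha * theta_ft.

    alpha = 0 returns the pretrained model and alpha = 1 the fine-tuned model.
    Both models must share the same architecture (same names and shapes).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if pretrained.keys() != finetuned.keys():
        raise ValueError("models must have identical parameter names")
    for name in pretrained:
        if len(pretrained[name]) != len(finetuned[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
    return {
        name: [(1.0 - alpha) * p0 + alpha * p1
               for p0, p1 in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }
```

Because the ensemble happens in parameter space, the result is a single model with the original inference cost, unlike an output-space ensemble that runs both networks.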

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Dec 04, 2023

Jeongho Kim, Gyojung Gu, Minho Park, Sunghyun Park, Jaegul Choo

Abstract: Given a clothing image and a person image, image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image. In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be used on its own for the virtual try-on task. The main challenge is to preserve the clothing details while effectively exploiting the robust generative capability of the pre-trained model. To tackle this, we propose StableVITON, which learns the semantic correspondence between the clothing and the human body within the latent space of the pre-trained diffusion model in an end-to-end manner. Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent knowledge of the pre-trained model in the warping process. With the proposed attention total variation loss and augmentation, we obtain sharp attention maps, resulting in a more precise representation of clothing details. StableVITON outperforms the baselines in both qualitative and quantitative evaluations and shows promising quality on arbitrary person images. Our code is available at https://github.com/rlawjdghek/StableVITON.

* 17 pages
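The total variation idea behind the attention loss mentioned in this abstract can be shown with a minimal sketch of an anisotropic total variation penalty on a 2D map: the mean absolute difference between neighboring entries, which is small for smooth maps and large for noisy ones. The abstract does not specify exactly which quantity the paper penalizes, so applying plain TV directly to a 2D map, and the name `total_variation`, are assumptions for illustration only.

```python
from typing import List


def total_variation(attn: List[List[float]]) -> float:
    """Anisotropic total variation of a 2D map.

    Averages |difference| over all vertically and horizontally adjacent pairs,
    so spatially smooth maps score low and scattered maps score high.
    """
    h, w = len(attn), len(attn[0])
    diffs = []
    for i in range(h):
        for j in range(w):
            if i + 1 < h:  # vertical neighbor
                diffs.append(abs(attn[i + 1][j] - attn[i][j]))
            if j + 1 < w:  # horizontal neighbor
                diffs.append(abs(attn[i][j + 1] - attn[i][j]))
    return sum(diffs) / len(diffs) if diffs else 0.0
```

Minimizing such a term during training pushes the penalized map toward spatially coherent values, which is one plausible reading of how a TV-style loss yields sharper, less scattered attention.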

Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis

Aug 16, 2023

Minho Park, Jooyeol Yun, Seunghwan Choi, Jaegul Choo

Abstract: Existing text-to-image generation approaches have set high standards for photorealism and text-image correspondence, largely benefiting from web-scale text-image datasets, which can include up to 5 billion pairs. However, text-to-image generation models trained on domain-specific datasets, such as urban scenes, medical images, and faces, still suffer from low text-image correspondence due to the lack of text-image pairs. Additionally, collecting billions of text-image pairs for a specific domain can be time-consuming and costly. Thus, ensuring high text-image correspondence without relying on web-scale text-image datasets remains a challenging task. In this paper, we present a novel approach for enhancing text-image correspondence by leveraging available semantic layouts. Specifically, we propose a Gaussian-categorical diffusion process that simultaneously generates both images and their corresponding layouts. Our experiments reveal that, by training the model to generate semantic labels for each pixel, we can guide text-to-image generation models to be aware of the semantics of different image regions. We demonstrate that our approach achieves higher text-image correspondence than existing text-to-image generation approaches on the Multi-Modal CelebA-HQ and Cityscapes datasets, where text-image pairs are scarce. Code is available at https://pmh9960.github.io/research/GCDP

* Accepted to ICCV 2023
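To make the idea of a joint Gaussian-categorical forward process concrete, here is a minimal sketch that noises continuous pixel values with the usual Gaussian forward step, x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, and per-pixel class labels with a uniform categorical transition, q(y_t | y_0) = Cat(alpha_bar * onehot(y_0) + (1 - alpha_bar) / K), at a shared noise level. The uniform transition matrix, the shared `alpha_bar`, and all function names are assumptions borrowed from standard Gaussian and multinomial diffusion; the abstract does not give the paper's exact parameterization.

```python
import math
import random
from typing import List, Tuple


def gaussian_forward(x0: List[float], alpha_bar: float, rng: random.Random) -> List[float]:
    """Sample x_t ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) I) for continuous pixel values."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * rng.gauss(0.0, 1.0) for x in x0]


def categorical_forward_probs(label: int, num_classes: int, alpha_bar: float) -> List[float]:
    """q(y_t | y_0): keep the label with weight alpha_bar, else resample uniformly over K classes."""
    uniform = (1.0 - alpha_bar) / num_classes
    return [(alpha_bar if k == label else 0.0) + uniform for k in range(num_classes)]


def joint_forward(x0: List[float], labels: List[int], num_classes: int,
                  alpha_bar: float, rng: random.Random) -> Tuple[List[float], List[int]]:
    """Noise an image and its per-pixel semantic layout at the same noise level."""
    x_t = gaussian_forward(x0, alpha_bar, rng)
    y_t = [rng.choices(range(num_classes),
                       weights=categorical_forward_probs(y, num_classes, alpha_bar))[0]
           for y in labels]
    return x_t, y_t
```

At `alpha_bar = 1` both modalities are returned unchanged, and as `alpha_bar` approaches 0 the image tends toward Gaussian noise while the labels tend toward a uniform distribution over classes; a denoiser trained on such pairs must recover both jointly.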

iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer

Jul 15, 2022

Sanghyeon Lee, Jooyeol Yun, Minho Park, Jaegul Choo

Abstract: Point-interactive image colorization aims to colorize grayscale images when a user provides the colors for specific locations. It is essential for point-interactive colorization methods to appropriately propagate user-provided colors (i.e., user hints) across the entire image to obtain a reasonably colorized image with minimal user effort. However, existing approaches often produce partially colorized results due to their inefficient design of stacking convolutional layers to propagate hints to distant relevant regions. To address this problem, we present iColoriT, a novel point-interactive colorization Vision Transformer capable of propagating user hints to relevant regions by leveraging the global receptive field of Transformers. The self-attention mechanism of Transformers enables iColoriT to selectively colorize relevant regions with only a few local hints. Our approach colorizes images in real time by utilizing pixel shuffling, an efficient upsampling technique that replaces the decoder architecture. In addition, to mitigate the artifacts caused by pixel shuffling with large upsampling ratios, we present the local stabilizing layer. Extensive quantitative and qualitative results demonstrate that our approach significantly outperforms existing methods for point-interactive colorization, producing accurately colorized images with minimal user effort.
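Pixel shuffling, the decoder replacement mentioned in this abstract, is a pure rearrangement of values: a tensor of shape (C * r^2, H, W) becomes (C, H * r, W * r) with no learned weights. The following minimal sketch uses nested lists and follows the depth-to-space index layout of `torch.nn.PixelShuffle`; it shows only the rearrangement, not iColoriT's model, and the helper name is illustrative.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def pixel_shuffle(x: Tensor3, r: int) -> Tensor3:
    """Rearrange (C * r^2, H, W) -> (C, H * r, W * r).

    Index mapping: out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w].
    """
    cin, h, w = len(x), len(x[0]), len(x[0][0])
    if cin % (r * r):
        raise ValueError("channel count must be divisible by r^2")
    c = cin // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(c):
        for i in range(r):          # sub-pixel row offset
            for j in range(r):      # sub-pixel column offset
                src = x[ch * r * r + i * r + j]
                for y in range(h):
                    for xx in range(w):
                        out[ch][y * r + i][xx * r + j] = src[y][xx]
    return out
```

For example, four 1x1 channels `[[[1]], [[2]], [[3]], [[4]]]` with `r = 2` become one 2x2 channel `[[[1, 2], [3, 4]]]`. Because each output pixel is copied from exactly one input location, this upsampling is cheap, which is consistent with the real-time claim; large `r` values spread neighboring output pixels across many channels, one plausible source of the artifacts the local stabilizing layer addresses.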

Visual Comfort Aware-Reinforcement Learning for Depth Adjustment of Stereoscopic 3D Images

Apr 14, 2021

Hak Gu Kim, Minho Park, Sangmin Lee, Seongyeop Kim, Yong Man Ro

Abstract: Depth adjustment aims to enhance the visual experience of stereoscopic 3D (S3D) images, which involves improving both visual comfort and depth perception. For a human expert, depth adjustment is an iterative, sequential decision-making process: the expert repeatedly adjusts the depth until satisfied with both the level of visual comfort and the perceived depth. In this work, we present a novel deep reinforcement learning (DRL)-based approach for depth adjustment, named VCA-RL (Visual Comfort Aware Reinforcement Learning), that explicitly models this human sequential decision making in depth editing operations. We formulate the depth adjustment process as a Markov decision process in which actions are defined as camera movement operations that control the distance between the left and right cameras. Our agent is trained under the guidance of an objective visual comfort assessment metric to learn the optimal sequence of camera movement actions with respect to perceptual aspects of stereoscopic viewing. Through extensive experiments and user studies, we show the effectiveness of our VCA-RL model on three different S3D databases.

* AAAI 2021
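The Markov decision process formulation in this abstract can be sketched as a toy environment whose state is the distance between the left and right cameras, whose actions shrink, keep, or widen that distance by a fixed step, and whose reward is the change in a comfort score. Everything concrete here is an assumption for illustration: the three-action set, the step size, the clipping bounds in arbitrary units, and especially the `comfort` callable, a hypothetical stand-in for the objective visual comfort assessment metric the paper uses.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical action set: shrink, keep, or widen the camera baseline.
ACTIONS = {0: -1.0, 1: 0.0, 2: +1.0}


@dataclass
class BaselineEnv:
    """Toy MDP for stereo depth adjustment (illustrative only)."""
    comfort: Callable[[float], float]  # hypothetical stand-in for a visual comfort metric
    baseline: float = 5.0              # current left-right camera distance, arbitrary units
    step_size: float = 0.5
    min_baseline: float = 0.0
    max_baseline: float = 10.0

    def step(self, action: int) -> Tuple[float, float]:
        """Move the cameras and return (new baseline, reward = comfort gain)."""
        before = self.comfort(self.baseline)
        proposed = self.baseline + ACTIONS[action] * self.step_size
        self.baseline = min(self.max_baseline, max(self.min_baseline, proposed))
        reward = self.comfort(self.baseline) - before
        return self.baseline, reward
```

With, say, `comfort = lambda b: -abs(b - 3.0)`, shrinking the baseline from 5.0 to 4.5 earns a reward of 0.5. An RL agent trained on such an environment learns a sequence of camera moves that maximizes cumulative comfort gain, mirroring the expert's iterative adjustment; a real metric would also need to account for perceived depth.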

Generative Guiding Block: Synthesizing Realistic Looking Variants Capable of Even Large Change Demands

Jul 02, 2019

Minho Park, Hak Gu Kim, Yong Man Ro

Abstract: Realistic image synthesis aims to generate images that are perceptually indistinguishable from real ones. However, generating realistic-looking images with large variations (e.g., large spatial deformations and large pose changes) is very challenging, because the synthesis must handle large variations while also preserving appearance. In this paper, we propose a novel realistic-looking image synthesis method designed especially for large change demands. To this end, we devise generative guiding blocks, each of which includes a realistic-appearance-preserving discriminator and a naturalistic-variation-transforming discriminator. By incorporating the proposed generative guiding blocks into a generative model, the latent features at the corresponding layers of the model are enhanced so that the synthesized images both look realistic and reflect the target variation. Through qualitative and quantitative evaluations, we demonstrate the effectiveness of the proposed generative guiding blocks compared to state-of-the-art methods.

* Accepted to ICIP 2019
