Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Min-Cheol Sagong

Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes

Mar 13, 2025

JunYong Choi, Min-Cheol Sagong, SeokYeong Lee, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho

Abstract:We propose a diffusion-based inverse rendering framework that decomposes a single RGB image into geometry, material, and lighting. Inverse rendering is inherently ill-posed, making it difficult to predict a single accurate solution. To address this challenge, recent generative model-based methods aim to present a range of possible solutions. However, finding a single accurate solution and generating diverse solutions can be conflicting. In this paper, we propose a channel-wise noise scheduling approach that allows a single diffusion model architecture to achieve two conflicting objectives. The resulting two diffusion models, trained with different channel-wise noise schedules, can predict a single highly accurate solution and present multiple possible solutions. The experimental results demonstrate the superiority of our two models in terms of both diversity and accuracy, which translates to enhanced performance in downstream applications such as object insertion and material editing.

* Accepted by CVPR 2025

Via

Access Paper or Ask Questions

Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset

Dec 18, 2024

Sithu Aung, Min-Cheol Sagong, Junghyun Cho

Abstract:We address an advanced challenge of predicting pedestrian occupancy as an extension of multi-view pedestrian detection in urban traffic. To support this, we have created a new synthetic dataset called MVP-Occ, designed for dense pedestrian scenarios in large-scale scenes. Our dataset provides detailed representations of pedestrians using voxel structures, accompanied by rich semantic scene understanding labels, facilitating visual navigation and insights into pedestrian spatial information. Furthermore, we present a robust baseline model, termed OmniOcc, capable of predicting both the voxel occupancy state and panoptic labels for the entire scene from multi-view images. Through in-depth analysis, we identify and evaluate the key elements of our proposed model, highlighting their specific contributions and importance.

* AAAI 2025

Via

Access Paper or Ask Questions

VIGFace: Virtual Identity Generation Model for Face Image Synthesis

Mar 13, 2024

Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam, Junghyun Cho, Ig-Jae Kim

Abstract:Deep learning-based face recognition continues to face challenges due to its reliance on huge datasets obtained from web crawling, which can be costly to gather and raise significant real-world privacy concerns. To address this issue, we propose VIGFace, a novel framework capable of generating synthetic facial images. Initially, we train the face recognition model using a real face dataset and create a feature space for both real and virtual IDs where virtual prototypes are orthogonal to other prototypes. Subsequently, we generate synthetic images by using the diffusion model based on the feature space. Our proposed framework provides two significant benefits. Firstly, it allows for creating virtual facial images without concerns about portrait rights, guaranteeing that the generated virtual face images are clearly differentiated from existing individuals. Secondly, it serves as an effective augmentation method by incorporating real existing images. Further experiments demonstrate the efficacy of our framework, achieving state-of-the-art results from both perspectives without any external data.

Via

Access Paper or Ask Questions

Image Generation with Self Pixel-wise Normalization

Jan 26, 2022

Yoon-Jae Yeo, Min-Cheol Sagong, Seung Park, Sung-Jea Ko, Yong-Goo Shin

Abstract:Region-adaptive normalization (RAN) methods have been widely used in the generative adversarial network (GAN)-based image-to-image translation technique. However, since these approaches need a mask image to infer the pixel-wise affine transformation parameters, they cannot be applied to the general image generation models having no paired mask images. To resolve this problem, this paper presents a novel normalization method, called self pixel-wise normalization (SPN), which effectively boosts the generative performance by performing the pixel-adaptive affine transformation without the mask image. In our method, the transforming parameters are derived from a self-latent mask that divides the feature map into the foreground and background regions. The visualization of the self-latent masks shows that SPN effectively captures a single object to be generated as the foreground. Since the proposed method produces the self-latent mask without external data, it is easily applicable in the existing generative models. Extensive experiments on various datasets reveal that the proposed method significantly improves the performance of image generation technique in terms of Frechet inception distance (FID) and Inception score (IS).

* 13 pages, 8 figures

Via

Access Paper or Ask Questions

Content-aware Directed Propagation Network with Pixel Adaptive Kernel Attention

Jul 28, 2021

Min-Cheol Sagong, Yoon-Jae Yeo, Seung-Won Jung, Sung-Jea Ko

Figure 1 for Content-aware Directed Propagation Network with Pixel Adaptive Kernel Attention

Figure 2 for Content-aware Directed Propagation Network with Pixel Adaptive Kernel Attention

Figure 3 for Content-aware Directed Propagation Network with Pixel Adaptive Kernel Attention

Figure 4 for Content-aware Directed Propagation Network with Pixel Adaptive Kernel Attention

Abstract:Convolutional neural networks (CNNs) have been not only widespread but also achieved noticeable results on numerous applications including image classification, restoration, and generation. Although the weight-sharing property of convolutions makes them widely adopted in various tasks, its content-agnostic characteristic can also be considered a major drawback. To solve this problem, in this paper, we propose a novel operation, called pixel adaptive kernel attention (PAKA). PAKA provides directivity to the filter weights by multiplying spatially varying attention from learnable features. The proposed method infers pixel-adaptive attention maps along the channel and spatial directions separately to address the decomposed model with fewer parameters. Our method is trainable in an end-to-end manner and applicable to any CNN-based models. In addition, we propose an improved information aggregation module with PAKA, called the hierarchical PAKA module (HPM). We demonstrate the superiority of our HPM by presenting state-of-the-art performance on semantic segmentation compared to the conventional information aggregation modules. We validate the proposed method through additional ablation studies and visualizing the effect of PAKA providing directivity to the weights of convolutions. We also show the generalizability of the proposed method by applying it to multi-modal tasks especially color-guided depth map super-resolution.

* submitted to IEEE transactions on Neural Networks and Learning System

Via

Access Paper or Ask Questions

A Simple yet Effective Way for Improving the Performance of GANs

Dec 11, 2019

Yong-Goo Shin, Yoon-Jae Yeo, Min-Cheol Sagong, Cheol-Hwan Yoo, Sung-Jea Ko

Figure 1 for A Simple yet Effective Way for Improving the Performance of GANs

Figure 2 for A Simple yet Effective Way for Improving the Performance of GANs

Figure 3 for A Simple yet Effective Way for Improving the Performance of GANs

Figure 4 for A Simple yet Effective Way for Improving the Performance of GANs

Abstract:This paper presents a simple but effective way that improves the performance of generative adversarial networks (GANs) without imposing the training overhead or modifying the network architectures of existing methods. The proposed method employs a novel cascading rejection (CR) module for discriminator, which extracts multiple non-overlapped features in an iterative manner. The CR module supports the discriminator to effectively distinguish between real and generated images, which results in a strong penalization to the generator. In order to deceive the robust discriminator containing the CR module, the generator produces the images that are more similar to the real images. Since the proposed CR module requires only a few simple vector operations, it can be readily applied to existing frameworks with marginal training overheads. Quantitative evaluations on various datasets including CIFAR-10, Celeb-HQ, LSUN, and tiny-ImageNet confirm that the proposed method significantly improves the performance of GANs and conditional GANs in terms of Frechet inception distance (FID) indicating the diversity and visual appearance of the generated images.

Via

Access Paper or Ask Questions

cGANs with Conditional Convolution Layer

Jun 03, 2019

Min-Cheol Sagong, Yong-Goo Shin, Yoon-Jae Yeo, Seung Park, Sung-Jea Ko

Figure 1 for cGANs with Conditional Convolution Layer

Figure 2 for cGANs with Conditional Convolution Layer

Figure 3 for cGANs with Conditional Convolution Layer

Figure 4 for cGANs with Conditional Convolution Layer

Abstract:Conditional generative adversarial networks (cGANs) have been widely researched to generate class conditional images using a single generator. However, in the conventional cGANs techniques, it is still challenging for the generator to learn condition-specific features, since a standard convolutional layer with the same weights is used regardless of the condition. In this paper, we propose a novel convolution layer, called the conditional convolution layer, which directly generates different feature maps by employing the weights which are adjusted depending on the conditions. More specifically, in each conditional convolution layer, the weights are conditioned in a simple but effective way through filter-wise scaling and channel-wise shifting operations. In contrast to the conventional methods, the proposed method with a single generator can effectively handle condition-specific characteristics. The experimental results on CIFAR, LSUN and ImageNet datasets show that the generator with the proposed conditional convolution layer achieves a higher quality of conditional image generation than that with the standard convolution layer.

* Submitted to International Journal of Computer Vision (IJCV)

Via

Access Paper or Ask Questions

PEPSI++: Fast and Lightweight Network for Image Inpainting

May 22, 2019

Yong-Goo Shin, Min-Cheol Sagong, Yoon-Jae Yeo, Seung-Wook Kim, Sung-Jea Ko

Figure 1 for PEPSI++: Fast and Lightweight Network for Image Inpainting

Figure 2 for PEPSI++: Fast and Lightweight Network for Image Inpainting

Figure 3 for PEPSI++: Fast and Lightweight Network for Image Inpainting

Figure 4 for PEPSI++: Fast and Lightweight Network for Image Inpainting

Abstract:Generative adversarial network (GAN)-based image inpainting methods which utilize coarse-to-fine network with a contextual attention module (CAM) have shown remarkable performance. However, they require numerous computational resources such as convolution operations and network parameters due to two stacked generative networks, which results in a low speed. To address this problem, we propose a novel network structure called PEPSI: parallel extended-decoder path for semantic inpainting network, which aims at not only reducing hardware costs but also improving the inpainting performance. The PEPSI consists of a single shared encoding network and parallel decoding networks with coarse and inpainting paths. The coarse path generates a preliminary inpainting result to train the encoding network for prediction of features for the CAM. At the same time, the inpainting path results in higher inpainting quality with refined features reconstructed using the CAM. In addition, we propose a Diet-PEPSI which significantly reduces the network parameters while maintaining the performance. In the proposed method, we present a Diet-PEPSI unit (DPU) which effectively aggregates the global contextual information with a small number of parameters. Extensive experiments and comparisons with state-of-the-art image inpainting methods demonstrate that both PEPSI and Diet-PEPSI achieve significant improvements in qualitative scores and reduced computation cost.

* Submitted to IEEE transactions on Neural Networks and Learning Systems

Via

Access Paper or Ask Questions