Abstract:Generalizable implicit neural representation (INR) enables a single continuous function, i.e., a coordinate-based neural network, to represent multiple data instances by modulating its weights or intermediate features using latent codes. However, the expressive power of state-of-the-art modulation is limited by its inability to localize and capture fine-grained details of data entities such as specific pixels and rays. To address this issue, we propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder. The transformer encoder predicts a set of latent tokens from a data instance, encoding local information into each latent token. The locality-aware INR decoder extracts a modulation vector by selectively aggregating the latent tokens via cross-attention for a coordinate input and then predicts the output by progressively decoding with coarse-to-fine modulation through multiple frequency bandwidths. The selective token aggregation and the multi-band feature modulation enable us to learn locality-aware representations in the spatial and spectral aspects, respectively. Our framework significantly outperforms previous generalizable INRs, and the learned locality-aware latents prove useful for downstream tasks such as image generation.
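To make the decoder concrete, below is a minimal PyTorch sketch of a locality-aware decoding step under illustrative assumptions (module sizes, the Fourier-feature embedding, and all names are ours, not the paper's code): a coordinate query cross-attends to the encoder's latent tokens to obtain a per-coordinate modulation vector, which then conditions a small stack of MLP blocks fed with coarse-to-fine frequency bands.

```python
import torch
import torch.nn as nn

def fourier_features(coords, n_freqs):
    """sin/cos features of 2-D coordinates over n_freqs octaves."""
    freqs = 2.0 ** torch.arange(n_freqs, device=coords.device)    # (n_freqs,)
    ang = coords.unsqueeze(-1) * freqs                            # (B, N, 2, n_freqs)
    return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(-2)  # (B, N, 4*n_freqs)

class LocalityAwareDecoder(nn.Module):
    def __init__(self, dim=256, bands=(2, 4, 8), out_dim=3):
        super().__init__()
        self.bands = bands
        self.query = nn.Linear(4 * bands[-1], dim)        # coordinate query for cross-attention
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.blocks = nn.ModuleList(                      # coarse-to-fine modulated blocks
            [nn.Linear(dim + 4 * b, dim) for b in bands]
        )
        self.head = nn.Linear(dim, out_dim)

    def forward(self, coords, tokens):
        # coords: (B, N, 2) query coordinates; tokens: (B, T, dim) from the transformer encoder
        q = self.query(fourier_features(coords, self.bands[-1]))
        mod, _ = self.attn(q, tokens, tokens)             # selective token aggregation
        h = mod
        for b, block in zip(self.bands, self.blocks):
            h = torch.relu(block(torch.cat([h, fourier_features(coords, b)], dim=-1)))
            h = h + mod                                   # re-inject the modulation at every band
        return self.head(h)

# toy usage: 1024 pixel coordinates decoded against 64 latent tokens
decoder = LocalityAwareDecoder()
rgb = decoder(torch.rand(1, 1024, 2), torch.randn(1, 64, 256))
print(rgb.shape)  # torch.Size([1, 1024, 3])
```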
Abstract:Despite recent advances in implicit neural representations (INRs), it remains challenging for a coordinate-based multi-layer perceptron (MLP) of INRs to learn a common representation across data instances and generalize it to unseen instances. In this work, we introduce a simple yet effective framework for generalizable INRs that enables a coordinate-based MLP to represent complex data instances by modulating only a small set of weights in an early MLP layer as an instance pattern composer; the remaining MLP weights learn pattern composition rules for common representations across instances. Our generalizable INR framework is fully compatible with existing meta-learning and hypernetwork approaches for learning to predict the modulated weights for unseen instances. Extensive experiments demonstrate that our method achieves high performance across a wide range of domains such as audio, images, and 3D objects, while ablation studies validate the effectiveness of our weight modulation.
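A minimal sketch of this weight modulation, assuming a SIREN-style coordinate MLP in PyTorch: only one early weight matrix (the instance pattern composer) is supplied per instance, e.g., by a hypernetwork or meta-learning, while all remaining layers are shared across instances. Layer sizes and names are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PatternComposerINR(nn.Module):
    """Shared coordinate MLP whose second-layer weight is swapped per instance."""
    def __init__(self, in_dim=2, dim=256, out_dim=3, n_shared=2):
        super().__init__()
        self.first = nn.Linear(in_dim, dim)
        self.shared = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_shared)])
        self.head = nn.Linear(dim, out_dim)

    def forward(self, coords, instance_weight):
        # coords: (B, N, in_dim); instance_weight: (B, dim, dim), one matrix per instance
        h = torch.sin(self.first(coords))
        h = torch.sin(torch.einsum('bnd,bed->bne', h, instance_weight))  # modulated early layer
        for layer in self.shared:
            h = torch.sin(layer(h))      # shared layers learn pattern-composition rules
        return self.head(h)

# toy usage: modulated weights for 4 instances, e.g., predicted by a hypernetwork
inr = PatternComposerINR()
w = 0.02 * torch.randn(4, 256, 256)
out = inr(torch.rand(4, 1024, 2), w)
print(out.shape)  # torch.Size([4, 1024, 3])
```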
Abstract:Although autoregressive models have achieved promising results on image generation, their unidirectional generation process prevents the resultant images from fully reflecting global contexts. To address this issue, we propose an effective image generation framework, Draft-and-Revise with Contextual RQ-Transformer, that considers global contexts during the generation process. As a generalized VQ-VAE, RQ-VAE first represents a high-resolution image as a sequence of discrete code stacks. After code stacks in the sequence are randomly masked, Contextual RQ-Transformer is trained to infill the masked code stacks based on the unmasked contexts of the image. Contextual RQ-Transformer then generates an image with our two-phase decoding, Draft-and-Revise, exploiting the global contexts of the image throughout the generation process. Specifically, in the draft phase, our model first focuses on generating diverse images of rather low quality. Then, in the revise phase, the model iteratively improves the quality of the images while preserving their global contexts. In experiments, our method achieves state-of-the-art results on conditional image generation. We also validate that Draft-and-Revise decoding achieves high performance by effectively controlling the quality-diversity trade-off in image generation.
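The two-phase decoding can be summarized by the masking schedule below, a simplified sketch rather than the released implementation; `model` is assumed to infill masked code stacks (marked with -1) given the unmasked context.

```python
import torch

MASK = -1  # placeholder id for a masked code stack

def draft_and_revise(model, seq_len, depth, draft_steps=8, revise_steps=2, revise_rounds=4):
    codes = torch.full((1, seq_len, depth), MASK, dtype=torch.long)

    # Draft phase: infill the whole sequence in a few coarse steps (diverse, lower quality).
    for positions in torch.randperm(seq_len).chunk(draft_steps):
        codes[:, positions] = model(codes)[:, positions]

    # Revise phase: repeatedly re-predict small subsets of positions conditioned on
    # the full sequence, so the global context of the drafted image is preserved.
    for _ in range(revise_rounds):
        for positions in torch.randperm(seq_len).chunk(revise_steps):
            masked = codes.clone()
            masked[:, positions] = MASK
            codes[:, positions] = model(masked)[:, positions]
    return codes

# toy "model" that predicts a random code stack for every position
toy_infill = lambda c: torch.randint(0, 1024, c.shape)
print(draft_and_revise(toy_infill, seq_len=64, depth=4).shape)  # torch.Size([1, 64, 4])
```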
Abstract:For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence length is important for an AR model to reduce the computational cost of considering long-range interactions of codes. However, we postulate that previous VQ cannot both shorten the code sequence and generate high-fidelity images in terms of the rate-distortion trade-off. In this study, we propose a two-stage framework, consisting of Residual-Quantized VAE (RQ-VAE) and RQ-Transformer, to effectively generate high-resolution images. Given a fixed codebook size, RQ-VAE can precisely approximate the feature map of an image and represent the image as a stacked map of discrete codes. RQ-Transformer then learns to predict the quantized feature vector at the next position by predicting the next stack of codes. Thanks to the precise approximation of RQ-VAE, we can represent a 256$\times$256 image as an 8$\times$8 feature map, so RQ-Transformer can efficiently reduce its computational costs. Consequently, our framework outperforms existing AR models on various benchmarks of unconditional and conditional image generation. Our approach also generates high-quality images with significantly faster sampling speed than previous AR models.
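The residual quantization at the core of RQ-VAE can be sketched as follows in PyTorch, with an illustrative shared codebook and depth (not the paper's exact configuration): every quantization level encodes the residual error left by the previous levels, so each position is described by a stack of codes.

```python
import torch
import torch.nn as nn

class ResidualQuantizer(nn.Module):
    def __init__(self, codebook_size=1024, dim=256, depth=4):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)  # shared across quantization levels
        self.depth = depth

    def forward(self, z):                                 # z: (B, H, W, dim) feature map
        residual, quantized, codes = z, torch.zeros_like(z), []
        for _ in range(self.depth):
            dists = torch.cdist(residual.flatten(0, 2), self.codebook.weight)
            idx = dists.argmin(dim=-1).view(z.shape[:-1])
            picked = self.codebook(idx)                   # nearest code for the current residual
            quantized = quantized + picked
            residual = residual - picked                  # pass the remaining error to the next level
            codes.append(idx)
        return quantized, torch.stack(codes, dim=-1)      # (B, H, W, dim), (B, H, W, depth)

# toy usage: an 8x8 feature map quantized into a stacked map of 4 codes per position
rq = ResidualQuantizer()
zq, codes = rq(torch.randn(2, 8, 8, 256))
print(zq.shape, codes.shape)  # torch.Size([2, 8, 8, 256]) torch.Size([2, 8, 8, 4])
```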
Abstract:Consistency regularization on label predictions has become a fundamental technique in semi-supervised learning, but it still requires a large number of training iterations to reach high performance. In this study, we show that consistency regularization restricts the propagation of labeling information because samples with unconfident pseudo-labels are excluded from model updates. We then propose contrastive regularization, which improves both the efficiency and accuracy of consistency regularization through well-clustered features of unlabeled data. Specifically, after strongly augmented samples are assigned to clusters by their pseudo-labels, our contrastive regularization updates the model so that features with confident pseudo-labels pull together the features in the same cluster while pushing away features in different clusters. As a result, the information of confident pseudo-labels can be effectively propagated to more unlabeled samples during training through the well-clustered features. On semi-supervised learning benchmarks, our contrastive regularization improves previous consistency-based methods and achieves state-of-the-art results, especially with fewer training iterations. Our method also shows robust performance on open-set semi-supervised learning, where the unlabeled data includes out-of-distribution samples.
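A minimal sketch of such a class-aware contrastive loss is shown below; the feature normalization, similarity temperature, and confidence threshold are illustrative, and the exact anchor weighting in the paper may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_regularization(features, pseudo_labels, confidence, threshold=0.95, tau=0.1):
    """Features sharing a pseudo-label form a cluster; anchors with confident
    pseudo-labels pull same-cluster features together and push other clusters away."""
    f = F.normalize(features, dim=1)
    sim = f @ f.t() / tau                                     # pairwise similarities
    same = pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)
    eye = torch.eye(len(f), dtype=torch.bool)
    log_prob = (sim - eye.float() * 1e9).log_softmax(dim=1)   # exclude self-pairs
    pos = (same & ~eye).float()
    per_anchor = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    confident = confidence > threshold                        # only confident anchors propagate labels
    return per_anchor[confident].mean()

# toy usage: 16 strongly augmented unlabeled samples, 4 pseudo-label clusters
feats = torch.randn(16, 128)
plabels = torch.randint(0, 4, (16,))
conf = torch.full((16,), 0.99)
print(contrastive_regularization(feats, plabels, conf))
```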
Abstract:For the stability and reliability of real-world applications, the robustness of DNNs has been evaluated on unimodal tasks. However, few studies consider the abnormal situations that a visual question answering (VQA) model might encounter at test time after deployment in the real world. In this study, we evaluate the robustness of state-of-the-art VQA models to five different anomalies, covering worst-case scenarios, the most frequent scenarios, and the current limitations of VQA models. In contrast to results on unimodal tasks, the maximum answer confidence of VQA models cannot detect anomalous inputs, and post-training of the outputs, such as outlier exposure, is ineffective for VQA models. We therefore propose an attention-based method that uses the confidence of cross-modal reasoning between input images and questions and shows far more promising results than previous methods from unimodal tasks. In addition, we show that maximum entropy regularization of attention networks can significantly improve attention-based anomaly detection in VQA models. Thanks to their simplicity, the attention-based anomaly detection and the regularization are model-agnostic and can be applied to the various cross-modal attention mechanisms in state-of-the-art VQA models. These results imply that cross-modal attention in VQA is important for improving not only VQA accuracy but also robustness to various anomalies.
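The attention-based detection and the entropy regularization reduce to the short sketch below (hypothetical helper names, not the paper's code): the maximum cross-modal attention weight serves as a confidence score for the image-question reasoning, and a maximum-entropy term is added to the training loss to regularize the attention distribution.

```python
import torch

def attention_anomaly_score(attn_weights):
    """attn_weights: (B, n_regions), softmax-normalized cross-modal attention.
    A sharply peaked map signals confident reasoning; low scores flag anomalies."""
    return attn_weights.max(dim=-1).values

def attention_entropy_regularizer(attn_weights, eps=1e-8):
    """Maximum-entropy regularizer: returns the negative mean entropy, so adding it
    to the loss keeps attention diffuse unless the image truly supports the question."""
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return -entropy.mean()

# toy usage: attention over 36 detected image regions for a batch of 4 questions
attn = torch.softmax(torch.randn(4, 36), dim=-1)
print(attention_anomaly_score(attn))
print(attention_entropy_regularizer(attn))
```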