Junho Lee

Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation

Nov 03, 2024

Seongsu Ha, Chaeyun Kim, Donghwa Kim, Junho Lee, Sangho Lee, Joonseok Lee

Abstract: Referring Image Segmentation (RIS) is a comprehensive task of segmenting the object in an image that a textual query refers to. The difficulty of this task naturally depends on the presence of similar objects and the complexity of the referring expression. Recent RIS models still show a significant performance gap between easy and hard scenarios. We posit that the bottleneck lies in the data, and propose a simple but powerful data augmentation method, Negative-mined Mosaic Augmentation (NeMo). This method augments a training image into a mosaic with three other negative images carefully curated by a pretrained multimodal alignment model, e.g., CLIP, to make the sample more challenging. We discover that it is critical to properly adjust the difficulty level, so that the sample is neither too ambiguous nor too trivial. The augmented training data encourages the RIS model to recognize subtle differences and relationships between similar visual entities and to concretely understand the whole expression in order to locate the right target. Our approach shows consistent improvements on various datasets and models, verified by extensive experiments.
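
The mosaic idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity band (`low`, `high`), the function names, and the use of nested lists as images are all assumptions made for illustration. In practice the similarity scores would come from a pretrained alignment model such as CLIP, which is not called here.

```python
import random


def pick_negatives(similarities, low, high, k=3, seed=0):
    """Return k candidate indices whose similarity lies in [low, high].

    The band is a hypothetical way to keep negatives "neither too
    ambiguous nor too trivial": scores above `high` are treated as too
    close to the target, scores below `low` as too easy.
    """
    rng = random.Random(seed)
    pool = [i for i, s in enumerate(similarities) if low <= s <= high]
    if len(pool) < k:
        raise ValueError("not enough candidates inside the similarity band")
    return rng.sample(pool, k)


def make_mosaic(target, negatives, seed=0):
    """Tile the target and three negatives into a 2x2 grid.

    Images are equal-sized 2D lists. Returns (mosaic, quadrant), where
    quadrant is the row-major index (0..3) of the tile holding the target.
    """
    if len(negatives) != 3:
        raise ValueError("a 2x2 mosaic needs exactly three negatives")
    rng = random.Random(seed)
    tiles = [target] + list(negatives)
    order = list(range(4))
    rng.shuffle(order)  # random placement of the four tiles
    placed = [tiles[i] for i in order]
    top = [a + b for a, b in zip(placed[0], placed[1])]
    bottom = [a + b for a, b in zip(placed[2], placed[3])]
    return top + bottom, order.index(0)
```

A real pipeline would also have to move the ground-truth mask into the same quadrant as the target image. That step is omitted from this sketch.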

* Accepted at ECCV 2024. Project page: https://dddonghwa.github.io/NeMo/

Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space

Sep 09, 2024

Junho Lee, Jeongwoo Shin, Seung Woo Ko, Seongsu Ha, Joonseok Lee

Abstract: Given a video with $T$ frames, frame sampling is the task of selecting $N \ll T$ frames so as to maximize the performance of a fixed video classifier. Brute-force search and most existing methods alike suffer from the vast search space of $\binom{T}{N}$, especially when $N$ gets large. To address this challenge, we introduce a novel perspective of reducing the search space from $O(T^N)$ to $O(T)$. Instead of exploring the entire $O(T^N)$ space, our proposed semi-optimal policy selects the top $N$ frames based on the independently estimated value of each frame using per-frame confidence, significantly reducing the computational complexity. We verify that our semi-optimal policy can efficiently approximate the optimal policy, particularly under practical settings. Additionally, through extensive experiments on various datasets and model architectures, we demonstrate that learning our semi-optimal policy ensures stable and high performance regardless of the size of $N$ and $T$.
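
The selection step described in this abstract, picking the top $N$ frames by an independently estimated per-frame score, can be sketched as follows. The function names are hypothetical, and the scores are assumed to be given. How the paper estimates them is not shown here.

```python
import math


def top_n_frames(confidences, n):
    """Pick the n frames with the highest per-frame score.

    Each frame is scored independently, so selection only needs one
    score per frame rather than a score per frame subset. Indices are
    returned in temporal order.
    """
    if not 0 < n <= len(confidences):
        raise ValueError("n must be between 1 and the number of frames")
    ranked = sorted(range(len(confidences)),
                    key=lambda t: confidences[t], reverse=True)
    return sorted(ranked[:n])


def search_space_sizes(t, n):
    """Return (number of N-frame subsets, number of per-frame scores)."""
    return math.comb(t, n), t
```

For example, with `T = 10` and `N = 3` there are `math.comb(10, 3) = 120` candidate subsets, while per-frame scoring evaluates only 10 values.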

Isometric Representation Learning for Disentangled Latent Space of Diffusion Models

Jul 16, 2024

Jaehoon Hahm, Junho Lee, Sunghyun Kim, Joonseok Lee

Abstract: The latent space of diffusion models remains largely unexplored, despite their great success and potential in the field of generative modeling. In fact, the latent space of existing diffusion models is entangled, with a distorted mapping from the latent space to image space. To tackle this problem, we present Isometric Diffusion, equipping a diffusion model with a geometric regularizer to guide the model to learn a geometrically sound latent space of the training data manifold. This approach allows diffusion models to learn a more disentangled latent space, which enables smoother interpolation, more accurate inversion, and more precise control over attributes directly in the latent space. Our extensive experiments consisting of image interpolations, image inversions, and linear editing show the effectiveness of our method.

* Forty-first International Conference on Machine Learning (ICML 2024)

Semi-Supervised Domain Adaptation Using Target-Oriented Domain Augmentation for 3D Object Detection

Jun 17, 2024

Yecheol Kim, Junho Lee, Changsoo Park, Hyoung won Kim, Inho Lim, Christopher Chang, Jun Won Choi

Abstract: 3D object detection is crucial for applications like autonomous driving and robotics. However, in real-world environments, variations in sensor data distribution due to sensor upgrades, weather changes, and geographic differences can adversely affect detection performance. Semi-Supervised Domain Adaptation (SSDA) aims to mitigate these challenges by transferring knowledge from a source domain, abundant in labeled data, to a target domain where labels are scarce. This paper presents a new SSDA method referred to as Target-Oriented Domain Augmentation (TODA), specifically tailored for LiDAR-based 3D object detection. TODA efficiently utilizes all available data, including labeled data in the source domain and both labeled and unlabeled data in the target domain, to enhance domain adaptation performance. TODA consists of two stages: TargetMix and AdvMix. TargetMix employs mixing augmentation that accounts for LiDAR sensor characteristics to facilitate feature alignment between the source and target domains. AdvMix applies point-wise adversarial augmentation with mixing augmentation, which perturbs the unlabeled data to align the features of labeled and unlabeled data in the target domain. Our experiments conducted on challenging domain adaptation tasks demonstrate that TODA outperforms existing domain adaptation techniques designed for 3D object detection by significant margins. The code is available at: https://github.com/rasd3/TODA.

* Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). The code is available at: https://github.com/rasd3/TODA

Enhancing Effectiveness and Robustness in a Low-Resource Regime via Decision-Boundary-aware Data Augmentation

Mar 22, 2024

Kyohoon Jin, Junho Lee, Juhwan Choi, Sangmin Song, Youngbin Kim

Abstract: Efforts to leverage deep learning models in low-resource regimes have led to numerous augmentation studies. However, the direct application of methods such as mixup and cutout to text data is limited due to its discrete characteristics. While methods using pretrained language models have exhibited efficiency, they require additional considerations for robustness. Inspired by recent studies on decision boundaries, this paper proposes a decision-boundary-aware data augmentation strategy to enhance robustness using pretrained language models. The proposed technique first focuses on shifting the latent features closer to the decision boundary, followed by reconstruction to generate an ambiguous version with a soft label. Additionally, mid-K sampling is suggested to enhance the diversity of the generated sentences. This paper demonstrates the performance of the proposed augmentation strategy compared to other methods through extensive experiments. Furthermore, the ablation study reveals the effect of soft labels and mid-K sampling and the extensibility of the method with curriculum data augmentation.

* Accepted at LREC-COLING 2024

SoftEDA: Rethinking Rule-Based Data Augmentation with Soft Labels

Feb 08, 2024

Juhwan Choi, Kyohoon Jin, Junho Lee, Sangmin Song, Youngbin Kim

Abstract: Rule-based text data augmentation is widely used for NLP tasks due to its simplicity. However, this method can potentially damage the original meaning of the text, ultimately hurting the performance of the model. To overcome this limitation, we propose a straightforward technique for applying soft labels to augmented data. We conducted experiments across seven different classification tasks and empirically demonstrated the effectiveness of our proposed approach. We have publicly released our source code for reproducibility.
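
As a rough illustration of attaching a soft label to an augmented example, here is a minimal sketch using one common label-smoothing variant. The function name, the uniform spreading of the smoothed mass over the other classes, and the `smoothing` parameter are assumptions. The authors' released code may define the soft label differently.

```python
def soft_label(num_classes, true_class, smoothing):
    """Build a smoothed label vector for an augmented example.

    The true class keeps 1 - smoothing of the probability mass and the
    remainder is split evenly over the other classes (an assumed
    variant of label smoothing).
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    if not 0 <= true_class < num_classes:
        raise ValueError("true_class out of range")
    off_value = smoothing / (num_classes - 1)
    label = [off_value] * num_classes
    label[true_class] = 1.0 - smoothing
    return label
```

Original (non-augmented) examples would keep their one-hot labels, which corresponds to `smoothing = 0.0` in this sketch.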

* ICLR 2023 Tiny Papers

AutoAugment Is What You Need: Enhancing Rule-based Augmentation Methods in Low-resource Regimes

Feb 08, 2024

Juhwan Choi, Kyohoon Jin, Junho Lee, Sangmin Song, Youngbin Kim

Abstract: Text data augmentation is a complex problem due to the discrete nature of sentences. Although rule-based augmentation methods are widely adopted in real-world applications because of their simplicity, they suffer from potential semantic damage. Previous researchers have suggested easy data augmentation with soft labels (softEDA), employing label smoothing to mitigate this problem. However, finding the best smoothing factor for each model and dataset is challenging; therefore, using softEDA in real-world applications is still difficult. In this paper, we propose adapting AutoAugment to solve this problem. The experimental results suggest that the proposed method can boost existing augmentation methods and that rule-based methods can enhance cutting-edge pre-trained language models. We offer the source code.

* EACL 2024 Student Research Workshop

User Guide for KOTE: Korean Online Comments Emotions Dataset

May 11, 2022

Duyoung Jeon, Junho Lee, Cheongtag Kim

Abstract: Sentiment analysis that classifies data as positive or negative has been the dominant way to recognize emotional aspects of texts, despite the lack of thorough examination of emotional meanings. Recently, corpora labeled with more than just valence have been built to overcome this limit. However, most Korean emotion corpora are small in the number of instances and cover a limited range of emotions. We introduce the KOTE dataset. KOTE contains 50k (250k cases) Korean online comments, each of which is manually labeled for 43 emotion labels or one special label (NO EMOTION) by crowdsourcing (Ps = 3,048). The emotion taxonomy of the 43 emotions is systematically established by cluster analysis of Korean emotion concepts expressed in word embedding space. After explaining how KOTE was developed, we also discuss the results of finetuning and an analysis of social discrimination in the corpus.

* 16 pages, 4 figures

Do Not Escape From the Manifold: Discovering the Local Coordinates on the Latent Space of GANs

Jun 13, 2021

Jaewoong Choi, Changyeon Yoon, Junho Lee, Jung Ho Park, Geonho Hwang, Myungjoo Kang

Abstract: In this paper, we propose a method to find local-geometry-aware traversal directions on the intermediate latent space of Generative Adversarial Networks (GANs). These directions are defined as an ordered basis of the tangent space at a latent code. Motivated by the intrinsic sparsity of the latent space, the basis is discovered by solving the low-rank approximation problem of the differential of the partial network. Moreover, the local traversal basis leads to a natural iterative traversal on the latent space. Iterative Curve-Traversal shows stable traversal on images, since the trajectory of the latent code stays close to the latent space even under strong perturbations, unlike linear traversal. This stability provides far more diverse variations of the given image. Although the proposed method can be applied to various GAN models, we focus on the W-space of StyleGAN2, which is renowned for its better disentanglement of the latent factors of variation. Our quantitative and qualitative analysis provides evidence that the W-space is still globally warped while showing a certain degree of global consistency of interpretable variation. In particular, we introduce some metrics on the Grassmannian manifolds to quantify the global warpage of the W-space and the subspace traversal to test the stability of traversal directions.
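
One standard way to read the low-rank approximation step is through the singular value decomposition. The notation below ($f$, $z$, $w$, $J$, $k$) is assumed for illustration and is not taken from the abstract. Here $f$ denotes the partial network mapping an input code $z$ to the intermediate latent code $w = f(z)$, and $J$ is its differential (Jacobian) at $z$.

```latex
\begin{aligned}
J &= U \Sigma V^{\top}, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge 0, \\
J_k &= \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}
     \;=\; \operatorname*{arg\,min}_{\operatorname{rank}(B) \le k} \lVert J - B \rVert_F
     \quad \text{(Eckart--Young)}, \\
J v_i &= \sigma_i u_i, \qquad i = 1, \dots, k .
\end{aligned}
```

Under this reading, the ordered vectors $u_1, \dots, u_k$ span an approximate tangent space at $w$ and would serve as candidate traversal directions, ranked by $\sigma_i$.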

* 16 pages, 12 figures

GATSBI: Generative Agent-centric Spatio-temporal Object Interaction

Apr 09, 2021

Cheol-Hui Min, Jinseok Bae, Junho Lee, Young Min Kim

Abstract: We present GATSBI, a generative model that can transform a sequence of raw observations into a structured latent representation that fully captures the spatio-temporal context of the agent's actions. In vision-based decision-making scenarios, an agent faces complex high-dimensional observations where multiple entities interact with each other. The agent requires a good scene representation of the visual observation that discerns essential components and consistently propagates along the time horizon. Our method, GATSBI, utilizes unsupervised object-centric scene representation learning to separate an active agent, static background, and passive objects. GATSBI then models the interactions reflecting the causal relationships among decomposed entities and predicts physically plausible future states. Our model generalizes to a variety of environments where different types of robots and objects dynamically interact with each other. We show GATSBI achieves superior performance on scene decomposition and video prediction compared to its state-of-the-art counterparts.

* Accepted to CVPR 2021 as an oral presentation. Code and video will be released soon.
