Youngmin Oh

Subnet-Aware Dynamic Supernet Training for Neural Architecture Search

Mar 13, 2025

Jeimin Jeon, Youngmin Oh, Junghyup Lee, Donghyeon Baek, Dohyung Kim, Chanho Eom, Bumsub Ham

Abstract:N-shot neural architecture search (NAS) exploits a supernet containing all candidate subnets for a given search space. The subnets are typically trained with a static training strategy (e.g., using the same learning rate (LR) scheduler and optimizer for all subnets). This, however, does not consider that individual subnets have distinct characteristics, leading to two problems: (1) the supernet training is biased towards the low-complexity subnets (unfairness); (2) the momentum update in the supernet is noisy (noisy momentum). We present a dynamic supernet training technique that addresses these problems by adapting the training strategy to the subnets. Specifically, we introduce a complexity-aware LR scheduler (CaLR) that adapts the decay ratio of the LR to the complexities of subnets, which alleviates the unfairness problem. We also present a momentum separation technique (MS). It groups the subnets with similar structural characteristics and uses a separate momentum for each group, avoiding the noisy momentum problem. Our approach is applicable to various N-shot NAS methods with marginal cost, while improving the search performance drastically. We validate the effectiveness of our approach on various search spaces (e.g., NAS-Bench-201, MobileNet spaces) and datasets (e.g., CIFAR-10/100, ImageNet).

* Accepted to CVPR 2025
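
The momentum separation idea described in the abstract above (one momentum buffer per group of structurally similar subnets) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name GroupedMomentumSGD, the scalar parameters, and the grouping rule group_of (grouping by subnet depth) are all hypothetical choices made only for illustration.

```python
from collections import defaultdict


class GroupedMomentumSGD:
    """SGD with one momentum buffer per subnet group (illustrative only)."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        # group key -> {parameter name -> velocity}
        self.buffers = defaultdict(dict)

    def step(self, params, grads, group):
        """Update shared params in place using only the buffer of `group`."""
        buf = self.buffers[group]
        for name, grad in grads.items():
            velocity = self.momentum * buf.get(name, 0.0) + grad
            buf[name] = velocity
            params[name] -= self.lr * velocity
        return params


def group_of(subnet):
    # Hypothetical grouping rule: subnets with the same depth share a buffer.
    return len(subnet)


# Two subnets of different depth update the same shared weight.
opt = GroupedMomentumSGD(lr=0.1, momentum=0.9)
params = {"w": 1.0}
opt.step(params, {"w": 1.0}, group_of(("conv", "conv")))
opt.step(params, {"w": -1.0}, group_of(("conv", "conv", "conv")))
opt.step(params, {"w": 1.0}, group_of(("conv", "conv")))
```

In this sketch the opposing gradient from the deeper subnet never enters the velocity kept for the shallower group, which is the sense in which separate buffers avoid mixing momentum across dissimilar subnets.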

Efficient Few-Shot Neural Architecture Search by Counting the Number of Nonlinear Functions

Dec 19, 2024

Youngmin Oh, Hyunju Lee, Bumsub Ham

Abstract:Neural architecture search (NAS) enables finding the best-performing architecture from a search space automatically. Most NAS methods exploit an over-parameterized network (i.e., a supernet) containing all possible architectures (i.e., subnets) in the search space. However, the subnets that share the same set of parameters are likely to have different characteristics, interfering with each other during training. To address this, few-shot NAS methods have been proposed that divide the space into a few subspaces and employ a separate supernet for each subspace to limit the extent of weight sharing. They achieve state-of-the-art performance, but the computational cost increases accordingly. We introduce in this paper a novel few-shot NAS method that exploits the number of nonlinear functions to split the search space. To be specific, our method divides the space such that each subspace consists of subnets with the same number of nonlinear functions. Our splitting criterion is efficient, since it does not require comparing gradients of a supernet to split the space. In addition, we have found that dividing the space allows us to reduce the channel dimensions required for each supernet, which enables training multiple supernets in an efficient manner. We also introduce a supernet-balanced sampling (SBS) technique, sampling several subnets at each training step, to train different supernets evenly within a limited number of training steps. Extensive experiments on standard NAS benchmarks demonstrate the effectiveness of our approach. Our code is available at https://cvlab.yonsei.ac.kr/projects/EFS-NAS.

* Accepted to AAAI 2025
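
The splitting criterion described in the abstract above (each subspace holds the subnets with the same number of nonlinear functions) can be sketched on a toy search space. The operation names, the choice of which operations count as nonlinear, and the function names below are assumptions for illustration, not taken from the paper or its code.

```python
from collections import defaultdict
from itertools import product

# Hypothetical candidate operations; which ones count as nonlinear is an assumption.
CANDIDATE_OPS = ("skip", "avg_pool", "conv_relu")
NONLINEAR_OPS = {"conv_relu"}


def count_nonlinear(subnet):
    """Number of nonlinear operations in a subnet (a tuple of op names)."""
    return sum(op in NONLINEAR_OPS for op in subnet)


def split_search_space(num_layers):
    """Group every subnet of the toy space by its nonlinear-function count."""
    subspaces = defaultdict(list)
    for subnet in product(CANDIDATE_OPS, repeat=num_layers):
        subspaces[count_nonlinear(subnet)].append(subnet)
    return dict(subspaces)


subspaces = split_search_space(num_layers=2)
for count in sorted(subspaces):
    print(count, len(subspaces[count]))
```

Counting operations needs only the architecture encoding, so no gradients of a supernet have to be computed or compared to form the subspaces.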

M3: Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling

Nov 25, 2024

Youngmin Oh, Jinje Park, Seunggeun Kim, Taejin Paik, David Pan, Bosun Hwang

Abstract:Recent advancements in reinforcement learning (RL) for analog circuit optimization have demonstrated significant potential for improving sample efficiency and generalization across diverse circuit topologies and target specifications. However, challenges remain, such as high computational overhead and the need for bespoke models for each circuit. To address them, we propose M3, a novel Model-based RL (MBRL) method employing the Mamba architecture and effective scheduling. The Mamba architecture, known as a strong alternative to the transformer architecture, enables multi-circuit optimization with distinct parameters and target specifications. The effective scheduling strategy enhances sample efficiency by adjusting crucial MBRL training parameters. To the best of our knowledge, M3 is the first method for multi-circuit optimization that leverages both the Mamba architecture and an MBRL with effective scheduling. As a result, it significantly improves sample efficiency compared to existing RL methods.

MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection

Jul 23, 2024

Youngmin Oh, Hyung-Il Kim, Seong Tae Kim, Jung Uk Kim

Abstract:Monocular 3D object detection is an important and challenging task in autonomous driving. Existing methods mainly focus on performing 3D detection in ideal weather conditions, characterized by scenarios with clear and optimal visibility. However, autonomous driving requires the ability to handle changes in weather conditions, such as foggy weather, not just clear weather. We introduce MonoWAD, a novel weather-robust monocular 3D object detector with a weather-adaptive diffusion model. It contains two components: (1) the weather codebook to memorize the knowledge of the clear weather and generate a weather-reference feature for any input, and (2) the weather-adaptive diffusion model to enhance the feature representation of the input feature by incorporating a weather-reference feature. This serves an attention role in indicating how much improvement is needed for the input feature according to the weather conditions. To achieve this goal, we introduce a weather-adaptive enhancement loss to enhance the feature representation under both clear and foggy weather conditions. Extensive experiments under various weather conditions demonstrate that MonoWAD achieves weather-robust monocular 3D object detection. The code and dataset are released at https://github.com/VisualAIKHU/MonoWAD.

* Accepted by ECCV 2024

FYI: Flip Your Images for Dataset Distillation

Jul 11, 2024

Byunggwan Son, Youngmin Oh, Donghyeon Baek, Bumsub Ham

Abstract:Dataset distillation synthesizes a small set of images from a large-scale real dataset such that synthetic and real images share similar behavioral properties (e.g., distributions of gradients or features) during a training process. Through extensive analyses on current methods and real datasets, together with empirical observations, we share in this paper two important findings for dataset distillation. First, object parts that appear on one side of a real image are highly likely to appear on the opposite side of another image within a dataset, which we call the bilateral equivalence. Second, the bilateral equivalence forces synthetic images to duplicate discriminative parts of objects on both the left and right sides of the images, limiting the recognition of subtle differences between objects. To address this problem, we introduce a surprisingly simple yet effective technique for dataset distillation, dubbed FYI, that enables distilling rich semantics of real images into synthetic ones. To this end, FYI embeds a horizontal flipping technique into distillation processes, mitigating the influence of the bilateral equivalence, while capturing more details of objects. Experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet demonstrate that FYI can be seamlessly integrated into several state-of-the-art methods, without modifying training objectives and network architectures, and it improves the performance remarkably.

* Accepted to ECCV 2024
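
The horizontal flipping operation that the abstract above builds on can be shown in a few lines. The abstract does not say exactly where flipping is inserted into a distillation pipeline, so this sketch shows only the flip itself and a hypothetical helper, with_flips, that pairs each image with its mirror; images are assumed to be plain lists of pixel rows.

```python
def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def with_flips(images):
    """Return each image followed by its horizontally flipped copy."""
    out = []
    for img in images:
        out.append(img)
        out.append(hflip(img))
    return out


# A 2x3 toy image whose "object part" (value 9) sits on the left side.
image = [
    [9, 0, 0],
    [9, 1, 0],
]
flipped = hflip(image)
print(flipped)
```

Flipping twice returns the original image, so the operation adds a mirrored view without discarding any pixel information.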

INSIGHT: Universal Neural Simulator for Analog Circuits Harnessing Autoregressive Transformers

Jul 10, 2024

Souradip Poddar, Youngmin Oh, Yao Lai, Hanqing Zhu, Bosun Hwang, David Z. Pan

Abstract:Analog front-end design heavily relies on specialized human expertise and costly trial-and-error simulations, which has motivated many prior works on analog design automation. However, efficient and effective exploration of the vast and complex design space remains constrained by the time-consuming nature of CPU-based SPICE simulations, making effective design automation a challenging endeavor. In this paper, we introduce INSIGHT, a GPU-powered, technology-independent, effective universal neural simulator in the analog front-end design automation loop. INSIGHT accurately predicts the performance metrics of analog circuits across various technology nodes, significantly reducing inference time. Notably, its autoregressive capabilities enable INSIGHT to accurately predict simulation-costly critical transient specifications leveraging less expensive performance metric information. Its low cost and high fidelity make INSIGHT a good substitute for standard simulators in analog front-end optimization frameworks. INSIGHT is compatible with any optimization framework, facilitating enhanced design space exploration for sample efficiency through sophisticated offline learning and adaptation techniques. Our experiments demonstrate that INSIGHT-M, a model-based batch reinforcement learning framework that leverages INSIGHT for analog sizing, achieves at least 50X improvement in sample efficiency across circuits. To the best of our knowledge, this marks the first use of autoregressive transformers in analog front-end design.

Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning

Mar 08, 2024

Hongjoon Ahn, Jinu Hyeon, Youngmin Oh, Bosun Hwang, Taesup Moon

Abstract:We argue that one of the main obstacles to developing effective Continual Reinforcement Learning (CRL) algorithms is the negative transfer issue that occurs when a new task to learn arrives. Through comprehensive experimental validation, we demonstrate that such an issue frequently exists in CRL and cannot be effectively addressed by several recent works on mitigating plasticity loss of RL agents. To that end, we develop Reset & Distill (R&D), a simple yet highly effective method, to overcome the negative transfer problem in CRL. R&D combines a strategy of resetting the agent's online actor and critic networks to learn a new task and an offline learning step for distilling the knowledge from the online actor and previous expert's action probabilities. We carried out extensive experiments on long sequences of Meta-World tasks and show that our method consistently outperforms recent baselines, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of considering negative transfer in CRL and emphasize the need for robust strategies like R&D to mitigate its detrimental effects.

ACLS: Adaptive and Conditional Label Smoothing for Network Calibration

Aug 24, 2023

Hyekang Park, Jongyoun Noh, Youngmin Oh, Donghyeon Baek, Bumsub Ham

Abstract:We address the problem of network calibration, which adjusts the miscalibrated confidences of deep neural networks. Many approaches to network calibration adopt a regularization-based method that exploits a regularization term to smooth the miscalibrated confidences. Although these approaches have proven effective at calibrating networks, there is still a lack of understanding of the underlying principles of regularization in terms of network calibration. We present in this paper an in-depth analysis of existing regularization-based methods, providing a better understanding of how they affect network calibration. Specifically, we have observed that 1) the regularization-based methods can be interpreted as variants of label smoothing, and 2) they do not always behave desirably. Based on the analysis, we introduce a novel loss function, dubbed ACLS, that unifies the merits of existing regularization methods, while avoiding the limitations. We show extensive experimental results for image classification and semantic segmentation on standard benchmarks, including CIFAR10, Tiny-ImageNet, ImageNet, and PASCAL VOC, demonstrating the effectiveness of our loss function.

* Accepted to ICCV 2023 (Oral presentation)
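
For readers unfamiliar with the label smoothing that the abstract above refers to, the standard uniform variant can be sketched as follows. This is the common textbook formulation, not the ACLS loss itself; the function name smooth_labels and the default epsilon are illustrative choices.

```python
def smooth_labels(num_classes, target, epsilon=0.1):
    """Standard uniform label smoothing for a single example.

    The one-hot target is mixed with a uniform distribution:
    q_k = (1 - epsilon) * [k == target] + epsilon / num_classes
    """
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) * (1.0 if k == target else 0.0) + uniform
        for k in range(num_classes)
    ]


# Four classes, true class index 2, smoothing strength 0.2.
soft_target = smooth_labels(num_classes=4, target=2, epsilon=0.2)
print(soft_target)
```

The smoothed target still sums to one, but the true class no longer receives all of the probability mass, which discourages overconfident predictions.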

ALIFE: Adaptive Logit Regularizer and Feature Replay for Incremental Semantic Segmentation

Oct 13, 2022

Youngmin Oh, Donghyeon Baek, Bumsub Ham

Abstract:We address the problem of incremental semantic segmentation (ISS), which recognizes novel object/stuff categories continually without forgetting previous ones that have been learned. The catastrophic forgetting problem is particularly severe in ISS, since pixel-level ground-truth labels are available only for the novel categories at training time. To address the problem, regularization-based methods exploit probability calibration techniques to learn semantic information from unlabeled pixels. While such techniques are effective, there is still a lack of theoretical understanding of them. Replay-based methods propose to memorize a small set of images for previous categories. They achieve state-of-the-art performance at the cost of a large memory footprint. We propose in this paper a novel ISS method, dubbed ALIFE, that provides a better compromise between accuracy and efficiency. To this end, we first present an in-depth analysis of the calibration techniques to better understand their effects on ISS. Based on this, we then introduce an adaptive logit regularizer (ALI) that enables our model to better learn new categories, while retaining knowledge for previous ones. We also present a feature replay scheme that memorizes features, instead of images directly, in order to reduce memory requirements significantly. Since a feature extractor is changed continually, memorized features should also be updated at every incremental stage. To handle this, we introduce category-specific rotation matrices updating the features for each category separately. We demonstrate the effectiveness of our approach with extensive experiments on standard ISS benchmarks, and show that our method achieves a better trade-off in terms of accuracy and efficiency.

* Accepted to NeurIPS 2022

Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation

Oct 12, 2022

Donghyeon Baek, Youngmin Oh, Sanghoon Lee, Junghyup Lee, Bumsub Ham

Abstract:Class-incremental semantic segmentation (CISS) labels each pixel of an image with a corresponding object/stuff class continually. To this end, it is crucial to learn novel classes incrementally without forgetting previously learned knowledge. Current CISS methods typically use a knowledge distillation (KD) technique for preserving classifier logits, or freeze a feature extractor, to avoid the forgetting problem. The strong constraints, however, prevent learning discriminative features for novel classes. We introduce a CISS framework that alleviates the forgetting problem and facilitates learning novel classes effectively. We have found that a logit can be decomposed into two terms. They quantify how likely it is that an input does or does not belong to a particular class, providing a clue to the reasoning process of a model. The KD technique, in this context, preserves only the sum of the two terms (i.e., a class logit), so each term could still change, and thus the KD does not imitate the reasoning process. To impose constraints on each term explicitly, we propose a new decomposed knowledge distillation (DKD) technique, improving the rigidity of a model and addressing the forgetting problem more effectively. We also introduce a novel initialization method to train new classifiers for novel classes. In CISS, the number of negative training samples for novel classes is not sufficient to discriminate old classes. To mitigate this, we propose to transfer knowledge of negatives to the classifiers successively using an auxiliary classifier, boosting the performance significantly. Experimental results on standard CISS benchmarks demonstrate the effectiveness of our framework.

* Accepted to NeurIPS 2022
