Wonjun Hwang

Ranked Entropy Minimization for Continual Test-Time Adaptation

May 22, 2025

Jisu Han, Jaemin Na, Wonjun Hwang

Abstract: Test-time adaptation aims to adapt to realistic environments in an online manner by learning during test time. Entropy minimization has emerged as a principal strategy for test-time adaptation due to its efficiency and adaptability. Nevertheless, it remains underexplored in continual test-time adaptation, where stability is more important. We observe that the entropy minimization method often suffers from model collapse, where the model converges to predicting a single class for all images due to a trivial solution. We propose ranked entropy minimization to mitigate the stability problem of the entropy minimization method and extend its applicability to continuous scenarios. Our approach explicitly structures the prediction difficulty through a progressive masking strategy. Specifically, it gradually aligns the model's probability distributions across different levels of prediction difficulty while preserving the rank order of entropy. The proposed method is extensively evaluated across various benchmarks, demonstrating its effectiveness through empirical results. Our code is available at https://github.com/pilsHan/rem

* ICML 2025
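
The collapse described in the abstract above can be illustrated with a minimal sketch. It is not taken from the paper's code: the helper names and the toy logits are assumptions chosen for illustration. It computes the batch-averaged Shannon entropy of softmax predictions and shows that mapping every input to the same class with high confidence already drives that quantity to nearly zero, which is why plain entropy minimization admits a trivial solution.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(batch_logits):
    """Batch-averaged prediction entropy (the quantity being minimized)."""
    return sum(entropy(softmax(row)) for row in batch_logits) / len(batch_logits)

# Toy logits (hypothetical): two near-uniform predictions ...
uncertain = [[0.1, 0.0, -0.1], [0.0, 0.2, 0.1]]
# ... versus a collapsed model that assigns every input to class 0.
collapsed = [[20.0, 0.0, 0.0], [20.0, 0.0, 0.0]]

print(mean_entropy(uncertain))  # close to log(3), the maximum for 3 classes
print(mean_entropy(collapsed))  # close to 0, although the predictions are uninformative
```

The objective alone cannot tell a confident correct model from a collapsed one, which is the stability problem the abstract refers to.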

SCHNet: SAM Marries CLIP for Human Parsing

Mar 28, 2025

Kunliang Liu, Jianming Wang, Rize Jin, Wonjun Hwang, Tae-Sun Chung

Abstract: Vision Foundation Models (VFMs) such as the Segment Anything Model (SAM) and the Contrastive Language-Image Pre-training model (CLIP) have shown promising performance on segmentation and detection tasks. However, although SAM excels at fine-grained segmentation, it faces major challenges when applied to semantic-aware segmentation. While CLIP exhibits strong semantic understanding by aligning the global features of language and vision, it is deficient in fine-grained segmentation tasks. Human parsing requires segmenting human bodies into constituent parts and involves both accurate fine-grained segmentation and high-level semantic understanding of each part. Based on the traits of SAM and CLIP, we formulate highly efficient modules that integrate their features to benefit human parsing. We propose a Semantic-Refinement Module to integrate the semantic features of CLIP with SAM features. Moreover, we formulate a highly efficient Fine-tuning Module to adjust the pretrained SAM for human parsing, which needs high semantic information and simultaneously demands spatial details; this significantly reduces the training time compared with full-time training and achieves notable performance. Extensive experiments demonstrate the effectiveness of our method on the LIP, PPP, and CIHP databases.

Semantic Prompting with Image-Token for Continual Learning

Mar 18, 2024

Jisu Han, Jaemin Na, Wonjun Hwang

Abstract: Continual learning aims to refine model parameters for new tasks while retaining knowledge from previous tasks. Recently, prompt-based learning has emerged, in which pre-trained models are prompted to learn subsequent tasks without relying on a rehearsal buffer. Although this approach has demonstrated outstanding results, existing methods depend on a preceding task-selection process to choose appropriate prompts. However, imperfect task selection may harm performance, particularly in scenarios where the number of tasks is large or task distributions are imbalanced. To address this issue, we introduce I-Prompt, a task-agnostic approach that focuses on the visual semantic information of image tokens to eliminate task prediction. Our method consists of semantic prompt matching, which determines prompts based on similarities between tokens, and image token-level prompting, which applies prompts directly to image tokens in the intermediate layers. Consequently, our method achieves competitive performance on four benchmarks while significantly reducing training time compared to state-of-the-art methods. Moreover, we demonstrate the superiority of our method across various scenarios through extensive experiments.

OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation

Mar 18, 2024

Seungbeom Woo, Geonwoo Baek, Taehoon Kim, Jaemin Na, Joong-won Hwang, Wonjun Hwang

Abstract: Multi-target domain adaptation (MTDA) for semantic segmentation poses a significant challenge, as it involves multiple target domains with varying distributions. The goal of MTDA is to minimize the domain discrepancies among a single source and multi-target domains, aiming to train a single model that excels across all target domains. Previous MTDA approaches typically employ multiple teacher architectures, where each teacher specializes in one target domain to simplify the task. However, these architectures hinder the student model from fully assimilating comprehensive knowledge from all target-specific teachers and escalate training costs with increasing target domains. In this paper, we propose an ouroboric domain bridging (OurDB) framework, offering an efficient solution to the MTDA problem using a single teacher architecture. This framework dynamically cycles through multiple target domains, aligning each domain individually to restrain the biased alignment problem, and utilizes Fisher information to minimize the forgetting of knowledge from previous target domains. We also propose a context-guided class-wise mixup (CGMix) that leverages contextual information tailored to diverse target contexts in MTDA. Experimental evaluations conducted on four urban driving datasets (i.e., GTA5, Cityscapes, IDD, and Mapillary) demonstrate the superiority of our method over existing state-of-the-art approaches.
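
The abstract above states only that Fisher information is used to limit forgetting; it does not give the exact formulation. As a rough illustration of the general idea, the sketch below assumes a diagonal empirical Fisher estimate and an elastic-weight-consolidation style quadratic penalty. The function names, the `strength` parameter, and the toy gradients are all hypothetical and are not claimed to match OurDB.

```python
def diagonal_fisher(per_sample_grads):
    """Diagonal empirical Fisher: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]

def fisher_penalty(params, anchor, fisher, strength=1.0):
    """Quadratic penalty that is large when important parameters drift.

    Assumed EWC-style form: 0.5 * strength * sum_i F_i * (p_i - a_i)^2.
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for f, p, a in zip(fisher, params, anchor)
    )

# Toy gradients (hypothetical): only parameter 0 mattered on the earlier domain.
grads = [[1.0, 0.0], [3.0, 0.0]]
fisher = diagonal_fisher(grads)          # [5.0, 0.0]
anchor = [0.0, 0.0]                      # weights after the earlier domain

print(fisher_penalty([1.0, 0.0], anchor, fisher))  # 2.5: moving an important weight is costly
print(fisher_penalty([0.0, 1.0], anchor, fisher))  # 0.0: moving an unimportant weight is free
```

Under these assumptions, parameters with high Fisher values are held close to their earlier values while the rest remain free to adapt to the next target domain.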

D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection

Mar 14, 2024

Dinh Phat Do, Taehoon Kim, Jaemin Na, Jiwon Kim, Keonho Lee, Kyunghwan Cho, Wonjun Hwang

Abstract: Domain adaptation for object detection typically entails transferring knowledge from one visible domain to another visible domain. However, there are limited studies on adapting from the visible to the thermal domain, because the domain gap between the visible and thermal domains is much larger than expected, and traditional domain adaptation cannot successfully facilitate learning in this situation. To overcome this challenge, we propose a Distinctive Dual-Domain Teacher (D3T) framework that employs distinct training paradigms for each domain. Specifically, we segregate the source and target training sets to build dual teachers and successively apply exponential moving average updates from the student model to the individual teacher of each domain. The framework further incorporates a zigzag learning method between the dual teachers, facilitating a gradual transition from the visible to the thermal domain during training. We validate the superiority of our method through newly designed experimental protocols with well-known thermal datasets, i.e., FLIR and KAIST. Source code is available at https://github.com/EdwardDo69/D3T.

* Accepted by CVPR 2024. Link: https://github.com/EdwardDo69/D3T

Switching Temporary Teachers for Semi-Supervised Semantic Segmentation

Oct 28, 2023

Jaemin Na, Jung-Woo Ha, Hyung Jin Chang, Dongyoon Han, Wonjun Hwang

Abstract: The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's. However, EMA updates raise a problem in that the weights of the teacher and student become coupled, causing a potential performance bottleneck. Furthermore, this problem may become more severe when training with more complicated labels such as segmentation masks but with little annotated data. This paper introduces Dual Teacher, a simple yet effective approach that employs dual temporary teachers to alleviate the coupling problem for the student. The temporary teachers work in shifts and are progressively improved, thereby consistently preventing the teacher and student from becoming excessively close. Specifically, the temporary teachers periodically take turns generating pseudo-labels to train a student model and maintain the distinct characteristics of the student model for each epoch. Consequently, Dual Teacher achieves competitive performance on the PASCAL VOC, Cityscapes, and ADE20K benchmarks with remarkably shorter training times than state-of-the-art methods. Moreover, we demonstrate that our approach is model-agnostic and compatible with both CNN- and Transformer-based models. Code is available at https://github.com/naver-ai/dual-teacher.

* NeurIPS-2023
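
The coupling problem mentioned in the abstract above follows directly from the form of the EMA update. The sketch below is an illustration only and does not reproduce the Dual Teacher method: the decay value, the two-element weight vectors, and the fixed student are assumptions made to keep the example small. It shows a single teacher's weights converging toward the student's under repeated EMA updates.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def gap(a, b):
    """Largest absolute difference between two weight vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

teacher = [0.0, 0.0]
student = [1.0, -2.0]  # held fixed here purely for illustration

initial_gap = gap(teacher, student)
for _ in range(500):
    teacher = ema_update(teacher, student)
final_gap = gap(teacher, student)

print(initial_gap)  # 2.0
print(final_gap)    # shrinks by a factor of decay ** 500
```

Because each update shrinks the gap by the factor `decay`, a single EMA teacher inevitably tracks the student closely, which is the motivation the abstract gives for switching between temporary teachers.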

SRIL: Selective Regularization for Class-Incremental Learning

May 09, 2023

Jisu Han, Jaemin Na, Wonjun Hwang

Abstract: Human intelligence gradually accepts new information and accumulates knowledge throughout the lifespan. However, deep learning models suffer from a catastrophic forgetting phenomenon, where they forget previous knowledge when acquiring new information. Class-Incremental Learning aims to create an integrated model that balances plasticity and stability to overcome this challenge. In this paper, we propose a selective regularization method that accepts new knowledge while maintaining previous knowledge. We first introduce an asymmetric feature distillation method for old and new classes inspired by cognitive science, using the gradient of classification and knowledge distillation losses to determine whether to perform pattern completion or pattern separation. We also propose a method to selectively interpolate the weight of the previous model for a balance between stability and plasticity, and we adjust whether to transfer through model confidence to ensure the performance of the previous class and enable exploratory learning. We validate the effectiveness of the proposed method, which surpasses the performance of existing methods through extensive experimental protocols using CIFAR-100, ImageNet-Subset, and ImageNet-Full.

* 10 pages, 7 figures

ORC: Network Group-based Knowledge Distillation using Online Role Change

Jun 01, 2022

Junyong Choi, Hyeon Cho, Seockhwa Jeong, Wonjun Hwang

Abstract: In knowledge distillation, since a single, omnipotent teacher network cannot solve all problems, knowledge distillation methods based on multiple teachers have been studied recently. However, their improvements are sometimes not as good as expected because some immature teachers may transfer false knowledge to the student. In this paper, to overcome this limitation and exploit the efficacy of multiple networks, we divide the networks into a teacher group and a student group. That is, the student group is a set of immature networks that need to learn the teacher's knowledge, while the teacher group consists of the selected networks that have performed well. Furthermore, according to our online role change strategy, the top-ranked networks in the student group can be promoted to the teacher group at every iteration, and vice versa. After training the teacher group using the error images of the student group to refine the teacher group's knowledge, we transfer the collective knowledge from the teacher group to the student group. We verify the superiority of the proposed method on CIFAR-10 and CIFAR-100, where it achieves high performance. We further show the generality of our method with various backbone architectures such as ResNet, WRN, VGG, MobileNet, and ShuffleNet.
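
The group reassignment described in the abstract above can be sketched as a simple ranking step. This is a simplified illustration, not the authors' implementation: ranking by a single scalar score per network, keeping the teacher group at a fixed size, and the network names and scores are all assumptions.

```python
def reassign_roles(scores, num_teachers):
    """Split network names into (teachers, students) by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:num_teachers], ranked[num_teachers:]

# Hypothetical per-network scores at one iteration.
scores = {"net_a": 0.71, "net_b": 0.64, "net_c": 0.69, "net_d": 0.58}
teachers, students = reassign_roles(scores, num_teachers=2)
print(teachers, students)  # ['net_a', 'net_c'] ['net_b', 'net_d']

# At a later iteration net_b improves, so it is promoted and net_c is demoted.
scores["net_b"] = 0.73
teachers, students = reassign_roles(scores, num_teachers=2)
print(teachers, students)  # ['net_b', 'net_a'] ['net_c', 'net_d']
```

Re-running the ranking at every iteration is what makes the roles "online": no network is permanently a teacher or a student.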

itKD: Interchange Transfer-based Knowledge Distillation for 3D Object Detection

May 31, 2022

Hyeon Cho, Junyong Choi, Geonwoo Baek, Wonjun Hwang

Abstract: Recently, point-cloud-based 3D object detectors have achieved remarkable progress. However, most studies are limited to developing deep learning architectures that improve accuracy alone. In this paper, we propose an autoencoder-style framework comprising channel-wise compression and decompression via interchange transfer for knowledge distillation. To learn the map-view feature of a teacher network, the features from the teacher and student networks are independently passed through a shared autoencoder; here, we use a compressed representation loss that binds the channel-wise compression knowledge from both networks as a kind of regularization. The decompressed features are transferred in opposite directions to reduce the gap in the interchange reconstructions. Lastly, we present an attentive head loss for matching the pivotal detection information drawn by the multi-head self-attention mechanism. Through extensive experiments, we verify that our method can learn a lightweight model that is well aligned with the 3D point cloud detection task, and we demonstrate its superiority using the well-known public datasets Waymo and nuScenes.

* 12 pages, 2 figures, 8 tables

Contrastive Vicinal Space for Unsupervised Domain Adaptation

Dec 05, 2021

Jaemin Na, Dongyoon Han, Hyung Jin Chang, Wonjun Hwang

Abstract: Utilizing vicinal space between the source and target domains is one of the recent unsupervised domain adaptation approaches. However, the problem of the equilibrium collapse of labels, where the source labels are dominant over the target labels in the predictions of vicinal instances, has never been addressed. In this paper, we propose an instance-wise minimax strategy that minimizes the entropy of high-uncertainty instances in the vicinal space to tackle it. We divide the vicinal space into two subspaces through the solution of the minimax problem: contrastive space and consensus space. In the contrastive space, inter-domain discrepancy is mitigated by constraining instances to have contrastive views and labels, and the consensus space reduces the confusion between intra-domain categories. The effectiveness of our method is demonstrated on public benchmarks, including Office-31, Office-Home, and VisDA-C, where it achieves state-of-the-art performance. We further show that our method outperforms current state-of-the-art methods on PACS, which indicates that our instance-wise approach also works well for multi-source domain adaptation.

* 10 pages, 7 figures, 5 tables
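
For readers unfamiliar with the term, a vicinal instance is commonly formed as a convex combination of a source sample and a target sample, as in mixup. The sketch below assumes that mixup-style construction. The feature vectors, label vectors, and the mixing ratio `lam` are hypothetical, and the paper's instance-wise minimax selection of the ratio is not reproduced here.

```python
def mix(source, target, lam):
    """Convex combination lam * source + (1 - lam) * target, element-wise."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]

# Hypothetical 2-D features and one-hot (source) / pseudo (target) labels.
x_source, y_source = [1.0, 0.0], [1.0, 0.0]
x_target, y_target = [0.0, 1.0], [0.0, 1.0]

lam = 0.7
x_vicinal = mix(x_source, x_target, lam)
y_vicinal = mix(y_source, y_target, lam)
print(x_vicinal, y_vicinal)  # source contributes about 0.7, target about 0.3
```

With `lam` near 1 the vicinal instance sits close to the source domain and with `lam` near 0 close to the target domain, so the mixing ratio determines which domain's label dominates the mixed label.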
