Hyun Seok Seong

Temporal Alignment-Free Video Matching for Few-shot Action Recognition

Apr 08, 2025

SuBeen Lee, WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

Abstract: Few-Shot Action Recognition (FSAR) aims to train a model with only a few labeled video instances. A key challenge in FSAR is handling divergent narrative trajectories for precise video matching. While frame- and tuple-level alignment approaches have been promising, they rely heavily on pre-defined and length-dependent alignment units (e.g., frames or tuples), which limits flexibility for actions of varying lengths and speeds. In this work, we introduce a novel TEmporal Alignment-free Matching (TEAM) approach, which eliminates the need for temporal units in action representation and brute-force alignment during matching. Specifically, TEAM represents each video with a fixed set of pattern tokens that capture globally discriminative clues within the video instance regardless of action length or speed, ensuring its flexibility. Furthermore, TEAM is inherently efficient, using token-wise comparisons to measure similarity between videos, unlike existing methods that rely on pairwise comparisons for temporal alignment. Additionally, we propose an adaptation process that identifies and removes common information across classes, establishing clear boundaries even between novel categories. Extensive experiments demonstrate the effectiveness of TEAM. Code is available at github.com/leesb7426/TEAM.

* 10 pages, 7 figures, 6 tables, Accepted to CVPR 2025 as Oral Presentation
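The token-wise comparison described in the TEAM abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity function, so cosine similarity averaged over same-index tokens is an assumption, and the names `cosine`, `tokenwise_similarity`, and `comparison_counts` are hypothetical. The point is only that a fixed set of K tokens per video needs K comparisons, whereas aligning two videos frame by frame needs one comparison per frame pair.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def tokenwise_similarity(tokens_a, tokens_b):
    """Average similarity between same-index pattern tokens of two videos.

    Assumption: token k of video A is compared only with token k of video B,
    and the per-token scores are averaged.
    """
    if len(tokens_a) != len(tokens_b):
        raise ValueError("both videos must use the same number of pattern tokens")
    scores = [cosine(a, b) for a, b in zip(tokens_a, tokens_b)]
    return sum(scores) / len(scores)

def comparison_counts(num_tokens, frames_a, frames_b):
    """Return (token-wise comparisons, pairwise frame comparisons)."""
    return num_tokens, frames_a * frames_b
```

With 4 pattern tokens and two 8-frame videos, `comparison_counts(4, 8, 8)` gives 4 token-wise comparisons against 64 frame pairs, and the token-wise count stays fixed as the videos get longer.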

Foreground-Covering Prototype Generation and Matching for SAM-Aided Few-Shot Segmentation

Jan 01, 2025

Suho Park, SuBeen Lee, Hyun Seok Seong, Jaejoon Yoo, Jae-Pil Heo

Abstract: We propose Foreground-Covering Prototype Generation and Matching to address Few-Shot Segmentation (FSS), which aims to segment target regions in unlabeled query images based on labeled support images. Unlike previous research, which typically estimates target regions in the query using support prototypes and query pixels, we utilize the relationship between support and query prototypes. To achieve this, we utilize two complementary features: SAM Image Encoder features for pixel aggregation and ResNet features for class consistency. Specifically, we construct support and query prototypes with SAM features and distinguish query prototypes of target regions based on ResNet features. For the query prototype construction, we begin by roughly guiding foreground regions within SAM features using the conventional pseudo-mask, then employ iterative cross-attention to aggregate foreground features into learnable tokens. Here, we discover that the cross-attention weights can effectively replace the conventional pseudo-mask. Therefore, we use the attention-based pseudo-mask to guide ResNet features to focus on the foreground, then infuse the guided ResNet feature into the learnable tokens to generate class-consistent query prototypes. The generation of the support prototype is conducted symmetrically to that of the query one, with the pseudo-mask replaced by the ground-truth mask. Finally, we compare these query prototypes with support ones to generate prompts, which subsequently produce object masks through the SAM Mask Decoder. Our state-of-the-art performance on various datasets validates the effectiveness of the proposed method for FSS. Our official code is available at https://github.com/SuhoPark0706/FCP

* Association for the Advancement of Artificial Intelligence (AAAI) 2025

Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation

Jul 17, 2024

Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo

Abstract: The labor-intensive labeling required for semantic segmentation has spurred the emergence of Unsupervised Semantic Segmentation. Recent studies utilize patch-wise contrastive learning based on features from image-level self-supervised pretrained models. However, relying solely on similarity-based supervision from image-level pretrained models often leads to unreliable guidance due to insufficient patch-level semantic representations. To address this, we propose a Progressive Proxy Anchor Propagation (PPAP) strategy. This method gradually identifies more trustworthy positives for each anchor by relocating its proxy to regions densely populated with semantically similar samples. Specifically, we initially establish a tight boundary to gather a few reliable positive samples around each anchor. Then, considering the distribution of positive samples, we relocate the proxy anchor towards areas with a higher concentration of positives and adjust the positiveness boundary based on the propagation degree of the proxy anchor. Moreover, to account for ambiguous regions where positive and negative samples may coexist near the positiveness boundary, we introduce an instance-wise ambiguous zone. Samples within these zones are excluded from the negative set, further enhancing the reliability of the negative set. Our state-of-the-art performance on various datasets validates the effectiveness of the proposed method for Unsupervised Semantic Segmentation.

* Accepted to ECCV 2024
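The propagation steps in the PPAP abstract (tight boundary, proxy relocation, boundary adjustment, ambiguous zone) can be sketched roughly as follows. This is a toy illustration and not the paper's method: it uses Euclidean distance instead of feature similarity, moves the proxy to the centroid of the current positives, and widens the boundary by the distance the proxy moved. All three choices, along with the names `propagate_proxy`, `radius`, `steps`, and `margin`, are assumptions made for the example.

```python
import math

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def propagate_proxy(anchor, samples, radius, steps=2, margin=0.5):
    """Toy proxy-anchor propagation; returns (proxy, positives, ambiguous, negatives)."""
    proxy = list(anchor)
    for _ in range(steps):
        # Gather the samples inside the current (initially tight) boundary.
        positives = [s for s in samples if dist(proxy, s) <= radius]
        if not positives:
            break
        # Relocate the proxy towards the region where positives concentrate.
        centroid = [sum(c) / len(positives) for c in zip(*positives)]
        shift = dist(proxy, centroid)
        proxy = centroid
        # Assumption: the boundary grows by how far the proxy propagated.
        radius += shift
    positives, ambiguous, negatives = [], [], []
    for s in samples:
        d = dist(proxy, s)
        if d <= radius:
            positives.append(s)
        elif d <= radius + margin:
            # Ambiguous zone: kept out of both the positive and negative sets.
            ambiguous.append(s)
        else:
            negatives.append(s)
    return proxy, positives, ambiguous, negatives
```

In a one-dimensional example with the anchor at 0.0, samples at 0.1, 0.2, 0.4, 1.0, and 3.0, and an initial radius of 0.25, the sample at 0.4 becomes a positive only after the proxy has moved, the sample at 1.0 falls into the ambiguous zone, and only 3.0 remains a negative.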

Task-Disruptive Background Suppression for Few-Shot Segmentation

Dec 26, 2023

Suho Park, SuBeen Lee, Sangeek Hyun, Hyun Seok Seong, Jae-Pil Heo

Abstract: Few-shot segmentation aims to accurately segment novel target objects within query images using only a limited number of annotated support images. Recent works exploit the support background as well as its foreground to precisely compute the dense correlations between query and support. However, they overlook the fact that the background generally contains various types of objects. In this paper, we highlight this characteristic of the background, which causes problems in two cases: (1) when the query and support backgrounds are dissimilar and (2) when objects in the support background are similar to the target object in the query. Without any consideration of the above cases, adopting the entire support background leads to a misprediction of the query foreground as background. To address this issue, we propose Task-disruptive Background Suppression (TBS), a module to suppress those disruptive support background features based on two spatial-wise scores: query-relevant and target-relevant scores. The former aims to mitigate the impact of unshared features solely existing in the support background, while the latter aims to reduce the influence of target-similar support background features. Based on these two scores, we define a query background relevant score that captures the similarity between the backgrounds of the query and the support, and utilize it to scale support background features to adaptively restrict the impact of disruptive support backgrounds. Our proposed method achieves state-of-the-art performance on PASCAL-5^i and COCO-20^i datasets on 1-shot segmentation. Our official code is available at github.com/SuhoPark0706/TBSNet.
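The scaling step described in the TBS abstract can be sketched as below. The abstract does not say how the two scores are combined, so the product `q * (1 - t)` is purely an assumption chosen to match the stated intent: a location is suppressed when it is not shared with the query (low query-relevant score) or when it resembles the target (high target-relevant score). The function name and the list-based feature layout are hypothetical.

```python
def suppress_background(features, query_relevant, target_relevant):
    """Scale each support-background feature vector by a combined relevance score.

    features: one feature vector per spatial location.
    query_relevant, target_relevant: one score in [0, 1] per location.
    Assumption: combined score = q * (1 - t); the paper may combine them differently.
    """
    scaled = []
    for feat, q, t in zip(features, query_relevant, target_relevant):
        score = q * (1.0 - t)
        scaled.append([score * x for x in feat])
    return scaled
```

A location with `q = 1.0` and `t = 0.0` passes through unchanged, while one with `q = 0.5` and `t = 0.5` is scaled to a quarter of its original magnitude.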

Task-Oriented Channel Attention for Fine-Grained Few-Shot Classification

Jul 28, 2023

SuBeen Lee, WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

Abstract: The difficulty of fine-grained image classification mainly comes from a shared overall appearance across classes. Thus, recognizing discriminative details, such as eyes and beaks for birds, is key to the task. However, this is particularly challenging when training data is limited. To address this, we propose Task Discrepancy Maximization (TDM), a task-oriented channel attention method tailored for fine-grained few-shot classification with two novel modules: the Support Attention Module (SAM) and the Query Attention Module (QAM). SAM highlights channels encoding class-wise discriminative features, while QAM assigns higher weights to object-relevant channels of the query. Based on these submodules, TDM produces task-adaptive features by focusing on channels that both encode class-discriminative details and are possessed by the query, for an accurate class-sensitive similarity measure between support and query instances. While TDM influences high-level feature maps by task-adaptive calibration of channel-wise importance, we further introduce the Instance Attention Module (IAM), which extends QAM and operates in intermediate layers of feature extractors to highlight object-relevant channels instance-wise. The merits of TDM and IAM and their complementary benefits are experimentally validated in fine-grained few-shot classification tasks. Moreover, IAM is also shown to be effective in coarse-grained and cross-domain few-shot classification.

* arXiv admin note: text overlap with arXiv:2207.01376

Leveraging Hidden Positives for Unsupervised Semantic Segmentation

Mar 27, 2023

Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo

Abstract: The heavy demand for manpower to produce pixel-level annotations has triggered the advent of unsupervised semantic segmentation. Although recent work employing the vision transformer (ViT) backbone shows exceptional performance, there is still a lack of consideration for task-specific training guidance and local semantic consistency. To tackle these issues, we leverage contrastive learning by excavating hidden positives to learn rich semantic relationships and ensure semantic consistency in local regions. Specifically, we first discover two types of global hidden positives, task-agnostic and task-specific ones, for each anchor based on the feature similarities defined by a fixed pre-trained backbone and a segmentation head-in-training, respectively. A gradual increase in the contribution of the latter induces the model to capture task-specific semantic features. In addition, we introduce a gradient propagation strategy to learn semantic consistency between adjacent patches, under the inherent premise that nearby patches are highly likely to possess the same semantics. Specifically, we add a loss that propagates to local hidden positives, i.e., semantically similar nearby patches, in proportion to the predefined similarity scores. With these training schemes, our proposed method achieves new state-of-the-art (SOTA) results on the COCO-stuff, Cityscapes, and Potsdam-3 datasets. Our code is available at: https://github.com/hynnsk/HP.

* Accepted to CVPR 2023

Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition

Nov 24, 2022

WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

Abstract: A dramatic increase in real-world video volume with extremely diverse and emerging topics naturally forms a long-tailed video distribution in terms of categories, and it spotlights the need for Video Long-Tailed Recognition (VLTR). In this work, we summarize the challenges in VLTR and explore how to overcome them. The challenges are: (1) it is impractical to re-train the whole model for high-quality features, (2) acquiring frame-wise labels is extremely costly, and (3) long-tailed data triggers biased training. Yet, most existing works for VLTR unavoidably utilize image-level features extracted from pretrained models, which are task-irrelevant, and learn from video-level labels. Therefore, to deal with such (1) task-irrelevant features and (2) video-level labels, we introduce two complementary learnable feature aggregators. The learnable layers in each aggregator produce task-relevant representations, and each aggregator assembles the snippet-wise knowledge into a video representation. Then, we propose Minority-Oriented Vicinity Expansion (MOVE), which explicitly uses class frequency when approximating the vicinity distributions to alleviate (3) biased training. By combining these solutions, our approach achieves state-of-the-art results on large-scale VideoLT and synthetically induced Imbalanced-MiniKinetics200. With VideoLT features from ResNet-50, it attains 18% and 58% relative improvements on head and tail classes over the previous state-of-the-art method, respectively.

* Accepted to AAAI 2023. Code is available at https://github.com/wjun0830/MOVE

Difficulty-Aware Simulator for Open Set Recognition

Jul 20, 2022

WonJun Moon, Junho Park, Hyun Seok Seong, Cheol-Ho Cho, Jae-Pil Heo

Abstract: Open set recognition (OSR) assumes that unknown instances appear unexpectedly at inference time. The main challenge of OSR is that the response of models to unknowns is totally unpredictable. Furthermore, the diversity of the open set makes it harder since instances have different difficulty levels. Therefore, we present a novel framework, DIfficulty-Aware Simulator (DIAS), that generates fakes with diverse difficulty levels to simulate the real world. We first investigate fakes from a generative adversarial network (GAN) from the classifier's viewpoint and observe that these are not severely challenging. This leads us to define the criteria for difficulty by regarding samples generated with GANs as having moderate difficulty. To produce hard-difficulty examples, we introduce Copycat, which imitates the behavior of the classifier. Furthermore, moderate- and easy-difficulty samples are also yielded by our modified GAN and Copycat, respectively. As a result, DIAS outperforms state-of-the-art methods on both AUROC and F-score. Our code is available at https://github.com/wjun0830/Difficulty-Aware-Simulator.

* Accepted to ECCV 2022. Code is available at github.com/wjun0830/Difficulty-Aware-Simulator
