Title: Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup

Mar 04, 2025

Seokun Kang, Taehwan Kim

Abstract: Video action recognition is a challenging but important task for understanding what happens in a video. However, annotating videos is costly, so semi-supervised learning (SSL) has been studied as a way to improve performance with only a small amount of labeled data. Prior work on semi-supervised video action recognition has mostly relied on a single modality, visuals. Video is multi-modal, however, and using both visuals and audio is desirable and could improve performance further, yet this direction has not been well explored. We therefore propose audio-visual SSL for video action recognition, which uses the visual and audio modalities together even when only a small amount of labeled data is available, a challenging setting. In addition, to make the most of the information in both audio and video, we propose a novel audio source localization-guided mixup method that considers the inter-modal relations between the video and audio modalities. Experiments on the UCF-51, Kinetics-400, and VGGSound datasets demonstrate the superior performance of the proposed semi-supervised audio-visual action recognition framework and of the audio source localization-guided mixup.
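The abstract does not spell out how the mixup is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It shows standard mixup (a convex combination of two samples and their labels) applied to both modalities with one shared coefficient, plus a hypothetical variant in which a per-pixel localization map supplies the mixing weights. All names here (`mixup_pair`, `mixup_audio_visual`, `localization_guided_mix`, the `visual`/`audio`/`label` keys) and the choice to derive the label weight from the mean of the map are assumptions made for illustration.

```python
import random


def mixup_pair(x_a, x_b, lam):
    """Standard mixup: convex combination of two equal-length lists."""
    if len(x_a) != len(x_b):
        raise ValueError("inputs must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def mixup_audio_visual(sample_a, sample_b, lam):
    """Mix visual features, audio features, and one-hot labels
    with a single shared coefficient (an assumption of this sketch)."""
    return {
        key: mixup_pair(sample_a[key], sample_b[key], lam)
        for key in ("visual", "audio", "label")
    }


def localization_guided_mix(frame_a, frame_b, loc_map):
    """Hypothetical variant: loc_map holds per-pixel weights in [0, 1],
    e.g. from an audio source localization model. Pixels with a high
    weight are taken from frame_a, the rest from frame_b. The label
    weight is the mean of the map (an illustrative choice)."""
    if not (len(frame_a) == len(frame_b) == len(loc_map)):
        raise ValueError("frames and map must have the same length")
    mixed = [m * a + (1.0 - m) * b
             for a, b, m in zip(frame_a, frame_b, loc_map)]
    lam = sum(loc_map) / len(loc_map)
    return mixed, lam


def sample_lambda(alpha=1.0):
    """Draw a mixing coefficient from Beta(alpha, alpha), as in standard mixup."""
    return random.betavariate(alpha, alpha)
```

In a training loop, `sample_lambda()` would supply the coefficient for `mixup_audio_visual`, while `localization_guided_mix` replaces the scalar coefficient with a spatial map so that the region associated with the sound source dominates the mixed frame.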
