Title: Structured-Noise Masked Modeling for Video, Audio and Beyond

Mar 20, 2025

Aritra Bhowmik, Fida Mohammad Thoker, Carlos Hinojosa, Bernard Ghanem, Cees G. M. Snoek

Abstract: Masked modeling has emerged as a powerful self-supervised learning framework, but existing methods largely rely on random masking, disregarding the structural properties of different modalities. In this work, we introduce structured noise-based masking, a simple yet effective approach that naturally aligns with the spatial, temporal, and spectral characteristics of video and audio data. By filtering white noise into distinct color noise distributions, we generate structured masks that preserve modality-specific patterns without requiring handcrafted heuristics or access to the data. Our approach improves the performance of masked video and audio modeling frameworks without any computational overhead. Extensive experiments demonstrate that structured noise masking achieves consistent improvement over random masking for standard and advanced masked modeling methods, highlighting the importance of modality-aware masking strategies for representation learning.
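The idea of filtering white noise into a colored-noise distribution and turning it into a mask can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `colored_noise_mask`, the power-law exponent `alpha`, the FFT-based filter, and the top-k thresholding are all assumptions made here for illustration. The sketch scales the spectrum of white noise by 1/f^alpha and masks the positions with the highest filtered values, so the mask depends only on the noise and never on the data.

```python
import numpy as np

def colored_noise_mask(shape, alpha, mask_ratio, seed=None):
    """Boolean mask (True = masked) from spectrally filtered white noise.

    Hypothetical helper, not taken from the paper.
    alpha > 0 emphasises low frequencies (red-like noise, contiguous blobs);
    alpha < 0 emphasises high frequencies (blue-like noise, scattered points);
    alpha = 0 leaves the noise white, which reduces to random masking.
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(shape)

    # Radial frequency magnitude of every FFT bin on the token grid.
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    radius[(0,) * len(shape)] = 1.0  # leave the DC bin unscaled, avoid 0 ** negative

    # Shape the flat (white) spectrum into a 1 / f^alpha spectrum.
    spectrum = np.fft.fftn(white) * radius ** (-alpha)
    noise = np.fft.ifftn(spectrum).real

    # Mask exactly the requested fraction: the positions with the largest values.
    num_masked = int(round(mask_ratio * noise.size))
    order = np.argsort(noise, axis=None)
    mask = np.zeros(noise.size, dtype=bool)
    mask[order[noise.size - num_masked:]] = True
    return mask.reshape(shape)

# Example: an 8 x 14 x 14 grid of video tokens (time, height, width), 75% masked.
video_mask = colored_noise_mask((8, 14, 14), alpha=2.0, mask_ratio=0.75, seed=0)
```

Because the threshold is applied by rank rather than by value, the mask ratio is exact for any `alpha`, and generating a mask costs only one forward and one inverse FFT on the token grid.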
