Title: Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency

Apr 06, 2022

Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Changxin Gao, Rong Jin, Nong Sang

Abstract: Natural videos provide rich visual content for self-supervised learning. Yet most existing approaches for learning spatio-temporal representations rely on manually trimmed videos, which limits the diversity of visual patterns and the achievable performance gain. In this work, we aim to learn representations by leveraging the more abundant information in untrimmed videos. To this end, we propose to learn a hierarchy of consistencies in videos, i.e., visual consistency and topical consistency, corresponding respectively to clip pairs that tend to be visually similar when separated by a short time span and to share similar topics when separated by a long time span. Specifically, we present a hierarchical consistency learning framework, HiCo, in which visually consistent pairs are encouraged to have the same representation through contrastive learning, while topically consistent pairs are coupled through a topical classifier that distinguishes whether they are topic-related. Further, we introduce a gradual sampling algorithm for the proposed hierarchical consistency learning and demonstrate its theoretical superiority. Empirically, we show that HiCo not only generates stronger representations on untrimmed videos but also improves representation quality when applied to trimmed videos. This is in contrast to standard contrastive learning, which fails to learn appropriate representations from untrimmed videos.

* CVPR 2022; project page: https://hico-cvpr2022.github.io/
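
The clip-pair sampling described in the abstract can be sketched roughly as follows. This is an illustrative sketch and not the authors' implementation: the function names, the parameters (`short_span`, `step`, `total_steps`), and the linear growth schedule are all assumptions, since the abstract states only that visually consistent pairs lie a short time span apart, that topically consistent pairs lie a long time span apart, and that sampling is gradual.

```python
import random


def max_topical_span(step, total_steps, longest_span, short_span):
    """Largest allowed distance (seconds) between topically paired clips.

    Assumption: the span grows linearly from short_span to longest_span
    over training. The abstract does not specify the schedule.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return short_span + progress * (longest_span - short_span)


def sample_clip_starts(video_len, clip_len, short_span, step, total_steps,
                       rng=random):
    """Return (anchor, visual, topical) clip start times in seconds.

    All three clips come from the same untrimmed video (hypothetical helper).
    """
    if clip_len > video_len:
        raise ValueError("clip_len must not exceed video_len")
    last_start = video_len - clip_len
    anchor = rng.uniform(0.0, last_start)

    # Visually consistent partner: at most short_span away from the anchor.
    lo = max(0.0, anchor - short_span)
    hi = min(last_start, anchor + short_span)
    visual = rng.uniform(lo, hi)

    # Topically consistent partner: allowed distance widens with training step.
    span = max_topical_span(step, total_steps, last_start, short_span)
    lo = max(0.0, anchor - span)
    hi = min(last_start, anchor + span)
    topical = rng.uniform(lo, hi)
    return anchor, visual, topical
```

Under these assumptions, the anchor and visual clips would feed a contrastive loss, while the anchor and topical clips would form a positive input to the topical classifier. How negative pairs are formed is not stated in the abstract, so it is left out of the sketch.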
