Title: Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

Jun 15, 2022

Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

Abstract: This technical report describes the SViT approach for the Ego4D Point of No Return (PNR) Temporal Localization Challenge. We propose a learning framework, StructureViT (SViT for short), which demonstrates how utilizing the structure of a small number of images, available only during training, can improve a video model. SViT relies on two key insights. First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos. Second, the scene representations of individual frames in a video should "align" with those of still images. This is achieved via a "Frame-Clip Consistency" loss, which ensures the flow of structured information between images and videos. SViT obtains strong performance on the challenge test set with 0.656 absolute temporal localization error.
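
To make the "Frame-Clip Consistency" idea concrete, here is a minimal sketch of one way such a loss could be computed. It is not the authors' implementation: the abstract does not state the distance function, so the mean squared difference used here is an assumption, and the function name `frame_clip_consistency` and the nested-list token layout are hypothetical choices made for illustration. The sketch compares, frame by frame, the object tokens produced when a frame is processed inside the clip with the object tokens produced when the same frame is processed as a still image.

```python
from typing import List

Vector = List[float]             # one object-token embedding
FrameTokens = List[Vector]       # object tokens for a single frame
ClipTokens = List[FrameTokens]   # object tokens for every frame of a clip


def frame_clip_consistency(clip_tokens: ClipTokens,
                           still_tokens: ClipTokens) -> float:
    """Mean squared difference between two sets of object tokens.

    clip_tokens:  tokens for each frame when processed as part of the clip.
    still_tokens: tokens for the same frames when processed as still images.
    Assumption: squared L2 distance; the abstract does not specify the metric.
    """
    if len(clip_tokens) != len(still_tokens):
        raise ValueError("both inputs must cover the same number of frames")

    total = 0.0
    count = 0
    for clip_frame, still_frame in zip(clip_tokens, still_tokens):
        if len(clip_frame) != len(still_frame):
            raise ValueError("each frame must have the same number of tokens")
        for clip_tok, still_tok in zip(clip_frame, still_frame):
            for a, b in zip(clip_tok, still_tok):
                total += (a - b) ** 2
                count += 1
    # An empty input contributes no penalty.
    return total / count if count else 0.0
```

The value is zero when the two representations agree exactly and grows as they diverge, which is the "alignment" behavior the abstract describes. In a real training setup this term would be computed on model outputs and added to the task loss.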

* Ego4D CVPR22 Object State Localization challenge. arXiv admin note: substantial text overlap with arXiv:2206.06346
