Title: GenRec: Unifying Video Generation and Recognition with Diffusion Models

Aug 27, 2024

Zejia Weng, Xitong Yang, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang

Abstract: Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we investigate whether such priors, derived from a generative process, are suitable for video recognition and, eventually, for joint optimization of generation and recognition. Building upon Stable Video Diffusion, we introduce GenRec, the first unified framework trained with a random-frame conditioning process so as to learn generalized spatial-temporal representations. The resulting framework naturally supports both generation and recognition and, more importantly, is robust even when visual inputs contain limited information. Extensive experiments demonstrate the efficacy of GenRec for both recognition and generation. In particular, GenRec achieves competitive recognition performance, with 75.8% and 87.2% accuracy on SSV2 and K400, respectively. GenRec also achieves the best class-conditioned image-to-video generation results, with FVD scores of 46.5 and 49.3 on the SSV2 and EK-100 datasets, respectively. Furthermore, GenRec demonstrates extraordinary robustness in scenarios where only a limited number of frames can be observed.

* 17 pages, 6 figures, 7 tables
