Title: Multi-sentence Video Grounding for Long Video Generation

Jul 18, 2024

Wei Feng, Xin Wang, Hong Chen, Zeyang Zhang, Wenwu Zhu

Figures 1 to 4 for Multi-sentence Video Grounding for Long Video Generation

Abstract: Video generation has seen great success recently, but generating long videos remains challenging because it is difficult to maintain temporal consistency across the generated video and because the memory cost during generation is high. To tackle these problems, we propose the new idea of Multi-sentence Video Grounding for Long Video Generation, which connects massive video moment retrieval to the video generation task for the first time and provides a new paradigm for long video generation. Our method can be summarized in three steps: (i) We design sequential scene text prompts as the queries for video grounding, and use massive video moment retrieval to search a video database for moment segments that meet the text requirements. (ii) Based on the source frames of the retrieved video moment segments, we adopt video editing methods to create new video content while preserving the temporal consistency of the retrieved video. Since the editing can be conducted segment by segment, and even frame by frame, it largely reduces the memory cost. (iii) We also explore video morphing and personalized generation methods to improve the subject consistency of long video generation, and provide ablation experimental results for the subtasks of long video generation. Our approach extends developments in image/video editing, video morphing and personalized generation, and video grounding to long video generation, offering effective solutions for generating long videos at low memory cost.
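
Steps (i) and (ii) above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Moment` record, the word-overlap score (a toy stand-in for a learned video moment retrieval model), and the `edit_segment` placeholder (a stand-in for a real video editing method) are all assumptions made for illustration. Step (iii), video morphing and personalized generation, is left out.

```python
from dataclasses import dataclass


@dataclass
class Moment:
    """A candidate video moment in the database (hypothetical record layout)."""
    video_id: str
    start: float   # seconds
    end: float     # seconds
    caption: str   # text proxy for the moment's content (assumption)


def overlap_score(query: str, caption: str) -> float:
    """Toy relevance score: fraction of query words found in the caption.

    Assumption: a real system would use a trained grounding model instead.
    """
    q = set(query.lower().split())
    c = set(caption.lower().split())
    return len(q & c) / len(q) if q else 0.0


def ground_prompts(prompts: list[str], database: list[Moment]) -> list[Moment]:
    """Step (i): for each scene prompt, in order, pick the best-scoring moment."""
    return [max(database, key=lambda m: overlap_score(p, m.caption)) for p in prompts]


def edit_segment(moment: Moment, prompt: str) -> dict:
    """Step (ii) placeholder: record which source span would be edited with which prompt."""
    return {"source": (moment.video_id, moment.start, moment.end), "prompt": prompt}


def generate_long_video(prompts: list[str], database: list[Moment]) -> list[dict]:
    """Process one segment at a time, so only one segment needs to be held in memory."""
    segments = []
    for prompt, moment in zip(prompts, ground_prompts(prompts, database)):
        segments.append(edit_segment(moment, prompt))
    return segments


database = [
    Moment("v1", 0.0, 5.0, "a dog runs on the beach"),
    Moment("v1", 5.0, 9.0, "a dog swims in the sea"),
    Moment("v2", 2.0, 6.0, "a man cooks in a kitchen"),
]
prompts = ["a cat runs on the beach", "a cat swims in the sea"]
plan = generate_long_video(prompts, database)
```

Here each prompt asks for a new subject (a cat) in a scene whose motion already exists in the database, so the retrieved segment supplies the temporally consistent source frames and the editing step would supply the new content.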
