Title: Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Jan 18, 2024

Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu (+1 more)

Figures 1-4 for Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Abstract: We introduce a new task, language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled binary masks, a process that is often tedious and labor-intensive. We present the Remove Objects from Videos by Instructions (ROVI) dataset, containing 5,650 videos and 9,091 inpainting results, to support training and evaluation for this task. We also propose a novel diffusion-based language-driven video inpainting framework, the first end-to-end baseline for this task, which integrates Multimodal Large Language Models to understand and execute complex language-based inpainting requests effectively. Our comprehensive results showcase the dataset's versatility and the model's effectiveness in various language-instructed inpainting scenarios. We will make datasets, code, and models publicly available.

* Project Page: https://jianzongwu.github.io/projects/rovi
