Yen-Siang Wu

VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models

Mar 27, 2025

Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung, Kai-Po Chang, Fu-En Yang, Yu-Chiang Frank Wang

Abstract: Customized text-to-video generation aims to produce high-quality videos that incorporate user-specified subject identities or motion patterns. However, existing methods mainly focus on personalizing a single concept, either subject identity or motion pattern, which limits their effectiveness when multiple subjects must perform the desired motion patterns. To tackle this challenge, we propose VideoMage, a unified framework for video customization over both multiple subjects and their interactive motions. VideoMage employs subject and motion LoRAs to capture personalized content from user-provided images and videos, along with an appearance-agnostic motion learning approach to disentangle motion patterns from visual appearance. Furthermore, we develop a spatial-temporal composition scheme to guide interactions among subjects within the desired motion patterns. Extensive experiments demonstrate that VideoMage outperforms existing methods, generating coherent, user-controlled videos with consistent subject identities and interactions.

* CVPR 2025. Project Page: https://jasper0314-huang.github.io/videomage-customization
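
The subject and motion LoRAs mentioned in the VideoMage abstract build on low-rank adaptation. This is a minimal NumPy sketch of a generic LoRA-adapted linear layer, assuming the standard formulation W + (alpha / r) B A with a frozen base weight W. The class `LoRALinear` and the helper `apply_adapters` are hypothetical names. The additive merge in `apply_adapters` is only a naive baseline for combining two adapters; it is not VideoMage's spatial-temporal composition scheme, which the abstract does not specify.

```python
import numpy as np


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen base weight, shape (out_dim, in_dim)
        out_dim, in_dim = weight.shape
        # Down-projection A: small random init, shape (rank, in_dim)
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        # Up-projection B: zero init, so the adapter starts as an exact no-op
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def delta(self):
        # Low-rank weight update; its rank is at most `rank`
        return self.scale * self.B @ self.A

    def __call__(self, x):
        # x: (batch, in_dim) -> (batch, out_dim)
        return x @ (self.weight + self.delta()).T


def apply_adapters(x, weight, adapters):
    # Hypothetical naive merge: add every adapter's delta to the shared base weight.
    # This is NOT VideoMage's spatial-temporal composition scheme.
    merged = weight + sum(a.delta() for a in adapters)
    return x @ merged.T


# Usage: one "subject" adapter and one "motion" adapter on the same frozen layer
W = np.random.default_rng(42).normal(size=(8, 16))
subject = LoRALinear(W, rank=2, seed=1)
motion = LoRALinear(W, rank=2, seed=2)
x = np.ones((3, 16))
y = apply_adapters(x, W, [subject, motion])  # shape (3, 8)
```

Zero-initializing B means fine-tuning starts from the unmodified base model, and keeping each adaptation as separate low-rank factors lets subject and motion updates be stored and trained independently of the shared weights.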

MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching

Feb 18, 2025

Yen-Siang Wu, Chi-Pin Huang, Fu-En Yang, Yu-Chiang Frank Wang

Abstract: Text-to-video (T2V) diffusion models have shown promising capabilities in synthesizing realistic videos from input text prompts. However, a text description alone provides limited control over precise object movements and camera framing. In this work, we tackle the motion customization problem, where a reference video is provided as motion guidance. Most existing methods fine-tune pre-trained diffusion models to reconstruct the frame differences of the reference video; we observe that this strategy suffers from content leakage from the reference video and cannot capture complex motion accurately. To address this issue, we propose MotionMatcher, a motion customization framework that fine-tunes the pre-trained T2V diffusion model at the feature level. Instead of using pixel-level objectives, MotionMatcher compares high-level, spatio-temporal motion features to fine-tune diffusion models, ensuring precise motion learning. For memory efficiency and accessibility, we utilize a pre-trained T2V diffusion model, which contains considerable prior knowledge about video motion, to compute these motion features. In our experiments, we demonstrate state-of-the-art motion customization performance, validating the design of our framework.

* Project page: https://www.csie.ntu.edu.tw/~b09902097/motionmatcher/
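
The MotionMatcher abstract contrasts a pixel-level objective (reconstructing the reference video's frame differences) with a feature-level one. This is a minimal NumPy sketch of the two loss shapes, assuming videos are arrays of shape (T, H, W, C). `make_stand_in_extractor` is a hypothetical placeholder, a fixed random projection of frame differences, standing in for the motion features that MotionMatcher computes with a pre-trained T2V diffusion model. It only shows where a feature extractor plugs into the loss; it does not capture high-level motion the way diffusion features would, and the sketch omits the model being fine-tuned and its gradient updates.

```python
import numpy as np


def frame_differences(video):
    """Temporal differences of a (T, H, W, C) video -> (T - 1, H, W, C)."""
    return video[1:] - video[:-1]


def pixel_level_motion_loss(generated, reference):
    """Pixel-level objective: MSE between the two videos' frame differences."""
    diff = frame_differences(generated) - frame_differences(reference)
    return float(np.mean(diff ** 2))


def make_stand_in_extractor(frame_shape, feature_dim=32, seed=0):
    """Hypothetical stand-in for a motion-feature extractor.

    MotionMatcher uses a pre-trained T2V diffusion model here; this toy version
    applies a fixed random linear projection to each flattened frame difference.
    """
    rng = np.random.default_rng(seed)
    in_dim = int(np.prod(frame_shape))
    proj = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, feature_dim))

    def extract(video):
        diffs = frame_differences(video).reshape(video.shape[0] - 1, -1)
        return diffs @ proj  # (T - 1, feature_dim)

    return extract


def feature_level_motion_loss(generated, reference, extract):
    """Feature-level objective: MSE between extracted motion features."""
    return float(np.mean((extract(generated) - extract(reference)) ** 2))


# Usage with random toy videos of 4 frames at 8x8 RGB
rng = np.random.default_rng(0)
reference = rng.random((4, 8, 8, 3))
generated = rng.random((4, 8, 8, 3))
extract = make_stand_in_extractor((8, 8, 3))
pixel_loss = pixel_level_motion_loss(generated, reference)
feature_loss = feature_level_motion_loss(generated, reference, extract)
```

Both objectives compare temporal change rather than raw frames; the design choice the abstract argues for is swapping the pixel-space comparison for one in a learned feature space, so a real implementation would replace the toy projection with features from the pre-trained diffusion model.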