Title: AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

Nov 27, 2024

Haonan Han, Xiangzuo Wu, Huan Liao, Zunnan Xu, Zhongyuan Hu, Ronghui Li, Yachao Zhang, Xiu Li

Abstract: Recently, text-to-motion models have opened new possibilities for creating realistic human motion with greater efficiency and flexibility. However, aligning motion generation with event-level textual descriptions presents unique challenges due to the complex relationship between textual prompts and desired motion outcomes. To address this, we introduce AToM, a framework that enhances the alignment between generated motion and text prompts by using GPT-4Vision as a source of reward. AToM comprises three main stages. First, we construct a dataset, MotionPrefer, that pairs three types of event-level textual prompts with generated motions; the prompts cover the integrity, temporal relationship, and frequency of motion. Second, we design a paradigm that uses GPT-4Vision for detailed motion annotation, including visual data formatting, task-specific instructions, and scoring rules for each sub-task. Finally, we fine-tune an existing text-to-motion model using reinforcement learning guided by this paradigm. Experimental results demonstrate that AToM significantly improves the event-level alignment quality of text-to-motion generation.
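
To make the data flow between the annotation and fine-tuning stages concrete, here is a minimal sketch of one way annotator scores could be turned into preference pairs for a prompt. The abstract does not give the scoring scale, the data layout, or the training objective, so everything here is an assumption for illustration: the `ScoredMotion` record, the `build_preference_pairs` helper, the `min_gap` threshold, and the example scores are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ScoredMotion:
    """Hypothetical record: one generated motion with an annotator score."""
    prompt: str
    motion_id: str
    score: float  # assumed numeric score; the paper's actual scale is not stated here


def build_preference_pairs(samples, min_gap=1.0):
    """Return (prompt, chosen_id, rejected_id) tuples.

    Motions are grouped by prompt. Within a group, every pair whose score
    difference is at least `min_gap` yields one preference pair, with the
    higher-scored motion as the chosen one. Near-ties are skipped.
    """
    by_prompt = {}
    for sample in samples:
        by_prompt.setdefault(sample.prompt, []).append(sample)

    pairs = []
    for prompt, group in by_prompt.items():
        for a, b in combinations(group, 2):
            if abs(a.score - b.score) < min_gap:
                continue  # too close to call a preference
            chosen, rejected = (a, b) if a.score > b.score else (b, a)
            pairs.append((prompt, chosen.motion_id, rejected.motion_id))
    return pairs


# Illustrative, made-up scores for a frequency-type prompt.
samples = [
    ScoredMotion("a person jumps twice", "m0", 5.0),
    ScoredMotion("a person jumps twice", "m1", 3.0),
    ScoredMotion("a person jumps twice", "m2", 3.0),
]
pairs = build_preference_pairs(samples)
```

In this sketch the two motions with equal scores produce no pair, which keeps ambiguous comparisons out of the training signal; whether AToM filters its data this way is not stated in the abstract.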
