Title: Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion

Apr 17, 2024

Xinghan Wang, Zixi Kang, Yadong Mu

Abstract: Human motion understanding is a fundamental task with diverse practical applications, facilitated by the availability of large-scale motion capture datasets. Recent studies focus on text-motion tasks, such as text-based motion generation, editing, and question answering. In this study, we introduce the novel task of text-based human motion grounding (THMG), aimed at precisely localizing temporal segments corresponding to given textual descriptions within untrimmed motion sequences. Capturing global temporal information is crucial for the THMG task. However, transformer-based models that rely on global temporal self-attention face challenges when handling long untrimmed sequences due to the quadratic computational cost. We address these challenges by proposing Text-controlled Motion Mamba (TM-Mamba), a unified model that integrates temporal global context, language query control, and spatial graph topology with only linear memory cost. The core of the model is a text-controlled selection mechanism that dynamically incorporates global temporal information based on the text query. The model is further enhanced to be topology-aware through the integration of relational embeddings. For evaluation, we introduce BABEL-Grounding, the first text-motion dataset that provides detailed textual descriptions of human actions along with their corresponding temporal segments. Extensive evaluations demonstrate the effectiveness of TM-Mamba on BABEL-Grounding.
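The abstract does not spell out how the text-controlled selection mechanism works. As a rough illustration only, the sketch below assumes a Mamba-style selective scan in which the step size and the input and output projections are computed at every frame from the frame features concatenated with a text query embedding. This is not the authors' implementation: the function name `text_controlled_scan`, the parameters `w_delta`, `w_b`, `w_c`, and `a`, and the reduction of each frame to a scalar signal are all hypothetical simplifications, and the spatial graph topology is omitted entirely.

```python
import math


def softplus(z):
    # Numerically stable softplus; keeps the step size positive.
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def text_controlled_scan(frames, text, w_delta, w_b, w_c, a):
    """Hypothetical text-conditioned selective scan (illustrative only).

    frames  : list of T frame feature vectors, each of length D
    text    : text query embedding of length D
    w_delta : weight vector of length 2*D producing the step size
    w_b     : N weight vectors of length 2*D (input projection)
    w_c     : N weight vectors of length 2*D (output projection)
    a       : N negative decay rates of the state
    Returns a list of T scalar outputs, one per frame.
    """
    state = [0.0] * len(a)  # fixed-size state, independent of T
    outputs = []
    for x in frames:
        # Assumption: selection parameters depend on both frame and text.
        z = list(x) + list(text)
        delta = softplus(dot(w_delta, z))
        b = [dot(w, z) for w in w_b]
        c = [dot(w, z) for w in w_c]
        # Simplification: collapse the frame to one scalar input signal.
        u = sum(x) / len(x)
        # Discretized recurrence: h_t = exp(delta * a) * h_{t-1} + delta * b * u
        state = [
            math.exp(delta * a_i) * h_i + delta * b_i * u
            for a_i, h_i, b_i in zip(a, state, b)
        ]
        outputs.append(dot(c, state))
    return outputs
```

Because the recurrence carries a fixed-size state from frame to frame, memory grows linearly with the sequence length, in contrast to the pairwise attention matrix of global self-attention. Passing a different `text` vector changes `delta`, `b`, and `c` at every step, so the same motion sequence is summarized differently depending on the query.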
