Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Longbin Ji

PoseTraj: Pose-Aware Trajectory Control in Video Diffusion

Mar 20, 2025

Longbin Ji, Lei Zhong, Pengfei Wei, Changjian Li

Abstract:Recent advancements in trajectory-guided video generation have achieved notable progress. However, existing models still face challenges in generating object motions with potentially changing 6D poses under wide-range rotations, due to limited 3D understanding. To address this problem, we introduce PoseTraj, a pose-aware video dragging model for generating 3D-aligned motion from 2D trajectories. Our method adopts a novel two-stage pose-aware pretraining framework, improving 3D understanding across diverse trajectories. Specifically, we propose a large-scale synthetic dataset PoseTraj-10K, containing 10k videos of objects following rotational trajectories, and enhance the model perception of object pose changes by incorporating 3D bounding boxes as intermediate supervision signals. Following this, we fine-tune the trajectory-controlling module on real-world videos, applying an additional camera-disentanglement module to further refine motion accuracy. Experiments on various benchmark datasets demonstrate that our method not only excels in 3D pose-aligned dragging for rotational trajectories but also outperforms existing baselines in trajectory accuracy and video quality.

* Code, data and project page: https://robingg1.github.io/Pose-Traj/

Via

Access Paper or Ask Questions

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Aug 29, 2023

Longbin Ji, Pengfei Wei, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin

Figure 1 for C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Figure 2 for C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Figure 3 for C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Figure 4 for C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Abstract:Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from issues such as unstable training and temporal inconsistency, particularly in generating high-fidelity and comprehensive gestures. Additionally, these methods lack effective control over speaker identity and temporal editing of the generated gestures. Focusing on capturing temporal latent information and applying practical controlling, we propose a Controllable Co-speech Gesture Generation framework, named C2G2. Specifically, we propose a two-stage temporal dependency enhancement strategy motivated by latent diffusion models. We further introduce two key features to C2G2, namely a speaker-specific decoder to generate speaker-related real-length skeletons and a repainting strategy for flexible gesture generation/editing. Extensive experiments on benchmark gesture datasets verify the effectiveness of our proposed C2G2 compared with several state-of-the-art baselines. The link of the project demo page can be found at https://c2g2-gesture.github.io/c2_gesture

* 12 pages, 6 figures, 7 tables

Via

Access Paper or Ask Questions