Xiaohang Yang

DanceChat: Large Language Model-Guided Music-to-Dance Generation

Jun 12, 2025

Qing Wang, Xiaohang Yang, Yilan Dong, Naveen Raj Govindaraj, Gregory Slabaugh, Shanxin Yuan

Abstract:Music-to-dance generation aims to synthesize human dance motion conditioned on musical input. Despite recent progress, significant challenges remain due to the semantic gap between music and dance motion, as music offers only abstract cues, such as melody, groove, and emotion, without explicitly specifying the physical movements. Moreover, a single piece of music can produce multiple plausible dance interpretations. This one-to-many mapping demands additional guidance, as music alone provides limited information for generating diverse dance movements. The challenge is further amplified by the scarcity of paired music and dance data, which restricts the model's ability to learn diverse dance patterns. In this paper, we introduce DanceChat, a Large Language Model (LLM)-guided music-to-dance generation approach. We use an LLM as a choreographer that provides textual motion instructions, offering explicit, high-level guidance for dance generation. This approach goes beyond implicit learning from music alone, enabling the model to generate dance that is both more diverse and better aligned with musical styles. Our approach consists of three components: (1) an LLM-based pseudo instruction generation module that produces textual dance guidance based on music style and structure, (2) a multi-modal feature extraction and fusion module that integrates music, rhythm, and textual guidance into a shared representation, and (3) a diffusion-based motion synthesis module together with a multi-modal alignment loss, which ensures that the generated dance is aligned with both musical and textual cues. Extensive experiments on AIST++ and human evaluations show that DanceChat outperforms state-of-the-art methods both qualitatively and quantitatively.

* check demos at https://dancechat.github.io/anon/

STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints

Apr 09, 2025

Xiaohang Yang, Qing Wang, Jiahao Yang, Gregory Slabaugh, Shanxin Yuan

Abstract:Motion retargeting seeks to faithfully replicate the spatio-temporal motion characteristics of a source character onto a target character with a different body shape. Apart from motion semantics preservation, ensuring geometric plausibility and maintaining temporal consistency are also crucial for effective motion retargeting. However, many existing methods prioritize either geometric plausibility or temporal consistency. Neglecting geometric plausibility results in interpenetration while neglecting temporal consistency leads to motion jitter. In this paper, we propose a novel sequence-to-sequence model for seamless Spatial-Temporal aware motion Retargeting (STaR), with penetration and consistency constraints. STaR consists of two modules: (1) a spatial module that incorporates dense shape representation and a novel limb penetration constraint to ensure geometric plausibility while preserving motion semantics, and (2) a temporal module that utilizes a temporal transformer and a novel temporal consistency constraint to predict the entire motion sequence at once while enforcing multi-level trajectory smoothness. The seamless combination of the two modules helps us achieve a good balance between the semantic, geometric, and temporal targets. Extensive experiments on the Mixamo and ScanRet datasets demonstrate that our method produces plausible and coherent motions while significantly reducing interpenetration rates compared with other approaches.

* 12 pages, 9 figures
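
The abstract mentions a temporal consistency constraint that enforces trajectory smoothness but does not give its formula. As a rough illustration only, the sketch below assumes smoothness is measured by penalizing finite-difference acceleration and jerk of a joint trajectory. The function names, the weights `w_acc` and `w_jerk`, and the choice of derivatives are all hypothetical and are not taken from STaR.

```python
def finite_diff(frames):
    """Frame-to-frame difference of a sequence of per-frame coordinate lists."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]


def mean_square(frames):
    """Mean of squared entries; 0.0 when the sequence is empty."""
    values = [v * v for frame in frames for v in frame]
    return sum(values) / len(values) if values else 0.0


def smoothness_penalty(positions, w_acc=1.0, w_jerk=1.0):
    """Hypothetical smoothness term: weighted mean squared acceleration and jerk.

    positions: list of frames, each a list of joint coordinates.
    This is an assumed stand-in, not the constraint defined in the paper.
    """
    velocity = finite_diff(positions)
    acceleration = finite_diff(velocity)
    jerk = finite_diff(acceleration)
    return w_acc * mean_square(acceleration) + w_jerk * mean_square(jerk)


smooth = [[0.0], [1.0], [2.0], [3.0], [4.0]]   # constant velocity
jittery = [[0.0], [1.0], [0.0], [1.0], [0.0]]  # oscillating coordinate
print(smoothness_penalty(smooth), smoothness_penalty(jittery))
```

A constant-velocity trajectory incurs no penalty under this assumed measure, while an oscillating one does, which is the behaviour a jitter-reducing term needs.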

Adaptive Multi-Modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss

Sep 13, 2024

Qifan Fu, Xiaohang Yang, Muhammad Asad, Changjae Oh, Shanxin Yuan, Gregory Slabaugh

Abstract:Diffusion models have shown their remarkable ability to synthesize images, including the generation of humans in specific poses. However, current models face challenges in adequately expressing conditional control for detailed hand pose generation, leading to significant distortion in the hand regions. To tackle this problem, we first curate the How2Sign dataset to provide richer and more accurate hand pose annotations. In addition, we introduce adaptive, multi-modal fusion to integrate characters' physical features expressed in different modalities such as skeleton, depth, and surface normal. Furthermore, we propose a novel Region-Aware Cycle Loss (RACL) that enables the diffusion model training to focus on improving the hand region, resulting in improved quality of generated hand gestures. More specifically, the proposed RACL computes a weighted keypoint distance between the full-body pose keypoints from the generated image and the ground truth, to generate higher-quality hand poses while balancing overall pose accuracy. Moreover, we use two hand region metrics, named hand-PSNR and hand-Distance for hand pose generation evaluations. Our experimental evaluations demonstrate the effectiveness of our proposed approach in improving the quality of digital human pose generation using diffusion models, especially the quality of the hand region. The source code is available at https://github.com/fuqifan/Region-Aware-Cycle-Loss.

* This paper has been accepted by the ECCV 2024 HANDS workshop
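
The abstract describes RACL as a weighted keypoint distance between full-body pose keypoints from the generated image and the ground truth. The sketch below shows one way such a weighting could look, assuming hand keypoints simply receive a larger weight than body keypoints. The weights, the index set `hand_indices`, and the normalization are hypothetical, and the pose estimator that would extract keypoints from the generated image is not modeled.

```python
import math


def weighted_keypoint_distance(pred, target, hand_indices,
                               hand_weight=2.0, body_weight=1.0):
    """Weighted mean Euclidean distance between matching keypoints.

    pred, target: equal-length sequences of (x, y) keypoints.
    hand_indices: indices treated as hand keypoints (assumed labeling).
    The weights are illustrative values, not those used in the paper.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of keypoints")
    hands = set(hand_indices)
    total = 0.0
    weight_sum = 0.0
    for i, (p, t) in enumerate(zip(pred, target)):
        w = hand_weight if i in hands else body_weight
        total += w * math.dist(p, t)
        weight_sum += w
    return total / weight_sum


pred = [(0.0, 0.0), (0.0, 0.0)]
target = [(3.0, 4.0), (0.0, 0.0)]
# The same 5-pixel error costs more when it falls on a hand keypoint.
print(weighted_keypoint_distance(pred, target, hand_indices=[0], hand_weight=3.0))
print(weighted_keypoint_distance(pred, target, hand_indices=[1], hand_weight=3.0))
```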

Open-Source Reinforcement Learning Environments Implemented in MuJoCo with Franka Manipulator

Jan 11, 2024

Zichun Xu, Yuntao Li, Xiaohang Yang, Zhiyuan Zhao, Lei Zhuang, Jingdong Zhao

Abstract:This paper presents three open-source reinforcement learning environments developed on the MuJoCo physics engine with the Franka Emika Panda arm in MuJoCo Menagerie. Three representative tasks, push, slide, and pick-and-place, are implemented through the Gymnasium Robotics API, which inherits from the core of Gymnasium. Both the sparse binary and dense rewards are supported, and the observation space contains the keys of desired and achieved goals to follow the Multi-Goal Reinforcement Learning framework. Three different off-policy algorithms are used to validate the simulation attributes to ensure the fidelity of all tasks, and benchmark results are also given. Each environment and task are defined in a clean way, and the main parameters for modifying the environment are preserved to reflect the main difference. The repository, including all environments, is available at https://github.com/zichunxx/panda_mujoco_gym.
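
To make the sparse and dense reward options concrete, here is a minimal standalone sketch of a goal-conditioned reward computed from the achieved and desired goals. It assumes the convention used by the Fetch tasks in Gymnasium Robotics (sparse: -1 until the goal is within a threshold, then 0; dense: negative distance). Whether the repository uses exactly these values, and the `distance_threshold` default, are assumptions.

```python
import math


def compute_reward(achieved_goal, desired_goal,
                   reward_type="sparse", distance_threshold=0.05):
    """Goal-conditioned reward from achieved and desired goal positions.

    The threshold and the -1/0 sparse convention are assumed here,
    following the Fetch-style multi-goal tasks rather than the repository.
    """
    distance = math.dist(achieved_goal, desired_goal)
    if reward_type == "sparse":
        return -1.0 if distance > distance_threshold else 0.0
    if reward_type == "dense":
        return -distance
    raise ValueError(f"unknown reward_type: {reward_type!r}")


# A multi-goal observation carries both goals alongside the raw state.
obs = {
    "observation": [0.0, 0.0, 0.0],
    "achieved_goal": (0.0, 0.0, 0.0),
    "desired_goal": (0.0, 0.0, 0.1),
}
print(compute_reward(obs["achieved_goal"], obs["desired_goal"], "sparse"))
print(compute_reward(obs["achieved_goal"], obs["desired_goal"], "dense"))
```

Keeping the reward a pure function of the two goals is what lets relabeling methods such as hindsight experience replay recompute rewards for substituted goals.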

PetLock: A Genderless and Standard Interface for the Future On-orbit Construction

Sep 09, 2022

Yuntao Li, Zichun Xu, Xiaohang Yang, Zhiyuan Zhao, Jingdong Zhao, Hong Liu

Abstract:Modular design is the foundation of future on-orbit construction technology for large space facilities. A standard interface is the key technology for the modular design of future space robotic systems and space facilities. This paper presents the design and testing of PetLock, a standard and genderless interface that can transfer mechanical loads, power, and data between future modular space robotic manipulators and spacecraft. PetLock adopts a completely genderless design, including the connection face, the locking mechanism, and the data and power interface. The connection surface provides a large translation and rotation misalignment tolerance owing to its 120-degree symmetrical, 3D shape design. The locking mechanism features a retraction structure with three locking pins, which is simple and reliable. POGO pin connectors in the center of the interface provide the power and data transfer capabilities. Owing to its high locking force, large tolerance, high reliability, and low cost, PetLock has great application potential in future on-orbit construction missions.

* 8 pages, 11 figures

A Combined Inverse Kinematics Algorithm Using FABRIK with Optimization

Sep 06, 2022

Zichun Xu, Yuntao Li, Xiaohang Yang, Zhiyuan Zhao, Jingdong Zhao, Hong Liu

Abstract:Forward and backward reaching inverse kinematics (FABRIK) is a heuristic inverse kinematics solver that is gradually applied to manipulators with the advantages of fast convergence and generating more realistic configurations. However, under the high error constraint, FABRIK exhibits unstable convergence behavior, which is unsatisfactory for the real-time motion planning of manipulators. In this paper, a novel inverse kinematics algorithm that combines FABRIK and the sequential quadratic programming (SQP) algorithm is presented, in which the joint angles deduced by FABRIK will be taken as the initial seed of the SQP algorithm to avoid getting stuck in local minima. The combined algorithm is evaluated with experiments, in which our algorithm can achieve higher success rates and faster solution times than FABRIK under the high error constraint. Furthermore, the combined algorithm can generate continuous trajectories for the UR5 and KUKA LBR IIWA 14 R820 manipulators in path tracking with no pose error and permitted position error of the end-effector.
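
For readers unfamiliar with FABRIK, the sketch below shows the basic forward and backward reaching passes on joint positions for a simple serial chain. It covers only the plain position-based heuristic: joint limits, orientation targets, conversion to joint angles, and the SQP refinement that the paper combines it with are all omitted. The function names and default tolerances are illustrative.

```python
import math


def _place(anchor, toward, length):
    """Point at distance `length` from `anchor` in the direction of `toward`."""
    d = math.dist(anchor, toward)
    if d == 0.0:
        # Degenerate case: direction undefined, so step along the first axis.
        return (anchor[0] + length,) + tuple(anchor[1:])
    s = length / d
    return tuple(a + (t - a) * s for a, t in zip(anchor, toward))


def fabrik(joints, target, tol=1e-4, max_iter=100):
    """Basic FABRIK on joint positions; returns the updated joint list.

    joints: joint positions from base to end-effector.
    Link lengths are taken from the initial configuration and preserved.
    """
    joints = [tuple(float(c) for c in p) for p in joints]
    target = tuple(float(c) for c in target)
    lengths = [math.dist(a, b) for a, b in zip(joints, joints[1:])]
    base = joints[0]
    n = len(joints)
    for _ in range(max_iter):
        if math.dist(joints[-1], target) <= tol:
            break
        # Backward reaching: pin the end-effector to the target, work to the base.
        joints[-1] = target
        for i in range(n - 2, -1, -1):
            joints[i] = _place(joints[i + 1], joints[i], lengths[i])
        # Forward reaching: pin the base back in place, work to the end-effector.
        joints[0] = base
        for i in range(n - 1):
            joints[i + 1] = _place(joints[i], joints[i + 1], lengths[i])
    return joints


chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
solution = fabrik(chain, target=(1.0, 1.0))
print(solution)
```

If the target is out of reach, this version simply stretches the chain toward it until `max_iter` is exhausted, so callers should check the final end-effector error.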

Unsupervised Motion Representation Enhanced Network for Action Recognition

Mar 05, 2021

Xiaohang Yang, Lingtong Kong, Jie Yang

Abstract:Learning a reliable motion representation between consecutive frames, such as optical flow, has proven to greatly benefit video understanding. However, the TV-L1 method, an effective optical flow solver, is time-consuming and requires expensive storage for caching the extracted optical flow. To fill the gap, we propose UF-TSN, a novel end-to-end action recognition approach enhanced with an embedded lightweight unsupervised optical flow estimator. UF-TSN estimates motion cues from adjacent frames in a coarse-to-fine manner and focuses on small displacements at each level by extracting a feature pyramid and warping one frame's features to the other according to the flow estimated at the previous level. Because action datasets lack labeled motion, we constrain the flow prediction with multi-scale photometric consistency and edge-aware smoothness. Compared with state-of-the-art unsupervised motion representation learning methods, our model achieves better accuracy while maintaining efficiency, and it is competitive with some supervised or more complicated approaches.

* Accepted by ICASSP 2021
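
The edge-aware smoothness constraint mentioned in the abstract can be illustrated with a one-dimensional toy version. The sketch assumes the commonly used form in which the flow gradient is down-weighted by an exponential of the image gradient, so that flow discontinuities are tolerated at image edges. The exact loss used in UF-TSN is not given in the abstract, and `alpha` is a hypothetical sharpness parameter.

```python
import math


def edge_aware_smoothness(flow, image, alpha=10.0):
    """1-D edge-aware smoothness: mean of |d flow| * exp(-alpha * |d image|).

    flow, image: equal-length sequences sampled along one scanline.
    This is an assumed, simplified form for illustration only.
    """
    if len(flow) != len(image) or len(flow) < 2:
        raise ValueError("flow and image need the same length of at least 2")
    total = 0.0
    for i in range(len(flow) - 1):
        flow_grad = abs(flow[i + 1] - flow[i])
        image_grad = abs(image[i + 1] - image[i])
        total += flow_grad * math.exp(-alpha * image_grad)
    return total / (len(flow) - 1)


flow = [0.0, 0.0, 1.0, 1.0]
image_with_edge = [0.0, 0.0, 1.0, 1.0]  # intensity edge where the flow jumps
image_flat = [0.0, 0.0, 0.0, 0.0]       # no edge to justify the jump
print(edge_aware_smoothness(flow, image_with_edge))
print(edge_aware_smoothness(flow, image_flat))
```

The same flow jump is penalized far less when it coincides with an intensity edge, which is the intended effect of the weighting.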

OAS-Net: Occlusion Aware Sampling Network for Accurate Optical Flow

Jan 31, 2021

Lingtong Kong, Xiaohang Yang, Jie Yang

Abstract:Optical flow estimation is an essential step for many real-world computer vision tasks. Existing deep networks have achieved satisfactory results, mostly by employing a pyramidal coarse-to-fine paradigm, where a key step is to warp the target features according to the previous flow prediction and correlate them with the source features to build a 3D matching cost volume. However, the warping operation can lead to a troublesome ghosting problem that results in ambiguity. Moreover, occluded areas are treated the same as non-occluded regions in most existing works, which may cause performance degradation. To deal with these challenges, we propose a lightweight yet efficient optical flow network, named OAS-Net (occlusion aware sampling network), for accurate optical flow. First, a new sampling-based correlation layer is employed without the noisy warping operation. Second, a novel occlusion aware module is presented to make the raw cost volume conscious of occluded regions. Third, a shared flow and occlusion awareness decoder is adopted for structural compactness. Experiments on the Sintel and KITTI datasets demonstrate the effectiveness of the proposed approaches.

* Accepted by ICASSP 2021
