Xubo Yang

IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model

May 27, 2025

Yang Zhao, Yan Zhang, Xubo Yang

Abstract: Existing human motion generation methods with trajectory and pose inputs apply global processing to both modalities, leading to suboptimal outputs. In this paper, we propose IKMo, an image-keyframed motion generation method based on the diffusion model in which trajectory and pose are decoupled. The trajectory and pose inputs go through a two-stage conditioning framework. In the first stage, a dedicated optimization module is applied to refine the inputs. In the second stage, trajectory and pose are encoded in parallel via a Trajectory Encoder and a Pose Encoder. Then, motion with high spatial and semantic fidelity is guided by a motion ControlNet, which processes the fused trajectory and pose data. Experimental results on the HumanML3D and KIT-ML datasets demonstrate that the proposed method outperforms state-of-the-art methods on all metrics under trajectory-keyframe constraints. In addition, MLLM-based agents are implemented to pre-process model inputs. Given texts and keyframe images from users, the agents extract motion descriptions, keyframe poses, and trajectories as the optimized inputs to the motion generation model. We conduct a user study with 10 participants. The results show that pre-processing by the MLLM-based agents makes the generated motion more in line with users' expectations. We believe that the proposed method improves both the fidelity and controllability of motion generation by the diffusion model.

An Intelligent Social Learning-based Optimization Strategy for Black-box Robotic Control with Reinforcement Learning

Nov 11, 2023

Xubo Yang, Jian Gao, Ting Wang, Yaozhen He

Abstract: Implementing intelligent control of robots is a difficult task, especially when dealing with complex black-box systems, because of the lack of visibility into and understanding of how these robots work internally. This paper proposes an Intelligent Social Learning (ISL) algorithm to enable intelligent control of black-box robotic systems. Inspired by mutual learning among individuals in human social groups, ISL includes learning, imitation, and self-study styles. Individuals in the learning style use the Levy flight search strategy to learn from the best performer and form the closest relationships. In the imitation style, individuals mimic the best performer with a second-level rapport by employing a random perturbation strategy. In the self-study style, individuals learn independently using a normal distribution sampling method while maintaining a distant relationship with the best performer. Individuals in the population are regarded as autonomous intelligent agents in each style. Neural networks perform strategic actions in the three styles to interact with the environment and the robot and iteratively optimize the network policy. Overall, ISL builds on the principles of intelligent optimization, incorporates ideas from reinforcement learning, and offers strong search capabilities, fast computation, few hyperparameters, and insensitivity to sparse rewards. The proposed ISL algorithm is compared with four state-of-the-art methods on six continuous control benchmark cases in MuJoCo to verify its effectiveness and advantages. Furthermore, ISL is applied to simulated and experimental grasping tasks with the UR3 robot for validation, and it yields satisfactory solutions.
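
The three styles described in the abstract can be illustrated with a small population-based search. This is only a sketch under assumptions: the abstract does not give the update rules, so the step sizes, the equal three-way split of the population, the greedy acceptance rule, and the names `levy_step` and `isl_sketch` are all hypothetical. The Levy step uses Mantegna's algorithm, a common choice that the abstract does not confirm. In the paper's setting each vector would presumably hold policy-network weights and the cost would be a negative episode return; here a plain cost function stands in for that.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """One heavy-tailed step length via Mantegna's algorithm (assumed choice)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0) or 1e-12  # guard against division by zero
    return u / abs(v) ** (1 / beta)


def isl_sketch(cost, dim, pop_size=30, iters=100, seed=0):
    """Minimize `cost`; returns (best_vector, best_cost_after_each_iteration)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    scores = [cost(x) for x in pop]
    history = [min(scores)]
    for _ in range(iters):
        order = sorted(range(pop_size), key=scores.__getitem__)
        best = list(pop[order[0]])  # the best performer is left unchanged
        third = pop_size // 3
        for rank, i in enumerate(order[1:], start=1):
            x = pop[i]
            if rank <= third:
                # Learning style: Levy-flight-scaled move toward the best performer.
                cand = [xi + 0.1 * levy_step(rng) * (bi - xi)
                        for xi, bi in zip(x, best)]
            elif rank <= 2 * third:
                # Imitation style: copy the best performer with a random perturbation.
                cand = [bi + rng.uniform(-0.5, 0.5) for bi in best]
            else:
                # Self-study style: independent normal sampling around oneself.
                cand = [rng.gauss(xi, 1.0) for xi in x]
            c = cost(cand)
            if c < scores[i]:  # greedy acceptance: keep a candidate only if it improves
                pop[i], scores[i] = cand, c
        history.append(min(scores))
    i_best = min(range(pop_size), key=scores.__getitem__)
    return pop[i_best], history
```

Calling `isl_sketch(lambda x: sum(v * v for v in x), dim=3)` runs the search on a simple quadratic cost. Because candidates are accepted only when they improve an individual's own score, the recorded best cost never increases from one iteration to the next.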

Improving Ranking Correlation of Supernet with Candidates Enhancement and Progressive Training

Aug 12, 2021

Ziwei Yang, Ruyi Zhang, Zhi Yang, Xubo Yang, Lei Wang, Zheyang Li

Abstract: One-shot neural architecture search (NAS) applies a weight-sharing supernet to reduce the unaffordable computation overhead of automated architecture design. However, the weight-sharing technique worsens the ranking consistency of performance due to interference between different candidate networks. To address this issue, we propose a candidates enhancement method and a progressive training pipeline to improve the ranking correlation of the supernet. Specifically, we carefully redesign the sub-networks in the supernet and map the original supernet to a new one of high capacity. In addition, we gradually add narrow branches of the supernet to reduce the degree of weight sharing, which effectively alleviates the mutual interference between sub-networks. Finally, our method ranked 1st in the Supernet Track of the CVPR2021 1st Lightweight NAS Challenge.

* 5 pages, 2 figures. CVPR2021 NAS challenge
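
To make "ranking correlation" concrete: one common way to quantify it is Kendall's tau between the accuracies estimated with supernet weights and the accuracies of the same candidates trained stand-alone. The abstract does not state which metric the paper or the challenge uses, so the choice of Kendall's tau, the function name `kendall_tau`, and the numbers in the usage note are assumptions for illustration only.

```python
def kendall_tau(a, b):
    """Kendall's tau-a between two equally long score lists (ties count as neither)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1  # the pair is ordered the same way in both lists
            elif s < 0:
                discordant += 1  # the pair is ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, with made-up supernet estimates `[1, 2, 3, 4]` and stand-alone scores `[1, 3, 2, 4]`, five of the six candidate pairs agree and one disagrees, so `kendall_tau` returns 4/6. A value of 1.0 means the supernet ranks all candidates in the same order as stand-alone training.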

Cascade Bagging for Accuracy Prediction with Few Training Samples

Aug 12, 2021

Ruyi Zhang, Ziwei Yang, Zhi Yang, Xubo Yang, Lei Wang, Zheyang Li

Abstract: An accuracy predictor is trained to predict the validation accuracy of a network from its architecture encoding. It can effectively assist in designing networks and improving Neural Architecture Search (NAS) efficiency. However, a high-performance predictor depends on adequate training samples, which require unaffordable computation overhead. To alleviate this problem, we propose a novel framework to train an accuracy predictor with few training samples. The framework consists of data augmentation methods and an ensemble learning algorithm. The data augmentation methods calibrate weak labels and inject noise into the feature space. The ensemble learning algorithm, termed cascade bagging, trains two-level models by sampling data and features. Finally, the advantages of the above methods are demonstrated in the Performance Prediction Track of the CVPR2021 1st Lightweight NAS Challenge. Our code is made public at: https://github.com/dlongry/Solutionto-CVPR2021-NAS-Track2.
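
The phrase "trains two-level models by sampling data and features" can be read in more than one way. The sketch below shows one possible reading, not the authors' implementation: a first bag of models, each built on a bootstrap sample of the data and a random feature subset, followed by a second bag trained on the original features plus the first-level predictions. The k-nearest-neighbour base learner, the way the two levels are connected, and all names (`knn_predict`, `fit_bag`, `fit_cascade`, `predict_cascade`) are assumptions made for illustration.

```python
import random


def knn_predict(train_X, train_y, x, feats, k=3):
    """Average the targets of the k nearest rows, using only the chosen features."""
    nearest = sorted(
        (sum((row[f] - x[f]) ** 2 for f in feats), y)
        for row, y in zip(train_X, train_y)
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def fit_bag(X, y, n_models, n_feats, rng):
    """Each member stores a bootstrap sample of rows and a random feature subset."""
    n, d = len(X), len(X[0])
    bag = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]         # sample data
        feats = rng.sample(range(d), min(n_feats, d))      # sample features
        bag.append(([X[i] for i in idx], [y[i] for i in idx], feats))
    return bag


def bag_outputs(bag, x):
    """Predictions of every member of a bag for one input row."""
    return [knn_predict(bx, by, x, feats) for bx, by, feats in bag]


def fit_cascade(X, y, n_models=8, n_feats=2, seed=0):
    rng = random.Random(seed)
    level1 = fit_bag(X, y, n_models, n_feats, rng)
    # Level-2 inputs: original features followed by the level-1 predictions.
    Z = [list(x) + bag_outputs(level1, x) for x in X]
    level2 = fit_bag(Z, y, n_models, n_feats, rng)
    return level1, level2


def predict_cascade(model, x):
    level1, level2 = model
    z = list(x) + bag_outputs(level1, x)
    outs = bag_outputs(level2, z)
    return sum(outs) / len(outs)  # average the second-level members
```

In this reading, each row of `X` would be an architecture encoding and `y` its measured validation accuracy; `predict_cascade(fit_cascade(X, y), x_new)` then returns the ensemble's estimate for a new encoding.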

Foveated Neural Radiance Fields for Real-Time and Egocentric Virtual Reality

Mar 30, 2021

Nianchen Deng, Zhenyi He, Jiannan Ye, Praneeth Chakravarthula, Xubo Yang, Qi Sun

Abstract: Traditional high-quality 3D graphics requires large volumes of fine-detailed scene data for rendering. This demand compromises computational efficiency and local storage resources, and it becomes more concerning for future wearable and portable virtual and augmented reality (VR/AR) displays. Recent approaches to combat this problem include remote rendering/streaming and neural representations of 3D assets. These approaches have redefined the traditional local storage-rendering pipeline through distributed computing or compression of large data. However, these methods typically suffer from high latency or low quality for practical visualization of large immersive virtual scenes, notably with the extra-high resolution and refresh rate requirements of VR applications such as gaming and design. Tailored for future portable, low-storage, and energy-efficient VR platforms, we present the first gaze-contingent 3D neural representation and view synthesis method. We incorporate the human psychophysics of visual- and stereo-acuity into an egocentric neural representation of 3D scenery. Furthermore, we jointly optimize the latency/performance and visual quality, while mutually bridging human perception and neural scene synthesis, to achieve perceptually high-quality immersive interaction. Both objective analysis and subjective study demonstrate the effectiveness of our approach in significantly reducing local storage volume and synthesis latency (up to 99% reduction in both data size and computational time), while simultaneously presenting high-fidelity rendering, with perceptual quality identical to that of fully locally stored and rendered high-quality imagery.
