Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Peng Zhai

RENet: Fault-Tolerant Motion Control for Quadruped Robots via Redundant Estimator Networks under Visual Collapse

Sep 11, 2025

Yueqi Zhang, Quancheng Qian, Taixian Hou, Peng Zhai, Xiaoyi Wei, Kangmai Hu, Jiafu Yi, Lihua Zhang

Abstract:Vision-based locomotion in outdoor environments presents significant challenges for quadruped robots. Accurate environmental prediction and effective handling of depth sensor noise during real-world deployment remain difficult, severely restricting the outdoor applications of such algorithms. To address these deployment challenges in vision-based motion control, this letter proposes the Redundant Estimator Network (RENet) framework. The framework employs a dual-estimator architecture that ensures robust motion performance while maintaining deployment stability during onboard vision failures. Through an online estimator adaptation, our method enables seamless transitions between estimation modules when handling visual perception uncertainties. Experimental validation on a real-world robot demonstrates the framework's effectiveness in complex outdoor environments, showing particular advantages in scenarios with degraded visual perception. This framework demonstrates its potential as a practical solution for reliable robotic deployment in challenging field conditions. Project website: https://RENet-Loco.github.io/

* IEEE Robotics and Automation Letters (2025)
* Accepted for IEEE Robotics and Automation Letters (RA-L)

Via

Access Paper or Ask Questions

Music-Driven Legged Robots: Synchronized Walking to Rhythmic Beats

Mar 06, 2025

Taixian Hou, Yueqi Zhang, Xiaoyi Wei, Zhiyan Dong, Jiafu Yi, Peng Zhai, Lihua Zhang

Abstract:We address the challenge of effectively controlling the locomotion of legged robots by incorporating precise frequency and phase characteristics, which is often ignored in locomotion policies that do not account for the periodic nature of walking. We propose a hierarchical architecture that integrates a low-level phase tracker, oscillators, and a high-level phase modulator. This controller allows quadruped robots to walk in a natural manner that is synchronized with external musical rhythms. Our method generates diverse gaits across different frequencies and achieves real-time synchronization with music in the physical world. This research establishes a foundational framework for enabling real-time execution of accurate rhythmic motions in legged robots. Video is available at website: https://music-walker.github.io/.

* ICRA2025 accepted

Via

Access Paper or Ask Questions

Continuous Control of Diverse Skills in Quadruped Robots Without Complete Expert Datasets

Mar 05, 2025

Jiaxin Tu, Xiaoyi Wei, Yueqi Zhang, Taixian Hou, Xiaofei Gao, Zhiyan Dong, Peng Zhai, Lihua Zhang

Abstract:Learning diverse skills for quadruped robots presents significant challenges, such as mastering complex transitions between different skills and handling tasks of varying difficulty. Existing imitation learning methods, while successful, rely on expensive datasets to reproduce expert behaviors. Inspired by introspective learning, we propose Progressive Adversarial Self-Imitation Skill Transition (PASIST), a novel method that eliminates the need for complete expert datasets. PASIST autonomously explores and selects high-quality trajectories based on predefined target poses instead of demonstrations, leveraging the Generative Adversarial Self-Imitation Learning (GASIL) framework. To further enhance learning, We develop a skill selection module to mitigate mode collapse by balancing the weights of skills with varying levels of difficulty. Through these methods, PASIST is able to reproduce skills corresponding to the target pose while achieving smooth and natural transitions between them. Evaluations on both simulation platforms and the Solo 8 robot confirm the effectiveness of PASIST, offering an efficient alternative to expert-driven learning.

* Accepted by ICRA 2025

Via

Access Paper or Ask Questions

Role Play: Learning Adaptive Role-Specific Strategies in Multi-Agent Interactions

Nov 02, 2024

Weifan Long, Wen Wen, Peng Zhai, Lihua Zhang

Abstract:Zero-shot coordination problem in multi-agent reinforcement learning (MARL), which requires agents to adapt to unseen agents, has attracted increasing attention. Traditional approaches often rely on the Self-Play (SP) framework to generate a diverse set of policies in a policy pool, which serves to improve the generalization capability of the final agent. However, these frameworks may struggle to capture the full spectrum of potential strategies, especially in real-world scenarios that demand agents balance cooperation with competition. In such settings, agents need strategies that can adapt to varying and often conflicting goals. Drawing inspiration from Social Value Orientation (SVO)-where individuals maintain stable value orientations during interactions with others-we propose a novel framework called \emph{Role Play} (RP). RP employs role embeddings to transform the challenge of policy diversity into a more manageable diversity of roles. It trains a common policy with role embedding observations and employs a role predictor to estimate the joint role embeddings of other agents, helping the learning agent adapt to its assigned role. We theoretically prove that an approximate optimal policy can be achieved by optimizing the expected cumulative reward relative to an approximate role-based policy. Experimental results in both cooperative (Overcooked) and mixed-motive games (Harvest, CleanUp) reveal that RP consistently outperforms strong baselines when interacting with unseen agents, highlighting its robustness and adaptability in complex environments.

Via

Access Paper or Ask Questions

A Robust Quadruped Robot with Twisting Waist for Flexible Motions

Oct 08, 2024

Quancheng Qian, Xiaoyi Wei, Zonghao Zhang, Jiaxin Tu, Yueqi Zhang, Taixian Hou, Xiaofei Gao, Peng Zhai, Lihua Zhang

Figure 1 for A Robust Quadruped Robot with Twisting Waist for Flexible Motions

Figure 2 for A Robust Quadruped Robot with Twisting Waist for Flexible Motions

Figure 3 for A Robust Quadruped Robot with Twisting Waist for Flexible Motions

Figure 4 for A Robust Quadruped Robot with Twisting Waist for Flexible Motions

Abstract:The waist plays a crucial role in the agile movement of many animals in nature. It provides the torso with additional degrees of freedom and flexibility, inspiring researchers to incorporate this biological feature into robotic structures to enhance robot locomotion. This paper presents a cost-effective and low-complexity waist mechanism integrated into the structure of the open-source robot solo8, adding a new degree of freedom (DOF) to its torso. We refer to this novel robot as solo9. Additionally, we propose a full-body control method for the waist-equipped quadruped robot based on generative adversarial imitation learning (GAIL). During training, the discriminator is used as input for iterative optimization of the policy and dataset, enabling solo9 to achieve flexible steering maneuvers across various gaits. Extensive tests of solo9's steering capabilities, terrain adaptability, and robustness are conducted in both simulation and real-world scenarios, with detailed comparisons to solo8 and solo12, demonstrating the effectiveness of the control algorithm and the advantages of the waist mechanism.

Via

Access Paper or Ask Questions

Large Vision-Language Models as Emotion Recognizers in Context Awareness

Jul 16, 2024

Yuxuan Lei, Dingkang Yang, Zhaoyu Chen, Jiawei Chen, Peng Zhai, Lihua Zhang

Figure 1 for Large Vision-Language Models as Emotion Recognizers in Context Awareness

Figure 2 for Large Vision-Language Models as Emotion Recognizers in Context Awareness

Figure 3 for Large Vision-Language Models as Emotion Recognizers in Context Awareness

Figure 4 for Large Vision-Language Models as Emotion Recognizers in Context Awareness

Abstract:Context-aware emotion recognition (CAER) is a complex and significant task that requires perceiving emotions from various contextual cues. Previous approaches primarily focus on designing sophisticated architectures to extract emotional cues from images. However, their knowledge is confined to specific training datasets and may reflect the subjective emotional biases of the annotators. Furthermore, acquiring large amounts of labeled data is often challenging in real-world applications. In this paper, we systematically explore the potential of leveraging Large Vision-Language Models (LVLMs) to empower the CAER task from three paradigms: 1) We fine-tune LVLMs on two CAER datasets, which is the most common way to transfer large models to downstream tasks. 2) We design zero-shot and few-shot patterns to evaluate the performance of LVLMs in scenarios with limited data or even completely unseen. In this case, a training-free framework is proposed to fully exploit the In-Context Learning (ICL) capabilities of LVLMs. Specifically, we develop an image similarity-based ranking algorithm to retrieve examples; subsequently, the instructions, retrieved examples, and the test example are combined to feed LVLMs to obtain the corresponding sentiment judgment. 3) To leverage the rich knowledge base of LVLMs, we incorporate Chain-of-Thought (CoT) into our framework to enhance the model's reasoning ability and provide interpretable results. Extensive experiments and analyses demonstrate that LVLMs achieve competitive performance in the CAER task across different paradigms. Notably, the superior performance in few-shot settings indicates the feasibility of LVLMs for accomplishing specific tasks without extensive training.

Via

Access Paper or Ask Questions

Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

Jul 06, 2024

Dingkang Yang, Mingcheng Li, Linhao Qu, Kun Yang, Peng Zhai, Song Wang, Lihua Zhang

Figure 1 for Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

Figure 2 for Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

Figure 3 for Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

Figure 4 for Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

Abstract:Understanding human intentions (e.g., emotions) from videos has received considerable attention recently. Video streams generally constitute a blend of temporal data stemming from distinct modalities, including natural language, facial expressions, and auditory clues. Despite the impressive advancements of previous works via attention-based paradigms, the inherent temporal asynchrony and modality heterogeneity challenges remain in multimodal sequence fusion, causing adverse performance bottlenecks. To tackle these issues, we propose a Multimodal fusion approach for learning modality-Exclusive and modality-Agnostic representations (MEA) to refine multimodal features and leverage the complementarity across distinct modalities. On the one hand, MEA introduces a predictive self-attention module to capture reliable context dynamics within modalities and reinforce unique features over the modality-exclusive spaces. On the other hand, a hierarchical cross-modal attention module is designed to explore valuable element correlations among modalities over the modality-agnostic space. Meanwhile, a double-discriminator strategy is presented to ensure the production of distinct representations in an adversarial manner. Eventually, we propose a decoupled graph fusion mechanism to enhance knowledge exchange across heterogeneous modalities and learn robust multimodal representations for downstream tasks. Numerous experiments are implemented on three multimodal datasets with asynchronous sequences. Systematic analyses show the necessity of our approach.

* TCSVT 2024

Via

Access Paper or Ask Questions

PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

May 29, 2024

Dingkang Yang, Jinjie Wei, Dongling Xiao, Shunli Wang, Tong Wu, Gang Li, Mingcheng Li, Shuaibing Wang, Jiawei Chen, Yue Jiang(+4 more)

Figure 1 for PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

Figure 2 for PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

Figure 3 for PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

Figure 4 for PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

Abstract:Developing intelligent pediatric consultation systems offers promising prospects for improving diagnostic efficiency, especially in China, where healthcare resources are scarce. Despite recent advances in Large Language Models (LLMs) for Chinese medicine, their performance is sub-optimal in pediatric applications due to inadequate instruction data and vulnerable training procedures. To address the above issues, this paper builds PedCorpus, a high-quality dataset of over 300,000 multi-task instructions from pediatric textbooks, guidelines, and knowledge graph resources to fulfil diverse diagnostic demands. Upon well-designed PedCorpus, we propose PediatricsGPT, the first Chinese pediatric LLM assistant built on a systematic and robust training pipeline. In the continuous pre-training phase, we introduce a hybrid instruction pre-training mechanism to mitigate the internal-injected knowledge inconsistency of LLMs for medical domain adaptation. Immediately, the full-parameter Supervised Fine-Tuning (SFT) is utilized to incorporate the general medical knowledge schema into the models. After that, we devise a direct following preference optimization to enhance the generation of pediatrician-like humanistic responses. In the parameter-efficient secondary SFT phase, a mixture of universal-specific experts strategy is presented to resolve the competency conflict between medical generalist and pediatric expertise mastery. Extensive results based on the metrics, GPT-4, and doctor evaluations on distinct doctor downstream tasks show that PediatricsGPT consistently outperforms previous Chinese medical LLMs. Our model and dataset will be open-source for community development.

* A Technical Report on a Powerful Chinese Medical Large Language Model

Via

Access Paper or Ask Questions

Towards Multimodal Sentiment Analysis Debiasing via Bias Purification

Mar 08, 2024

Dingkang Yang, Mingcheng Li, Dongling Xiao, Yang Liu, Kun Yang, Zhaoyu Chen, Yuzheng Wang, Peng Zhai, Ke Li, Lihua Zhang

Abstract:Multimodal Sentiment Analysis (MSA) aims to understand human intentions by integrating emotion-related clues from diverse modalities, such as visual, language, and audio. Unfortunately, the current MSA task invariably suffers from unplanned dataset biases, particularly multimodal utterance-level label bias and word-level context bias. These harmful biases potentially mislead models to focus on statistical shortcuts and spurious correlations, causing severe performance bottlenecks. To alleviate these issues, we present a Multimodal Counterfactual Inference Sentiment (MCIS) analysis framework based on causality rather than conventional likelihood. Concretely, we first formulate a causal graph to discover harmful biases from already-trained vanilla models. In the inference phase, given a factual multimodal input, MCIS imagines two counterfactual scenarios to purify and mitigate these biases. Then, MCIS can make unbiased decisions from biased observations by comparing factual and counterfactual outcomes. We conduct extensive experiments on several standard MSA benchmarks. Qualitative and quantitative results show the effectiveness of the proposed framework.

* 14 pages

Via

Access Paper or Ask Questions

Multi-Task Learning of Active Fault-Tolerant Controller for Leg Failures in Quadruped robots

Feb 14, 2024

Taixian Hou, Jiaxin Tu, Xiaofei Gao, Zhiyan Dong, Peng Zhai, Lihua Zhang

Figure 1 for Multi-Task Learning of Active Fault-Tolerant Controller for Leg Failures in Quadruped robots

Figure 2 for Multi-Task Learning of Active Fault-Tolerant Controller for Leg Failures in Quadruped robots

Figure 3 for Multi-Task Learning of Active Fault-Tolerant Controller for Leg Failures in Quadruped robots

Figure 4 for Multi-Task Learning of Active Fault-Tolerant Controller for Leg Failures in Quadruped robots

Abstract:Electric quadruped robots used in outdoor exploration are susceptible to leg-related electrical or mechanical failures. Unexpected joint power loss and joint locking can immediately pose a falling threat. Typically, controllers lack the capability to actively sense the condition of their own joints and take proactive actions. Maintaining the original motion patterns could lead to disastrous consequences, as the controller may produce irrational output within a short period of time, further creating the risk of serious physical injuries. This paper presents a hierarchical fault-tolerant control scheme employing a multi-task training architecture capable of actively perceiving and overcoming two types of leg joint faults. The architecture simultaneously trains three joint task policies for health, power loss, and locking scenarios in parallel, introducing a symmetric reflection initialization technique to ensure rapid and stable gait skill transformations. Experiments demonstrate that the control scheme is robust in unexpected scenarios where a single leg experiences concurrent joint faults in two joints. Furthermore, the policy retains the robot's planar mobility, enabling rough velocity tracking. Finally, zero-shot Sim2Real transfer is achieved on the real-world SOLO8 robot, countering both electrical and mechanical failures.

* 6 pages, 9 figures, ICRA2024 Accepted

Via

Access Paper or Ask Questions