Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Li Kang

VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Jun 10, 2025

Li Kang, Xiufeng Song, Heng Zhou, Yiran Qin, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai, Zhenfei Yin

Abstract:Coordinating multiple embodied agents in dynamic environments remains a core challenge in artificial intelligence, requiring both perception-driven reasoning and scalable cooperation strategies. While recent works have leveraged large language models (LLMs) for multi-agent planning, a few have begun to explore vision-language models (VLMs) for visual reasoning. However, these VLM-based approaches remain limited in their support for diverse embodiment types. In this work, we introduce VIKI-Bench, the first hierarchical benchmark tailored for embodied multi-agent cooperation, featuring three structured levels: agent activation, task planning, and trajectory perception. VIKI-Bench includes diverse robot embodiments, multi-view visual observations, and structured supervision signals to evaluate reasoning grounded in visual inputs. To demonstrate the utility of VIKI-Bench, we propose VIKI-R, a two-stage framework that fine-tunes a pretrained vision-language model (VLM) using Chain-of-Thought annotated demonstrations, followed by reinforcement learning under multi-level reward signals. Our extensive experiments show that VIKI-R significantly outperforms baselines method across all task levels. Furthermore, we show that reinforcement learning enables the emergence of compositional cooperation patterns among heterogeneous agents. Together, VIKI-Bench and VIKI-R offer a unified testbed and method for advancing multi-agent, visual-driven cooperation in embodied AI systems.

* Project page: https://faceong.github.io/VIKI-R/

Via

Access Paper or Ask Questions

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Mar 20, 2025

Yiran Qin, Li Kang, Xiufeng Song, Zhenfei Yin, Xiaohong Liu, Xihui Liu, Ruimao Zhang, Lei Bai

Abstract:Designing effective embodied multi-agent systems is critical for solving complex real-world tasks across domains. Due to the complexity of multi-agent embodied systems, existing methods fail to automatically generate safe and efficient training data for such systems. To this end, we propose the concept of compositional constraints for embodied multi-agent systems, addressing the challenges arising from collaboration among embodied agents. We design various interfaces tailored to different types of constraints, enabling seamless interaction with the physical world. Leveraging compositional constraints and specifically designed interfaces, we develop an automated data collection framework for embodied multi-agent systems and introduce the first benchmark for embodied multi-agent manipulation, RoboFactory. Based on RoboFactory benchmark, we adapt and evaluate the method of imitation learning and analyzed its performance in different difficulty agent tasks. Furthermore, we explore the architectures and training strategies for multi-agent imitation learning, aiming to build safe and efficient embodied multi-agent systems.

* Project page: https://iranqin.github.io/robofactory/

Via

Access Paper or Ask Questions

FedSR: A Semi-Decentralized Federated Learning Algorithm for Non-IIDness in IoT System

Mar 19, 2024

Jianjun Huang, Lixin Ye, Li Kang

Abstract:In the Industrial Internet of Things (IoT), a large amount of data will be generated every day. Due to privacy and security issues, it is difficult to collect all these data together to train deep learning models, thus the federated learning, a distributed machine learning paradigm that protects data privacy, has been widely used in IoT. However, in practical federated learning, the data distributions usually have large differences across devices, and the heterogeneity of data will deteriorate the performance of the model. Moreover, federated learning in IoT usually has a large number of devices involved in training, and the limited communication resource of cloud servers become a bottleneck for training. To address the above issues, in this paper, we combine centralized federated learning with decentralized federated learning to design a semi-decentralized cloud-edge-device hierarchical federated learning framework, which can mitigate the impact of data heterogeneity, and can be deployed at lage scale in IoT. To address the effect of data heterogeneity, we use an incremental subgradient optimization algorithm in each ring cluster to improve the generalization ability of the ring cluster models. Our extensive experiments show that our approach can effectively mitigate the impact of data heterogeneity and alleviate the communication bottleneck in cloud servers.

* 11 pages, 10 figures

Via

Access Paper or Ask Questions

Supervised Anomaly Detection via Conditional Generative Adversarial Network and Ensemble Active Learning

Apr 24, 2021

Zhi Chen, Jiang Duan, Li Kang, Guoping Qiu

Figure 1 for Supervised Anomaly Detection via Conditional Generative Adversarial Network and Ensemble Active Learning

Figure 2 for Supervised Anomaly Detection via Conditional Generative Adversarial Network and Ensemble Active Learning

Figure 3 for Supervised Anomaly Detection via Conditional Generative Adversarial Network and Ensemble Active Learning

Figure 4 for Supervised Anomaly Detection via Conditional Generative Adversarial Network and Ensemble Active Learning

Abstract:Anomaly detection has wide applications in machine intelligence but is still a difficult unsolved problem. Major challenges include the rarity of labeled anomalies and it is a class highly imbalanced problem. Traditional unsupervised anomaly detectors are suboptimal while supervised models can easily make biased predictions towards normal data. In this paper, we present a new supervised anomaly detector through introducing the novel Ensemble Active Learning Generative Adversarial Network (EAL-GAN). EAL-GAN is a conditional GAN having a unique one generator vs. multiple discriminators architecture where anomaly detection is implemented by an auxiliary classifier of the discriminator. In addition to using the conditional GAN to generate class balanced supplementary training data, an innovative ensemble learning loss function ensuring each discriminator makes up for the deficiencies of the others is designed to overcome the class imbalanced problem, and an active learning algorithm is introduced to significantly reduce the cost of labeling real-world data. We present extensive experimental results to demonstrate that the new anomaly detector consistently outperforms a variety of SOTA methods by significant margins. The codes are available on Github.

Via

Access Paper or Ask Questions