Jiazhen Liu

JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes

May 10, 2025

Shalin Anand Jain, Jiazhen Liu, Siva Kailas, Harish Ravichandar

Abstract: Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluation of their individual contributions. The Multi-Agent RL Benchmark and Learning Environment for the Robotarium (MARBLER) is an exciting recent step in providing a standardized robotics-relevant platform for MARL, by bridging the Robotarium testbed with existing MARL software infrastructure. However, MARBLER lacks support for parallelization and GPU/TPU execution, making the platform prohibitively slow compared to modern MARL environments and hindering adoption. We contribute JaxRobotarium, a Jax-powered end-to-end simulation, learning, deployment, and benchmarking platform for the Robotarium. JaxRobotarium enables rapid training and deployment of multi-robot reinforcement learning (MRRL) policies with realistic robot dynamics and safety constraints, supporting both parallelization and hardware acceleration. Our generalizable learning interface provides an easy-to-use integration with SOTA MARL libraries (e.g., JaxMARL). In addition, JaxRobotarium includes eight standardized coordination scenarios, including four novel scenarios that bring established MARL benchmark tasks (e.g., RWARE and Level-Based Foraging) to a realistic robotics setting. We demonstrate that JaxRobotarium retains high simulation fidelity while achieving dramatic speedups over baseline (20x in training and 150x in simulation), and provides an open-access sim-to-real evaluation pipeline through the Robotarium testbed, accelerating and democratizing access to multi-robot learning research and evaluation.

* 22 pages, 14 figures, 10 tables

Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions

Oct 15, 2024

Yuhan Fu, Ruobing Xie, Jiazhen Liu, Bangxiang Lan, Xingwu Sun, Zhanhui Kang, Xirong Li

Abstract: Hallucinations in multimodal large language models (MLLMs) hinder their practical applications. To address this, we propose a Magnifier Prompt (MagPrompt), a simple yet effective method to tackle hallucinations in MLLMs via extremely simple instructions. MagPrompt is based on the following two key principles, which guide the design of various effective prompts, demonstrating robustness: (1) MLLMs should focus more on the image. (2) When there are conflicts between the image and the model's inner knowledge, MLLMs should prioritize the image. MagPrompt is training-free and can be applied to open-source and closed-source models, such as GPT-4o and Gemini-pro. It performs well across many datasets, and its effectiveness is comparable to or even better than that of more complex methods like VCD. Furthermore, our prompt design principles and experimental analyses provide valuable insights into multimodal hallucination.

* 9 pages, 13 tables, 4 figures

Resilient and Adaptive Replanning for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Sep 17, 2024

Peihan Li, Yuwei Wu, Jiazhen Liu, Gaurav S. Sukhatme, Vijay Kumar, Lifeng Zhou

Abstract: Multi-robot collaboration for target tracking presents significant challenges in hazardous environments, including addressing robot failures, dynamic priority changes, and other unpredictable factors. Moreover, these challenges are amplified in adversarial settings when the environment is unknown. In this paper, we propose a resilient and adaptive framework for multi-robot, multi-target tracking in environments with unknown sensing and communication danger zones. The damage caused by these zones is temporary, allowing robots to track targets while accepting the risk of entering dangerous areas. We formulate the problem as an optimization with soft chance constraints, enabling real-time adjustments to robot behavior based on varying types of dangers and failures. An adaptive replanning strategy is introduced, featuring different triggers to improve group performance. This approach allows for dynamic prioritization of target tracking and risk aversion or resilience, depending on evolving resources and real-time conditions. To validate the effectiveness of the proposed method, we benchmark and evaluate it across multiple scenarios in simulation and conduct several real-world experiments.

PhD: A Prompted Visual Hallucination Evaluation Dataset

Mar 17, 2024

Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Xirong Li

Abstract: The rapid growth of Large Language Models (LLMs) has driven the development of Large Vision-Language Models (LVLMs). The challenge of hallucination, prevalent in LLMs, also emerges in LVLMs. However, most existing efforts mainly focus on object hallucination in LVLMs, ignoring diverse types of LVLM hallucinations. In this study, we delve into the Intrinsic Vision-Language Hallucination (IVL-Hallu) issue, thoroughly analyzing different types of IVL-Hallu in terms of their causes and reflections. Specifically, we propose several novel IVL-Hallu tasks and categorize them into four types: (a) object hallucination, which arises from the misidentification of objects, (b) attribute hallucination, which is caused by the misidentification of attributes, (c) multi-modal conflicting hallucination, which derives from the contradictions between textual and visual information, and (d) counter-common-sense hallucination, which is due to the contradictions between the LVLM knowledge and actual images. Based on this taxonomy, we propose a more challenging benchmark named PhD to evaluate and explore IVL-Hallu. An automated pipeline is proposed for generating different types of IVL-Hallu data. Extensive experiments on five SOTA LVLMs reveal their inability to effectively tackle our proposed IVL-Hallu tasks, with detailed analyses and insights on the origins and possible solutions of these new challenging IVL-Hallu tasks, facilitating future research on IVL-Hallu and LVLMs. The benchmark can be accessed at https://github.com/jiazhen-code/IntrinsicHallu

Artificial Intelligence for Complex Network: Potential, Methodology and Application

Feb 23, 2024

Jingtao Ding, Chang Liu, Yu Zheng, Yunke Zhang, Zihan Yu, Ruikun Li, Hongyi Chen, Jinghua Piao, Huandong Wang, Jiazhen Liu (+1 more)

Abstract: Complex networks pervade various real-world systems, from the natural environment to human societies. The essence of these networks is in their ability to transition and evolve from microscopic disorder, where network topology and node dynamics intertwine, to a macroscopic order characterized by certain collective behaviors. Over the past two decades, complex network science has significantly enhanced our understanding of the statistical mechanics, structures, and dynamics underlying real-world networks. Despite these advancements, there remain considerable challenges in exploring more realistic systems and enhancing practical applications. The emergence of artificial intelligence (AI) technologies, coupled with the abundance of diverse real-world network data, has heralded a new era in complex network science research. This survey aims to systematically address the potential advantages of AI in overcoming the lingering challenges of complex network research. It endeavors to summarize the pivotal research problems and provide an exhaustive review of the corresponding methodologies and applications. Through this comprehensive survey, the first of its kind on AI for complex networks, we expect to provide valuable insights that will drive further research and advancement in this interdisciplinary field.

* 51 pages, 4 figures, 10 tables

Multi-Robot Localization and Target Tracking with Connectivity Maintenance and Collision Avoidance

Oct 10, 2022

Rahul Zahroof, Jiazhen Liu, Lifeng Zhou, Vijay Kumar

Abstract: We study the problem in which a team of robots must perform a joint localization and target tracking task while ensuring team connectivity and collision avoidance. The problem can be formalized as a nonlinear, non-convex optimization program, which is typically hard to solve. To this end, we design a two-stage approach that utilizes a greedy algorithm to optimize the joint localization and target tracking performance and applies control barrier functions to ensure safety constraints, i.e., maintaining connectivity of the robot team and preventing inter-robot collisions. Simulated Gazebo experiments verify the effectiveness of the proposed approach. We further compare our greedy algorithm to a non-linear optimization solver and a random algorithm, in terms of the joint localization and tracking quality as well as the computation time. The results demonstrate that our greedy algorithm achieves high task quality and runs efficiently.
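
As a rough illustration of how a control barrier function can enforce collision avoidance, the sketch below filters a nominal velocity command for one robot. It is not the paper's implementation: it assumes single-integrator dynamics, a static neighbor, a single pairwise constraint (which admits a closed-form projection), and the hypothetical names `cbf_safety_filter`, `d_safe`, and `alpha`.

```python
def cbf_safety_filter(p_i, p_j, u_nom, d_safe=0.3, alpha=1.0):
    """Minimally modify u_nom so robot i stays at least d_safe from robot j.

    Assumptions (illustrative only): planar single-integrator robot
    (velocity = control input), robot j is static, one constraint only.
    """
    dx = (p_i[0] - p_j[0], p_i[1] - p_j[1])
    # Barrier function: h >= 0 means the robots are at a safe distance.
    h = dx[0] ** 2 + dx[1] ** 2 - d_safe ** 2
    # CBF condition dh/dt >= -alpha * h becomes the linear constraint a . u >= b.
    a = (2.0 * dx[0], 2.0 * dx[1])
    b = -alpha * h
    slack = a[0] * u_nom[0] + a[1] * u_nom[1] - b
    if slack >= 0.0:
        return u_nom  # nominal command already satisfies the constraint
    norm_sq = a[0] ** 2 + a[1] ** 2
    if norm_sq == 0.0:
        raise ValueError("robots coincide; the constraint is infeasible")
    # Closest point to u_nom on the half-space boundary a . u = b.
    scale = -slack / norm_sq
    return (u_nom[0] + scale * a[0], u_nom[1] + scale * a[1])


# Robot at the origin driving straight at a neighbor 0.5 m away.
safe_u = cbf_safety_filter((0.0, 0.0), (0.5, 0.0), (1.0, 0.0))
```

With several neighbors or a connectivity constraint, the projection generally no longer has a closed form and is usually posed as a small quadratic program instead.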

Attention-aware Resource Allocation and QoE Analysis for Metaverse xURLLC Services

Aug 11, 2022

Hongyang Du, Jiazhen Liu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Junshan Zhang, Dong In Kim

Abstract: As a virtual world interacting with the real world, Metaverse encapsulates our expectations of the next-generation Internet, bringing new key performance indicators (KPIs). In particular, Metaverse services based on graphical technologies, e.g., virtual traveling, require low latency in transmitting virtual object data and high reliability in uploading user instructions. Although conventional ultra-reliable and low-latency communications (URLLC) can satisfy the vast majority of objective service KPIs, it is difficult to offer users a personalized immersive experience that is a distinctive feature of next-generation Internet services. Since the quality of experience (QoE) can be regarded as a comprehensive KPI, URLLC is evolving towards the next generation URLLC (xURLLC) to achieve higher QoE for Metaverse services by allocating more resources to virtual objects in which users are more interested. In this paper, we study the interaction between the Metaverse service provider (MSP) and the network infrastructure provider (InP) to deploy Metaverse xURLLC services. An optimal contract design framework is provided. Specifically, the utility of the MSP, defined as a function of Metaverse users' QoE, is to be maximized, while ensuring the incentives of the InP. To model the QoE of Metaverse xURLLC services, we propose a novel metric named Meta-Immersion that incorporates both the objective network KPIs and subjective feelings of Metaverse users. Using a user-object-attention level (UOAL) dataset, we develop and validate an attention-aware rendering capacity allocation scheme to improve QoE. It is shown that an average of 20.1% QoE improvement is achieved by the xURLLC compared to the conventional URLLC with the uniform allocation scheme. A higher percentage of QoE improvement, e.g., 40%, is achieved when the total resources are limited.

Decentralized Risk-Aware Tracking of Multiple Targets

Aug 04, 2022

Jiazhen Liu, Lifeng Zhou, Ragesh Ramachandran, Gaurav S. Sukhatme, Vijay Kumar

Abstract: We consider the setting where a team of robots is tasked with tracking multiple targets with the following property: approaching the targets enables more accurate target position estimation, but also increases the risk of sensor failures. Therefore, it is essential to address the trade-off between tracking quality maximization and risk minimization. In our previous work, a centralized controller was developed to plan motions for all the robots; however, this is not a scalable approach. Here, we present a decentralized and risk-aware multi-target tracking framework, in which each robot plans its motion trading off tracking accuracy maximization and aversion to risk, while only relying on its own information and information exchanged with its neighbors. We use the control barrier function to guarantee network connectivity throughout the tracking process. Extensive numerical experiments demonstrate that our system can achieve similar tracking accuracy and risk-awareness to its centralized counterpart.

* DARS2022 submission preprint

Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching

Jul 16, 2022

Jiazhen Liu, Xirong Li, Qijie Wei, Jie Xu, Dayong Ding

Abstract: For retinal image matching (RIM), we propose SuperRetina, the first end-to-end method with jointly trainable keypoint detector and descriptor. SuperRetina is trained in a novel semi-supervised manner. A small set of (nearly 100) images are incompletely labeled and used to supervise the network to detect keypoints on the vascular tree. To address the incompleteness of manual labeling, we propose Progressive Keypoint Expansion to enrich the keypoint labels at each training epoch. By utilizing a keypoint-based improved triplet loss as its description loss, SuperRetina produces highly discriminative descriptors at full input image size. Extensive experiments on multiple real-world datasets justify the viability of SuperRetina. Even when manual labeling is replaced by auto labeling, making the training process fully free of manual annotation, SuperRetina compares favorably against a number of strong baselines for two RIM tasks, i.e., image registration and identity verification. SuperRetina will be open source.
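
For readers unfamiliar with triplet losses, the sketch below shows the standard margin-based form on plain Python vectors. It is not SuperRetina's keypoint-based improved variant, which the abstract does not specify; the function names and the `margin` default are assumptions for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: zero once the matching descriptor is closer
    to the anchor than the non-matching one by at least `margin`."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# A hard negative (close to the anchor) yields a positive loss;
# an easy negative (far from the anchor) yields zero.
hard = triplet_loss((0.0, 0.0), (0.0, 1.0), (0.0, 1.5))
easy = triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0))
```

Minimizing this quantity over many triplets pulls descriptors of matching keypoints together and pushes non-matching ones apart, which is what makes the resulting descriptors discriminative.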

* Accepted to ECCV 2022

FV-UPatches: Enhancing Universality in Finger Vein Recognition

Jun 02, 2022

Ziyan Chen, Jiazhen Liu, Changwen Cao, Changlong Jin, Hakil Kim

Abstract: Many deep learning-based models have been introduced in finger vein recognition in recent years. These solutions, however, suffer from data dependency and struggle to generalize. To address this problem, we are inspired by the idea of domain adaptation and propose a universal learning-based framework, which achieves generalization while training with limited data. To reduce differences between data distributions, a compressed U-Net is introduced as a domain mapper to map the raw region of interest image onto a target domain. The concentrated target domain is a unified feature space for the subsequent matching, in which a local descriptor model SOSNet is employed to embed patches into descriptors measuring the similarity of matching pairs. In the proposed framework, the domain mapper is an approximation to a specific extraction function, so the training is only a one-time effort with limited data. Moreover, the local descriptor model can be trained to be representative enough based on a public dataset of non-finger-vein images. The whole pipeline enables the framework to generalize well, making it possible to enhance universality and helping to reduce the costs of data collection, tuning and retraining. Experimental results comparable to state-of-the-art (SOTA) performance on five public datasets demonstrate the effectiveness of the proposed framework. Furthermore, the framework shows application potential in other vein-based biometric recognition as well.
