Cailian Chen

RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents

Jan 26, 2026

Jize Wang, Han Wu, Zhiyuan You, Yiming Song, Yijun Wang, Zifei Shan, Yining Li, Songyang Zhang, Xinyi Le, Cailian Chen(+2 more)

Abstract:Mixture-of-Agents (MoA) improves LLM performance through layered collaboration, but its dense topology raises costs and latency. Existing methods employ LLM judges to filter responses, yet still require all models to perform inference before judging, failing to cut costs effectively. They also lack model selection criteria and struggle with large model pools, where full inference is costly and can exceed context limits. To address this, we propose RouteMoA, an efficient mixture-of-agents framework with dynamic routing. It employs a lightweight scorer to perform initial screening by predicting coarse-grained performance from the query, narrowing candidates to a high-potential subset without inference. A mixture of judges then refines these scores through lightweight self- and cross-assessment based on existing model outputs, providing posterior correction without additional inference. Finally, a model ranking mechanism selects models by balancing performance, cost, and latency. RouteMoA outperforms MoA across varying tasks and model pool sizes, reducing cost by 89.8% and latency by 63.6% in the large-scale model pool.
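
The final ranking step described above can be illustrated with a small sketch. The abstract does not give the ranking formula, so the linear utility, the weights, and the names `Candidate` and `rank_models` below are all assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one model in the pool (all fields normalized to [0, 1])."""
    name: str
    predicted_score: float  # scorer's estimate of answer quality for this query
    cost: float             # relative inference cost
    latency: float          # relative response latency


def rank_models(candidates, k, cost_weight=0.3, latency_weight=0.2):
    """Return the k candidates with the highest utility.

    Assumed utility: predicted_score - cost_weight * cost - latency_weight * latency.
    The weights are illustrative defaults, not values from the paper.
    """
    def utility(c):
        return c.predicted_score - cost_weight * c.cost - latency_weight * c.latency

    return sorted(candidates, key=utility, reverse=True)[:k]


pool = [
    Candidate("large", 0.90, 1.0, 1.0),
    Candidate("medium", 0.85, 0.4, 0.5),
    Candidate("small", 0.60, 0.1, 0.2),
]
selected = rank_models(pool, k=2)
print([c.name for c in selected])
```

With these made-up numbers the largest model is dropped because its small quality advantage does not offset its cost and latency penalties.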

NTIRE 2025 Image Shadow Removal Challenge Report

Jun 18, 2025

Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailian Chen, Zongwei Wu, Radu Timofte, Mingjia Li, Jin Hu, Hainuo Wang, Hengxing Liu(+72 more)

Abstract:This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were evaluated with images from the WSRD+ dataset, simulating interactions between self- and cast-shadows with a large number of diverse objects, textures, and materials.

CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Dec 27, 2024

Siyu Wang, Cailian Chen, Xinyi Le, Qimin Xu, Lei Xu, Yanzhou Zhang, Jie Yang

Abstract:Computer-aided design (CAD) significantly enhances the efficiency, accuracy, and innovation of design processes by enabling precise 2D and 3D modeling, extensive analysis, and optimization. Existing methods for creating CAD models rely on latent vectors or point clouds, which are difficult to obtain and costly to store. Recent advances in Multimodal Large Language Models (MLLMs) have inspired researchers to use natural language instructions and images for CAD model construction. However, these models still struggle with inferring accurate 3D spatial location and orientation, leading to inaccuracies in determining the spatial 3D starting points and extrusion directions for constructing geometries. This work introduces CAD-GPT, a CAD synthesis method with spatial reasoning-enhanced MLLM that takes either a single image or a textual description as input. To achieve precise spatial inference, our approach introduces a 3D Modeling Spatial Mechanism. This method maps 3D spatial positions and 3D sketch plane rotation angles into a 1D linguistic feature space using a specialized spatial unfolding mechanism, while discretizing 2D sketch coordinates into an appropriate planar space to enable precise determination of spatial starting position, sketch orientation, and 2D sketch coordinate translations. Extensive experiments demonstrate that CAD-GPT consistently outperforms existing state-of-the-art methods in CAD model synthesis, both quantitatively and qualitatively.

GTA: A Benchmark for General Tool Agents

Jul 11, 2024

Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, Xinyi Le

Abstract:Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, failing to reveal the agents' real-world problem-solving abilities effectively. To address this, we propose GTA, a benchmark for General Tool Agents, featuring three main aspects: (i) Real user queries: human-written queries with simple real-world objectives but implicit tool use, requiring the LLM to reason about the suitable tools and plan the solution steps. (ii) Real deployed tools: an evaluation platform equipped with tools across perception, operation, logic, and creativity categories to evaluate the agents' actual task execution performance. (iii) Real multimodal inputs: authentic image files, such as spatial scenes, web page screenshots, tables, code snippets, and printed/handwritten materials, used as the query contexts to align closely with real-world scenarios. We design 229 real-world tasks and executable tool chains to evaluate mainstream LLMs. Our findings show that real-world user queries are challenging for existing LLMs, with GPT-4 completing less than 50% of the tasks and most LLMs achieving below 25%. This evaluation reveals the bottlenecks in the tool-use capabilities of current LLMs in real-world scenarios, which provides future direction for advancing general-purpose tool agents. The code and dataset are available at https://github.com/open-compass/GTA.

* Github repo: https://github.com/open-compass/GTA

Safety-Critical Optimal Control for Robotic Manipulators in A Cluttered Environment

Nov 11, 2022

Xuda Ding, Han Wang, Yi Ren, Yu Zheng, Cailian Chen, Jianping He

Abstract:Designing safety-critical control for robotic manipulators is challenging, especially in a cluttered environment. First, the actual trajectory of a manipulator might deviate from the planned one due to the complex collision environments and non-trivial dynamics, leading to collisions. Second, the feasible space for the manipulator is hard to obtain since the explicit distance functions between collision meshes are unknown. By analyzing the relationship between the safe set and the controlled invariant set, this paper proposes a data-driven control barrier function (CBF) construction method, which extracts the CBF from distance samples. Specifically, the CBF guarantees the controlled invariance property by accounting for the system dynamics. The data-driven method samples the distance function and determines the safe set. Then, the CBF is synthesized based on the safe set by a scenario-based sum-of-squares (SOS) program. Unlike most existing linearization-based approaches, our method preserves the volume of the feasible space for planning without approximation, which helps find a solution in a cluttered environment. The control law is obtained by solving a CBF-based quadratic program in real time, which works as a safety filter for the desired planning-based controller. Moreover, our method guarantees safety with a proven probabilistic result. Our method is validated on a 7-DOF manipulator in both real and virtual cluttered environments. The experiments show that the manipulator is able to execute tasks where the clearance between obstacles is on the order of millimeters.
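
To make the "safety filter" idea concrete, here is a minimal sketch of a CBF-based quadratic program with a single affine constraint. It assumes the problem has the form: minimize ||u - u_des||^2 subject to a·u >= b, where a and b would come from the barrier condition (for control-affine dynamics, a = L_g h and b = -alpha*h - L_f h). With one constraint the solution is a projection onto a half-space and has a closed form. The function name `safety_filter` is hypothetical, and the paper's actual program may include more constraints and input bounds.

```python
def safety_filter(u_des, a, b):
    """Closed-form solution of: min ||u - u_des||^2  s.t.  a . u >= b.

    u_des: desired control from the nominal planner-based controller
    a, b:  coefficients of the single affine CBF constraint (assumed given)
    """
    dot = sum(ai * ui for ai, ui in zip(a, u_des))
    if dot >= b:
        # The desired control already satisfies the constraint: pass it through.
        return list(u_des)
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        # a = 0 and b > 0: no control input can satisfy the constraint.
        raise ValueError("CBF constraint is infeasible")
    # Project u_des onto the boundary of the half-space {u : a . u >= b}.
    scale = (b - dot) / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_des, a)]


# Constraint -u_x >= -0.5, i.e. velocity toward the obstacle is capped at 0.5.
print(safety_filter([1.0, 0.0], a=[-1.0, 0.0], b=-0.5))
print(safety_filter([0.2, 0.3], a=[-1.0, 0.0], b=-0.5))
```

The filter changes the desired input only when it would violate the constraint, and then by the smallest amount in the Euclidean sense.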

* Submitted to IEEE RA-L

CANS: Communication Limited Camera Network Self-Configuration for Intelligent Industrial Surveillance

Sep 13, 2021

Jingzheng Tu, Qimin Xu, Cailian Chen

Abstract:Real-time and intelligent video surveillance via camera networks involves computation-intensive vision detection tasks with massive video data, and is crucial for safety in the edge-enabled industrial Internet of Things (IIoT). Multiple video streams compete for limited communication resources on the link between edge devices and camera networks, resulting in considerable communication congestion. This congestion postpones the completion time and degrades the accuracy of vision detection tasks. Thus, achieving high accuracy of vision detection tasks under the communication constraints and vision task deadline constraints is challenging. Previous works focus on single-camera configuration to balance the tradeoff between accuracy and processing time of detection tasks by setting video quality parameters. In this paper, an adaptive camera network self-configuration method (CANS) for video surveillance is proposed to cope with multiple video streams with heterogeneous quality of service (QoS) demands for edge-enabled IIoT. Moreover, it adapts to video content and network dynamics. Specifically, the tradeoff between two key performance metrics, i.e., accuracy and latency, is formulated as an NP-hard optimization problem with latency constraints. Simulation on real-world surveillance datasets demonstrates that the proposed CANS method achieves low end-to-end latency (13 ms on average) with high accuracy (92% on average) under network dynamics. The results validate the effectiveness of CANS.

* 6 pages, 11 figures

Low-Latency Federated Learning over Wireless Channels with Differential Privacy

Jun 20, 2021

Kang Wei, Jun Li, Chuan Ma, Ming Ding, Cailian Chen, Shi Jin, Zhu Han, H. Vincent Poor

Abstract:In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server. The performance of uploaded models in such situations can vary widely due to imbalanced data distributions, potential demands on privacy protections, and quality of transmissions. In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement. We solve this problem in the framework of multi-agent multi-armed bandit (MAMAB) to deal with the situation where there are multiple clients confronting different unknown transmission environments, e.g., channel fading and interference. Specifically, we first transform the long-term constraints on both training performance and each client's DP into a virtual queue based on the Lyapunov drift technique. Then, we convert the MAMAB to a max-min bipartite matching problem at each communication round, by estimating rewards with the upper confidence bound (UCB) approach. More importantly, we propose two efficient solutions to this matching problem, i.e., a modified Hungarian algorithm and greedy matching with a better alternative (GMBA), in which the first one can achieve the optimal solution with a high complexity while the second one achieves a better trade-off, offering verified low complexity with little performance loss. In addition, we develop an upper bound on the expected regret of this MAMAB-based FL framework, which shows a linear growth over the logarithm of communication rounds, justifying its theoretical feasibility. Extensive experiments are conducted to validate the effectiveness of our proposed algorithms, and the impacts of various parameters on the FL performance over wireless edge networks are also discussed.
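
As a rough illustration of the UCB reward estimate mentioned above, the sketch below uses the textbook UCB1 index (empirical mean plus an exploration bonus). This is an assumption: the abstract does not state which UCB variant is used, and in the paper the estimates feed a max-min bipartite matching rather than the simple per-arm argmax shown here. The names `ucb_index` and `select_arm` are hypothetical.

```python
import math


def ucb_index(mean_reward, pulls, round_t, c=2.0):
    """UCB1-style optimistic estimate: mean + sqrt(c * ln(t) / pulls).

    An arm that has never been tried gets an infinite index so it is explored first.
    """
    if pulls == 0:
        return math.inf
    return mean_reward + math.sqrt(c * math.log(round_t) / pulls)


def select_arm(means, pulls, round_t):
    """Pick the arm (e.g. a channel) with the largest UCB index."""
    indices = [ucb_index(m, n, round_t) for m, n in zip(means, pulls)]
    return max(range(len(indices)), key=indices.__getitem__)


# Arm 0 looks better on average but is well explored; arm 1 has been tried once.
print(select_arm(means=[0.9, 0.5], pulls=[100, 1], round_t=101))
```

The exploration bonus shrinks as an arm is tried more often, so rarely tried options are revisited until their estimates become reliable.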

On Topology Inference for Networked Dynamical Systems: Principles and Performances

Jun 02, 2021

Yushan Li, Jianping He, Cailian Chen, Xinping Guan

Abstract:Topology inference for networked dynamical systems (NDSs) plays a crucial role in many areas. Knowledge of the system topology can aid in detecting anomalies, spotting trends, predicting future behavior and so on. Different from the majority of pioneering works, this paper investigates the principles and performances of topology inference from the perspective of node causality and correlation. Specifically, we advocate a comprehensive analysis framework to unveil the mutual relationship, convergence and accuracy of the proposed methods and other benchmark methods, i.e., the Granger and ordinary least squares (OLS) estimators. Our method allows for unknown observation noises and both asymptotic and marginal stabilities for NDSs, while encompassing a correlation-based modification design to alleviate performance degradation at small observation scales. To explicitly demonstrate the inference performance of the estimators, we leverage the concentration measure in Gaussian space, and derive the non-asymptotic rates of the inference errors for linear time-invariant (LTI) cases. For the case where the observations are not sufficient to support the estimators, we provide an excitation-based method to infer the one-hop and multi-hop neighbors with probability guarantees. Furthermore, we point out that the theoretical results can be extended to switching topologies and nonlinear dynamics cases. Extensive simulations highlight the superior performance of the proposed method.
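
For readers unfamiliar with the OLS benchmark named above, the following sketch estimates the interaction matrix W of a noise-free two-node LTI system x[t+1] = W x[t] from an observed trajectory, using the standard least-squares formula W_hat = (sum of x[t+1] x[t]^T) (sum of x[t] x[t]^T)^(-1). It illustrates only the baseline estimator, not the paper's proposed method; the two-node restriction, the example matrix, and the function names are assumptions chosen to keep the code dependency-free.

```python
def simulate(W, x0, steps):
    """Generate the trajectory x[0], ..., x[steps] of x[t+1] = W x[t]."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append([sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))])
    return xs


def ols_estimate_2d(xs):
    """OLS estimate of a 2x2 matrix W from consecutive state pairs."""
    A = [[0.0, 0.0], [0.0, 0.0]]  # sum of x[t+1] x[t]^T
    G = [[0.0, 0.0], [0.0, 0.0]]  # sum of x[t] x[t]^T (Gram matrix)
    for x, x_next in zip(xs[:-1], xs[1:]):
        for i in range(2):
            for j in range(2):
                A[i][j] += x_next[i] * x[j]
                G[i][j] += x[i] * x[j]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    if abs(det) < 1e-12:
        raise ValueError("observations do not excite both nodes enough")
    G_inv = [[G[1][1] / det, -G[0][1] / det], [-G[1][0] / det, G[0][0] / det]]
    return [[sum(A[i][k] * G_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


W_true = [[0.5, 0.2], [0.0, 0.8]]  # node 1 influences node 0, but not vice versa
W_hat = ols_estimate_2d(simulate(W_true, x0=[1.0, 1.0], steps=5))
edges = [(j, i) for i in range(2) for j in range(2) if i != j and abs(W_hat[i][j]) > 1e-6]
print(edges)  # directed edges (source, target) recovered from the estimate
```

The topology is read off from the nonzero off-diagonal entries of the estimate; with noisy or short observation windows the threshold and the estimator itself would need the kind of analysis the abstract describes.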

Learning-based Intelligent Attack against Mobile Robots with Obstacle-avoidance

Oct 14, 2019

Yushan Li, Jianping He, Cailian Chen, Xinping Guan

Abstract:The security of mobile robots has attracted considerable attention in recent years. Most existing works focus on detection and countermeasures for some classic attacks from cyberspace. Nevertheless, those works are generally based on some prior assumptions about the attacker (e.g., the system dynamics is known, or internal access is compromised). A few works are dedicated to physical attacks; however, they still lack intelligence and advanced control design. In this paper, we propose a physical-based and intelligent attack framework against the obstacle-avoidance of mobile robots. The novelty of our work lies in the following: i) Without any prior information on the system dynamics, the attacker can learn the detection area and goal position of a mobile robot by trial and observation, and the obstacle-avoidance mechanism is learned by the support vector regression (SVR) method; ii) Considering different attack requirements, different attack strategies are proposed to implement the attack efficiently; iii) The framework is suitable for holonomic and non-holonomic mobile robots, and an analysis of the algorithm's time complexity and optimality is provided. Furthermore, a condition that guarantees the success of the attack is obtained. Simulations illustrate the effectiveness of the proposed framework.

Efficient Metropolitan Traffic Prediction Based on Graph Recurrent Neural Network

Nov 02, 2018

Xiaoyu Wang, Cailian Chen, Yang Min, Jianping He, Bo Yang, Yang Zhang

Abstract:Traffic prediction is a fundamental and vital task in Intelligent Transportation Systems (ITS), but it is very challenging to achieve high accuracy while maintaining low computational complexity due to the spatiotemporal characteristics of traffic flow, especially in metropolitan settings. In this work, a new topological framework, called Linkage Network, is proposed to model the road networks and present the propagation patterns of traffic flow. Based on the Linkage Network model, a novel online predictor, named Graph Recurrent Neural Network (GRNN), is designed to learn the propagation patterns in the graph. It can simultaneously predict traffic flow for all road segments based on the information gathered from the whole graph, which reduces the computational complexity significantly from O(nm) to O(n+m) while maintaining high accuracy. Moreover, it can also predict the variations of traffic trends. Experiments based on real-world data demonstrate that the proposed method outperforms the existing prediction methods.

* 8 pages, 7 figures
