Jiawei Hu

CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale

Jul 09, 2025

Xiao Liang, Jiawei Hu, Di Wang, Zhi Ma, Lin Zhao, Ronghan Li, Bo Wan, Quan Wang

Abstract:Vision-language models (VLMs) are prone to hallucinations that critically compromise reliability in medical applications. While preference optimization can mitigate these hallucinations through clinical feedback, its implementation faces challenges such as clinically irrelevant training samples, imbalanced data distributions, and prohibitive expert annotation costs. To address these challenges, we introduce CheXPO, a Chest X-ray Preference Optimization strategy that combines confidence-similarity joint mining with counterfactual rationale. Our approach begins by synthesizing a unified, fine-grained multi-task chest X-ray visual instruction dataset across different question types for supervised fine-tuning (SFT). We then identify hard examples through token-level confidence analysis of SFT failures and use similarity-based retrieval to expand hard examples for balancing preference sample distributions, while synthetic counterfactual rationales provide fine-grained clinical preferences, eliminating the need for additional expert input. Experiments show that CheXPO achieves 8.93% relative performance gain using only 5% of SFT samples, reaching state-of-the-art performance across diverse clinical tasks and providing a scalable, interpretable solution for real-world radiology applications.

NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Apr 20, 2025

Zheng Chen, Kai Liu, Jue Gong, Jingkai Wang, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Xiangyu Kong, Xiaoxuan Yu(+101 more)

Abstract:This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that achieve state-of-the-art SR performance. To reflect the dual objectives of image SR research, the challenge includes two sub-tracks: (1) a restoration track, which emphasizes pixel-wise accuracy and ranks submissions by PSNR; (2) a perceptual track, which focuses on visual realism and ranks results by a perceptual score. A total of 286 participants registered for the competition, with 25 teams submitting valid entries. This report summarizes the challenge design, datasets, evaluation protocol, main results, and methods of each team. The challenge serves as a benchmark to advance the state of the art and foster progress in image SR.

* NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_ImageSR_x4
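
The restoration track ranks by PSNR. As a rough illustration, here is a minimal sketch of the standard PSNR definition over flat lists of pixel values. The function name `psnr` and the `max_value` default are assumptions made for this example; the challenge's exact protocol (color channel, border cropping) is not stated in the abstract and is not reproduced here.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Standard textbook definition; NOT the challenge's official scoring script.
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between corresponding pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the restored image is closer, pixel by pixel, to the reference, which is why this metric suits the accuracy-oriented track rather than the perceptual one.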

LightLLM: A Versatile Large Language Model for Predictive Light Sensing

Nov 20, 2024

Jiawei Hu, Hong Jia, Mahbub Hassan, Lina Yao, Brano Kusy, Wen Hu

Abstract:We propose LightLLM, a model that fine-tunes pre-trained large language models (LLMs) for light-based sensing tasks. It integrates a sensor data encoder to extract key features, a contextual prompt to provide environmental information, and a fusion layer to combine these inputs into a unified representation. This combined input is then processed by the pre-trained LLM, which remains frozen while lightweight, trainable components are added and fine-tuned, allowing the model to adapt to new tasks without altering its original parameters. This approach enables flexible adaptation of LLMs to specialized light sensing tasks with minimal computational overhead and retraining effort. We have implemented LightLLM for three light sensing tasks: light-based localization, outdoor solar forecasting, and indoor solar estimation. Using real-world experimental datasets, we demonstrate that LightLLM significantly outperforms state-of-the-art methods, achieving a 4.4x improvement in localization accuracy and a 3.4x improvement in indoor solar estimation when tested in previously unseen environments. We further demonstrate that LightLLM outperforms ChatGPT-4 with direct prompting, highlighting the advantages of LightLLM's specialized architecture for sensor data fusion with textual prompts.

* 15 pages, 14 figures, 5 tables

ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

Jul 09, 2024

Yunshan Zhong, Jiawei Hu, You Huang, Yuxin Zhang, Rongrong Ji

Abstract:Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the intricate interdependence between quantized weight and activation, leading to considerable quantization error. In this paper, we propose ERQ, a two-step PTQ approach meticulously crafted to sequentially reduce the quantization error arising from activation and weight quantization. ERQ first introduces Activation quantization error reduction (Aqer) that strategically formulates the minimization of activation quantization error as a Ridge Regression problem, tackling it by updating weights with full-precision. Subsequently, ERQ introduces Weight quantization error reduction (Wqer) that adopts an iterative approach to mitigate the quantization error induced by weight quantization. In each iteration, an empirically derived, efficient proxy is employed to refine the rounding directions of quantized weights, coupled with a Ridge Regression solver to curtail weight quantization error. Experimental results attest to the effectiveness of our approach. Notably, ERQ surpasses the state-of-the-art GPTQ by 22.36% in accuracy for W3A4 ViT-S.

* ICML2024 (Spotlight)

Motion Planning for Multiple Mobile Manipulator System in Complex Flipping Manipulation

Dec 11, 2023

Wenhang Liu, Kun Song, Meng Ren, Jiawei Hu, Michael Yu Wang, Zhenhua Xiong

Abstract:Multiple robot systems are favored for object manipulation and transportation, especially for large objects. However, in more complex manipulation such as flipping, these systems encounter a new challenge, configuration disconnectivity of manipulators. Grasping objects by manipulators will impose closed-chain constraints on the system, which in turn limits the feasible motions of manipulators and further compromises the configuration connectivity. Multiple mobile manipulator systems show much more flexibility in object manipulation with the mobility of the mobile platform and have the potential to address the above problem. In this paper, a novel planning framework is proposed for complex flipping manipulation by incorporating platform motions and regrasping. Firstly, two types of trajectories, mobile manipulator planning and regrasping planning, are classified and can be assigned different priorities for different tasks. Secondly, corresponding planning methods are designed for each type of trajectory. Specifically, in mobile manipulator planning, the configuration of the platform is determined through optimization to ensure connectivity when the manipulator approaches configuration boundaries. In regrasping planning, closed-chain constraints are temporarily disregarded and the manipulation capabilities are prioritized to facilitate subsequent planning. Finally, the structure of the overall planning framework is provided. Experimental results demonstrate that the proposed planner efficiently plans the motions of the system to accomplish flipping manipulation. Additionally, a comprehensive experiment emphasizes the significance of our planner in extending the capabilities of multiple mobile manipulator systems in complex tasks.

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

Nov 16, 2023

Yunshan Zhong, Jiawei Hu, Mingbao Lin, Mengzhao Chen, Rongrong Ji

Abstract:Despite the scalable performance of vision transformers (ViTs), their dense computational costs (training & inference) undermine their position in industrial applications. Post-training quantization (PTQ), which tunes ViTs with a tiny dataset and runs them in a low-bit format, addresses the cost issue well but unfortunately suffers larger performance drops in lower-bit cases. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) Quantization inefficiency in the prevalent log2 quantizer for post-Softmax activations; (2) Rugged and magnified loss landscape in coarse-grained quantization granularity for post-LayerNorm activations. Then, I&S-ViT addresses these issues by introducing: (1) A novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve both an inclusive domain representation and accurate distribution approximation; (2) A three-stage smooth optimization strategy (SOS) that amalgamates the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT's superiority over existing PTQ methods for ViTs, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
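
For context on the "prevalent log2 quantizer" mentioned above, the following is a minimal sketch of a plain log2 quantizer for a single post-Softmax value in (0, 1]. It is a commonly used formulation, not the paper's SULQ; the function names and the `bits` default are assumptions made for this example.

```python
import math


def log2_quantize(x, bits=4):
    """Map x in (0, 1] to an integer code q, so that x is approximated by 2**-q.

    Plain log2 quantizer (illustrative only); this is NOT the SULQ quantizer.
    """
    if not 0.0 < x <= 1.0:
        raise ValueError("x must lie in (0, 1]")
    max_code = 2 ** bits - 1
    code = round(-math.log2(x))
    # Clip to the representable range of a `bits`-bit unsigned code.
    return min(max(code, 0), max_code)


def log2_dequantize(code):
    """Recover the approximate value represented by an integer code."""
    return 2.0 ** -code
```

In this sketch, 0.9 and 0.75 both receive code 0 and dequantize to 1.0, so resolution near 1 is coarse. That may be related to the inefficiency the abstract refers to, though the abstract itself does not spell out the mechanism.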

Forward Kinematics of Object Transport by a Multi-Robot System with Deformable Sheet

Oct 18, 2023

Jiawei Hu, Wenhang Liu, Jingang Yi, Zhenhua Xiong

Abstract:We present object handling and transport by a multi-robot team with a deformable sheet as a carrier. Due to the deformability of the sheet and the high dimension of the whole system, it is challenging to clearly describe all the possible positions of the object on the sheet for a given formation of the multi-robot system. A complete forward kinematics (FK) method is proposed in this paper for object handling by an $N$-mobile robot team with a deformable sheet. Based on the virtual variable cables model, a constrained quadratic problem (CQP) is formulated by combining the form closure and minimum potential energy conditions of the system. Analytical solutions to the CQP are presented and then further verified with the force closure condition. With the proposed FK method, all possible solutions are obtained with the given initial sheet shape and the robot team formation. We demonstrate the effectiveness, completeness, and efficiency of the FK method with simulation and experimental results.

* 8 pages, 6 figures, has been submitted to IEEE Robotics and Automation Letters

Decentralized Coverage Path Planning with Reinforcement Learning and Dual Guidance

Oct 14, 2022

Yongkai Liu, Jiawei Hu, Wei Dong

Abstract:Planning coverage paths for multiple robots in a decentralized way makes coverage tasks more robust to uncertain malfunctions. To achieve high efficiency in a distributed manner for each single robot, a comprehensive understanding of both the complicated environment and the cooperative agents' intent is crucial. Unfortunately, existing works commonly consider only part of these factors, resulting in imbalanced subareas or unnecessary overlaps. To tackle this issue, we introduce a decentralized reinforcement learning framework with dual guidance to train each agent to solve the decentralized multiple coverage path planning problem directly from the environment states. As distributed robots require others' intentions to achieve better coverage efficiency, we utilize two guidance methods, artificial potential fields and heuristic guidance, to include and integrate others' intentions into the observations of each robot. With our constructed framework, results show that our agents successfully learn to determine their own subareas while achieving full coverage, balanced subareas and low overlap rates. We then implement spanning tree cover within those subareas to construct actual routes for each robot and complete the given coverage tasks. Our performance is also compared with the state-of-the-art decentralized method, showing up to 10 percent lower overlap rates while maintaining high efficiency in similar environments.

A Novel Graph-based Motion Planner of Multi-Mobile Robot Systems with Formation and Obstacle Constraints

Oct 07, 2022

Wenhang Liu, Jiawei Hu, Heng Zhang, Michael Yu Wang, Zhenhua Xiong

Abstract:Multi-mobile robot systems show great advantages over one single robot in many applications. However, the robots are required to form desired task-specified formations, which significantly reduces the set of feasible motions. Thus, it is challenging to determine whether the robots can pass through an obstructed environment under formation constraints, especially in an obstacle-rich environment. Furthermore, is there an optimal path for the robots? To address these two problems, a novel graph-based motion planner is proposed in this paper. A mapping between workspace and configuration space of multi-mobile robot systems is first built, where valid configurations can be acquired to satisfy both formation constraints and collision avoidance. Then, an undirected graph is generated by verifying connectivity between valid configurations. The breadth-first search method is employed to answer the question of whether there is a feasible path on the graph. Finally, an optimal path will be planned on the updated graph, considering the cost of path length and formation preference. Simulation results show that the planner can be applied to get optimal motions of robots under formation constraints in obstacle-rich environments. Additionally, different constraints are considered.
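
The feasibility check described above (breadth-first search over an undirected graph of valid configurations) can be sketched as follows. This is a generic BFS reachability test, not the authors' planner. The name `has_feasible_path`, the edge-list input, and the string node labels are assumptions made for this example, and how valid configurations and their connectivity are computed is outside its scope.

```python
from collections import deque


def has_feasible_path(edges, start, goal):
    """Return True if `goal` is reachable from `start` in an undirected graph.

    `edges` is an iterable of (node_a, node_b) pairs, each standing for two
    valid configurations that were verified to be connected (hypothetical input).
    """
    adjacency = {}
    for a, b in edges:
        # Undirected graph: record the edge in both directions.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    if start == goal:
        return True

    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == goal:
                return True
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return False
```

BFS only answers whether some path exists; selecting an optimal path with respect to path length and formation preference, as the abstract describes, would need a cost-aware search on top of this.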

Obstacle Crossing by Multi-mobile Robots in Object Transportation with Deformable Sheet

Nov 17, 2021

Jiawei Hu, Wenhang Liu, Heng Zhang, Zhenhua Xiong

Abstract:Multi-robot transportation (MRT) moves an object to its destination through the cooperation of multiple robots. In the process of object transportation, obstacle avoidance is an indispensable feature. In traditional local planners, obstacles are usually considered insurmountable, so the robot team bypasses the obstacles as a whole. However, many obstacles can be crossed in real situations. Studying the obstacle crossing ability of a robot team can improve the efficiency of MRT and increase the planning success rate in complex environments. Inspired by patient transfer using a bed sheet, this paper focuses on object transportation by multi-mobile robots with a deformable sheet. A new local planner with obstacle crossing capability is proposed, which consists of three parts: deformable sheet modeling, formation optimization and local path generation. It can successfully find an obstacle crossing path in complex scenarios where other planners fail. The effectiveness and versatility of the planner are verified by a case study with three mobile robots in the experiment and a simulation with four robots.

* 8 pages, 12 figures, a Submission for RAL and ICRA2022
