Jiajun Wang

Mono4DEditor: Text-Driven 4D Scene Editing from Monocular Video via Point-Level Localization of Language-Embedded Gaussians

Oct 10, 2025

Jin-Chuan Shi, Chengye Su, Jiajun Wang, Ariel Shamir, Miao Wang

Abstract: Editing 4D scenes reconstructed from monocular videos based on text prompts is a valuable yet challenging task with broad applications in content creation and virtual environments. The key difficulty lies in achieving semantically precise edits in localized regions of complex, dynamic scenes, while preserving the integrity of unedited content. To address this, we introduce Mono4DEditor, a novel framework for flexible and accurate text-driven 4D scene editing. Our method augments 3D Gaussians with quantized CLIP features to form a language-embedded dynamic representation, enabling efficient semantic querying of arbitrary spatial regions. We further propose a two-stage point-level localization strategy that first selects candidate Gaussians via CLIP similarity and then refines their spatial extent to improve accuracy. Finally, targeted edits are performed on localized regions using a diffusion-based video editing model, with flow and scribble guidance ensuring spatial fidelity and temporal coherence. Extensive experiments demonstrate that Mono4DEditor enables high-quality, text-driven edits across diverse scenes and object types, while preserving the appearance and geometry of unedited areas and surpassing prior approaches in both flexibility and visual fidelity.

* 19 pages, 9 figures
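The two-stage point-level localization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Mono4DEditor's implementation: each Gaussian is assumed to store one integer index into a shared codebook of CLIP-like features (the quantization); stage one keeps Gaussians whose cosine similarity to the text embedding exceeds a threshold; stage two is a hypothetical stand-in refinement that drops candidates lying far from the candidates' median position. The function names and the parameters `sim_thresh` and `radius_scale` are illustrative.

```python
import numpy as np


def decode_features(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Expand quantized per-Gaussian features: row i is codebook[indices[i]]."""
    return codebook[indices]


def cosine_similarity(features: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between each feature row and a single text embedding."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    return f @ t


def localize(positions, codebook, indices, text_emb, sim_thresh=0.5, radius_scale=2.0):
    """Return indices of the Gaussians selected for editing.

    Stage 1: keep Gaussians whose decoded feature matches the text query.
    Stage 2 (hypothetical refinement): drop candidates farther from the
    candidates' median position than radius_scale * median distance.
    """
    sims = cosine_similarity(decode_features(codebook, indices), text_emb)
    candidates = np.flatnonzero(sims >= sim_thresh)
    if candidates.size == 0:
        return candidates
    center = np.median(positions[candidates], axis=0)
    dists = np.linalg.norm(positions[candidates] - center, axis=1)
    cutoff = radius_scale * max(float(np.median(dists)), 1e-8)
    return candidates[dists <= cutoff]
```

Because many Gaussians share a code, one could instead score the K codebook entries once and gather the scores by index; the sketch decodes explicitly for clarity. Real CLIP embeddings are high-dimensional; a toy 2-D codebook is enough to exercise the logic.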

Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs

May 26, 2025

Zhenhao Zhou, Zhuochen Huang, Yike He, Chong Wang, Jiajun Wang, Yijian Wu, Xin Peng, Yiling Lou

Abstract: The Linux kernel is a critical system that serves as the foundation for numerous other systems. Bugs in the Linux kernel can have serious consequences, affecting billions of users. Fault localization (FL), which aims to identify the buggy code elements in software, plays an essential role in software quality assurance. While recent LLM agents have achieved promising FL accuracy on recent benchmarks such as SWE-bench, it remains unclear how well these methods perform on the Linux kernel, where FL is much more challenging due to the large-scale code base, limited observability, and diverse impact factors. In this paper, we introduce LinuxFLBench, an FL benchmark constructed from real-world Linux kernel bugs. We conduct an empirical study to assess the performance of state-of-the-art LLM agents on the Linux kernel. Our initial results reveal that existing agents struggle with this task, achieving a best top-1 accuracy of only 41.6% at the file level. To address this challenge, we propose LinuxFL+, an enhancement framework designed to improve the FL effectiveness of LLM agents for the Linux kernel. LinuxFL+ substantially improves the FL accuracy of all studied agents (e.g., accuracy increases of 7.2% to 11.2%) at minimal cost. Data and code are available at https://github.com/FudanSELab/LinuxFLBench.
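File-level top-k accuracy, the metric behind the 41.6% top-1 figure above, can be read as follows: a bug counts as localized if at least one ground-truth buggy file appears among the agent's first k ranked files. The sketch below assumes that definition (LinuxFLBench's exact scoring rules may differ); the function name is illustrative, and any bug IDs and file paths passed to it are hypothetical.

```python
from typing import Dict, List, Set


def top_k_accuracy(predictions: Dict[str, List[str]],
                   ground_truth: Dict[str, Set[str]],
                   k: int = 1) -> float:
    """Fraction of bugs whose top-k ranked files contain a ground-truth buggy file.

    predictions:  bug id -> files ranked by the agent (most suspicious first)
    ground_truth: bug id -> set of files actually changed by the fix
    Bugs with no prediction count as misses.
    """
    if not ground_truth:
        return 0.0
    hits = 0
    for bug_id, buggy_files in ground_truth.items():
        ranked = predictions.get(bug_id, [])
        if any(f in buggy_files for f in ranked[:k]):
            hits += 1
    return hits / len(ground_truth)
```

Reporting several values of k (e.g., top-1, top-3, top-5) shows whether an agent is close but mis-ranking, or missing the buggy file entirely.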

Multi-modal Evidential Fusion Network for Trusted PET/CT Tumor Segmentation

Jun 26, 2024

Yuxuan Qi, Li Lin, Jiajun Wang, Jingya Zhang, Bin Zhang

Abstract: Accurate segmentation of tumors in PET/CT images is important for computer-aided diagnosis and treatment of cancer. The key issue in such a segmentation problem lies in effectively integrating the complementary information from PET and CT images. However, the quality of PET and CT images varies widely in clinical settings, which leads to uncertainty in the modality information extracted by networks. To take this uncertainty into account in multi-modal information fusion, this paper proposes a novel Multi-modal Evidential Fusion Network (MEFN) comprising a Cross-Modal Feature Learning (CFL) module and a Multi-modal Trusted Fusion (MTF) module. The CFL module reduces the domain gap upon modality conversion and highlights common tumor features, thereby alleviating the need for the segmentation module to handle modality specificity. The MTF module uses mutual attention mechanisms and an uncertainty calibrator to fuse modality features based on modality uncertainty, and then fuses the segmentation results under the guidance of Dempster-Shafer Theory. In addition, a new uncertainty perceptual loss is introduced to force the model to focus on uncertain features and hence improve its ability to extract trusted modality information. Extensive comparative experiments on two publicly available PET/CT datasets show that MEFN significantly outperforms state-of-the-art methods, with DSC improvements of 2.15% on the AutoPET dataset and 3.23% on the Hecktor dataset. More importantly, our model can provide radiologists with credible uncertainty estimates for the segmentation results to support their decision to accept or reject the automatic segmentation, which is particularly important for clinical applications. Our code will be available at https://github.com/QPaws/MEFN.
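Fusing per-modality segmentation results under Dempster-Shafer Theory can be illustrated with the reduced Dempster combination rule commonly used in evidential deep learning (for example, in trusted multi-view classification). Whether MEFN uses exactly this formulation is an assumption. Non-negative per-pixel evidence e for K classes is mapped to belief masses b_k = e_k / S and an uncertainty mass u = K / S, where S = sum(e_k + 1), so that sum(b) + u = 1. Two opinions are then combined as b_k = (b1_k b2_k + b1_k u2 + b2_k u1) / (1 - C) and u = u1 u2 / (1 - C), where the conflict C is the mass the two sources assign to different classes.

```python
import numpy as np


def opinion_from_evidence(evidence: np.ndarray):
    """Map non-negative evidence of shape (..., K) to (beliefs, uncertainty).

    Subjective-logic parameterization: alpha = e + 1, S = sum(alpha),
    b_k = e_k / S, u = K / S, so that sum(b) + u = 1.
    """
    alpha = evidence + 1.0
    S = alpha.sum(axis=-1, keepdims=True)
    K = evidence.shape[-1]
    return evidence / S, K / S


def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster combination of two opinions over the same K classes."""
    # Conflict C = sum_{i != j} b1_i * b2_j = (sum b1)(sum b2) - sum_i b1_i b2_i.
    total = b1.sum(axis=-1, keepdims=True) * b2.sum(axis=-1, keepdims=True)
    agree = (b1 * b2).sum(axis=-1, keepdims=True)
    conflict = total - agree
    scale = 1.0 / (1.0 - conflict)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = scale * (u1 * u2)
    return b, u
```

With K = 2 classes (background, tumor), combining a confident opinion from evidence [0, 9] (u = 2/11) with an uncertain one from evidence [1, 1] (u = 1/2) yields a fused uncertainty of 4/35, lower than either input, while the fused masses still sum to one.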

Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation

Jun 04, 2024

Jiajun Wang, Morteza Ghahremani, Yitong Li, Björn Ommer, Christian Wachinger

Abstract: Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content by incorporating various conditions. Current methods, however, exhibit limited performance when guided by skeleton-based human poses, especially under complex pose conditions such as side or rear views of human figures. To address this issue, we present Stable-Pose, a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer (ViT) to obtain accurate pose guidance for T2I models. Stable-Pose is designed to handle pose conditions within pre-trained Stable Diffusion, providing a refined and efficient way of aligning the pose representation during image synthesis. We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons. Masked pose images are used to smoothly refine the attention maps based on target pose-related features in a hierarchical manner, transitioning from coarse to fine levels. Additionally, our loss function is formulated to place greater emphasis on the pose region, thereby improving the model's precision in capturing intricate pose details. We assessed the performance of Stable-Pose on five public datasets covering a wide range of indoor and outdoor human pose scenarios. Stable-Pose achieved an AP score of 57.1 on the LAION-Human dataset, an improvement of around 13% over the established technique ControlNet. The project link and code are available at https://github.com/ai-med/StablePose.
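The idea of coarse-to-fine attention masking can be illustrated with a toy NumPy sketch: a binary pose mask over the ViT patch grid is dilated by a shrinking radius at each level, and each level's self-attention lets tokens attend only to keys inside the current mask (plus themselves, so no row is fully masked). The dilation radii, the self-attention allowance, the residual stacking, and all function names are assumptions for illustration; this is not Stable-Pose's implementation.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation of a 2D patch mask with a (2r+1) x (2r+1) square window."""
    mask = mask.astype(bool)
    if radius <= 0:
        return mask.copy()
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def masked_self_attention(x, wq, wk, wv, key_mask):
    """Single-head self-attention in which every token may attend only to
    keys inside `key_mask` (a flattened patch mask) or to itself."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    allowed = key_mask[None, :] | np.eye(x.shape[0], dtype=bool)
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) = 0 for masked keys
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def coarse_to_fine(x, layer_weights, pose_mask, radii=(2, 1, 0)):
    """Stack masked attention layers whose mask shrinks from a dilated pose
    region (coarse) to the tight pose region (fine)."""
    h = x
    for (wq, wk, wv), r in zip(layer_weights, radii):
        attn_out, _ = masked_self_attention(h, wq, wk, wv, dilate(pose_mask, r).ravel())
        h = h + attn_out  # residual connection
    return h
```

Starting with a dilated mask lets early layers gather context around the skeleton before later layers concentrate on the pose patches themselves.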

Energy-Efficient Routing Protocol Based on Multi-Threshold Segmentation in Wireless Sensors Networks for Precision Agriculture

Jul 03, 2023

Yindi Yao, Xiong Li, Yanpeng Cui, Jiajun Wang, Chen Wang

Abstract: Wireless sensor networks (WSNs), one of the fundamental technologies of the Internet of Things (IoT), can efficiently provide sensing and communication services for IoT-based applications, especially energy-limited ones. Clustering routing protocols play an important role in reducing energy consumption and prolonging network lifetime. Cluster formation and cluster head selection are key to improving the performance of a clustering routing protocol. This paper proposes an energy-efficient routing protocol based on multi-threshold segmentation (EERPMS) to improve the rationality of cluster formation and cluster head selection. In the cluster formation stage, an innovative node clustering algorithm inspired by multi-threshold image segmentation is developed. In the cluster head selection stage, a theory for calculating the optimal number and location of cluster heads is established with the aim of minimizing network energy consumption. Furthermore, a novel cluster head selection algorithm is constructed based on the residual energy and optimal location of cluster heads. Simulation results show that EERPMS improves the distribution uniformity of cluster heads, prolongs the network lifetime, and saves up to 64.50%, 58.60%, and 56.15% of network energy compared to the RLEACH, CRPFCM, and FIGWO protocols, respectively.

* in IEEE Sensors Journal, vol. 22, no. 7, pp. 6216-6231, 1 Apr. 2022
* 16 pages, 24 figures, 4 tables
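Analyses that derive an optimal number of cluster heads by minimizing network energy usually start from the first-order radio model used in LEACH-style protocols. The sketch below shows that model and the classic LEACH estimate k_opt = sqrt(N / (2π)) · sqrt(ε_fs / ε_mp) · M / d_toBS², for N nodes on an M × M field, with free-space intra-cluster links and a multipath cluster-head-to-base-station link. It is background, not EERPMS itself: the constants are commonly used textbook values, the paper's own derivation may differ, the multi-threshold clustering step is not shown, and `ch_score` is a hypothetical way to combine residual energy with distance to a computed optimal location.

```python
import math

# First-order radio model constants often used in LEACH-style studies
# (assumed values; EERPMS may use different parameters).
E_ELEC = 50e-9        # J/bit, transmitter/receiver electronics
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance between the two regimes


def tx_energy(bits: int, d: float) -> float:
    """Energy to transmit `bits` over distance d (metres)."""
    if d < D0:
        return bits * (E_ELEC + EPS_FS * d ** 2)
    return bits * (E_ELEC + EPS_MP * d ** 4)


def rx_energy(bits: int) -> float:
    """Energy to receive `bits`."""
    return bits * E_ELEC


def optimal_cluster_heads(n_nodes: int, side: float, d_to_bs: float) -> float:
    """Classic LEACH estimate of the optimal number of cluster heads for
    n_nodes uniformly spread over a side x side field."""
    return (math.sqrt(n_nodes / (2 * math.pi))
            * math.sqrt(EPS_FS / EPS_MP) * side / d_to_bs ** 2)


def ch_score(residual: float, initial: float, dist_to_opt: float,
             max_dist: float, w: float = 0.5) -> float:
    """Hypothetical cluster-head score: favour high residual energy and
    closeness to a computed optimal cluster-head location."""
    return w * residual / initial + (1 - w) * (1 - dist_to_opt / max_dist)
```

For example, 100 nodes on a 100 m × 100 m field with the base station 75 m away gives an estimate of about 6.2 cluster heads; the crossover distance D0 with these constants is about 87.7 m.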

Recursive Hierarchical Projection for Whole-Body Control with Task Priority Transition

Sep 22, 2021

Gang Han, Jiajun Wang, Xiaozhu Ju, Mingguo Zhao

Abstract: Redundant robots are expected to execute multiple tasks with different priorities simultaneously. For complex task scheduling in whole-body control (WBC), task priorities need to be transitioned. Many methods focus on guaranteeing control continuity during task priority transitions, but they inevitably either increase computational cost or sacrifice task accuracy. This work formulates the WBC problem with task priority transitions as a Hierarchical Quadratic Program (HQP) with Recursive Hierarchical Projection (RHP) matrices. The tasks at each level are solved recursively through HQP. We propose the RHP matrix to form a continuously changing projection at each level, so that task priority transitions are achieved without increasing computational cost. Additionally, the recursive approach solves the WBC problem without losing task accuracy. We verify the effectiveness of this scheme through comparative simulations of reactive collision avoidance with multi-task priority transitions.

* 6 pages, 9 figures, submitted to ICRA 2022
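Strict task priorities in WBC are commonly enforced by projecting each lower-priority task into the null space of all higher-priority ones. The sketch below shows the classic recursive null-space projection at the velocity level, the strict-hierarchy baseline that projection-based schemes such as RHP build on; it does not reproduce the paper's RHP matrices or its continuous priority transition. The function name and the tolerance `tol` used to skip fully blocked tasks are assumptions.

```python
import numpy as np


def prioritized_velocities(tasks, n_dof: int, tol: float = 1e-10) -> np.ndarray:
    """Strict-priority joint velocities via recursive null-space projection.

    `tasks` is a list of (J_i, xdot_i), ordered from highest to lowest priority.
    Each task only acts in the null space of all tasks above it.
    """
    qdot = np.zeros(n_dof)
    N = np.eye(n_dof)  # projector onto the common null space of the tasks so far
    for J, xdot in tasks:
        J = np.asarray(J, dtype=float)
        JN = J @ N
        if np.linalg.norm(JN) < tol:
            continue  # task fully blocked by higher priorities; skip it
        JN_pinv = np.linalg.pinv(JN)
        # Correct only this task's residual, inside the still-free subspace.
        qdot = qdot + JN_pinv @ (np.asarray(xdot, dtype=float) - J @ qdot)
        N = N - JN_pinv @ JN
    return qdot
```

Because each correction lies in the null space of the higher-priority Jacobians, a conflicting low-priority task cannot disturb the tasks above it. Partially conflicting tasks near algorithmic singularities can still produce large velocities; damped least squares is the usual remedy.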

Mixed Control for Whole-Body Compliance of a Humanoid Robot

Sep 16, 2021

Xiaozhu Ju, Jiajun Wang, Gang Han, Mingguo Zhao

Abstract: Hierarchical quadratic programming (HQP) is commonly applied to handle strict hierarchies of multiple tasks and the robot's physical inequality constraints during whole-body compliance. However, with one-step HQP, the solution can oscillate when it is close to the boundaries of the constraints. This is because abruptly hitting the bounds gives rise to unrealisable jerks and even infeasible solutions. To overcome these deficiencies, this paper proposes mixed control, which blends single-axis model predictive control (MPC) and proportional-derivative (PD) control for whole-body compliance. The MPC predicts the distances between the bounds and the control targets of the critical tasks, and provides smooth and feasible solutions by predicting and optimising in advance. However, applying MPC inevitably increases computation time. Therefore, to achieve a 500 Hz servo rate, PD controllers still regulate the other tasks to save computational resources. We also use a more efficient null-space projection (NSP) whole-body controller instead of HQP and distribute the single-axis MPCs across four CPU cores for parallel computation. Finally, we validate the capabilities of the proposed strategy through simulations and an experiment on the humanoid robot Walker X.

* 6 pages, 5 figures, submitted to ICRA 2022
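The PD regulation of non-critical tasks mentioned above is typically the task-space law xdd_cmd = xdd_des + Kp (x_des - x) + Kd (xd_des - xd). The sketch below runs it at the 500 Hz servo rate stated in the abstract on a hypothetical single-axis double integrator. The gains (Kp = 100, Kd = 20, chosen for critical damping since Kd = 2 sqrt(Kp)), the target, and the function names are illustrative; the MPC branch used for critical tasks is not shown.

```python
def pd_task_acceleration(x, xd, x_des, xd_des, xdd_des, kp, kd):
    """Task-space PD law with acceleration feedforward."""
    return xdd_des + kp * (x_des - x) + kd * (xd_des - xd)


def simulate(x0=0.0, target=0.1, kp=100.0, kd=20.0, rate_hz=500.0, duration=2.0):
    """Drive a single-axis double integrator to `target` with the PD law,
    updating the command once per servo tick."""
    dt = 1.0 / rate_hz
    x, xd = x0, 0.0
    for _ in range(int(round(duration * rate_hz))):
        xdd = pd_task_acceleration(x, xd, target, 0.0, 0.0, kp, kd)
        xd += xdd * dt  # semi-implicit Euler: update velocity first
        x += xd * dt
    return x, xd
```

A PD step costs a handful of arithmetic operations per task, which is why it suits tasks far from their bounds, while the more expensive predictive controller is reserved for tasks that may hit them.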

Whole-Body Control with Motion/Force Transmissibility for Parallel-Legged Robot

Sep 15, 2021

Jiajun Wang, Gang Han, Xiaozhu Ju, Mingguo Zhao

Abstract: Whole-body control (WBC) has been applied to the locomotion of legged robots. However, current WBC methods have not considered the intrinsic features of parallel mechanisms, especially motion/force transmissibility (MFT). In this work, we propose an MFT-enhanced WBC scheme. Introducing MFT into WBC is challenging due to the nonlinear relationship between MFT indices and the robot configuration. To overcome this challenge, we establish the MFT preferable space of the robot and formulate it as a polyhedron in joint space at the acceleration level. The WBC then employs this polyhedron as a soft constraint. As a result, by satisfying this constraint and staying away from singularities, the robot retains high-speed and high-acceleration capabilities. Compared with WBC that does not consider MFT, our proposed scheme is more robust to external disturbances, e.g., during push recovery and locomotion on uneven terrain. Simulations and experiments on a parallel-legged bipedal robot demonstrate the performance and robustness of the proposed method.

* 6 pages, 7 figures, submitted to ICRA 2022
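One common way to turn a joint-space region into an acceleration-level constraint is a one-step prediction. Assuming the preferable space is given as a polyhedron C q <= d (C and d are hypothetical inputs here; deriving them from MFT indices is the paper's contribution and is not reproduced) and q_next = q + qd dt + 0.5 qdd dt², requiring C q_next <= d gives the linear constraint A qdd <= b with A = 0.5 dt² C and b = d - C q - dt C qd. A soft constraint then adds slack s >= A qdd - b, s >= 0, and penalizes s in the WBC objective. The function names are illustrative.

```python
import numpy as np


def accel_polyhedron(C, d, q, qd, dt):
    """Map a joint-space polyhedron C q <= d to an acceleration constraint A qdd <= b,
    assuming the one-step prediction q_next = q + qd*dt + 0.5*qdd*dt**2."""
    C = np.asarray(C, dtype=float)
    A = 0.5 * dt ** 2 * C
    b = np.asarray(d, dtype=float) - C @ q - dt * (C @ qd)
    return A, b


def soft_violation(A, b, qdd):
    """Smallest non-negative slack s with A qdd <= b + s; a soft constraint
    penalizes this slack instead of forbidding it."""
    return np.maximum(0.0, A @ qdd - b)
```

For instance, a joint at 0.9 rad moving at 1 rad/s toward a 1 rad face with dt = 0.1 s leaves b = 0 on that face, so any positive acceleration toward it shows up as slack.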

Dynamic Balancing of Humanoid Robot Walker3 with Proprioceptive Actuation: Systematic Design of Algorithm, Software and Hardware

Aug 09, 2021

Yan Xie, Jiajun Wang, Hao Dong, Xiaoyu Ren, Liqun Huang, Mingguo Zhao

Abstract: Dynamic balancing under uncertain disturbances is important for a humanoid robot and requires a strong capability to coordinate whole-body redundancy to execute multiple tasks. Whole-body control (WBC) based on hierarchical optimization has been widely accepted and used in torque-controlled robots. A good hierarchy is a prerequisite for WBC and can be predefined according to prior knowledge. However, given the computational complexity of WBC, real-time computation can be problematic in physical applications. For robots with proprioceptive actuation, joint friction in the gear reducer also degrades torque tracking performance. In this paper, a reasonable hierarchy of tasks and constraints is first customized for robot dynamic balancing. A real-time WBC is then implemented via computationally efficient WBC software and solved on UBTMaster, a modular master control system characterized by real-time communication and powerful computing capability. After joint friction is accounted for through model identification, extensive experiments on various balancing scenarios are conducted on the humanoid Walker3 with proprioceptive actuation. The robot shows outstanding balance performance even under external impulses, and when its two feet are independently subjected to inclination and shift disturbances. The results demonstrate that, with a strict hierarchy, real-time computation, and careful handling of joint friction, a robot with proprioceptive actuation can manage dynamic physical interactions with unstructured environments well.

* journal
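Joint friction identification of the kind mentioned above is often done by fitting a Coulomb-plus-viscous model, tau_f = Fc sign(qd) + Fv qd (+ offset), by linear least squares. The sketch assumes friction torque samples are already available (for example, as the difference between measured joint torque and the rigid-body model torque); the model structure Walker3 actually uses is not stated in the abstract, and the function names are illustrative.

```python
import numpy as np


def identify_friction(qd: np.ndarray, tau_f: np.ndarray):
    """Least-squares fit of tau_f = Fc*sign(qd) + Fv*qd + offset.

    Returns (Fc, Fv, offset).
    """
    Phi = np.column_stack([np.sign(qd), qd, np.ones_like(qd)])  # regressor matrix
    params, *_ = np.linalg.lstsq(Phi, tau_f, rcond=None)
    return tuple(params)


def friction_torque(qd, Fc, Fv, offset=0.0):
    """Evaluate the identified model, e.g. as a feedforward compensation term."""
    return Fc * np.sign(qd) + Fv * qd + offset
```

Once identified, the predicted friction torque can be added to the WBC torque command as feedforward. The discontinuity at zero velocity is usually smoothed (for example with a tanh) in practice to avoid chattering.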
