Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yulin Li

DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

May 07, 2025

Junjie Wang, Bin Chen, Yulin Li, Bin Kang, Yichi Chen, Zhuotao Tian

Abstract:Dense visual prediction tasks have been constrained by their reliance on predefined categories, limiting their applicability in real-world scenarios where visual concepts are unbounded. While Vision-Language Models (VLMs) like CLIP have shown promise in open-vocabulary tasks, their direct application to dense prediction often leads to suboptimal performance due to limitations in local feature representation. In this work, we present our observation that CLIP's image tokens struggle to effectively aggregate information from spatially or semantically related regions, resulting in features that lack local discriminability and spatial consistency. To address this issue, we propose DeCLIP, a novel framework that enhances CLIP by decoupling the self-attention module to obtain ``content'' and ``context'' features respectively. The ``content'' features are aligned with image crop representations to improve local discriminability, while ``context'' features learn to retain the spatial correlations under the guidance of vision foundation models, such as DINO. Extensive experiments demonstrate that DeCLIP significantly outperforms existing methods across multiple open-vocabulary dense prediction tasks, including object detection and semantic segmentation. Code is available at \textcolor{magenta}{https://github.com/xiaomoguhz/DeCLIP}.

Via

Access Paper or Ask Questions

Interactive Navigation for Legged Manipulators with Learned Arm-Pushing Controller

Mar 03, 2025

Zhihai Bi, Kai Chen, Chunxin Zheng, Yulin Li, Haoang Li, Jun Ma

Figure 1 for Interactive Navigation for Legged Manipulators with Learned Arm-Pushing Controller

Figure 2 for Interactive Navigation for Legged Manipulators with Learned Arm-Pushing Controller

Figure 3 for Interactive Navigation for Legged Manipulators with Learned Arm-Pushing Controller

Figure 4 for Interactive Navigation for Legged Manipulators with Learned Arm-Pushing Controller

Abstract:Interactive navigation is crucial in scenarios where proactively interacting with objects can yield shorter paths, thus significantly improving traversal efficiency. Existing methods primarily focus on using the robot body to relocate large obstacles (which could be comparable to the size of a robot). However, they prove ineffective in narrow or constrained spaces where the robot's dimensions restrict its manipulation capabilities. This paper introduces a novel interactive navigation framework for legged manipulators, featuring an active arm-pushing mechanism that enables the robot to reposition movable obstacles in space-constrained environments. To this end, we develop a reinforcement learning-based arm-pushing controller with a two-stage reward strategy for large-object manipulation. Specifically, this strategy first directs the manipulator to a designated pushing zone to achieve a kinematically feasible contact configuration. Then, the end effector is guided to maintain its position at appropriate contact points for stable object displacement while preventing toppling. The simulations validate the robustness of the arm-pushing controller, showing that the two-stage reward strategy improves policy convergence and long-term performance. Real-world experiments further demonstrate the effectiveness of the proposed navigation framework, which achieves shorter paths and reduced traversal time. The open-source project can be found at https://github.com/Zhihaibi/Interactive-Navigation-for-legged-manipulator.git.

Via

Access Paper or Ask Questions

On the Surprising Robustness of Sequential Convex Optimization for Contact-Implicit Motion Planning

Feb 03, 2025

Yulin Li, Haoyu Han, Shucheng Kang, Jun Ma, Heng Yang

Abstract:Contact-implicit motion planning-embedding contact sequencing as implicit complementarity constraints-holds the promise of leveraging continuous optimization to discover new contact patterns online. Nevertheless, the resulting optimization, being an instance of Mathematical Programming with Complementary Constraints, fails the classical constraint qualifications that are crucial for the convergence of popular numerical solvers. We present robust contact-implicit motion planning with sequential convex programming (CRISP), a solver that departs from the usual primal-dual algorithmic framework but instead only focuses on the primal problem. CRISP solves a convex quadratic program with an adaptive trust region radius at each iteration, and its convergence is evaluated by a merit function using weighted penalty. We (i) provide sufficient conditions on CRISP's convergence to first-order stationary points of the merit function; (ii) release a high-performance C++ implementation of CRISP with a generic nonlinear programming interface; and (iii) demonstrate CRISP's surprising robustness in solving contact-implicit planning with naive initialization. In fact, CRISP solves several contact-implicit problems with all-zero initialization.

Via

Access Paper or Ask Questions

Local Reactive Control for Mobile Manipulators with Whole-Body Safety in Complex Environments

Jan 06, 2025

Chunxin Zheng, Yulin Li, Zhiyuan Song, Zhihai Bi, Jinni Zhou, Boyu Zhou, Jun Ma

Figure 1 for Local Reactive Control for Mobile Manipulators with Whole-Body Safety in Complex Environments

Figure 2 for Local Reactive Control for Mobile Manipulators with Whole-Body Safety in Complex Environments

Figure 3 for Local Reactive Control for Mobile Manipulators with Whole-Body Safety in Complex Environments

Figure 4 for Local Reactive Control for Mobile Manipulators with Whole-Body Safety in Complex Environments

Abstract:Mobile manipulators typically encounter significant challenges in navigating narrow, cluttered environments due to their high-dimensional state spaces and complex kinematics. While reactive methods excel in dynamic settings, they struggle to efficiently incorporate complex, coupled constraints across the entire state space. In this work, we present a novel local reactive controller that reformulates the time-domain single-step problem into a multi-step optimization problem in the spatial domain, leveraging the propagation of a serial kinematic chain. This transformation facilitates the formulation of customized, decoupled link-specific constraints, which is further solved efficiently with augmented Lagrangian differential dynamic programming (AL-DDP). Our approach naturally absorbs spatial kinematic propagation in the forward pass and processes all link-specific constraints simultaneously during the backward pass, enhancing both constraint management and computational efficiency. Notably, in this framework, we formulate collision avoidance constraints for each link using accurate geometric models with extracted free regions, and this improves the maneuverability of the mobile manipulator in narrow, cluttered spaces. Experimental results showcase significant improvements in safety, efficiency, and task completion rates. These findings underscore the robustness of the proposed method, particularly in narrow, cluttered environments where conventional approaches could falter. The open-source project can be found at https://github.com/Chunx1nZHENG/MM-with-Whole-Body-Safety-Release.git.

Via

Access Paper or Ask Questions

UDMC: Unified Decision-Making and Control Framework for Urban Autonomous Driving with Motion Prediction of Traffic Participants

Jan 05, 2025

Haichao Liu, Kai Chen, Yulin Li, Zhenmin Huang, Ming Liu, Jun Ma

Figure 1 for UDMC: Unified Decision-Making and Control Framework for Urban Autonomous Driving with Motion Prediction of Traffic Participants

Figure 2 for UDMC: Unified Decision-Making and Control Framework for Urban Autonomous Driving with Motion Prediction of Traffic Participants

Figure 3 for UDMC: Unified Decision-Making and Control Framework for Urban Autonomous Driving with Motion Prediction of Traffic Participants

Figure 4 for UDMC: Unified Decision-Making and Control Framework for Urban Autonomous Driving with Motion Prediction of Traffic Participants

Abstract:Current autonomous driving systems often struggle to balance decision-making and motion control while ensuring safety and traffic rule compliance, especially in complex urban environments. Existing methods may fall short due to separate handling of these functionalities, leading to inefficiencies and safety compromises. To address these challenges, we introduce UDMC, an interpretable and unified Level 4 autonomous driving framework. UDMC integrates decision-making and motion control into a single optimal control problem (OCP), considering the dynamic interactions with surrounding vehicles, pedestrians, road lanes, and traffic signals. By employing innovative potential functions to model traffic participants and regulations, and incorporating a specialized motion prediction module, our framework enhances on-road safety and rule adherence. The integrated design allows for real-time execution of flexible maneuvers suited to diverse driving scenarios. High-fidelity simulations conducted in CARLA exemplify the framework's computational efficiency, robustness, and safety, resulting in superior driving performance when compared against various baseline models. Our open-source project is available at https://github.com/henryhcliu/udmc_carla.git.

Via

Access Paper or Ask Questions

FRTree Planner: Robot Navigation in Cluttered and Unknown Environments with Tree of Free Regions

Oct 26, 2024

Yulin Li, Zhicheng Song, Chunxin Zheng, Zhihai Bi, Kai Chen, Michael Yu Wang, Jun Ma

Figure 1 for FRTree Planner: Robot Navigation in Cluttered and Unknown Environments with Tree of Free Regions

Figure 2 for FRTree Planner: Robot Navigation in Cluttered and Unknown Environments with Tree of Free Regions

Figure 3 for FRTree Planner: Robot Navigation in Cluttered and Unknown Environments with Tree of Free Regions

Figure 4 for FRTree Planner: Robot Navigation in Cluttered and Unknown Environments with Tree of Free Regions

Abstract:In this work, we present FRTree planner, a novel robot navigation framework that leverages a tree structure of free regions, specifically designed for navigation in cluttered and unknown environments with narrow passages. The framework continuously incorporates real-time perceptive information to identify distinct navigation options and dynamically expands the tree toward explorable and traversable directions. This dynamically constructed tree incrementally encodes the geometric and topological information of the collision-free space, enabling efficient selection of the intermediate goals, navigating around dead-end situations, and avoidance of dynamic obstacles without a prior map. Crucially, our method performs a comprehensive analysis of the geometric relationship between free regions and the robot during online replanning. In particular, the planner assesses the accessibility of candidate passages based on the robot's geometries, facilitating the effective selection of the most viable intermediate goals through accessible narrow passages while minimizing unnecessary detours. By combining the free region information with a bi-level trajectory optimization tailored for robots with specific geometries, our approach generates robust and adaptable obstacle avoidance strategies in confined spaces. Through extensive simulations and real-world experiments, FRTree demonstrates its superiority over benchmark methods in generating safe, efficient motion plans through highly cluttered and unknown terrains with narrow gaps.

Via

Access Paper or Ask Questions

**S-RRT*-based Obstacle Avoidance Autonomous Motion Planner for Continuum-rigid Manipulator**

Sep 27, 2024

Yulin Li, Tetsuro Miyazaki, Yoshiki Yamamoto, Kenji Kawashima

Figure 1 for S-RRT*-based Obstacle Avoidance Autonomous Motion Planner for Continuum-rigid Manipulator

Figure 2 for S-RRT*-based Obstacle Avoidance Autonomous Motion Planner for Continuum-rigid Manipulator

Figure 3 for S-RRT*-based Obstacle Avoidance Autonomous Motion Planner for Continuum-rigid Manipulator

Figure 4 for S-RRT*-based Obstacle Avoidance Autonomous Motion Planner for Continuum-rigid Manipulator

Abstract:Continuum robots are compact and flexible, making them suitable for use in the industries and in medical surgeries. Rapidly-exploring random trees (RRT) are a highly efficient path planning method, and its variant, S-RRT, can generate smooth feasible paths for the end-effector. By combining RRT with inverse instantaneous kinematics (IIK), complete motion planning for the continuum arm can be achieved. Due to the high degrees of freedom of continuum arms, the null space in IIK can be utilized for obstacle avoidance. In this work, we propose a novel approach that uses the S-RRT* algorithm to create paths for the continuum-rigid manipulator. By employing IIK and null space techniques, continuous joint configurations are generated that not only track the path but also enable obstacle avoidance. Simulation results demonstrate that our method effectively handles motion planning and obstacle avoidance while generating high-quality end-effector paths in complex environments. Furthermore, compared to similar IIK methods, our approach exhibits superior computation time.

Via

Access Paper or Ask Questions

Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models

Sep 11, 2024

Jiahang Cao, Qiang Zhang, Jingkai Sun, Jiaxu Wang, Hao Cheng, Yulin Li, Jun Ma, Yecheng Shao, Wen Zhao, Gang Han(+2 more)

Figure 1 for Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models

Figure 2 for Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models

Figure 3 for Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models

Figure 4 for Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models

Abstract:Diffusion models have been widely employed in the field of 3D manipulation due to their efficient capability to learn distributions, allowing for precise prediction of action trajectories. However, diffusion models typically rely on large parameter UNet backbones as policy networks, which can be challenging to deploy on resource-constrained devices. Recently, the Mamba model has emerged as a promising solution for efficient modeling, offering low computational complexity and strong performance in sequence modeling. In this work, we propose the Mamba Policy, a lighter but stronger policy that reduces the parameter count by over 80% compared to the original policy network while achieving superior performance. Specifically, we introduce the XMamba Block, which effectively integrates input information with conditional features and leverages a combination of Mamba and Attention mechanisms for deep feature extraction. Extensive experiments demonstrate that the Mamba Policy excels on the Adroit, Dexart, and MetaWorld datasets, requiring significantly fewer computational resources. Additionally, we highlight the Mamba Policy's enhanced robustness in long-horizon scenarios compared to baseline methods and explore the performance of various Mamba variants within the Mamba Policy framework. Our project page is in https://andycao1125.github.io/mamba_policy/.

* 7 pages, 5 figures

Via

Access Paper or Ask Questions

Deep Causal Learning to Explain and Quantify The Geo-Tension's Impact on Natural Gas Market

Jul 15, 2024

Philipp Kai Peter, Yulin Li, Ziyue Li, Wolfgang Ketter

Figure 1 for Deep Causal Learning to Explain and Quantify The Geo-Tension's Impact on Natural Gas Market

Figure 2 for Deep Causal Learning to Explain and Quantify The Geo-Tension's Impact on Natural Gas Market

Figure 3 for Deep Causal Learning to Explain and Quantify The Geo-Tension's Impact on Natural Gas Market

Figure 4 for Deep Causal Learning to Explain and Quantify The Geo-Tension's Impact on Natural Gas Market

Abstract:Natural gas demand is a crucial factor for predicting natural gas prices and thus has a direct influence on the power system. However, existing methods face challenges in assessing the impact of shocks, such as the outbreak of the Russian-Ukrainian war. In this context, we apply deep neural network-based Granger causality to identify important drivers of natural gas demand. Furthermore, the resulting dependencies are used to construct a counterfactual case without the outbreak of the war, providing a quantifiable estimate of the overall effect of the shock on various German energy sectors. The code and dataset are available at https://github.com/bonaldli/CausalEnergy.

* Accepted by IEEE PES ISGT Europe 2024 conference

Via

Access Paper or Ask Questions

StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

Jun 04, 2024

Pengyuan Lyu, Yulin Li, Hao Zhou, Weihong Ma, Xingyu Wan, Qunyi Xie, Liang Wu, Chengquan Zhang, Kun Yao, Errui Ding(+1 more)

Abstract:Text-rich images have significant and extensive value, deeply integrated into various aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images play crucial roles in information transmission but are accompanied by diverse challenges. Therefore, the efficient and effective understanding of text-rich images is a crucial litmus test for the capability of Vision-Language Models. We have crafted an efficient vision-language model, StrucTexTv3, tailored to tackle various intelligent tasks for text-rich images. The significant design of StrucTexTv3 is presented in the following aspects: Firstly, we adopt a combination of an effective multi-scale reduced visual transformer and a multi-granularity token sampler (MG-Sampler) as a visual token generator, successfully solving the challenges of high-resolution input and complex representation learning for text-rich images. Secondly, we enhance the perception and comprehension abilities of StrucTexTv3 through instruction learning, seamlessly integrating various text-oriented tasks into a unified framework. Thirdly, we have curated a comprehensive collection of high-quality text-rich images, abbreviated as TIM-30M, encompassing diverse scenarios like incidental scenes, office documents, web pages, and screenshots, thereby improving the robustness of our model. Our method achieved SOTA results in text-rich image perception tasks, and significantly improved performance in comprehension tasks. Among multimodal models with LLM decoder of approximately 1.8B parameters, it stands out as a leader, which also makes the deployment of edge devices feasible. In summary, the StrucTexTv3 model, featuring efficient structural design, outstanding performance, and broad adaptability, offers robust support for diverse intelligent application tasks involving text-rich images, thus exhibiting immense potential for widespread application.

Via

Access Paper or Ask Questions