Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dezhen Song

SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation

May 01, 2025

Quang P. M. Pham, Khoi T. N. Nguyen, Nhi H. Doan, Cuong A. Pham, Kentaro Inui, Dezhen Song

Abstract:Efficient path planning in robotics, particularly within large-scale, dynamic environments, remains a significant hurdle. While Large Language Models (LLMs) offer strong reasoning capabilities, their high computational cost and limited adaptability in dynamic scenarios hinder real-time deployment on edge devices. We present SmallPlan -- a novel framework leveraging LLMs as teacher models to train lightweight Small Language Models (SLMs) for high-level path planning tasks. In SmallPlan, the SLMs provide optimal action sequences to navigate across scene graphs that compactly represent full-scaled 3D scenes. The SLMs are trained in a simulation-powered, interleaved manner with LLM-guided supervised fine-tuning (SFT) and reinforcement learning (RL). This strategy not only enables SLMs to successfully complete navigation tasks but also makes them aware of important factors like travel distance and number of trials. Through experiments, we demonstrate that the fine-tuned SLMs perform competitively with larger models like GPT-4o on sequential path planning, without suffering from hallucination and overfitting. SmallPlan is resource-efficient, making it well-suited for edge-device deployment and advancing practical autonomous robotics.

* Paper is under review

Via

Access Paper or Ask Questions

Energy Efficient Planning for Repetitive Heterogeneous Tasks in Precision Agriculture

Apr 04, 2025

Shuangyu Xie, Ken Goldberg, Dezhen Song

Abstract:Robotic weed removal in precision agriculture introduces a repetitive heterogeneous task planning (RHTP) challenge for a mobile manipulator. RHTP has two unique characteristics: 1) an observe-first-and-manipulate-later (OFML) temporal constraint that forces a unique ordering of two different tasks for each target and 2) energy savings from efficient task collocation to minimize unnecessary movements. RHTP can be framed as a stochastic renewal process. According to the Renewal Reward Theorem, the expected energy usage per task cycle is the long-run average. Traditional task and motion planning focuses on feasibility rather than optimality due to the unknown object and obstacle position prior to execution. However, the known target/obstacle distribution in precision agriculture allows minimizing the expected energy usage. For each instance in this renewal process, we first compute task space partition, a novel data structure that computes all possibilities of task multiplexing and its probabilities with robot reachability. Then we propose a region-based set-coverage problem to formulate the RHTP as a mixed-integer nonlinear programming. We have implemented and solved RHTP using Branch-and-Bound solver. Compared to a baseline in simulations based on real field data, the results suggest a significant improvement in path length, number of robot stops, overall energy usage, and number of replans.

* ICRA 2025

Via

Access Paper or Ask Questions

Can Large Vision Language Models Read Maps Like a Human?

Mar 18, 2025

Shuo Xing, Zezhou Sun, Shuangyu Xie, Kaiyuan Chen, Yanjia Huang, Yuping Wang, Jiachen Li, Dezhen Song, Zhengzhong Tu

Abstract:In this paper, we introduce MapBench-the first dataset specifically designed for human-readable, pixel-based map-based outdoor navigation, curated from complex path finding scenarios. MapBench comprises over 1600 pixel space map path finding problems from 100 diverse maps. In MapBench, LVLMs generate language-based navigation instructions given a map image and a query with beginning and end landmarks. For each map, MapBench provides Map Space Scene Graph (MSSG) as an indexing data structure to convert between natural language and evaluate LVLM-generated results. We demonstrate that MapBench significantly challenges state-of-the-art LVLMs both zero-shot prompting and a Chain-of-Thought (CoT) augmented reasoning framework that decomposes map navigation into sequential cognitive processes. Our evaluation of both open-source and closed-source LVLMs underscores the substantial difficulty posed by MapBench, revealing critical limitations in their spatial reasoning and structured decision-making capabilities. We release all the code and dataset in https://github.com/taco-group/MapBench.

* 35 pages

Via

Access Paper or Ask Questions

Simulating Automotive Radar with Lidar and Camera Inputs

Mar 11, 2025

Peili Song, Dezhen Song, Yifan Yang, Enfan Lan, Jingtai Liu

Abstract:Low-cost millimeter automotive radar has received more and more attention due to its ability to handle adverse weather and lighting conditions in autonomous driving. However, the lack of quality datasets hinders research and development. We report a new method that is able to simulate 4D millimeter wave radar signals including pitch, yaw, range, and Doppler velocity along with radar signal strength (RSS) using camera image, light detection and ranging (lidar) point cloud, and ego-velocity. The method is based on two new neural networks: 1) DIS-Net, which estimates the spatial distribution and number of radar signals, and 2) RSS-Net, which predicts the RSS of the signal based on appearance and geometric information. We have implemented and tested our method using open datasets from 3 different models of commercial automotive radar. The experimental results show that our method can successfully generate high-fidelity radar signals. Moreover, we have trained a popular object detection neural network with data augmented by our synthesized radar. The network outperforms the counterpart trained only on raw radar data, a promising result to facilitate future radar-based research and development.

* submitted to IROS 2025

Via

Access Paper or Ask Questions

TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding

Nov 15, 2024

Quang P. M. Pham, Khoi T. N. Nguyen, Lan C. Ngo, Dezhen Song, Truong Do, Truong Son Hy

Figure 1 for TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding

Figure 2 for TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding

Figure 3 for TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding

Figure 4 for TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding

Abstract:Scene graphs have proven to be highly effective for various scene understanding tasks due to their compact and explicit representation of relational information. However, current methods often overlook the critical importance of preserving symmetry when generating scene graphs from 3D point clouds, which can lead to reduced accuracy and robustness, particularly when dealing with noisy, multi-view data. This work, to the best of our knowledge, presents the first implementation of an Equivariant Scene Graph Neural Network (ESGNN) to generate semantic scene graphs from 3D point clouds, specifically for enhanced scene understanding. Furthermore, a significant limitation of prior methods is the absence of temporal modeling to capture time-dependent relationships among dynamically evolving entities within a scene. To address this gap, we introduce a novel temporal layer that leverages the symmetry-preserving properties of ESGNN to fuse scene graphs across multiple sequences into a unified global representation by an approximate graph-matching algorithm. Our combined architecture, termed the Temporal Equivariant Scene Graph Neural Network (TESGNN), not only surpasses existing state-of-the-art methods in scene estimation accuracy but also achieves faster convergence. Importantly, TESGNN is computationally efficient and straightforward to implement using existing frameworks, making it well-suited for real-time applications in robotics and computer vision. This approach paves the way for more robust and scalable solutions to complex multi-view scene understanding challenges. Our source code is publicly available at: https://github.com/HySonLab/TESGraph

* arXiv admin note: text overlap with arXiv:2407.00609

Via

Access Paper or Ask Questions

Toward Precise Robotic Weed Flaming Using a Mobile Manipulator with a Flamethrower

Jul 06, 2024

Di Wang, Chengsong Hu, Shuangyu Xie, Joe Johnson, Hojun Ji, Yingtao Jiang, Muthukumar Bagavathiannan, Dezhen Song

Abstract:Robotic weed flaming is a new and environmentally friendly approach to weed removal in the agricultural field. Using a mobile manipulator equipped with a flamethrower, we design a new system and algorithm to enable effective weed flaming, which requires robotic manipulation with a soft and deformable end effector, as the thermal coverage of the flame is affected by dynamic or unknown environmental factors such as gravity, wind, atmospheric pressure, fuel tank pressure, and pose of the nozzle. System development includes overall design, hardware integration, and software pipeline. To enable precise weed removal, the greatest challenge is to detect and predict dynamic flame coverage in real time before motion planning, which is quite different from a conventional rigid gripper in grasping or a spray gun in painting. Based on the images from two onboard infrared cameras and the pose information of the flamethrower nozzle on a mobile manipulator, we propose a new dynamic flame coverage model. The flame model uses a center-arc curve with a Gaussian cross-section model to describe the flame coverage in real time. The experiments have demonstrated the working system and shown that our model and algorithm can achieve a mean average precision (mAP) of more than 76\% in the reprojected images during online prediction.

* IROS 2024

Via

Access Paper or Ask Questions

Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor

Nov 24, 2023

Di Wang, Xiaoyu Duan, Shu-Hao Yeh, Jun Zou, Dezhen Song

Figure 1 for Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor

Figure 2 for Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor

Figure 3 for Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor

Figure 4 for Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor

Abstract:Micro scanning mirrors (MSM) extend the range and field of view of LiDARs, medical imaging devices, and laser projectors. However, a new class of soft-hinged MSMs contains out-of-plane translation in addition to the 2 degree-of-freedom rotations, which presents a cabliration challenge. We report a new calibration system and algorithm design to address the challenge. In the calibration system, a new low-cost calibration rig design employs a minimal 2-laser beam approach. The new new algorithm builds on the reflection principle and an optimization approach to precisely measure MSM poses. To establish the mapping between Hall sensor readings and MSM poses, we propose a self-synchronizing periodicity-based model fitting calibration approach. We achieve an MSM poses estimation accuracy of 0.020{\deg} with a standard deviation of 0.011{\deg}.

Via

Access Paper or Ask Questions

A Fingertip Sensor and Algorithms for Pre-touch Distance Ranging and Material Detection in Robotic Grasping

Nov 17, 2023

Cheng Fang, Di Wang, Fengzhi Guo, Jun Zou, Dezhen Song

Abstract:To enhance robotic grasping capabilities, we are developing new contactless fingertip sensors to measure distance in close proximity and simultaneously detect the type of material and the interior structure. These sensors are referred to as pre-touch dual-modal and dual-mechanism (PDM$^2$) sensors, and they operate using both pulse-echo ultrasound (US) and optoacoustic (OA) modalities. We present the design of a PDM$^2$ sensor that utilizes a pulsed laser beam and a customized ultrasound transceiver with a wide acoustic bandwidth for ranging and sensing. Both US and OA signals are collected simultaneously, triggered by the same laser pulse. To validate our design, we have fabricated a prototype of the PDM$^2$ sensor and integrated it into an object scanning system. We have also developed algorithms to enable the sensor, including time-of-flight (ToF) auto estimation, ranging rectification, sensor and system calibration, distance ranging, material/structure detection, and object contour detection and reconstruction. The experimental results demonstrate that the new PDM$^2$ sensor and its algorithms effectively enable the object scanning system to achieve satisfactory ranging and contour reconstruction performances, along with satisfying material/structure detection capabilities. In conclusion, the PDM$^2$ sensor offers a practical and powerful solution to improve grasping of unknown objects with the robotic gripper by providing advanced perception capabilities.

Via

Access Paper or Ask Questions

Coupled Active Perception and Manipulation Planning for a Mobile Manipulator in Precision Agriculture Applications

Sep 28, 2023

Shuangyu Xie, Chengsong Hu, Di Wang, Joe Johnson, Muthukumar Bagavathiannan, Dezhen Song

Abstract:A mobile manipulator often finds itself in an application where it needs to take a close-up view before performing a manipulation task. Named this as a coupled active perception and manipulation (CAPM) problem, we model the uncertainty in the perception process and devise a key state/task planning approach that considers reachability conditions as task constraints of both perception and manipulation tasks for the mobile platform. By minimizing the expected energy usage in the body key state planning while satisfying task constraints, our algorithm achieves the best balance between the task success rate and energy usage. We have implemented the algorithm and tested it in both simulation and physical experiments. The results have confirmed that our algorithm has a lower energy consumption compared to a two-stage decoupled approach, while still maintaining a success rate of 100\% for the task.

* submitted to ICRA 2024

Via

Access Paper or Ask Questions

Road Boundary Estimation Using Sparse Automotive Radar Inputs

Sep 15, 2023

Aaron Kingery, Dezhen Song

Abstract:This paper presents a new approach to detecting road boundaries based on sparse radar signals. We model the roadway using a homogeneous model and derive its conditional predictive model under known radar motion. Using the conditional predictive model and model radar points using a Dirichlet Process Mixture Model (DPMM), we employ Mean Field Variational Inference (MFVI) to derive an unconditional road boundary model distribution. In order to generate initial candidate solutions for the MFVI, we develop a custom Random Sample and Consensus (RANSAC) variant to propose unseen model instances as candidate road boundaries. For each radar point cloud we alternate the MFVI and RANSAC proposal steps until convergence to generate the best estimate of all candidate models. We select the candidate model with the minimum lateral distance to the radar on each side as the estimates of the left and right boundaries. We have implemented the proposed algorithm in C++. We have tested the algorithm and it has shown satisfactory results. More specifically, the mean lane boundary estimation error is not more than 11.0 cm.

* 10 pages, 7 figures

Via

Access Paper or Ask Questions