Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yiting Zhang

Certifiably Safe Manipulation of Deformable Linear Objects via Joint Shape and Tension Prediction

May 20, 2025

Yiting Zhang, Shichen Li

Abstract:Manipulating deformable linear objects (DLOs) is challenging due to their complex dynamics and the need for safe interaction in contact-rich environments. Most existing models focus on shape prediction alone and fail to account for contact and tension constraints, which can lead to damage to both the DLO and the robot. In this work, we propose a certifiably safe motion planning and control framework for DLO manipulation. At the core of our method is a predictive model that jointly estimates the DLO's future shape and tension. These predictions are integrated into a real-time trajectory optimizer based on polynomial zonotopes, allowing us to enforce safety constraints throughout the execution. We evaluate our framework on a simulated wire harness assembly task using a 7-DOF robotic arm. Compared to state-of-the-art methods, our approach achieves a higher task success rate while avoiding all safety violations. The results demonstrate that our method enables robust and safe DLO manipulation in contact-rich environments.

* Accepted to ICRA 2025 Workshop on Learning Meets Model-Based Methods for Contact-Rich Manipulation

Via

Access Paper or Ask Questions

XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search

Apr 29, 2025

Yiting Zhang, Shichen Li, Elena Shrestha

Abstract:Mechanical search (MS) in cluttered environments remains a significant challenge for autonomous manipulators, requiring long-horizon planning and robust state estimation under occlusions and partial observability. In this work, we introduce XPG-RL, a reinforcement learning framework that enables agents to efficiently perform MS tasks through explainable, priority-guided decision-making based on raw sensory inputs. XPG-RL integrates a task-driven action prioritization mechanism with a learned context-aware switching strategy that dynamically selects from a discrete set of action primitives such as target grasping, occlusion removal, and viewpoint adjustment. Within this strategy, a policy is optimized to output adaptive threshold values that govern the discrete selection among action primitives. The perception module fuses RGB-D inputs with semantic and geometric features to produce a structured scene representation for downstream decision-making. Extensive experiments in both simulation and real-world settings demonstrate that XPG-RL consistently outperforms baseline methods in task success rates and motion efficiency, achieving up to 4.5$\times$ higher efficiency in long-horizon tasks. These results underscore the benefits of integrating domain knowledge with learnable decision-making policies for robust and efficient robotic manipulation.

* 13 pages, 5 figures

Via

Access Paper or Ask Questions

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

Apr 10, 2025

Junli Liu, Qizhi Chen, Zhigang Wang, Yiwen Tang, Yiting Zhang, Chi Yan, Dong Wang, Xuelong Li, Bin Zhao

Abstract:Visual grounding (VG) aims to localize target objects in an image based on natural language descriptions. In this paper, we propose AerialVG, a new task focusing on visual grounding from aerial views. Compared to traditional VG, AerialVG poses new challenges, \emph{e.g.}, appearance-based grounding is insufficient to distinguish among multiple visually similar objects, and positional relations should be emphasized. Besides, existing VG models struggle when applied to aerial imagery, where high-resolution images cause significant difficulties. To address these challenges, we introduce the first AerialVG dataset, consisting of 5K real-world aerial images, 50K manually annotated descriptions, and 103K objects. Particularly, each annotation in AerialVG dataset contains multiple target objects annotated with relative spatial relations, requiring models to perform comprehensive spatial reasoning. Furthermore, we propose an innovative model especially for the AerialVG task, where a Hierarchical Cross-Attention is devised to focus on target regions, and a Relation-Aware Grounding module is designed to infer positional relations. Experimental results validate the effectiveness of our dataset and method, highlighting the importance of spatial reasoning in aerial visual grounding. The code and dataset will be released.

* 8 pages, 6 figures

Via

Access Paper or Ask Questions

OceanSim: A GPU-Accelerated Underwater Robot Perception Simulation Framework

Mar 03, 2025

Jingyu Song, Haoyu Ma, Onur Bagoren, Advaith V. Sethuraman, Yiting Zhang, Katherine A. Skinner

Abstract:Underwater simulators offer support for building robust underwater perception solutions. Significant work has recently been done to develop new simulators and to advance the performance of existing underwater simulators. Still, there remains room for improvement on physics-based underwater sensor modeling and rendering efficiency. In this paper, we propose OceanSim, a high-fidelity GPU-accelerated underwater simulator to address this research gap. We propose advanced physics-based rendering techniques to reduce the sim-to-real gap for underwater image simulation. We develop OceanSim to fully leverage the computing advantages of GPUs and achieve real-time imaging sonar rendering and fast synthetic data generation. We evaluate the capabilities and realism of OceanSim using real-world data to provide qualitative and quantitative results. The project page for OceanSim is https://umfieldrobotics.github.io/OceanSim.

* 8 pages, 6 figures

Via

Access Paper or Ask Questions

FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives

Oct 29, 2024

Qizhi Chen, Delin Qu, Yiwen Tang, Haoming Song, Yiting Zhang, Dong Wang, Bin Zhao, Xuelong Li

Figure 1 for FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives

Figure 2 for FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives

Figure 3 for FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives

Figure 4 for FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives

Abstract:Reconstructing controllable Gaussian splats from monocular video is a challenging task due to its inherently insufficient constraints. Widely adopted approaches supervise complex interactions with additional masks and control signal annotations, limiting their real-world applications. In this paper, we propose an annotation guidance-free method, dubbed FreeGaussian, that mathematically derives dynamic Gaussian motion from optical flow and camera motion using novel dynamic Gaussian constraints. By establishing a connection between 2D flows and 3D Gaussian dynamic control, our method enables self-supervised optimization and continuity of dynamic Gaussian motions from flow priors. Furthermore, we introduce a 3D spherical vector controlling scheme, which represents the state with a 3D Gaussian trajectory, thereby eliminating the need for complex 1D control signal calculations and simplifying controllable Gaussian modeling. Quantitative and qualitative evaluations on extensive experiments demonstrate the state-of-the-art visual performance and control capability of our method. Project page: https://freegaussian.github.io.

Via

Access Paper or Ask Questions

An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Oct 12, 2024

Wei Liang, Yiting Zhang, Ji Zhang, Erica Cochran Hameen

Figure 1 for An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Figure 2 for An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Figure 3 for An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Figure 4 for An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Abstract:Ensuring thermal comfort is essential for the well-being and productivity of individuals in built environments. Of the various thermal comfort indicators, the mean radiant temperature (MRT) is very challenging to measure. Most common measurement methodologies are time-consuming and not user-friendly. To address this issue, this paper proposes a novel MRT measurement framework that uses visual simultaneous localization and mapping (SLAM) and semantic segmentation techniques. The proposed approach follows the rule of thumb of the traditional MRT calculation method using surface temperature and view factors. However, it employs visual SLAM and creates a 3D thermal point cloud with enriched surface temperature information. The framework then implements Grounded SAM, a new object detection and segmentation tool to extract features with distinct temperature profiles on building surfaces. The detailed segmentation of thermal features not only reduces potential errors in the calculation of the MRT but also provides an efficient reconstruction of the spatial MRT distribution in the indoor environment. We also validate the calculation results with the reference measurement methodology. This data-driven framework offers faster and more efficient MRT measurements and spatial mapping than conventional methods. It can enable the direct engagement of researchers and practitioners in MRT measurements and contribute to research on thermal comfort and radiant cooling and heating systems.

* Accepted by 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop

Via

Access Paper or Ask Questions

Differentiable Discrete Elastic Rods for Real-Time Modeling of Deformable Linear Objects

Jun 09, 2024

Yizhou Chen, Yiting Zhang, Zachary Brei, Tiancheng Zhang, Yuzhen Chen, Julie Wu, Ram Vasudevan

Abstract:This paper addresses the task of modeling Deformable Linear Objects (DLOs), such as ropes and cables, during dynamic motion over long time horizons. This task presents significant challenges due to the complex dynamics of DLOs. To address these challenges, this paper proposes differentiable Discrete Elastic Rods For deformable linear Objects with Real-time Modeling (DEFORM), a novel framework that combines a differentiable physics-based model with a learning framework to model DLOs accurately and in real-time. The performance of DEFORM is evaluated in an experimental setup involving two industrial robots and a variety of sensors. A comprehensive series of experiments demonstrate the efficacy of DEFORM in terms of accuracy, computational speed, and generalizability when compared to state-of-the-art alternatives. To further demonstrate the utility of DEFORM, this paper integrates it into a perception pipeline and illustrates its superior performance when compared to the state-of-the-art methods while tracking a DLO even in the presence of occlusions. Finally, this paper illustrates the superior performance of DEFORM when compared to state-of-the-art methods when it is applied to perform autonomous planning and control of DLOs.

Via

Access Paper or Ask Questions

Knowledge Graph Large Language Model for Link Prediction

Mar 19, 2024

Dong Shu, Tianle Chen, Mingyu Jin, Yiting Zhang, Chong Zhang, Mengnan Du, Yongfeng Zhang

Abstract:The task of predicting multiple links within knowledge graphs (KGs) stands as a challenge in the field of knowledge graph analysis, a challenge increasingly resolvable due to advancements in natural language processing (NLP) and KG embedding techniques. This paper introduces a novel methodology, the Knowledge Graph Large Language Model Framework (KG-LLM), which leverages pivotal NLP paradigms, including chain-of-thought (CoT) prompting and in-context learning (ICL), to enhance multi-hop link prediction in KGs. By converting the KG to a CoT prompt, our framework is designed to discern and learn the latent representations of entities and their interrelations. To show the efficacy of the KG-LLM Framework, we fine-tune three leading Large Language Models (LLMs) within this framework, employing both non-ICL and ICL tasks for a comprehensive evaluation. Further, we explore the framework's potential to provide LLMs with zero-shot capabilities for handling previously unseen prompts. Our experimental findings discover that integrating ICL and CoT not only augments the performance of our approach but also significantly boosts the models' generalization capacity, thereby ensuring more precise predictions in unfamiliar scenarios.

* 23 pages, 2 figures

Via

Access Paper or Ask Questions

Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Feb 12, 2024

Wei Xu, An Liu, Yiting Zhang, Vincent Lau

Figure 1 for Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Figure 2 for Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Figure 3 for Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Figure 4 for Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Abstract:Federated learning (FL) is a machine learning paradigm where the clients possess decentralized training data and the central server handles aggregation and scheduling. Typically, FL algorithms involve clients training their local models using stochastic gradient descent (SGD), which carries drawbacks such as slow convergence and being prone to getting stuck in suboptimal solutions. In this work, we propose a message passing based Bayesian federated learning (BFL) framework to avoid these drawbacks.Specifically, we formulate the problem of deep neural network (DNN) learning and compression and as a sparse Bayesian inference problem, in which group sparse prior is employed to achieve structured model compression. Then, we propose an efficient BFL algorithm called EMTDAMP, where expectation maximization (EM) and turbo deep approximate message passing (TDAMP) are combined to achieve distributed learning and compression. The central server aggregates local posterior distributions to update global posterior distributions and update hyperparameters based on EM to accelerate convergence. The clients perform TDAMP to achieve efficient approximate message passing over DNN with joint prior distribution. We detail the application of EMTDAMP to Boston housing price prediction and handwriting recognition, and present extensive numerical results to demonstrate the advantages of EMTDAMP.

Via

Access Paper or Ask Questions