Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chenguang Yang

PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp

Apr 22, 2025

Yaofeng Cheng, Fusheng Zha, Wei Guo, Pengfei Wang, Chao Zeng, Lining Sun, Chenguang Yang

Abstract:The 6-Degree of Freedom (DoF) grasp method based on point clouds has shown significant potential in enabling robots to grasp target objects. However, most existing methods are based on the point clouds (2.5D points) generated from single-view depth images. These point clouds only have one surface side of the object providing incomplete geometry information, which mislead the grasping algorithm to judge the shape of the target object, resulting in low grasping accuracy. Humans can accurately grasp objects from a single view by leveraging their geometry experience to estimate object shapes. Inspired by humans, we propose a novel 6-DoF grasping framework that converts the point completion results as object shape features to train the 6-DoF grasp network. Here, point completion can generate approximate complete points from the 2.5D points similar to the human geometry experience, and converting it as shape features is the way to utilize it to improve grasp efficiency. Furthermore, due to the gap between the network generation and actual execution, we integrate a score filter into our framework to select more executable grasp proposals for the real robot. This enables our method to maintain a high grasp quality in any camera viewpoint. Extensive experiments demonstrate that utilizing complete point features enables the generation of significantly more accurate grasp proposals and the inclusion of a score filter greatly enhances the credibility of real-world robot grasping. Our method achieves a 17.8\% success rate higher than the state-of-the-art method in real-world experiments.

Via

Access Paper or Ask Questions

Non-Prehensile Tool-Object Manipulation by Integrating LLM-Based Planning and Manoeuvrability-Driven Controls

Dec 09, 2024

Hoi-Yin Lee, Peng Zhou, Anqing Duan, Wanyu Ma, Chenguang Yang, David Navarro-Alarcon

Abstract:The ability to wield tools was once considered exclusive to human intelligence, but it's now known that many other animals, like crows, possess this capability. Yet, robotic systems still fall short of matching biological dexterity. In this paper, we investigate the use of Large Language Models (LLMs), tool affordances, and object manoeuvrability for non-prehensile tool-based manipulation tasks. Our novel method leverages LLMs based on scene information and natural language instructions to enable symbolic task planning for tool-object manipulation. This approach allows the system to convert the human language sentence into a sequence of feasible motion functions. We have developed a novel manoeuvrability-driven controller using a new tool affordance model derived from visual feedback. This controller helps guide the robot's tool utilization and manipulation actions, even within confined areas, using a stepping incremental approach. The proposed methodology is evaluated with experiments to prove its effectiveness under various manipulation scenarios.

Via

Access Paper or Ask Questions

Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization

Nov 22, 2024

Yuhang Song, Mario Gianni, Chenguang Yang, Kunyang Lin, Te-Chuan Chiu, Anh Nguyen, Chun-Yi Lee

Figure 1 for Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization

Figure 2 for Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization

Figure 3 for Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization

Figure 4 for Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization

Abstract:This paper addresses the challenge of fine-grained alignment in Vision-and-Language Navigation (VLN) tasks, where robots navigate realistic 3D environments based on natural language instructions. Current approaches use contrastive learning to align language with visual trajectory sequences. Nevertheless, they encounter difficulties with fine-grained vision negatives. To enhance cross-modal embeddings, we introduce a novel Bayesian Optimization-based adversarial optimization framework for creating fine-grained contrastive vision samples. To validate the proposed methodology, we conduct a series of experiments to assess the effectiveness of the enriched embeddings on fine-grained vision negatives. We conduct experiments on two common VLN benchmarks R2R and REVERIE, experiments on the them demonstrate that these embeddings benefit navigation, and can lead to a promising performance enhancement. Our source code and trained models are available at: https://anonymous.4open.science/r/FGVLN.

Via

Access Paper or Ask Questions

Redundant Observer-Based Tracking Control for Object Extraction Using a Cable Connected UAV

Oct 31, 2024

Benjamin J. Marshall, Yunda Yan, James Knowles, Chenguang Yang, Cunjia Liu

Figure 1 for Redundant Observer-Based Tracking Control for Object Extraction Using a Cable Connected UAV

Figure 2 for Redundant Observer-Based Tracking Control for Object Extraction Using a Cable Connected UAV

Figure 3 for Redundant Observer-Based Tracking Control for Object Extraction Using a Cable Connected UAV

Figure 4 for Redundant Observer-Based Tracking Control for Object Extraction Using a Cable Connected UAV

Abstract:A new disturbance observer based control scheme is developed for a quadrotor under the concurrent disturbances from a lightweight elastic tether cable and a lumped vertical disturbance. This elastic tether is unusual as it creates a disturbance proportional to the multicopter's translational movement. This paper takes an observer-based approach to estimate the stiffness coefficient of the cable and uses the system model to update the estimates of the external forces, which are then compensated in the control action. Given that the tethered cable force affects both horizontal channels of the quadrotor and is also coupled with the vertical channel, the proposed disturbance observer is constructed to exploit the redundant measurements across all three channels to jointly estimate the cable stiffness and the vertical disturbance. A pseudo-inverse method is used to determine the observer gain functions, such that the estimation of the two quantities is decoupled and stable. Compared to standard disturbance observers which assume nearly constant disturbances, the proposed approach can quickly adjust its total force estimate as the tethered quadrotor changes its position or tautness of the tether. This is applied to two experiments - a tracking performance test where the multicopter moves under a constant tether strain, and an object extraction test. In the second test, the multicopter manipulates a nonlinear mechanism mimicking the extraction of a wedged object. In both cases, the proposed approach shows significant improvement over standard Disturbance Observer and Extended State Observer approaches. A video summary of the experiments can be found at https://youtu.be/9gKr13WTj-k.

Via

Access Paper or Ask Questions

Learning effective pruning at initialization from iterative pruning

Aug 27, 2024

Shengkai Liu, Yaofeng Cheng, Fusheng Zha, Wei Guo, Lining Sun, Zhenshan Bing, Chenguang Yang

Abstract:Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly crucial with the growing network size. However, current PaI methods still have a large accuracy gap with iterative pruning, especially at high sparsity levels. This raises an intriguing question: can we get inspiration from iterative pruning to improve the PaI performance? In the lottery ticket hypothesis, the iterative rewind pruning (IRP) finds subnetworks retroactively by rewinding the parameter to the original initialization in every pruning iteration, which means all the subnetworks are based on the initial state. Here, we hypothesise the surviving subnetworks are more important and bridge the initial feature and their surviving score as the PaI criterion. We employ an end-to-end neural network (\textbf{AutoS}parse) to learn this correlation, input the model's initial features, output their score and then prune the lowest score parameters before training. To validate the accuracy and generalization of our method, we performed PaI across various models. Results show that our approach outperforms existing methods in high-sparsity settings. Notably, as the underlying logic of model pruning is consistent in different models, only one-time IRP on one model is needed (e.g., once IRP on ResNet-18/CIFAR-10, AutoS can be generalized to VGG-16/CIFAR-10, ResNet-18/TinyImageNet, et al.). As the first neural network-based PaI method, we conduct extensive experiments to validate the factors influencing this approach. These results reveal the learning tendencies of neural networks and provide new insights into our understanding and research of PaI from a practical perspective. Our code is available at: https://github.com/ChengYaofeng/AutoSparse.git.

Via

Access Paper or Ask Questions

TacShade A New 3D-printed Soft Optical Tactile Sensor Based on Light, Shadow and Greyscale for Shape Reconstruction

Jun 01, 2024

Zhenyu Lu, Jialong Yang, Haoran Li, Yifan Li, Weiyong Si, Nathan Lepora, Chenguang Yang

Figure 1 for TacShade A New 3D-printed Soft Optical Tactile Sensor Based on Light, Shadow and Greyscale for Shape Reconstruction

Figure 2 for TacShade A New 3D-printed Soft Optical Tactile Sensor Based on Light, Shadow and Greyscale for Shape Reconstruction

Figure 3 for TacShade A New 3D-printed Soft Optical Tactile Sensor Based on Light, Shadow and Greyscale for Shape Reconstruction

Figure 4 for TacShade A New 3D-printed Soft Optical Tactile Sensor Based on Light, Shadow and Greyscale for Shape Reconstruction

Abstract:In this paper, we present the TacShade a newly designed 3D-printed soft optical tactile sensor. The sensor is developed for shape reconstruction under the inspiration of sketch drawing that uses the density of sketch lines to draw light and shadow, resulting in the creation of a 3D-view effect. TacShade, building upon the strengths of the TacTip, a single-camera tactile sensor of large in-depth deformation and being sensitive to edge and surface following, improves the structure in that the markers are distributed within the gap of papillae pins. Variations in light, dark, and grey effects can be generated inside the sensor through external contact interactions. The contours of the contacting objects are outlined by white markers, while the contact depth characteristics can be indirectly obtained from the distribution of black pins and white markers, creating a 2.5D visualization. Based on the imaging effect, we improve the Shape from Shading (SFS) algorithm to process tactile images, enabling a coarse but fast reconstruction for the contact objects. Two experiments are performed. The first verifies TacShade s ability to reconstruct the shape of the contact objects through one image for object distinction. The second experiment shows the shape reconstruction capability of TacShade for a large panel with ridged patterns based on the location of robots and image splicing technology.

* This paper has been accepted by ICRA 2024

Via

Access Paper or Ask Questions

Revolutionizing Packaging: A Robotic Bagging Pipeline with Constraint-aware Structure-of-Interest Planning

Mar 15, 2024

Jiaming Qi, Peng Zhou, Pai Zheng, Hongmin Wu, Chenguang Yang, David Navarro-Alarcon, Jia Pan

Figure 1 for Revolutionizing Packaging: A Robotic Bagging Pipeline with Constraint-aware Structure-of-Interest Planning

Figure 2 for Revolutionizing Packaging: A Robotic Bagging Pipeline with Constraint-aware Structure-of-Interest Planning

Figure 3 for Revolutionizing Packaging: A Robotic Bagging Pipeline with Constraint-aware Structure-of-Interest Planning

Figure 4 for Revolutionizing Packaging: A Robotic Bagging Pipeline with Constraint-aware Structure-of-Interest Planning

Abstract:Bagging operations, common in packaging and assisted living applications, are challenging due to a bag's complex deformable properties. To address this, we develop a robotic system for automated bagging tasks using an adaptive structure-of-interest (SOI) manipulation approach. Our method relies on real-time visual feedback to dynamically adjust manipulation without requiring prior knowledge of bag materials or dynamics. We present a robust pipeline featuring state estimation for SOIs using Gaussian Mixture Models (GMM), SOI generation via optimization-based bagging techniques, SOI motion planning with Constrained Bidirectional Rapidly-exploring Random Trees (CBiRRT), and dual-arm manipulation coordinated by Model Predictive Control (MPC). Experiments demonstrate the system's ability to achieve precise, stable bagging of various objects using adaptive coordination of the manipulators. The proposed framework advances the capability of dual-arm robots to perform more sophisticated automation of common tasks involving interactions with deformable objects.

Via

Access Paper or Ask Questions

Bimanual Deformable Bag Manipulation Using a Structure-of-Interest Based Latent Dynamics Model

Jan 21, 2024

Peng Zhou, Pai Zheng, Jiaming Qi, Chenxi Li, Chenguang Yang, David Navarro-Alarcon, Jia Pan

Abstract:The manipulation of deformable objects by robotic systems presents a significant challenge due to their complex and infinite-dimensional configuration spaces. This paper introduces a novel approach to Deformable Object Manipulation (DOM) by emphasizing the identification and manipulation of Structures of Interest (SOIs) in deformable fabric bags. We propose a bimanual manipulation framework that leverages a Graph Neural Network (GNN)-based latent dynamics model to succinctly represent and predict the behavior of these SOIs. Our approach involves constructing a graph representation from partial point cloud data of the object and learning the latent dynamics model that effectively captures the essential deformations of the fabric bag within a reduced computational space. By integrating this latent dynamics model with Model Predictive Control (MPC), we empower robotic manipulators to perform precise and stable manipulation tasks focused on the SOIs. We have validated our framework through various empirical experiments demonstrating its efficacy in bimanual manipulation of fabric bags. Our contributions not only address the complexities inherent in DOM but also provide new perspectives and methodologies for enhancing robotic interactions with deformable objects by concentrating on their critical structural elements. Experimental videos can be obtained from https://sites.google.com/view/bagbot.

Via

Access Paper or Ask Questions

Robot-Assisted Deep Venous Thrombosis Ultrasound Examination using Virtual Fixture

Jan 04, 2024

Dianye Huang, Chenguang Yang, Mingchuan Zhou, Angelos Karlas, Nassir Navab, Zhongliang Jiang

Abstract:Deep Venous Thrombosis (DVT) is a common vascular disease with blood clots inside deep veins, which may block blood flow or even cause a life-threatening pulmonary embolism. A typical exam for DVT using ultrasound (US) imaging is by pressing the target vein until its lumen is fully compressed. However, the compression exam is highly operator-dependent. To alleviate intra- and inter-variations, we present a robotic US system with a novel hybrid force motion control scheme ensuring position and force tracking accuracy, and soft landing of the probe onto the target surface. In addition, a path-based virtual fixture is proposed to realize easy human-robot interaction for repeat compression operation at the lesion location. To ensure the biometric measurements obtained in different examinations are comparable, the 6D scanning path is determined in a coarse-to-fine manner using both an external RGBD camera and US images. The RGBD camera is first used to extract a rough scanning path on the object. Then, the segmented vascular lumen from US images are used to optimize the scanning path to ensure the visibility of the target object. To generate a continuous scan path for developing virtual fixtures, an arc-length based path fitting model considering both position and orientation is proposed. Finally, the whole system is evaluated on a human-like arm phantom with an uneven surface.

* Accepted Paper IEEE T-ASE

Via

Access Paper or Ask Questions

Implicit Subgoal Planning with Variational Autoencoders for Long-Horizon Sparse Reward Robotic Tasks

Dec 25, 2023

Fangyuan Wang, Anqing Duan, Peng Zhou, Shengzeng Huo, Guodong Guo, Chenguang Yang, David Navarro-Alarcon

Abstract:The challenges inherent to long-horizon tasks in robotics persist due to the typical inefficient exploration and sparse rewards in traditional reinforcement learning approaches. To alleviate these challenges, we introduce a novel algorithm, Variational Autoencoder-based Subgoal Inference (VAESI), to accomplish long-horizon tasks through a divide-and-conquer manner. VAESI consists of three components: a Variational Autoencoder (VAE)-based Subgoal Generator, a Hindsight Sampler, and a Value Selector. The VAE-based Subgoal Generator draws inspiration from the human capacity to infer subgoals and reason about the final goal in the context of these subgoals. It is composed of an explicit encoder model, engineered to generate subgoals, and an implicit decoder model, designed to enhance the quality of the generated subgoals by predicting the final goal. Additionally, the Hindsight Sampler selects valid subgoals from an offline dataset to enhance the feasibility of the generated subgoals. The Value Selector utilizes the value function in reinforcement learning to filter the optimal subgoals from subgoal candidates. To validate our method, we conduct several long-horizon tasks in both simulation and the real world, including one locomotion task and three manipulation tasks. The obtained quantitative and qualitative data indicate that our approach achieves promising performance compared to other baseline methods. These experimental results can be seen in the website \url{https://sites.google.com/view/vaesi/home}.

* 14 pages, 16 figures

Via

Access Paper or Ask Questions