En Yen Puang

Learning Stable Robot Grasping with Transformer-based Tactile Control Policies

Jul 30, 2024

En Yen Puang, Zechen Li, Chee Meng Chew, Shan Luo, Yan Wu

Abstract: Measuring grasp stability, which can be inferred from haptic information with a tactile sensor, is an important skill for dexterous robot manipulation tasks. Control policies have to detect rotational displacement and slippage from tactile feedback, and determine a re-grasp strategy in terms of location and force. The classic stable grasp task only trains control policies to solve for re-grasp location with objects of fixed center of gravity. In this work, we propose a revamped version of the stable grasp task that optimises both re-grasp location and gripping force for objects with unknown and moving center of gravity. We tackle this task with a model-free, end-to-end Transformer-based reinforcement learning framework. We show that our approach is able to solve both objectives after training, both in simulation and in a real-world setup with zero-shot transfer. We also provide performance analysis of different models to understand the dynamics of optimising two opposing objectives.

* Accepted by ICIEA 2024

Hierarchical Point Cloud Encoding and Decoding with Lightweight Self-Attention based Model

Feb 13, 2022

En Yen Puang, Hao Zhang, Hongyuan Zhu, Wei Jing

Abstract: In this paper we present SA-CNN, a hierarchical and lightweight self-attention based encoding and decoding architecture for representation learning of point cloud data. The proposed SA-CNN introduces convolution and transposed convolution stacks to capture and generate contextual information among unordered 3D points. Following the conventional hierarchical pipeline, the encoding process extracts features in a local-to-global manner, while the decoding process generates features and point clouds in coarse-to-fine, multi-resolution stages. We demonstrate that SA-CNN is capable of a wide range of applications, namely classification, part segmentation, reconstruction, shape retrieval, and unsupervised classification. While achieving state-of-the-art or comparable performance in the benchmarks, SA-CNN keeps its model complexity several orders of magnitude lower than the others. In terms of qualitative results, we visualize the multi-stage point cloud reconstructions and latent walks on rigid objects as well as deformable non-rigid human and robot models.

* Accepted by RA-Letters and ICRA 2022

End-to-end Reinforcement Learning of Robotic Manipulation with Robust Keypoints Representation

Feb 12, 2022

Tianying Wang, En Yen Puang, Marcus Lee, Yan Wu, Wei Jing

Abstract: We present an end-to-end Reinforcement Learning (RL) framework for robotic manipulation tasks, using a robust and efficient keypoints representation. The proposed method learns keypoints from camera images as the state representation, through a self-supervised autoencoder architecture. The keypoints encode the geometric information, as well as the relationship between the tool and the target, in a compact representation to ensure efficient and robust learning. After keypoints learning, the RL step learns the robot motion from the extracted keypoints state representation. The keypoints and RL learning processes are done entirely in the simulated environment. We demonstrate the effectiveness of the proposed method on robotic manipulation tasks, including grasping and pushing, in different scenarios. We also investigate the generalization capability of the trained model. In addition to the robust keypoints representation, we further apply domain randomization and adversarial training examples to achieve zero-shot sim-to-real transfer in real-world robotic manipulation tasks.

* 8 pages

KOVIS: Keypoint-based Visual Servoing with Zero-Shot Sim-to-Real Transfer for Robotics Manipulation

Jul 28, 2020

En Yen Puang, Keng Peng Tee, Wei Jing

Abstract: We present KOVIS, a novel learning-based, calibration-free visual servoing method for fine robotic manipulation tasks with an eye-in-hand stereo camera system. We train the deep neural network only in the simulated environment, and the trained model can be used directly for real-world visual servoing tasks. KOVIS consists of two networks. The first, the keypoint network, learns the keypoint representation from the image using an autoencoder. The visual servoing network then learns the motion based on keypoints extracted from the camera image. The two networks are trained end-to-end in the simulated environment by self-supervised learning without manual data labeling. After training with data augmentation, domain randomization, and adversarial examples, we are able to achieve zero-shot sim-to-real transfer to real-world robotic manipulation tasks. We demonstrate the effectiveness of the proposed method in both simulated environments and real-world experiments with different robotic manipulation tasks, including grasping, peg-in-hole insertion with 4 mm clearance, and M13 screw insertion. The demo video is available at http://youtu.be/gfBJBR2tDzA

* Accepted by IROS 2020

Multi-path Learning for Object Pose Estimation Across Domains

Aug 01, 2019

Martin Sundermeyer, Maximilian Durner, En Yen Puang, Zoltan-Csaba Marton, Rudolph Triebel

Abstract: We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that not only describes the orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning": while the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, views of different instances do not need to be separated in the latent space and can share common features. The resulting encoder generalizes well from synthetic to real data and across various instances, categories, model types and datasets. We systematically investigate the learned encodings, their generalization capabilities and iterative refinement strategies on the ModelNet40 and T-LESS datasets. On T-LESS, we achieve state-of-the-art results with our 6D Object Detection pipeline, both in the RGB and depth domain, outperforming learning-free pipelines at much lower runtimes.

Team NimbRo at MBZIRC 2017: Autonomous Valve Stem Turning using a Wrench

Oct 06, 2018

Max Schwarz, David Droeschel, Christian Lenz, Arul Selvam Periyasamy, En Yen Puang, Jan Razlaw, Diego Rodriguez, Sebastian Schüller, Michael Schreiber, Sven Behnke

Abstract: The Mohamed Bin Zayed International Robotics Challenge (MBZIRC) 2017 has defined ambitious new benchmarks to advance the state-of-the-art in autonomous operation of ground-based and flying robots. In this article, we describe our winning entry to MBZIRC Challenge 2: the mobile manipulation robot Mario. It is capable of autonomously solving a valve manipulation task using a wrench tool that is detected, grasped, and finally employed to turn a valve stem. Mario's omnidirectional base allows both fast locomotion and precise close approach to the manipulation panel. We describe an efficient detector for medium-sized objects in 3D laser scans and apply it to detect the manipulation panel. An object detection architecture based on deep neural networks is used to find and select the correct tool from grayscale images. Parametrized motion primitives are adapted online to percepts of the tool and valve stem in order to turn the stem. We report in detail on our winning performance at the challenge and discuss lessons learned.

* Accepted for Journal of Field Robotics (JFR), Wiley, to appear 2018
