Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Youngbin Park

Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image

Feb 27, 2020

Taewon Kim, Yeseong Park, Youngbin Park, Il Hong Suh

Figure 1 for Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image

Figure 2 for Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image

Figure 3 for Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image

Figure 4 for Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image

Abstract:For a robotic grasping task in which diverse unseen target objects exist in a cluttered environment, some deep learning-based methods have achieved state-of-the-art results using visual input directly. In contrast, actor-critic deep reinforcement learning (RL) methods typically perform very poorly when grasping diverse objects, especially when learning from raw images and sparse rewards. To make these RL techniques feasible for vision-based grasping tasks, we employ state representation learning (SRL), where we encode essential information first for subsequent use in RL. However, typical representation learning procedures are unsuitable for extracting pertinent information for learning the grasping skill, because the visual inputs for representation learning, where a robot attempts to grasp a target object in clutter, are extremely complex. We found that preprocessing based on the disentanglement of a raw input image is the key to effectively capturing a compact representation. This enables deep RL to learn robotic grasping skills from highly varied and diverse visual inputs. We demonstrate the effectiveness of this approach with varying levels of disentanglement in a realistic simulated environment.

Via

Access Paper or Ask Questions

Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

Apr 15, 2016

Youngbin Park, Sungphill Moon, Il Hong Suh

Figure 1 for Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

Figure 2 for Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

Figure 3 for Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

Figure 4 for Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

Abstract:Kinect skeleton tracker is able to achieve considerable human body tracking performance in convenient and a low-cost manner. However, The tracker often captures unnatural human poses such as discontinuous and vibrated motions when self-occlusions occur. A majority of approaches tackle this problem by using multiple Kinect sensors in a workspace. Combination of the measurements from different sensors is then conducted in Kalman filter framework or optimization problem is formulated for sensor fusion. However, these methods usually require heuristics to measure reliability of measurements observed from each Kinect sensor. In this paper, we developed a method to improve Kinect skeleton using single Kinect sensor, in which supervised learning technique was employed to correct unnatural tracking motions. Specifically, deep recurrent neural networks were used for improving joint positions and velocities of Kinect skeleton, and three methods were proposed to integrate the refined positions and velocities for further enhancement. Moreover, we suggested a novel measure to evaluate naturalness of captured motions. We evaluated the proposed approach by comparison with the ground truth obtained using a commercial optical maker-based motion capture system.

* submitted to ECCV 2016

Via

Access Paper or Ask Questions