Abstract: We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model. We then represent the object parts with oriented bounding boxes, which are input to a fine-tuned large language model (LLM) to predict joint articulation as code. By leveraging pre-trained vision and language models, our approach scales elegantly with the number of articulated parts, and generalizes from synthetic training data to real-world objects in unstructured environments. Experimental results demonstrate that Real2Code significantly outperforms the previous state of the art in reconstruction accuracy, and is the first approach to extrapolate beyond objects' structural complexity in the training set, reconstructing objects with up to 10 articulated parts. When combined with a stereo reconstruction model, Real2Code also generalizes to real-world objects from a handful of multi-view RGB images, without the need for depth or camera information.
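To make the middle step of this pipeline concrete, the sketch below computes PCA-based oriented bounding boxes from segmented part point clouds and serializes them into a text prompt for an LLM. The box computation, function names, and prompt layout are illustrative assumptions, not the Real2Code implementation.

    # Illustrative sketch only: PCA-based oriented bounding boxes per part,
    # serialized into a text prompt for an LLM that is expected to output
    # joint articulation as code. Prompt format and names are assumptions.
    import numpy as np

    def oriented_bounding_box(points):
        """Return (center, rotation, extents) of a PCA-aligned bounding box."""
        center = points.mean(axis=0)
        centered = points - center
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        rotation = vt.T                       # columns are the box axes
        local = centered @ rotation
        extents = local.max(axis=0) - local.min(axis=0)
        return center, rotation, extents

    def build_prompt(part_clouds):
        """Serialize per-part boxes; a fine-tuned LLM would complete this prompt."""
        lines = ["Predict the joint articulation of this object as code."]
        for i, pts in enumerate(part_clouds):
            c, r, e = oriented_bounding_box(np.asarray(pts))
            lines.append(f"part_{i}: center={c.round(3).tolist()}, "
                         f"axes={r.round(3).tolist()}, extents={e.round(3).tolist()}")
        return "\n".join(lines)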
Abstract: Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. The ability to accurately predict the topological changes that a specific action might incur is critical for planning interactions with elastoplastic objects. We present DoughNet, a Transformer-based architecture for handling these challenges, consisting of two components. First, a denoising autoencoder represents deformable objects of varying topology as sets of latent codes. Second, a visual predictive model performs autoregressive set prediction to determine long-horizon geometrical deformation and topological changes purely in latent space. Given a partial initial state and desired manipulation trajectories, it infers all resulting object geometries and topologies at each step. DoughNet thereby enables planning of robotic manipulation: selecting a suitable tool, its pose, and its opening width to recreate robot- or human-made goals. Our experiments in simulated and real environments show that DoughNet significantly outperforms related approaches that consider deformation only as geometrical change.
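As a rough illustration of the second component, the sketch below rolls out a latent-space dynamics model autoregressively over a sequence of manipulation actions. The per-code MLP dynamics, residual update, and dimensions are simplifying assumptions and do not reproduce the DoughNet architecture, which uses a Transformer over sets of latent codes.

    # Illustrative sketch (not the DoughNet architecture): autoregressive
    # prediction over sets of latent codes, given per-step manipulation actions.
    import torch
    import torch.nn as nn

    class LatentSetDynamics(nn.Module):
        """Predicts the next set of latent codes from the current set and an action."""
        def __init__(self, latent_dim=64, action_dim=8, hidden_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + action_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, latent_dim),
            )

        def forward(self, latents, action):
            # latents: (num_codes, latent_dim); action: (action_dim,)
            a = action.unsqueeze(0).expand(latents.shape[0], -1)
            return latents + self.net(torch.cat([latents, a], dim=-1))  # residual step

    def rollout(model, initial_latents, actions):
        """Feed each predicted latent set back in to obtain a long-horizon rollout."""
        states = [initial_latents]
        for a in actions:
            states.append(model(states[-1], a))
        return states  # each state would be decoded to geometry/topology by the autoencoder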
Abstract: Tracking an object's 6D pose, while either the object itself or the observing camera is moving, is important for many robotics and augmented reality applications. While exploiting temporal priors eases this problem, object-specific knowledge is required to recover when tracking is lost. Under the tight time constraints of the tracking task, RGB(D)-based methods are often conceptually complex or rely on heuristic motion models. In comparison, we propose to simplify object tracking to a reinforced point cloud (depth only) alignment task. This allows us to train a streamlined approach from scratch with limited amounts of sparse 3D point clouds, compared to the large datasets of diverse RGBD sequences required in previous works. We combine temporal frame-to-frame registration with object-based recovery via frame-to-model refinement, using a reinforcement learning (RL) agent that jointly solves both objectives. We also show that the RL agent's uncertainty and a rendering-based mask propagation are effective reinitialization triggers.
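For intuition, frame-to-frame registration can be illustrated with a classical point-to-point ICP loop; the RL agent described above replaces such a hand-crafted update and adds frame-to-model refinement and reinitialization. The minimal NumPy sketch below uses brute-force nearest neighbors and is purely illustrative, not the paper's method.

    # Minimal point-to-point ICP standing in for the learned frame-to-frame
    # registration step (illustration only; brute-force nearest neighbors).
    import numpy as np

    def best_rigid_transform(src, dst):
        """Least-squares rigid transform (Kabsch) mapping src onto dst."""
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        H = (src - cs).T @ (dst - cd)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, cd - R @ cs

    def icp(source, target, iterations=20):
        """Align source to target; returns accumulated rotation and translation."""
        R_total, t_total = np.eye(3), np.zeros(3)
        current = source.copy()
        for _ in range(iterations):
            d = np.linalg.norm(current[:, None, :] - target[None, :, :], axis=-1)
            matched = target[d.argmin(axis=1)]            # nearest-neighbor pairs
            R, t = best_rigid_transform(current, matched)
            current = current @ R.T + t
            R_total, t_total = R @ R_total, R @ t_total + t
        return R_total, t_total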
Abstract: Object pose estimation is a core perception task that enables, for example, object grasping and scene understanding. Widely available, inexpensive, high-resolution RGB sensors and CNNs that allow for fast inference on this modality make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state of the art for varying modalities, single- and multi-view settings, and datasets and metrics that consider a multitude of applications. We argue, however, that those works' broad scope hinders the identification of open challenges that are specific to monocular approaches and the derivation of promising future directions for their application in robotics. By providing a unified view on recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, large object sets, novel objects, refractive materials, and uncertainty estimates are central, largely unsolved open challenges. In order to address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved.
Abstract: Modeling and simulating soft robot hands can aid in design iteration for complex and high degree-of-freedom (DoF) morphologies. This can be further supplemented by iterating on the design based on its performance in real-world manipulation tasks. However, this requires a framework that allows us to iterate quickly and at low cost. In this paper, we present a framework that leverages rapid prototyping of the hand using 3D printing, and utilizes teleoperation to evaluate the hand in real-world manipulation tasks. Using this framework, we design a 3D-printed 16-DoF dexterous anthropomorphic soft hand (DASH) and improve its design over three iterations. Rapid prototyping techniques such as 3D printing allow us to directly evaluate the fabricated hand without modeling it in simulation. We show that the design improves at each iteration through the hand's performance in 30 real-world teleoperated manipulation tasks. Testing over 600 demonstrations shows that our final version of DASH can solve 16 of the 30 tasks, compared to Allegro, a popular rigid hand on the market, which can only solve 7 tasks. We open-source our CAD models as well as the teleoperation dataset for further study; both are available on our website (https://dash-through-interaction.github.io).
Abstract: Observational noise, inaccurate segmentation and ambiguity due to symmetry and occlusion lead to inaccurate object pose estimates. While depth- and RGB-based pose refinement approaches increase the accuracy of the resulting pose estimates, they are susceptible to ambiguity in the observation, as they only consider visual alignment. We propose to leverage the fact that we often observe static, rigid scenes: the objects therein must be in physically plausible poses. We show that considering plausibility reduces ambiguity and, in consequence, allows poses to be predicted more accurately in cluttered environments. To this end, we extend a recent RL-based registration approach towards iterative refinement of object poses. Experiments on the LINEMOD and YCB-VIDEO datasets demonstrate the state-of-the-art performance of our depth-based refinement approach.
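As a toy illustration of why plausibility helps, the snippet below scores a pose hypothesis by adding a penalty for penetrating a known support plane to a purely visual nearest-neighbor alignment term. The plane-based check and the weighting are assumptions for illustration, not the paper's formulation.

    # Toy plausibility-aware scoring (not the paper's method): penalize object
    # points that penetrate a known support plane, on top of a visual term.
    import numpy as np

    def alignment_error(model_points, observed_points):
        """Mean distance from each observed point to its nearest model point."""
        d = np.linalg.norm(observed_points[:, None, :] - model_points[None, :, :], axis=-1)
        return d.min(axis=1).mean()

    def penetration_penalty(model_points, plane_normal, plane_offset):
        """Sum of distances by which points lie below the plane n.x + d = 0."""
        signed = model_points @ plane_normal + plane_offset
        return np.clip(-signed, 0.0, None).sum()

    def plausibility_aware_score(pose_R, pose_t, model_points, observed_points,
                                 plane_normal, plane_offset, weight=10.0):
        """Lower is better: visual alignment plus weighted implausibility."""
        transformed = model_points @ pose_R.T + pose_t
        return (alignment_error(transformed, observed_points)
                + weight * penetration_penalty(transformed, plane_normal, plane_offset))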
Abstract: The design and fabrication of soft robot hands is still a time-consuming and difficult process. Advances in rapid prototyping have accelerated the fabrication process significantly while introducing new complexities into the design process. In this work, we present an approach that utilizes novel low-cost fabrication techniques in conjunction with design tools that help soft hand designers systematically take advantage of multi-material 3D printing to create dexterous soft robotic hands. We show that the generated designs, while very low cost and lightweight, are highly durable, surprisingly strong, and capable of dexterous grasping.
Abstract: We present a novel method for the direct transfer of grasps and manipulations between objects and hands by utilizing contact areas. Our method fully preserves contact shapes and, in contrast to existing techniques, is not dependent on grasp families, requires no model training or grasp sampling, makes no assumptions about manipulator morphology or kinematics, and allows user control over both transfer parameters and solution optimization. Despite this generality, we show that our method is capable of synthesizing kinematically feasible whole-hand poses in seconds, even for poor initializations or hard-to-reach contacts. We additionally highlight the method's benefits both in responding to design alterations and in quickly approximating in-hand manipulation sequences. Finally, we demonstrate a solution generated by our method on a physical, custom-designed prosthetic hand.
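As a loose illustration of contact-based transfer (not the paper's algorithm), the sketch below maps contact points from a source object to a target object by nearest neighbors in a normalized frame; the resulting points could then serve as targets for whole-hand pose synthesis. The normalization scheme and matching rule are assumptions for illustration.

    # Toy contact transfer: match source contacts to target surface points after
    # normalizing both objects to a comparable frame (illustration only).
    import numpy as np

    def normalize(points):
        """Center a point set and scale it to unit bounding-box diagonal."""
        center = points.mean(axis=0)
        centered = points - center
        scale = np.linalg.norm(centered.max(axis=0) - centered.min(axis=0))
        return centered / scale, center, scale

    def transfer_contacts(source_surface, target_surface, source_contacts):
        """For each contact point on the source object, pick the closest target
        surface point after both objects are normalized."""
        src_n, src_c, src_s = normalize(source_surface)
        tgt_n, _, _ = normalize(target_surface)
        contacts_n = (source_contacts - src_c) / src_s
        d = np.linalg.norm(contacts_n[:, None, :] - tgt_n[None, :, :], axis=-1)
        return target_surface[d.argmin(axis=1)]  # contact targets for pose synthesis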
Abstract: Point cloud registration is a common step in many 3D computer vision tasks such as object pose estimation, where a 3D model is aligned to an observation. Classical registration methods generalize well to novel domains but fail when given a noisy observation or a bad initialization. Learning-based methods, in contrast, are more robust but lack generalization capacity. We propose to consider iterative point cloud registration as a reinforcement learning task and, to this end, present a novel registration agent (ReAgent). We employ imitation learning to initialize its discrete registration policy based on a steady expert policy. Integration with policy optimization, based on our proposed alignment reward, further improves the agent's registration performance. We compare our approach to classical and learning-based registration methods on both ModelNet40 (synthetic) and ScanObjectNN (real data) and show that ReAgent achieves state-of-the-art accuracy. Moreover, the agent's lightweight architecture enables reduced inference time compared to related approaches. In addition, we apply our method to the object pose estimation task on real data (LINEMOD), outperforming state-of-the-art pose refinement approaches.
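A minimal sketch of the imitation-learning setup described above: a discrete action space (reduced here to axis-aligned translation steps for brevity) and an expert that labels each state with the step that most reduces a Chamfer-style alignment error. The step sizes, the omission of rotation actions, and the error definition are assumptions, not the ReAgent implementation; the resulting labels would serve as classification targets, with policy optimization on an alignment reward refining the policy further.

    # Minimal sketch (not the ReAgent implementation): discrete registration
    # actions and a greedy expert that provides imitation-learning labels.
    import numpy as np

    STEPS = [-0.1, -0.01, 0.0, 0.01, 0.1]                      # assumed step sizes
    ACTIONS = [(axis, s) for axis in range(3) for s in STEPS]  # 15 discrete actions

    def alignment_error(source, target):
        """Chamfer-style error between the moved source and the target cloud."""
        d = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
        return d.min(axis=1).mean() + d.min(axis=0).mean()

    def expert_action(source, target, translation):
        """Expert policy: greedily pick the step that minimizes the error."""
        errors = []
        for axis, step in ACTIONS:
            t = translation.copy()
            t[axis] += step
            errors.append(alignment_error(source + t, target))
        return int(np.argmin(errors))  # imitation target (class label) for this state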
Abstract: Precise object pose estimation for robotics applications and augmented reality relies on final refinement and verification steps. However, interactions between objects and interactions with the supporting structures in the observed scene are typically not considered. In this work, we propose to integrate scene-level hypothesis verification with object-level pose refinement guided by physics simulation. This allows the physical plausibility of individual object pose estimates and the stability of the estimated scene to be considered in a unified search-based optimization. The proposed method adapts to scenes with multiple objects and efficiently focuses on refining the most promising object poses in multi-hypothesis scenarios. We call this integrated approach VeREFINE and evaluate it on two datasets with varying scene complexity. The generality of the approach is shown by using two different pose estimators and two different baseline refiners. Results show improvements over all baselines and on all datasets with the inclusion of our proposed VeREFINE approach.
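The integrated verify-and-refine idea can be illustrated by a simple greedy loop over pose hypotheses that repeatedly refines the currently best-scoring hypothesis and re-verifies it. The verify and refine callables below are placeholders (in the paper, verification would combine visual fit with physics-based plausibility and scene stability), and the greedy loop is only a stand-in for the paper's search-based optimization.

    # Schematic illustration (not the VeREFINE algorithm): greedy best-first
    # refinement over multiple pose hypotheses with repeated verification.
    def verify_and_refine(hypotheses, verify, refine, budget=20):
        """hypotheses: list of initial poses; verify: pose -> score (higher is
        better); refine: pose -> improved pose. Returns (score, best pose)."""
        scored = [(verify(h), h) for h in hypotheses]
        for _ in range(budget):
            scored.sort(key=lambda sh: sh[0], reverse=True)
            _, best_pose = scored[0]                   # most promising hypothesis
            refined = refine(best_pose)
            scored[0] = (verify(refined), refined)     # re-verify after refinement
        scored.sort(key=lambda sh: sh[0], reverse=True)
        return scored[0]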