Anthony Opipari

Robotics Institute, University of Michigan, Ann Arbor

Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation

Oct 16, 2024

Anthony Opipari, Aravindhan K Krishnan, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo, Arnie Sen, Odest Chadwicke Jenkins

Abstract:This paper presents a method for generating large-scale datasets to improve class-agnostic video segmentation across robots with different form factors. Specifically, we consider the question of whether video segmentation models trained on generic segmentation data could be more effective for particular robot platforms if robot embodiment is factored into the data generation process. To answer this question, a pipeline is formulated for using 3D reconstructions (e.g. from HM3DSem) to generate segmented videos that are configurable based on a robot's embodiment (e.g. sensor type, sensor placement, and illumination source). A resulting massive RGB-D video panoptic segmentation dataset (MVPd) is introduced for extensive benchmarking with foundation and video segmentation models, as well as to support embodiment-focused research in video segmentation. Our experimental findings demonstrate that using MVPd for finetuning can lead to performance improvements when transferring foundation models to certain robot embodiments, such as specific camera placements. These experiments also show that using 3D modalities (depth images and camera pose) can lead to improvements in video segmentation accuracy and consistency. The project webpage is available at https://topipari.com/projects/MVPd

* Accepted in IEEE Robotics and Automation Letters October 2024

Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks

Sep 11, 2024

Ruihan Xu, Anthony Opipari, Joshua Mah, Stanley Lewis, Haoran Zhang, Hanzhe Guo, Odest Chadwicke Jenkins

Abstract:This paper introduces SO(2)-Equivariant Gaussian Sculpting Networks (GSNs) as an approach for SO(2)-Equivariant 3D object reconstruction from single-view image observations. GSNs take a single observation as input to generate a Gaussian splat representation describing the observed object's geometry and texture. By using a shared feature extractor before decoding Gaussian colors, covariances, positions, and opacities, GSNs achieve extremely high throughput (>150FPS). Experiments demonstrate that GSNs can be trained efficiently using a multi-view rendering loss and are competitive, in quality, with expensive diffusion-based reconstruction algorithms. The GSN model is validated on multiple benchmark experiments. Moreover, we demonstrate the potential for GSNs to be used within a robotic manipulation pipeline for object-centric grasping.

* Accepted to RSS 2024 Workshop on Geometric and Algebraic Structure in Robot Learning

OVAL-Prompt: Open-Vocabulary Affordance Localization for Robot Manipulation through LLM Affordance-Grounding

Apr 17, 2024

Edmond Tong, Anthony Opipari, Stanley Lewis, Zhen Zeng, Odest Chadwicke Jenkins

Abstract:In order for robots to interact with objects effectively, they must understand the form and function of each object they encounter. Essentially, robots need to understand which actions each object affords, and where those affordances can be acted on. Robots are ultimately expected to operate in unstructured human environments, where the set of objects and affordances is not known to the robot before deployment (i.e. the open-vocabulary setting). In this work, we introduce OVAL-Prompt, a prompt-based approach for open-vocabulary affordance localization in RGB-D images. By leveraging a Vision Language Model (VLM) for open-vocabulary object part segmentation and a Large Language Model (LLM) to ground each part-segment-affordance, OVAL-Prompt demonstrates generalizability to novel object instances, categories, and affordances without domain-specific finetuning. Quantitative experiments demonstrate that without any finetuning, OVAL-Prompt achieves localization accuracy that is competitive with supervised baseline models. Moreover, qualitative experiments show that OVAL-Prompt enables affordance-based robot manipulation of open-vocabulary object instances and categories.

* Accepted to Vision-Language Models for Navigation and Manipulation (VLMNM) Workshop (ICRA 2024)

TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation

Jul 23, 2023

Huijie Zhang, Anthony Opipari, Xiaotong Chen, Jiyue Zhu, Zeren Yu, Odest Chadwicke Jenkins

Abstract:Transparent objects present multiple distinct challenges to visual perception systems. First, their lack of distinguishing visual features makes transparent objects harder to detect and localize than opaque objects. Even humans find certain transparent surfaces with little specular reflection or refraction, like glass doors, difficult to perceive. A second challenge is that depth sensors typically used for opaque object perception cannot obtain accurate depth measurements on transparent surfaces due to their unique reflective properties. Stemming from these challenges, we observe that transparent object instances within the same category, such as cups, look more similar to each other than to ordinary opaque objects of that same category. Given this observation, the present paper explores the possibility of category-level transparent object pose estimation rather than instance-level pose estimation. We propose TransNet, a two-stage pipeline that estimates category-level transparent object pose using localized depth completion and surface normal estimation. TransNet is evaluated in terms of pose estimation accuracy on a large-scale transparent object dataset and compared to a state-of-the-art category-level pose estimation approach. Results from this comparison demonstrate that TransNet achieves improved pose estimation accuracy on transparent objects. Moreover, we use TransNet to build an autonomous transparent object manipulation system for robotic pick-and-place and pouring tasks.

DNBP: Differentiable Nonparametric Belief Propagation

Mar 08, 2023

Anthony Opipari, Jana Pavlasek, Chao Chen, Shoutian Wang, Karthik Desingh, Odest Chadwicke Jenkins

Abstract:We present a differentiable approach to learn the probabilistic factors used for inference by a nonparametric belief propagation algorithm. Existing nonparametric belief propagation methods rely on domain-specific features encoded in the probabilistic factors of a graphical model. In this work, we replace each crafted factor with a differentiable neural network enabling the factors to be learned using an efficient optimization routine from labeled data. By combining differentiable neural networks with an efficient belief propagation algorithm, our method learns to maintain a set of marginal posterior samples using end-to-end training. We evaluate our differentiable nonparametric belief propagation (DNBP) method on a set of articulated pose tracking tasks and compare performance with learned baselines. Results from these experiments demonstrate the effectiveness of using learned factors for tracking and suggest the practical advantage over hand-crafted approaches. The project webpage is available at: https://progress.eecs.umich.edu/projects/dnbp/ .

* arXiv admin note: text overlap with arXiv:2101.05948

TransNet: Category-Level Transparent Object Pose Estimation

Aug 22, 2022

Huijie Zhang, Anthony Opipari, Xiaotong Chen, Jiyue Zhu, Zeren Yu, Odest Chadwicke Jenkins

Abstract:Transparent objects present multiple distinct challenges to visual perception systems. First, their lack of distinguishing visual features makes transparent objects harder to detect and localize than opaque objects. Even humans find certain transparent surfaces with little specular reflection or refraction, e.g. glass doors, difficult to perceive. A second challenge is that common depth sensors typically used for opaque object perception cannot obtain accurate depth measurements on transparent objects due to their unique reflective properties. Stemming from these challenges, we observe that transparent object instances within the same category (e.g. cups) look more similar to each other than to ordinary opaque objects of that same category. Given this observation, the present paper sets out to explore the possibility of category-level transparent object pose estimation rather than instance-level pose estimation. We propose TransNet, a two-stage pipeline that learns to estimate category-level transparent object pose using localized depth completion and surface normal estimation. TransNet is evaluated in terms of pose estimation accuracy on a recent, large-scale transparent object dataset and compared to a state-of-the-art category-level pose estimation approach. Results from this comparison demonstrate that TransNet achieves improved pose estimation accuracy on transparent objects and key findings from the included ablation studies suggest future directions for performance improvements.

ClearPose: Large-scale Transparent Object Dataset and Benchmark

Mar 08, 2022

Xiaotong Chen, Huijie Zhang, Zeren Yu, Anthony Opipari, Odest Chadwicke Jenkins

Abstract:Transparent objects are ubiquitous in household settings and pose distinct challenges for visual sensing and perception systems. The optical properties of transparent objects leave conventional 3D sensors alone unreliable for object depth and pose estimation. These challenges are highlighted by the shortage of large-scale RGB-Depth datasets focusing on transparent objects in real-world settings. In this work, we contribute a large-scale real-world RGB-Depth transparent object dataset named ClearPose to serve as a benchmark dataset for segmentation, scene-level depth completion and object-centric pose estimation tasks. The ClearPose dataset contains over 350K labeled real-world RGB-Depth frames and 4M instance annotations covering 63 household objects. The dataset includes object categories commonly used in daily life under various lighting and occluding conditions as well as challenging test scenarios such as cases of occlusion by opaque or translucent objects, non-planar orientations, presence of liquids, etc. We benchmark several state-of-the-art depth completion and object pose estimation deep neural networks on ClearPose.

Differentiable Nonparametric Belief Propagation

Jan 15, 2021

Anthony Opipari, Chao Chen, Shoutian Wang, Jana Pavlasek, Karthik Desingh, Odest Chadwicke Jenkins

Abstract:We present a differentiable approach to learn the probabilistic factors used for inference by a nonparametric belief propagation algorithm. Existing nonparametric belief propagation methods rely on domain-specific features encoded in the probabilistic factors of a graphical model. In this work, we replace each crafted factor with a differentiable neural network enabling the factors to be learned using an efficient optimization routine from labeled data. By combining differentiable neural networks with an efficient belief propagation algorithm, our method learns to maintain a set of marginal posterior samples using end-to-end training. We evaluate our differentiable nonparametric belief propagation (DNBP) method on a set of articulated pose tracking tasks and compare performance with a recurrent neural network. Results from this comparison demonstrate the effectiveness of using learned factors for tracking and suggest the practical advantage over hand-crafted approaches. The project webpage is available at: progress.eecs.umich.edu/projects/dnbp.

* 12 pages, 9 figures

Factored Pose Estimation of Articulated Objects using Efficient Nonparametric Belief Propagation

Dec 10, 2018

Karthik Desingh, Shiyang Lu, Anthony Opipari, Odest Chadwicke Jenkins

Abstract:Robots working in human environments often encounter a wide range of articulated objects, such as tools, cabinets, and other jointed objects. Such articulated objects can take an infinite number of possible poses, as a point in a potentially high-dimensional continuous space. A robot must perceive this continuous pose to manipulate the object to a desired pose. This problem of perception and manipulation of articulated objects remains a challenge due to its high dimensionality and multi-modal uncertainty. In this paper, we propose a factored approach to estimate the poses of articulated objects using an efficient nonparametric belief propagation algorithm. We consider inputs as geometrical models with articulation constraints, and observed RGBD sensor data. The proposed framework produces object-part pose beliefs iteratively. The problem is formulated as a pairwise Markov Random Field (MRF) where each hidden node (continuous pose variable) is an observed object-part's pose and the edges denote the articulation constraints between the parts. We propose articulated pose estimation by Pull Message Passing algorithm for Nonparametric Belief Propagation (PMPNBP) and evaluate its convergence properties over scenes with articulated objects.

Pull Message Passing for Nonparametric Belief Propagation

Jul 27, 2018

Karthik Desingh, Anthony Opipari, Odest Chadwicke Jenkins

Abstract:We present a "pull" approach to approximate products of Gaussian mixtures within message updates for Nonparametric Belief Propagation (NBP) inference. Existing NBP methods often represent messages between continuous-valued latent variables as Gaussian mixture models. To avoid computational intractability in loopy graphs, NBP necessitates an approximation of the product of such mixtures. Sampling-based product approximations have shown effectiveness for NBP inference. However, such approximations used within the traditional "push" message update procedures quickly become computationally prohibitive for multi-modal distributions over high-dimensional variables. In contrast, we propose a "pull" method, the Pull Message Passing for Nonparametric Belief Propagation (PMPNBP) algorithm, and demonstrate its viability for efficient inference. We report results using an experiment from an existing NBP method, PAMPAS, for inferring the pose of an articulated structure in clutter. Results on this illustrative problem show that PMPNBP scales more efficiently with the number of components in its mixtures and, consequently, improves inference accuracy.
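
To illustrate why the abstract says the product of Gaussian mixtures must be approximated, the following minimal sketch computes that product exactly for two one-dimensional mixtures. It is not the PMPNBP algorithm and is not taken from the paper. The function names (`gauss_pdf`, `product_of_mixtures`, `mixture_pdf`) and the example mixtures are hypothetical. The point is only that the exact product of mixtures with M components each has M × M components, so multiplying d such messages yields M^d components.

```python
import itertools
import math


def gauss_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def product_of_mixtures(mix_a, mix_b):
    """Exact, renormalized product of two 1D Gaussian mixtures.

    Each mixture is a list of (weight, mean, variance) tuples.
    Every pair of components contributes one component to the result.
    """
    out = []
    for (wa, ma, va), (wb, mb, vb) in itertools.product(mix_a, mix_b):
        var = 1.0 / (1.0 / va + 1.0 / vb)       # precisions add
        mean = var * (ma / va + mb / vb)        # precision-weighted mean
        scale = gauss_pdf(ma, mb, va + vb)      # overlap of the two components
        out.append((wa * wb * scale, mean, var))
    total = sum(w for w, _, _ in out)
    return [(w / total, m, v) for w, m, v in out]


def mixture_pdf(mix, x):
    """Density of a mixture at x."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in mix)


# Hypothetical example messages with two components each.
mix_a = [(0.5, -1.0, 0.5), (0.5, 2.0, 1.0)]
mix_b = [(0.3, 0.0, 1.0), (0.7, 1.5, 0.25)]
prod = product_of_mixtures(mix_a, mix_b)
print(len(prod))  # one component per pair of input components
```

Repeating this product over several incoming messages multiplies the component count each time, which is the growth that sampling-based approximations in NBP are meant to avoid.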
