Sven Dickinson

Probabilistic Directed Distance Fields for Ray-Based Shape Representations

Apr 13, 2024

Tristan Aumentado-Armstrong, Stavros Tsogkas, Sven Dickinson, Allan Jepson

Abstract: In modern computer vision, the optimal representation of 3D shape continues to be task-dependent. One fundamental operation applied to such representations is differentiable rendering, as it enables inverse graphics approaches in learning frameworks. Standard explicit shape representations (voxels, point clouds, or meshes) are often easily rendered, but can suffer from limited geometric fidelity, among other issues. On the other hand, implicit representations (occupancy, distance, or radiance fields) preserve greater fidelity, but suffer from complex or inefficient rendering processes, limiting scalability. In this work, we devise Directed Distance Fields (DDFs), a novel neural shape representation that builds upon classical distance fields. The fundamental operation in a DDF maps an oriented point (position and direction) to surface visibility and depth. This enables efficient differentiable rendering, obtaining depth with a single forward pass per pixel, as well as differential geometric quantity extraction (e.g., surface normals), with only additional backward passes. Using probabilistic DDFs (PDDFs), we show how to model inherent discontinuities in the underlying field. We then apply DDFs to several applications, including single-shape fitting, generative modelling, and single-image 3D reconstruction, showcasing strong performance with simple architectural components via the versatility of our representation. Finally, since the dimensionality of DDFs permits view-dependent geometric artifacts, we conduct a theoretical investigation of the constraints necessary for view consistency. We find a small set of field properties that are sufficient to guarantee a DDF is consistent, without knowing, for instance, which shape the field is expressing.

* Extension of arXiv:2112.05300
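
As a rough illustration of the core operation described in the abstract, the sketch below implements a toy, analytic DDF for a single sphere in plain Python. It is not the authors' neural model: `sphere_ddf`, `render_depth`, and the orthographic camera set-up are hypothetical names and assumptions chosen for illustration. It shows the mapping from an oriented point (position and direction) to visibility and depth, and a depth map obtained with one field query per pixel.

```python
import math


def sphere_ddf(p, v, center=(0.0, 0.0, 0.0), radius=1.0):
    """Toy directed distance field for a sphere (illustrative assumption).

    Maps an oriented point (position p, unit direction v) to
    (visible, depth): whether the ray from p along v hits the surface,
    and the distance from p to the first hit in front of it.
    """
    # Vector from the sphere centre to the ray origin.
    oc = [p[i] - center[i] for i in range(3)]
    b = sum(oc[i] * v[i] for i in range(3))        # oc . v
    c = sum(x * x for x in oc) - radius * radius   # |oc|^2 - r^2
    disc = b * b - c                               # discriminant of t^2 + 2bt + c = 0
    if disc < 0.0:
        return False, math.inf                     # the ray misses the sphere
    root = math.sqrt(disc)
    # Return the nearest intersection that lies in front of the origin.
    for t in (-b - root, -b + root):
        if t >= 0.0:
            return True, t
    return False, math.inf                         # the sphere is behind the ray


def render_depth(width=5, height=5, cam_z=-3.0, extent=1.5):
    """Orthographic depth map: one field query per pixel, looking along +z."""
    image = []
    for row in range(height):
        y = extent * (1.0 - 2.0 * (row + 0.5) / height)
        line = []
        for col in range(width):
            x = extent * (2.0 * (col + 0.5) / width - 1.0)
            visible, depth = sphere_ddf((x, y, cam_z), (0.0, 0.0, 1.0))
            line.append(depth if visible else None)  # None marks "no surface"
        image.append(line)
    return image


if __name__ == "__main__":
    for line in render_depth():
        print(" ".join("  .  " if d is None else f"{d:5.2f}" for d in line))
```

In a learned DDF, the analytic function would be replaced by a network with the same input and output signature, so the rendering loop itself would not need to change.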

Fast-Grasp'D: Dexterous Multi-finger Grasp Generation Through Differentiable Simulation

Jun 13, 2023

Dylan Turpin, Tao Zhong, Shutong Zhang, Guanglei Zhu, Jingzhou Liu, Ritvik Singh, Eric Heiden, Miles Macklin, Stavros Tsogkas, Sven Dickinson (+1 more)

Abstract: Multi-finger grasping relies on high quality training data, which is hard to obtain: human data is hard to transfer and synthetic data relies on simplifying assumptions that reduce grasp quality. By making grasp simulation differentiable, and contact dynamics amenable to gradient-based optimization, we accelerate the search for high-quality grasps with fewer limiting assumptions. We present Grasp'D-1M: a large-scale dataset for multi-finger robotic grasping, synthesized with Fast-Grasp'D, a novel differentiable grasping simulator. Grasp'D-1M contains one million training examples for three robotic hands (three, four and five-fingered), each with multimodal visual inputs (RGB+depth+segmentation, available in mono and stereo). Grasp synthesis with Fast-Grasp'D is 10x faster than GraspIt! and 20x faster than the prior Grasp'D differentiable simulator. Generated grasps are more stable and contact-rich than GraspIt! grasps, regardless of the distance threshold used for contact generation. We validate the usefulness of our dataset by retraining an existing vision-based grasping pipeline on Grasp'D-1M, and showing a dramatic increase in model performance, predicting grasps with 30% more contact, a 33% higher epsilon metric, and 35% lower simulated displacement. Additional details at https://dexgrasp.github.io.

Grasp'D: Differentiable Contact-rich Grasp Synthesis for Multi-fingered Hands

Aug 26, 2022

Dylan Turpin, Liquan Wang, Eric Heiden, Yun-Chun Chen, Miles Macklin, Stavros Tsogkas, Sven Dickinson, Animesh Garg

Abstract: The study of hand-object interaction requires generating viable grasp poses for high-dimensional multi-finger models, often relying on analytic grasp synthesis which tends to produce brittle and unnatural results. This paper presents Grasp'D, an approach for grasp synthesis with a differentiable contact simulation from both known models as well as visual inputs. We use gradient-based methods as an alternative to sampling-based grasp synthesis, which fails without simplifying assumptions, such as pre-specified contact locations and eigengrasps. Such assumptions limit grasp discovery and, in particular, exclude high-contact power grasps. In contrast, our simulation-based approach allows for stable, efficient, physically realistic, high-contact grasp synthesis, even for gripper morphologies with high degrees of freedom. We identify and address challenges in making grasp simulation amenable to gradient-based optimization, such as non-smooth object surface geometry, contact sparsity, and a rugged optimization landscape. Grasp'D compares favorably to analytic grasp synthesis on human and robotic hand models, and resultant grasps achieve over 4x denser contact, leading to significantly higher grasp stability. Video and code available at https://graspd-eccv22.github.io/.

Representing 3D Shapes with Probabilistic Directed Distance Fields

Dec 10, 2021

Tristan Aumentado-Armstrong, Stavros Tsogkas, Sven Dickinson, Allan Jepson

Abstract: Differentiable rendering is an essential operation in modern vision, allowing inverse graphics approaches to 3D understanding to be utilized in modern machine learning frameworks. Explicit shape representations (voxels, point clouds, or meshes), while relatively easily rendered, often suffer from limited geometric fidelity or topological constraints. On the other hand, implicit representations (occupancy, distance, or radiance fields) preserve greater fidelity, but suffer from complex or inefficient rendering processes, limiting scalability. In this work, we endeavour to address both shortcomings with a novel shape representation that allows fast differentiable rendering within an implicit architecture. Building on implicit distance representations, we define Directed Distance Fields (DDFs), which map an oriented point (position and direction) to surface visibility and depth. Such a field can render a depth map with a single forward pass per pixel, enable differential surface geometry extraction (e.g., surface normals and curvatures) via network derivatives, be easily composed, and permit extraction of classical unsigned distance fields. Using probabilistic DDFs (PDDFs), we show how to model inherent discontinuities in the underlying field. Finally, we apply our method to fitting single shapes, unpaired 3D-aware generative image modelling, and single-image 3D reconstruction tasks, showcasing strong performance with simple architectural components via the versatility of our representation.

* 22 pages
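
The abstract notes that a DDF permits extraction of a classical unsigned distance field. One way to see why: the unsigned distance at a point equals the smallest directed depth over all directions from that point. The sketch below approximates that minimum by sampling directions, using a toy analytic sphere in place of a learned field. The helper names (`sphere_ddf`, `fibonacci_directions`, `unsigned_distance`) and the sampling scheme are assumptions made for illustration, not the procedure used in the paper.

```python
import math


def sphere_ddf(p, v, radius=1.0):
    """Toy DDF for an origin-centred sphere: (visible, depth) along unit v from p."""
    b = sum(pi * vi for pi, vi in zip(p, v))        # p . v
    c = sum(pi * pi for pi in p) - radius * radius  # |p|^2 - r^2
    disc = b * b - c
    if disc < 0.0:
        return False, math.inf                      # the ray misses the sphere
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):                # nearest hit in front of p
        if t >= 0.0:
            return True, t
    return False, math.inf


def fibonacci_directions(n):
    """Roughly uniform unit directions on the sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = golden_angle * i
        dirs.append((r * math.cos(phi), r * math.sin(phi), z))
    return dirs


def unsigned_distance(p, ddf, n_dirs=2000):
    """Approximate the unsigned distance at p as the minimum visible depth."""
    best = math.inf
    for v in fibonacci_directions(n_dirs):
        visible, depth = ddf(p, v)
        if visible:
            best = min(best, depth)
    return best


if __name__ == "__main__":
    # The true unsigned distance from (0, 0, -3) to the unit sphere is 2.
    print(unsigned_distance((0.0, 0.0, -3.0), sphere_ddf))
```

Because only finitely many directions are sampled, the estimate is an upper bound on the true distance and tightens as `n_dirs` grows.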

GIFT: Generalizable Interaction-aware Functional Tool Affordances without Labels

Jun 28, 2021

Dylan Turpin, Liquan Wang, Stavros Tsogkas, Sven Dickinson, Animesh Garg

Abstract: Tool use requires reasoning about the fit between an object's affordances and the demands of a task. Visual affordance learning can benefit from goal-directed interaction experience, but current techniques rely on human labels or expert demonstrations to generate this data. In this paper, we describe a method that grounds affordances in physical interactions instead, thus removing the need for human labels or expert policies. We use an efficient sampling-based method to generate successful trajectories that provide contact data, which are then used to reveal affordance representations. Our framework, GIFT, operates in two phases: first, we discover visual affordances from goal-directed interaction with a set of procedurally generated tools; second, we train a model to predict new instances of the discovered affordances on novel tools in a self-supervised fashion. In our experiments, we show that GIFT can leverage a sparse keypoint representation to predict grasp and interaction points to accommodate multiple tasks, such as hooking, reaching, and hammering. GIFT outperforms baselines on all tasks and matches a human oracle on two of three tasks using novel tools.

* Qualitative results available at https://www.pair.toronto.edu/gift-tools-rss21

Disentangling Geometric Deformation Spaces in Generative Latent Shape Models

Feb 27, 2021

Tristan Aumentado-Armstrong, Stavros Tsogkas, Sven Dickinson, Allan Jepson

Abstract: A complete representation of 3D objects requires characterizing the space of deformations in an interpretable manner, from articulations of a single instance to changes in shape across categories. In this work, we improve on a prior generative model of geometric disentanglement for 3D shapes, wherein the space of object geometry is factorized into rigid orientation, non-rigid pose, and intrinsic shape. The resulting model can be trained from raw 3D shapes, without correspondences, labels, or even rigid alignment, using a combination of classical spectral geometry and probabilistic disentanglement of a structured latent representation space. Our improvements include more sophisticated handling of rotational invariance and the use of a diffeomorphic flow network to bridge latent and spectral space. The geometric structuring of the latent space imparts an interpretable characterization of the deformation space of an object. Furthermore, it enables tasks like pose transfer and pose-aware retrieval without requiring supervision. We evaluate our model on its generative modelling, representation learning, and disentanglement performance, showing improved rotation invariance and intrinsic-extrinsic factorization quality over the prior model.

* 22 pages

Appearance Shock Grammar for Fast Medial Axis Extraction from Real Images

Apr 06, 2020

Charles-Olivier Dufresne Camaro, Morteza Rezanejad, Stavros Tsogkas, Kaleem Siddiqi, Sven Dickinson

Abstract: We combine ideas from shock graph theory with more recent appearance-based methods for medial axis extraction from complex natural scenes, improving upon the present best unsupervised method, in terms of efficiency and performance. We make the following specific contributions: i) we extend the shock graph representation to the domain of real images, by generalizing the shock type definitions using local, appearance-based criteria; ii) we then use the rules of a Shock Grammar to guide our search for medial points, drastically reducing run time when compared to other methods, which exhaustively consider all points in the input image; iii) we remove the need for typical post-processing steps including thinning, non-maximum suppression, and grouping, by adhering to the Shock Grammar rules while deriving the medial axis solution; iv) finally, we raise some fundamental concerns with the evaluation scheme used in previous work and propose a more appropriate alternative for assessing the performance of medial axis extraction from scenes. Our experiments on the BMAX500 and SK-LARGE datasets demonstrate the effectiveness of our approach. We outperform the present state-of-the-art, excelling particularly in the high-precision regime, while running an order of magnitude faster and requiring no post-processing.

* Accepted to CVPR 2020

Geometric Disentanglement for Generative Latent Shape Models

Aug 18, 2019

Tristan Aumentado-Armstrong, Stavros Tsogkas, Allan Jepson, Sven Dickinson

Abstract: Representing 3D shape is a fundamental problem in artificial intelligence, which has numerous applications within computer vision and graphics. One avenue that has recently begun to be explored is the use of latent representations of generative models. However, it remains an open problem to learn a generative model of shape that is interpretable and easily manipulated, particularly in the absence of supervised labels. In this paper, we propose an unsupervised approach to partitioning the latent space of a variational autoencoder for 3D point clouds in a natural way, using only geometric information. Our method makes use of tools from spectral differential geometry to separate intrinsic and extrinsic shape information, and then considers several hierarchical disentanglement penalties for dividing the latent space in this manner, including a novel one that penalizes the Jacobian of the latent representation of the decoded output with respect to the latent encoding. We show that the resulting representation exhibits intuitive and interpretable behavior, enabling tasks such as pose transfer and pose-aware shape retrieval that cannot easily be performed by models with an entangled representation.

* ICCV 2019

DeepFlux for Skeletons in the Wild

Nov 30, 2018

Yukang Wang, Yongchao Xu, Stavros Tsogkas, Xiang Bai, Sven Dickinson, Kaleem Siddiqi

Abstract: Computing object skeletons in natural images is challenging, owing to large variations in object appearance and scale, and the complexity of handling background clutter. Many recent methods frame object skeleton detection as a binary pixel classification problem, which is similar in spirit to learning-based edge detection, as well as to semantic segmentation methods. In the present article, we depart from this strategy by training a CNN to predict a two-dimensional vector field, which maps each scene point to a candidate skeleton pixel, in the spirit of flux-based skeletonization algorithms. This "image context flux" representation has two major advantages over previous approaches. First, it explicitly encodes the relative position of skeletal pixels to semantically meaningful entities, such as the image points in their spatial context, and hence also the implied object boundaries. Second, since the skeleton detection context is a region-based vector field, it is better able to cope with object parts of large width. We evaluate the proposed method on three benchmark datasets for skeleton detection and two for symmetry detection, achieving consistently superior performance over state-of-the-art methods.

* 10 pages
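
To make the idea of a vector field that maps each scene point to a candidate skeleton pixel more concrete, here is a minimal sketch in plain Python. It builds such a field from a known binary skeleton by pointing every nearby pixel at its nearest skeleton pixel. The function name `context_flux`, the brute-force nearest-pixel search, and the `radius` cut-off are assumptions for illustration; the paper trains a CNN to predict the field, and its exact target definition may differ.

```python
import math


def context_flux(skeleton, height, width, radius=3.0):
    """Toy flux field: unit vectors (d_row, d_col) toward the nearest skeleton pixel.

    skeleton: non-empty list of (row, col) skeleton pixel coordinates.
    Pixels on the skeleton, or farther than `radius` from it, get (0.0, 0.0).
    """
    field = [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Brute-force search for the closest skeleton pixel.
            nearest = min(skeleton, key=lambda s: (s[0] - r) ** 2 + (s[1] - c) ** 2)
            dr, dc = nearest[0] - r, nearest[1] - c
            dist = math.hypot(dr, dc)
            if 0.0 < dist <= radius:
                field[r][c] = (dr / dist, dc / dist)
    return field


if __name__ == "__main__":
    # A horizontal skeleton through the middle row of a 5x5 image.
    skeleton = [(2, c) for c in range(5)]
    for row in context_flux(skeleton, 5, 5):
        print(" ".join(f"({dr:+.0f},{dc:+.0f})" for dr, dc in row))
```

Pixels above the skeleton point down toward it and pixels below point up, so the skeleton sits where opposing vectors meet. This also hints at why a region-based field can cope with wide object parts: every pixel within the context radius carries information about the skeleton's location, not only the skeleton pixels themselves.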

Scene Categorization from Contours: Medial Axis Based Salience Measures

Nov 26, 2018

Morteza Rezanejad, Gabriel Downs, John Wilder, Dirk B. Walther, Allan Jepson, Sven Dickinson, Kaleem Siddiqi

Abstract: The computer vision community has witnessed recent advances in scene categorization from images, with the state-of-the-art systems now achieving impressive recognition rates on challenging benchmarks such as the Places365 dataset. Such systems have been trained on photographs which include color, texture and shading cues. The geometry of shapes and surfaces, as conveyed by scene contours, is not explicitly considered for this task. Remarkably, humans can accurately recognize natural scenes from line drawings, which consist solely of contour-based shape cues. Here we report the first computer vision study on scene categorization of line drawings derived from popular databases including an artist scene database, MIT67, and Places365. Specifically, we use off-the-shelf pre-trained CNNs to perform scene classification given only contour information as input and find performance levels well above chance. We also show that medial-axis based contour salience methods can be used to select more informative subsets of contour pixels and that the variation in CNN classification performance on various choices for these subsets is qualitatively similar to that observed in human performance. Moreover, when the salience measures are used to weight the contours, as opposed to pruning them, we find that these weights boost our CNN performance above that for unweighted contour input. That is, the medial axis based salience weights appear to add useful information that is not available when CNNs are trained to use contours alone.
