Nils Blank

Interpretable Affordance Detection on 3D Point Clouds with Probabilistic Prototypes

Apr 25, 2025

Maximilian Xiling Li, Korbinian Rudolf, Nils Blank, Rudolf Lioutikov

Abstract: Robotic agents need to understand how to interact with objects in their environment, both autonomously and during human-robot interactions. Affordance detection on 3D point clouds, which identifies object regions that allow specific interactions, has traditionally relied on deep learning models like PointNet++, DGCNN, or PointTransformerV3. However, these models operate as black boxes, offering no insight into their decision-making processes. Prototypical learning methods, such as ProtoPNet, provide an interpretable alternative to black-box models by employing a "this looks like that" case-based reasoning approach. However, they have been primarily applied to image-based tasks. In this work, we apply prototypical learning to models for affordance detection on 3D point clouds. Experiments on the 3D-AffordanceNet benchmark dataset show that prototypical models achieve competitive performance with state-of-the-art black-box models and offer inherent interpretability. This makes prototypical models a promising candidate for human-robot interaction scenarios that require increased trust and safety.
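The "this looks like that" idea can be illustrated per point: each point feature is compared with a small set of learned prototype vectors, and the affordance score is a weighted sum of the resulting similarities. The snippet below is a minimal sketch under stated assumptions, not the paper's architecture. It uses the log-ratio similarity from the original ProtoPNet, and the toy features, prototypes, weights, and function names are all hypothetical.

```python
import math


def protopnet_similarity(feature, prototype, eps=1e-4):
    """ProtoPNet-style activation: grows as the squared L2 distance shrinks."""
    d2 = sum((f - p) ** 2 for f, p in zip(feature, prototype))
    return math.log((d2 + 1.0) / (d2 + eps))


def point_affordance_scores(point_features, prototypes, weights):
    """Return (score, best_prototype_index) for every point.

    The best-matching prototype is what makes the score explainable:
    "this point looks like that prototype".
    """
    results = []
    for feat in point_features:
        sims = [protopnet_similarity(feat, p) for p in prototypes]
        score = sum(w * s for w, s in zip(weights, sims))
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((score, best))
    return results


# Hypothetical 2-D per-point features; a real backbone would produce these.
point_features = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
weights = [1.0, 0.0]  # assumed: this affordance depends on prototype 0 only

scores = point_affordance_scores(point_features, prototypes, weights)
for (score, best), feat in zip(scores, point_features):
    print(f"feature={feat} score={score:.2f} closest prototype={best}")
```

Reporting the closest prototype next to each score is what separates this from a black-box classifier head: the prediction can be traced back to a concrete learned example.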

Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models

Oct 23, 2024

Nils Blank, Moritz Reuss, Marcel Rühle, Ömer Erdinç Yağmurlu, Fabian Wenzel, Oier Mees, Rudolf Lioutikov

Abstract: A central challenge towards developing robots that can relate human language to their perception and actions is the scarcity of natural language annotations in diverse robot datasets. Moreover, robot policies that follow natural language instructions are typically trained on either templated language or expensive human-labeled instructions, hindering their scalability. To this end, we introduce NILS: Natural language Instruction Labeling for Scalability. NILS automatically labels uncurated, long-horizon robot data at scale in a zero-shot manner without any human intervention. NILS combines pretrained vision-language foundation models to detect objects in a scene, detect object-centric changes, segment tasks from large datasets of unlabeled interaction data, and ultimately label behavior datasets. Evaluations on BridgeV2, Fractal, and a kitchen play dataset show that NILS can autonomously annotate diverse robot demonstrations of unlabeled and unstructured datasets while alleviating several shortcomings of crowdsourced human annotations, such as low data quality and diversity. We use NILS to label over 115k trajectories obtained from over 430 hours of robot data. We open-source our auto-labeling code and generated annotations on our website: http://robottasklabeling.github.io.

* Project Website at https://robottasklabeling.github.io/

HyperPg -- Prototypical Gaussians on the Hypersphere for Interpretable Deep Learning

Oct 11, 2024

Maximilian Xiling Li, Korbinian Franz Rudolf, Nils Blank, Rudolf Lioutikov

Abstract: Prototype learning methods provide an interpretable alternative to black-box deep learning models. Approaches such as ProtoPNet learn which parts of a test image "look like" known prototypical parts from training images, combining predictive power with the inherent interpretability of case-based reasoning. However, existing approaches have two main drawbacks: A) They rely solely on deterministic similarity scores without statistical confidence. B) The prototypes are learned in a black-box manner without human input. This work introduces HyperPg, a new prototype representation leveraging Gaussian distributions on a hypersphere in latent space, with learnable mean and variance. HyperPg prototypes adapt to the spread of clusters in the latent space and output likelihood scores. The new architecture, HyperPgNet, leverages HyperPg to learn prototypes aligned with human concepts from pixel-level annotations. Consequently, each prototype represents a specific concept such as color, image texture, or part of the image subject. A concept extraction pipeline built on foundation models provides pixel-level annotations, significantly reducing human labeling effort. Experiments on the CUB-200-2011 and Stanford Cars datasets demonstrate that HyperPgNet outperforms other prototype learning architectures while using fewer parameters and training steps. Additionally, the concept-aligned HyperPg prototypes are learned transparently, enhancing model interpretability.
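The abstract does not give the HyperPg formula, so the following is only one possible reading, stated as an assumption: a prototype holds an anchor direction plus a learnable mean and standard deviation, and scores a latent vector by the Gaussian density of its cosine similarity to the anchor. The function names and toy values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, i.e. the dot product after projecting onto the unit sphere."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gaussian_prototype_score(z, anchor, mean, std):
    """Assumed scoring rule: Gaussian density over cos(z, anchor).

    mean and std stand in for the learnable parameters mentioned in the
    abstract; a larger std models a more spread-out cluster.
    """
    s = cosine(z, anchor)
    return math.exp(-0.5 * ((s - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


anchor = [1.0, 0.0]
aligned = [2.0, 0.0]   # same direction as the anchor, cosine 1.0
off_axis = [0.8, 0.6]  # cosine 0.8 with the anchor

tight_aligned = gaussian_prototype_score(aligned, anchor, mean=1.0, std=0.1)
tight_off = gaussian_prototype_score(off_axis, anchor, mean=1.0, std=0.1)
wide_off = gaussian_prototype_score(off_axis, anchor, mean=1.0, std=0.5)

print(tight_aligned, tight_off, wide_off)
```

Unlike a plain similarity score, the density depends on the learned spread: the same off-axis vector receives a higher likelihood under the wide prototype than under the tight one.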
