Osman Ülger

Auto-Vocabulary Segmentation for LiDAR Points

Jun 13, 2024

Weijie Wei, Osman Ülger, Fatemeh Karimi Najadasl, Theo Gevers, Martin R. Oswald

Abstract: Existing perception methods for autonomous driving fall short of recognizing unknown entities not covered in the training data. Open-vocabulary methods offer promising capabilities in detecting any object but are limited by user-specified queries representing target classes. We propose AutoVoc3D, a framework for automatic object class recognition and open-ended segmentation. Evaluation on nuScenes showcases AutoVoc3D's ability to generate precise semantic classes and accurate point-wise segmentation. Moreover, we introduce Text-Point Semantic Similarity, a new metric to assess the semantic similarity between text and point cloud without eliminating novel classes.

* Accepted by CVPR 2024 OpenSun3D Workshop

Self-Guided Open-Vocabulary Semantic Segmentation

Dec 07, 2023

Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald

Abstract: Vision-Language Models (VLMs) have emerged as promising tools for open-ended image understanding tasks, including open-vocabulary segmentation. Yet, direct application of such VLMs to segmentation is non-trivial, since VLMs are trained with image-text pairs and naturally lack pixel-level granularity. Recent works have made advancements in bridging this gap, often by leveraging the shared image-text space in which the image and a provided text prompt are represented. In this paper, we challenge the capabilities of VLMs further and tackle open-vocabulary segmentation without the need for any textual input. To this end, we propose a novel Self-Guided Semantic Segmentation (Self-Seg) framework. Self-Seg is capable of automatically detecting relevant class names from clustered BLIP embeddings and using these for accurate semantic segmentation. In addition, we propose an LLM-based Open-Vocabulary Evaluator (LOVE) to effectively assess predicted open-vocabulary class names. We achieve state-of-the-art results on Pascal VOC, ADE20K and Cityscapes for open-vocabulary segmentation without given class names, as well as competitive performance with methods where class names are given. All code and data will be released.

Relational Prior Knowledge Graphs for Detection and Instance Segmentation

Oct 11, 2023

Osman Ülger, Yu Wang, Ysbrand Galama, Sezer Karaoglu, Theo Gevers, Martin R. Oswald

Abstract: Humans have a remarkable ability to perceive and reason about the world around them by understanding the relationships between objects. In this paper, we investigate the effectiveness of using such relationships for object detection and instance segmentation. To this end, we propose a Relational Prior-based Feature Enhancement Model (RP-FEM), a graph transformer that enhances object proposal features using relational priors. The proposed architecture operates on top of scene graphs obtained from initial proposals and aims to concurrently learn relational context modeling for object detection and instance segmentation. Experimental evaluations on COCO show that the utilization of scene graphs, augmented with relational priors, offers benefits for object detection and instance segmentation. RP-FEM demonstrates its capacity to suppress improbable class predictions within the image while also preventing the model from generating duplicate predictions, leading to improvements over the baseline model on which it is built.

* Published in ICCV2023 SG2RL Workshop

Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs

Dec 06, 2022

Osman Ülger, Julian Wiederer, Mohsen Ghafoorian, Vasileios Belagiannis, Pascal Mettes

Abstract: Graph neural networks have been shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, while relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-temporal edges, which can constitute multiple types of relations. To address this problem, we propose MTD-GNN, a graph network for predicting temporally-dynamic edges for multiple types of relations. We propose a factorized spatio-temporal graph attention layer to learn dynamic node representations and present a multi-task edge prediction loss that models multiple relations simultaneously. The proposed architecture operates on top of scene graphs that we obtain from videos through object detection and spatio-temporal linking. Experimental evaluations on ActionGenome and CLEVRER show that modeling multiple relations in our temporally-dynamic graph network can be mutually beneficial, outperforming existing static and spatio-temporal graph neural networks, as well as state-of-the-art predicate classification methods.

* BMVC2022
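
As a rough illustration of what a multi-task edge prediction loss can look like, the sketch below sums a per-relation-type cross-entropy over the edges of a graph. This is not the MTD-GNN loss from the paper, whose exact form the abstract does not give. The function names, the dictionary layout, the relation-type names ("spatial", "contact") and the optional per-task weights are all assumptions made for this example.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one edge under one relation type."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multi_task_edge_loss(edge_logits, edge_targets, task_weights=None):
    """Weighted sum over relation types of the mean per-edge cross-entropy.

    edge_logits:  {relation_type: [list of class logits, one list per edge]}
    edge_targets: {relation_type: [target class index, one per edge]}
    task_weights: optional {relation_type: weight}; defaults to 1.0 each.
    All three structures are hypothetical, chosen only for this sketch.
    """
    total = 0.0
    for task, logits_per_edge in edge_logits.items():
        targets = edge_targets[task]
        losses = [cross_entropy(l, t) for l, t in zip(logits_per_edge, targets)]
        weight = 1.0 if task_weights is None else task_weights.get(task, 1.0)
        total += weight * sum(losses) / len(losses)
    return total


# Two hypothetical relation types predicted for the same two edges.
logits = {
    "spatial": [[0.0, 0.0], [0.0, 0.0]],            # 2 classes, uniform
    "contact": [[0.0, 0.0, 0.0, 0.0], [0.0] * 4],   # 4 classes, uniform
}
targets = {"spatial": [0, 1], "contact": [2, 3]}
loss = multi_task_edge_loss(logits, targets)
```

With uniform logits each task contributes the log of its class count, so the example loss is ln 2 + ln 4. Sharing one node representation across the per-relation heads is what would let the tasks inform each other, which is the effect the abstract reports as mutually beneficial.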
