Abstract: Understanding the structural organisation of 3D indoor scenes in terms of rooms is often accomplished via floorplan extraction. Robotic tasks such as planning and navigation also require a semantic understanding of the scene, which is typically achieved via object-level semantic segmentation. However, such methods struggle to segment out topological regions like "kitchen" in the scene. In this work, we introduce a two-step pipeline. First, we extract a topological map, i.e., a floorplan of the indoor scene, using a novel multi-channel occupancy representation. Then, using a self-attention transformer, we generate CLIP-aligned features and semantic labels for every room instance based on the objects it contains. Our language-topology alignment supports natural language querying, e.g., the query "place to cook" locates the "kitchen". We outperform the current state-of-the-art on room segmentation by ~20% and on room classification by ~12%. Our detailed qualitative analysis and ablation studies provide insights into the problem of joint structural and semantic 3D scene understanding.
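The natural language querying described in the abstract above can be illustrated with a minimal sketch. It assumes each room instance already has an embedding in the same space as the text query and that rooms are ranked by cosine similarity; the ranking rule, the names `cosine` and `query_rooms`, and the toy 3-D vectors are hypothetical stand-ins and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_rooms(query_embedding, room_embeddings):
    """Return room ids sorted by decreasing similarity to the query embedding."""
    scored = [
        (cosine(query_embedding, emb), room_id)
        for room_id, emb in room_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [room_id for _, room_id in scored]


# Toy vectors standing in for CLIP-aligned room features (invented for illustration).
rooms = {
    "room_0": [0.9, 0.1, 0.0],  # imagine this is the kitchen
    "room_1": [0.1, 0.9, 0.1],
    "room_2": [0.0, 0.2, 0.9],
}
# Toy vector standing in for the text embedding of "place to cook".
query = [0.8, 0.2, 0.1]

ranking = query_rooms(query, rooms)
print(ranking[0])
```

In a real system the query vector would come from a text encoder and the room vectors from the model itself; only the ranking step is shown here.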
Abstract: In this paper we present a novel framework for unsupervised topological clustering resulting in improved loop detection and closure for SLAM. A navigating mobile robot clusters its traversal into visually similar topologies, where each cluster (topology) contains a set of similar-looking images typically observed from spatially adjacent locations. Each such grouping of spatially adjacent and visually similar images constitutes a topology obtained without any supervision. We formulate a hierarchical loop discovery strategy that first detects loops at the level of topologies and subsequently at the level of images between the looped topologies. We show over a number of traversals across different Habitat environments that such a hierarchical pipeline significantly improves SOTA image-based loop detection and closure methods. Further, as a consequence of improved loop detection, we enhance the loop closure and backend SLAM performance. Rendering a traversal into topological segments in this way is beneficial for downstream tasks such as navigation, which can now build a topological graph in which spatially adjacent topological clusters are connected by an edge and navigate over such graphs.
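The two-level loop discovery described in the abstract above can be sketched as a coarse-to-fine search. The sketch assumes each topology is summarised by the mean of its image descriptors and that cosine similarity with fixed thresholds decides a match. These choices, the function names, the threshold values, and the toy 2-D descriptors are illustrative assumptions, not the paper's method.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def hierarchical_loops(clusters, topo_thresh=0.9, image_thresh=0.95):
    """Find image-level loop candidates, searching only inside matched topologies.

    clusters: list of topologies in traversal order, each a list of image descriptors.
    Returns a list of ((topology_i, image_a), (topology_j, image_b)) index pairs.
    """
    centroids = [mean_vector(cluster) for cluster in clusters]
    loops = []
    for i, j in combinations(range(len(clusters)), 2):
        if j - i < 2:
            continue  # consecutive topologies are trivially similar, not a loop
        if cosine(centroids[i], centroids[j]) < topo_thresh:
            continue  # level 1: topologies do not match, skip their images entirely
        # Level 2: compare images only between the looped topologies.
        for a, desc_a in enumerate(clusters[i]):
            for b, desc_b in enumerate(clusters[j]):
                if cosine(desc_a, desc_b) >= image_thresh:
                    loops.append(((i, a), (j, b)))
    return loops


# Toy 2-D descriptors (invented): topology 2 revisits the place seen in topology 0.
clusters = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[1.0, 0.05], [0.5, 0.5]],
]
loops = hierarchical_loops(clusters)
print(loops)
```

The point of the hierarchy is that image-to-image comparisons run only for topology pairs that pass the coarse check, which reduces both the search cost and the chance of matching look-alike images from unrelated places.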