Abstract: A single egocentric image typically captures only a small portion of the floor, yet a complete metric traversability map of the surroundings would better serve applications such as indoor navigation. We introduce FlatLands, a dataset and benchmark for single-view bird's-eye view (BEV) floor completion. The dataset contains 270,575 observations from 17,656 real metric indoor scenes drawn from six existing datasets, with aligned observation, visibility, validity, and ground-truth BEV maps, and the benchmark includes both in- and out-of-distribution evaluation protocols. We compare training-free approaches, deterministic models, ensembles, and stochastic generative models. Finally, we instantiate the task as an end-to-end monocular RGB-to-floormaps pipeline. FlatLands provides a rigorous testbed for uncertainty-aware indoor mapping and generative completion for embodied navigation.




Abstract: Can we detect an object that is not visible in an image? This study introduces the novel task of 2D and 3D unobserved object detection: predicting the location of objects that are occluded or lie outside the image frame. We adapt several state-of-the-art pre-trained generative models to this task, including 2D and 3D diffusion models and vision-language models, and show that they can be used to infer the presence of objects that are not directly observed. To benchmark this task, we propose a suite of metrics that captures different aspects of performance. Our empirical evaluations on indoor scenes from the RealEstate10k dataset with COCO object categories demonstrate results that motivate the use of generative models for unobserved object detection. This work is a promising step towards compelling applications, such as visual search and probabilistic planning, that can leverage object detection beyond what is directly observed.




Abstract: In this paper, we analyse the performance of the closed-loop Whiplash gradient descent algorithm for L-smooth convex cost functions. Using numerical experiments, we study the algorithm's performance on convex cost functions across different condition numbers. We analyse the convergence of the momentum sequence using symplectic integration and introduce the concept of relaxation sequences, which we use to analyse the non-classical character of the Whiplash method. Under the additional assumption of invexity, we establish a momentum-driven adaptive convergence rate. Furthermore, we introduce an energy method for predicting the convergence rate of closed-loop inertial gradient dynamics with convex cost functions, using an integral anchored energy function and a novel lower-bound asymptotic notation, by exploiting the bounded nature of the solutions. Using this method, we establish a polynomial convergence rate for the Whiplash inertial gradient system for a family of scalar quadratic cost functions, and an exponential rate for a quadratic scalar cost function.