Tjark Behrens

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

Dec 11, 2025

Tjark Behrens, Anton Obukhov, Bingxin Ke, Fabio Tosi, Matteo Poggi, Konrad Schindler

Abstract: We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end. To ensure fair and leakage-free evaluation, we introduce an end-to-end protocol that excludes any ground truth or proxy geometry estimates at test time. The protocol emphasizes metrics reflecting downstream relevance: iSQoE for perceptual comfort and MEt3R for geometric consistency. StereoSpace surpasses other methods from the warp & inpaint, latent-warping, and warped-conditioning categories, achieving sharp parallax and strong robustness on layered and non-Lambertian scenes. This establishes viewpoint-conditioned diffusion as a scalable, depth-free solution for stereo generation.

* Project page: https://hf.co/spaces/prs-eth/stereospace_web

Spot-On: A Mixed Reality Interface for Multi-Robot Cooperation

May 28, 2025

Tim Engelbracht, Petar Lukovic, Tjark Behrens, Kai Lascheit, René Zurbrügg, Marc Pollefeys, Hermann Blum, Zuria Bauer

Abstract: Recent progress in mixed reality (MR) and robotics is enabling increasingly sophisticated forms of human-robot collaboration. Building on these developments, we introduce a novel MR framework that allows multiple quadruped robots to operate in semantically diverse environments via an MR interface. Our system supports collaborative tasks involving drawers, swing doors, and higher-level infrastructure such as light switches. A comprehensive user study verifies both the design and usability of our app, with participants giving a "good" or "very good" rating in almost all cases. Overall, our approach provides an effective and intuitive framework for MR-based multi-robot collaboration in complex, real-world scenarios.

Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations

Nov 28, 2024

Tjark Behrens, René Zurbrügg, Marc Pollefeys, Zuria Bauer, Hermann Blum

Abstract: Recent approaches have successfully focused on the segmentation of static reconstructions, thereby equipping downstream applications with semantic 3D understanding. However, the world in which we live is dynamic, characterized by numerous interactions between the environment and humans or robotic agents. Static semantic maps are unable to capture this information, and the naive solution of rescanning the environment after every change is both costly and ineffective at tracking, e.g., objects being stored away in drawers. With Lost & Found we present an approach that addresses this limitation. Based solely on egocentric recordings with corresponding hand position and camera pose estimates, we track the 6DoF poses of the moving object within the detected interaction interval. These changes are applied online to a transformable scene graph that captures object-level relations. Compared to state-of-the-art object pose trackers, our approach is more reliable in handling the challenging egocentric viewpoint and the lack of depth information. It outperforms the second-best approach by 34% and 56% for translational and orientational error, respectively, and produces visibly smoother 6DoF object trajectories. In addition, we illustrate how the acquired interaction information in the dynamic scene graph can be employed in robotic applications that would otherwise be infeasible: we show how our method allows commanding a mobile manipulator through teach & repeat, and how information about prior interaction allows a mobile manipulator to retrieve an object hidden in a drawer. Code, videos and corresponding data are accessible at https://behretj.github.io/LostAndFound.

* Webpage: https://behretj.github.io/LostAndFound
