Karhan Kayan

Princeton365: A Diverse Dataset with Accurate Camera Pose

Jun 10, 2025

Karhan Kayan, Stamatis Alexandropoulos, Rishabh Jain, Yiming Zuo, Erich Liang, Jia Deng

Abstract: We introduce Princeton365, a large-scale diverse dataset of 365 videos with accurate camera pose. Our dataset bridges the gap between accuracy and data diversity in current SLAM benchmarks by introducing a novel ground truth collection framework that leverages calibration boards and a 360-camera. We collect indoor, outdoor, and object scanning videos with synchronized monocular and stereo RGB video outputs as well as IMU. We further propose a new scene scale-aware evaluation metric for SLAM based on the optical flow induced by the camera pose estimation error. In contrast to existing metrics such as Average Trajectory Error (ATE), our new metric allows the performance of SLAM methods to be compared across scenes, which lets researchers analyze the failure modes of their methods. We also propose a challenging Novel View Synthesis (NVS) benchmark that covers cases not covered by current NVS benchmarks, such as fully non-Lambertian scenes with 360-degree camera trajectories. Please visit https://princeton365.cs.princeton.edu for the dataset, code, videos, and submission.
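
To make the idea of flow induced by pose error concrete, here is a minimal sketch. It is not the metric defined in the paper. It assumes a pinhole camera, world-to-camera poses given as a rotation matrix and translation vector, and made-up intrinsics and scene points; the function names `project` and `mean_induced_flow` are hypothetical. It only illustrates why measuring error as pixel displacement accounts for scene scale: the same 10 cm translation error moves nearby points much further in the image than distant ones.

```python
import math

def project(point, rotation, translation, fx, fy, cx, cy):
    """Project a world point into a pinhole camera with world-to-camera pose (R, t)."""
    x, y, z = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (fx * x / z + cx, fy * y / z + cy)

def mean_induced_flow(points, pose_gt, pose_est,
                      fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Mean pixel displacement between projections under the true and estimated pose."""
    total = 0.0
    for p in points:
        u0, v0 = project(p, *pose_gt, fx, fy, cx, cy)
        u1, v1 = project(p, *pose_est, fx, fy, cx, cy)
        total += math.hypot(u1 - u0, v1 - v0)
    return total / len(points)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose_gt = (IDENTITY, [0.0, 0.0, 0.0])
pose_est = (IDENTITY, [0.1, 0.0, 0.0])  # assumed 10 cm translation error along x

# Illustrative scenes: the same layout at 1 m and at 10 m from the camera.
near_scene = [(0.0, 0.0, 1.0), (0.5, 0.2, 1.0)]
far_scene = [(0.0, 0.0, 10.0), (5.0, 2.0, 10.0)]

flow_near = mean_induced_flow(near_scene, pose_gt, pose_est)
flow_far = mean_induced_flow(far_scene, pose_gt, pose_est)
print(f"near scene: {flow_near:.1f} px, far scene: {flow_far:.1f} px")
```

Under these assumptions the identical trajectory error yields roughly ten times more image motion in the near scene, which a purely metric trajectory error would not distinguish.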

Towards Foundation Models for 3D Vision: How Close Are We?

Oct 14, 2024

Yiming Zuo, Karhan Kayan, Maggie Wang, Kevin Jeon, Jia Deng, Thomas L. Griffiths

Abstract: Building a foundation model for 3D vision is a complex challenge that remains unsolved. Towards that goal, it is important to understand the 3D reasoning capabilities of current models as well as identify the gaps between these models and humans. Therefore, we construct a new 3D visual understanding benchmark that covers fundamental 3D vision tasks in the Visual Question Answering (VQA) format. We evaluate state-of-the-art Vision-Language Models (VLMs), specialized models, and human subjects on it. Our results show that VLMs generally perform poorly, while the specialized models are accurate but not robust, failing under geometric perturbations. In contrast, human vision continues to be the most reliable 3D visual system. We further demonstrate that neural networks align more closely with human 3D vision mechanisms than classical computer vision methods do, and that Transformer-based networks such as ViT align more closely with human 3D vision mechanisms than CNNs. We hope our study will benefit the future development of foundation models for 3D vision.

Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

Jun 17, 2024

Alexander Raistrick, Lingjie Mei, Karhan Kayan, David Yan, Yiming Zuo, Beining Han, Hongyu Wen, Meenal Parakh, Stamatis Alexandropoulos, Lahav Lipson (+2 more)

Abstract: We introduce Infinigen Indoors, a Blender-based procedural generator of photorealistic indoor scenes. It builds upon the existing Infinigen system, which focuses on natural scenes, but expands its coverage to indoor scenes by introducing a diverse library of procedural indoor assets, including furniture, architecture elements, appliances, and other day-to-day objects. It also introduces a constraint-based arrangement system, which consists of a domain-specific language for expressing diverse constraints on scene composition, and a solver that generates scene compositions that maximally satisfy the constraints. We provide an export tool that allows the generated 3D objects and scenes to be directly used for training embodied agents in real-time simulators such as Omniverse and Unreal. Infinigen Indoors is open-sourced under the BSD license. Please visit https://infinigen.org for code and videos.

* Accepted to CVPR 2024
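
As a rough illustration of what "generating compositions that maximally satisfy the constraints" can mean, the toy sketch below places four objects along a single wall and uses random local search to maximize the number of satisfied constraints. Everything in it is an assumption made for illustration: the object names, the constraints, the 1D layout, and the search strategy are hypothetical and are not Infinigen's domain-specific language or solver.

```python
import random

# Hypothetical constraints on 1D positions (metres along a wall).
# Each callable returns True when the scene satisfies it.
CONSTRAINTS = [
    lambda s: abs(s["chair"] - s["desk"]) <= 1.0,  # chair near desk
    lambda s: abs(s["bed"] - s["desk"]) >= 2.0,    # bed away from desk
    lambda s: s["lamp"] >= s["bed"],               # lamp not left of bed
    lambda s: abs(s["lamp"] - s["bed"]) <= 0.5,    # lamp near bed
]

def count_satisfied(scene):
    """Number of constraints the given scene satisfies."""
    return sum(1 for constraint in CONSTRAINTS if constraint(scene))

def solve(names, wall_length=6.0, steps=2000, seed=0):
    """Random local search: move one object at a time, keep moves that do not hurt."""
    rng = random.Random(seed)
    scene = {name: rng.uniform(0.0, wall_length) for name in names}
    best = count_satisfied(scene)
    for _ in range(steps):
        candidate = dict(scene)
        candidate[rng.choice(names)] = rng.uniform(0.0, wall_length)
        score = count_satisfied(candidate)
        if score >= best:  # accept ties so the search can cross plateaus
            scene, best = candidate, score
    return scene, best

layout, satisfied = solve(["chair", "desk", "bed", "lamp"])
print(f"{satisfied}/{len(CONSTRAINTS)} constraints satisfied: {layout}")
```

Counting satisfied constraints is the simplest possible objective; a practical system would more plausibly score soft constraints continuously and search over full 3D poses.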

Infinite Photorealistic Worlds using Procedural Generation

Jun 26, 2023

Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang (+5 more)

Abstract: We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition. Infinigen offers broad coverage of objects and scenes in the natural world including plants, animals, terrains, and natural phenomena such as fire, cloud, rain, and snow. Infinigen can be used to generate unlimited, diverse training data for a wide range of computer vision tasks including object detection, semantic segmentation, optical flow, and 3D reconstruction. We expect Infinigen to be a useful resource for computer vision research and beyond. Please visit https://infinigen.org for videos, code and pre-generated data.

* Accepted to CVPR 2023, Camera Ready Version. Update 06/26/23: Changed the open-source license to BSD
