Sili Chen

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Jan 21, 2025

Sili Chen, Hengkai Guo, Shengnan Zhu, Feihu Zhang, Zilong Huang, Jiashi Feng, Bingyi Kang

Abstract: Depth Anything has achieved remarkable success in monocular depth estimation with strong generalization ability. However, it suffers from temporal inconsistency in videos, which hinders its practical application. Various methods have been proposed to alleviate this issue by leveraging video generation models or introducing priors from optical flow and camera poses. Nonetheless, these methods are only applicable to short videos (< 10 seconds) and require a trade-off between quality and computational efficiency. We propose Video Depth Anything for high-quality, consistent depth estimation in super-long videos (over several minutes) without sacrificing efficiency. We base our model on Depth Anything V2 and replace its head with an efficient spatial-temporal head. We design a straightforward yet effective temporal consistency loss by constraining the temporal depth gradient, eliminating the need for additional geometric priors. The model is trained on a joint dataset of video depth and unlabeled images, similar to Depth Anything V2. Moreover, a novel key-frame-based strategy is developed for long video inference. Experiments show that our model can be applied to arbitrarily long videos without compromising quality, consistency, or generalization ability. Comprehensive evaluations on multiple video benchmarks demonstrate that our approach sets a new state of the art in zero-shot video depth estimation. We offer models of different scales to support a range of scenarios, with our smallest model capable of real-time performance at 30 FPS.
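The abstract describes a temporal consistency loss that constrains the temporal depth gradient. A minimal NumPy sketch of that idea follows, assuming depth maps stacked as (T, H, W) arrays and a hypothetical threshold `tau` that masks out pixels whose ground-truth depth changes sharply between frames. The function name and the masking rule are illustrative assumptions, not the authors' released loss.

```python
import numpy as np

def temporal_gradient_loss(pred, gt, tau=0.05):
    """Hypothetical sketch: penalize mismatch between predicted and
    ground-truth frame-to-frame depth changes.

    pred, gt: arrays of shape (T, H, W) with depth for T consecutive frames.
    tau: assumed threshold; pixels whose ground-truth depth changes by more
         than tau between frames are ignored (treated as unstable regions).
    """
    d_pred = pred[1:] - pred[:-1]  # temporal gradient of the prediction, (T-1, H, W)
    d_gt = gt[1:] - gt[:-1]        # temporal gradient of the ground truth
    mask = np.abs(d_gt) < tau      # keep only pixels that are stable over time
    if not mask.any():
        return 0.0                 # nothing to supervise in this clip
    return float(np.mean(np.abs(d_pred - d_gt)[mask]))
```

Because only frame-to-frame differences enter this loss, a prediction offset from the ground truth by a constant costs nothing, while per-frame flicker is penalized. Under this sketch's assumptions, that is what makes the term a consistency constraint rather than a per-frame accuracy term.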

MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction

Nov 02, 2024

Wang Zhao, Jiachen Liu, Sheng Zhang, Yishu Li, Sili Chen, Sharon X Huang, Yong-Jin Liu, Hengkai Guo

Abstract: This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane. Unlike previous robust estimator-based works (which require multiple images or RGB-D input) and learning-based works (which suffer from domain shift), MonoPlane combines the best of both worlds and establishes a plane reconstruction pipeline based on monocular geometric cues, resulting in accurate, robust and scalable 3D plane detection and reconstruction in the wild. Specifically, we first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image. These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance. We exploit effective 3D point proximity and model such proximity via a graph within RANSAC to guide the plane fitting from noisy monocular depths, followed by image-level multi-plane joint optimization to improve the consistency among all plane instances. We further design a simple but effective pipeline to extend this single-view solution to sparse-view 3D plane reconstruction. Extensive experiments on multiple datasets demonstrate our superior zero-shot generalizability over baselines, achieving state-of-the-art plane reconstruction performance in a transfer setting. Our code is available at https://github.com/thuzhaowang/MonoPlane.

* IROS 2024 (oral)
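The sequential plane fitting described in the MonoPlane abstract can be illustrated with a plain RANSAC loop over 3D points back-projected from a monocular depth map. This is a simplified sketch under stated assumptions: it substitutes a per-point normal-agreement check for the paper's graph-based proximity guidance and omits the image-level multi-plane joint optimization. The function `fit_planes_sequential` and all of its thresholds are hypothetical.

```python
import numpy as np

def fit_planes_sequential(points, normals, n_planes=3, iters=200,
                          dist_thresh=0.02, angle_thresh_deg=15.0,
                          min_inliers=50, seed=0):
    """Hypothetical sketch: fit planes one at a time with RANSAC.

    points, normals: (N, 3) float arrays (e.g. back-projected depth and
    predicted surface normals). Returns a list of (normal, offset, indices)
    with the plane written as normal . x + offset = 0.
    """
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(points))  # indices not yet assigned to a plane
    cos_thresh = np.cos(np.deg2rad(angle_thresh_deg))
    planes = []
    for _ in range(n_planes):
        if len(remaining) < 3:
            break
        pts, nrm = points[remaining], normals[remaining]
        best = None
        for _ in range(iters):
            sample = pts[rng.choice(len(pts), 3, replace=False)]
            n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            norm = np.linalg.norm(n)
            if norm < 1e-9:
                continue  # degenerate (collinear) sample
            n = n / norm
            d = -n @ sample[0]
            dist = np.abs(pts @ n + d)               # point-to-plane distance
            agree = np.abs(nrm @ n) > cos_thresh     # normal roughly parallel to plane normal
            inliers = np.flatnonzero((dist < dist_thresh) & agree)
            if best is None or len(inliers) > len(best[2]):
                best = (n, d, inliers)
        if best is None or len(best[2]) < min_inliers:
            break
        n, d, inliers = best
        planes.append((n, d, remaining[inliers]))    # map back to original indices
        remaining = np.delete(remaining, inliers)    # remove explained points
    return planes
```

Removing each plane's inliers before the next search is what makes the fitting sequential; the normal check is a cheap stand-in for using monocular normals to reject points that lie near a plane but belong to a different surface.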

Noise-resistant Deep Learning for Object Classification in 3D Point Clouds Using a Point Pair Descriptor

Apr 05, 2018

Dmytro Bobkov, Sili Chen, Ruiqing Jian, Muhammad Iqbal, Eckehard Steinbach

Abstract: Object retrieval and classification in point cloud data are challenged by noise, irregular sampling density and occlusion. To address this issue, we propose a point pair descriptor that is robust to noise and occlusion and achieves high retrieval accuracy. We further show how the proposed descriptor can be used in a 4D convolutional neural network for the task of object classification. We propose a novel 4D convolutional layer that is able to learn class-specific clusters in the descriptor histograms. Finally, we provide experimental validation on three benchmark datasets, which confirms the superiority of the proposed approach.

* IEEE Robotics and Automation Letters 2018, Volume 3, Issue 2
* 8 pages
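To make the descriptor-histogram idea concrete, the sketch below builds a normalized 4D histogram from randomly sampled point pairs. It uses the classic four-component point pair feature (pair distance plus three angles formed by the two normals and the connecting vector). That feature definition, the distance normalization, and the bin count are assumptions; the descriptor proposed in the paper may be defined differently.

```python
import numpy as np

def _angle(a, b):
    # Row-wise angle between vectors a and b in radians, clipped for numerical safety.
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def point_pair_histogram(points, normals, n_pairs=1000, bins=5, seed=0):
    """Hypothetical sketch: 4D histogram of point pair features.

    points, normals: (N, 3) arrays. Returns an array of shape
    (bins, bins, bins, bins) that sums to 1.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    keep = i != j                      # drop pairs of a point with itself
    i, j = i[keep], j[keep]
    d = points[j] - points[i]          # vector connecting each pair
    dist = np.linalg.norm(d, axis=1)
    feats = np.stack([
        dist / (dist.max() + 1e-12),   # distance scaled to [0, 1] (assumed normalization)
        _angle(normals[i], d),         # angle between first normal and connecting vector
        _angle(normals[j], d),         # angle between second normal and connecting vector
        _angle(normals[i], normals[j]),  # angle between the two normals
    ], axis=1)
    ranges = [(0.0, 1.0), (0.0, np.pi), (0.0, np.pi), (0.0, np.pi)]
    hist, _ = np.histogramdd(feats, bins=bins, range=ranges)
    return hist / hist.sum()
```

The returned array has four axes, one per feature, which is the shape of input a 4D convolutional layer would operate on. Aggregating many pairs into a histogram is also one plausible reason such descriptors tolerate noise: individual perturbed pairs shift only a small fraction of the mass.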
