Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Fangwen Shu

Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs

Jan 06, 2025

Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge

Abstract:3D Gaussian Splatting is a recognized method for 3D scene representation, known for its high rendering quality and speed. However, its substantial data requirements present challenges for practical applications. In this paper, we introduce an efficient compression technique that significantly reduces storage overhead by using compact representation. We propose a unified architecture that combines point cloud data and feature planes through a progressive tri-plane structure. Our method utilizes 2D feature planes, enabling continuous spatial representation. To further optimize these representations, we incorporate entropy modeling in the frequency domain, specifically designed for standard video codecs. We also propose channel-wise bit allocation to achieve a better trade-off between bitrate consumption and feature plane representation. Consequently, our model effectively leverages spatial correlations within the feature planes to enhance rate-distortion performance using standard, non-differentiable video codecs. Experimental results demonstrate that our method outperforms existing methods in data compactness while maintaining high rendering quality. Our project page is available at https://fraunhoferhhi.github.io/CodecGS

Via

Access Paper or Ask Questions

ECRF: Entropy-Constrained Neural Radiance Fields Compression with Frequency Domain Optimization

Nov 23, 2023

Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge

Abstract:Explicit feature-grid based NeRF models have shown promising results in terms of rendering quality and significant speed-up in training. However, these methods often require a significant amount of data to represent a single scene or object. In this work, we present a compression model that aims to minimize the entropy in the frequency domain in order to effectively reduce the data size. First, we propose using the discrete cosine transform (DCT) on the tensorial radiance fields to compress the feature-grid. This feature-grid is transformed into coefficients, which are then quantized and entropy encoded, following a similar approach to the traditional video coding pipeline. Furthermore, to achieve a higher level of sparsity, we propose using an entropy parameterization technique for the frequency domain, specifically for DCT coefficients of the feature-grid. Since the transformed coefficients are optimized during the training phase, the proposed model does not require any fine-tuning or additional information. Our model only requires a lightweight compression pipeline for encoding and decoding, making it easier to apply volumetric radiance field methods for real-world applications. Experimental results demonstrate that our proposed frequency domain entropy model can achieve superior compression performance across various datasets. The source code will be made publicly available.

* 10 pages, 6 figures, 4 tables

Via

Access Paper or Ask Questions

Structure PLP-SLAM: Efficient Sparse Mapping and Localization using Point, Line and Plane for Monocular, RGB-D and Stereo Cameras

Jul 19, 2022

Fangwen Shu, Jiaxuan Wang, Alain Pagani, Didier Stricker

Figure 1 for Structure PLP-SLAM: Efficient Sparse Mapping and Localization using Point, Line and Plane for Monocular, RGB-D and Stereo Cameras

Figure 2 for Structure PLP-SLAM: Efficient Sparse Mapping and Localization using Point, Line and Plane for Monocular, RGB-D and Stereo Cameras

Figure 3 for Structure PLP-SLAM: Efficient Sparse Mapping and Localization using Point, Line and Plane for Monocular, RGB-D and Stereo Cameras

Figure 4 for Structure PLP-SLAM: Efficient Sparse Mapping and Localization using Point, Line and Plane for Monocular, RGB-D and Stereo Cameras

Abstract:This paper demonstrates a visual SLAM system that utilizes point and line cloud for robust camera localization, simultaneously, with an embedded piece-wise planar reconstruction (PPR) module which in all provides a structural map. To build a scale consistent map in parallel with tracking, such as employing a single camera brings the challenge of reconstructing geometric primitives with scale ambiguity, and further introduces the difficulty in graph optimization of bundle adjustment (BA). We address these problems by proposing several run-time optimizations on the reconstructed lines and planes. The system is then extended with depth and stereo sensors based on the design of the monocular framework. The results show that our proposed SLAM tightly incorporates the semantic features to boost both frontend tracking as well as backend optimization. We evaluate our system exhaustively on various datasets, and open-source our code for the community (https://github.com/PeterFWS/Structure-PLP-SLAM).

* The pre-print version, v2 add supplementary materials, code open-source: https://github.com/PeterFWS/Structure-PLP-SLAM

Via

Access Paper or Ask Questions

PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image

Oct 21, 2021

Yaxu Xie, Fangwen Shu, Jason Rambach, Alain Pagani, Didier Stricker

Figure 1 for PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image

Figure 2 for PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image

Figure 3 for PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image

Figure 4 for PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image

Abstract:Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios. Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures but overlooked the dual characteristics of piece-wise planes as objects and geometric models. Different from other existing approaches, we start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet, which integrates a single-stage instance segmentation network for piece-wise planar segmentation and a depth decoder to reconstruct the scene from a single RGB image. To achieve this, we introduce several novel loss functions (geometric constraint) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation. Meanwhile, a novel Plane Prior Attention module is used to guide depth estimation with the awareness of plane instances. Exhaustive experiments are conducted in this work to validate the effectiveness and efficiency of our method.

* accepted to BMVC 2021, code opensource: https://github.com/EryiXie/PlaneRecNet

Via

Access Paper or Ask Questions

Visual SLAM with Graph-Cut Optimized Multi-Plane Reconstruction

Aug 09, 2021

Fangwen Shu, Yaxu Xie, Jason Rambach, Alain Pagani, Didier Stricker

Figure 1 for Visual SLAM with Graph-Cut Optimized Multi-Plane Reconstruction

Figure 2 for Visual SLAM with Graph-Cut Optimized Multi-Plane Reconstruction

Figure 3 for Visual SLAM with Graph-Cut Optimized Multi-Plane Reconstruction

Figure 4 for Visual SLAM with Graph-Cut Optimized Multi-Plane Reconstruction

Abstract:This paper presents a semantic planar SLAM system that improves pose estimation and mapping using cues from an instance planar segmentation network. While the mainstream approaches are using RGB-D sensors, employing a monocular camera with such a system still faces challenges such as robust data association and precise geometric model fitting. In the majority of existing work, geometric model estimation problems such as homography estimation and piece-wise planar reconstruction (PPR) are usually solved by standard (greedy) RANSAC separately and sequentially. However, setting the inlier-outlier threshold is difficult in absence of information about the scene (i.e. the scale). In this work, we revisit these problems and argue that two mentioned geometric models (homographies/3D planes) can be solved by minimizing an energy function that exploits the spatial coherence, i.e. with graph-cut optimization, which also tackles the practical issue when the output of a trained CNN is inaccurate. Moreover, we propose an adaptive parameter setting strategy based on our experiments, and report a comprehensive evaluation on various open-source datasets.

* accepted to ISMAR-Adjunct 2021

Via

Access Paper or Ask Questions

PlaneSegNet: Fast and Robust Plane Estimation Using a Single-stage Instance Segmentation CNN

Mar 29, 2021

Yaxu Xie, Jason Rambach, Fangwen Shu, Didier Stricker

Figure 1 for PlaneSegNet: Fast and Robust Plane Estimation Using a Single-stage Instance Segmentation CNN

Figure 2 for PlaneSegNet: Fast and Robust Plane Estimation Using a Single-stage Instance Segmentation CNN

Figure 3 for PlaneSegNet: Fast and Robust Plane Estimation Using a Single-stage Instance Segmentation CNN

Figure 4 for PlaneSegNet: Fast and Robust Plane Estimation Using a Single-stage Instance Segmentation CNN

Abstract:Instance segmentation of planar regions in indoor scenes benefits visual SLAM and other applications such as augmented reality (AR) where scene understanding is required. Existing methods built upon two-stage frameworks show satisfactory accuracy but are limited by low frame rates. In this work, we propose a real-time deep neural architecture that estimates piece-wise planar regions from a single RGB image. Our model employs a variant of a fast single-stage CNN architecture to segment plane instances. Considering the particularity of the target detected, we propose Fast Feature Non-maximum Suppression (FF-NMS) to reduce the suppression errors resulted from overlapping bounding boxes of planes. We also utilize a Residual Feature Augmentation module in the Feature Pyramid Network (FPN). Our method achieves significantly higher frame-rates and comparable segmentation accuracy against two-stage methods. We automatically label over 70,000 images as ground truth from the Stanford 2D-3D-Semantics dataset. Moreover, we incorporate our method with a state-of-the-art planar SLAM and validate its benefits.

* accepted to ICRA 2021

Via

Access Paper or Ask Questions

SLAM in the Field: An Evaluation of Monocular Mapping and Localization on Challenging Dynamic Agricultural Environment

Nov 06, 2020

Fangwen Shu, Paul Lesur, Yaxu Xie, Alain Pagani, Didier Stricker

Figure 1 for SLAM in the Field: An Evaluation of Monocular Mapping and Localization on Challenging Dynamic Agricultural Environment

Figure 2 for SLAM in the Field: An Evaluation of Monocular Mapping and Localization on Challenging Dynamic Agricultural Environment

Figure 3 for SLAM in the Field: An Evaluation of Monocular Mapping and Localization on Challenging Dynamic Agricultural Environment

Figure 4 for SLAM in the Field: An Evaluation of Monocular Mapping and Localization on Challenging Dynamic Agricultural Environment

Abstract:This paper demonstrates a system capable of combining a sparse, indirect, monocular visual SLAM, with both offline and real-time Multi-View Stereo (MVS) reconstruction algorithms. This combination overcomes many obstacles encountered by autonomous vehicles or robots employed in agricultural environments, such as overly repetitive patterns, need for very detailed reconstructions, and abrupt movements caused by uneven roads. Furthermore, the use of a monocular SLAM makes our system much easier to integrate with an existing device, as we do not rely on a LiDAR (which is expensive and power consuming), or stereo camera (whose calibration is sensitive to external perturbation e.g. camera being displaced). To the best of our knowledge, this paper presents the first evaluation results for monocular SLAM, and our work further explores unsupervised depth estimation on this specific application scenario by simulating RGB-D SLAM to tackle the scale ambiguity, and shows our approach produces reconstructions that are helpful to various agricultural tasks. Moreover, we highlight that our experiments provide meaningful insight to improve monocular SLAM systems under agricultural settings.

* accepted to WACV 2021, acknowledgment added

Via

Access Paper or Ask Questions