Abstract: We present a novel framework for enhancing the visual fidelity and consistency of text-guided 3D Gaussian Splatting (3DGS) editing. Existing editing approaches face two critical challenges: inconsistent geometric reconstructions across multiple viewpoints, particularly at challenging camera positions, and ineffective use of depth information during image manipulation, which leads to over-texturing artifacts and degraded object boundaries. To address these limitations, we introduce: 1) a complementary information mutual learning network that enhances depth map estimation from 3DGS, enabling precise depth-conditioned 3D editing while preserving geometric structure; and 2) a wavelet consensus attention mechanism that aligns latent codes during the diffusion denoising process, ensuring multi-view consistency in the edited results. Extensive experiments demonstrate that our method achieves superior rendering quality and view consistency compared to state-of-the-art approaches, validating our framework as an effective solution for text-guided editing of 3D scenes.
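The abstract does not specify how the wavelet consensus attention is realized, so the following is only a minimal PyTorch sketch of one plausible reading: each view's noisy latent is split with a one-level Haar wavelet transform, cross-view attention is run on the low-frequency (LL) subband so coarse structure reaches consensus across views, and the high-frequency detail is left per-view. The function names (`haar_dwt`, `wavelet_consensus_attention`), the single-level Haar split, and the per-pixel cross-view attention are our assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a wavelet-consensus attention step for multi-view
# latent alignment during diffusion denoising (not the paper's actual code).
import torch

def haar_dwt(x):
    """One-level Haar DWT. x: (B, C, H, W) with even H, W -> (LL, LH, HL, HH)."""
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2
    hl = (a + b - c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def haar_idwt(ll, lh, hl, hh):
    """Exact inverse of haar_dwt."""
    a = (ll + lh + hl + hh) / 2
    b = (ll - lh + hl - hh) / 2
    c = (ll + lh - hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    B, C, H, W = ll.shape
    x = torch.zeros(B, C, 2 * H, 2 * W, device=ll.device, dtype=ll.dtype)
    x[:, :, 0::2, 0::2] = a
    x[:, :, 0::2, 1::2] = b
    x[:, :, 1::2, 0::2] = c
    x[:, :, 1::2, 1::2] = d
    return x

def wavelet_consensus_attention(latents):
    """latents: (V, C, H, W), one noisy latent per view at the current
    denoising step. Attention runs across views on the LL subband only,
    leaving high-frequency detail per view untouched."""
    V, C, H, W = latents.shape
    ll, lh, hl, hh = haar_dwt(latents)
    # Tokens are per-pixel feature vectors, attended across the V views.
    tokens = ll.permute(2, 3, 0, 1).reshape(-1, V, C)       # (H/2*W/2, V, C)
    attn = torch.softmax(tokens @ tokens.transpose(1, 2) / C ** 0.5, dim=-1)
    consensus = attn @ tokens                                # (H/2*W/2, V, C)
    ll = consensus.reshape(H // 2, W // 2, V, C).permute(2, 3, 0, 1)
    return haar_idwt(ll, lh, hl, hh)

# Usage: align four views' 4x64x64 latents at one denoising step.
views = torch.randn(4, 4, 64, 64)
aligned = wavelet_consensus_attention(views)
print(aligned.shape)  # torch.Size([4, 4, 64, 64])
```

Restricting the consensus to the LL subband is one natural way to trade off the abstract's two goals: coarse geometry is shared across views while per-view texture detail is preserved.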
Abstract: Transformer-based low-light enhancement methods have yielded promising performance by effectively capturing long-range dependencies in a global context. However, their high computational cost limits the number of iterations that can be unrolled in deep unfolding networks, making it difficult to flexibly balance interpretability against distortion. To address this issue, we propose a novel Low-Light Enhancement method via relighting-guided Mamba with a deep unfolding network (LLEMamba), whose theoretical interpretability and fidelity are guaranteed by Retinex optimization and Mamba deep priors, respectively. Specifically, LLEMamba first constructs a Retinex model with deep priors, embedding an iterative optimization process based on the Alternating Direction Method of Multipliers (ADMM) within a deep unfolding network. Unlike Transformer-based designs, LLEMamba introduces a novel Mamba architecture with lower computational complexity to support the multiple iterations of the deep unfolding framework; it not only captures a light-dependent global visual context for dark images during reflectance relighting but also yields more stable closed-form solutions during optimization. Experiments on benchmark datasets show that LLEMamba achieves superior quantitative results and visual outputs with lower distortion compared to existing state-of-the-art methods.
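To make the deep unfolding idea concrete, the sketch below shows the general shape of an ADMM-unrolled Retinex model: the low-light image is factored as I = R ⊙ L (reflectance times illumination), and each unfolding stage alternates closed-form-style data-fidelity updates for R and L with learned prior (proximal) steps. The specific splitting, the penalty parameter `rho`, the gamma relighting, and the `PriorNet` stand-in (where LLEMamba would place its Mamba blocks) are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of an ADMM-unfolded Retinex network (not LLEMamba's code).
import torch
import torch.nn as nn

class PriorNet(nn.Module):
    """Stand-in deep prior; the paper uses a Mamba architecture instead."""
    def __init__(self, ch=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)  # residual refinement

class UnfoldedRetinex(nn.Module):
    def __init__(self, stages=3, rho=0.5):
        super().__init__()
        self.stages = stages
        self.rho = rho  # ADMM penalty parameter (typically learnable)
        self.r_prior = nn.ModuleList(PriorNet(3) for _ in range(stages))
        self.l_prior = nn.ModuleList(PriorNet(1) for _ in range(stages))

    def forward(self, img):
        # Initialize illumination as the max channel, reflectance by division.
        L = img.max(dim=1, keepdim=True).values.clamp(min=1e-3)
        R = img / L
        for k in range(self.stages):
            # Data-fidelity updates: per-pixel least-squares-style solutions.
            R = (img * L + self.rho * R) / (L * L + self.rho)
            L = ((img * R).sum(1, keepdim=True) + self.rho * L) / \
                ((R * R).sum(1, keepdim=True) + self.rho)
            # Proximal / prior steps realized by learned networks.
            R = self.r_prior[k](R)
            L = self.l_prior[k](L).clamp(min=1e-3)
        # Relight: brighten illumination (a fixed gamma here; learned in practice).
        return (R * L.pow(0.4)).clamp(0, 1)

# Usage on a dummy low-light batch.
net = UnfoldedRetinex()
out = net(torch.rand(2, 3, 64, 64) * 0.2)
print(out.shape)  # torch.Size([2, 3, 64, 64])
```

The computational argument in the abstract maps directly onto this structure: the prior network runs once per unfolding stage, so replacing a quadratic-cost Transformer block with a lower-complexity Mamba block is what makes many stages affordable.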