Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Huan Fu

FreeInv: Free Lunch for Improving DDIM Inversion

Mar 29, 2025

Yuxiang Bao, Huijie Liu, Xun Gao, Huan Fu, Guoliang Kang

Abstract:Naive DDIM inversion process usually suffers from a trajectory deviation issue, i.e., the latent trajectory during reconstruction deviates from the one during inversion. To alleviate this issue, previous methods either learn to mitigate the deviation or design cumbersome compensation strategy to reduce the mismatch error, exhibiting substantial time and computation cost. In this work, we present a nearly free-lunch method (named FreeInv) to address the issue more effectively and efficiently. In FreeInv, we randomly transform the latent representation and keep the transformation the same between the corresponding inversion and reconstruction time-step. It is motivated from a statistical perspective that an ensemble of DDIM inversion processes for multiple trajectories yields a smaller trajectory mismatch error on expectation. Moreover, through theoretical analysis and empirical study, we show that FreeInv performs an efficient ensemble of multiple trajectories. FreeInv can be freely integrated into existing inversion-based image and video editing techniques. Especially for inverting video sequences, it brings more significant fidelity and efficiency improvements. Comprehensive quantitative and qualitative evaluation on PIE benchmark and DAVIS dataset shows that FreeInv remarkably outperforms conventional DDIM inversion, and is competitive among previous state-of-the-art inversion methods, with superior computation efficiency.

Via

Access Paper or Ask Questions

Uncertainty Quantification in Stereo Matching

Dec 24, 2024

Wenxiao Cai, Dongting Hu, Ruoyan Yin, Jiankang Deng, Huan Fu, Wankou Yang, Mingming Gong

Figure 1 for Uncertainty Quantification in Stereo Matching

Figure 2 for Uncertainty Quantification in Stereo Matching

Figure 3 for Uncertainty Quantification in Stereo Matching

Figure 4 for Uncertainty Quantification in Stereo Matching

Abstract:Stereo matching plays a crucial role in various applications, where understanding uncertainty can enhance both safety and reliability. Despite this, the estimation and analysis of uncertainty in stereo matching have been largely overlooked. Previous works often provide limited interpretations of uncertainty and struggle to separate it effectively into data (aleatoric) and model (epistemic) components. This disentanglement is essential, as it allows for a clearer understanding of the underlying sources of error, enhancing both prediction confidence and decision-making processes. In this paper, we propose a new framework for stereo matching and its uncertainty quantification. We adopt Bayes risk as a measure of uncertainty and estimate data and model uncertainty separately. Experiments are conducted on four stereo benchmarks, and the results demonstrate that our method can estimate uncertainty accurately and efficiently. Furthermore, we apply our uncertainty method to improve prediction accuracy by selecting data points with small uncertainties, which reflects the accuracy of our estimated uncertainty. The codes are publicly available at https://github.com/RussRobin/Uncertainty.

Via

Access Paper or Ask Questions

Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering

Apr 20, 2023

Dongting Hu, Zhenkai Zhang, Tingbo Hou, Tongliang Liu, Huan Fu, Mingming Gong

Abstract:The rendering scheme in neural radiance field (NeRF) is effective in rendering a pixel by casting a ray into the scene. However, NeRF yields blurred rendering results when the training images are captured at non-uniform scales, and produces aliasing artifacts if the test images are taken in distant views. To address this issue, Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information. Nevertheless, this approach is only suitable for offline rendering since it relies on integrated positional encoding (IPE) to query a multilayer perceptron (MLP). To overcome this limitation, we propose mip voxel grids (Mip-VoG), an explicit multiscale representation with a deferred architecture for real-time anti-aliasing rendering. Our approach includes a density Mip-VoG for scene geometry and a feature Mip-VoG with a small MLP for view-dependent color. Mip-VoG encodes scene scale using the level of detail (LOD) derived from ray differentials and uses quadrilinear interpolation to map a queried 3D location to its features and density from two neighboring downsampled voxel grids. To our knowledge, our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously. We conducted experiments on multiscale datasets, and the results show that our approach outperforms state-of-the-art real-time rendering baselines.

Via

Access Paper or Ask Questions

NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction

Mar 04, 2023

Bowen Cai, Jinchi Huang, Rongfei Jia, Chengfei Lv, Huan Fu

Abstract:This paper studies implicit surface reconstruction leveraging differentiable ray casting. Previous works such as IDR and NeuS overlook the spatial context in 3D space when predicting and rendering the surface, thereby may fail to capture sharp local topologies such as small holes and structures. To mitigate the limitation, we propose a flexible neural implicit representation leveraging hierarchical voxel grids, namely Neural Deformable Anchor (NeuDA), for high-fidelity surface reconstruction. NeuDA maintains the hierarchical anchor grids where each vertex stores a 3D position (or anchor) instead of the direct embedding (or feature). We optimize the anchor grids such that different local geometry structures can be adaptively encoded. Besides, we dig into the frequency encoding strategies and introduce a simple hierarchical positional encoding method for the hierarchical anchor structure to flexibly exploit the properties of high-frequency and low-frequency geometry and appearance. Experiments on both the DTU and BlendedMVS datasets demonstrate that NeuDA can produce promising mesh surfaces.

* Accepted to CVPR 2023

Via

Access Paper or Ask Questions

3D Scene Creation and Rendering via Rough Meshes: A Lighting Transfer Avenue

Dec 04, 2022

Yujie Li, Bowen Cai, Yuqin Liang, Rongfei Jia, Binqiang Zhao, Mingming Gong, Huan Fu

Abstract:This paper studies how to flexibly integrate reconstructed 3D models into practical 3D modeling pipelines such as 3D scene creation and rendering. Due to the technical difficulty, one can only obtain rough 3D models (R3DMs) for most real objects using existing 3D reconstruction techniques. As a result, physically-based rendering (PBR) would render low-quality images or videos for scenes that are constructed by R3DMs. One promising solution would be representing real-world objects as Neural Fields such as NeRFs, which are able to generate photo-realistic renderings of an object under desired viewpoints. However, a drawback is that the synthesized views through Neural Fields Rendering (NFR) cannot reflect the simulated lighting details on R3DMs in PBR pipelines, especially when object interactions in the 3D scene creation cause local shadows. To solve this dilemma, we propose a lighting transfer network (LighTNet) to bridge NFR and PBR, such that they can benefit from each other. LighTNet reasons about a simplified image composition model, remedies the uneven surface issue caused by R3DMs, and is empowered by several perceptual-motivated constraints and a new Lab angle loss which enhances the contrast between lighting strength and colors. Comparisons demonstrate that LighTNet is superior in synthesizing impressive lighting, and is promising in pushing NFR further in practical 3D modeling workflows. Project page: https://3d-front-future.github.io/LighTNet .

Via

Access Paper or Ask Questions

Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation

May 12, 2022

Jian Zhang, Yuanqing Zhang, Huan Fu, Xiaowei Zhou, Bowen Cai, Jinchi Huang, Rongfei Jia, Binqiang Zhao, Xing Tang

Figure 1 for Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation

Figure 2 for Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation

Figure 3 for Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation

Figure 4 for Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation

Abstract:Neural Radiance Fields (NeRF) have emerged as a potent paradigm for representing scenes and synthesizing photo-realistic images. A main limitation of conventional NeRFs is that they often fail to produce high-quality renderings under novel viewpoints that are significantly different from the training viewpoints. In this paper, instead of exploiting few-shot image synthesis, we study the novel view extrapolation setting that (1) the training images can well describe an object, and (2) there is a notable discrepancy between the training and test viewpoints' distributions. We present RapNeRF (RAy Priors) as a solution. Our insight is that the inherent appearances of a 3D surface's arbitrary visible projections should be consistent. We thus propose a random ray casting policy that allows training unseen views using seen views. Furthermore, we show that a ray atlas pre-computed from the observed rays' viewing directions could further enhance the rendering quality for extrapolated views. A main limitation is that RapNeRF would remove the strong view-dependent effects because it leverages the multi-view consistency property.

Via

Access Paper or Ask Questions

Modeling Indirect Illumination for Inverse Rendering

Apr 14, 2022

Yuanqing Zhang, Jiaming Sun, Xingyi He, Huan Fu, Rongfei Jia, Xiaowei Zhou

Figure 1 for Modeling Indirect Illumination for Inverse Rendering

Figure 2 for Modeling Indirect Illumination for Inverse Rendering

Figure 3 for Modeling Indirect Illumination for Inverse Rendering

Figure 4 for Modeling Indirect Illumination for Inverse Rendering

Abstract:Recent advances in implicit neural representations and differentiable rendering make it possible to simultaneously recover the geometry and materials of an object from multi-view RGB images captured under unknown static illumination. Despite the promising results achieved, indirect illumination is rarely modeled in previous methods, as it requires expensive recursive path tracing which makes the inverse rendering computationally intractable. In this paper, we propose a novel approach to efficiently recovering spatially-varying indirect illumination. The key insight is that indirect illumination can be conveniently derived from the neural radiance field learned from input images instead of being estimated jointly with direct illumination and materials. By properly modeling the indirect illumination and visibility of direct illumination, interreflection- and shadow-free albedo can be recovered. The experiments on both synthetic and real data demonstrate the superior performance of our approach compared to previous work and its capability to synthesize realistic renderings under novel viewpoints and illumination. Our code and data are available at https://zju3dv.github.io/invrender/.

Via

Access Paper or Ask Questions

Exploiting Diverse Characteristics and Adversarial Ambivalence for Domain Adaptive Segmentation

Jan 07, 2021

Bowen Cai, Huan Fu, Rongfei Jia, Binqiang Zhao, Hua Li, Yinghui Xu

Figure 1 for Exploiting Diverse Characteristics and Adversarial Ambivalence for Domain Adaptive Segmentation

Figure 2 for Exploiting Diverse Characteristics and Adversarial Ambivalence for Domain Adaptive Segmentation

Figure 3 for Exploiting Diverse Characteristics and Adversarial Ambivalence for Domain Adaptive Segmentation

Figure 4 for Exploiting Diverse Characteristics and Adversarial Ambivalence for Domain Adaptive Segmentation

Abstract:Adapting semantic segmentation models to new domains is an important but challenging problem. Recently enlightening progress has been made, but the performance of existing methods are unsatisfactory on real datasets where the new target domain comprises of heterogeneous sub-domains (e.g., diverse weather characteristics). We point out that carefully reasoning about the multiple modalities in the target domain can improve the robustness of adaptation models. To this end, we propose a condition-guided adaptation framework that is empowered by a special attentive progressive adversarial training (APAT) mechanism and a novel self-training policy. The APAT strategy progressively performs condition-specific alignment and attentive global feature matching. The new self-training scheme exploits the adversarial ambivalences of easy and hard adaptation regions and the correlations among target sub-domains effectively. We evaluate our method (DCAA) on various adaptation scenarios where the target images vary in weather conditions. The comparisons against baselines and the state-of-the-art approaches demonstrate the superiority of DCAA over the competitors.

* Accepted to AAAI 2021

Via

Access Paper or Ask Questions

3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics

Nov 18, 2020

Huan Fu, Bowen Cai, Lin Gao, Lingxiao Zhang, Cao Li, Zengqi Xun, Chengyue Sun, Yiyun Fei, Yu Zheng, Ying Li(+10 more)

Figure 1 for 3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics

Figure 2 for 3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics

Figure 3 for 3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics

Figure 4 for 3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics

Abstract:We introduce 3D-FRONT (3D Furnished Rooms with layOuts and semaNTics), a new, large-scale, and comprehensive repository of synthetic indoor scenes highlighted by professionally designed layouts and a large number of rooms populated by high-quality textured 3D models with style compatibility. From layout semantics down to texture details of individual objects, our dataset is freely available to the academic community and beyond. Currently, 3D-FRONT contains 18,797 rooms diversely furnished by 3D objects, far surpassing all publicly available scene datasets. In addition, the 7,302 furniture objects all come with high-quality textures. While the floorplans and layout designs are directly sourced from professional creations, the interior designs in terms of furniture styles, color, and textures have been carefully curated based on a recommender system we develop to attain consistent styles as expert designs. Furthermore, we release Trescope, a light-weight rendering tool, to support benchmark rendering of 2D images and annotations from 3D-FRONT. We demonstrate two applications, interior scene synthesis and texture synthesis, that are especially tailored to the strengths of our new dataset. The project page is at: https://tianchi.aliyun.com/specials/promotion/alibaba-3d-scene-dataset.

* Project page: https://tianchi.aliyun.com/specials/promotion/alibaba-3d-scene-dataset

Via

Access Paper or Ask Questions

Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning

Oct 27, 2020

Huan Fu, Shunming Li, Rongfei Jia, Mingming Gong, Binqiang Zhao, Dacheng Tao

Figure 1 for Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning

Figure 2 for Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning

Figure 3 for Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning

Figure 4 for Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning

Abstract:Image-based 3D shape retrieval (IBSR) aims to find the corresponding 3D shape of a given 2D image from a large 3D shape database. The common routine is to map 2D images and 3D shapes into an embedding space and define (or learn) a shape similarity measure. While metric learning with some adaptation techniques seems to be a natural solution to shape similarity learning, the performance is often unsatisfactory for fine-grained shape retrieval. In the paper, we identify the source of the poor performance and propose a practical solution to this problem. We find that the shape difference between a negative pair is entangled with the texture gap, making metric learning ineffective in pushing away negative pairs. To tackle this issue, we develop a geometry-focused multi-view metric learning framework empowered by texture synthesis. The synthesis of textures for 3D shape models creates hard triplets, which suppress the adverse effects of rich texture in 2D images, thereby push the network to focus more on discovering geometric characteristics. Our approach shows state-of-the-art performance on a recently released large-scale 3D-FUTURE[1] repository, as well as three widely studied benchmarks, including Pix3D[2], Stanford Cars[3], and Comp Cars[4]. Codes will be made publicly available at: https://github.com/3D-FRONT-FUTURE/IBSR-texture

* Accepted to NeurlPS 2020

Via

Access Paper or Ask Questions