Wenjing Bian

CatFree3D: Category-agnostic 3D Object Detection with Diffusion

Aug 22, 2024

Wenjing Bian, Zirui Wang, Andrea Vedaldi

Abstract: Image-based 3D object detection is widely employed in applications such as autonomous vehicles and robotics, yet current systems struggle to generalise because of the complex problem setup and limited training data. We introduce a novel pipeline that decouples 3D detection from 2D detection and depth prediction, using a diffusion-based approach to improve accuracy and support category-agnostic detection. Additionally, we introduce the Normalised Hungarian Distance (NHD) metric for accurate evaluation of 3D detection results, addressing the limitations of the traditional IoU and GIoU metrics. Experimental results demonstrate that our method achieves state-of-the-art accuracy and strong generalisation across various object categories and datasets.

* Project page: https://bianwenjing.github.io/CatFree3D
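
The abstract names the Normalised Hungarian Distance but does not define it. The sketch below is only one plausible reading of the name, not the paper's definition: box corners are matched one-to-one at minimum total cost, and the result is normalised by the ground-truth box diagonal. The helpers `box_corners`, `matched_corner_distance` and `normalised_matched_distance` are hypothetical, boxes are axis-aligned for brevity, and a brute-force search over permutations stands in for the Hungarian algorithm.

```python
import math
from itertools import permutations


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned box (rotation omitted for brevity)."""
    cx, cy, cz = center
    dx, dy, dz = (s / 2.0 for s in size)
    return [(cx + sx * dx, cy + sy * dy, cz + sz * dz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def matched_corner_distance(pred, gt):
    """Minimum total corner-to-corner distance over all one-to-one matchings.

    Brute force over 8! permutations; the Hungarian algorithm would return
    the same optimum more efficiently.
    """
    cost = [[math.dist(p, g) for g in gt] for p in pred]
    n = len(pred)
    return min(sum(cost[i][perm[i]] for i in range(n))
               for perm in permutations(range(n)))


def normalised_matched_distance(pred_box, gt_box):
    """Mean matched corner distance divided by the ground-truth box diagonal (assumed normalisation)."""
    pred = box_corners(*pred_box)
    gt = box_corners(*gt_box)
    diagonal = math.dist(gt[0], gt[-1])  # (-,-,-) corner to (+,+,+) corner
    return matched_corner_distance(pred, gt) / (len(gt) * diagonal)


gt_box = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
near = normalised_matched_distance(((1.0, 0.0, 0.0), (2.0, 2.0, 2.0)), gt_box)
far = normalised_matched_distance(((5.0, 0.0, 0.0), (2.0, 2.0, 2.0)), gt_box)
```

Under this reading, a prediction that does not overlap the ground truth at all still receives a graded score (`far` is larger than `near`), whereas IoU would be zero for every non-overlapping box.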

CrossScore: Towards Multi-View Image Evaluation and Scoring

Apr 22, 2024

Zirui Wang, Wenjing Bian, Omkar Parkhi, Yuheng Ren, Victor Adrian Prisacariu

Abstract: We introduce a novel cross-reference image quality assessment method that fills a gap in the image assessment landscape, complementing the array of established evaluation schemes, which range from full-reference metrics like SSIM and no-reference metrics such as NIQE to general-reference metrics including FID and multi-modal-reference metrics, e.g., CLIPScore. Utilising a neural network with a cross-attention mechanism and a unique data collection pipeline from NVS optimisation, our method enables accurate image quality assessment without requiring ground truth references. By comparing a query image against multiple views of the same scene, our method addresses the limitations of existing metrics in novel view synthesis (NVS) and similar tasks where direct reference images are unavailable. Experimental results show that our method correlates closely with the full-reference metric SSIM, while not requiring ground truth references.

* Project page: https://crossscore.active.vision

PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction

Oct 12, 2023

Jia-Wang Bian, Wenjing Bian, Victor Adrian Prisacariu, Philip Torr

Abstract: Neural surface reconstruction is sensitive to camera pose noise, even when state-of-the-art pose estimators such as COLMAP or ARKit are used. More importantly, existing Pose-NeRF joint optimisation methods have struggled to improve pose accuracy in challenging real-world scenarios. To overcome these challenges, we introduce the pose residual field (PoRF), a novel implicit representation that uses an MLP to regress pose updates. This is more robust than conventional pose parameter optimisation because parameter sharing leverages global information over the entire sequence. Furthermore, we propose an epipolar geometry loss that enhances supervision by leveraging the correspondences exported from COLMAP results, without extra computational overhead. On the DTU dataset, we reduce the rotation error by 78% for COLMAP poses, which lowers the reconstruction Chamfer distance from 3.48mm to 0.85mm. On the MobileBrick dataset, which contains casually captured unbounded 360-degree videos, our method refines ARKit poses and improves the reconstruction F1 score from 69.18 to 75.67, outperforming the result obtained with the dataset-provided ground-truth poses (75.14). These results demonstrate the efficacy of our approach in refining camera poses and improving the accuracy of neural surface reconstruction in real-world scenarios.

* Under review
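
To make the idea of an epipolar geometry loss concrete, here is a minimal sketch assuming the standard epipolar constraint x2^T E x1 = 0 with E = [t]_x R, where (R, t) is the relative pose mapping camera-1 coordinates to camera-2 coordinates. The abstract does not give the exact loss used in PoRF, so the plain algebraic residual below is a stand-in, and all function names are hypothetical. Points are assumed to be in normalised (intrinsics-free) homogeneous image coordinates.

```python
def cross_matrix(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_matrix(R, t):
    """E = [t]_x R for the relative pose X2 = R X1 + t."""
    return matmul(cross_matrix(t), R)


def epipolar_loss(R, t, matches):
    """Mean absolute algebraic residual |x2^T E x1| over matched points."""
    E = essential_matrix(R, t)
    total = 0.0
    for x1, x2 in matches:
        Ex1 = matvec(E, x1)
        total += abs(sum(x2[i] * Ex1[i] for i in range(3)))
    return total / len(matches)


def project(X):
    """Pinhole projection to normalised homogeneous coordinates."""
    return [X[0] / X[2], X[1] / X[2], 1.0]


# Toy check: one 3D point seen by two cameras related by a sideways shift.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
true_t = [1.0, 0.0, 0.0]
X1 = [0.5, 0.2, 4.0]
X2 = [X1[i] + true_t[i] for i in range(3)]
matches = [(project(X1), project(X2))]

loss_true = epipolar_loss(identity, true_t, matches)
loss_wrong = epipolar_loss(identity, [0.0, 1.0, 0.0], matches)
```

The residual vanishes for the true relative pose and grows for an incorrect one, which is what lets 2D correspondences supervise pose refinement without rendering.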

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Dec 14, 2022

Wenjing Bian, Zirui Wang, Kejie Li, Jia-Wang Bian, Victor Adrian Prisacariu

Abstract: Training a Neural Radiance Field (NeRF) without pre-computed camera poses is challenging. Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes. However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. These priors are generated by correcting scale and shift parameters during training, with which we are then able to constrain the relative poses between consecutive frames. This constraint is achieved using our proposed novel loss functions. Experiments on real-world indoor and outdoor scenes show that our method can handle challenging camera trajectories and outperforms existing methods in terms of novel view rendering quality and pose estimation accuracy.
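
The scale and shift correction mentioned in the abstract can be illustrated with a small sketch. It assumes a per-frame linear model D* = scale * D + shift applied to a monocular depth map. The abstract says these parameters are corrected during training; the closed-form least-squares fit below is a swapped-in simplification for illustration, and `target_depth` is a hypothetical reference (for example, depth rendered from the radiance field). Function names are assumptions, not the authors' code.

```python
def undistort_depth(depth, scale, shift):
    """Apply the assumed per-frame linear correction D* = scale * D + shift."""
    return [scale * d + shift for d in depth]


def fit_scale_shift(mono_depth, target_depth):
    """Closed-form least squares for (scale, shift) minimising sum((s*d + b - t)^2).

    Stand-in for gradient-based optimisation of the same two parameters.
    """
    n = len(mono_depth)
    mean_d = sum(mono_depth) / n
    mean_t = sum(target_depth) / n
    var_d = sum((d - mean_d) ** 2 for d in mono_depth)
    cov = sum((d - mean_d) * (t - mean_t)
              for d, t in zip(mono_depth, target_depth))
    scale = cov / var_d
    shift = mean_t - scale * mean_d
    return scale, shift


# Toy data: the monocular depth is off by an unknown scale and shift.
mono = [1.0, 2.0, 3.0, 4.0]
target = [2.0 * d + 0.5 for d in mono]

scale, shift = fit_scale_shift(mono, target)
corrected = undistort_depth(mono, scale, shift)
```

Once two consecutive frames have corrected depth maps, their back-projected points live in a consistent metric scale, which is what makes a relative-pose constraint between them meaningful.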

Ray-ONet: Efficient 3D Reconstruction From A Single RGB Image

Jul 05, 2021

Wenjing Bian, Zirui Wang, Kejie Li, Victor Adrian Prisacariu

Abstract: We propose Ray-ONet to reconstruct detailed 3D models from monocular images efficiently. By predicting a series of occupancy probabilities along a ray that is back-projected from a pixel in camera coordinates, Ray-ONet improves the reconstruction accuracy in comparison with Occupancy Networks (ONet), while reducing the network inference complexity to O(N^2). As a result, Ray-ONet achieves state-of-the-art performance on the ShapeNet benchmark with more than 20× speed-up at 128^3 resolution and maintains a similar memory footprint during inference.
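
The O(N^2) claim can be read as a count of decoder evaluations: a point-wise decoder is queried once per voxel of an N x N x N grid, while a ray-wise decoder is queried once per pixel ray and returns all N occupancies along that ray. The sketch below only counts calls under that reading; `ray_decoder` is a hypothetical stand-in for the network, not the Ray-ONet architecture.

```python
def pointwise_query_count(n):
    """Point-wise decoder: one evaluation per voxel of an n x n x n grid."""
    return n ** 3


def raywise_query_count(n):
    """Ray-wise decoder: one evaluation per pixel ray of an n x n image plane."""
    return n ** 2


def occupancy_grid_from_rays(n, ray_decoder):
    """Assemble n^3 occupancies from n^2 calls.

    ray_decoder(u, v) is assumed to return a list of n occupancy
    probabilities along the ray through pixel (u, v).
    """
    return [[ray_decoder(u, v) for v in range(n)] for u in range(n)]


# Count calls with a dummy decoder that records every query.
calls = []


def dummy_ray_decoder(u, v):
    calls.append((u, v))
    return [0.5] * 4


grid = occupancy_grid_from_rays(4, dummy_ray_decoder)
ratio_at_128 = pointwise_query_count(128) // raywise_query_count(128)
```

The ratio of call counts at N = 128 is larger than the measured speed-up quoted in the abstract, which is expected: each ray-wise call does more work than a point-wise call, so call counts bound the saving rather than predict wall-clock time.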
