Yifu Wang

OpenGV 2.0: Motion prior-assisted calibration and SLAM with vehicle-mounted surround-view systems

Mar 05, 2025

Kun Huang, Yifu Wang, Si'ao Zhang, Zhirui Wang, Zhanpeng Ouyang, Zhenghua Yu, Laurent Kneip

Abstract: The present paper proposes optimization-based solutions to visual SLAM with a vehicle-mounted surround-view camera system. Owing to their original use-case, such systems often contain only a single camera facing in each direction, with very limited overlap between fields of view. Our novelty consists of three optimization modules targeting practical online calibration of exterior orientations from simple two-view geometry, reliable front-end initialization of relative displacements, and accurate back-end optimization using a continuous-time trajectory model. What the proposed modules have in common is that all three exploit motion priors related to the inherent non-holonomic characteristics of passenger vehicle motion. In contrast to prior related art, the proposed modules furthermore excel at bypassing partial unobservabilities in the transformation variables that commonly occur for Ackermann motion. As a further contribution, the modules are built into a novel surround-view camera SLAM system that specifically targets deployment on Ackermann vehicles operating in urban environments. All modules are examined in in-depth ablation studies, and the practical validity of the entire framework is supported by a successful application to challenging, large-scale publicly available online datasets. Note that upon acceptance, the entire framework is scheduled for open-source release as part of an extension of the OpenGV library.
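
As a rough illustration of what a non-holonomic motion prior can look like, the sketch below uses the planar circular-arc approximation that is common in the vehicle-motion literature. The abstract does not say which parameterisation the authors use, so the function name, the parameters `theta` and `radius`, and the axis convention (x forward, y left) are all assumptions made for this example.

```python
import math

def arc_relative_pose(theta, radius):
    """Planar relative pose after driving a circular arc (hypothetical helper).

    theta:  heading change in radians (assumed parameter)
    radius: turning radius in metres (assumed parameter)
    Returns (R, t): a 2x2 rotation as nested tuples and a 2D translation,
    both expressed in the frame of the first pose (x forward, y left).
    """
    # Straight-line distance between the start and end of the arc.
    chord = 2.0 * radius * math.sin(theta / 2.0)
    # On a circular arc the chord points along half of the heading change.
    t = (chord * math.cos(theta / 2.0), chord * math.sin(theta / 2.0))
    c, s = math.cos(theta), math.sin(theta)
    R = ((c, -s), (s, c))
    return R, t

# A quarter turn on a unit circle ends one unit ahead and one unit to the left.
R, t = arc_relative_pose(math.pi / 2.0, 1.0)
print(t)
```

Under this model the relative displacement has two planar degrees of freedom instead of three, which is the kind of constraint a motion prior can contribute.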

T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Generation

Dec 18, 2024

Zhenhong Sun, Yifu Wang, Yonhon Ng, Yunfei Duan, Daoyi Dong, Hongdong Li, Pan Ji

Abstract: Scene generation is crucial to many computer graphics applications. Recent advances in generative AI have streamlined sketch-to-image workflows, easing the workload for artists and designers in creating scene concept art. However, these methods often struggle with complex scenes containing multiple detailed objects, sometimes missing small or uncommon instances. In this paper, we propose a Training-free Triplet Tuning for Sketch-to-Scene (T3-S2S) generation after reviewing the entire cross-attention mechanism. This scheme revitalizes the existing ControlNet model, enabling effective handling of multi-instance generation through prompt balance, characteristics prominence, and dense tuning. Specifically, this approach enhances keyword representation via the prompt balance module, reducing the risk of missing critical instances. It also includes a characteristics prominence module that highlights TopK indices in each channel, ensuring essential features are better represented based on token sketches. Additionally, it employs dense tuning to refine contour details in the attention map, compensating for instance-related regions. Experiments validate that our triplet tuning approach substantially improves the performance of existing sketch-to-image models. It consistently generates detailed, multi-instance 2D images, closely adhering to the input prompts and enhancing visual quality in complex multi-instance scenes. Code is available at https://github.com/chaos-sun/t3s2s.git.
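
The idea of highlighting TopK indices in each channel can be pictured with a small NumPy sketch. This is not the code from the linked repository; the function name `boost_topk`, the `(channels, positions)` layout, and the multiplicative `gain` are assumptions chosen only to make the operation concrete.

```python
import numpy as np

def boost_topk(attn, k, gain=2.0):
    """Scale the k largest entries of every channel by `gain` (hypothetical helper).

    attn: array of shape (channels, positions); the layout is an assumption.
    """
    # Column indices of the k largest values in each row (order within the k is arbitrary).
    idx = np.argpartition(attn, -k, axis=1)[:, -k:]
    out = attn.copy()
    rows = np.arange(attn.shape[0])[:, None]
    out[rows, idx] *= gain
    return out

attn = np.array([[0.1, 0.5, 0.2, 0.9],
                 [0.4, 0.3, 0.8, 0.1]])
print(boost_topk(attn, k=2))
```

Only the two strongest responses per channel are amplified; all other entries are left unchanged.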

Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches

Aug 08, 2024

Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Yang Li, Pan Ji, Hongdong Li

Abstract: 3D Content Generation is at the heart of many computer graphics applications, including video gaming, film-making, and virtual and augmented reality. This paper proposes a novel deep-learning-based approach for automatically generating interactive and playable 3D game scenes, all from the user's casual prompts such as a hand-drawn sketch. Sketch-based input offers a natural and convenient way to convey the user's design intention in the content creation process. To circumvent the data-deficient challenge in learning (i.e., the lack of large training data of 3D scenes), our method leverages a pre-trained 2D denoising diffusion model to generate a 2D image of the scene as the conceptual guidance. In this process, we adopt the isometric projection mode to factor out unknown camera poses while obtaining the scene layout. From the generated isometric image, we use a pre-trained image understanding method to segment the image into meaningful parts, such as off-ground objects, trees, and buildings, and extract the 2D scene layout. These segments and layouts are subsequently fed into a procedural content generation (PCG) engine, such as a 3D video game engine like Unity or Unreal, to create the 3D scene. The resulting 3D scene can be seamlessly integrated into a game development environment and is readily playable. Extensive tests demonstrate that our method can efficiently generate high-quality and interactive 3D game scenes with layouts that closely follow the user's intention.

* Project Page: https://xrvisionlabs.github.io/Sketch2Scene/

Cross-Modal Semi-Dense 6-DoF Tracking of an Event Camera in Challenging Conditions

Jan 16, 2024

Yi-Fan Zuo, Wanting Xu, Xia Wang, Yifu Wang, Laurent Kneip

Abstract: Vision-based localization is a cost-effective and thus attractive solution for many intelligent mobile platforms. However, its accuracy and especially robustness still suffer from low illumination conditions, illumination changes, and aggressive motion. Event-based cameras are bio-inspired visual sensors that perform well in HDR conditions and have high temporal resolution, and thus provide an interesting alternative in such challenging scenarios. While purely event-based solutions do not yet produce satisfying mapping results, the present work demonstrates the feasibility of purely event-based tracking if an alternative sensor is permitted for mapping. The method relies on geometric 3D-2D registration of semi-dense maps and events, and achieves highly reliable and accurate cross-modal tracking results. Practically relevant scenarios are given by depth camera-supported tracking or map-based localization with a semi-dense map prior created by a regular image-based visual SLAM or structure-from-motion system. Conventional edge-based 3D-2D alignment is extended by a novel polarity-aware registration that makes use of signed time-surface maps (STSM) obtained from event streams. We furthermore introduce a novel culling strategy for occluded points. Both modifications increase the speed of the tracker and its robustness against occlusions or large view-point variations. The approach is validated on many real datasets covering the above-mentioned challenging conditions, and compared against similar solutions realised with regular cameras.

* accepted by IEEE Transactions on Robotics (T-RO). arXiv admin note: text overlap with arXiv:2202.02556
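
A signed time surface can be pictured as follows. The sketch uses a common exponential-decay formulation with the event polarity as sign; the abstract does not give the exact definition of the STSM, so the decay constant `tau`, the event tuple layout, and the function name are assumptions.

```python
import numpy as np

def signed_time_surface(events, shape, t_now, tau=0.05):
    """Build a signed time surface from events (hypothetical helper).

    events: iterable of (x, y, t, polarity), polarity in {-1, +1}, sorted by t.
    shape:  (height, width) of the sensor.
    Each pixel stores polarity * exp(-(t_now - t_last) / tau) for its latest event.
    """
    last_t = np.full(shape, -np.inf)   # pixels without events decay to exactly 0
    last_p = np.zeros(shape)
    for x, y, t, p in events:
        if t <= t_now:
            last_t[y, x] = t
            last_p[y, x] = p
    return last_p * np.exp(-(t_now - last_t) / tau)

events = [(1, 2, 0.00, +1), (3, 0, 0.05, -1)]
surface = signed_time_surface(events, shape=(4, 5), t_now=0.05)
print(surface[2, 1], surface[0, 3])
```

Recent events give values close to +1 or -1, older ones fade towards zero, so the sign carries the polarity that a polarity-aware registration could exploit.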

MAVIS: Multi-Camera Augmented Visual-Inertial SLAM using SE2(3) Based Exact IMU Pre-integration

Sep 18, 2023

Yifu Wang, Yonhon Ng, Inkyu Sa, Alvaro Parra, Cristian Rodriguez, Tao Jun Lin, Hongdong Li

Abstract: We present a novel optimization-based Visual-Inertial SLAM system designed for multiple partially overlapped camera systems, named MAVIS. Our framework fully exploits the benefits of the wide field-of-view of multi-camera systems and the metric scale measurements provided by an inertial measurement unit (IMU). We introduce an improved IMU pre-integration formulation based on the exponential function of an automorphism of SE_2(3), which can effectively enhance tracking performance under fast rotational motion and extended integration time. Furthermore, we extend conventional front-end tracking and back-end optimization modules designed for monocular or stereo setups towards multi-camera systems, and introduce implementation details that contribute to the performance of our system in challenging scenarios. The practical validity of our approach is supported by our experiments on public datasets. MAVIS won first place in all the vision-IMU tracks (single and multi-session SLAM) of the Hilti SLAM Challenge 2023 with 1.7 times the score of the second place.

* video link: https://youtu.be/Q_jZSjhNFfg

Revisiting Event-based Video Frame Interpolation

Jul 24, 2023

Jiaben Chen, Yichen Zhu, Dongze Lian, Jiaqi Yang, Yifu Wang, Renrui Zhang, Xinhang Liu, Shenhan Qian, Laurent Kneip, Shenghua Gao

Abstract: Dynamic vision sensors or event cameras provide rich complementary information for video frame interpolation. Existing state-of-the-art methods follow the paradigm of combining both synthesis-based and warping networks. However, few of those methods fully respect the intrinsic characteristics of event streams. Given that event cameras only encode intensity changes and polarity rather than color intensities, estimating optical flow from events is arguably more difficult than from RGB information. We therefore propose to incorporate RGB information in an event-guided optical flow refinement strategy. Moreover, in light of the quasi-continuous nature of the time signals provided by event cameras, we propose a divide-and-conquer strategy in which event-based intermediate frame synthesis happens incrementally in multiple simplified stages rather than in a single, long stage. Extensive experiments on both synthetic and real-world datasets show that these modifications lead to more reliable and realistic intermediate frame results than previous video frame interpolation methods. Our findings underline that a careful consideration of event characteristics such as high temporal density and elevated noise benefits interpolation accuracy.

* Accepted by IROS 2023. Project site: https://jiabenchen.github.io/revisit_event

Cross-modal Place Recognition in Image Databases using Event-based Sensors

Jul 03, 2023

Xiang Ji, Jiaxin Wei, Yifu Wang, Huiliang Shang, Laurent Kneip

Abstract: Visual place recognition is an important problem towards global localization in many robotics tasks. One of the biggest challenges is that it may suffer from illumination or appearance changes in surrounding environments. Event cameras are interesting alternatives to frame-based sensors as their high dynamic range enables robust perception in difficult illumination conditions. However, current event-based place recognition methods only rely on event information, which restricts downstream applications of VPR. In this paper, we present the first cross-modal visual place recognition framework that is capable of retrieving regular images from a database given an event query. Our method demonstrates promising results with respect to the state-of-the-art frame-based and event-based methods on the Brisbane-Event-VPR dataset under different scenarios. We also verify the effectiveness of the combination of retrieval and classification, which can boost performance by a large margin.

Globally-Optimal Contrast Maximisation for Event Cameras

Jun 10, 2022

Xin Peng, Ling Gao, Yifu Wang, Laurent Kneip

Abstract: Event cameras are bio-inspired sensors that perform well in challenging illumination conditions and have high temporal resolution. However, their concept is fundamentally different from traditional frame-based cameras. The pixels of an event camera operate independently and asynchronously. They measure changes of the logarithmic brightness and return them in the highly discretised form of time-stamped events indicating a relative change of a certain quantity since the last event. New models and algorithms are needed to process this kind of measurement. The present work looks at several motion estimation problems with event cameras. The flow of the events is modelled by a general homographic warping in a space-time volume, and the objective is formulated as a maximisation of contrast within the image of warped events. Our core contribution consists of deriving globally optimal solutions to these generally non-convex problems, which removes the dependency on a good initial guess plaguing existing methods. Our methods rely on branch-and-bound optimisation and employ novel and efficient, recursive upper and lower bounds derived for six different contrast estimation functions. The practical validity of our approach is demonstrated by a successful application to three different event camera motion estimation problems.

* IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022
* arXiv admin note: substantial text overlap with arXiv:2203.03914
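
The contrast objective described in the abstract can be sketched for the simplest case. The example below evaluates one contrast function (the variance of the image of warped events) for a single candidate motion; it replaces the general homographic warp with a constant 2D image velocity and does not implement the branch-and-bound search or its bounds. The function name, the event layout `(x, y, t)`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def contrast(events, velocity, shape, t_ref=0.0):
    """Variance of the image of warped events for one candidate velocity.

    events:   (N, 3) array with columns x, y, t (assumed layout).
    velocity: (vx, vy) in pixels per second, a simplified stand-in for the warp.
    shape:    (height, width) of the accumulation image.
    """
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    dt = t - t_ref
    # Move every event back to the reference time along the candidate motion.
    xw = np.rint(x - velocity[0] * dt).astype(int)
    yw = np.rint(y - velocity[1] * dt).astype(int)
    h, w = shape
    inside = (xw >= 0) & (xw < w) & (yw >= 0) & (yw < h)
    image = np.zeros(shape)
    # np.add.at accumulates correctly when several events hit the same pixel.
    np.add.at(image, (yw[inside], xw[inside]), 1.0)
    return image.var()

# Synthetic events from one point moving at 10 px/s along x.
t = np.linspace(0.0, 1.0, 20)
events = np.stack([5.0 + 10.0 * t, np.full_like(t, 8.0), t], axis=1)
good = contrast(events, (10.0, 0.0), (16, 32))
bad = contrast(events, (0.0, 0.0), (16, 32))
print(good > bad)
```

With the correct velocity all events collapse onto one pixel, which yields a sharper image and a higher variance than the motion-blurred image obtained with a wrong candidate.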

Globally-Optimal Event Camera Motion Estimation

Mar 08, 2022

Xin Peng, Yifu Wang, Ling Gao, Laurent Kneip

Abstract: Event cameras are bio-inspired sensors that perform well in HDR conditions and have high temporal resolution. However, different from traditional frame-based cameras, event cameras measure asynchronous pixel-level brightness changes and return them in a highly discretised format, hence new algorithms are needed. The present paper looks at fronto-parallel motion estimation of an event camera. The flow of the events is modeled by a general homographic warping in a space-time volume, and the objective is formulated as a maximisation of contrast within the image of unwarped events. However, in stark contrast to prior art, we derive a globally optimal solution to this generally non-convex problem, and thus remove the dependency on a good initial guess. Our algorithm relies on branch-and-bound optimisation for which we derive novel, recursive upper and lower bounds for six different contrast estimation functions. The practical validity of our approach is supported by a highly successful application to AGV motion estimation with a downward facing event camera, a challenging scenario in which the sensor experiences fronto-parallel motion in front of noisy, fast moving textures.

* European Conference on Computer Vision (ECCV), 2020

Accurate calibration of multi-perspective cameras from a generalization of the hand-eye constraint

Feb 08, 2022

Yifu Wang, Wenqing Jiang, Kun Huang, Soren Schwertfeger, Laurent Kneip

Abstract: Multi-perspective cameras are quickly gaining importance in many applications such as smart vehicles and virtual or augmented reality. However, a large system size or absence of overlap in neighbouring fields-of-view often complicates their calibration. We present a novel solution which relies on the availability of an external motion capture system. Our core contribution consists of an extension to the hand-eye calibration problem which jointly solves multi-eye-to-base problems in closed form. We furthermore demonstrate its equivalence to the multi-eye-in-hand problem. The practical validity of our approach is supported by our experiments, indicating that the method is highly efficient and accurate, and outperforms existing closed-form alternatives.

* accepted in the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia (PA), USA
