Abstract: Visual motion estimation is a well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation in highly dynamic environments. These environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Estimating third-party motions simultaneously with the sensor egomotion is difficult because an object's observed motion consists of both its true motion and the sensor motion. Most previous works in multimotion estimation simplify this problem by relying on appearance-based object detection or application-specific motion constraints. These approaches are effective in specific applications and environments but do not generalize well to the full multimotion estimation problem (MEP). This paper presents Multimotion Visual Odometry (MVO), a multimotion estimation pipeline that estimates the full SE(3) trajectory of every motion in the scene, including the sensor egomotion, without relying on appearance-based information. MVO extends the traditional visual odometry (VO) pipeline with multimotion segmentation and tracking techniques. It uses physically founded motion priors to extrapolate motions through temporary occlusions and identify the reappearance of motions through motion closure. Evaluations on real-world data from the Oxford Multimotion Dataset (OMD) and the KITTI Vision Benchmark Suite demonstrate that MVO achieves good estimation accuracy compared to similar approaches and is applicable to a variety of multimotion estimation challenges.
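To illustrate why third-party motions are entangled with the egomotion, the following minimal sketch composes an object's world-frame motion with the camera poses to obtain the motion as seen from the camera, and then undoes that composition. It is not taken from MVO: the helper names (`se3`, `observed_motion`, `true_motion`), the frame conventions (camera-to-world poses, object motions applied to world-frame points), and the yaw-only rotation are all assumptions made for illustration.

```python
import math

def se3(yaw, t):
    """Homogeneous 4x4 transform: rotation by `yaw` about z plus translation t.
    Yaw-only rotation is an illustrative simplification."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, t[0]],
            [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b for 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def inverse(T):
    """Inverse of a rigid transform: [R^T, -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [Rt[0] + [t[0]], Rt[1] + [t[1]], Rt[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def observed_motion(T_wc_prev, T_wc_next, T_obj):
    """Motion of object points expressed in the camera frame.
    Assumes p_w' = T_obj p_w and p_c = inverse(T_wc) p_w."""
    return compose(inverse(T_wc_next), compose(T_obj, T_wc_prev))

def true_motion(T_wc_prev, T_wc_next, T_obs):
    """Recover the world-frame object motion once the egomotion is known."""
    return compose(T_wc_next, compose(T_obs, inverse(T_wc_prev)))

def close(a, b, tol=1e-9):
    """Element-wise comparison of two 4x4 transforms."""
    return all(abs(a[i][j] - b[i][j]) <= tol
               for i in range(4) for j in range(4))

# Hypothetical camera poses and object motion, chosen only for the example.
T_wc_prev = se3(0.1, [1.0, 0.0, 0.0])
T_wc_next = se3(0.3, [1.5, 0.2, 0.0])
T_obj = se3(-0.2, [0.3, 0.1, 0.05])

T_obs = observed_motion(T_wc_prev, T_wc_next, T_obj)
T_rec = true_motion(T_wc_prev, T_wc_next, T_obs)
```

Under these conventions a static point (identity `T_obj`) still appears to move by the inverse of the egomotion, which is why the two estimates cannot be treated independently.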
Abstract: Visual motion estimation is an integral and well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation, which is especially challenging in highly dynamic environments. Such environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Previous work in multiple object tracking focuses on maintaining the integrity of object tracks but usually relies on specific appearance-based descriptors or constrained motion models. These approaches are very effective in specific applications but do not generalize to the full multimotion estimation problem. This paper extends the multimotion visual odometry (MVO) pipeline to estimate multiple motions through occlusion, including the camera egomotion, by employing physically founded motion priors. This allows the pipeline to consistently estimate the full trajectory of every motion in a scene and recognize when temporarily occluded motions become unoccluded. The estimation performance of the pipeline is evaluated on real-world data from the Oxford Multimotion Dataset.
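The idea of extrapolating a motion through occlusion with a motion prior can be sketched as follows. The abstract does not specify the prior, so this example assumes the simplest physically motivated choice, a constant body-frame velocity: the last observed inter-frame motion is repeated while the object is hidden. The reappearance check (`is_same_motion`) and its distance threshold are hypothetical stand-ins, not the paper's method, and the planar poses are an illustrative simplification.

```python
import math

def se3_planar(yaw, x, y):
    """Homogeneous 4x4 pose with rotation about z and translation in the plane."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b for 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def inverse(T):
    """Inverse of a rigid transform: [R^T, -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [Rt[0] + [t[0]], Rt[1] + [t[1]], Rt[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def extrapolate(T_prev, T_last, steps):
    """Constant-velocity assumption: repeat the last inter-frame motion
    (expressed in the body frame) for `steps` occluded frames."""
    delta = compose(inverse(T_prev), T_last)
    T = T_last
    for _ in range(steps):
        T = compose(T, delta)
    return T

def is_same_motion(T_pred, T_new, max_dist=0.5):
    """Hypothetical reappearance test: accept a newly observed motion if its
    position lies within `max_dist` of the extrapolated pose."""
    d = math.sqrt(sum((T_pred[i][3] - T_new[i][3]) ** 2 for i in range(3)))
    return d <= max_dist

# An object last seen at x = 0 and x = 1, then occluded for three frames.
T_pred = extrapolate(se3_planar(0.0, 0.0, 0.0), se3_planar(0.0, 1.0, 0.0), 3)
reappeared = is_same_motion(T_pred, se3_planar(0.0, 4.2, 0.1))
unrelated = is_same_motion(T_pred, se3_planar(0.0, 9.0, 0.0))
```

A fixed distance gate is only a placeholder; a fuller treatment would also compare orientation and velocity and would weight the comparison by the uncertainty that accumulates during the occlusion.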
Abstract: Datasets advance research by posing challenging new problems and providing standardized methods of algorithm comparison. High-quality datasets exist for many important problems in robotics and computer vision, including egomotion estimation and motion/scene segmentation, but not for techniques that estimate every motion in a scene. Metric evaluation of these multimotion estimation techniques requires datasets consisting of multiple, complex motions that also contain ground truth for every moving body. The Oxford Multimotion Dataset provides a number of multimotion estimation problems of varying complexity. It includes both complex problems that challenge existing algorithms and a number of simpler problems to support development. These include observations from both static and dynamic sensors, a varying number of moving bodies, and a variety of different 3D motions. It also provides a number of experiments designed to isolate specific challenges of the multimotion problem, including rotation about the optical axis and occlusion. In total, the Oxford Multimotion Dataset contains over 110 minutes of multimotion data consisting of stereo and RGB-D camera images, IMU data, and Vicon ground-truth trajectories. The dataset culminates in a complex toy car segment representative of many challenging real-world scenarios. This paper describes each experiment with a focus on its relevance to the multimotion estimation problem.
Abstract: Estimating motion from images is a well-studied problem in computer vision and robotics. Previous work has developed techniques to estimate the motion of a moving camera in a largely static environment (e.g., visual odometry) and to segment or track motions in a dynamic scene using known camera motions (e.g., multiple object tracking). It is more challenging to estimate the unknown motion of the camera and the dynamic scene simultaneously. Most previous work either requires a priori object models (e.g., tracking-by-detection) or motion constraints (e.g., planar motion), or fails to estimate the full SE(3) motions of the scene (e.g., scene flow). While these approaches work well in specific application domains, they do not generalize to unconstrained motions. This paper extends the traditional visual odometry (VO) pipeline to estimate the full SE(3) motion of both a stereo/RGB-D camera and the dynamic scene. This multimotion visual odometry (MVO) pipeline requires no a priori knowledge of the environment or the dynamic objects. Its performance is evaluated on a real-world dynamic dataset with ground truth for all motions from a motion capture system.