Abstract: Visible images have been widely used for indoor motion estimation. Thermal images, in contrast, are more challenging to use for motion estimation because they typically have lower resolution, less texture, and more noise. In this paper, a novel dataset for evaluating the performance of multi-spectral motion estimation systems is presented. The dataset includes both multi-spectral and dense depth images, with accurate ground-truth camera poses provided by a motion capture system. All sequences are recorded with a handheld multi-spectral device consisting of a standard visible-light camera, a long-wave infrared camera, and a depth camera. The multi-spectral images, comprising color and thermal images at full sensor resolution (640 $\times$ 480), are obtained from the hardware-synchronized standard and long-wave infrared cameras at 32 Hz. The depth images are captured by a Microsoft Kinect2 and can benefit the learning of cross-modality stereo matching. In addition to sequences with bright illumination, the dataset also contains scenes with dim or varying illumination. The full dataset, including raw data and calibration data with detailed specifications of the data format, is publicly available.
Abstract: Multi-spectral sensors consisting of a standard (visible-light) camera and a long-wave infrared camera can simultaneously provide both visible and thermal images. Since thermal images are independent of environmental illumination, they can help to overcome certain limitations of standard cameras under complicated illumination conditions. However, because the two types of cameras sense different information sources, their images usually share very little texture similarity. Hence, traditional texture-based feature matching methods cannot be directly applied to obtain stereo correspondences. To tackle this problem, a multi-spectral visual odometry method without explicit stereo matching is proposed in this paper. Bundle adjustment of multi-view stereo is performed on the visible and the thermal images using direct image alignment. Scale drift is avoided through additional temporal observations of map points with the fixed-baseline stereo. Experimental results indicate that the proposed method provides accurate visual odometry results with recovered metric scale. Moreover, the proposed method also provides a semi-dense metric 3D reconstruction with multi-spectral information, which is not available from existing multi-spectral methods.
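As an illustration of the direct image alignment mentioned above, the following is a minimal sketch of a single photometric residual, assuming a pinhole camera model and a pixel with known depth. It is not the paper's implementation: the function names (`back_project`, `project`, `bilinear`, `photometric_residual`) and all parameters are hypothetical, and the sketch compares two images of the same modality, since the abstract does not specify how visible and thermal intensities are related in the cost.

```python
import numpy as np

def back_project(K, u, depth):
    """Lift pixel u = (col, row) with known depth to a 3D point in the camera frame."""
    return depth * (np.linalg.inv(K) @ np.array([u[0], u[1], 1.0]))

def project(K, X):
    """Pinhole projection of a 3D point X (camera frame) to pixel coordinates (col, row)."""
    x = K @ (X / X[2])
    return x[:2]

def bilinear(img, u):
    """Sample img at sub-pixel location u = (col, row); u must lie inside the image interior."""
    x, y = u
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0]
            + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0]
            + ax * ay * img[y0 + 1, x0 + 1])

def photometric_residual(img_ref, img_tgt, K_ref, K_tgt, R, t, u, depth):
    """Intensity difference between a reference pixel and its warp into a target image.

    R, t map points from the reference camera frame to the target camera frame.
    Assumption: both images are of the same modality, so intensities are comparable.
    """
    X_ref = back_project(K_ref, u, depth)   # 3D point in the reference frame
    X_tgt = R @ X_ref + t                   # same point in the target frame
    u_tgt = project(K_tgt, X_tgt)           # where it lands in the target image
    return bilinear(img_tgt, u_tgt) - float(img_ref[u[1], u[0]])
```

In a direct method, residuals of this kind are summed over many pixels and frames and minimized over poses and depths. If the target camera is related to the reference camera by a known, fixed baseline, `t` carries metric units, which is one way such a formulation can constrain scale.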