Multi-spectral sensors consisting of a standard (visible-light) camera and a long-wave infrared camera can simultaneously provide both visible and thermal images. Since thermal images are independent from environmental illumination, they can help to overcome certain limitations of standard cameras under complicated illumination conditions. However, due to the difference in the information source of the two types of cameras, their images usually share very low texture similarity. Hence, traditional texture-based feature matching methods cannot be directly applied to obtain stereo correspondences. To tackle this problem, a multi-spectral visual odometry method without explicit stereo matching is proposed in this paper. Bundle adjustment of multi-view stereo is performed on the visible and the thermal images using direct image alignment. Scale drift can be avoided by additional temporal observations of map points with the fixed-baseline stereo. Experimental results indicate that the proposed method can provide accurate visual odometry results with recovered metric scale. Moreover, the proposed method can also provide a metric 3D reconstruction in semi-dense density with multi-spectral information, which is not available from existing multi-spectral methods.