In this paper, we present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras, which has a 360 degrees coverage of stereo observations of the environment. For more practical and accurate reconstruction, we first introduce improved and light-weighted deep neural networks for the omnidirectional depth estimation, which are faster and more accurate than the existing networks. Second, we integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency. Using the estimated depth map, we reproject keypoints onto each other view, which leads to a better and more efficient feature matching process. Finally, we fuse the omnidirectional depth maps and the estimated rig poses into the truncated signed distance function (TSDF) volume to acquire a 3D map. We evaluate our method on synthetic datasets with ground-truth and real-world sequences of challenging environments, and the extensive experiments show that the proposed system generates excellent reconstruction results in both synthetic and real-world environments.