Navigation applications relying on the Global Navigation Satellite System (GNSS) are limited in indoor environments and GNSS-denied outdoor terrains such as dense urban or forests. In this paper, we present a novel accurate, robust and low-cost GNSS-independent navigation system, which is composed of a monocular camera and Ultra-wideband (UWB) transceivers. Visual techniques have gained excellent results when computing the incremental motion of the sensor, and UWB methods have proved to provide promising localization accuracy due to the high time resolution of the UWB ranging signals. However, the monocular visual techniques with scale ambiguity are not suitable for applications requiring metric results, and UWB methods assume that the positions of the UWB transceiver anchor are pre-calibrated and known, thus precluding their application in unknown and challenging environments. To this end, we advocate leveraging the monocular camera and UWB to create a map of visual features and UWB anchors. We propose a visual-UWB Simultaneous Localization and Mapping (SLAM) algorithm which tightly combines visual and UWB measurements to form a joint non-linear optimization problem on Lie-Manifold. The 6 Degrees of Freedom (DoF) state of the vehicles and the map are estimated by minimizing the UWB ranging errors and landmark reprojection errors. Our navigation system starts with an exploratory task which performs the real-time visual-UWB SLAM to obtain the global map, then the navigation task by reusing this global map. The tasks can be performed by different vehicles in terms of equipped sensors and payload capability in a heterogeneous team. We validate our system on the public datasets, achieving typical centimeter accuracy and 0.1% scale error.