Cellular-connected unmanned aerial vehicle (UAV) is a promising technology to unlock the full potential of UAVs in the future. However, how to achieve ubiquitous three-dimensional (3D) communication coverage for the UAVs in the sky is a new challenge. In this paper, we tackle this challenge by a new coverage-aware navigation approach, which exploits the UAV's controllable mobility to design its navigation/trajectory to avoid the cellular BSs' coverage holes while accomplishing their missions. We formulate an UAV trajectory optimization problem to minimize the weighted sum of its mission completion time and expected communication outage duration, and propose a new solution approach based on the technique of deep reinforcement learning (DRL). To further improve the performance, we propose a new framework called simultaneous navigation and radio mapping (SNARM), where the UAV's signal measurement is used not only for training the deep Q network (DQN) directly, but also to create a radio map that is able to predict the outage probabilities at all locations in the area of interest. This thus enables the generation of simulated UAV trajectories and predicting their expected returns, which are then used to further train the DQN via Dyna technique, thus greatly improving the learning efficiency.