Ultra-Wideband (UWB) is one of the key technologies empowering the Internet of Thing (IoT) concept to perform reliable, energy-efficient, and highly accurate monitoring, screening, and localization in indoor environments. Performance of UWB-based localization systems, however, can significantly degrade because of Non Line of Sight (NLoS) connections between a mobile user and UWB beacons. To mitigate the destructive effects of NLoS connections, we target development of a Reinforcement Learning (RL) anchor selection framework that can efficiently cope with the dynamic nature of indoor environments. Existing RL models in this context, however, lack the ability to generalize well to be used in a new setting. Moreover, it takes a long time for the conventional RL models to reach the optimal policy. To tackle these challenges, we propose the Jump-start RL-based Uwb NOde selection (JUNO) framework, which performs real-time location predictions without relying on complex NLoS identification/mitigation methods. The effectiveness of the proposed JUNO framework is evaluated in term of the location error, where the mobile user moves randomly through an ultra-dense indoor environment with a high chance of establishing NLoS connections. Simulation results corroborate the effectiveness of the proposed framework in comparison to its state-of-the-art counterparts.