Recent advancements in the Internet of Things (IoT) have brought about a surge of interest in indoor positioning, with the aim of providing reliable, accurate, and energy-efficient indoor navigation/localization systems. Ultra-Wideband (UWB) technology has emerged as a potential candidate to satisfy these requirements. Although UWB technology can enhance the accuracy of indoor positioning owing to its use of a wide frequency spectrum, key challenges remain for its efficient implementation. On the one hand, achieving high positioning precision relies on the identification/mitigation of Non-Line-of-Sight (NLoS) links, which significantly increases the complexity of the localization framework. On the other hand, UWB beacons have a limited battery life, which is especially problematic in practical circumstances where certain beacons are located in strategic positions. To address these challenges, we introduce an efficient node selection framework that enhances location accuracy without using complex NLoS mitigation methods, while balancing the remaining battery life across UWB beacons. In the proposed framework, referred to as the Deep Q-Learning Energy-optimized LoS/NLoS (DQLEL) UWB node selection framework, the mobile user is autonomously trained to determine the optimal pair of UWB beacons for its localization based on the 2-D Time Difference of Arrival (TDoA) framework. The effectiveness of the proposed DQLEL framework is evaluated in terms of the link condition, the deviation of the remaining battery life of UWB beacons, location error, and cumulative rewards. Based on the simulation results, the proposed DQLEL framework significantly outperforms its counterparts across the aforementioned aspects.