Achieving low-duty-cycle operation in low-power wireless networks in urban environments is made challenging by the complex and variable dynamics of external interference and fading. We explore the use of reinforcement learning for energy-efficient channel selection. The learning relies on a hybrid of passive channel sampling, to cope with external interference, and active channel sampling, to cope with fading. Our solution, Passive-Active Multi-armed bandit for LoRa (PAMLR, pronounced "Pamela"), balances the two types of samples to achieve energy-efficient channel selection: active channel measurements are tuned to an appropriately low rate to update noise thresholds, while, to compensate, passive channel measurements are tuned to an appropriately high rate so that the best channels can be selected during channel exploration using those noise thresholds. The rates of both types of samples are further adapted in response to channel dynamics. Based on extensive testing in multiple environments across different cities, we validate that PAMLR maintains excellent communication quality, as demonstrated by low SNR regret relative to the optimal channel-allocation policy, while substantially reducing the energy cost of channel measurements.
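
To make the hybrid sampling idea concrete, the following is a minimal epsilon-greedy sketch, not PAMLR's actual algorithm: cheap passive RSSI listens maintain a per-channel interference estimate, while occasional costly active probes refresh per-channel noise thresholds that stand in for fading. All names and parameters (`HybridChannelBandit`, `active_rate`, `epsilon`, `alpha`) are illustrative assumptions, and the rate-adaptation logic described above is reduced here to fixed rates for brevity.

```python
import random


class HybridChannelBandit:
    """Illustrative hybrid passive/active bandit for channel selection.

    Arms are channels. Passive listens (cheap) track external interference;
    rare active probes (costly) refresh per-channel noise thresholds that
    capture fading. This is a sketch, not PAMLR's published algorithm.
    """

    def __init__(self, n_channels, active_rate=0.05, epsilon=0.1, alpha=0.2):
        self.n = n_channels
        self.active_rate = active_rate  # fraction of rounds spent on costly probes
        self.epsilon = epsilon          # exploration rate across channels
        self.alpha = alpha              # EWMA weight for both estimators
        self.noise_threshold = [-120.0] * n_channels  # dBm, from active probes
        self.interference = [0.0] * n_channels        # dB above threshold, passive

    def observe_passive(self, ch, rssi_dbm):
        """Fold a passive RSSI listen into the channel's interference estimate."""
        excess = max(0.0, rssi_dbm - self.noise_threshold[ch])
        self.interference[ch] += self.alpha * (excess - self.interference[ch])

    def observe_active(self, ch, noise_dbm):
        """Fold a costly active probe into the channel's noise threshold."""
        self.noise_threshold[ch] += self.alpha * (noise_dbm - self.noise_threshold[ch])

    def next_action(self):
        """Choose a rare active probe, or a passive pick of a promising channel."""
        if random.random() < self.active_rate:
            return "active", random.randrange(self.n)   # refresh fading estimate
        if random.random() < self.epsilon:
            return "passive", random.randrange(self.n)  # explore other channels
        return "passive", min(range(self.n), key=self.interference.__getitem__)


# Toy run: channel 2 is quiet, the others suffer external interference.
random.seed(0)
bandit = HybridChannelBandit(n_channels=4)
for _ in range(500):
    kind, ch = bandit.next_action()
    if kind == "active":
        bandit.observe_active(ch, noise_dbm=-120.0 + random.gauss(0, 2))
    else:
        rssi = -118.0 if ch == 2 else -95.0 + random.gauss(0, 3)
        bandit.observe_passive(ch, rssi)
print("preferred channel:", min(range(4), key=bandit.interference.__getitem__))
```

The split mirrors the balance described above: `active_rate` is kept low because probes dominate the energy budget, while the high volume of passive listens supplies enough data to rank channels against the actively maintained noise thresholds.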