Modern communication systems need to fulfill multiple, often conflicting objectives at the same time. In particular, new applications require high reliability while operating at low transmit powers. Moreover, reliability constraints may vary over time depending on the current state of the system. One way to address this problem is to use joint transmissions from multiple base stations (BSs) to meet the reliability requirements. However, this approach is inefficient in terms of total transmit power. In this work, we propose a reinforcement learning-based power allocation scheme for an unmanned aerial vehicle (UAV) communication system with time-varying communication reliability requirements. Specifically, the proposed scheme aims to minimize the total transmit power of all BSs while keeping the outage probability below a tolerated threshold. This threshold varies over time, e.g., when the UAV enters a critical zone with high reliability requirements. Our results show that the proposed learning scheme adapts the power allocation dynamically to meet the varying reliability requirements, thus effectively conserving energy.