In this paper, we investigate the age of information (AoI) of a power-domain non-orthogonal multiple access (NOMA) network in which multiple Internet of Things (IoT) devices transmit to a common gateway in a grant-free, random-access fashion. More specifically, we consider a framed setup composed of multiple time slots and use the $Q$-learning algorithm to determine, in a distributed manner, the time slot and the power level with which each IoT device transmits within a frame. In the proposed AoI-QL-NOMA scheme, the $Q$-learning reward is adapted to minimize the average AoI of the network, while requiring only a single feedback bit per time slot, on a frame basis. Our results show that AoI-QL-NOMA significantly improves the AoI performance compared to recently proposed schemes, without significantly reducing the network throughput.
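To make the mechanism concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this paper. Each device keeps a stateless $Q$-table over (time slot, power level) pairs, selects one pair per frame, and updates its table from the single feedback bit of its chosen slot. Several elements are assumptions introduced only for illustration: the function name `simulate`, the $\epsilon$-greedy exploration, the collision model (a slot is decoded only if all devices in it chose distinct power levels), the generate-at-will traffic with AoI counted in frames, and the AoI-weighted reward with cap `aoi_cap`.

```python
import random


def simulate(num_devices=4, num_slots=2, num_powers=2, frames=2000,
             alpha=0.1, epsilon=0.1, aoi_cap=10, seed=0):
    """Toy distributed Q-learning over (slot, power) choices; returns (mean AoI, Q-tables)."""
    rng = random.Random(seed)
    # Action space of every device: all (slot, power level) pairs in a frame.
    actions = [(s, p) for s in range(num_slots) for p in range(num_powers)]
    q = [[0.0] * len(actions) for _ in range(num_devices)]
    aoi = [1] * num_devices  # AoI in frames (assumed generate-at-will traffic)
    aoi_sum = 0

    for _ in range(frames):
        # Each device picks an action on its own (epsilon-greedy, assumed).
        choice = []
        for d in range(num_devices):
            if rng.random() < epsilon:
                choice.append(rng.randrange(len(actions)))
            else:
                best = max(q[d])
                ties = [i for i, v in enumerate(q[d]) if v == best]
                choice.append(rng.choice(ties))

        # Gateway side: one feedback bit per slot. Assumed model: the slot is
        # decoded only if every device in it used a distinct power level.
        slot_ok = []
        for s in range(num_slots):
            powers = [actions[c][1] for c in choice if actions[c][0] == s]
            slot_ok.append(len(powers) == len(set(powers)))

        for d in range(num_devices):
            ok = slot_ok[actions[choice[d]][0]]
            # Hypothetical AoI-aware reward: outcomes matter more for stale devices.
            weight = min(aoi[d], aoi_cap) / aoi_cap
            reward = weight if ok else -weight
            # Stateless Q-learning update toward the observed reward.
            q[d][choice[d]] += alpha * (reward - q[d][choice[d]])
            # AoI resets on a successful delivery, otherwise grows by one frame.
            aoi[d] = 1 if ok else aoi[d] + 1
            aoi_sum += aoi[d]

    return aoi_sum / (frames * num_devices), q


if __name__ == "__main__":
    mean_aoi, _ = simulate()
    print(f"mean AoI (frames): {mean_aoi:.3f}")
```

In this sketch, the only information a device needs from the gateway is the success bit of the slot it used, which mirrors the single-bit-per-slot feedback described above. The reward shape and the collision model are placeholders and should be replaced by the definitions given in the body of the paper.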