Abstract:For orthogonal multiple access (OMA) systems, the number of served user equipments (UEs) is limited to the number of available orthogonal resources. On the other hand, non-orthogonal multiple access (NOMA) schemes allow multiple UEs to use the same orthogonal resource. This extra degree of freedom introduces new challenges for resource allocation. Buffer state information (BSI), like the size and age of packets waiting for transmission, can be used to improve scheduling in OMA systems. In this paper, we investigate the impact of BSI on the performance of a centralized scheduler in an uplink multi-carrier NOMA scenario with UEs having various data rate and latency requirements. To handle the large combinatorial space of allocating UEs to the resources, we propose a novel scheduler based on actor-critic reinforcement learning incorporating BSI. Training and evaluation are carried out using Nokia's "wireless suite". We propose various novel techniques to both stabilize and speed up training. The proposed scheduler outperforms benchmark schedulers.