Abstract: This paper proposes an adaptive channel contention mechanism to optimize the queuing performance of a distributed millimeter-wave (mmWave) uplink system with environment and mobility sensing capability. The mobile agents determine their back-off timer parameters according to their local knowledge of the uplink queue lengths, channel quality, and future channel statistics, where the channel prediction relies on environment and mobility sensing. The optimization of queuing performance under this adaptive channel contention mechanism is formulated as a decentralized multi-agent Markov decision process (MDP). Although the channel contention actions are determined locally at the mobile agents, the local channel contention policies of all mobile agents are optimized in a centralized manner according to the system statistics before scheduling begins. In the solution, the local policies are approximated by analytical models, and the optimization of their parameters becomes a stochastic optimization problem along an adaptive Markov chain. An unbiased gradient estimator is proposed so that the local policies can be optimized efficiently via stochastic gradient descent. Simulations demonstrate that the proposed gradient estimator optimizes the policies significantly more efficiently than existing methods, e.g., simultaneous perturbation stochastic approximation (SPSA).
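To make the optimization structure concrete, the sketch below shows one generic way to obtain an unbiased gradient estimate for a parameterized back-off policy via the score-function (REINFORCE-style) method and apply stochastic gradient descent along a simulated queuing Markov chain. This is a minimal illustrative toy, not the paper's actual estimator or system model: the softmax policy, arrival rate, service model, and all parameter values are assumptions chosen only for a runnable example.

```python
# Hypothetical sketch: score-function gradient estimation for a parameterized
# back-off timer policy, optimized by SGD along a simulated queuing chain.
# All dynamics and constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def backoff_probs(theta, queue_len):
    """Softmax policy over back-off timer indices {0..K-1}; a longer queue
    biases the agent toward shorter timers."""
    K = len(theta)
    logits = theta - 0.1 * queue_len * np.arange(K)
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

def rollout(theta, T=200):
    """Simulate T slots; return average queue length and the summed score."""
    queue, cost, score = 0, 0.0, np.zeros_like(theta)
    for _ in range(T):
        queue += rng.poisson(0.4)                 # packet arrivals
        p = backoff_probs(theta, queue)
        a = rng.choice(len(p), p=p)               # sampled back-off timer index
        grad_log = -p                             # d/dtheta log p(a) for softmax
        grad_log[a] += 1.0
        score += grad_log
        if rng.random() < 1.0 / (a + 1):          # shorter timer wins the slot more often
            queue = max(queue - 1, 0)             # one packet served
        cost += queue
    return cost / T, score

theta = np.zeros(5)
baseline = 0.0
for step in range(2000):
    c, score = rollout(theta)
    grad = (c - baseline) * score                 # unbiased REINFORCE-style estimate
    theta -= 0.001 * grad                         # stochastic gradient descent step
    baseline = 0.99 * baseline + 0.01 * c         # variance-reduction baseline
    if step % 500 == 0:
        print(f"step {step}: avg queue length {c:.2f}")
```

Unlike SPSA, which perturbs all parameters simultaneously and differences two noisy evaluations, a score-function estimator of this kind extracts a gradient from a single trajectory, which is one plausible source of the efficiency gap the abstract reports.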
Abstract: In this paper, a reinforcement-learning-based scheduling framework is proposed and implemented to optimize the application-layer quality of service (QoS) of a practical wireless local area network (WLAN) suffering from unknown interference. Specifically, application-layer tasks of file delivery and delay-sensitive communication, e.g., screen projection, in a WLAN with the enhanced distributed channel access (EDCA) mechanism are jointly scheduled by adjusting the contention window sizes and the application-layer throughput limit, such that their QoS, including the throughput of file delivery and the round-trip time of the delay-sensitive communication, can be optimized. Due to the unknown interference and the vendor-dependent implementation of the network interface card, the relation between the scheduling policy and the system QoS is unknown. Hence, a reinforcement learning method is proposed, in which a novel Q-network is trained to map the historical scheduling parameters and QoS observations to the current scheduling action. It is demonstrated on a testbed that the proposed framework achieves significantly better QoS than the conventional EDCA mechanism.
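The sketch below illustrates the kind of Q-network the abstract describes: a network that takes a history of past scheduling parameters and QoS observations and outputs a Q-value for each discrete joint scheduling action (a contention-window size paired with a throughput cap). The history length, feature set, layer sizes, and candidate action values are all assumptions for illustration; the paper's actual architecture and action space are not specified here.

```python
# Hypothetical sketch of a Q-network mapping an observation history to a
# joint (contention window, throughput cap) scheduling action.
# Dimensions and candidate values are illustrative assumptions.
import torch
import torch.nn as nn

H = 8            # history length (past scheduling rounds)
OBS_DIM = 4      # per-round features: CW size, throughput cap, file throughput, RTT
CW_CHOICES = [15, 31, 63, 127]       # candidate contention-window sizes
CAP_CHOICES = [10.0, 20.0, 40.0]     # candidate throughput caps (Mbps)
N_ACTIONS = len(CW_CHOICES) * len(CAP_CHOICES)

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                       # (B, H, OBS_DIM) -> (B, H*OBS_DIM)
            nn.Linear(H * OBS_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS),          # one Q-value per joint action
        )

    def forward(self, history):
        return self.net(history)

def decode_action(idx):
    """Map a flat action index back to a (CW size, throughput cap) pair."""
    return CW_CHOICES[idx // len(CAP_CHOICES)], CAP_CHOICES[idx % len(CAP_CHOICES)]

q = QNetwork()
history = torch.randn(1, H, OBS_DIM)            # placeholder observation history
action = q(history).argmax(dim=1).item()        # greedy action selection
cw, cap = decode_action(action)
print(f"apply CW={cw}, throughput cap={cap} Mbps")
```

Conditioning on a window of past parameter/QoS pairs, rather than a single observation, is what lets such a network cope with the unknown interference and vendor-dependent NIC behavior: the history acts as a proxy state for dynamics that cannot be measured directly.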