We study a novel multi-armed bandit (MAB) setting that requires the agent to probe all the arms periodically in a non-stationary environment. In particular, we develop \texttt{TS-GE}, which balances the regret guarantees of classical Thompson sampling (TS) with broadcast probing (BP) of all the arms simultaneously in order to actively detect a change in the reward distributions. Once a system-level change is detected, the changed arm is identified by an optional subroutine called group exploration (GE), which scales as $\log_2(K)$ for a $K$-armed bandit setting. We characterize the probability of missed detection and the probability of false alarm in terms of the environment parameters. The change-detection latency is upper bounded by $\sqrt{T}$, and within a period of $\sqrt{T}$, all the arms are probed at least once. We highlight the conditions under which the regret guarantee of \texttt{TS-GE} outperforms that of state-of-the-art algorithms, in particular \texttt{ADSWITCH} and \texttt{M-UCB}. Furthermore, unlike existing bandit algorithms, \texttt{TS-GE} can be deployed for applications such as timely status updates, critical control, and wireless energy transfer, which are essential features of next-generation wireless communication networks. We demonstrate the efficacy of \texttt{TS-GE} by employing it in an industrial Internet-of-Things (IIoT) network designed for simultaneous wireless information and power transfer (SWIPT).