Yaming Yu

Thompson Sampling in Dynamic Systems for Contextual Bandit Problems

Oct 17, 2013

Tianbing Xu, Yaming Yu, John Turner, Amelia Regan

Figures 1-4 for Thompson Sampling in Dynamic Systems for Contextual Bandit Problems

Abstract: We consider multi-armed bandit problems in a time-varying dynamic system with rich structural features. For the nonlinear dynamic model, we propose approximate inference for the posterior distributions based on the Laplace approximation. For contextual bandit problems, Thompson Sampling is adopted based on the underlying posterior distributions of the parameters. More specifically, we introduce discount decay on the impact of previous samples and analyze different decay rates against the underlying sample dynamics. Consequently, the trade-off between exploration and exploitation adapts to the dynamics of the system.
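
To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a per-arm logistic reward model, a Gaussian posterior obtained by a Laplace approximation (a few Newton steps per observation), and a discount that shrinks each arm's accumulated precision toward the prior before every update. The class name `DiscountedLaplaceTS`, the parameters `discount` and `prior_precision`, and the specific discounting rule are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


class DiscountedLaplaceTS:
    """Illustrative Thompson Sampling agent (hypothetical, not the paper's method).

    Each arm keeps a Gaussian posterior N(mean, precision^-1) over logistic
    regression weights, refreshed by a Laplace approximation after each reward.
    """

    def __init__(self, n_arms, dim, discount=0.95, prior_precision=1.0, seed=0):
        self.discount = discount                # assumed decay rate in (0, 1]
        self.prior_precision = prior_precision  # assumed isotropic prior
        self.mean = np.zeros((n_arms, dim))
        self.precision = np.tile(prior_precision * np.eye(dim), (n_arms, 1, 1))
        self.rng = np.random.default_rng(seed)

    def select(self, x):
        """Sample weights from each arm's posterior and play the best sampled score."""
        scores = []
        for m, P in zip(self.mean, self.precision):
            # If P = L L^T, then m + L^-T z has covariance P^-1 for z ~ N(0, I).
            L = np.linalg.cholesky(P)
            w = m + np.linalg.solve(L.T, self.rng.standard_normal(len(m)))
            scores.append(w @ x)
        return int(np.argmax(scores))

    def update(self, arm, x, reward, newton_steps=5):
        """Discount the old posterior, then apply a Laplace update for one sample."""
        dim = len(x)
        # Assumed discounting rule: old evidence decays toward the prior precision.
        prior_P = (self.discount * self.precision[arm]
                   + (1.0 - self.discount) * self.prior_precision * np.eye(dim))
        prior_m = self.mean[arm].copy()

        # Newton iterations to find the mode of the negative log posterior.
        w = prior_m.copy()
        for _ in range(newton_steps):
            p = 1.0 / (1.0 + np.exp(-(w @ x)))
            grad = prior_P @ (w - prior_m) - (reward - p) * x
            hess = prior_P + p * (1.0 - p) * np.outer(x, x)
            w = w - np.linalg.solve(hess, grad)

        # Laplace approximation: Gaussian centred at the mode, precision = Hessian.
        p = 1.0 / (1.0 + np.exp(-(w @ x)))
        self.mean[arm] = w
        self.precision[arm] = prior_P + p * (1.0 - p) * np.outer(x, x)


agent = DiscountedLaplaceTS(n_arms=2, dim=2)
context = np.array([1.0, 0.5])
chosen = agent.select(context)
agent.update(chosen, context, reward=1)
```

Under these assumptions, a `discount` close to 1 retains old evidence and suits slowly drifting rewards, while a smaller value keeps the posterior wider, so the sampled weights vary more and the agent explores more.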

* 22 pages, 10 figures