Kapilan Balagopalan

Fixing the Loose Brake: Exponential-Tailed Stopping Time in Best Arm Identification

Nov 04, 2024

Kapilan Balagopalan, Tuan Ngo Nguyen, Yao Zhao, Kwang-Sung Jun

Abstract: The best arm identification problem requires identifying the best alternative (i.e., arm) in active experimentation using the smallest number of experiments (i.e., arm pulls), which is crucial for cost-efficient and timely decision-making. In the fixed-confidence setting, an algorithm must stop in a data-dependent manner and return the estimated best arm with a correctness guarantee. Since this stopping time is random, we desire its distribution to have light tails. Unfortunately, many existing studies focus on high-probability or in-expectation bounds on the stopping time, which allow heavy tails and, in the case of high-probability bounds, even never stopping at all. We first prove that this never-stopping event can indeed happen for some popular algorithms. Motivated by this, we propose algorithms that provably enjoy an exponential-tailed stopping time, which improves upon the polynomial tail bound reported by Kalyanakrishnan et al. (2012). The first algorithm is based on a fixed-budget algorithm called Sequential Halving along with a doubling trick. The second is a meta-algorithm that takes any fixed-confidence algorithm with a high-probability stopping guarantee and turns it into one that enjoys an exponential-tailed stopping time. Our results imply that there is much more to be desired from contemporary fixed-confidence algorithms.
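To make the "Sequential Halving along with a doubling trick" idea concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not state the stopping rule, so `should_stop` is a hypothetical, caller-supplied placeholder, and the names `pull`, `initial_budget`, and `max_phases` are assumptions made for illustration. The elimination schedule follows the standard textbook form of Sequential Halving (split the budget evenly across about log2(K) rounds and discard the worse half of the arms each round).

```python
import math
from typing import Callable, Tuple


def sequential_halving(pull: Callable[[int], float], num_arms: int, budget: int) -> int:
    """Fixed-budget Sequential Halving: spend roughly `budget` pulls, return one arm."""
    survivors = list(range(num_arms))
    num_rounds = max(1, math.ceil(math.log2(num_arms)))
    for _ in range(num_rounds):
        if len(survivors) == 1:
            break
        # Split this round's share of the budget evenly over the surviving arms.
        pulls_per_arm = max(1, budget // (len(survivors) * num_rounds))
        means = {
            arm: sum(pull(arm) for _ in range(pulls_per_arm)) / pulls_per_arm
            for arm in survivors
        }
        # Keep the better half (rounded up) by empirical mean.
        survivors.sort(key=lambda arm: means[arm], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


def doubling_sequential_halving(
    pull: Callable[[int], float],
    num_arms: int,
    initial_budget: int,
    should_stop: Callable[[int, int], bool],  # hypothetical stopping rule (assumption)
    max_phases: int = 20,
) -> Tuple[int, int]:
    """Rerun Sequential Halving with budgets B, 2B, 4B, ... until `should_stop` fires."""
    budget = initial_budget
    candidate = 0
    for _ in range(max_phases):
        candidate = sequential_halving(pull, num_arms, budget)
        if should_stop(candidate, budget):
            break
        budget *= 2
    return candidate, budget
```

A caller would supply `pull(arm)` to draw one reward from the chosen arm and a `should_stop(candidate, budget)` check of their own. The exponential-tail guarantee in the abstract depends on the paper's actual stopping condition, which this sketch does not reproduce.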

Minimum Empirical Divergence for Sub-Gaussian Linear Bandits

Oct 31, 2024

Kapilan Balagopalan, Kwang-Sung Jun

Abstract: We propose a novel linear bandit algorithm called LinMED (Linear Minimum Empirical Divergence), which is a linear extension of the MED algorithm that was originally designed for multi-armed bandits. LinMED is a randomized algorithm that admits a closed-form computation of the arm sampling probabilities, unlike the popular randomized algorithm called linear Thompson sampling. Such a feature proves useful for off-policy evaluation, where unbiased evaluation requires accurately computing the sampling probability. We prove that LinMED enjoys a near-optimal regret bound of $d\sqrt{n}$ up to logarithmic factors, where $d$ is the dimension and $n$ is the time horizon. We further show that LinMED enjoys a $\frac{d^2}{\Delta}\left(\log^2(n)\right)\log\left(\log(n)\right)$ problem-dependent regret, where $\Delta$ is the smallest sub-optimality gap, which is lower than the $\frac{d^2}{\Delta}\log^3(n)$ bound of the standard algorithm OFUL (Abbasi-Yadkori et al., 2011). Our empirical study shows that LinMED performs competitively with state-of-the-art algorithms.
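The remark about off-policy evaluation can be illustrated with a standard inverse propensity weighting (IPW) estimator, which reweights each logged reward by the ratio of the target policy's probability to the logging policy's probability for the chosen arm. The sketch below is generic and is not taken from the paper: the function name `ipw_value` and the log format `(arm, reward, logging_prob)` are assumptions for illustration, and it does not compute LinMED's sampling probabilities. It only shows where an exactly known `logging_prob` enters the estimate.

```python
from typing import Sequence, Tuple


def ipw_value(
    logged: Sequence[Tuple[int, float, float]],  # (arm, reward, logging_prob): assumed format
    target_probs: Sequence[float],               # target policy's probability for each arm
) -> float:
    """Inverse propensity weighting estimate of the target policy's mean reward."""
    total = 0.0
    for arm, reward, logging_prob in logged:
        # The weight is exact only if logging_prob is the true sampling probability.
        total += reward * target_probs[arm] / logging_prob
    return total / len(logged)
```

If `logging_prob` has to be approximated (for example by Monte Carlo, as may be needed for a sampling-based policy), the error propagates directly into the weights, which is why a closed-form sampling probability is convenient here.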
