Abstract: In this paper we consider the low rank structure of the reward sequence in pure exploration problems. First, we propose a separated setting for the pure exploration problem, in which the exploration strategy receives no feedback from its explorations. This separation forces the exploration strategy to sample the arms obliviously. By exploiting kernel information about the reward vectors, we provide efficient algorithms for both the time-varying and the fixed cases, with regret bound $O(d\sqrt{(\ln N)/n})$. We then show a lower bound for pure exploration in multi-armed bandits with a low rank reward sequence; an $O(\sqrt{\ln N})$ gap remains between our upper bound and this lower bound.
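As an illustration of the separated setting, the following toy sketch fixes the entire sampling schedule before any feedback is observed (the oblivious requirement) and then estimates the mean rewards by least squares in a known $d$-dimensional feature space, standing in for the paper's kernel information. The paper's sampling distribution and estimator are more refined; all names and parameters below are illustrative.

```python
# Toy sketch of the separated pure-exploration protocol: the schedule is
# chosen obliviously (before any rewards are seen), and a least-squares
# estimate in a d-dimensional feature space (a stand-in for the paper's
# kernel information) recommends an arm.
import numpy as np

rng = np.random.default_rng(0)
N, d, n = 50, 5, 2000                      # arms, feature dimension, budget (assumed values)
Phi = rng.normal(size=(N, d))              # known feature map for the N arms
theta_true = rng.normal(size=d)
means = Phi @ theta_true                   # rank-d (low rank) mean reward structure

schedule = rng.integers(0, N, size=n)      # oblivious: fixed before any feedback
rewards = means[schedule] + rng.normal(size=n)

# Regression in feature space; no feedback ever influenced the schedule.
X = Phi[schedule]
theta_hat, *_ = np.linalg.lstsq(X, rewards, rcond=None)
recommend = int(np.argmax(Phi @ theta_hat))
print("recommended arm:", recommend, "best arm:", int(np.argmax(means)))
```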
Abstract: We consider a variant of the online semi-definite programming (OSDP) problem: the decision space consists of semi-definite matrices with bounded $\Gamma$-trace norm, a generalisation of the trace norm defined by a positive definite matrix $\Gamma$. To solve this problem, we utilise the follow-the-regularized-leader algorithm with a $\Gamma$-dependent log-determinant regularizer. We then apply the generalised setting and the proposed algorithm to online matrix completion (OMC) and to online similarity prediction with side information. In particular, we reduce the online matrix completion problem to the generalised OSDP problem, with the side information represented by the matrix $\Gamma$. Our regret bound for the generalised OSDP then yields an optimal mistake bound for OMC by removing a logarithmic factor.
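The following is a minimal sketch of one FTRL step with a log-determinant regularizer over the constrained PSD set, assuming that for a PSD matrix $X$ the $\Gamma$-trace norm reduces to $\mathrm{tr}(\Gamma X)$; the paper's $\Gamma$-dependent form of the regularizer and the tuning of $\eta$ and $\epsilon$ differ in detail, and `cvxpy` is used here only as a generic convex solver.

```python
# One FTRL step for the generalised OSDP sketch: minimize cumulative linear
# loss plus a log-determinant regularizer over PSD matrices whose
# Gamma-trace is at most tau. Hyperparameters eta, eps are placeholders.
import numpy as np
import cvxpy as cp

def ftrl_logdet_step(L_cum, Gamma, tau, eta=1.0, eps=1e-3):
    n = L_cum.shape[0]
    X = cp.Variable((n, n), PSD=True)
    objective = cp.Minimize(
        cp.trace(L_cum @ X) - (1.0 / eta) * cp.log_det(X + eps * np.eye(n))
    )
    constraints = [cp.trace(Gamma @ X) <= tau]   # Gamma-trace norm budget (assumed form)
    cp.Problem(objective, constraints).solve()
    return X.value
```

In an online loop, `L_cum` would accumulate the loss matrices revealed so far, and each round's prediction is read off from the returned matrix.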
Abstract: We consider an online load balancing problem and its extensions in the framework of repeated games. On each round, the player chooses a distribution (task allocation) over $K$ servers, and then the environment reveals the load of each server, which determines the computation time of each server for processing the assigned task. After all rounds, the player's cost is measured by some norm of the cumulative computation-time vector; when the norm is the $L_\infty$-norm, the cost is the makespan. The goal is to minimize the regret, i.e., the player's cost relative to the cost of the best fixed distribution in hindsight. We propose algorithms for general norms and prove regret bounds for them. In particular, for the $L_\infty$-norm, our regret bound matches the best known bound, and the proposed algorithm runs in polynomial time per trial, involving linear programming and second-order cone programming, whereas no polynomial-time algorithm achieving this bound was previously known.
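To make the protocol concrete, here is a small simulation of the repeated game with a follow-the-leader baseline in place of the proposed algorithm (which solves a linear program and a second-order cone program per trial). It uses the closed-form best fixed allocation for the makespan: against cumulative loads $\Lambda$, the optimum equalises $\alpha_i \Lambda_i$, so $\alpha_i \propto 1/\Lambda_i$. All names and the load distribution are ours.

```python
# Simulation of the online load balancing game for the L_infinity (makespan)
# cost, with a follow-the-leader baseline standing in for the paper's
# polynomial-time algorithm.
import numpy as np

rng = np.random.default_rng(0)
K, T = 4, 500
loads = rng.uniform(0.1, 1.0, size=(T, K))    # environment's per-round server loads

def best_fixed_alpha(Lam):
    """Best fixed allocation in hindsight for the makespan cost."""
    inv = 1.0 / np.maximum(Lam, 1e-12)
    return inv / inv.sum()

cum_time = np.zeros(K)                         # cumulative computation-time vector
Lam = np.zeros(K)                              # cumulative loads seen so far
for t in range(T):
    alpha = best_fixed_alpha(Lam) if t > 0 else np.full(K, 1.0 / K)
    cum_time += alpha * loads[t]               # server i spends alpha_i * load_i
    Lam += loads[t]

player_cost = np.max(cum_time)                 # makespan of the player
offline_cost = np.max(best_fixed_alpha(Lam) * Lam)
print("regret:", player_cost - offline_cost)
```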