Abstract:We consider a long-term average profit maximizing admission control problem in an M/M/1 queuing system with a known arrival rate but an unknown service rate. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue-length of the system. \cite[Econometrica]{Naor} showed that if all the parameters of the model are known, then it is optimal to use a static threshold policy - admit if the queue-length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full information model of \cite{Naor}. We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{3+\epsilon}(N))$ regret in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of arrivals and $\epsilon>0$.
Abstract:For the problem of prediction with expert advice in the adversarial setting with geometric stopping, we compute the exact leading order expansion for the long time behavior of the value function. Then, we use this expansion to prove that as conjectured in Gravin et al. [12], the comb strategies are indeed asymptotically optimal for the adversary in the case of 4 experts.