Abstract:Thompson Sampling has recently been shown to be optimal in the Bernoulli Multi-Armed Bandit setting [Kaufmann et al., 2012]. This bandit problem assumes that the reward distributions are stationary, which is often an unrealistic model of the real world. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propose a Thompson Sampling strategy equipped with a Bayesian change point mechanism to tackle this problem. We develop algorithms for a variety of cases with constant switching rate: when all arms change at each switch (Global Switching), when switching occurs independently for each arm (Per-Arm Switching), when the switching rate is known, and when it must be inferred from data. This leads to a family of algorithms we collectively term Change-Point Thompson Sampling (CTS). We show empirical results for these algorithms in 4 artificial environments and 2 environments derived from real-world data: news click-through [Yahoo!, 2011] and foreign exchange data [Dukascopy, 2012], comparing them to some other bandit algorithms. On the real-world data, CTS is the most effective.
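For orientation, the stationary baseline that this abstract starts from can be sketched as follows. This is a minimal illustration of standard Beta-Bernoulli Thompson Sampling, not the CTS algorithms of the paper: it has no change-point mechanism, and the function name, the uniform Beta(1, 1) prior, and the arm probabilities are assumptions chosen for the example.

```python
import random


def thompson_sampling(true_probs, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling on a stationary bandit.

    true_probs: assumed (hypothetical) success probability of each arm.
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior (uniform prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the posterior counts of the arm that was played.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.9], horizon=2000)
    print("pulls per arm:", pulls, "total reward:", total)
```

Because the posterior counts only ever accumulate, this baseline adapts slowly if an arm's reward probability switches, which is the limitation the switching setting addresses.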
Abstract:In binary-transaction data mining, traditional frequent itemset mining often produces results which are not straightforward to interpret. To overcome this problem, probability models are often used to produce more compact and conclusive results, albeit with some loss of accuracy. Bayesian statistics have been widely used in the development of probability models in machine learning in recent years, and these methods have many advantages, including their ability to avoid overfitting. In this paper, we develop two Bayesian mixture models, one with a Dirichlet distribution prior and one with a Dirichlet process (DP) prior, to improve the previous non-Bayesian mixture model developed for transaction dataset mining. We implement inference for both mixture models using two methods: a collapsed Gibbs sampling scheme and a variational approximation algorithm. Experiments on several benchmark problems show that both mixture models achieve better performance than a non-Bayesian mixture model. The variational algorithm is the faster of the two approaches, while the Gibbs sampling method achieves more accurate results. The Dirichlet process mixture model can automatically grow to a proper complexity for a better approximation. Once the model is built, it can be very fast to query and run analysis on (typically 10 times faster than Eclat, as we will show in the experiment section). However, these approaches also show that mixture models underestimate the probabilities of frequent itemsets. Consequently, these models have a higher sensitivity but a lower specificity.
Abstract:In this paper a novelty filter is introduced which allows a robot operating in an unstructured environment to produce a self-organised model of its surroundings and to detect deviations from the learned model. The environment is perceived using the robot's 16 sonar sensors. The algorithm produces a novelty measure for each sensor scan relative to the model it has learned. This means that it highlights stimuli which have not been previously experienced. The novelty filter proposed uses a model of habituation. Habituation is a decrement in behavioural response when a stimulus is presented repeatedly. Robot experiments are presented which demonstrate the reliable operation of the filter in a number of environments.
Abstract:Recognising new or unusual features of an environment is an ability which is potentially very useful to a robot. This paper demonstrates an algorithm which achieves this task by learning an internal representation of `normality' from sonar scans taken as a robot explores the environment. This model of the environment is used to evaluate the novelty of each sonar scan presented to it with relation to the model. Stimuli which have not been seen before, and therefore have more novelty, are highlighted by the filter. The filter has the ability to forget about features which have been learned, so that stimuli which are seen only rarely recover their response over time. A number of robot experiments are presented which demonstrate the operation of the filter.
Abstract:The ability of a robot to detect and respond to changes in its environment is potentially very useful, as it draws attention to new and potentially important features. We describe an algorithm for learning to filter out previously experienced stimuli to allow further concentration on novel features. The algorithm uses a model of habituation, a biological process which causes a decrement in response with repeated presentation. Experiments with a mobile robot are presented in which the robot detects the most novel stimulus and turns towards it (`neotaxis').
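The habituation behaviour described in these abstracts (a response that decreases with repeated presentation and recovers when the stimulus is absent) can be illustrated with a toy update rule. This is only a sketch under stated assumptions: the multiplicative decay, the linear recovery towards full strength, and the names `habituate`, `decay`, and `recovery` are hypothetical choices for the example, not the habituation model used in the papers.

```python
def habituate(presentations, decay=0.5, recovery=0.1):
    """Toy habituation trace (assumed update rule, not the papers' model).

    presentations: sequence of booleans, True when the stimulus is
    presented at that time step.
    Returns the response produced at each step (0.0 when absent).
    """
    strength = 1.0  # full response to a never-seen stimulus
    responses = []
    for presented in presentations:
        if presented:
            responses.append(strength)
            # Repeated presentation reduces the response.
            strength *= decay
        else:
            responses.append(0.0)
            # Without the stimulus, the response recovers towards 1.0.
            strength += recovery * (1.0 - strength)
    return responses


if __name__ == "__main__":
    # Three repeats, then two steps of absence, then the stimulus again.
    print(habituate([True, True, True, False, False, True]))
```

In this toy version, a stimulus seen often produces a small response (low novelty), while one that has been absent for a while gradually regains its response, matching the forgetting behaviour mentioned above.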