Abstract:Hierarchical Task Network (HTN) planning usually requires a domain engineer to provide manual input about how to decompose a planning problem. Even HTN-MAKER, a well-known method-learning algorithm, requires a domain engineer to annotate the tasks with information about what to learn. We introduce CURRICULAMA, an HTN method-learning algorithm that completely automates the learning process. It uses landmark analysis to compose annotated tasks and leverages curriculum learning to order the learning of methods from simpler to more complex. This eliminates the need for manual input, resolving a core issue with HTN-MAKER. We prove CURRICULAMA's soundness, and show experimentally that its convergence rate in learning a complete set of methods is substantially similar to HTN-MAKER's.
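The curriculum idea in this abstract, learning methods for simpler tasks before more complex ones, can be illustrated with a minimal sketch. The sketch below is not CURRICULAMA itself; the names (AnnotatedTask, learn_methods_for) and the use of landmark count as the difficulty measure are assumptions made for illustration.

```python
# Illustrative sketch only: order annotated tasks from simpler to more complex
# (here, by the number of landmarks found by landmark analysis) and learn
# methods in that order, reusing methods learned for earlier tasks.
# AnnotatedTask and learn_methods_for are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

@dataclass(frozen=True)
class AnnotatedTask:
    name: str
    landmarks: FrozenSet[str]  # facts every solution to this task must achieve

def curriculum_order(tasks: List[AnnotatedTask]) -> List[AnnotatedTask]:
    """Tasks with fewer landmarks are treated as simpler and come first."""
    return sorted(tasks, key=lambda t: len(t.landmarks))

def learn_curriculum(tasks: List[AnnotatedTask],
                     learn_methods_for: Callable[[AnnotatedTask, list], list],
                     method_library: list) -> list:
    """Learn HTN methods task by task along the curriculum."""
    for task in curriculum_order(tasks):
        method_library += learn_methods_for(task, method_library)
    return method_library
```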
Abstract:The most common representation formalisms for automated planning are descriptive models that abstractly describe what the actions do and are tailored for efficiently computing the next state(s) in a state-transition system. However, real-world acting requires operational models that describe how to do things, with rich control structures for closed-loop online decision-making in a dynamic environment. Using a different action model for planning than the one used for acting causes problems in combining acting and planning, in particular for the development and consistency verification of the different models. As an alternative, we define and implement an integrated acting-and-planning system in which both planning and acting use the same operational models, which are written in a general-purpose hierarchical task-oriented language offering rich control structures. The acting component, called Reactive Acting Engine (RAE), is inspired by the well-known PRS system, except that instead of being purely reactive, it can get advice from the planner. Our planner uses a UCT-like Monte Carlo Tree Search procedure, called UPOM (UCT Procedure for Operational Models), whose rollouts are simulations of the actor's operational models. We also present learning strategies for use with RAE and UPOM that acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. Our experimental results show that UPOM and our learning strategies significantly improve the acting efficiency and robustness of RAE. We discuss the asymptotic convergence of UPOM by mapping its search space to an MDP.
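UPOM is described as a UCT-like procedure whose rollouts simulate the operational models. A minimal sketch of a UCT-style selection rule of that general kind is given below; the function names, statistics layout, and exploration constant are illustrative assumptions, not the actual UPOM implementation.

```python
# Illustrative UCT-style selection at a task node: among the applicable method
# instances, pick the one maximizing the UCB1 score
#     Q(m) + C * sqrt(ln(N) / n(m)),
# where Q(m) is the average rollout utility of method m, n(m) its visit count,
# and N the total visits at this node.  A simplified sketch, not UPOM itself.
import math

def select_method(stats, applicable_methods, C=math.sqrt(2)):
    """stats: dict mapping method -> (total_utility, visits)."""
    N = sum(stats.get(m, (0.0, 0))[1] for m in applicable_methods) or 1
    best, best_score = None, -math.inf
    for m in applicable_methods:
        total, visits = stats.get(m, (0.0, 0))
        if visits == 0:
            return m  # try each untried method once before exploiting
        score = total / visits + C * math.sqrt(math.log(N) / visits)
        if score > best_score:
            best, best_score = m, score
    return best
```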
Abstract:We present new planning and learning algorithms for RAE, the Refinement Acting Engine. RAE uses hierarchical operational models to perform tasks in dynamically changing environments. Our planning procedure, UPOM, does a UCT-like search in the space of operational models in order to find a near-optimal method to use for the task and context at hand. Our learning strategies acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. Our experimental results show that UPOM and our learning strategies significantly improve RAE's performance in four test domains using two different metrics: efficiency and success ratio.
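The learning strategies above are described as acquiring a mapping from decision contexts to method instances. The following sketch shows one simple way such a mapping could be kept and used to rank candidate methods; it is a placeholder made up for illustration, not the learning procedure from the paper.

```python
# Illustrative context-to-method preference learner: count successful uses of
# each method instance per (task, context) pair, from acting or simulated
# planning traces, and rank candidates by those counts.  Hypothetical sketch.
from collections import defaultdict

class MethodPreferenceLearner:
    def __init__(self):
        # (task, context_features) -> {method_instance: successful uses}
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, task, context, method, succeeded):
        if succeeded:
            self.counts[(task, context)][method] += 1

    def rank(self, task, context, candidates):
        """Return candidate methods ordered by past success in this context."""
        scores = self.counts.get((task, context), {})
        return sorted(candidates, key=lambda m: scores.get(m, 0), reverse=True)
```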
Abstract:In the field of Artificial Intelligence, traditional approaches to choosing moves in games involve the use of the minimax algorithm. However, recent research results indicate that minimaxing may not always be the best approach. In this paper we summarize the results of some measurements on several model games with several different evaluation functions. These measurements, which are presented in detail in [NPT], show that there are some new algorithms that can make significantly better use of evaluation function values than the minimax algorithm does.
Abstract:The discovery that the minimax decision rule performs poorly in some games has sparked interest in possible alternatives to minimax. Until recently, the only games in which minimax was known to perform poorly were games which were mainly of theoretical interest. However, this paper reports results showing poor performance of minimax in a more common game called kalah. For the kalah games tested, a non-minimax decision rule called the product rule performs significantly better than minimax. This paper also discusses a possible way to predict whether or not minimax will perform well in a game when compared to product. A parameter called the rate of heuristic flaw (rhf) has been found to correlate positively with the performance of product against minimax. Both analytical and experimental results are given that appear to support the predictive power of rhf.
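To make the contrast between the two back-up rules concrete, the sketch below compares minimax back-up with product propagation (the product rule) on a toy game tree; the tree and leaf values are invented for illustration and are not the paper's experimental setup.

```python
# Leaf values are heuristic estimates in [0, 1] of the probability that the
# player to move at the root wins.  Minimax backs up max/min of child values;
# the product rule treats child values as independent win probabilities:
# 1 - prod(1 - v) at nodes where the root player moves, prod(v) at opponent nodes.

def minimax(node, root_to_move=True):
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not root_to_move) for child in node]
    return max(values) if root_to_move else min(values)

def product_rule(node, root_to_move=True):
    if isinstance(node, (int, float)):
        return node
    values = [product_rule(child, not root_to_move) for child in node]
    prod = 1.0
    if root_to_move:
        for v in values:
            prod *= (1.0 - v)      # probability that every move fails
        return 1.0 - prod          # probability that at least one move wins
    for v in values:
        prod *= v                  # all opponent replies must still be wins
    return prod

tree = [[0.9, 0.4], [0.6, 0.7]]    # two root moves, each with two replies
print(minimax(tree), product_rule(tree))   # 0.6 vs ~0.63; rankings can differ on deeper trees
```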