Abstract:Policy gradient (PG) is a reinforcement learning (RL) approach that optimizes a parameterized policy model for an expected return using gradient ascent. Given a well-parameterized policy model, such as a neural network model, with appropriate initial parameters, the PG algorithms work well even when environment does not have the Markov property. Otherwise, they can be trapped on a plateau or suffer from peakiness effects. As another successful RL approach, algorithms based on Monte-Carlo Tree Search (MCTS), which include AlphaZero, have obtained groundbreaking results especially on the board game playing domain. They are also suitable to be applied to non-Markov decision processes. However, since the standard MCTS does not have the ability to learn state representation, the size of the tree-search space can be too large to search. In this work, we examine a mixture policy of PG and MCTS to complement each other's difficulties and take advantage of them. We derive conditions for asymptotic convergence with results of a two-timescale stochastic approximation and propose an algorithm that satisfies these conditions. The effectivity of the proposed methods is verified through numerical experiments on non-Markov decision processes.
Abstract:Large and acute economic shocks such as the 2007-2009 financial crisis and the current COVID-19 infections rapidly change the economic environment. In such a situation, the importance of real-time economic analysis using alternative datais emerging. Alternative data such as search query and location data are closer to real-time and richer than official statistics that are typically released once a month in an aggregated form. We take advantage of spatio-temporal granularity of alternative data and propose a mixed-FrequencyAggregate Learning (MF-AGL)model that predicts economic indicators for the smaller areas in real-time. We apply the model for the real-world problem; prediction of the number of job applicants which is closely related to the unemployment rates. We find that the proposed model predicts (i) the regional heterogeneity of the labor market condition and (ii) the rapidly changing economic status. The model can be applied to various tasks, especially economic analysis