Abstract: This paper focuses on home energy management systems (HEMS) in buildings that have controllable HVAC systems and use phase change material (PCM) for energy storage. In this setting, optimally operating an HVAC system is challenging because of the nonlinear and non-convex characteristics of the PCM, which make the corresponding optimisation problem impractical for the methods commonly used in HEMS. Instead, we use dynamic programming (DP) to deal with the nonlinear features of the PCM. However, DP suffers from the curse of dimensionality. Given this drawback, this paper proposes a novel methodology that reduces the computational burden of the DP algorithm in HEMS optimisation with PCM while maintaining solution quality. Specifically, the method incorporates approaches from sequential decision-making in artificial intelligence, including macro-action and multi-time-scale abstractions, coupled with an underlying state-space approximation that reduces the sizes of the state and action spaces. The method is demonstrated on an energy management problem for a typical residential building located in Sydney under four seasonal weather conditions. Our results demonstrate that the proposed method performs well at an attractive computational cost. In particular, it achieves a significant speed-up over directly applying DP to the problem, running up to 12,900 times faster.
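To make the abstraction concrete, the following is a minimal sketch of backward DP over a coarsely discretised PCM state with fixed-length macro-actions, in the spirit of the method described above. All names and numbers here (pcm_grid, MACRO_ACTIONS, the toy transition and cost models) are illustrative assumptions, not the paper's actual building model.

```python
import numpy as np

HORIZON = 48                            # half-hourly slots over one day
pcm_grid = np.linspace(0.0, 1.0, 21)    # PCM state of charge, coarsely discretised

# A macro-action commits to one HVAC mode for several consecutive slots,
# shrinking the effective decision horizon.
MACRO_ACTIONS = [("off", 4), ("heat", 4), ("cool", 4)]
MACRO_LEN = 4

def transition(soc, mode, duration):
    """Toy surrogate for the nonlinear PCM charge/discharge dynamics."""
    rate = {"off": -0.02, "heat": 0.05, "cool": -0.05}[mode]
    return float(np.clip(soc + rate * duration, 0.0, 1.0))

def step_cost(mode, duration, t):
    """Illustrative electricity cost; a real model would add comfort terms."""
    price = 0.3 if 28 <= (t % 48) < 42 else 0.1     # peak vs off-peak tariff
    power = {"off": 0.0, "heat": 2.0, "cool": 2.5}[mode]
    return price * power * duration

def solve():
    """Backward value iteration over (time, PCM state) with macro-actions."""
    V = np.zeros(len(pcm_grid))          # terminal cost-to-go
    policy = {}
    for t in range(HORIZON - MACRO_LEN, -1, -MACRO_LEN):
        V_new = np.empty_like(V)
        for i, soc in enumerate(pcm_grid):
            best = np.inf
            for mode, dur in MACRO_ACTIONS:
                nxt = transition(soc, mode, dur)
                j = int(np.argmin(np.abs(pcm_grid - nxt)))   # snap to grid
                c = step_cost(mode, dur, t) + V[j]
                if c < best:
                    best, policy[(t, i)] = c, (mode, dur)
            V_new[i] = best
        V = V_new
    return V, policy

V0, policy = solve()
```

Because each macro-action spans several slots and the PCM state is snapped to a coarse grid, the number of Bellman backups shrinks by roughly the macro-action length times the grid-reduction factor, which is the source of the speed-up claimed above.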
Abstract: Game theory's prescriptive power typically relies on full rationality and/or self-play interactions. In contrast, this work sets aside these fundamental premises and focuses instead on heterogeneous autonomous interactions between two or more agents. Specifically, we introduce a new and concise representation for repeated adversarial (constant-sum) games that highlights the necessary features enabling an automated planning agent to reason about how to score above the game's Nash equilibrium payoff when facing heterogeneous adversaries. To this end, we present TeamUP, a model-based RL algorithm designed for learning and planning with such an abstraction. In essence, it is somewhat similar to R-max with a cleverly engineered reward shaping that treats exploration as an adversarial optimization problem. In practice, it attempts to find an ally with which to tacitly collude (in games with more than two players) and then collaborates on a joint plan of actions that can consistently score a high utility in adversarial repeated games. We use the inaugural Lemonade Stand Game Tournament to demonstrate the effectiveness of our approach, and find that TeamUP is the best-performing agent, demoting the Tournament's actual winning strategy to second place. In our experimental analysis, we show that our strategy successfully and consistently builds collaborations with many different heterogeneous (and sometimes very sophisticated) adversaries.
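Since the abstract positions TeamUP as "somewhat similar to R-max with a cleverly engineered reward shaping", the sketch below shows only the plain R-max-style optimism it builds on: unknown joint actions receive an optimistic reward that drives exploration. The names and threshold are illustrative assumptions, not TeamUP's internals, and the adversarial reward shaping itself is not reproduced here.

```python
from collections import defaultdict

R_MAX = 1.0              # optimistic reward assigned to unknown joint actions
KNOWN_THRESHOLD = 10     # visits before a (state, joint action) counts as known

counts = defaultdict(int)        # (state, joint_action) -> visit count
reward_sum = defaultdict(float)  # (state, joint_action) -> cumulative reward

def update(state, joint_action, reward):
    """Record one interaction with the (possibly adversarial) opponents."""
    key = (state, joint_action)
    counts[key] += 1
    reward_sum[key] += reward

def model_reward(state, joint_action):
    """R-max optimism: unknown pairs look maximally rewarding, so a planner
    using this model is steered towards them until they become known."""
    key = (state, joint_action)
    if counts[key] < KNOWN_THRESHOLD:
        return R_MAX
    return reward_sum[key] / counts[key]
```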
Abstract: Potential games and decentralised partially observable MDPs (Dec-POMDPs) are two commonly used models of multi-agent interaction, for static optimisation and sequential decision-making settings, respectively. In this paper we introduce filtered fictitious play for solving repeated potential games in which each player's observations of others' actions are perturbed by random noise, and use this algorithm to construct an online learning method for solving Dec-POMDPs. Specifically, we prove that noise in observations prevents standard fictitious play from converging to a Nash equilibrium in potential games, which also makes fictitious play impractical for solving Dec-POMDPs. To combat this, we derive filtered fictitious play and provide conditions under which it converges to a Nash equilibrium in potential games with noisy observations. We then use filtered fictitious play to construct a solver for Dec-POMDPs, and demonstrate our new algorithm's performance on a box-pushing problem. Our results show that we consistently outperform the state-of-the-art Dec-POMDP solver by an average of 100% across the range of noise levels in the observation function.
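As a concrete illustration of the filtering idea, here is a minimal sketch in which a "filtered" belief corrects the empirical frequencies of observed opponent actions for a known observation-noise channel by inverting its confusion matrix. This is one plausible reading of the abstract; the paper's actual filter and convergence conditions may differ.

```python
import numpy as np

A = 3                                    # number of opponent actions
# noise[i, j] = P(observe action j | opponent actually played action i)
noise = np.full((A, A), 0.1 / (A - 1))
np.fill_diagonal(noise, 0.9)

obs_counts = np.zeros(A)                 # tallies of observed actions

def observe(observed_action):
    """Record one (possibly corrupted) observation of the opponent."""
    obs_counts[observed_action] += 1

def standard_belief():
    """Plain fictitious play: empirical frequencies of *observed* actions.
    These are biased whenever observations pass through a noisy channel."""
    return obs_counts / obs_counts.sum()

def filtered_belief():
    """Invert the noise channel (observed = noise.T @ played) to estimate
    the *played* frequencies, then project back onto the simplex."""
    est = np.linalg.solve(noise.T, standard_belief())
    est = np.clip(est, 0.0, None)
    return est / est.sum()
```

Under this reading, a best response computed against filtered_belief rather than standard_belief removes the systematic bias that the abstract identifies as the obstacle to convergence under noisy observations.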