Abstract:A multiagent sequential decision problem has been seen in many critical applications including urban transportation, autonomous driving cars, military operations, etc. Its widely known solution, namely multiagent reinforcement learning, has evolved tremendously in recent years. Among them, the solution paradigm of modeling other agents attracts our interest, which is different from traditional value decomposition or communication mechanisms. It enables agents to understand and anticipate others' behaviors and facilitates their collaboration. Inspired by recent research on the legibility that allows agents to reveal their intentions through their behavior, we propose a multiagent active legibility framework to improve their performance. The legibility-oriented framework allows agents to conduct legible actions so as to help others optimise their behaviors. In addition, we design a series of problem domains that emulate a common scenario and best characterize the legibility in multiagent reinforcement learning. The experimental results demonstrate that the new framework is more efficient and costs less training time compared to several multiagent reinforcement learning algorithms.
Abstract:Optimizing students' learning strategies is a crucial component in intelligent tutoring systems. Previous research has demonstrated the effectiveness of devising personalized learning strategies for students by modelling their learning processes through partially observable Markov decision process (POMDP). However, the research holds the assumption that the student population adheres to a uniform cognitive pattern. While this assumption simplifies the POMDP modelling process, it evidently deviates from a real-world scenario, thus reducing the precision of inducing individual students' learning strategies. In this article, we propose the homomorphic POMDP (H-POMDP) model to accommodate multiple cognitive patterns and present the parameter learning approach to automatically construct the H-POMDP model. Based on the H-POMDP model, we are able to represent different cognitive patterns from the data and induce more personalized learning strategies for individual students. We conduct experiments to show that, in comparison to the general POMDP approach, the H-POMDP model demonstrates better precision when modelling mixed data from multiple cognitive patterns. Moreover, the learning strategies derived from H-POMDPs exhibit better personalization in the performance evaluation.
Abstract:Modelling other agents' behaviors plays an important role in decision models for interactions among multiple agents. To optimise its own decisions, a subject agent needs to model what other agents act simultaneously in an uncertain environment. However, modelling insufficiency occurs when the agents are competitive and the subject agent can not get full knowledge about other agents. Even when the agents are collaborative, they may not share their true behaviors due to their privacy concerns. In this article, we investigate into diversifying behaviors of other agents in the subject agent's decision model prior to their interactions. Starting with prior knowledge about other agents' behaviors, we use a linear reduction technique to extract representative behavioral features from the known behaviors. We subsequently generate their new behaviors by expanding the features and propose two diversity measurements to select top-K behaviors. We demonstrate the performance of the new techniques in two well-studied problem domains. This research will contribute to intelligent systems dealing with unknown unknowns in an open artificial intelligence world.