Abstract: Developing effective Multi-Agent Systems (MAS) is critical for many applications requiring collaboration and coordination with humans. Despite the rapid advance of Multi-Agent Deep Reinforcement Learning (MADRL) in cooperative MAS, one major challenge is the simultaneous learning and interaction of independent agents in dynamic environments in the presence of stochastic rewards. State-of-the-art MADRL models struggle to perform well in Coordinated Multi-agent Object Transportation Problems (CMOTPs), wherein agents must coordinate with each other and learn from stochastic rewards. In contrast, humans often learn rapidly to adapt to nonstationary environments that require coordination among people. In this paper, motivated by the demonstrated ability of cognitive models based on Instance-Based Learning Theory (IBLT) to capture human decisions in many dynamic decision-making tasks, we propose three variants of Multi-Agent IBL models (MAIBL). The idea of these MAIBL algorithms is to combine the cognitive mechanisms of IBLT with the techniques of MADRL models to address coordination in MAS in stochastic environments from the perspective of independent learners. We demonstrate that the MAIBL models learn faster and achieve better coordination than current MADRL models in a dynamic CMOTP task under various settings of stochastic rewards. We discuss the benefits of integrating cognitive insights into MADRL models.
Abstract: Temporal credit assignment is crucial for learning and skill development in natural and artificial intelligence. While computational methods such as the temporal difference (TD) approach in reinforcement learning have been proposed, it is unclear whether they accurately represent how humans handle feedback delays. Cognitive models intend to represent the mental steps by which humans solve problems and perform a number of tasks, but limited research in cognitive science has addressed the credit assignment problem in humans and cognitive models. Our research uses a cognitive model based on a theory of decisions from experience, Instance-Based Learning Theory (IBLT), to test different credit assignment mechanisms in a goal-seeking navigation task with varying levels of decision complexity. Instance-Based Learning (IBL) models simulate the process of making sequential choices with different credit assignment mechanisms, including a new IBL-TD model that combines the IBL decision mechanism with the TD approach. We found that (1) an IBL model that assigns equal credit to all decisions matches human performance better than other models, including IBL-TD and Q-learning; (2) IBL-TD and Q-learning models initially underperform compared to humans but eventually outperform them; and (3) humans are influenced by decision complexity, while models are not. Our study provides insights into the challenges of capturing human behavior and the potential opportunities to use these models in future AI systems to support human activities.
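To make the contrast between the two kinds of credit assignment named above concrete, the following is a minimal sketch, not the models from the paper. It assumes that "equal credit" means every decision in an episode receives the episode's total outcome, and it uses the standard TD(0) value update. The function names and the parameters `alpha` and `gamma` are hypothetical.

```python
def equal_credit(step_rewards):
    """Give every decision in the episode the same credit.

    Assumption: the shared credit is the episode's total outcome.
    """
    outcome = sum(step_rewards)
    return [outcome] * len(step_rewards)


def td_update(value, reward, next_value, alpha=0.1, gamma=0.99):
    """Standard TD(0) update of a single value estimate.

    alpha is the learning rate and gamma the discount factor
    (illustrative defaults, not values taken from the paper).
    """
    td_error = reward + gamma * next_value - value
    return value + alpha * td_error
```

Under equal credit, an early decision in a successful episode is credited as much as the final one, whereas the TD update propagates the outcome backward one step at a time over repeated episodes.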
Abstract: Computational cognitive modeling is a useful methodology to explore and validate theories of human cognitive processes. Cognitive models are often used to simulate the process by which humans perform a task or solve a problem and to make predictions about human behavior. Cognitive models based on Instance-Based Learning (IBL) Theory rely on a formal computational algorithm for dynamic decision making and on a memory mechanism from a well-known cognitive architecture, ACT-R. To advance the computational theory of human decision making and to demonstrate the usefulness of cognitive models in diverse domains, we must address a practical computational problem, the curse of exponential growth, that emerges from memory-based tabular computations. As observations accumulate, the memory of instances grows exponentially, which leads directly to an exponential slowdown in computation. In this paper, we propose a new Speedy IBL implementation that replaces the traditional loop-based approach with a mathematical formulation based on vectorization and parallel computation. Through the implementation of IBL models in many decision games of increasing complexity, we demonstrate the applicability of the regular IBL models and the advantages of their Speedy implementation. The decision games vary in the complexity of their decision features and in the number of agents involved in the decision process. The results clearly illustrate that Speedy IBL addresses the curse of exponential growth of memory, significantly reducing computational time while maintaining the same level of performance as the traditional implementation of IBL models.
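The difference between a loop-based and a vectorized memory computation can be sketched as follows. This is an illustration only, not the Speedy IBL code: it assumes the ACT-R base-level activation, the logarithm of the summed power-law decay of each past occurrence of an instance, as the quantity being computed. The function names, the `decay` default, and the NaN-padding scheme are hypothetical choices for this example.

```python
import math

import numpy as np


def activations_loop(timestamps, now, decay=0.5):
    """Base-level activation per instance, using nested Python loops."""
    result = []
    for times in timestamps:
        total = 0.0
        for t in times:
            total += (now - t) ** (-decay)
        result.append(math.log(total))
    return result


def activations_vectorized(timestamps, now, decay=0.5):
    """Same quantity, computed on a padded 2-D array in one pass."""
    width = max(len(times) for times in timestamps)
    # Instances have different numbers of occurrences, so pad with NaN.
    padded = np.full((len(timestamps), width), np.nan)
    for row, times in enumerate(timestamps):
        padded[row, :len(times)] = times
    # nansum skips the NaN padding when summing each row.
    return np.log(np.nansum((now - padded) ** (-decay), axis=1))
```

Both functions take one list of occurrence times per instance and should agree up to floating-point error. The vectorized form moves the arithmetic into array operations, which is the general idea behind replacing per-instance loops.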