Abstract: Standard cooperative multi-agent reinforcement learning (MARL) methods aim to find the optimal team cooperative policy to complete a task. However, there may exist multiple different ways of cooperating, and domain experts often need access to these alternatives. Therefore, identifying a set of significantly different policies can alleviate the task complexity for them. Unfortunately, there is a general lack of effective policy diversity approaches specifically designed for the multi-agent domain. In this work, we propose a method called Moment-Matching Policy Diversity to alleviate this problem. Our method can generate team policies that differ to varying degrees by formalizing the difference between team policies as the difference in the actions of selected agents under each policy. Theoretically, we show that our method is a simple way of solving a constrained optimization problem that regularizes the difference between two trajectory distributions using the maximum mean discrepancy. We demonstrate the effectiveness of our approach on a challenging team-based shooter.
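The abstract does not include an implementation; as a rough illustration of the moment-matching idea it describes, the sketch below computes an empirical maximum mean discrepancy (with a Gaussian kernel) between action batches sampled from two team policies, which could serve as the kind of diversity regularizer mentioned above. The function name `mmd_rbf`, the bandwidth `sigma`, and the stand-in action tensors are illustrative assumptions, not part of the paper.

```python
# Hedged sketch (not the paper's code): empirical MMD^2 with an RBF kernel
# between actions sampled from two team policies on the same batch of states.
import torch

def mmd_rbf(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Empirical MMD^2 between samples x and y, each of shape [N, action_dim]."""
    def rbf(a, b):
        # Pairwise squared distances, then a Gaussian kernel.
        d2 = torch.cdist(a, b, p=2).pow(2)
        return torch.exp(-d2 / (2.0 * sigma ** 2))
    return rbf(x, x).mean() + rbf(y, y).mean() - 2.0 * rbf(x, y).mean()

# Usage sketch: reward a new policy for acting differently from a reference
# policy on the same states, restricted to the selected agents.
actions_ref = torch.tanh(torch.randn(64, 4))   # stand-in for reference policy actions
actions_new = torch.tanh(torch.randn(64, 4))   # stand-in for new policy actions
diversity = mmd_rbf(actions_ref, actions_new)  # larger value = more distinct behavior
```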
Abstract: Semantic search is an important task whose objective is to find the relevant index from a database for a given query. It requires a retrieval model that can properly learn the semantics of sentences. Transformer-based models are widely used as retrieval models due to their excellent ability to learn semantic representations. In the meantime, many regularization methods suitable for them have also been proposed. In this paper, we propose a new regularization method, Regularized Contrastive Learning, which helps transformer-based models learn better sentence representations. It first augments several different semantic representations for every sentence, then takes them into the contrastive objective as regularizers. These contrastive regularizers can overcome overfitting issues and alleviate the anisotropy problem. We first evaluate our approach on 7 semantic search benchmarks with the strong pre-trained model SRoBERTA. The results show that our method is more effective at learning superior sentence representations. We then evaluate our approach on 2 challenging FAQ datasets, Cough and Faqir, which have long queries and indexes. The results of our experiments demonstrate that our method outperforms baseline methods.
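The abstract leaves the exact form of the contrastive regularizer unspecified; the sketch below shows one plausible InfoNCE-style instantiation over multiple augmented representations of each sentence, purely as an assumed illustration. The helper `contrastive_regularizer`, the temperature value, and the dummy tensor shapes are hypothetical, not taken from the paper.

```python
# Hedged sketch (assumed setup, not the paper's released code): an InfoNCE-style
# contrastive regularizer over augmented representations of each sentence.
import torch
import torch.nn.functional as F

def contrastive_regularizer(views: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """views: [num_views, batch, dim] -- several augmented embeddings per sentence.
    Pulls together views of the same sentence, pushes apart different sentences."""
    z = F.normalize(views, dim=-1)
    anchor, positive = z[0], z[1]                  # first two views as anchor/positive
    sim = anchor @ positive.t() / temperature      # [batch, batch] cosine similarities
    labels = torch.arange(sim.size(0))             # matching rows are the positives
    return F.cross_entropy(sim, labels)

# Usage sketch: embeddings from, e.g., two dropout passes of the same encoder.
views = torch.randn(2, 16, 768)    # hypothetical [views, batch, hidden] tensor
reg_loss = contrastive_regularizer(views)
```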
Abstract: Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior, but do not address the potential contradiction between the behavior style and the objective of a task. There is a general lack of efficient methods that allow an agent to partially imitate a demonstrated behavior to varying degrees while completing the main objective of a task. In this paper, we propose a method called Regularized Soft Actor-Critic, which formulates the main task and the imitation task under the Constrained Markov Decision Process (CMDP) framework. The main task is defined as the maximum entropy objective used in Soft Actor-Critic (SAC), and the imitation task is defined as a constraint. We evaluate our method on continuous control tasks relevant to video game applications.
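As a rough illustration of how the described CMDP could be handled in practice, the sketch below uses a Lagrangian relaxation: a SAC-style actor loss plus an imitation-cost penalty whose multiplier is adapted toward a target threshold. This is an assumed, generic formulation rather than the paper's implementation; the names `actor_loss`, `update_lambda`, and the `imitation_budget` value are hypothetical.

```python
# Hedged sketch (illustrative, not the paper's code): Lagrangian relaxation of a
# CMDP whose reward objective is the max-entropy SAC loss and whose constraint
# is an imitation cost kept below a chosen budget.
import torch

log_lambda = torch.zeros(1, requires_grad=True)     # log of the Lagrange multiplier
lambda_opt = torch.optim.Adam([log_lambda], lr=3e-4)
imitation_budget = 0.1                               # hypothetical constraint threshold

def actor_loss(q_value, log_prob, imitation_cost, alpha=0.2):
    """SAC-style objective plus the imitation constraint weighted by lambda."""
    lam = log_lambda.exp().detach()
    # Maximize Q plus entropy while paying a penalty proportional to the imitation cost.
    return (alpha * log_prob - q_value + lam * imitation_cost).mean()

def update_lambda(imitation_cost):
    """Increase lambda when the constraint is violated, decrease it otherwise."""
    lam_loss = -(log_lambda.exp() * (imitation_cost.detach() - imitation_budget).mean())
    lambda_opt.zero_grad()
    lam_loss.backward()
    lambda_opt.step()
```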