Andy Su

PoBRL: Optimizing Multi-Document Summarization by Blending Reinforcement Learning Policies

May 18, 2021

Andy Su, Difei Su, John M. Mulvey, H. Vincent Poor

Abstract: We propose PoBRL, a novel reinforcement-learning-based framework for multi-document summarization. PoBRL jointly optimizes the following three objectives necessary for a high-quality summary: importance, relevance, and length. Our strategy decouples this multi-objective optimization into different subproblems that can be solved individually by reinforcement learning. Utilizing PoBRL, we then blend the learned policies together to produce a summary that is a concise and complete representation of the original input. Our empirical analysis shows state-of-the-art performance on several multi-document datasets. Human evaluation also shows that our method produces high-quality output.

ConQUR: Mitigating Delusional Bias in Deep Q-learning

Feb 27, 2020

Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

Abstract: Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
