Michael Dann

Harnessing Network Effect for Fake News Mitigation: Selecting Debunkers via Self-Imitation Learning

Jan 28, 2024

Xiaofei Xu, Ke Deng, Michael Dann, Xiuzhen Zhang

Abstract: This study aims to minimize the influence of fake news on social networks by deploying debunkers to propagate true news. This is framed as a reinforcement learning problem, where, at each stage, one user is selected to propagate true news. A key challenge is that the reward is episodic: the "net" effect of selecting an individual debunker cannot be discerned from the interleaving information propagation on social networks, and only the collective effect of the mitigation efforts can be observed. Existing Self-Imitation Learning (SIL) methods have shown promise in learning from episodic rewards, but are ill-suited to the real-world application of fake news mitigation because of their poor sample efficiency. To learn a more effective debunker selection policy for fake news mitigation, this study proposes NAGASIL (Negative sampling and state Augmented Generative Adversarial Self-Imitation Learning), which consists of two improvements geared towards fake news mitigation: learning from negative samples, and an augmented state representation that captures the "real" environment state by integrating the current observed state with the previous state-action pairs from the same campaign. Experiments on two social networks show that NAGASIL yields superior performance to standard GASIL and state-of-the-art fake news mitigation models.
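The augmented state representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the current observation and the earlier state-action pairs are combined, so everything here is an assumption for illustration only: the function name `augment_state`, the fixed-size `window`, plain concatenation, and the zero/-1 padding for early steps of a campaign are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def augment_state(
    observation: Sequence[float],
    history: List[Tuple[Sequence[float], int]],
    window: int = 2,
) -> List[float]:
    """Concatenate the current observation with the last `window`
    (state, action) pairs from the same campaign.

    Hypothetical illustration: the paper may combine these differently.
    """
    obs_dim = len(observation)
    # Guard against window == 0, since history[-0:] would return everything.
    recent = history[-window:] if window > 0 else []
    augmented = list(observation)
    # Early in a campaign there are fewer than `window` past steps;
    # pad those slots with a zero state and the placeholder action -1.
    for _ in range(window - len(recent)):
        augmented.extend([0.0] * obs_dim + [-1.0])
    # Append the past (state, action) pairs, oldest first.
    for state, action in recent:
        augmented.extend(list(state) + [float(action)])
    return augmented


# One earlier step in the campaign: user 3 was selected in state [1.0, 0.0].
features = augment_state([0.1, 0.2], [([1.0, 0.0], 3)], window=2)
```

A fixed window keeps the input size constant for a standard policy network; a recurrent encoder over the whole campaign history would be another reasonable choice.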

* 10 pages, full version of this paper is accepted by AAAI'24

SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning

Mar 09, 2022

Andrew Chester, Michael Dann, Fabio Zambetta, John Thangarajah

Abstract: Model-based reinforcement learning algorithms are typically more sample efficient than their model-free counterparts, especially in sparse reward problems. Unfortunately, many interesting domains are too complex to specify the complete models required by traditional model-based approaches. Learning a model takes a large number of environment samples, and may not capture critical information if the environment is hard to explore. If we could specify an incomplete model and allow the agent to learn how best to use it, we could take advantage of our partial understanding of many domains. Existing hybrid planning and learning systems that address this problem often impose highly restrictive assumptions on the sorts of models that can be used, limiting their applicability to a wide range of domains. In this work we propose SAGE, an algorithm combining learning and planning to exploit a previously unusable class of incomplete models. This combines the strengths of symbolic planning and neural learning approaches in a novel way that outperforms competing methods on variations of taxi world and Minecraft.

* 11 pages, 8 figures, 3 tables

Adapting to Reward Progressivity via Spectral Reinforcement Learning

Apr 29, 2021

Michael Dann, John Thangarajah

Abstract: In this paper we consider reinforcement learning tasks with progressive rewards; that is, tasks where the rewards tend to increase in magnitude over time. We hypothesise that this property may be problematic for value-based deep reinforcement learning agents, particularly if the agent must first succeed in relatively unrewarding regions of the task in order to reach more rewarding regions. To address this issue, we propose Spectral DQN, which decomposes the reward into frequencies such that the high frequencies only activate when large rewards are found. This allows the training loss to be balanced so that it gives more even weighting across small and large reward regions. In two domains with extreme reward progressivity, where standard value-based methods struggle significantly, Spectral DQN is able to make much further progress. Moreover, when evaluated on a set of six standard Atari games that do not overtly favour the approach, Spectral DQN remains more than competitive: while it underperforms one of the benchmarks in a single game, it comfortably surpasses the benchmarks in three games. These results demonstrate that the approach is not overfit to its target problem, and suggest that Spectral DQN may have advantages beyond addressing reward progressivity.
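To make the idea of "frequencies that only activate when large rewards are found" concrete, here is a minimal sketch of one possible decomposition. The abstract does not give the formula, so this scheme is an assumption for illustration: a non-negative reward is split into components in [0, 1], where component i is scaled by base**i and only becomes non-zero once the lower components are saturated. The names `decompose_reward`, `recompose_reward`, `base`, and `num_freqs` are hypothetical and not taken from the paper.

```python
from typing import List


def decompose_reward(reward: float, base: float = 2.0, num_freqs: int = 4) -> List[float]:
    """Split a non-negative reward into `num_freqs` components in [0, 1].

    Illustrative assumption, not necessarily the paper's scheme:
    component i covers the reward range [threshold_i, threshold_i + base**i].
    """
    components = []
    threshold = 0.0  # reward level at which this component starts to activate
    scale = 1.0      # weight of this component, equal to base**i
    for _ in range(num_freqs):
        value = (reward - threshold) / scale
        components.append(min(max(value, 0.0), 1.0))  # clip to [0, 1]
        threshold += scale
        scale *= base
    return components


def recompose_reward(components: List[float], base: float = 2.0) -> float:
    """Weighted sum that inverts `decompose_reward` for rewards within range."""
    return sum(c * base**i for i, c in enumerate(components))


small = decompose_reward(0.5)  # only the lowest component is active
large = decompose_reward(5.0)  # higher components switch on
```

Because every component lies in [0, 1], a loss computed per component weights small and large reward regions more evenly than a loss on the raw reward. In this sketch, rewards above the representable maximum (15 with the default settings) are clipped.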

* 16 pages, 8 figures, 3 tables, accepted as a conference paper at ICLR 2021
