Shanchao Yang

Diverse Policy Optimization for Structured Action Space

Feb 23, 2023

Wenhao Li, Baoxiang Wang, Shanchao Yang, Hongyuan Zha

Abstract: Enhancing the diversity of policies is beneficial for robustness, exploration, and transfer in reinforcement learning (RL). In this paper, we aim to seek diverse policies in an under-explored setting, namely RL tasks with structured action spaces that have the two properties of composability and local dependencies. The complex action structure, non-uniform reward landscape, and subtle hyperparameter tuning that result from these properties prevent existing approaches from scaling well. We propose a simple and effective RL method, Diverse Policy Optimization (DPO), which models the policies in a structured action space as energy-based models (EBMs) by following the probabilistic RL framework. GFlowNet, a recently proposed generative model, is introduced as an efficient, diverse EBM-based policy sampler. DPO follows a joint optimization framework: the outer layer uses the diverse policies sampled by the GFlowNet to update the EBM-based policies, which in turn support the GFlowNet training in the inner layer. Experiments on the ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies in challenging scenarios and substantially outperform existing state-of-the-art methods.

* 18 pages, 16 figures, AAMAS 2023 camera ready

Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient

Oct 18, 2021

Shanchao Yang, Kaili Ma, Baoxiang Wang, Hongyuan Zha

Abstract: Improving the resilience of a network protects the system from natural disasters and malicious attacks. This is typically achieved by introducing new edges, which, however, may exceed the maximum number of connections a node can sustain. Many studies therefore resort to the degree-preserving operation of rewiring, which swaps existing edges $AC, BD$ for new edges $AB, CD$. A significant line of studies focuses on this technique for theoretical and practical results while leaving three limitations: network utility loss, local optimality, and transductivity. In this paper, we propose ResiNet, a reinforcement learning (RL)-based framework to discover resilient network topologies against various disasters and attacks. ResiNet is objective-agnostic, which allows the utility to be balanced by incorporating it into the objective function. The local optimality, typically seen in greedy algorithms, is addressed by casting the cumulative resilience gain into a sequential decision process of step-wise rewiring. The transductivity, which refers to the necessity of running a computationally intensive optimization for each input graph, is lifted by our variant of RL with an auto-regressive, permutation-invariant, variable action space. ResiNet is equipped with our technical innovation, Filtration enhanced GNN (FireGNN), which distinguishes graphs with minor differences. It is thus possible for ResiNet to capture local structure changes and adapt its decisions among consecutive graphs, which is known to be infeasible for GNNs. Extensive experiments demonstrate that, with a small number of rewiring operations, ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing the utility, outperforming existing approaches by a large margin.
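The degree-preserving rewiring step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not ResiNet itself: the function names `rewire` and `degrees`, the edge-set representation, and the validity checks are assumptions made for illustration. It only shows that swapping edges $AC, BD$ for $AB, CD$ leaves every node's degree unchanged.

```python
from collections import Counter


def rewire(edges, a, b, c, d):
    """Swap undirected edges (a, c), (b, d) for (a, b), (c, d).

    Hypothetical helper for illustration; returns a new set of
    frozenset edges and leaves the input untouched.
    """
    if len({a, b, c, d}) < 4:
        raise ValueError("rewiring needs four distinct nodes")
    current = {frozenset(e) for e in edges}
    old = {frozenset((a, c)), frozenset((b, d))}
    new = {frozenset((a, b)), frozenset((c, d))}
    if not old <= current:
        raise ValueError("both edges to be removed must exist")
    if new & current:
        raise ValueError("rewiring would duplicate an existing edge")
    return (current - old) | new


def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for edge in edges:
        for node in edge:
            deg[node] += 1
    return deg


# Example graph: the swap replaces A-C and B-D with A-B and C-D.
before = [("A", "C"), ("B", "D"), ("A", "D")]
after = rewire(before, "A", "B", "C", "D")
same_degrees = degrees(before) == degrees(after)
```

In this example `same_degrees` is true because each of the four nodes loses exactly one edge and gains exactly one. Which pair of edges to swap at each step is the decision the abstract assigns to the learned policy; the sketch does not attempt that part.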

Learn to Generate Time Series Conditioned Graphs with Generative Adversarial Nets

Mar 03, 2020

Shanchao Yang, Jing Liu, Kai Wu, Mingming Li

Abstract: Deep learning-based approaches have recently been used to model and generate graphs subject to different distributions. However, they are typically unsupervised, unconditioned generative models, or are conditioned only on graph-level contexts, which are not associated with rich semantic node-level contexts. In contrast, in this paper we are interested in a novel problem named Time Series Conditioned Graph Generation: given an input multivariate time series, we aim to infer a target relation graph that models the underlying interrelationships between the time series, with each node corresponding to one time series. For example, we can study the interrelationships between genes in a gene regulatory network of a certain disease, conditioned on their gene expression data recorded as time series. To achieve this, we propose a novel Time Series conditioned Graph Generation-Generative Adversarial Network (TSGG-GAN) to handle the challenges of conditioning on rich node-level context structures and of measuring similarities directly between graphs and time series. Extensive experiments on synthetic and real-world gene regulatory network datasets demonstrate the effectiveness and generalizability of the proposed TSGG-GAN.
