Shuo Wu

Observational Learning with a Budget

Apr 28, 2025

Shuo Wu, Pawan Poojary, Randall Berry

Abstract: We consider a model of Bayesian observational learning in which a sequence of agents receives a private signal about an underlying binary state of the world. Each agent makes a decision based on its own signal and its observations of previous agents. A central planner seeks to improve the accuracy of these signals by allocating a limited budget to enhance signal quality across agents. We formulate and analyze the budget allocation problem and propose two optimal allocation strategies. At least one of these strategies is shown to maximize the probability of achieving a correct information cascade.

* Submitted to ISIT 2025 Conference, 11 pages, 4 figures
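
The sequential decision rule described in this abstract can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' model or either of their allocation strategies: it assumes binary signals whose precision q_i > 0.5 is set per agent (the quantity a budget would raise), agents that observe all earlier actions, and ties resolved in favor of the agent's own signal. The names `simulate` and `llr` are illustrative only.

```python
import math
from typing import List, Tuple


def llr(q: float) -> float:
    """Log-likelihood ratio carried by one binary signal of precision q > 0.5."""
    return math.log(q / (1.0 - q))


def simulate(precisions: List[float], signals: List[int]) -> Tuple[List[int], int]:
    """Run the sequential game once.

    precisions[i] is agent i's signal precision; signals[i] is its private
    signal (1 favours state 1, 0 favours state 0). Returns every agent's
    action and the index of the first agent that ignores its own signal
    (the start of a cascade), or -1 if no agent ever herds.
    """
    public = 0.0  # public log-likelihood ratio of state 1 versus state 0
    actions: List[int] = []
    first_herder = -1
    for i, (q, s) in enumerate(zip(precisions, signals)):
        w = llr(q)
        if abs(public) > w:
            # The history outweighs any single private signal: the agent
            # follows the public belief, so its action carries no new
            # information and the public belief stays unchanged.
            if first_herder < 0:
                first_herder = i
            actions.append(1 if public > 0 else 0)
            continue
        # Otherwise the agent acts on its own signal (ties go to the signal,
        # an assumption), so observers can infer the signal and add its LLR.
        actions.append(s)
        public += w if s == 1 else -w
    return actions, first_herder


if __name__ == "__main__":
    # Equal precisions: two agreeing signals start a cascade at agent 2.
    print(simulate([0.7] * 5, [1, 1, 0, 0, 0]))  # ([1, 1, 1, 1, 1], 2)
    # A single strong early signal is enough to make agent 1 herd.
    print(simulate([0.9, 0.6, 0.6], [1, 0, 0]))  # ([1, 1, 1], 1)
```

In this toy, the second call hints at why allocation matters: concentrating quality on an early agent can lock in a cascade immediately, and whether that cascade is correct depends on that agent's signal.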

When to Trust Your Data: Enhancing Dyna-Style Model-Based Reinforcement Learning With Data Filter

Oct 16, 2024

Yansong Li, Zeyu Dong, Ertai Luo, Yu Wu, Shuo Wu, Shuo Han

Abstract: Reinforcement learning (RL) algorithms can be divided into two classes: model-free algorithms, which are sample-inefficient, and model-based algorithms, which suffer from model bias. Dyna-style algorithms combine these two approaches by using simulated data from an estimated environment model to accelerate model-free training. However, their efficiency is compromised when the estimated model is inaccurate. Previous works address this issue by using model ensembles or by pretraining the estimated model with data collected from the real environment, which increases computational and sample complexity. To tackle this issue, we introduce an out-of-distribution (OOD) data filter that removes simulated data that diverge significantly from data collected in the real environment. We show theoretically that this technique enhances the quality of simulated data: with the OOD data filter, data simulated from the estimated model better mimic data collected by interacting with the real environment. This improvement is evident in the critic updates, compared with using unfiltered simulated data. Our experiments integrate the data filter into the model-based policy optimization (MBPO) algorithm. The results demonstrate that our method requires fewer interactions with the real environment to achieve a higher level of optimality than MBPO, even without a model ensemble.
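
The abstract does not state the exact divergence criterion, so the sketch below assumes a simple one: a simulated transition, flattened into a feature vector such as (s, a, r, s'), is kept only if its Euclidean distance to the nearest real transition is at most a threshold. The names `ood_filter`, `nearest_real_distance`, and `threshold` are hypothetical; this illustrates the filtering step only, not the authors' filter or its integration into MBPO.

```python
import math
from typing import List, Sequence

Transition = Sequence[float]  # flattened (state, action, reward, next_state)


def nearest_real_distance(x: Transition, real: List[Transition]) -> float:
    """Euclidean distance from a simulated transition to the closest real one."""
    return min(math.dist(x, r) for r in real)


def ood_filter(
    simulated: List[Transition], real: List[Transition], threshold: float
) -> List[Transition]:
    """Keep simulated transitions that stay close to the real-data buffer.

    Transitions farther than `threshold` from every real transition are
    treated as out-of-distribution and dropped before they reach the
    model-free learner (e.g. the critic update in a Dyna-style loop).
    """
    if not real:
        return []  # nothing to compare against: trust no simulated data
    return [x for x in simulated if nearest_real_distance(x, real) <= threshold]


if __name__ == "__main__":
    real_buffer = [(0.0, 0.0), (1.0, 1.0)]
    model_rollouts = [(0.1, 0.0), (5.0, 5.0), (1.0, 1.2)]
    print(ood_filter(model_rollouts, real_buffer, threshold=0.5))
    # [(0.1, 0.0), (1.0, 1.2)]
```

This brute-force nearest-neighbour search costs O(|simulated| x |real|) distance evaluations; a practical version would likely use a spatial index or a learned density model instead.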

Robust Reward Design for Markov Decision Processes

Jun 07, 2024

Shuo Wu, Haoxiang Ma, Jie Fu, Shuo Han

Abstract: The problem of reward design examines the interaction between a leader and a follower, where the leader aims to shape the follower's behavior to maximize the leader's payoff by modifying the follower's reward function. Current approaches to reward design rely on an accurate model of how the follower responds to reward modifications, which can be sensitive to modeling inaccuracies. To address this issue of sensitivity, we present a solution that offers robustness against uncertainties in modeling the follower, including 1) how the follower breaks ties in the presence of nonunique best responses, 2) inexact knowledge of how the follower perceives reward modifications, and 3) bounded rationality of the follower. Our robust solution is guaranteed to exist under mild conditions and can be obtained numerically by solving a mixed-integer linear program. Numerical experiments on multiple test cases demonstrate that our solution improves robustness compared to the standard approach without incurring significant additional computing costs.

* 50 pages, 8 figures
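
To make the robustness notion concrete, the sketch below reduces reward design to a one-shot choice among a few follower actions (no MDP dynamics) and replaces the mixed-integer linear program with brute-force search over a grid of nonnegative reward bonuses under a total budget. The leader scores a bonus vector by its worst case over every follower action within `eps` of the best, which covers adversarial tie-breaking and, loosely, bounded rationality. All names and parameters (`robust_design`, `budget`, `step`, `eps`) are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import List, Sequence, Tuple


def eps_best_responses(
    follower_reward: Sequence[float], bonus: Sequence[float], eps: float
) -> List[int]:
    """Follower actions whose modified reward is within eps of the best."""
    total = [r + b for r, b in zip(follower_reward, bonus)]
    best = max(total)
    return [i for i, t in enumerate(total) if t >= best - eps]


def robust_value(
    leader_payoff: Sequence[float],
    follower_reward: Sequence[float],
    bonus: Sequence[float],
    eps: float,
) -> float:
    """Leader payoff if the follower picks the worst near-optimal action."""
    return min(leader_payoff[i] for i in eps_best_responses(follower_reward, bonus, eps))


def robust_design(
    leader_payoff: Sequence[float],
    follower_reward: Sequence[float],
    budget: float,
    step: float,
    eps: float,
) -> Tuple[Tuple[float, ...], float]:
    """Grid-search bonuses (multiples of step, total <= budget).

    Maximises the robust value and, among ties, prefers the cheapest bonus.
    """
    max_units = int(round(budget / step))
    best_key: Tuple[float, int] = (float("-inf"), 0)
    best_bonus: Tuple[float, ...] = ()
    for units in product(range(max_units + 1), repeat=len(follower_reward)):
        if sum(units) > max_units:
            continue  # over budget
        bonus = tuple(u * step for u in units)
        key = (robust_value(leader_payoff, follower_reward, bonus, eps), -sum(units))
        if key > best_key:
            best_key, best_bonus = key, bonus
    return best_bonus, best_key[0]


if __name__ == "__main__":
    # The follower prefers action 0; the leader wants action 1.
    print(robust_design([0.0, 1.0], [1.0, 0.0], budget=2.0, step=0.5, eps=0.25))
    # ((0.0, 1.5), 1.0)
```

Here a bonus of 1.0 on action 1 would only create a tie, which the robust score treats as the follower choosing action 0; the search therefore pays 1.5 to leave a margin larger than `eps`.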

Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

Apr 16, 2024

Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin (+4 more)

Abstract: Artificial neuronal devices are the basic building blocks of neuromorphic computing systems, which are motivated by realistic brain emulation. Aiming at these applications, various device concepts have been proposed to mimic neuronal dynamics and functions. To date, however, artificial neuron devices with high efficiency, high stability, and low power consumption remain far from practical application. Owing to its distinctive insulator-metal phase transition, vanadium dioxide (VO2) has been considered an ideal candidate for neuronal device fabrication. However, its intrinsically insulating state requires VO2 neuronal devices to be driven under a large bias voltage, resulting in high power consumption and low operating frequency. In this study, we address this challenge by preparing oxygen-vacancy-modulated VO2 films (VO2-x) and fabricating VO2-x neuronal devices for constructing Spiking Neural Networks (SNNs). The results indicate that the neuron devices can be operated at lower voltage with improved processing speed. The proposed VO2-x-based back-propagation SNN (BP-SNN) system, trained on the MNIST dataset, demonstrates excellent accuracy in image recognition. Our study not only demonstrates VO2-x-based neurons and an SNN system for practical application, but also offers an effective defect-engineering strategy for optimizing future neuromorphic computing systems.

* 18 pages, 4 figures
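
Spiking neurons of the kind this abstract targets are often abstracted as leaky integrate-and-fire (LIF) units: a membrane variable integrates its input, leaks, and fires and resets when it crosses a threshold. The sketch below is that generic abstraction, not a physical model of the VO2-x insulator-metal transition; mapping a lower device switching threshold onto the LIF `threshold` is an assumption made only to illustrate why lowering it can raise the firing rate for the same drive. The function name and parameter values (`tau`, `dt`, `v_reset`) are hypothetical.

```python
def lif_spike_count(
    input_current: float,
    threshold: float,
    steps: int = 1000,
    dt: float = 1e-3,
    tau: float = 0.02,
    v_reset: float = 0.0,
) -> int:
    """Count spikes of a leaky integrate-and-fire neuron under constant input.

    Forward-Euler integration of dv/dt = -v / tau + input_current; whenever
    v reaches the threshold the neuron emits a spike and v is reset.
    """
    v = v_reset
    spikes = 0
    for _ in range(steps):
        v += dt * (-v / tau + input_current)
        if v >= threshold:
            spikes += 1
            v = v_reset
    return spikes


if __name__ == "__main__":
    # Same drive, three thresholds: the steady-state value is
    # input_current * tau = 2.0, so a threshold of 3.0 is never reached.
    for th in (0.5, 1.0, 3.0):
        print(th, lif_spike_count(input_current=100.0, threshold=th))
```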

ResAtom System: Protein and Ligand Affinity Prediction Model Based on Deep Learning

Apr 17, 2021

Yeji Wang, Shuo Wu, Yanwen Duan, Yong Huang

Abstract: Motivation: Protein-ligand affinity prediction is an important part of structure-based drug design. It includes molecular docking and affinity prediction. Although molecular dynamics can currently predict affinity with high accuracy, it is not suitable for large-scale virtual screening. Most existing deep-learning-based affinity prediction and scoring functions rely on experimentally determined conformations. Results: We build a predictive model of protein-ligand affinity using a ResNet neural network with an added attention mechanism. The resulting ResAtom-Score model achieves a Pearson correlation coefficient of R = 0.833 on the CASF-2016 benchmark test set. We also evaluated the performance of a variety of existing scoring functions combined with ResAtom-Score in the absence of experimentally determined conformations. The results show that combining ΔVinaRF20 with ResAtom-Score achieves affinity prediction close to that of scoring functions given experimentally determined conformations. These results suggest that the ResAtom system may be used in the future for in silico screening of small-molecule ligands against target proteins. Availability: https://github.com/wyji001/ResAtom

* 28 pages, 7 figures, 2 tables
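
The reported figure of merit, Pearson's R between predicted and experimental affinities, can be computed as below. This is a generic implementation run on made-up numbers for illustration; it does not reproduce the CASF-2016 data or the ResAtom-Score model, and `pearson_r` is a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


if __name__ == "__main__":
    # Hypothetical predicted vs. experimental binding affinities (pK units).
    pred = [6.1, 7.4, 5.2, 8.0, 6.8]
    expt = [6.0, 7.9, 4.8, 8.3, 6.2]
    print(round(pearson_r(pred, expt), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient from the standard library.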
