Wenjia Meng

Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline

May 04, 2024
Via arXiv

Qualitative Measurements of Policy Discrepancy for Return-based Deep Q-Network

Jul 08, 2018
Via arXiv

A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

Feb 09, 2018
Via arXiv

Two-Bit Networks for Deep Learning on Resource-Constrained Embedded Devices

Jan 04, 2017
Via arXiv