Lepeng Zhang

Multi-State TD Target for Model-Free Reinforcement Learning

May 26, 2024

Wuhao Wang, Zhiyong Chen, Lepeng Zhang

Abstract: Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Building on this new MSTD concept, we develop complete actor-critic algorithms that include management of replay buffers in two modes, and integrate with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Experimental results demonstrate that algorithms employing the MSTD target significantly improve learning performance compared to traditional methods.
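
To make the distinction in the abstract concrete, here is a minimal Python sketch contrasting the standard one-step TD target with a target that draws on several subsequent states. The abstract does not give the MSTD formula, so the multi-state variant below is only an assumed illustration: it averages the n-step bootstrapped targets for n = 1..N. The function names, the averaging rule, and the discount factor `gamma` are all hypothetical choices made for this example, not the authors' definition.

```python
def one_step_td_target(reward, next_value, gamma=0.99):
    """Standard TD target: immediate reward plus the discounted
    value estimate of the single next state."""
    return reward + gamma * next_value


def n_step_target(rewards, values, n, gamma=0.99):
    """n-step bootstrapped target.

    rewards[k] is the reward received k steps after the current state;
    values[k] is the value estimate of the state reached after k + 1 steps.
    """
    discounted_rewards = sum(gamma ** k * rewards[k] for k in range(n))
    return discounted_rewards + gamma ** n * values[n - 1]


def averaged_multi_state_target(rewards, values, gamma=0.99):
    """ASSUMPTION (not taken from the paper): combine the value estimates
    of several subsequent states by averaging the 1..N-step targets."""
    horizon = len(rewards)
    targets = [n_step_target(rewards, values, n, gamma)
               for n in range(1, horizon + 1)]
    return sum(targets) / len(targets)


if __name__ == "__main__":
    rewards = [1.0, 1.0]   # rewards for the next two transitions
    values = [2.0, 3.0]    # value estimates of the next two states
    print(one_step_td_target(rewards[0], values[0], gamma=0.5))
    print(averaged_multi_state_target(rewards, values, gamma=0.5))
```

With a horizon of one transition, the averaged target reduces to the one-step TD target, which is the single-state case the abstract describes as traditional.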

* 6 pages, 16 figures
