Title: From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information

Oct 01, 2023

Zhendong Shi, Xiaoli Wei, Ercan E. Kuruoglu

Figures 1 to 4 for From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information

Abstract: Choosing the right actions to make a profit in a sequential decision process remains difficult because of fast dynamics and substantial uncertainty in many application scenarios. In such complicated environments, reinforcement learning (RL), a reward-oriented strategy for optimal control, has emerged as a promising technique for this strategic decision-making problem. However, RL has shortcomings, notably excessive resource consumption and an inability to obtain optimal solutions quickly, that make it unsuitable for many financial problems, including quantitative trading markets. In this study, we use contextual information to overcome these issues through two methods: contextual Thompson sampling and reinforcement learning under supervision, both of which can accelerate the iterative search for the best solution. To investigate strategic trading in quantitative markets, we merge an earlier financial trading strategy, constant proportion portfolio insurance (CPPI), into deep deterministic policy gradient (DDPG). The experimental results show that both methods accelerate the progress of reinforcement learning toward the optimal solution.
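For readers unfamiliar with CPPI, the following is a minimal sketch of the textbook allocation rule, not the paper's implementation or its DDPG integration. The risky exposure is a multiple of the cushion (portfolio value minus a protected floor). The function names, the 80% floor, the multiplier of 3, and the no-leverage cap are all illustrative assumptions.

```python
def cppi_allocation(value, floor, multiplier, max_leverage=1.0):
    """Split a portfolio into (risky, safe) amounts under a basic CPPI rule.

    Assumptions for illustration: exposure is capped at max_leverage * value,
    and nothing is held in the risky asset once value falls to the floor.
    """
    cushion = max(value - floor, 0.0)  # amount above the protected floor
    risky = min(multiplier * cushion, max_leverage * value)
    safe = value - risky
    return risky, safe


def simulate_cppi(risky_returns, start_value=100.0, floor_fraction=0.8,
                  multiplier=3.0, safe_return=0.0):
    """Rebalance once per period and return the portfolio value path."""
    value = start_value
    floor = floor_fraction * start_value  # constant floor (hypothetical choice)
    path = [value]
    for r in risky_returns:
        risky, safe = cppi_allocation(value, floor, multiplier)
        value = risky * (1.0 + r) + safe * (1.0 + safe_return)
        path.append(value)
    return path


if __name__ == "__main__":
    # Hypothetical per-period returns of the risky asset.
    print(simulate_cppi([-0.10, 0.10, 0.05]))
```

Because exposure shrinks as the cushion shrinks, a rule like this gives a simple, deterministic baseline that a learned policy could be compared against or guided by.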
