Dynamic Portfolio Management is a domain that concerns the continuous redistribution of assets within a portfolio to maximize the total return in a given period of time. With the recent advancement in machine learning and artificial intelligence, many efforts have been put in designing and discovering efficient algorithmic ways to manage the portfolio. This paper presents two different reinforcement learning agents, policy gradient actor-critic and evolution strategy. The performance of the two agents is compared during backtesting. We also discuss the problem set up from state space design, to state value function approximator and policy control design. We include the short position to give the agent more flexibility during assets redistribution and a constant trading cost of 0.25%. The agent is able to achieve 5% return in 10 days daily trading despite 0.25% trading cost.