With the continuous development of machine learning technology, major e-commerce platforms have launched recommendation systems based on it to serve a large number of customers with different needs more efficiently. Compared with traditional supervised learning, reinforcement learning can better capture the user's state transition in the decision-making process, and consider a series of user actions, not just the static characteristics of the user at a certain moment. In theory, it will have a long-term perspective, producing a more effective recommendation. The special requirements of reinforcement learning for data make it need to rely on an offline virtual system for training. Our project mainly establishes a virtual user environment for offline training. At the same time, we tried to improve a reinforcement learning algorithm based on bi-clustering to expand the action space and recommended path space of the recommendation agent.