Intelligent personal assistant systems for information-seeking conversations are increasingly popular in real-world applications, especially at e-commerce companies. As research on such conversation systems has developed, pseudo-relevance feedback (PRF) has proven effective at incorporating relevance signals from external documents. However, existing studies either rely on heuristic rules or require heavy manual labeling. In this work, we treat PRF term selection as a learning task and propose a reinforcement-learning-based method that can be trained end to end without any human annotations. More specifically, we propose a reinforced selector that extracts useful PRF terms to enhance response candidates, and a BERT-based response ranker that ranks the PRF-enhanced responses. The ranker's performance serves as the reward that guides the selector toward useful PRF terms, thereby boosting task performance. Extensive experiments on both a standard benchmark and commercial datasets show the superiority of our reinforced PRF term selector over alternative soft and hard selection methods. Both qualitative case studies and quantitative analysis show that our model not only selects meaningful PRF terms to expand response candidates but also achieves the best results among all baseline methods on a variety of evaluation metrics. We have also deployed our method in online production at an e-commerce company, where it shows a significant improvement over the existing online ranking system.
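To make the reward loop concrete, the sketch below shows one way a selector can learn from a downstream score without term-level labels: a per-term Bernoulli policy trained with REINFORCE and a moving-average baseline. This is an illustrative assumption, not the authors' implementation. The BERT ranker is replaced by a hypothetical `toy_reward` that adds +1 for each "useful" term kept and -1 for each noisy term kept, and the names `train_selector`, `toy_reward`, and the example terms are all made up for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_reward(selected, useful):
    # Hypothetical stand-in for the ranker's performance (NOT a BERT ranker):
    # +1 for every useful PRF term kept, -1 for every noisy term kept.
    return sum(1.0 if t in useful else -1.0 for t in selected)


def train_selector(terms, useful, steps=2000, lr=0.1, seed=0):
    """REINFORCE over independent keep/drop decisions, one logit per term."""
    rng = random.Random(seed)
    logits = {t: 0.0 for t in terms}
    baseline = 0.0  # moving-average reward baseline to reduce variance
    for _ in range(steps):
        probs = {t: sigmoid(logits[t]) for t in terms}
        # Sample an action per term: keep it (True) or drop it (False).
        actions = {t: rng.random() < probs[t] for t in terms}
        selected = [t for t in terms if actions[t]]
        reward = toy_reward(selected, useful)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for t in terms:
            # Gradient of log Bernoulli(a; sigmoid(logit)) w.r.t. the logit is a - p.
            grad = (1.0 if actions[t] else 0.0) - probs[t]
            logits[t] += lr * advantage * grad
    # Return the learned keep-probability for each candidate term.
    return {t: sigmoid(logits[t]) for t in terms}
```

For example, `train_selector(["battery", "charger", "the", "free"], useful={"battery", "charger"})` should drive the keep-probabilities of the two useful terms toward 1 and those of the noisy terms toward 0. The design point is that only a scalar reward from the downstream scorer is needed, which mirrors the abstract's claim that the selector is trained without human annotations.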