Yidong Zhang

A Deep Q-Network Based on Radial Basis Functions for Multi-Echelon Inventory Management

Jan 29, 2024

Liqiang Cheng, Jun Luo, Weiwei Fan, Yidong Zhang, Yuan Li

Abstract: This paper addresses a multi-echelon inventory management problem with a complex network topology where deriving optimal ordering decisions is difficult. Deep reinforcement learning (DRL) has recently shown potential in solving such problems, but designing the neural networks in DRL remains a challenge. To address this, a DRL model is developed whose Q-network is based on radial basis functions. The approach is easier to construct than classic DRL models based on neural networks, which alleviates the computational burden of hyperparameter tuning. A series of simulation experiments demonstrates the superior performance of this approach compared to the simple base-stock policy: it produces a better policy in the multi-echelon system and achieves competitive performance in the serial system, where the base-stock policy is optimal. In addition, the approach outperforms current DRL approaches.
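
To make the idea of a Q-function built from radial basis functions concrete, here is a minimal sketch in plain Python. It is not the paper's architecture, which the abstract does not specify: the Gaussian kernel, the shared `width`, the fixed `centers`, and the per-action `weights` layout are all assumptions chosen for illustration.

```python
import math

def rbf_features(state, centers, width):
    """Gaussian radial basis features, one per center (assumed kernel):
    exp(-||state - center||^2 / (2 * width^2))."""
    feats = []
    for center in centers:
        sq_dist = sum((s - c) ** 2 for s, c in zip(state, center))
        feats.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    return feats

def q_values(state, centers, width, weights):
    """One Q-value per action: a weighted sum of the RBF features.
    weights[a][k] is the (hypothetical) weight of center k for action a."""
    feats = rbf_features(state, centers, width)
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]

def greedy_action(state, centers, width, weights):
    """Index of the action with the largest Q-value."""
    q = q_values(state, centers, width, weights)
    return max(range(len(q)), key=q.__getitem__)

# Toy example: states are (inventory level, pipeline stock) pairs,
# and the two actions stand for two candidate order quantities.
centers = [(0.0, 0.0), (10.0, 10.0)]
weights = [[1.0, 0.0],   # action 0 is favored near the first center
           [0.0, 1.0]]   # action 1 is favored near the second center
print(greedy_action((0.0, 0.0), centers, 1.0, weights))
print(greedy_action((10.0, 10.0), centers, 1.0, weights))
```

In a sketch like this, only the linear weights would be learned, which is one plausible reading of why such a model could need less hyperparameter tuning than a deep network.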


Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning

Dec 05, 2019

Jiaxi Liu, Yidong Zhang, Xiaoqing Wang, Yuming Deng, Xingyu Wu

Abstract: In this paper we present an end-to-end framework for dynamic pricing on e-commerce platforms using methods based on deep reinforcement learning (DRL). Using four groups of business data to represent the state in each time period, we model the dynamic pricing problem as a Markov Decision Process (MDP). Compared with state-of-the-art DRL-based dynamic pricing algorithms, our approach makes three contributions. First, we extend the discrete price set to a continuous price set. Second, instead of using revenue directly as the reward function, we define a new reward function named difference of revenue conversion rates (DRCR). Third, we tackle the cold-start problem of the MDP by pre-training and evaluation on carefully chosen historical sales data. Our approach is evaluated both offline, using a real dataset from Alibaba Inc., and in online field experiments on Tmall.com, a major online shopping website owned by Alibaba Inc. In particular, the experimental results suggest that DRCR is a more appropriate reward function than revenue, which is widely used in the current literature. Finally, field experiments lasting for months on 1000 stock keeping units (SKUs) demonstrate that continuous price sets perform better than discrete sets and show that our approach significantly outperforms manual pricing by operation experts.

* 9 pages, 7 figures


Context-Based Dynamic Pricing with Online Clustering

Feb 17, 2019

Sentao Miao, Xi Chen, Xiuli Chao, Jiaxi Liu, Yidong Zhang

Abstract: We consider a context-based dynamic pricing problem for online products with low sales. Sales data from Alibaba, a major global online retailer, illustrate the prevalence of low-sale products. For these products, existing single-product dynamic pricing algorithms do not work well due to insufficient data samples. To address this challenge, we propose pricing policies that concurrently cluster products and set individual pricing decisions on the fly. By clustering data and identifying products that have similar demand patterns, we utilize sales data from products within the same cluster to improve demand estimation and allow for better pricing decisions. We evaluate the algorithms using regret, and the results show that when product demand functions come from multiple clusters, our algorithms significantly outperform traditional single-product pricing policies. Numerical experiments using a real dataset from Alibaba demonstrate that the proposed policies increase revenue compared with several benchmark policies. The results show that online clustering is an effective approach to tackling dynamic pricing problems associated with low-sale products. Our algorithms were further implemented in a field study at Alibaba with 40 products for 30 consecutive days and compared with products that use Alibaba's business-as-usual pricing policy. The results from the field experiment show that overall revenue increased by 10.14%.
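
The benefit of pooling sales data within a cluster can be illustrated with a small sketch. It is not the paper's algorithm: the linear demand model `demand = a + b * price`, the ordinary least squares fit, and the names `fit_linear_demand` and `pooled_fit` are assumptions made for this example, and the cluster membership is taken as given rather than learned online.

```python
def fit_linear_demand(observations):
    """Ordinary least squares for demand = a + b * price.
    observations is a list of (price, demand) pairs with at least
    two distinct prices; otherwise the slope is undefined."""
    n = len(observations)
    mean_p = sum(p for p, _ in observations) / n
    mean_d = sum(d for _, d in observations) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in observations)
    cov = sum((p - mean_p) * (d - mean_d) for p, d in observations)
    b = cov / var_p
    a = mean_d - b * mean_p
    return a, b

def pooled_fit(sales_by_product, cluster):
    """Fit one demand curve from the combined observations of every
    product in the (assumed known) cluster."""
    pooled = [obs for pid in cluster for obs in sales_by_product[pid]]
    return fit_linear_demand(pooled)

# Hypothetical data: product "B" has a single sale record, so it cannot
# be fit on its own, but it can borrow strength from product "A".
sales = {
    "A": [(1.0, 9.0), (2.0, 8.0)],
    "B": [(3.0, 7.0)],
    "C": [(1.0, 100.0)],  # different demand pattern, left out of the cluster
}
intercept, slope = pooled_fit(sales, ["A", "B"])
print(intercept, slope)
```

The single observation for product "B" illustrates the low-sale setting the abstract describes: a per-product estimate is impossible, while the pooled estimate is well defined.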
