Abstract:Gradient-based methods are well-suited for derivative-free optimization (DFO), where finite-difference (FD) estimates are commonly used as gradient surrogates. Traditional stochastic approximation methods, such as Kiefer-Wolfowitz (KW) and simultaneous perturbation stochastic approximation (SPSA), typically utilize only two samples per iteration, resulting in imprecise gradient estimates and necessitating diminishing step sizes for convergence. In this paper, we first explore an efficient FD estimate, referred to as correlation-induced FD estimate, which is a batch-based estimate. Then, we propose an adaptive sampling strategy that dynamically determines the batch size at each iteration. By combining these two components, we develop an algorithm designed to enhance DFO in terms of both gradient estimation efficiency and sample efficiency. Furthermore, we establish the consistency of our proposed algorithm and demonstrate that, despite using a batch of samples per iteration, it achieves the same convergence rate as the KW and SPSA methods. Additionally, we propose a novel stochastic line search technique to adaptively tune the step size in practice. Finally, comprehensive numerical experiments confirm the superior empirical performance of the proposed algorithm.
Abstract:Estimating stochastic gradients is pivotal in fields like service systems within operations research. The classical method for this estimation is the finite difference approximation, which entails generating samples at perturbed inputs. Nonetheless, practical challenges persist in determining the perturbation and obtaining an optimal finite difference estimator in the sense of possessing the smallest mean squared error (MSE). To tackle this problem, we propose a double sample-recycling approach in this paper. Firstly, pilot samples are recycled to estimate the optimal perturbation. Secondly, recycling these pilot samples again and generating new samples at the estimated perturbation, lead to an efficient finite difference estimator. We analyze its bias, variance and MSE. Our analyses demonstrate a reduction in asymptotic variance, and in some cases, a decrease in asymptotic bias, compared to the optimal finite difference estimator. Therefore, our proposed estimator consistently coincides with, or even outperforms the optimal finite difference estimator. In numerical experiments, we apply the estimator in several examples, and numerical results demonstrate its robustness, as well as coincidence with the theory presented, especially in the case of small sample sizes.