Enlu Zhou

Ranking and Selection with Simultaneous Input Data Collection

Mar 14, 2025

Yuhao Wang, Enlu Zhou

Abstract: In this paper, we propose a general and novel formulation of ranking and selection in the presence of streaming input data. The collection of multiple streams of such data may consume different types of resources, and hence can be conducted simultaneously. To utilize the streaming input data, we aggregate simulation outputs generated under heterogeneous input distributions over time to form a performance estimator. By characterizing the asymptotic behavior of the performance estimators, we formulate two optimization problems to optimally allocate budgets for collecting input data and running simulations. We then develop a multi-stage simultaneous budget allocation procedure and provide its statistical guarantees such as consistency and asymptotic normality. We conduct several numerical studies to demonstrate the competitive performance of the proposed procedure.

Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate

Mar 01, 2024

Yifan Lin, Yuhao Wang, Enlu Zhou

Abstract: Reinforcement learning provides a mathematical framework for learning-based control, whose success largely depends on the amount of data it can utilize. The efficient utilization of historical trajectories obtained from previous policies is essential for expediting policy optimization. Empirical evidence has shown that policy gradient methods based on importance sampling work well. However, existing literature often neglects the interdependence between trajectories from different iterations, and the good empirical performance lacks a rigorous theoretical justification. In this paper, we study a variant of the natural policy gradient method that reuses historical trajectories via importance sampling. We show that the bias of the proposed estimator of the gradient is asymptotically negligible, the resultant algorithm is convergent, and reusing past trajectories helps improve the convergence rate. We further apply the proposed estimator to popular policy optimization algorithms such as trust region policy optimization. Our theoretical results are verified on classical benchmarks.
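
The importance-sampling idea mentioned in this abstract can be illustrated with a minimal single-step sketch. It is not the paper's estimator: it ignores trajectories, gradients, and the dependence between iterations, and the reward values, the two action distributions, and the function name are all hypothetical. It only shows how samples drawn under an old (behavior) policy can be reweighted to estimate an expectation under a new (target) policy.

```python
import random

def importance_sampling_estimate(actions, rewards, target_probs, behavior_probs):
    """Estimate the expected reward under the target policy
    using actions that were sampled from the behavior policy."""
    total = 0.0
    for a in actions:
        # Likelihood ratio corrects for sampling from the wrong policy.
        weight = target_probs[a] / behavior_probs[a]
        total += weight * rewards[a]
    return total / len(actions)

# Hypothetical three-action problem with deterministic rewards.
rewards = [1.0, 2.0, 3.0]
behavior_probs = [0.5, 0.3, 0.2]   # old policy that generated the data
target_probs = [0.2, 0.3, 0.5]     # new policy we want to evaluate

rng = random.Random(0)
old_actions = rng.choices(range(3), weights=behavior_probs, k=20000)

estimate = importance_sampling_estimate(old_actions, rewards, target_probs, behavior_probs)
exact = sum(p * r for p, r in zip(target_probs, rewards))
print(estimate, exact)
```

The estimate should land close to the exact value computed from the target probabilities, even though no sample was drawn from the target policy.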

Bayesian Risk-Averse Q-Learning with Streaming Observations

May 18, 2023

Yuhao Wang, Enlu Zhou

Abstract: We consider a robust reinforcement learning problem, where a learning agent learns from a simulated training environment. To account for the model misspecification between this training environment and the real environment due to lack of data, we adopt a formulation of Bayesian risk MDP (BRMDP) with infinite horizon, which uses a Bayesian posterior to estimate the transition model and imposes a risk functional to account for the model uncertainty. Observations from the real environment, which are outside the agent's control, arrive periodically and are utilized by the agent to update the Bayesian posterior to reduce model uncertainty. We theoretically demonstrate that BRMDP balances the trade-off between robustness and conservativeness, and we further develop a multi-stage Bayesian risk-averse Q-learning algorithm to solve BRMDP with streaming observations from the real environment. The proposed algorithm learns a risk-averse yet optimal policy that depends on the availability of real-world observations. We provide a theoretical guarantee of strong convergence for the proposed algorithm.

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

Jun 24, 2022

Yifan Lin, Yuhao Wang, Enlu Zhou

Abstract:In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson Sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O((1+\rho+\frac{1}{\rho}) d\ln T \ln \frac{K}{\delta}\sqrt{d K T^{1+2\epsilon} \ln \frac{K}{\delta} \frac{1}{\epsilon}})$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0<\epsilon<\frac{1}{2}$, $0<\delta<1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.

Robust Multi-Objective Bayesian Optimization Under Input Noise

Feb 16, 2022

Samuel Daulton, Sait Cakmak, Maximilian Balandat, Michael A. Osborne, Enlu Zhou, Eytan Bakshy

Abstract:Bayesian optimization (BO) is a sample-efficient approach for tuning design parameters to optimize expensive-to-evaluate, black-box performance metrics. In many manufacturing processes, the design parameters are subject to random input noise, resulting in a product that is often less performant than expected. Although BO methods have been proposed for optimizing a single objective under input noise, no existing method addresses the practical scenario where there are multiple objectives that are sensitive to input perturbations. In this work, we propose the first multi-objective BO method that is robust to input noise. We formalize our goal as optimizing the multivariate value-at-risk (MVaR), a risk measure of the uncertain objectives. Since directly optimizing MVaR is computationally infeasible in many settings, we propose a scalable, theoretically-grounded approach for optimizing MVaR using random scalarizations. Empirically, we find that our approach significantly outperforms alternative methods and efficiently identifies optimal robust designs that will satisfy specifications across multiple metrics with high probability.

* 41 pages. Code is available at https://github.com/facebookresearch/robust_mobo

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

Feb 07, 2022

Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao

Abstract:We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an over-parameterization model. We parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions, the estimator, obtained by the randomly perturbed gradient descent algorithm using the square loss function, attains a mean square error of $O(\sigma^2/d)$, where $\sigma^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training over-parameterized neural networks.

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

Feb 24, 2021

Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao

Abstract: Ample empirical evidence has corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem through investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gradient descent (GD) can converge to any optimum, depending on the initialization. In contrast, we show that a perturbed form of GD with an arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes implicit bias towards certain optima. Numerical experiments are provided to support our theory.
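
The claim that plain GD can reach different global minima depending on initialization can be seen in a toy scalar analogue. The sketch below is an assumption-laden illustration, not the paper's setting: it replaces the rectangular matrix problem with the hypothetical objective 0.5 * (x * y - 1)^2, whose global minima form the curve x * y = 1 because of scaling invariance, and it runs only unperturbed GD (the paper's perturbed variant is not implemented here).

```python
def gradient_descent(x, y, lr=0.1, steps=2000):
    """Plain gradient descent on f(x, y) = 0.5 * (x * y - 1) ** 2."""
    for _ in range(steps):
        residual = x * y - 1.0
        # Partial derivatives: df/dx = residual * y, df/dy = residual * x.
        x, y = x - lr * residual * y, y - lr * residual * x
    return x, y

# Two initializations that are mirror images of each other.
x1, y1 = gradient_descent(2.0, 0.1)
x2, y2 = gradient_descent(0.1, 2.0)

print((x1, y1), x1 * y1)
print((x2, y2), x2 * y2)
```

Both runs drive the product x * y to 1, yet they stop at different points on the curve of minima, so the solution GD returns is determined by where it started.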

Bayesian Optimization of Risk Measures

Jul 16, 2020

Sait Cakmak, Raul Astudillo, Peter Frazier, Enlu Zhou

Abstract:We consider Bayesian optimization of objective functions of the form $\rho[ F(x, W) ]$, where $F$ is a black-box expensive-to-evaluate function and $\rho$ denotes either the VaR or CVaR risk measure, computed with respect to the randomness induced by the environmental random variable $W$. Such problems arise in decision making under uncertainty, such as in portfolio optimization and robust systems design. We propose a family of novel Bayesian optimization algorithms that exploit the structure of the objective function to substantially improve sampling efficiency. Instead of modeling the objective function directly as is typical in Bayesian optimization, these algorithms model $F$ as a Gaussian process, and use the implied posterior on the objective function to decide which points to evaluate. We demonstrate the effectiveness of our approach in a variety of numerical experiments.

* The paper is 13 pages and includes 3 figures. The supplement is an additional 11 pages with 2 figures. The paper is currently under review for NeurIPS 2020. Updated formatting in v2.
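
For readers unfamiliar with the two risk measures named in this abstract, the following is a minimal sketch of their empirical versions for a fixed design, computed from samples of the outcome under the random variable $W$. It assumes the samples are losses (larger is worse) and uses one common convention: VaR is the empirical $\alpha$-quantile and CVaR is the mean of the samples at or above it. The function names and data are hypothetical, and this is not the paper's Gaussian-process-based algorithm.

```python
import math

def empirical_var(samples, alpha):
    """Empirical value-at-risk: the smallest sample v such that
    at least a fraction alpha of the samples are <= v."""
    ordered = sorted(samples)
    index = math.ceil(alpha * len(ordered)) - 1
    return ordered[max(index, 0)]

def empirical_cvar(samples, alpha):
    """Empirical conditional value-at-risk: the average of the
    samples that are at or above the empirical VaR."""
    threshold = empirical_var(samples, alpha)
    tail = [s for s in samples if s >= threshold]
    return sum(tail) / len(tail)

# Hypothetical loss samples 1.0, 2.0, ..., 10.0 for one design.
losses = [float(v) for v in range(1, 11)]
var_50 = empirical_var(losses, 0.5)
cvar_50 = empirical_cvar(losses, 0.5)
print(var_50, cvar_50)
```

CVaR is never smaller than VaR at the same level, since it averages only the tail at or beyond the quantile.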

Towards Understanding the Importance of Shortcut Connections in Residual Networks

Sep 11, 2019

Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao

Abstract: Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers, and exhibits efficient training using simple first-order algorithms. Despite its great empirical success, the reason behind it is far from well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum. We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum, and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0, and that of the second layer is initialized arbitrarily in a ball. Numerical experiments are provided to support our theory.

* Thirty-third Conference on Neural Information Processing Systems, 2019

Towards Understanding the Importance of Noise in Training Neural Networks

Sep 07, 2019

Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao

Abstract: Ample empirical evidence has corroborated that noise plays a crucial role in effective and efficient training of neural networks. The theory behind it, however, is still largely unknown. This paper studies this fundamental problem through training a simple two-layer convolutional neural network model. Although training such a network requires solving a nonconvex optimization problem with a spurious local optimum and a global optimum, we prove that perturbed gradient descent and perturbed mini-batch stochastic gradient algorithms in conjunction with noise annealing are guaranteed to converge to a global optimum in polynomial time with arbitrary initialization. This implies that the noise enables the algorithm to efficiently escape from the spurious local optimum. Numerical experiments are provided to support our theory.

* International Conference on Machine Learning (ICML), 2019
