Avhishek Chatterjee

Learning Rate Optimization for Deep Neural Networks Using Lipschitz Bandits

Sep 15, 2024

Padma Priyanka, Sheetal Kalyani, Avhishek Chatterjee

Abstract: The learning rate is a crucial parameter in training neural networks. A properly tuned learning rate leads to faster training and higher test accuracy. In this paper, we propose a Lipschitz bandit-driven approach for tuning the learning rate of neural networks. The proposed approach is compared with HyperOpt, a popular technique used extensively for hyperparameter optimization, and with the recently developed bandit-based algorithm BLiE. The results for multiple neural network architectures indicate that our method finds a better learning rate using (a) fewer evaluations and (b) fewer epochs per evaluation than both HyperOpt and BLiE. Thus, the proposed approach enables more efficient training of neural networks, leading to lower training time and lower computational cost.
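
The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of treating learning-rate selection as a Lipschitz bandit. It uses a textbook baseline (a uniform grid over log10 of the learning rate, with UCB1 run over the grid points), not the method proposed in the paper. The function `synthetic_reward` is a hypothetical stand-in for "validation accuracy after a few epochs of training", and every name, range, and constant below is an assumption made for the sketch.

```python
import math
import random


def synthetic_reward(log10_lr, rng):
    """Hypothetical stand-in for a short training run's validation accuracy.

    The noiseless reward peaks at log10_lr = -2.5 and is Lipschitz in
    log10_lr; Gaussian noise mimics run-to-run variation.
    """
    clean = max(0.0, 1.0 - 0.8 * abs(log10_lr + 2.5))
    return min(1.0, max(0.0, clean + rng.gauss(0.0, 0.05)))


def tune_learning_rate(evaluate, low=-5.0, high=0.0, num_arms=11,
                       budget=1000, seed=0):
    """Uniform discretization + UCB1 over log10(learning rate).

    This is a standard Lipschitz-bandit baseline, assumed here for
    illustration; it is not the algorithm from the paper.
    Returns the most-pulled learning rate and the per-arm pull counts.
    """
    rng = random.Random(seed)
    # Evenly spaced candidate values of log10(lr) in [low, high].
    arms = [low + i * (high - low) / (num_arms - 1) for i in range(num_arms)]
    counts = [0] * num_arms
    means = [0.0] * num_arms

    for t in range(1, budget + 1):
        if t <= num_arms:
            k = t - 1  # pull every arm once before using the UCB index
        else:
            k = max(
                range(num_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = evaluate(arms[k], rng)
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]  # running average

    best = max(range(num_arms), key=lambda i: counts[i])
    return 10.0 ** arms[best], counts


best_lr, pulls = tune_learning_rate(synthetic_reward)
print(f"selected learning rate: {best_lr:.5f}")
```

In a real setting, `evaluate` would train the network for a small number of epochs at the given learning rate and return a validation metric, so each pull is expensive and the evaluation budget would be far smaller than the one used in this toy run.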

Learning the Influence Graph of a High-Dimensional Markov Process with Memory

Jun 13, 2024

Smita Bagewadi, Avhishek Chatterjee

Abstract: Motivated by multiple applications in social networks, nervous systems, and financial risk analysis, we consider the problem of learning the underlying (directed) influence graph, or causal graph, of a high-dimensional multivariate discrete-time Markov process with memory. At any discrete time instant, each observed variable of the multivariate process is a binary string of random length, which is parameterized by an unobservable (hidden) [0,1]-valued scalar. The hidden scalars corresponding to the variables evolve according to discrete-time linear stochastic dynamics dictated by the underlying influence graph, whose nodes are the variables. We extend an existing algorithm for learning i.i.d. graphical models to this Markovian setting with memory and prove that it can learn the influence graph from the binary observations using a number of samples that is logarithmic in the number of variables (nodes) when the degree of the influence graph is bounded. The crucial analytical contribution of this work is the derivation of the sample complexity result by upper and lower bounding the rate of convergence of the observed Markov process with memory to its stationary distribution in terms of the parameters of the influence graph.
