Jacob B. Schroder

Multilevel Initialization for Layer-Parallel Deep Neural Network Training

Dec 19, 2019

Eric C. Cyr, Stefanie Günther, Jacob B. Schroder

Abstract: This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on the continuous interpretation of the training problem as a problem of optimal control, in which neural networks are represented as discretizations of time-dependent ordinary differential equations. A key goal is to develop a method able to intelligently initialize the network parameters for the very deep networks enabled by scalable layer-parallel training. To do this, we apply a refinement strategy across the time domain that is equivalent to refining in the layer dimension. The resulting refinements create deep networks, with good initializations for the network parameters coming from the coarser trained networks. We investigate the effectiveness of such multilevel "nested iteration" strategies for network training, showing supporting numerical evidence of reduced run time for equivalent accuracy. In addition, we study whether the initialization strategies provide a regularizing effect on the overall training process and reduce sensitivity to hyperparameters and randomness in initial network parameters.
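The refinement idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a residual network written as a forward Euler discretization with a `tanh` activation, and it assumes piecewise-constant interpolation (each coarse layer's parameters are copied to its two fine-level children). The names `forward` and `refine` are hypothetical.

```python
import numpy as np

def forward(u0, weights, biases, T=1.0):
    """Forward Euler residual network on [0, T] (assumed form, for illustration):
    u_{k+1} = u_k + h * tanh(W_k u_k + b_k), with step size h = T / N."""
    h = T / len(weights)
    u = u0
    for W, b in zip(weights, biases):
        u = u + h * np.tanh(W @ u + b)
    return u

def refine(params):
    """Double the number of layers by copying each coarse layer's parameters
    to two consecutive fine layers (piecewise-constant interpolation in time)."""
    return [p.copy() for p in params for _ in range(2)]
```

Because the step size halves when the layer count doubles, the refined network approximates the same underlying ODE, so parameters trained on the coarse network give the deeper network a starting point whose output is close to the coarse one. Training would then continue on the fine level, and the process could be repeated.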

Parallelizing Over Artificial Neural Network Training Runs with Multigrid

Oct 01, 2017

Jacob B. Schroder

Abstract: Artificial neural networks are a popular and effective machine learning technique. Great progress has been made in parallelizing the expensive training phase of an individual network, leading to highly specialized pieces of hardware, many based on GPU-type architectures, and more concurrent algorithms such as synthetic gradients. However, the training phase continues to be a bottleneck, where the training data must be processed serially over thousands of individual training runs. This work considers a multigrid reduction in time (MGRIT) algorithm that is able to parallelize over the thousands of training runs and converge to the exact same solution as traditional training would provide. MGRIT was originally developed to provide parallelism for time evolution problems that serially step through a finite number of time-steps. This work recasts the training of a neural network similarly, treating neural network training as an evolution equation that evolves the network weights from one step to the next. Thus, this work concerns distributed computing approaches for neural networks, but is distinct from other approaches which seek to parallelize only over individual training runs. The work concludes with supporting numerical results for two model problems.
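The "evolution equation" view in this abstract can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's code: one training run is assumed to be a single gradient-descent step on a linear least-squares model, and the names `phi`, `serial_training`, and `residual` are hypothetical. Serial training is the time-stepping loop `w[i+1] = phi(w[i])`. A parallel-in-time method such as MGRIT instead treats all steps as one coupled system and iterates until its residual vanishes, which is why it can reproduce the serial result.

```python
import numpy as np

def phi(w, x, y, lr=0.1):
    """One 'time step': a gradient-descent update on 0.5 * (w.x - y)^2
    for a single training example (assumed model, for illustration)."""
    grad = (w @ x - y) * x
    return w - lr * grad

def serial_training(w0, X, Y, lr=0.1):
    """Traditional training: step through the examples one after another."""
    traj = [w0]
    for x, y in zip(X, Y):
        traj.append(phi(traj[-1], x, y, lr))
    return traj

def residual(traj, w0, X, Y, lr=0.1):
    """Residual of the all-at-once system:
    r_0 = w_0 - w0 and r_{i+1} = w_{i+1} - phi(w_i).
    It is zero exactly when traj matches the serial training trajectory."""
    r = [traj[0] - w0]
    for i, (x, y) in enumerate(zip(X, Y)):
        r.append(traj[i + 1] - phi(traj[i], x, y, lr))
    return r
```

Any trajectory with zero residual coincides with the serial one, so an iterative solver that drives this residual to zero converges to the same weights as traditional training.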

* Version 2: added more complete references to basic neural network literature; corrected typos; condensed results in Section 3 to be more concise; 22 pages
