Antoine Labatie

Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training

Jun 15, 2021

Dominic Masters, Antoine Labatie, Zach Eaton-Rosen, Carlo Luschi

Abstract: Much recent research has been dedicated to improving the efficiency of training and inference for image classification. This effort has commonly focused on explicitly improving theoretical efficiency, often measured as ImageNet validation accuracy per FLOP. These theoretical savings have, however, proven challenging to achieve in practice, particularly on high-performance training accelerators. In this work, we focus on improving the practical efficiency of the state-of-the-art EfficientNet models on a new class of accelerator, the Graphcore IPU. We do this by extending this family of models in the following ways: (i) generalising depthwise convolutions to group convolutions; (ii) adding proxy-normalized activations to match batch normalization performance with batch-independent statistics; (iii) reducing compute by lowering the training resolution and inexpensively fine-tuning at higher resolution. We find that these three methods improve the practical efficiency for both training and inference. Our code will be made available online.
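
Point (i) of the abstract above can be made concrete with a weight count. The sketch below is illustrative only and is not the authors' code: the function name `group_conv_weight_count` and its arguments are hypothetical, and it assumes a square kernel, equal input and output channel counts, and no bias term. A group size of 1 corresponds to a depthwise convolution, and a group size equal to the channel count corresponds to a dense convolution.

```python
def group_conv_weight_count(channels: int, kernel_size: int, group_size: int) -> int:
    """Weights in a k x k convolution with `channels` inputs and outputs,
    where each output channel reads from `group_size` input channels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if channels % group_size != 0:
        raise ValueError("group_size must divide channels")
    # Each of the `channels` output filters spans group_size * k * k weights.
    return channels * group_size * kernel_size * kernel_size


if __name__ == "__main__":
    c, k = 64, 3
    for g in (1, 16, c):  # depthwise, grouped, dense
        print(f"group_size={g:>2}: {group_conv_weight_count(c, k, g)} weights")
```

Under these assumptions the weight count grows linearly with the group size, which is the trade-off a move from depthwise to group convolutions has to balance.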

Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence

Jun 11, 2021

Antoine Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi

Abstract: We investigate the reasons for the performance degradation incurred with batch-independent normalization. We find that the prototypical techniques of layer normalization and instance normalization both induce the appearance of failure modes in the neural network's pre-activations: (i) layer normalization induces a collapse towards channel-wise constant functions; (ii) instance normalization induces a lack of variability in instance statistics, symptomatic of an alteration of the expressivity. To alleviate failure mode (i) without aggravating failure mode (ii), we introduce the technique "Proxy Normalization" that normalizes post-activations using a proxy distribution. When combined with layer normalization or group normalization, this batch-independent normalization emulates batch normalization's behavior and consistently matches or exceeds its performance.
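
The two baselines named in the abstract above differ only in which axes the statistics are computed over. The following is a minimal sketch of that difference for a single instance, not an implementation of Proxy Normalization and not the authors' code. It assumes the input is a list of channels, each a flat list of spatial values, and it omits the learned scale and shift; all function names are hypothetical.

```python
from statistics import fmean, pvariance


def _standardize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation of `values`."""
    mu = fmean(values)
    var = pvariance(values, mu)
    return [(v - mu) / (var + eps) ** 0.5 for v in values]


def instance_norm(x, eps=1e-5):
    """Statistics per channel, over spatial positions only."""
    return [_standardize(channel, eps) for channel in x]


def layer_norm(x, eps=1e-5):
    """Statistics over all channels and spatial positions together."""
    flat = [v for channel in x for v in channel]
    normed = _standardize(flat, eps)
    width = len(x[0])
    return [normed[i * width:(i + 1) * width] for i in range(len(x))]


def channel_means(x):
    return [fmean(channel) for channel in x]


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [10.0, 12.0, 14.0]]  # 2 channels, 3 positions
    print("instance norm channel means:", channel_means(instance_norm(x)))
    print("layer norm channel means:   ", channel_means(layer_norm(x)))
```

In this toy input, instance normalization forces every channel mean of the instance to zero, so per-instance channel statistics carry no information afterwards, whereas layer normalization leaves the channel means free to differ.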

Characterizing Well-behaved vs. Pathological Deep Neural Network Architectures

Nov 07, 2018

Antoine Labatie

Abstract: We introduce a principled approach, requiring only mild assumptions, for the characterization of deep neural networks at initialization. Our approach applies both to fully-connected and convolutional networks and incorporates the commonly used techniques of batch normalization and skip-connections. Our key insight is to consider the evolution with depth of statistical moments of signal and sensitivity, thereby characterizing the well-behaved or pathological behaviour of input-output mappings encoded by different choices of architecture. We establish: (i) for feedforward networks with and without batch normalization, depth multiplicativity inevitably leads to ill-behaved moments and distributional pathologies; (ii) for residual networks, on the other hand, the mechanism of identity skip-connection induces power-law rather than exponential behaviour, leading to well-behaved moments and no distributional pathology.
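
The contrast between exponential and power-law growth with depth in the abstract above can be illustrated with a toy calculation. The two recursions below are assumptions chosen for illustration and are not equations from the paper: the first multiplies a scalar moment by a constant factor at every layer, the second by a factor whose excess over 1 shrinks as 1/layer. The function names and the constant `c` are hypothetical.

```python
def feedforward_like(depth: int, c: float = 1.0) -> float:
    """Toy recursion m <- m * (1 + c): constant per-layer factor (assumed)."""
    m = 1.0
    for _ in range(depth):
        m *= 1.0 + c
    return m


def residual_like(depth: int, c: float = 1.0) -> float:
    """Toy recursion m <- m * (1 + c / layer): shrinking per-layer factor (assumed)."""
    m = 1.0
    for layer in range(1, depth + 1):
        m *= 1.0 + c / layer
    return m


if __name__ == "__main__":
    for depth in (10, 50, 100):
        print(depth, feedforward_like(depth), residual_like(depth))
```

With `c = 1`, the first recursion gives 2 to the power of the depth, while the second telescopes to depth + 1, so the same number of layers produces exponential growth in one case and polynomial growth in the other.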
