Hisahiro Suganuma

MRL: Learning to Mix with Attention and Convolutions

Aug 30, 2022

Shlok Mohta, Hisahiro Suganuma, Yoshiki Tanaka

Abstract: In this paper, we present a new neural architectural block for the vision domain, named Mixing Regionally and Locally (MRL), developed to mix the provided input features effectively and efficiently. We split the input feature mixing task into mixing at a regional scale and mixing at a local scale. To achieve an efficient mix, we use the domain-wide receptive field provided by self-attention for regional-scale mixing, and convolutional kernels restricted to a local scale for local-scale mixing. More specifically, our proposed method first mixes the regional features associated with the local features within a defined region, and then performs a local-scale feature mix augmented by the regional features. Experiments show that this hybridization of self-attention and convolution brings improved capacity, generalization (the right inductive bias), and efficiency. Under similar network settings, MRL outperforms or is on par with its counterparts in classification, object detection, and segmentation tasks. We also show that our MRL-based network architecture achieves state-of-the-art performance on H&E histology datasets, with Dice scores of 0.843, 0.855, and 0.892 on the Kumar, CoNSeP, and CPM-17 datasets, respectively. These results also highlight the versatility of the MRL framework, which can incorporate layers such as group convolutions to improve dataset-specific generalization.
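
For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of the standard pixel-wise Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks. The abstract does not say which Dice variant the authors used (for example, pixel-wise versus aggregated per nucleus), so the function name `dice_coefficient`, the toy masks, and the empty-mask convention are assumptions for illustration only.

```python
def dice_coefficient(pred, target):
    """Pixel-wise Dice score between two binary masks given as nested lists.

    Hypothetical helper for illustration; not taken from the paper.
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    assert len(flat_pred) == len(flat_target), "masks must have the same size"

    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    # |A| + |B|: foreground pixel counts of each mask.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 3 predicted pixels, 3 ground-truth pixels, 2 shared.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)
```

A score of 1.0 means the predicted and reference masks overlap exactly, and 0.0 means they share no foreground pixels.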

ImageNet/ResNet-50 Training in 224 Seconds

Nov 13, 2018

Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama

Abstract: Scaling distributed deep learning to a massive GPU cluster is challenging due to the instability of large mini-batch training and the overhead of gradient synchronization. We address the instability of large mini-batch training with batch size control, and the overhead of gradient synchronization with 2D-Torus all-reduce. Specifically, 2D-Torus all-reduce arranges GPUs in a logical 2D grid and performs a series of collective operations in different orientations. Both techniques are implemented with Neural Network Libraries (NNL). We successfully trained ImageNet/ResNet-50 in 224 seconds without significant accuracy loss on the ABCI cluster.
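
The grid-based all-reduce described above can be illustrated with a small single-process simulation. The abstract only says that collectives run "in different orientations", so the specific three-step order used here (reduce-scatter along rows, all-reduce along columns, all-gather along rows) is an assumption. The function name `torus_all_reduce_2d` is hypothetical and this is not the NNL API. Plain Python lists stand in for GPU buffers, and no real communication takes place.

```python
def torus_all_reduce_2d(grads, rows, cols):
    """Simulate an all-reduce (sum) over a rows x cols logical grid.

    grads[r][c] is the gradient vector held by the simulated GPU at (r, c).
    Returns a grid of the same shape in which every GPU holds the global sum.
    """
    n = len(grads[0][0])
    assert n % cols == 0, "vector length must split evenly across a row"
    chunk = n // cols

    # Step 1 (horizontal): reduce-scatter within each row.
    # GPU (r, c) ends up owning chunk c, summed over the GPUs in row r.
    partial = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            lo, hi = c * chunk, (c + 1) * chunk
            partial[r][c] = [
                sum(grads[r][k][i] for k in range(cols)) for i in range(lo, hi)
            ]

    # Step 2 (vertical): all-reduce within each column.
    # Every GPU in column c now holds the global sum of chunk c.
    for c in range(cols):
        column_sum = [
            sum(partial[r][c][i] for r in range(rows)) for i in range(chunk)
        ]
        for r in range(rows):
            partial[r][c] = list(column_sum)

    # Step 3 (horizontal): all-gather within each row.
    # Each GPU collects all chunks from its row to rebuild the full vector.
    result = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        full = [v for c in range(cols) for v in partial[r][c]]
        for c in range(cols):
            result[r][c] = list(full)
    return result


# 2x2 grid; the GPU at (r, c) holds a length-4 vector filled with r*cols + c + 1.
rows, cols = 2, 2
grads = [[[float(r * cols + c + 1)] * 4 for c in range(cols)] for r in range(rows)]
out = torus_all_reduce_2d(grads, rows, cols)
```

In this sketch the vertical step only moves 1/cols of the gradient per GPU, which is one plausible reason a grid arrangement can reduce synchronization cost compared with a single flat ring over all GPUs.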
