Yunqiang Li

Differentiable Transportation Pruning

Jul 31, 2023

Yunqiang Li, Jan C. van Gemert, Torsten Hoefler, Bert Moons, Evangelos Eleftheriou, Bram-Ernst Verhoef

Abstract: Deep learning algorithms are increasingly employed at the edge. However, edge devices are resource-constrained and thus require efficient deployment of deep neural networks. Pruning methods are a key tool for edge deployment, as they can reduce storage, compute, memory bandwidth, and energy usage. In this paper we propose a novel, accurate pruning technique that allows precise control over the output network size. Our method uses an efficient optimal transportation scheme, which we make end-to-end differentiable and which automatically tunes the exploration-exploitation behavior of the algorithm to find accurate sparse sub-networks. We show that our method achieves state-of-the-art performance compared to previous pruning methods on three datasets, using five models, across a wide range of pruning ratios, and with two types of sparsity budgets and pruning granularities.

* ICCV 2023
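
The paper's exact formulation is not reproduced here. As a minimal sketch, assuming a standard entropic-regularized optimal transport (Sinkhorn) relaxation of top-k selection, the code below shows how an exact sparsity budget can become a soft keep-mask built only from differentiable operations. Each weight's importance score sends its mass either to a "prune" anchor or a "keep" anchor, and the keep anchor's capacity is fixed to k, so the mask always sums to k. The function name `sinkhorn_soft_mask` and the temperature `eps` are hypothetical; here `eps` is set by hand, whereas the paper tunes the exploration-exploitation behavior automatically.

```python
import numpy as np


def sinkhorn_soft_mask(scores, k, eps=0.1, n_iters=200):
    """Soft keep-mask that selects k of n weights via entropic optimal transport.

    Each score carries mass 1/n and is transported to two anchors:
    0 ("prune", capacity (n - k) / n) and 1 ("keep", capacity k / n).
    The cost is the squared distance to the anchor after min-max scaling,
    so high scores are cheap to send to "keep".
    """
    s = np.asarray(scores, dtype=float)
    n = s.size
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)    # scale scores to [0, 1]
    anchors = np.array([0.0, 1.0])
    cost = (s[:, None] - anchors[None, :]) ** 2          # (n, 2) cost matrix
    mu = np.full(n, 1.0 / n)                             # source marginal
    nu = np.array([(n - k) / n, k / n])                  # budget: exactly k kept
    kernel = np.exp(-cost / eps)                         # Gibbs kernel
    u, v = np.ones(n), np.ones(2)
    for _ in range(n_iters):                             # Sinkhorn scaling
        u = mu / (kernel @ v)
        v = nu / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]              # transport plan
    return n * plan[:, 1]                                # share of each weight that is kept
```

In an autodiff framework the same operations would let gradients flow back to the scores. A larger `eps` spreads the mask over more weights (more exploration); a smaller `eps` approaches a hard top-k (more exploitation).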

Understanding weight-magnitude hyperparameters in training binary networks

Mar 04, 2023

Joris Quist, Yunqiang Li, Jan van Gemert

Abstract: Binary Neural Networks (BNNs) are compact and efficient because they use binary weights instead of real-valued weights. Current BNNs use latent real-valued weights during training, and several training hyperparameters are inherited from real-valued networks. The interpretation of several of these hyperparameters is based on the magnitude of the real-valued weights. For BNNs, however, the magnitude of binary weights is not meaningful, so it is unclear what these hyperparameters actually do. One example is weight decay, which aims to keep the magnitude of real-valued weights small. Other examples are latent weight initialization, the learning rate, and learning rate decay, which influence the magnitude of the real-valued weights. The magnitude is interpretable for real-valued weights but loses its meaning for binary weights. In this paper we offer a new interpretation of these magnitude-based hyperparameters based on higher-order gradient filtering during network optimization. Our analysis makes it possible to understand how magnitude-based hyperparameters influence the training of binary networks, which allows for new optimization filters specifically designed for binary neural networks that are independent of their real-valued interpretation. Moreover, our improved understanding reduces the number of hyperparameters, which eases hyperparameter tuning and may lead to better hyperparameter values and improved accuracy. Code is available at https://github.com/jorisquist/Understanding-WM-HP-in-BNNs

* Conference: ICLR 2023
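
To make the gradient-filtering view concrete in its simplest, first-order form (the paper analyses higher-order filters), the sketch below assumes plain SGD with weight decay on a single latent weight. Unrolling w <- w - lr * (g + wd * w) gives, after t steps, w_t = d^t * w_0 - lr * sum_{s<t} d^(t-1-s) * g_s with d = 1 - lr * wd, which is an exponential low-pass filter of past gradients. Since only sign(w_t) is used as the binary weight, the initialization w_0 acts as the filter's initial state and the product lr * wd sets its time constant. The helper names are hypothetical.

```python
import numpy as np


def latent_sgd(grads, lr, wd, w0=0.0):
    """Latent-weight trajectory under plain SGD with weight decay."""
    w, traj = w0, []
    for g in grads:
        w = w - lr * (g + wd * w)
        traj.append(w)
    return np.array(traj)


def filtered_gradients(grads, lr, wd, w0=0.0):
    """Same trajectory, written explicitly as an exponential filter of the gradients.

    After t steps: w_t = d**t * w0 - lr * sum_{s=0}^{t-1} d**(t-1-s) * g_s,
    with d = 1 - lr * wd.
    """
    d = 1.0 - lr * wd
    g = np.asarray(grads, dtype=float)
    out = np.empty(len(g))
    for i in range(len(g)):                      # i + 1 update steps taken
        kernel = d ** (i - np.arange(i + 1))     # d**(i - s) for s = 0..i
        out[i] = d ** (i + 1) * w0 - lr * np.dot(kernel, g[: i + 1])
    return out
```

With w_0 = 0 the learning rate only rescales w_t, so two runs with the same product lr * wd yield identical binary weights: `np.sign(filtered_gradients(g, 0.01, 0.1))` equals `np.sign(filtered_gradients(g, 0.1, 0.01))`.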

Equal Bits: Enforcing Equally Distributed Binary Network Weights

Dec 02, 2021

Yunqiang Li, Silvia L. Pintea, Jan C. van Gemert

Abstract: Binary networks are extremely efficient as they use only two symbols to define the network: $\{+1,-1\}$. One can make the prior distribution of these symbols a design choice. The recent IR-Net of Qin et al. argues that imposing a Bernoulli distribution with equal priors (equal bit ratios) over the binary weights leads to maximum entropy and thus minimizes information loss. However, prior work cannot precisely control the binary weight distribution during training and therefore cannot guarantee maximum entropy. Here, we show that quantizing using optimal transport can guarantee any bit ratio, including equal ratios. We verify experimentally that equal bit ratios are indeed preferable and show that our method brings optimization benefits. We show that our quantization method is effective when compared to state-of-the-art binarization methods, even when using binary weight pruning.
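
In one dimension, the optimal transport plan under a convex cost such as squared distance is monotone, so transporting real-valued weights onto the two symbols {-1, +1} with a prescribed bit ratio reduces to sorting: the largest fraction p of the weights maps to +1 and the rest to -1. The sketch below assumes this sorting special case and uses a hypothetical function name; the paper's own solver and cost may differ.

```python
import numpy as np


def ot_binarize(weights, p_plus=0.5):
    """Binarize so that exactly round(p_plus * n) entries become +1.

    In 1-D the optimal transport plan for a convex cost is monotone,
    so it reduces to sorting: the largest weights go to +1, the rest to -1.
    """
    w = np.asarray(weights, dtype=float)
    flat = w.ravel()
    n = flat.size
    n_plus = int(round(p_plus * n))
    order = np.argsort(flat, kind="stable")   # ascending order of weights
    b = -np.ones(n)
    b[order[n - n_plus:]] = 1.0               # top n_plus weights -> +1 (empty slice if n_plus == 0)
    return b.reshape(w.shape)
```

For `[0.3, -0.1, 0.2, 0.05]` with `p_plus=0.5` this returns `[1, -1, 1, -1]`, whereas `np.sign` would produce three +1 entries. Python's `round` rounds halves to even, so an odd n with `p_plus=0.5` receives the smaller half of +1 entries.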

Deep Unsupervised Image Hashing by Maximizing Bit Entropy

Dec 22, 2020

Yunqiang Li, Jan van Gemert

Abstract: Unsupervised hashing is important for indexing huge image or video collections without requiring expensive annotations. Hashing aims to learn short binary codes for compact storage and efficient semantic retrieval. We propose an unsupervised deep hashing layer called Bi-half Net that maximizes the entropy of the binary codes. Entropy is maximal when both possible values of each bit are uniformly (half-half) distributed. To maximize bit entropy, we do not add a term to the loss function, as such a term is difficult to optimize and tune. Instead, we design a new parameter-free network layer that explicitly forces continuous image features to approximate the optimal half-half bit distribution. This layer is shown to minimize a penalized term of the Wasserstein distance between the learned continuous image features and the optimal half-half bit distribution. Experimental results on the image datasets Flickr25K, NUS-WIDE, CIFAR-10, MS COCO, and MNIST and the video datasets UCF-101 and HMDB-51 show that our approach leads to compact codes and compares favorably to the current state of the art.

* 9 pages, 6 figures, 3 tables
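
A forward pass in the spirit of the half-half layer can be sketched by ranking each code dimension across a mini-batch: for every bit, the upper half of the batch is set to +1 and the lower half to -1, so each bit is exactly half-half distributed within the batch. This is a minimal NumPy sketch assuming an even batch size (for odd sizes the extra sample is set to -1); the function name is hypothetical and the backward pass (for example a straight-through estimator) is not shown.

```python
import numpy as np


def bi_half_forward(features):
    """Per-bit half-half binarization over a batch.

    features: (batch, n_bits) continuous outputs. In every column the
    floor(batch / 2) largest values become +1 and the rest -1.
    """
    x = np.asarray(features, dtype=float)
    m = x.shape[0]
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each entry within its column
    return np.where(ranks >= m - m // 2, 1.0, -1.0)
```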

Zoom-CAM: Generating Fine-grained Pixel Annotations from Image Labels

Oct 16, 2020

Xiangwei Shi, Seyran Khademi, Yunqiang Li, Jan van Gemert

Abstract: Current weakly supervised object localization and segmentation methods rely on class-discriminative visualization techniques to generate pseudo-labels for pixel-level training. Such visualization methods, including class activation mapping (CAM) and Grad-CAM, use only the deepest, lowest-resolution convolutional layer, missing all information in intermediate layers. We propose Zoom-CAM, which goes beyond the last, lowest-resolution layer by integrating the importance maps over all activations in intermediate layers. Zoom-CAM captures fine-grained, small-scale objects for various discriminative class instances, which are commonly missed by the baseline visualization methods. We focus on generating pixel-level pseudo-labels from class labels. Evaluated on the ImageNet localization task, our pseudo-labels improve top-1 error by more than 2.8%. For weakly supervised semantic segmentation, our generated pseudo-labels improve a state-of-the-art model by 1.1%.

* ICPR 2020
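
As an illustration of fusing importance maps across layers, the sketch below assumes a Grad-CAM-style map per layer (the ReLU of the channel-summed element-wise product of activations and gradients), nearest-neighbour upsampling to the finest resolution, and an element-wise maximum across layers. These choices and all function names are assumptions for illustration, not necessarily the exact Zoom-CAM formulation. Activations and gradients are given as (C, H, W) arrays, as they might be captured with forward and backward hooks.

```python
import numpy as np


def layer_map(activation, gradient):
    """Importance map of one layer: ReLU of the channel sum of activation * gradient.

    activation, gradient: (C, H, W) -> returns (H, W).
    """
    return np.maximum((activation * gradient).sum(axis=0), 0.0)


def upsample_nearest(m, out_h, out_w):
    """Nearest-neighbour upsampling by integer factors."""
    if out_h % m.shape[0] or out_w % m.shape[1]:
        raise ValueError("output size must be an integer multiple of the map size")
    fh, fw = out_h // m.shape[0], out_w // m.shape[1]
    return np.repeat(np.repeat(m, fh, axis=0), fw, axis=1)


def zoom_cam_like(acts, grads, out_hw):
    """Fuse per-layer maps by element-wise maximum and normalize to [0, 1]."""
    h, w = out_hw
    maps = [upsample_nearest(layer_map(a, g), h, w) for a, g in zip(acts, grads)]
    fused = np.max(np.stack(maps), axis=0)
    peak = fused.max()
    return fused / peak if peak > 0 else fused
```

In a toy case with a coarse 2x2 deepest layer that only highlights the top-left block and a 4x4 intermediate layer with a single strong response in the bottom-right corner, the fused map keeps both, so the small bottom-right region missed by the deepest layer alone is recovered.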

WeightAlign: Normalizing Activations by Weight Alignment

Oct 14, 2020

Xiangwei Shi, Yunqiang Li, Xin Liu, Jan van Gemert

Abstract: Batch normalization (BN) allows training very deep networks by normalizing activations with mini-batch sample statistics, which renders BN unstable for small batch sizes. Current small-batch solutions such as Instance Norm, Layer Norm, and Group Norm use channel statistics, which can be computed even for a single sample. Such methods are less stable than BN, as they critically depend on the statistics of a single input sample. To address this problem, we propose to normalize activations without sample statistics. We present WeightAlign: a method that normalizes the weights by the mean and scaled standard deviation computed within a filter, which normalizes activations without computing any sample statistics. Our proposed method is independent of batch size and stable over a wide range of batch sizes. Because weight statistics are orthogonal to sample statistics, we can directly combine WeightAlign with any method for activation normalization. We experimentally demonstrate these benefits for classification on CIFAR-10, CIFAR-100, and ImageNet, for semantic segmentation on PASCAL VOC 2012, and for domain adaptation on Office-31.

* The first three authors contributed equally; accepted by ICPR 2020
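
A minimal NumPy sketch of filter-wise weight normalization, assuming the "scaled standard deviation" is the per-filter standard deviation multiplied by sqrt(N), where N is the filter's fan-in (in_channels * kh * kw), plus an optional gain `gamma`. Under this assumption every filter ends up with zero mean and unit squared norm, so for i.i.d. zero-mean, unit-variance inputs the output variance stays close to 1 without any sample statistics. The function name, the sqrt(N) factor, and `eps` are assumptions.

```python
import numpy as np


def weight_align(w, gamma=1.0, eps=1e-5):
    """Normalize each output filter of a conv weight without sample statistics.

    w: (out_channels, in_channels, kh, kw). Each filter is centred by its mean
    and divided by sqrt(N) * std, where N = in_channels * kh * kw (assumed scaling).
    """
    w = np.asarray(w, dtype=float)
    flat = w.reshape(w.shape[0], -1)
    n = flat.shape[1]                              # fan-in N
    mu = flat.mean(axis=1, keepdims=True)          # per-filter mean
    sigma = flat.std(axis=1, keepdims=True)        # per-filter standard deviation
    aligned = gamma * (flat - mu) / (np.sqrt(n) * sigma + eps)
    return aligned.reshape(w.shape)
```

Because only the weights are touched, the result is the same for a batch of one or a batch of thousands, which is the batch-size independence the abstract describes.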

Push for Quantization: Deep Fisher Hashing

Aug 31, 2019

Yunqiang Li, Wenjie Pei, Yufei Zha, Jan van Gemert

Abstract: Current massive datasets demand lightweight access for analysis. Discrete hashing methods are thus beneficial because they map high-dimensional data to compact binary codes that are efficient to store and process while preserving semantic similarity. To optimize powerful deep learning methods for image hashing, gradient-based methods are required. Binary codes, however, are discrete and thus have no continuous derivatives. Relaxing the problem by solving it in a continuous space and then quantizing the solution is not guaranteed to yield separable binary codes; the quantization needs to be included in the optimization. In this paper we push for quantization: we optimize maximum class separability in the binary space. We introduce a margin on distances between dissimilar image pairs as measured in the binary space. In addition to pairwise distances, we draw inspiration from Fisher's Linear Discriminant Analysis (Fisher LDA) to maximize the binary distances between classes and at the same time minimize the binary distance of images within the same class. Experiments on CIFAR-10, NUS-WIDE, and ImageNet100 demonstrate compact codes that compare favorably to the current state of the art.

* BMVC 2019
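
The two ingredients described above, a margin on binary distances between dissimilar pairs and a Fisher-style pull within classes, can be sketched on ±1 codes, where the Hamming distance between codes b_i and b_j of length K is (K - b_i . b_j) / 2. The loss below (hypothetical name and weighting) averages within-class distances and adds a hinge penalty for between-class pairs closer than `margin`. It is evaluated on discrete codes only for illustration; training would need a relaxation or straight-through step, and this is not the paper's exact objective.

```python
import numpy as np


def hamming(codes):
    """Pairwise Hamming distances for ±1 codes of length K: (K - B @ B.T) / 2."""
    b = np.asarray(codes, dtype=float)
    k = b.shape[1]
    return (k - b @ b.T) / 2.0


def fisher_margin_loss(codes, labels, margin=2.0, alpha=1.0):
    """Hypothetical loss: mean within-class Hamming distance (pulled together)
    plus alpha * mean hinge on between-class pairs closer than `margin` (pushed apart)."""
    d = hamming(codes)
    y = np.asarray(labels)
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)
    within = d[same & off_diag]
    between = d[~same]
    within_term = within.mean() if within.size else 0.0
    hinge = np.maximum(0.0, margin - between)
    between_term = hinge.mean() if between.size else 0.0
    return within_term + alpha * between_term
```

Codes that collapse to a single value (all images mapped to the same code) pay the full margin on every between-class pair, while perfectly separated classes with distance at least `margin` give zero loss.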
