Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Niv Vosco

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom

Dec 05, 2022

Alex Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller

Abstract:The post-training quantization (PTQ) challenge of bringing quantized neural net accuracy close to original has drawn much attention driven by industry demand. Many of the methods emphasize optimization of a specific degree-of-freedom (DoF), such as quantization step size, preconditioning factors, bias fixing, often chained to others in multi-step solutions. Here we rethink quantized network parameterization in HW-aware fashion, towards a unified analysis of all quantization DoF, permitting for the first time their joint end-to-end finetuning. Our single-step simple and extendable method, dubbed quantization-aware finetuning (QFT), achieves 4-bit weight quantization results on-par with SoTA within PTQ constraints of speed and resource.

* Presented at CADL2022 workshop at ECCV2022

Via

Access Paper or Ask Questions

Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context

Jul 05, 2021

Niv Vosco, Alon Shenkler, Mark Grobman

Figure 1 for Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context

Figure 2 for Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context

Figure 3 for Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context

Figure 4 for Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context

Abstract:In this paper we investigate the amount of spatial context required for channel attention. To this end we study the popular squeeze-and-excite (SE) block which is a simple and lightweight channel attention mechanism. SE blocks and its numerous variants commonly use global average pooling (GAP) to create a single descriptor for each channel. Here, we empirically analyze the amount of spatial context needed for effective channel attention and find that limited localcontext on the order of seven rows or columns of the original image is sufficient to match the performance of global context. We propose tiled squeeze-and-excite (TSE), which is a framework for building SE-like blocks that employ several descriptors per channel, with each descriptor based on local context only. We further show that TSE is a drop-in replacement for the SE block and can be used in existing SE networks without re-training. This implies that local context descriptors are similar both to each other and to the global context descriptor. Finally, we show that TSE has important practical implications for deployment of SE-networks to dataflow AI accelerators due to their reduced pipeline buffering requirements. For example, using TSE reduces the amount of activation pipeline buffering in EfficientDetD2 by 90% compared to SE (from 50M to 4.77M) without loss of accuracy. Our code and pre-trained models will be publicly available.

Via

Access Paper or Ask Questions