Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sai Sanjeet

MixQuant: Pushing the Limits of Block Rotations in Post-Training Quantization

Jan 29, 2026

Sai Sanjeet, Ian Colbert, Pablo Monteagudo-Lago, Giuseppe Franco, Yaman Umuroglu, Nicholas J. Fraser

Abstract:Recent post-training quantization (PTQ) methods have adopted block rotations to diffuse outliers prior to rounding. While this reduces the overhead of full-vector rotations, the effect of block structure on outlier suppression remains poorly understood. To fill this gap, we present the first systematic, non-asymptotic analysis of outlier suppression for block Hadamard rotations. Our analysis reveals that outlier suppression is fundamentally limited by the geometry of the input vector. In particular, post-rotation outliers are deterministically minimized when the pre-rotation $\ell_1$ norm mass is evenly distributed across blocks. Guided by these insights, we introduce MixQuant, a block rotation-aware PTQ framework that redistributes activation mass via permutations prior to rotation. We propose a greedy mass diffusion algorithm to calibrate permutations by equalizing the expected blockwise $\ell_1$ norms. To avoid adding inference overhead, we identify permutation-equivariant regions in transformer architectures to merge the resulting permutations into model weights before deployment. Experiments show that MixQuant consistently improves accuracy across all block sizes, recovering up to 90% of the full-vector rotation perplexity when quantizing Llama3 1B to INT4 with block size 16, compared to 46% without permutations.

Via

Access Paper or Ask Questions

Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge

Oct 07, 2024

Senorita Deb, Sai Sanjeet, Prabir Kumar Biswas, Bibhu Datta Sahoo

Figure 1 for Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge

Figure 2 for Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge

Figure 3 for Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge

Figure 4 for Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge

Abstract:This work describes an approach towards pixel quantization using variable resolution which is made feasible using image transformation in the analog domain. The main aim is to reduce the average bits-per-pixel (BPP) necessary for representing an image while maintaining the classification accuracy of a Convolutional Neural Network (CNN) that is trained for image classification. The proposed algorithm is based on the Hadamard transform that leads to a low-resolution variable quantization by the analog-to-digital converter (ADC) thus reducing the power dissipation in hardware at the sensor node. Despite the trade-offs inherent in image transformation, the proposed algorithm achieves competitive accuracy levels across various image sizes and ADC configurations, highlighting the importance of considering both accuracy and power consumption in edge computing applications. The schematic of a novel 1.5 bit ADC that incorporates the Hadamard transform is also proposed. A hardware implementation of the analog transformation followed by software-based variable quantization is done for the CIFAR-10 test dataset. The digitized data shows that the network can still identify transformed images with a remarkable 90% accuracy for 3-BPP transformed images following the proposed method.

* 13 pages, 20 figures

Via

Access Paper or Ask Questions

SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Jun 11, 2024

Sai Sanjeet, Bibhu Datta Sahoo, Keshab K. Parhi

Figure 1 for SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Figure 2 for SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Figure 3 for SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Figure 4 for SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Abstract:Spiking Neural Networks (SNNs) have gained popularity due to their high energy efficiency. Prior works have proposed various methods for training SNNs, including backpropagation-based methods. Training SNNs is computationally expensive compared to their conventional counterparts and would benefit from multiprocessor hardware acceleration. This is the first paper to propose inter-layer pipelining to accelerate training in SNNs using systolic array-based processors and multiprocessor scheduling. The impact of training using delayed gradients is observed using three networks training on different datasets, showing no degradation for small networks and < 10% degradation for large networks. The mapping of various training tasks of the SNN onto systolic arrays is formulated, and the proposed scheduling method is evaluated on the three networks. The results are compared against standard pipelining algorithms. The results show that the proposed method achieves an average speedup of 1.6X compared to standard pipelining algorithms, with an upwards of 2X improvement in some cases. The incurred communication overhead due to the proposed method is less than 0.5% of the total required communication of training.

Via

Access Paper or Ask Questions