Saman Naderiparizi

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

Oct 14, 2024

Rasoul Shafipour, David Harrison, Maxwell Horton, Jeffrey Marker, Houman Bedayat, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi, Saman Naderiparizi

Abstract: Large Language Models (LLMs) have transformed natural language processing, but face significant challenges in widespread deployment due to their high runtime cost. In this paper, we introduce SeedLM, a novel post-training compression method that uses seeds of pseudo-random generators to encode and compress model weights. Specifically, for each block of weights, we find a seed that is fed into a Linear Feedback Shift Register (LFSR) during inference to efficiently generate a random matrix. This matrix is then linearly combined with compressed coefficients to reconstruct the weight block. SeedLM reduces memory access and leverages idle compute cycles during inference, effectively speeding up memory-bound tasks by trading compute for fewer memory accesses. Unlike state-of-the-art compression methods that rely on calibration data, our approach is data-free and generalizes well across diverse tasks. Our experiments with Llama 3 70B, which is particularly challenging to compress, show that SeedLM achieves significantly better zero-shot accuracy retention at 4- and 3-bit than state-of-the-art techniques, while maintaining performance comparable to FP16 baselines. Additionally, FPGA-based tests demonstrate that 4-bit SeedLM, as model size increases to 70B, approaches a 4x speed-up over an FP16 Llama 2/3 baseline.
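The reconstruction step described in the abstract (seed, then LFSR, then random matrix, then linear combination with coefficients) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the 16-bit register width, the tap positions (a commonly used maximal-length Fibonacci configuration), the scaling of states to roughly [-1, 1), and the function names `lfsr16_states`, `random_matrix`, and `reconstruct_block`. The seed search and coefficient quantization that the method would also need are not shown.

```python
def lfsr16_states(seed, count):
    """Return `count` successive states of a 16-bit Fibonacci LFSR.

    Taps correspond to x^16 + x^14 + x^13 + x^11 + 1 (an assumed,
    commonly used maximal-length choice; the paper's taps may differ).
    """
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a nonzero 16-bit integer")
    state = seed
    states = []
    for _ in range(count):
        # XOR the tapped bits to form the feedback bit.
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        # Shift right and feed the new bit into the top position.
        state = (state >> 1) | (bit << 15)
        states.append(state)
    return states


def random_matrix(seed, rows, cols):
    """Build a rows x cols pseudo-random matrix from LFSR states.

    States are mapped to roughly [-1, 1); this scaling is an assumption.
    """
    states = lfsr16_states(seed, rows * cols)
    scaled = [(s - 32768) / 32768.0 for s in states]
    return [scaled[r * cols:(r + 1) * cols] for r in range(rows)]


def reconstruct_block(seed, coeffs, block_len):
    """Approximate a weight block as U(seed) @ coeffs.

    Only `seed` and `coeffs` need to be stored; U is regenerated on the fly.
    """
    u = random_matrix(seed, block_len, len(coeffs))
    return [sum(x * c for x, c in zip(row, coeffs)) for row in u]


if __name__ == "__main__":
    # Hypothetical stored values for one block of 8 weights.
    block = reconstruct_block(0xACE1, [0.5, -0.25, 0.125], block_len=8)
    print(block)
```

In this sketch a block of 8 weights is represented by one 16-bit seed plus three coefficients, which is where the memory saving would come from; the extra compute goes into regenerating the matrix at inference time.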

Ultra-low-power Wireless Streaming Cameras

Jul 27, 2017

Saman Naderiparizi, Mehrdad Hessar, Vamsi Talla, Shyamnath Gollakota, Joshua R. Smith

Abstract: Wireless video streaming has traditionally been considered an extremely power-hungry operation. Existing approaches optimize the camera and communication modules individually to minimize their power consumption. However, jointly redesigning and optimizing the wireless communication and the camera yields greater power savings. We present an ultra-low-power wireless video streaming camera. To achieve this, we introduce a novel "analog" video backscatter technique that feeds analog pixels from the photo-diodes directly to the backscatter hardware, thereby eliminating power-consuming hardware components such as ADCs and amplifiers. We prototype our wireless camera using off-the-shelf hardware and show that our design can stream video at up to 13 FPS and can operate up to a distance of 150 feet from the access point. Our COTS prototype consumes 2.36 mW. Finally, to demonstrate the potential of our design, we built two proof-of-concept applications: video streaming for micro-robots and security cameras for face detection.

* 9 pages, 11 figures
