Sayeh Sharify

Understanding the difficulty of low-precision post-training quantization of large language models

Oct 18, 2024

Scaling laws for post-training quantized large language models

Oct 15, 2024

Combining multiple post-training techniques to achieve most efficient quantized LLMs

May 12, 2024

Self-Selected Attention Span for Accelerating Large Language Model Inference

Apr 14, 2024

Mixed-Precision Quantization with Cross-Layer Dependencies

Jul 11, 2023

Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks

May 16, 2018

DPRed: Making Typical Activation Values Matter In Deep Learning Computing

May 15, 2018

Laconic Deep Learning Computing

May 10, 2018

Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How

Mar 09, 2018

Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability

Jul 27, 2017