
Andreas Moshovos

University of Toronto, Vector Institute

Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models

Aug 13, 2024

Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training

Apr 28, 2022

Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models

Mar 23, 2022

APack: Off-Chip, Lossless Data Compression for Efficient Deep Learning Inference

Jan 21, 2022

FPRaker: A Processing Element For Accelerating Neural Network Training

Oct 15, 2020

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference

Sep 01, 2020

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference

May 08, 2020

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

Feb 08, 2020

Training CNNs faster with Dynamic Input and Kernel Downsampling

Oct 15, 2019

Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks

May 16, 2018