Puneet Gupta

Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference

Jul 19, 2024

FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models

Jun 28, 2024

Cost-Driven Hardware-Software Co-Optimization of Machine Learning Pipelines

Oct 19, 2023

Training Neural Networks for Execution on Approximate Hardware

Apr 08, 2023

PhotoFourier: A Photonic Joint Transform Correlator-Based Neural Network Accelerator

Nov 10, 2022

Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors

Jan 25, 2022

Batch Processing and Data Streaming Fourier-based Convolutional Neural Network Accelerator

Dec 23, 2021

SWIS -- Shared Weight bIt Sparsity for Efficient Neural Network Acceleration

Mar 03, 2021

MOMBAT: Heart Rate Monitoring from Face Video using Pulse Modeling and Bayesian Tracking

May 10, 2020

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training

Jul 30, 2019