James Demmel

Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping

Jun 24, 2023

Distributed-Memory Sparse Kernels for Machine Learning

Mar 18, 2022

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

May 05, 2021

Avoiding Communication in Logistic Regression

Nov 16, 2020

Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour

Nov 05, 2020

The Limit of the Batch Size

Jun 15, 2020

Auto-Precision Scaling for Distributed Deep Learning

Nov 20, 2019

Reducing BERT Pre-Training Time from 3 Days to 76 Minutes

Apr 01, 2019

Large-Batch Training for LSTM and Beyond

Jan 24, 2019

ImageNet Training in Minutes

Jan 31, 2018