
Valeriu Codreanu

Reduced Precision Strategies for Deep Learning: A High Energy Physics Generative Adversarial Network Use Case

Mar 18, 2021
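As an illustration of the general idea behind reduced-precision deep learning (this is a generic sketch, not the paper's method), the cost of storing values in IEEE 754 half precision can be measured directly with the standard library's `struct` format `'e'`:

```python
import struct

def to_half_and_back(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision
    (struct format 'e') to see what reduced storage costs in accuracy."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Half precision keeps roughly 3 decimal digits: for normal values the
# relative roundtrip error stays below 2**-11.
vals = [0.1, 1.0, 3.14159, 123.456]
errs = [abs(to_half_and_back(v) - v) / v for v in vals]
```

Exactly representable values (powers of two, small integers) survive the roundtrip unchanged; everything else is rounded to the nearest of half precision's 11 significand bits.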

Densifying Assumed-sparse Tensors: Improving Memory Efficiency and MPI Collective Performance during Tensor Accumulation for Parallelized Training of Neural Machine Translation Models

May 10, 2019

Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train

Nov 15, 2017
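Large-minibatch training is commonly paired with the linear learning-rate scaling rule plus a warmup ramp; a minimal sketch of that well-known recipe (illustrative only, not this paper's exact schedule) looks like:

```python
def scaled_lr(base_lr, base_batch, batch, step, warmup_steps):
    """Linear-scaling rule: scale the learning rate in proportion to the
    batch size, ramping up linearly over a warmup period so the large
    effective step does not destabilize early training."""
    target = base_lr * batch / base_batch
    if step < warmup_steps:
        return target * (step + 1) / warmup_steps
    return target

# Scaling the batch from 256 to 8192 multiplies the peak LR by 32.
peak = scaled_lr(0.1, 256, 8192, step=1000, warmup_steps=500)
```

During warmup the rate grows linearly from near zero to the scaled target, after which it stays at the target (or follows whatever decay schedule the training run uses).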

CLTune: A Generic Auto-Tuner for OpenCL Kernels

Mar 19, 2017
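The simplest strategy a kernel auto-tuner can apply is exhaustive search over a small parameter space, timing each configuration and keeping the fastest. The sketch below is a generic toy in that spirit (the function and parameter names are invented for illustration; CLTune itself is a C++ library with its own API and smarter search strategies):

```python
import itertools
import time

def autotune(kernel, search_space):
    """Try every combination of tuning parameters, time each run, and
    return the fastest configuration found."""
    best_cfg, best_time = None, float("inf")
    keys = list(search_space)
    for combo in itertools.product(*search_space.values()):
        cfg = dict(zip(keys, combo))
        start = time.perf_counter()
        kernel(**cfg)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg

# A stand-in "kernel" whose running time depends on its tuning parameters.
def fake_kernel(tile, unroll):
    for _ in range(tile * 1000 // unroll):
        pass

space = {"tile": [16, 32], "unroll": [1, 2, 4]}
best = autotune(fake_kernel, space)
```

Real tuners replace the brute-force loop with pruned or model-guided search once the parameter space grows beyond a few hundred combinations.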