Evangelos Georganas

Towards a high-performance AI compiler with upstream MLIR

Apr 15, 2024
Via arXiv

Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures

Apr 25, 2023
Via arXiv

FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems

Apr 22, 2022
Via arXiv

DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

Apr 16, 2021
Via arXiv

Efficient and Generic 1D Dilated Convolution Layer for Deep Learning

Apr 16, 2021
Via arXiv

Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads

Apr 14, 2021
Via arXiv

Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures

May 10, 2020
Via arXiv

High-Performance Deep Learning via a Single Building Block

Jun 18, 2019
Via arXiv

A Study of BFLOAT16 for Deep Learning Training

Jun 13, 2019
Via arXiv

ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler

Oct 12, 2018
Via arXiv