Saeed Maleki

Microsoft Research

ForestColl: Efficient Collective Communications on Heterogeneous Network Fabrics

Feb 09, 2024

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

Nov 26, 2023

Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM

Oct 09, 2023

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction

Jan 21, 2023

Synthesizing Collective Communication Algorithms for Heterogeneous Networks with TACCL

Nov 15, 2021

Total Least Squares for Optimal Pose Estimation

Jun 22, 2021

CoCoNet: Co-Optimizing Computation and Communication for Distributed Machine Learning

May 13, 2021

Scaling Distributed Training with Adaptive Summation

Jun 04, 2020

Distributed Word2Vec using Graph Analytics Frameworks

Sep 08, 2019

CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs

Oct 01, 2018