Hari Subramoni

Accelerating Large Language Model Training with Hybrid GPU-based Compression

Sep 04, 2024, via arXiv

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer

Aug 30, 2024, via arXiv

Demystifying the Communication Characteristics for Distributed Transformer Models

Aug 19, 2024, via arXiv

The Case for Co-Designing Model Architectures with Hardware

Jan 30, 2024, via arXiv

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

Jan 17, 2024, via arXiv

Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference

May 24, 2023, via arXiv

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

Mar 15, 2023, via arXiv

Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version

Mar 09, 2023, via arXiv

OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

Oct 20, 2021, via arXiv

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Jan 21, 2021, via arXiv