Hari Subramoni

Accelerating Large Language Model Training with Hybrid GPU-based Compression

Sep 04, 2024
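
The paper's hybrid GPU-based compression scheme is not reproduced here. As a rough illustration of the compress-exchange-decompress pattern that compression-assisted gradient aggregation relies on, the sketch below applies a generic top-k sparsification to a gradient tensor; the helper names `topk_compress` and `topk_decompress` and the 1% ratio are illustrative assumptions, not details from the paper.

```python
# Generic top-k gradient sparsification sketch (NOT the paper's hybrid scheme);
# it only illustrates the compress -> exchange -> decompress pattern that
# compression-assisted gradient aggregation builds on.
import math
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices, tuple(grad.shape)

def topk_decompress(values: torch.Tensor, indices: torch.Tensor, shape: tuple):
    """Scatter the retained entries back into a dense zero tensor."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

if __name__ == "__main__":
    grad = torch.randn(1024, 1024)
    values, indices, shape = topk_compress(grad, ratio=0.01)
    # In a real data-parallel run, the (values, indices) pairs -- not the dense
    # tensor -- are what ranks would exchange (e.g., via allgather).
    approx = topk_decompress(values, indices, shape)
    print(f"kept {values.numel()} of {grad.numel()} entries, "
          f"relative error {torch.norm(grad - approx) / torch.norm(grad):.3f}")
```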

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer

Aug 30, 2024

Demystifying the Communication Characteristics for Distributed Transformer Models

Aug 19, 2024

The Case for Co-Designing Model Architectures with Hardware

Jan 30, 2024
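
In the spirit of co-designing model shapes with GPU hardware, the sketch below checks a few transformer dimensions against alignment rules of the kind such co-design considers; the specific multiples (64 for GEMM dimensions, 8 for the head dimension) and the `check_transformer_dims` helper are illustrative assumptions, not recommendations quoted from the paper.

```python
# Rough dimension-alignment checks in the spirit of hardware/model co-design.
# The "good" multiples below are illustrative assumptions, not paper figures.

def check_transformer_dims(hidden_size: int, num_heads: int, vocab_size: int,
                           gemm_multiple: int = 64):
    warnings = []
    if hidden_size % gemm_multiple:
        warnings.append(f"hidden_size={hidden_size} is not a multiple of "
                        f"{gemm_multiple}; GEMMs may miss the fast tensor-core path")
    if hidden_size % num_heads:
        warnings.append(f"hidden_size={hidden_size} is not divisible by "
                        f"num_heads={num_heads}")
    elif (hidden_size // num_heads) % 8:
        warnings.append(f"head_dim={hidden_size // num_heads} is not a multiple of 8")
    if vocab_size % gemm_multiple:
        warnings.append(f"vocab_size={vocab_size} is not padded to a multiple of "
                        f"{gemm_multiple}")
    return warnings

if __name__ == "__main__":
    for issue in check_transformer_dims(hidden_size=2560, num_heads=20,
                                        vocab_size=50257):
        print("warning:", issue)
```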

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

Jan 17, 2024

Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference

May 24, 2023

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

Mar 15, 2023

Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version

Mar 09, 2023
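
As a minimal, hedged illustration of the kind of post-training quantization such a characterization study measures, the sketch below uses PyTorch's dynamic quantization API to convert the Linear layers of a toy model to int8 and times both variants; the toy model and the 100-batch timing loop are arbitrary choices, not the paper's benchmarking setup.

```python
# Minimal post-training dynamic quantization sketch (generic PyTorch API,
# not the benchmarking harness used in the paper).
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Quantize the Linear layers' weights to int8; activations are quantized
# dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(64, 512)
with torch.no_grad():
    for name, m in (("fp32", model), ("int8-dynamic", quantized)):
        start = time.perf_counter()
        for _ in range(100):
            m(x)
        print(f"{name}: {time.perf_counter() - start:.3f} s for 100 batches")
```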

OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

Oct 20, 2021
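
OMB-Py builds its micro-benchmarks on mpi4py; the sketch below is a minimal ping-pong latency measurement in that spirit, not actual OMB-Py code. The iteration counts, message size, and the script name in the run command are illustrative assumptions.

```python
# Minimal mpi4py ping-pong latency sketch, in the spirit of MPI point-to-point
# micro-benchmarks (not actual OMB-Py code).
# Run with: mpirun -np 2 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
assert comm.Get_size() == 2, "run with exactly 2 ranks"

iters, skip = 1000, 100          # timed iterations and warm-up iterations
buf = np.zeros(1024, dtype='b')  # 1 KiB message

for i in range(iters + skip):
    if i == skip:                # start timing after warm-up
        comm.Barrier()
        start = MPI.Wtime()
    if rank == 0:
        comm.Send([buf, MPI.BYTE], dest=1, tag=0)
        comm.Recv([buf, MPI.BYTE], source=1, tag=0)
    else:
        comm.Recv([buf, MPI.BYTE], source=0, tag=0)
        comm.Send([buf, MPI.BYTE], dest=0, tag=0)

if rank == 0:
    elapsed = MPI.Wtime() - start
    # Each iteration is a round trip, so one-way latency is half the average.
    print(f"{buf.nbytes} bytes: {elapsed / iters / 2 * 1e6:.2f} us one-way latency")
```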

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Jan 21, 2021
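
The sketch below shows the underlying capability that an MPI-based Dask communication backend can exploit: with mpi4py and a CUDA-aware MPI build, a CuPy array can be handed to MPI directly so the transfer stays on the GPU. It is a hedged illustration of the transport layer only, not the Dask integration from the paper; the message size, tag, and script name are arbitrary.

```python
# Sketch of CUDA-aware MPI communication of a GPU buffer with mpi4py + CuPy.
# Requires a CUDA-aware MPI build and a GPU on each rank.
# Run with: mpirun -np 2 python gpu_send.py
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1 << 20  # 1M float32 elements resident on the GPU
if rank == 0:
    payload = cp.arange(n, dtype=cp.float32)
    # mpi4py understands the CUDA array interface, so the GPU buffer is
    # passed to MPI directly instead of being staged through host memory.
    comm.Send(payload, dest=1, tag=7)
elif rank == 1:
    payload = cp.empty(n, dtype=cp.float32)
    comm.Recv(payload, source=0, tag=7)
    cp.cuda.Device().synchronize()
    print("rank 1 received", payload[:4])
```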