Arvind Krishnamurthy

University of Washington

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

Jan 02, 2025

ForestColl: Efficient Collective Communications on Heterogeneous Network Fabrics

Feb 09, 2024

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023

Punica: Multi-Tenant LoRA Serving

Oct 28, 2023

Symphony: Optimized Model Serving using Centralized Orchestration

Aug 14, 2023

Bandwidth Optimal Pipeline Schedule for Collective Communication

May 31, 2023

Optimal Direct-Connect Topologies for Collective Communications

Feb 07, 2022

Cloud Collectives: Towards Cloud-aware Collectives for ML Workloads with Rank Reordering

May 28, 2021

AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly

May 22, 2021

Scaling Distributed Machine Learning with In-Network Aggregation

Feb 22, 2019