Alexey Tumanov

Georgia Institute of Technology

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations

Sep 25, 2024
Via arXiv

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

Jul 09, 2024
Via arXiv

DεpS: Delayed ε-Shrinking for Faster Once-For-All Training

Jul 08, 2024
Via arXiv

Vidur: A Large-Scale Simulation Framework For LLM Inference

May 08, 2024
Via arXiv

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Mar 04, 2024
Via arXiv

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

Dec 27, 2023
Via arXiv

Signed Binarization: Unlocking Efficiency Through Repetition-Sparsity Trade-Off

Dec 04, 2023
Via arXiv

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation

Oct 24, 2023
Via arXiv

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

Jul 03, 2023
Via arXiv

Subgraph Stationary Hardware-Software Inference Co-Design

Jun 21, 2023
Via arXiv