Aurick Qiao

SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference

Nov 07, 2024
Via arXiv

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation

Oct 04, 2024
Via arXiv

STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning

Sep 10, 2024
Via arXiv

Efficient LLM Scheduling by Learning to Rank

Aug 28, 2024
Via arXiv

LLM360: Towards Fully Transparent Open-Source LLMs

Dec 11, 2023
Via arXiv

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

Aug 27, 2020
Via arXiv

Fault Tolerance in Iterative-Convergent Machine Learning

Oct 17, 2018
Via arXiv