Shuzhang Zhong

PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization

Oct 12, 2024
Via arXiv

AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference

Aug 19, 2024
Via arXiv

ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

Feb 21, 2024
Via arXiv

Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization

Aug 26, 2023
Via arXiv