Size Zheng

Eric

MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness

Mar 27, 2025

Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts

Feb 27, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Oct 28, 2024

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation

May 04, 2021