Shijie Cao

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach

Nov 28, 2024

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

Oct 17, 2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration

Aug 12, 2024

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

Jun 25, 2024

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

Feb 16, 2024

AFPQ: Asymmetric Floating Point Quantization for LLMs

Nov 03, 2023

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Aug 23, 2023

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

May 31, 2023

Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

May 31, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models

May 21, 2023