
Zhihang Yuan

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

Sep 16, 2024

Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding

Jul 12, 2024

DiTFastAttn: Attention Compression for Diffusion Transformer Models

Jun 12, 2024

PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram

May 29, 2024

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models

May 28, 2024

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

May 27, 2024

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

May 10, 2024

A Survey on Efficient Inference for Large Language Models

Apr 22, 2024

PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds

Apr 11, 2024

LLM Inference Unveiled: Survey and Roofline Model Insights

Mar 11, 2024