Tianle Cai

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Nov 07, 2024

Training-Free Activation Sparsity in Large Language Models

Aug 26, 2024

FlexAttention for Efficient High-Resolution Vision-Language Models

Jul 29, 2024

Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention

May 25, 2024

SnapKV: LLM Knows What You are Looking for Before Generation

Apr 22, 2024

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Apr 11, 2024

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Mar 07, 2024

Accelerating Greedy Coordinate Gradient via Probe Sampling

Mar 02, 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit

Feb 28, 2024

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Jan 19, 2024