
Guangxuan Xiao

BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding

Dec 12, 2025

Optimizing Mixture of Block Attention

Nov 14, 2025

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Oct 10, 2025

XAttention: Block Sparse Attention with Antidiagonal Scoring

Mar 20, 2025

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

Feb 20, 2025

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Oct 14, 2024

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jun 16, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

May 07, 2024

Retrieval Head Mechanistically Explains Long-Context Factuality

Apr 24, 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit

Feb 28, 2024