
Chi-Chih Chang

SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache

Jan 14, 2026

SplitReason: Learning To Offload Reasoning

Apr 23, 2025

Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models

Mar 28, 2025

xKV: Cross-Layer SVD for KV-Cache Compression

Mar 24, 2025

TokenButler: Token Importance is Predictable

Mar 10, 2025

SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs

Feb 18, 2025

V"Mean"ba: Visual State Space Models only need 1 hidden dimension

Dec 21, 2024

Quamba: A Post-Training Quantization Recipe for Selective State Space Models

Oct 17, 2024

ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration

Sep 15, 2024

Palu: Compressing KV-Cache with Low-Rank Projection

Jul 30, 2024