Haiduo Huang

Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding

Mar 13, 2025

Partial Convolution Meets Visual Attention

Mar 05, 2025

Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE

Feb 10, 2025

Nearly Lossless Adaptive Bit Switching

Feb 03, 2025

Partial Channel Network: Compute Fewer, Perform Better

Feb 03, 2025

FTP: A Fine-grained Token-wise Pruner for Large Language Models via Token Routing

Dec 16, 2024