William Brandon

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Jul 15, 2024
Via arXiv

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

May 21, 2024
Via arXiv

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

Feb 07, 2024
Via arXiv

Striped Attention: Faster Ring Attention for Causal Transformers

Nov 15, 2023
Via arXiv