Aniruddha Nrusimha

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

May 21, 2024
Via arXiv

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

Apr 04, 2024
Via arXiv

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

Feb 07, 2024
Via arXiv

Towards Verifiable Text Generation with Symbolic References

Nov 15, 2023
Via arXiv

Striped Attention: Faster Ring Attention for Causal Transformers

Nov 15, 2023
Via arXiv

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

Oct 07, 2019
Via arXiv