Sujan Kumar Gonugondla

UCLA-CS

The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation

Nov 06, 2024

BASS: Batched Attention-optimized Speculative Sampling

Apr 24, 2024

Token Alignment via Character Matching for Subword Completion

Mar 13, 2024

Bifurcated Attention for Single-Context Large-Batch Sampling

Mar 13, 2024

Multi-lingual Evaluation of Code Generation Models

Oct 26, 2022

Fundamental Limits on Energy-Delay-Accuracy of In-memory Architectures in Inference Applications

Dec 25, 2020