Jungi Lee

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management

Jun 28, 2024
Via arXiv

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization

Jun 16, 2024
Via arXiv