Jaewoong Sim

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management

Jun 28, 2024
Via arXiv

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization

Jun 16, 2024
Via arXiv

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models

May 29, 2024
Via arXiv