Victor Ruhle

TURBOATTENTION: Efficient Attention Approximation For High Throughputs LLMs

Dec 11, 2024 (via arXiv)

Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing

Apr 22, 2024 (via arXiv)

Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance

Aug 8, 2023 (via arXiv)