Amir Gholami

UC Berkeley/LBNL/ICSI

Squeezed Attention: Accelerating Long Context Length LLM Inference
Nov 14, 2024

Efficient and Scalable Estimation of Tool Representations in Vector Space
Sep 02, 2024

TinyAgent: Function Calling at the Edge
Sep 01, 2024

Characterizing Prompt Compression Methods for Long Context Inference
Jul 11, 2024

Reliable edge machine learning hardware for scientific applications
Jun 27, 2024

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Mar 22, 2024

AI and Memory Wall
Mar 21, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Feb 07, 2024

An LLM Compiler for Parallel Function Calling
Dec 07, 2023

SPEED: Speculative Pipelined Execution for Efficient Decoding
Oct 18, 2023