Kuntai Du

LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

Nov 21, 2024
Via arXiv

CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion

May 26, 2024
Via arXiv

Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network

Jan 23, 2024
Via arXiv

CacheGen: Fast Context Loading for Language Model Applications

Oct 11, 2023
Via arXiv

Automatic and Efficient Customization of Neural Networks for ML Applications

Oct 07, 2023
Via arXiv

OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation

Oct 03, 2023
Via arXiv

AccMPEG: Optimizing Video Encoding for Video Analytics

Apr 26, 2022
Via arXiv