Junchen Jiang

LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

Nov 21, 2024

DroidSpeak: Enhancing Cross-LLM Communication

Nov 05, 2024

SwiftQueue: Optimizing Low-Latency Applications with Swift Packet Queuing

Oct 08, 2024

CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion

May 26, 2024

Large Language Model Adaptation for Networking

Feb 04, 2024

Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network

Add code
Jan 23, 2024
Viaarxiv icon

CacheGen: Fast Context Loading for Language Model Applications

Oct 11, 2023

Automatic and Efficient Customization of Neural Networks for ML Applications

Oct 07, 2023

OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation

Oct 03, 2023

Grace++: Loss-Resilient Real-Time Video Communication under High Network Latency

May 21, 2023