
Haibo Chen

KunServe: Elastic and Efficient Large Language Model Serving with Parameter-centric Memory Management

Dec 24, 2024

Unifying KV Cache Compression for Large Language Models with LeanKV

Dec 04, 2024

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Jun 12, 2024

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Jun 11, 2024

Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

May 07, 2024

PNeSM: Arbitrary 3D Scene Stylization via Prompt-Based Neural Style Mapping

Mar 13, 2024

Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation

Mar 13, 2024

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

Dec 16, 2023

TSSAT: Two-Stage Statistics-Aware Transformation for Artistic Style Transfer

Sep 12, 2023

An Overview of Resource Allocation in Integrated Sensing and Communication

May 15, 2023