Minsik Cho

Towards Low-bit Communication for Tensor Parallel LLM Inference

Nov 12, 2024

Apple Intelligence Foundation Language Models

Jul 29, 2024

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Jul 19, 2024

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

May 08, 2024

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Dec 12, 2023

Prompting might be all you need to repair Compressed LLMs

Oct 14, 2023

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

Oct 09, 2023

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

Sep 13, 2023

Flexible Keyword Spotting based on Homogeneous Audio-Text Embedding

Aug 12, 2023

Matching Latent Encoding for Audio-Text based Keyword Spotting

Jun 08, 2023