Minsik Cho

Towards Low-bit Communication for Tensor Parallel LLM Inference

Nov 12, 2024

Apple Intelligence Foundation Language Models

Jul 29, 2024

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Jul 19, 2024

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

May 08, 2024

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Dec 12, 2023

Prompting might be all you need to repair Compressed LLMs

Oct 14, 2023

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

Oct 09, 2023

eDKM: Efficient and Accurate Train-time Weight Clustering for Large Language Models

Sep 13, 2023

Flexible Keyword Spotting based on Homogeneous Audio-Text Embedding

Aug 12, 2023

Matching Latent Encoding for Audio-Text based Keyword Spotting

Jun 08, 2023