Matei Zaharia

LangProBe: a Language Programs Benchmark

Feb 27, 2025

Optimizing Model Selection for Compound AI Systems

Feb 20, 2025

LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!

Feb 11, 2025

Adaptive Semantic Prompt Caching with VectorQ

Feb 06, 2025

BARE: Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation

Feb 03, 2025

WARP: An Efficient Engine for Multi-Vector Retrieval

Jan 29, 2025

HashAttention: Semantic Sparsity for Faster Inference

Dec 19, 2024

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs

Nov 18, 2024

Drowning in Documents: Consequences of Scaling Reranker Inference

Nov 18, 2024

Long Context RAG Performance of Large Language Models

Nov 05, 2024