Matei Zaharia

Why Do Multi-Agent LLM Systems Fail?

Mar 17, 2025

LangProBe: a Language Programs Benchmark

Feb 27, 2025

Optimizing Model Selection for Compound AI Systems

Feb 20, 2025

LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!

Feb 11, 2025

Adaptive Semantic Prompt Caching with VectorQ

Feb 06, 2025

BARE: Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation

Feb 03, 2025

WARP: An Efficient Engine for Multi-Vector Retrieval

Jan 29, 2025

HashAttention: Semantic Sparsity for Faster Inference

Dec 19, 2024

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs

Nov 18, 2024

Drowning in Documents: Consequences of Scaling Reranker Inference

Nov 18, 2024