
Yao Fu

MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems

Dec 10, 2024

Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models

Nov 25, 2024

Interactive and Expressive Code-Augmented Planning with Large Language Models

Nov 21, 2024

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Oct 14, 2024

ProTrain: Efficient LLM Training via Memory-Aware Techniques

Jun 12, 2024

Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis

May 14, 2024

Long Context Alignment with Short Instructions and Synthesized Positions

May 07, 2024

Retrieval Head Mechanistically Explains Long-Context Factuality

Apr 24, 2024

Toward Inference-optimal Mixture-of-Expert Large Language Models

Apr 03, 2024

AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents

Mar 13, 2024