Yuanzhi Li

Phi-4 Technical Report

Dec 12, 2024

Mixture of Parrots: Experts improve memorization more than reasoning

Oct 24, 2024

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

Oct 16, 2024

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

Oct 11, 2024

Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts

Sep 02, 2024

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Aug 29, 2024

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Jul 29, 2024

How Does Overparameterization Affect Features?

Jul 01, 2024

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Apr 23, 2024

AgentKit: Flow Engineering with Graphs, not Coding

Apr 17, 2024