Zeyuan Allen-Zhu

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Aug 29, 2024
Via arXiv

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Jul 29, 2024
Via arXiv

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Apr 08, 2024
Via arXiv

Reverse Training to Nurse the Reversal Curse

Mar 20, 2024
Via arXiv

Physics of Language Models: Part 3.2, Knowledge Manipulation

Sep 25, 2023
Via arXiv

Physics of Language Models: Part 1, Context-Free Grammar

May 23, 2023
Via arXiv

LoRA: Low-Rank Adaptation of Large Language Models

Jun 17, 2021
Via arXiv

Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions

Jun 04, 2021
Via arXiv

Byzantine-Resilient Non-Convex Stochastic Gradient Descent

Dec 28, 2020
Via arXiv

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

Dec 17, 2020
Via arXiv