Sashank J. Reddi

On the Role of Depth and Looping for In-Context Learning with Task Diversity

Oct 29, 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

Oct 24, 2024

Simplicity Bias via Global Convergence of Sharpness Minimization

Oct 21, 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

Oct 10, 2024

On the Inductive Bias of Stacking Towards Improving Reasoning

Sep 27, 2024

Efficient Document Ranking with Learnable Late Interactions

Jun 25, 2024

Landscape-Aware Growing: The Power of a Little LAG

Jun 04, 2024

Depth Dependence of μP Learning Rates in ReLU MLPs

May 13, 2023

Differentially Private Adaptive Optimization with Delayed Preconditioners

Dec 01, 2022

On the Algorithmic Stability and Generalization of Adaptive Optimization Methods

Nov 08, 2022