Nikunj Saunshi

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

Feb 17, 2025

StagFormer: Time Staggering Transformer Decoding for Running Layers In Parallel

Jan 26, 2025

On the Role of Depth and Looping for In-Context Learning with Task Diversity

Oct 29, 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

Oct 24, 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

Oct 10, 2024

On the Inductive Bias of Stacking Towards Improving Reasoning

Sep 27, 2024

Landscape-Aware Growing: The Power of a Little LAG

Jun 04, 2024

Efficient Stagewise Pretraining via Progressive Subnetworks

Feb 08, 2024

Reasoning in Large Language Models Through Symbolic Math Word Problems

Aug 03, 2023

Task-Specific Skill Localization in Fine-tuned Language Models

Feb 13, 2023