Abhishek Panigrahi

Progressive distillation induces an implicit curriculum

Oct 07, 2024

Representing Rule-based Chatbots with Transformers

Jul 15, 2024

Efficient Stagewise Pretraining via Progressive Subnetworks

Feb 08, 2024

Trainable Transformer in Transformer

Jul 03, 2023

Do Transformers Parse while Predicting the Masked Word?

Mar 14, 2023

Task-Specific Skill Localization in Fine-tuned Language Models

Feb 13, 2023

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

May 20, 2022

Understanding Gradient Descent on Edge of Stability in Deep Learning

May 19, 2022

Learning and Generalization in RNNs

May 31, 2021

Non-Gaussianity of Stochastic Gradient Noise

Oct 25, 2019