Srinadh Bhojanapalli

Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count

Oct 21, 2024

Mimetic Initialization Helps State Space Models Learn to Recall

Oct 14, 2024

Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

May 31, 2024

Efficient Language Model Architectures for Differentially Private Federated Learning

Mar 12, 2024

HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference

Feb 14, 2024

Efficacy of Dual-Encoders for Extreme Multi-Label Classification

Oct 16, 2023

Functional Interpolation for Relative Positions Improves Long Context Transformers

Oct 06, 2023

Depth Dependence of $μ$P Learning Rates in ReLU MLPs

May 13, 2023

On student-teacher deviations in distillation: does it pay to disobey?

Jan 30, 2023

On the Adversarial Robustness of Mixture of Experts

Oct 19, 2022