Darshil Doshi

Grokking Modular Polynomials

Jun 05, 2024
Via arXiv

Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

Jun 04, 2024
Via arXiv

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

Oct 19, 2023
Via arXiv

AutoInit: Automatic Initialization via Jacobian Tuning

Jun 27, 2022
Via arXiv

Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm

Nov 30, 2021
Via arXiv