Picture for Seijin Kobayashi

Seijin Kobayashi

Weight decay induces low-rank attention layers

Add code
Oct 31, 2024
Figure 1 for Weight decay induces low-rank attention layers
Figure 2 for Weight decay induces low-rank attention layers
Figure 3 for Weight decay induces low-rank attention layers
Figure 4 for Weight decay induces low-rank attention layers
Viaarxiv icon

Multi-agent cooperation through learning-aware policy gradients

Add code
Oct 24, 2024
Figure 1 for Multi-agent cooperation through learning-aware policy gradients
Figure 2 for Multi-agent cooperation through learning-aware policy gradients
Figure 3 for Multi-agent cooperation through learning-aware policy gradients
Figure 4 for Multi-agent cooperation through learning-aware policy gradients
Viaarxiv icon

Learning Randomized Algorithms with Transformers

Add code
Aug 20, 2024
Viaarxiv icon

When can transformers compositionally generalize in-context?

Add code
Jul 17, 2024
Viaarxiv icon

Attention as a Hypernetwork

Add code
Jun 09, 2024
Viaarxiv icon

Discovering modular solutions that generalize compositionally

Add code
Dec 22, 2023
Viaarxiv icon

Uncovering mesa-optimization algorithms in Transformers

Add code
Sep 11, 2023
Figure 1 for Uncovering mesa-optimization algorithms in Transformers
Figure 2 for Uncovering mesa-optimization algorithms in Transformers
Figure 3 for Uncovering mesa-optimization algorithms in Transformers
Figure 4 for Uncovering mesa-optimization algorithms in Transformers
Viaarxiv icon

Gated recurrent neural networks discover attention

Add code
Sep 04, 2023
Viaarxiv icon

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

Add code
Jun 29, 2023
Viaarxiv icon

Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

Add code
Oct 18, 2022
Figure 1 for Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel
Figure 2 for Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel
Figure 3 for Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel
Figure 4 for Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel
Viaarxiv icon