Seijin Kobayashi

Weight decay induces low-rank attention layers

Oct 31, 2024
Via arXiv

Multi-agent cooperation through learning-aware policy gradients

Oct 24, 2024
Via arXiv

Learning Randomized Algorithms with Transformers

Aug 20, 2024
Via arXiv

When can transformers compositionally generalize in-context?

Jul 17, 2024
Via arXiv

Attention as a Hypernetwork

Jun 09, 2024
Via arXiv

Discovering modular solutions that generalize compositionally

Dec 22, 2023
Via arXiv

Uncovering mesa-optimization algorithms in Transformers

Sep 11, 2023
Via arXiv

Gated recurrent neural networks discover attention

Sep 04, 2023
Via arXiv

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

Jun 29, 2023
Via arXiv

Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

Oct 18, 2022
Via arXiv