Johannes von Oswald

Institute of Neuroinformatics, ETH Zürich and University of Zürich, Zürich, Switzerland

Multi-agent cooperation through learning-aware policy gradients

Oct 24, 2024

Learning Randomized Algorithms with Transformers

Aug 20, 2024

When can transformers compositionally generalize in-context?

Jul 17, 2024

State Soup: In-Context Skill Learning, Retrieval and Mixing

Jun 12, 2024

Linear Transformers are Versatile In-Context Learners

Feb 21, 2024

Discovering modular solutions that generalize compositionally

Dec 22, 2023

Uncovering mesa-optimization algorithms in Transformers

Sep 11, 2023

Gated recurrent neural networks discover attention

Sep 04, 2023

Transformers learn in-context by gradient descent

Dec 15, 2022

Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

Oct 18, 2022