
Omid Saremi

How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Jul 03, 2024

How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad

Jun 10, 2024

LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures

Dec 07, 2023

Vanishing Gradients in Reinforcement Finetuning of Language Models

Oct 31, 2023

What Algorithms can Transformers Learn? A Study in Length Generalization

Oct 24, 2023

When can transformers reason with abstract symbols?

Oct 15, 2023

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

Oct 13, 2023

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

Jun 13, 2022

Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks

Jul 02, 2021

Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks

Jul 02, 2021