Joshua Susskind

Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion

Apr 23, 2025
Via arXiv

Scaling Laws for Native Multimodal Models

Apr 10, 2025
Via arXiv

World-consistent Video Diffusion with Explicit 3D Modeling

Dec 02, 2024
Via arXiv

How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Jul 03, 2024
Via arXiv

Vanishing Gradients in Reinforcement Finetuning of Language Models

Oct 31, 2023
Via arXiv

When can transformers reason with abstract symbols?

Oct 15, 2023
Via arXiv

Transformers learn through gradual rank increase

Jun 12, 2023
Via arXiv

Position Prediction as an Effective Pretraining Strategy

Jul 15, 2022
Via arXiv

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

Jun 13, 2022
Via arXiv

Efficient Embedding of Semantic Similarity in Control Policies via Entangled Bisimulation

Jan 28, 2022
Via arXiv