Jascha Sohl-Dickstein

Scaling Exponents Across Parameterizations and Optimizers

Jul 08, 2024

Training LLMs over Neurally Compressed Text

Apr 04, 2024

The boundary of neural network trainability is fractal

Feb 09, 2024

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Dec 22, 2023

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

Nov 15, 2023

Noise-Reuse in Online Evolution Strategies

Apr 21, 2023

Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

Feb 22, 2023

General-Purpose In-Context Learning by Meta-Learning Transformers

Dec 08, 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up

Nov 17, 2022

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

Sep 22, 2022