Guy Gur-Ari

Shammie

Towards Understanding Inductive Bias in Transformers: A View From Infinity

Feb 07, 2024
Via arXiv

PaLM 2 Technical Report

May 17, 2023
Via arXiv

Exploring Length Generalization in Large Language Models

Jul 11, 2022
Via arXiv

Solving Quantitative Reasoning Problems with Language Models

Jul 01, 2022
Via arXiv

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022
Via arXiv

PaLM: Scaling Language Modeling with Pathways

Apr 19, 2022
Via arXiv

Show Your Work: Scratchpads for Intermediate Computation with Language Models

Nov 30, 2021
Via arXiv

Are wider nets better given the same number of parameters?

Oct 27, 2020
Via arXiv

On the training dynamics of deep networks with $L_2$ regularization

Jun 15, 2020
Via arXiv

On the asymptotics of wide networks with polynomial activations

Jun 11, 2020
Via arXiv