Andrew J. Nam

Out-of-Distribution Generalization in Algorithmic Reasoning Through Curriculum Learning

Oct 07, 2022

Andrew J. Nam, Mustafa Abdool, Trevor Maxfield, James L. McClelland

Abstract: Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks, and is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules can solve problems independently of the particular values of the variables. Large transformer-based language models have pushed the boundaries on how well neural networks can generalize to novel inputs, but their complexity obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in smaller-scale transformers. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks.

Learning to Reason With Relational Abstractions

Oct 06, 2022

Andrew J. Nam, Mengye Ren, Chelsea Finn, James L. McClelland

Abstract: Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences that walk through the solution steps. However, the solution sequences are not formally structured, and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with significantly higher accuracy, and that models trained to produce such sequences solve problems better than those trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.
