Title: Diverse capability and scaling of diffusion and auto-regressive models when learning abstract rules

Nov 12, 2024

Binxu Wang, Jiaqi Shang, Haim Sompolinsky

Abstract: Humans excel at discovering regular structures from limited samples and applying inferred rules to novel settings. We investigate whether modern generative models can similarly learn underlying rules from finite samples and perform reasoning through conditional sampling. Inspired by the Raven's Progressive Matrices task, we designed the GenRAVEN dataset, in which each sample consists of three rows and a single relational rule (one of 40, governing object position, number, or attributes) applies to all rows. We trained generative models to learn the data distribution, with samples encoded as integer arrays so that the focus stays on rule learning. We compared two generative model families: diffusion models (EDM, DiT, SiT) and autoregressive models (GPT2, Mamba). We evaluated their ability to generate structurally consistent samples and to perform panel completion via unconditional and conditional sampling, respectively. We found that diffusion models excel at unconditional generation, producing more novel and consistent samples from scratch and memorizing less, but perform less well at panel completion, even with advanced conditional sampling methods. Conversely, autoregressive models excel at completing missing panels in a rule-consistent manner but generate less consistent samples unconditionally. We observe diverse data scaling behaviors: for both model families, rule learning emerges at a certain dataset size, on the order of 1,000 examples per rule. With more training data, diffusion models improve in both unconditional and conditional generation. For autoregressive models, however, panel completion improves with more training data while the consistency of unconditional generation declines. Our findings highlight the complementary capabilities and limitations of diffusion and autoregressive models in rule learning and reasoning tasks, suggesting avenues for further research into their mechanisms and their potential for human-like reasoning.

* 12 pages, 5 figures. Accepted to the NeurIPS 2024 Workshop on System 2 Reasoning At Scale as a long paper
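To make "rule-consistent" and "panel completion" concrete, here is a minimal sketch under stated assumptions. It uses a hypothetical integer encoding `sample[row, panel, attr]` (3 rows × 3 panels × a few integer attributes) and two toy row rules, "constant" and "progression". This is not the GenRAVEN format, its 40 rules, or the authors' evaluation code. It is a rule-based oracle that checks whether one rule holds in every row and fills in the last panel. The generative models in the paper instead complete panels by conditional sampling.

```python
import numpy as np

# Hypothetical encoding (NOT the GenRAVEN format): sample[row, panel, attr] is an
# integer attribute value (e.g. shape, size, color) for each of 3 rows x 3 panels.

def row_constant(row: np.ndarray, attr: int) -> bool:
    """Attribute `attr` takes the same value in all panels of the row."""
    vals = row[:, attr]
    return bool(np.all(vals == vals[0]))

def row_progression(row: np.ndarray, attr: int, step: int = 1) -> bool:
    """Attribute `attr` increases by `step` from each panel to the next."""
    return bool(np.all(np.diff(row[:, attr]) == step))

# Toy rule set standing in for the paper's 40 relational rules (hypothetical).
RULES = {"constant": row_constant, "progression": row_progression}

def consistent_rules(sample: np.ndarray, attr: int) -> list[str]:
    """Names of toy rules that hold in every row of the sample for attribute `attr`."""
    return [name for name, rule in RULES.items()
            if all(rule(sample[r], attr) for r in range(sample.shape[0]))]

def complete_last_panel(sample: np.ndarray, attr: int, rule: str) -> int:
    """Infer the masked value sample[2, 2, attr] from the first two panels of row 3."""
    a, b = sample[2, 0, attr], sample[2, 1, attr]
    if rule == "constant":
        return int(b)
    if rule == "progression":
        return int(b + (b - a))  # continue the per-panel increment
    raise ValueError(f"unknown rule: {rule}")
```

For example, with attribute 0 increasing by 1 across each row and attribute 1 held constant, `consistent_rules` returns `["progression"]` and `["constant"]` for the two attributes, and the completed last panel follows each rule.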
