Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kateryna Pistunova

Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

Jul 23, 2023

Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo

Figure 1 for Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

Figure 2 for Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

Figure 3 for Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

Figure 4 for Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

Abstract:Language models can be prompted to reason through problems in a manner that significantly improves performance. However, \textit{why} such prompting improves performance is unclear. Recent work showed that using logically \textit{invalid} Chain-of-Thought (CoT) prompting improves performance almost as much as logically \textit{valid} CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically \textit{invalid} reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.

* ICML 2023 Workshop: Knowledge and Logical Reasoning in the Era of Data-driven Learning

Via

Access Paper or Ask Questions

Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

Jul 22, 2023

Romain Lacombe, Andrew Gaut, Jeff He, David Lüdeke, Kateryna Pistunova

Figure 1 for Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

Figure 2 for Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

Figure 3 for Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

Figure 4 for Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

Abstract:Deep learning in computational biochemistry has traditionally focused on molecular graphs neural representations; however, recent advances in language models highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study property prediction performance gains after using contrastive learning to align neural graph representations with representations of textual descriptions of their characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel chemically-valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain compared to recently proposed molecular graph/text contrastively trained MoMu model (Su et al. 2022).

* 2023 ICML Workshop on Computational Biology

Via

Access Paper or Ask Questions

Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Mar 24, 2023

Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo

Figure 1 for Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Figure 2 for Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Figure 3 for Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Figure 4 for Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Abstract:Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data, the dimensionality of the data and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously all present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors are ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.

Via

Access Paper or Ask Questions