Shima Rahimi Moghaddam

Boosting Theory-of-Mind Performance in Large Language Models via Prompting

Apr 26, 2023

Shima Rahimi Moghaddam, Christopher J. Honey

Abstract: Large language models (LLMs) excel in many tasks in 2023, but they still face challenges in complex reasoning. Theory-of-mind (ToM) tasks, which require understanding agents' beliefs, goals, and mental states, are essential for common-sense reasoning involving humans, making it crucial to enhance LLM performance in this area. This study measures the ToM performance of GPT-4 and three GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo), and investigates the effectiveness of in-context learning in improving their ToM comprehension. We evaluated prompts featuring two-shot chain-of-thought reasoning and step-by-step thinking instructions. We found that LLMs trained with Reinforcement Learning from Human Feedback (RLHF) (all models excluding Davinci-2) improved their ToM accuracy via in-context learning. GPT-4 performed best in zero-shot settings, reaching nearly 80% ToM accuracy, but still fell short of the 87% human accuracy on the test set. However, when supplied with prompts for in-context learning, all RLHF-trained LLMs exceeded 80% ToM accuracy, with GPT-4 reaching 100%. These results demonstrate that appropriate prompting enhances LLM ToM reasoning, and they underscore the context-dependent nature of LLM cognitive capacities.

* 27 pages, 4 main figures, 2 supplementary figures
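The abstract above names two prompt ingredients: two-shot chain-of-thought examples and a step-by-step thinking instruction. The following is a minimal sketch of how such a prompt might be assembled. The function name `build_tom_prompt`, the `Scenario/Question/Answer` layout, the placeholder example texts, and the exact wording of the step-by-step cue are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, Sequence


def build_tom_prompt(
    scenario: str,
    question: str,
    examples: Sequence[Dict[str, str]] = (),
    step_by_step: bool = False,
) -> str:
    """Assemble a theory-of-mind prompt (hypothetical layout, not the paper's)."""
    parts = []
    # In-context examples: each one carries a worked reasoning chain.
    for ex in examples:
        parts.append(
            f"Scenario: {ex['scenario']}\n"
            f"Question: {ex['question']}\n"
            f"Answer: {ex['reasoning']}"
        )
    # Assumed wording for the step-by-step cue.
    cue = "Answer: Let's think step by step." if step_by_step else "Answer:"
    parts.append(f"Scenario: {scenario}\nQuestion: {question}\n{cue}")
    return "\n\n".join(parts)


# Placeholder examples; real ones would be held-out ToM scenarios.
two_shots = [
    {"scenario": "<example scenario 1>", "question": "<example question 1>",
     "reasoning": "<worked reasoning 1>"},
    {"scenario": "<example scenario 2>", "question": "<example question 2>",
     "reasoning": "<worked reasoning 2>"},
]

zero_shot = build_tom_prompt("<test scenario>", "<test question>")
combined = build_tom_prompt(
    "<test scenario>", "<test question>", examples=two_shots, step_by_step=True
)
```

Here `zero_shot` corresponds to the baseline condition, while `combined` stacks both ingredients; passing only `examples` or only `step_by_step=True` gives each ingredient on its own.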

Learning Representations from Temporally Smooth Data

Dec 12, 2020

Shima Rahimi Moghaddam, Fanjun Bu, Christopher J. Honey

Abstract: Events in the real world are correlated across nearby points in time, and we must learn from this temporally smooth data. However, when neural networks are trained to categorize or reconstruct single items, the common practice is to randomize the order of training items. What are the effects of temporally smooth training data on the efficiency of learning? We first tested the effects of smoothness in training data on incremental learning in feedforward nets and found that smoother data slowed learning. Moreover, sampling so as to minimize temporal smoothness produced more efficient learning than sampling randomly. If smoothness generally impairs incremental learning, then how can networks be modified to benefit from smoothness in the training data? We hypothesized that two simple brain-inspired mechanisms, leaky memory in activation units and memory-gating, could enable networks to rapidly extract useful representations from smooth data. Across all levels of data smoothness, these brain-inspired architectures achieved more efficient category learning than feedforward networks. This advantage persisted even when leaky memory networks with gating were trained on smooth data and tested on randomly ordered data. Finally, we investigated how these brain-inspired mechanisms altered the internal representations learned by the networks. We found that networks with multi-scale leaky memory and memory-gating could learn internal representations that un-mixed data sources that vary on fast and slow timescales across training samples. Altogether, we identified simple mechanisms enabling neural networks to learn more quickly from temporally smooth data, and to generate internal representations that separate timescales in the training signal.
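The abstract above mentions leaky memory in activation units and memory-gating without giving equations. As a rough sketch only, one common way to write a leaky unit is an exponential moving average of its input, with a gate that can flush the stored state. The update rule, the function name `leaky_memory`, and the parameters `alpha` and `gates` below are assumptions chosen for illustration and may differ from the formulation used in the paper.

```python
from typing import List, Optional, Sequence


def leaky_memory(
    inputs: Sequence[float],
    alpha: float = 0.5,
    gates: Optional[Sequence[float]] = None,
) -> List[float]:
    """Leaky integration of a scalar sequence (assumed form, not the paper's).

    alpha: fraction of the previous state carried forward (0 = no memory).
    gates: optional per-step values in [0, 1]; 0 flushes the memory at
           that step, 1 leaves the leak unchanged.
    """
    state = 0.0
    trace = []
    for t, x in enumerate(inputs):
        keep = 1.0 if gates is None else gates[t]
        carry = alpha * keep  # effective weight on the previous state
        state = carry * state + (1.0 - carry) * x
        trace.append(state)
    return trace


# With memory, a constant input is approached gradually.
smooth = leaky_memory([1.0, 1.0], alpha=0.5)

# A closed gate at the second step discards the stored state,
# so the unit tracks the new input immediately.
flushed = leaky_memory([1.0, 0.0], alpha=0.5, gates=[1.0, 0.0])
```

Using several units with different `alpha` values would give a multi-scale variant, in which slowly leaking units follow slow components of the input and fast ones follow rapid changes.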
