Title: The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks

Oct 09, 2024

Isaac R. Galatzer-Levy, David Munday, Jed McGiffin, Xin Liu, Danny Karmon, Ilia Labzovsky, Rivka Moroshko, Amir Zait, Daniel McDuff



Abstract: There is increasing interest in tracking the capabilities of general intelligence foundation models. This study benchmarks leading large language models and vision language models against human performance on the Wechsler Adult Intelligence Scale (WAIS-IV), a comprehensive, population-normed assessment of underlying human cognition and intellectual abilities, with a focus on the domains of Verbal Comprehension (VCI), Working Memory (WMI), and Perceptual Reasoning (PRI). Most models demonstrated exceptional capabilities in the storage, retrieval, and manipulation of tokens such as arbitrary sequences of letters and numbers, with performance on the Working Memory Index (WMI) greater than or equal to the 99.5th percentile when compared to human population normative ability. Performance on the Verbal Comprehension Index (VCI), which measures retrieval of acquired information and linguistic understanding of the meaning of words and their relationships to each other, was also consistently at or above the 98th percentile. Despite these broad strengths, we observed consistently poor performance on the Perceptual Reasoning Index (PRI; range 0.1st to 10th percentile) from multimodal models, indicating a profound inability to interpret and reason about visual information. Smaller and older model versions consistently performed worse, indicating that training data, parameter count, and advances in tuning are driving significant gains in cognitive ability.
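To give a feel for what the percentiles in the abstract mean on the familiar index-score scale, here is a minimal sketch. It assumes index scores are normally distributed with mean 100 and standard deviation 15, which is the conventional scaling for WAIS-IV composite scores. The function names are hypothetical, and the paper's figures come from published norm tables, which may differ slightly from this idealized conversion.

```python
from statistics import NormalDist

# Assumption: index scores follow a normal distribution with mean 100, SD 15.
# Real scoring uses published norm tables, not this closed-form approximation.
INDEX_SCALE = NormalDist(mu=100, sigma=15)


def percentile_to_index_score(percentile: float) -> float:
    """Map a population percentile (strictly between 0 and 100) to an approximate index score."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    return INDEX_SCALE.inv_cdf(percentile / 100)


def index_score_to_percentile(score: float) -> float:
    """Map an index score back to an approximate population percentile."""
    return 100 * INDEX_SCALE.cdf(score)


if __name__ == "__main__":
    # Percentiles quoted in the abstract for each index.
    reported = [
        ("WMI lower bound", 99.5),
        ("VCI lower bound", 98.0),
        ("PRI range, low end", 0.1),
        ("PRI range, high end", 10.0),
    ]
    for label, pct in reported:
        score = percentile_to_index_score(pct)
        print(f"{label}: {pct}th percentile is about index score {score:.0f}")
```

Under these assumptions, the 99.5th and 98th percentiles land near index scores of 139 and 131, while the reported PRI range corresponds to roughly 54 to 81.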
