Melanie Mitchell

Evaluating the Robustness of Analogical Reasoning in Large Language Models

Nov 21, 2024
Via arXiv

Can Large Language Models generalize analogy solving like people can?

Nov 04, 2024
Via arXiv

Imagining and building wise machines: The centrality of AI metacognition

Nov 04, 2024
Via arXiv

Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models

Feb 14, 2024
Via arXiv

Perspectives on the State and Future of Deep Learning - 2023

Dec 19, 2023
Via arXiv

Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks

Nov 26, 2023
Via arXiv

The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain

May 11, 2023
Via arXiv

Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100) 2021 Study Panel Report

Oct 27, 2022
Via arXiv

Embodied, Situated, and Grounded Intelligence: Implications for AI

Oct 24, 2022
Via arXiv

Evaluating Understanding on Conceptual Abstraction Benchmarks

Jun 28, 2022
Via arXiv