Melanie Mitchell

Evaluating the Robustness of Analogical Reasoning in Large Language Models

Nov 21, 2024
Via arXiv

Can Large Language Models generalize analogy solving like people can?

Nov 04, 2024
Via arXiv

Imagining and building wise machines: The centrality of AI metacognition

Nov 04, 2024
Via arXiv

Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models

Feb 14, 2024
Via arXiv

Perspectives on the State and Future of Deep Learning - 2023

Dec 19, 2023
Via arXiv

Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks

Nov 26, 2023
Via arXiv

The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain

May 11, 2023
Via arXiv

Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100) 2021 Study Panel Report

Oct 27, 2022
Via arXiv

Embodied, Situated, and Grounded Intelligence: Implications for AI

Oct 24, 2022
Via arXiv

Evaluating Understanding on Conceptual Abstraction Benchmarks

Jun 28, 2022
Via arXiv