Abstract: Language models (LMs) are increasingly used to simulate human-like responses in scenarios where accurately mimicking a population's behavior can guide decision-making, such as in developing educational materials and designing public policies. The objective of these simulations is for LMs to capture the variations in human responses, rather than merely providing the expected correct answers. Prior work has shown that LMs often generate unrealistically accurate responses, but there are no established metrics to quantify how closely the knowledge distribution of LMs aligns with that of humans. To address this, we introduce "psychometric alignment," a metric that measures the extent to which LMs reflect human knowledge distribution. Assessing this alignment involves collecting responses from both LMs and humans to the same set of test items and using Item Response Theory to analyze the differences in item functioning between the groups. We demonstrate that our metric can capture important variations in populations that traditional metrics, like differences in accuracy, fail to capture. We apply this metric to assess existing LMs for their alignment with human knowledge distributions across three real-world domains. We find significant misalignment between LMs and human populations, though using persona-based prompts can improve alignment. Interestingly, smaller LMs tend to achieve greater psychometric alignment than larger LMs. Further, training LMs on human response data from the target distribution enhances their psychometric alignment on unseen test items, but the effectiveness of such training varies across domains.
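To make the IRT-based comparison concrete, here is a minimal sketch of one way such an alignment check could look: fit item difficulties separately to human and LM response matrices under a simple Rasch (1PL) model via joint maximum likelihood, then correlate the difficulty estimates across groups. The fitting routine, the toy data, and the Spearman-correlation summary are illustrative assumptions, not the paper's actual estimation pipeline or metric definition.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import spearmanr

def fit_rasch_difficulties(responses):
    """Joint MLE of Rasch item difficulties b_j and person abilities theta_i.

    responses: (n_persons, n_items) binary matrix (1 = correct).
    Returns centered item-difficulty estimates.
    """
    n_persons, n_items = responses.shape

    def neg_log_lik(params):
        theta = params[:n_persons]      # person abilities
        b = params[n_persons:]          # item difficulties
        logits = theta[:, None] - b[None, :]
        # Bernoulli log-likelihood of the observed response matrix
        return -np.sum(responses * logits - np.logaddexp(0.0, logits))

    res = minimize(neg_log_lik, np.zeros(n_persons + n_items), method="L-BFGS-B")
    b_hat = res.x[n_persons:]
    return b_hat - b_hat.mean()         # center to fix translation invariance

# Toy data standing in for real test responses (rows: respondents, cols: items).
rng = np.random.default_rng(0)
human = (rng.random((200, 30)) < np.linspace(0.9, 0.3, 30)).astype(float)
lm = (rng.random((200, 30)) < np.linspace(0.7, 0.5, 30)).astype(float)

# One simple alignment summary: rank-correlate item difficulties across groups.
rho, _ = spearmanr(fit_rasch_difficulties(human), fit_rasch_difficulties(lm))
print(f"Spearman correlation of item difficulties: {rho:.2f}")
```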
Abstract: Creating effective educational materials generally requires expensive and time-consuming studies of student learning outcomes. To overcome this barrier, one idea is to build computational models of student learning and use them to optimize instructional materials. However, the cognitive processes underlying learning dynamics are difficult to model. We propose an alternative approach that uses Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes. Specifically, we use GPT-3.5 to evaluate the overall effect of instructional materials on different student groups and find that it can replicate well-established educational findings such as the Expertise Reversal Effect and the Variability Effect. This demonstrates the potential of LMs as reliable evaluators of educational content. Building on this insight, we introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function. We apply this approach to create math word problem worksheets aimed at maximizing student learning gains. Human teachers' evaluations of these LM-generated worksheets show significant alignment between the LM judgments and human teacher preferences. We conclude by discussing potential divergences between human and LM opinions and the resulting pitfalls of automating instructional design.
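A hedged sketch of the generate-and-judge loop described above: one LM call drafts a worksheet and a second LM call acts as the reward function, with a best-of-N search keeping the highest-scored candidate. The prompts, the choice of gpt-3.5-turbo for both roles, and the 1-10 scoring format are illustrative assumptions, not the paper's actual setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_worksheet(topic: str) -> str:
    """Ask a generator LM to draft a math word problem worksheet."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative choice of generator model
        messages=[{"role": "user",
                   "content": f"Write a 5-problem word problem worksheet on {topic}."}],
        temperature=1.0,
    )
    return resp.choices[0].message.content

def judge_learning_gain(worksheet: str) -> float:
    """Use a second LM call as a reward function: score expected learning gains 1-10."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative judge model
        messages=[{"role": "user",
                   "content": "On a scale of 1-10, rate how much a middle-school "
                              "student would learn from this worksheet. "
                              f"Reply with a single number.\n\n{worksheet}"}],
        temperature=0.0,
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # unparsable judgment counts as zero reward

# Best-of-N search: keep the candidate the judge scores highest.
candidates = [generate_worksheet("fraction word problems") for _ in range(5)]
print(max(candidates, key=judge_learning_gain))
```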
Abstract: Automatically generating high-quality step-by-step solutions to math word problems has many applications in education. Recently, combining large language models (LLMs) with external tools to perform complex reasoning and calculation has emerged as a promising direction for solving math word problems, but prior approaches such as Program-Aided Language Models (PAL) are biased towards simple procedural problems and less effective for problems that require declarative reasoning. We propose an approach that combines an LLM that can incrementally formalize word problems as a set of variables and equations with an external symbolic solver that can solve the equations. Our approach achieves comparable accuracy to the original PAL on the GSM8K benchmark of math word problems and outperforms PAL by an absolute 20% on ALGEBRA, a new dataset of more challenging word problems extracted from Algebra textbooks. Our work highlights the benefits of using declarative and incremental representations when interfacing with an external tool for solving complex math word problems. Our data and prompts are publicly available at https://github.com/joyheyueya/declarative-math-word-problem.
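To illustrate the solver side of this pipeline, here is a minimal sketch in which SymPy stands in for the external symbolic solver; in the approach above, the variables and equations would be emitted incrementally by the LLM, whereas here they are hard-coded for a toy problem, and the paper's actual prompts and solver interface may differ.

```python
from sympy import symbols, Eq, solve

# Toy word problem: "Alice has 3 more apples than Bob. Together they have 11.
# How many apples does Alice have?"
# The LLM's declarative formalization of the problem, hard-coded here:
alice, bob = symbols("alice bob")
equations = [
    Eq(alice, bob + 3),   # "Alice has 3 more apples than Bob"
    Eq(alice + bob, 11),  # "Together they have 11"
]

# The external symbolic solver handles the declarative system of equations.
solution = solve(equations, [alice, bob], dict=True)[0]
print(solution[alice])  # -> 7
```

Because the representation is declarative, the equations can be stated in any order and need not follow a step-by-step computation, which is exactly where procedural approaches like PAL tend to struggle.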