Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuhong Sun

Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework

Jan 26, 2025

Yuhong Sun, Zhangyue Yin, Xuanjing Huang, Xipeng Qiu, Hui Zhao

Abstract:Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains. Math Word Problems (MWPs) serve as a crucial benchmark for evaluating LLMs' reasoning abilities. While most research primarily focuses on improving accuracy, it often neglects understanding and addressing the underlying patterns of errors. Current error classification methods rely on static and predefined categories, which limit their ability to capture the full spectrum of error patterns in mathematical reasoning. To enable systematic error analysis, we collect error samples from 15 different LLMs of varying sizes across four distinct MWP datasets using multiple sampling strategies. Based on this extensive collection, we introduce MWPES-300K, a comprehensive dataset containing 304,865 error samples that cover diverse error patterns and reasoning paths. To reduce human bias and enable fine-grained analysis of error patterns, we propose a novel framework for automated dynamic error classification in mathematical reasoning. Experimental results demonstrate that dataset characteristics significantly shape error patterns, which evolve from basic to complex manifestations as model capabilities increase. With deeper insights into error patterns, we propose error-aware prompting that incorporates common error patterns as explicit guidance, leading to significant improvements in mathematical reasoning performance.

* 22 pages, 9 figures

Via

Access Paper or Ask Questions

Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

Mar 06, 2024

Yuhong Sun, Zhangyue Yin, Qipeng Guo, Jiawen Wu, Xipeng Qiu, Hui Zhao

Figure 1 for Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

Figure 2 for Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

Figure 3 for Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

Figure 4 for Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

Abstract:Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks. However, they are susceptible to producing unreliable conjectures in ambiguous contexts called hallucination. This paper presents a new method for evaluating LLM hallucination in Question Answering (QA) based on the unanswerable math word problem (MWP). To support this approach, we innovatively develop a dataset called Unanswerable Math Word Problem (UMWP) which comprises 5200 questions across five categories. We developed an evaluation methodology combining text similarity and mathematical expression detection to determine whether LLM considers the question unanswerable. The results of extensive experiments conducted on 31 LLMs, including GPT-3, InstructGPT, LLaMA, and Claude, demonstrate that in-context learning and reinforcement learning with human feedback (RLHF) training significantly enhance the model's ability to avoid hallucination. We show that utilizing MWP is a reliable and effective approach to assess hallucination. Our code and data are available at https://github.com/Yuki-Asuuna/UMWP.

* 11 pages, 8 figures, accepted by LREC-Coling 2024

Via

Access Paper or Ask Questions

A Metamodel and Framework for Artificial General Intelligence From Theory to Practice

Feb 11, 2021

Hugo Latapie, Ozkan Kilic, Gaowen Liu, Yan Yan, Ramana Kompella, Pei Wang, Kristinn R. Thorisson, Adam Lawrence, Yuhong Sun, Jayanth Srinivasa

Figure 1 for A Metamodel and Framework for Artificial General Intelligence From Theory to Practice

Figure 2 for A Metamodel and Framework for Artificial General Intelligence From Theory to Practice

Figure 3 for A Metamodel and Framework for Artificial General Intelligence From Theory to Practice

Figure 4 for A Metamodel and Framework for Artificial General Intelligence From Theory to Practice

Abstract:This paper introduces a new metamodel-based knowledge representation that significantly improves autonomous learning and adaptation. While interest in hybrid machine learning / symbolic AI systems leveraging, for example, reasoning and knowledge graphs, is gaining popularity, we find there remains a need for both a clear definition of knowledge and a metamodel to guide the creation and manipulation of knowledge. Some of the benefits of the metamodel we introduce in this paper include a solution to the symbol grounding problem, cumulative learning, and federated learning. We have applied the metamodel to problems ranging from time series analysis, computer vision, and natural language understanding and have found that the metamodel enables a wide variety of learning mechanisms ranging from machine learning, to graph network analysis and learning by reasoning engines to interoperate in a highly synergistic way. Our metamodel-based projects have consistently exhibited unprecedented accuracy, performance, and ability to generalize. This paper is inspired by the state-of-the-art approaches to AGI, recent AGI-aspiring work, the granular computing community, as well as Alfred Korzybski's general semantics. One surprising consequence of the metamodel is that it not only enables a new level of autonomous learning and optimal functioning for machine intelligences, but may also shed light on a path to better understanding how to improve human cognition.

* arXiv admin note: text overlap with arXiv:2008.12879

Via

Access Paper or Ask Questions