Large language models (LLMs) have demonstrated impressive reasoning ability on a variety of language-based tasks. Despite the many reasoning methods proposed to enhance performance on downstream tasks, two fundamental questions persist: Does reasoning genuinely support predictions, and how reliable is the quality of that reasoning? In this paper, we propose \textsc{SCORE}, a framework for analyzing how well LLMs reason. Specifically, we focus on self-contradictory reasoning, where the reasoning does not support the prediction. We find that LLMs often contradict themselves on reasoning tasks that involve contextual information and commonsense. Models may overlook evidence or rely on shortcuts, thereby exhibiting self-contradictory behavior. We also employ the Point-of-View (POV) method, which probes models to generate reasoning from multiple perspectives, as a diagnostic tool for further analysis. We find that although LLMs may appear to perform well in single-perspective settings, they fail to maintain such behavior in multi-perspective settings. Even for correct predictions, the reasoning can be messy and incomplete, and LLMs are easily led astray from sound reasoning. \textsc{SCORE}'s results underscore that current LLMs lack the robustness required for trustworthy reasoning and highlight the urgent need for further research to establish best practices for comprehensive evaluation of reasoning beyond accuracy-based metrics.