Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shaobo Cui

Unraveling Misinformation Propagation in LLM Reasoning

May 24, 2025

Yiyang Feng, Yichen Wang, Shaobo Cui, Boi Faltings, Mina Lee, Jiawei Zhou

Abstract:Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning, positioning them as promising tools for supporting human problem-solving. However, what happens when their performance is affected by misinformation, i.e., incorrect inputs introduced by users due to oversights or gaps in knowledge? Such misinformation is prevalent in real-world interactions with LLMs, yet how it propagates within LLMs' reasoning process remains underexplored. Focusing on mathematical reasoning, we present a comprehensive analysis of how misinformation affects intermediate reasoning steps and final answers. We also examine how effectively LLMs can correct misinformation when explicitly instructed to do so. Even with explicit instructions, LLMs succeed less than half the time in rectifying misinformation, despite possessing correct internal knowledge, leading to significant accuracy drops (10.02% - 72.20%). Further analysis shows that applying factual corrections early in the reasoning process most effectively reduces misinformation propagation, and fine-tuning on synthesized data with early-stage corrections significantly improves reasoning factuality. Our work offers a practical approach to mitigating misinformation propagation.

* 24 pages, 14 figures, 4 tables

Via

Access Paper or Ask Questions

Nuance Matters: Probing Epistemic Consistency in Causal Reasoning

Aug 27, 2024

Shaobo Cui, Junyou Li, Luca Mouchel, Yiyang Feng, Boi Faltings

Abstract:To address this gap, our study introduces the concept of causal epistemic consistency, which focuses on the self-consistency of Large Language Models (LLMs) in differentiating intermediates with nuanced differences in causal reasoning. We propose a suite of novel metrics -- intensity ranking concordance, cross-group position agreement, and intra-group clustering -- to evaluate LLMs on this front. Through extensive empirical studies on 21 high-profile LLMs, including GPT-4, Claude3, and LLaMA3-70B, we have favoring evidence that current models struggle to maintain epistemic consistency in identifying the polarity and intensity of intermediates in causal reasoning. Additionally, we explore the potential of using internal token probabilities as an auxiliary tool to maintain causal epistemic consistency. In summary, our study bridges a critical gap in AI research by investigating the self-consistency over fine-grained intermediates involved in causal reasoning.

* 20 pages

Via

Access Paper or Ask Questions

A Logical Fallacy-Informed Framework for Argument Generation

Aug 07, 2024

Luca Mouchel, Debjit Paul, Shaobo Cui, Robert West, Antoine Bosselut, Boi Faltings

Figure 1 for A Logical Fallacy-Informed Framework for Argument Generation

Figure 2 for A Logical Fallacy-Informed Framework for Argument Generation

Figure 3 for A Logical Fallacy-Informed Framework for Argument Generation

Figure 4 for A Logical Fallacy-Informed Framework for Argument Generation

Abstract:Despite the remarkable performance of Large Language Models (LLMs), they still struggle with generating logically sound arguments, resulting in potential risks such as spreading misinformation. An important factor contributing to LLMs' suboptimal performance in generating coherent arguments is their oversight of logical fallacies. To address this issue, we introduce FIPO, a fallacy-informed framework that leverages preference optimization methods to steer LLMs toward logically sound arguments. FIPO includes a classification loss, to capture the fine-grained information on fallacy categories. Our results on argumentation datasets show that our method reduces the fallacy errors by up to 17.5%. Furthermore, our human evaluation results indicate that the quality of the generated arguments by our method significantly outperforms the fine-tuned baselines, as well as prior preference optimization methods, such as DPO. These findings highlight the importance of ensuring models are aware of logical fallacies for effective argument generation.

Via

Access Paper or Ask Questions

The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning

Jun 27, 2024

Shaobo Cui, Zhijing Jin, Bernhard Schölkopf, Boi Faltings

Figure 1 for The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning

Figure 2 for The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning

Figure 3 for The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning

Figure 4 for The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning

Abstract:Understanding commonsense causality is a unique mark of intelligence for humans. It helps people understand the principles of the real world better and benefits the decision-making process related to causation. For instance, commonsense causality is crucial in judging whether a defendant's action causes the plaintiff's loss in determining legal liability. Despite its significance, a systematic exploration of this topic is notably lacking. Our comprehensive survey bridges this gap by focusing on taxonomies, benchmarks, acquisition methods, qualitative reasoning, and quantitative measurements in commonsense causality, synthesizing insights from over 200 representative articles. Our work aims to provide a systematic overview, update scholars on recent advancements, provide a pragmatic guide for beginners, and highlight promising future research directions in this vital field.

* 42 pages

Via

Access Paper or Ask Questions

δ-CAUSAL: Exploring Defeasibility in Causal Reasoning

Jan 06, 2024

Shaobo Cui, Lazar Milikic, Yiyang Feng, Mete Ismayilzada, Debjit Paul, Antoine Bosselut, Boi Faltings

Figure 1 for δ-CAUSAL: Exploring Defeasibility in Causal Reasoning

Figure 2 for δ-CAUSAL: Exploring Defeasibility in Causal Reasoning

Figure 3 for δ-CAUSAL: Exploring Defeasibility in Causal Reasoning

Figure 4 for δ-CAUSAL: Exploring Defeasibility in Causal Reasoning

Abstract:Defeasibility in causal reasoning implies that the causal relationship between cause and effect can be strengthened or weakened. Namely, the causal strength between cause and effect should increase or decrease with the incorporation of strengthening arguments (supporters) or weakening arguments (defeaters), respectively. However, existing works ignore defeasibility in causal reasoning and fail to evaluate existing causal strength metrics in defeasible settings. In this work, we present {\delta}-CAUSAL, the first benchmark dataset for studying defeasibility in causal reasoning. {\delta}-CAUSAL includes around 11K events spanning ten domains, featuring defeasible causality pairs, i.e., cause-effect pairs accompanied by supporters and defeaters. We further show current causal strength metrics fail to reflect the change of causal strength with the incorporation of supporters or defeaters in {\delta}-CAUSAL. To this end, we propose CESAR (Causal Embedding aSsociation with Attention Rating), a metric that measures causal strength based on token-level causal relationships. CESAR achieves a significant 69.7% relative improvement over existing metrics, increasing from 47.2% to 80.1% in capturing the causal strength change brought by supporters and defeaters. We further demonstrate even Large Language Models (LLMs) like GPT-3.5 still lag 4.5 and 10.7 points behind humans in generating supporters and defeaters, emphasizing the challenge posed by {\delta}-CAUSAL.

Via

Access Paper or Ask Questions

Towards Zero-Shot Personalized Table-to-Text Generation with Contrastive Persona Distillation

Apr 18, 2023

Haolan Zhan, Xuming Lin, Shaobo Cui, Zhongzhou Zhao, Wei Zhou, Haiqing Chen

Abstract:Existing neural methods have shown great potentials towards generating informative text from structured tabular data as well as maintaining high content fidelity. However, few of them shed light on generating personalized expressions, which often requires well-aligned persona-table-text datasets that are difficult to obtain. To overcome these obstacles, we explore personalized table-to-text generation under a zero-shot setting, by assuming no well-aligned persona-table-text triples are required during training. To this end, we firstly collect a set of unpaired persona information and then propose a semi-supervised approach with contrastive persona distillation (S2P-CPD) to generate personalized context. Specifically, tabular data and persona information are firstly represented as latent variables separately. Then, we devise a latent space fusion technique to distill persona information into the table representation. Besides, a contrastive-based discriminator is employed to guarantee the style consistency between the generated context and its corresponding persona. Experimental results on two benchmarks demonstrate S2P-CPD's ability on keeping both content fidelity and personalized expressions.

* Accepted by ICASSP 2023

Via

Access Paper or Ask Questions

GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation

Aug 18, 2021

Xuming Lin, Shaobo Cui, Zhongzhou Zhao, Wei Zhou, Ji Zhang, Haiqing Chen

Figure 1 for GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation

Figure 2 for GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation

Figure 3 for GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation

Figure 4 for GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation

Abstract:Existing data-driven methods can well handle short text generation. However, when applied to the long-text generation scenarios such as story generation or advertising text generation in the commercial scenario, these methods may generate illogical and uncontrollable texts. To address these aforementioned issues, we propose a graph-based grouping planner(GGP) following the idea of first-plan-then-generate. Specifically, given a collection of key phrases, GGP firstly encodes these phrases into an instance-level sequential representation and a corpus-level graph-based representation separately. With these two synergic representations, we then regroup these phrases into a fine-grained plan, based on which we generate the final long text. We conduct our experiments on three long text generation datasets and the experimental results reveal that GGP significantly outperforms baselines, which proves that GGP can control the long text generation by knowing how to say and in what order.

Via

Access Paper or Ask Questions

SPMoE: Generate Multiple Pattern-Aware Outputs with Sparse Pattern Mixture of Experts

Aug 18, 2021

Shaobo Cui, Xintong Bao, Xuming Lin, Zhongzhou Zhao, Ji Zhang, Wei Zhou, Haiqing Chen

Abstract:Many generation tasks follow a one-to-many mapping relationship: each input could be associated with multiple outputs. Existing methods like Conditional Variational AutoEncoder(CVAE) employ a latent variable to model this one-to-many relationship. However, this high-dimensional and dense latent variable lacks explainability and usually leads to poor and uncontrollable generations. In this paper, we innovatively introduce the linguistic concept of pattern to decompose the one-to-many mapping into multiple one-to-one mappings and further propose a model named Sparse Pattern Mixture of Experts(SPMoE). Each one-to-one mapping is associated with a conditional generation pattern and is modeled with an expert in SPMoE. To ensure each language pattern can be exclusively handled with an expert model for better explainability and diversity, a sparse mechanism is employed to coordinate all the expert models in SPMoE. We assess the performance of our SPMoE on the paraphrase generation task and the experiment results prove that SPMoE can achieve a good balance in terms of quality, pattern-level diversity, and corpus-level diversity.

Via

Access Paper or Ask Questions

OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop Approach

Feb 24, 2021

Shaobo Cui, Xintong Bao, Xinxing Zu, Yangyang Guo, Zhongzhou Zhao, Ji Zhang, Haiqing Chen

Abstract:Large-scale question-answer (QA) pairs are critical for advancing research areas like machine reading comprehension and question answering. To construct QA pairs from documents requires determining how to ask a question and what is the corresponding answer. Existing methods for QA pair generation usually follow a pipeline approach. Namely, they first choose the most likely candidate answer span and then generate the answer-specific question. This pipeline approach, however, is undesired in mining the most appropriate QA pairs from documents since it ignores the connection between question generation and answer extraction, which may lead to incompatible QA pair generation, i.e., the selected answer span is inappropriate for question generation. However, for human annotators, we take the whole QA pair into account and consider the compatibility between question and answer. Inspired by such motivation, instead of the conventional pipeline approach, we propose a model named OneStop generate QA pairs from documents in a one-stop approach. Specifically, questions and their corresponding answer span is extracted simultaneously and the process of question generation and answer extraction mutually affect each other. Additionally, OneStop is much more efficient to be trained and deployed in industrial scenarios since it involves only one model to solve the complex QA generation task. We conduct comprehensive experiments on three large-scale machine reading comprehension datasets: SQuAD, NewsQA, and DuReader. The experimental results demonstrate that our OneStop model outperforms the baselines significantly regarding the quality of generated questions, quality of generated question-answer pairs, and model efficiency.

* 8 pages

Via

Access Paper or Ask Questions

Should Answer Immediately or Wait for Further Information? A Novel Wait-or-Answer Task and Its Predictive Approach

May 27, 2020

Zehao Lin, Shaobo Cui, Xiaoming Kang, Guodun Li, Feng Ji, Haiqing Chen, Yin Zhang

Figure 1 for Should Answer Immediately or Wait for Further Information? A Novel Wait-or-Answer Task and Its Predictive Approach

Figure 2 for Should Answer Immediately or Wait for Further Information? A Novel Wait-or-Answer Task and Its Predictive Approach

Figure 3 for Should Answer Immediately or Wait for Further Information? A Novel Wait-or-Answer Task and Its Predictive Approach

Figure 4 for Should Answer Immediately or Wait for Further Information? A Novel Wait-or-Answer Task and Its Predictive Approach

Abstract:Different people have different habits of describing their intents in conversations. Some people may tend to deliberate their full intents in several successive utterances, i.e., they use several consistent messages for readability instead of a long sentence to express their question. This creates a predicament faced by dialogue systems' application, especially in real-world industrial scenarios, in which the dialogue system is unsure that whether it should answer the user's query immediately or wait for users' further supplementary input. Motivated by such interesting quandary, we define a novel task: Wait-or-Answer to better tackle this dilemma faced by dialogue systems. We shed light on a new research topic about how the dialogue system can be more competent to behave in this Wait-or-Answer quandary. Further, we propose a predictive approach dubbed Imagine-then-Arbitrate (ITA) to resolve this Wait-or-Answer task. More specifically, we take advantage of an arbitrator model to help the dialogue system decide to wait or answer. The arbitrator's decision is made with the assistance of two ancillary imaginator models: a wait imaginator and an answer imaginator. The wait imaginator tries to predict what the user would supplement and use its prediction to persuade the arbitrator that the user has some information to add, so the dialogue system should wait. The answer imaginator, nevertheless, struggles to predict the answer of the dialogue system and convince the arbitrator that it's a superior choice to answer the users' query immediately. To our best knowledge, our paper is the first work to explicitly define the Wait-or-Answer task in the dialogue system. Additionally, our proposed ITA approach significantly outperforms the existing models in solving this Wait-or-Answer problem.

* This previously appeared as arXiv:2002.09616v2, which was mistakenly submitted as a replacement. arXiv admin note: text overlap with arXiv:2002.09616v3

Via

Access Paper or Ask Questions