Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinyu Pi

Beyond LLMs: A Linguistic Approach to Causal Graph Generation from Narrative Texts

Apr 10, 2025

Zehan Li, Ruhua Pan, Xinyu Pi

Abstract:We propose a novel framework for generating causal graphs from narrative texts, bridging high-level causality and detailed event-specific relationships. Our method first extracts concise, agent-centered vertices using large language model (LLM)-based summarization. We introduce an "Expert Index," comprising seven linguistically informed features, integrated into a Situation-Task-Action-Consequence (STAC) classification model. This hybrid system, combining RoBERTa embeddings with the Expert Index, achieves superior precision in causal link identification compared to pure LLM-based approaches. Finally, a structured five-iteration prompting process refines and constructs connected causal graphs. Experiments on 100 narrative chapters and short stories demonstrate that our approach consistently outperforms GPT-4o and Claude 3.5 in causal graph quality, while maintaining readability. The open-source tool provides an interpretable, efficient solution for capturing nuanced causal chains in narratives.

* published at the 7th Workshop on Narrative Understanding, NAACL 2025

Via

Access Paper or Ask Questions

UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models

Jul 25, 2024

Xinyu Pi, Mingyuan Wu, Jize Jiang, Haozhen Zheng, Beitong Tian, Chengxiang Zhai, Klara Nahrstedt, Zhiting Hu

Figure 1 for UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models

Figure 2 for UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models

Figure 3 for UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models

Figure 4 for UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models

Abstract:Smaller-scale Vision-Langauge Models (VLMs) often claim to perform on par with larger models in general-domain visual grounding and question-answering benchmarks while offering advantages in computational efficiency and storage. However, their ability to handle rare objects, which fall into the long tail of data distributions, is less understood. To rigorously evaluate this aspect, we introduce the "Uncontextualized Uncommon Objects" (UOUO) benchmark. This benchmark focuses on systematically testing VLMs with both large and small parameter counts on rare and specialized objects. Our comprehensive analysis reveals that while smaller VLMs maintain competitive performance on common datasets, they significantly underperform on tasks involving uncommon objects. We also propose an advanced, scalable pipeline for data collection and cleaning, ensuring the UOUO benchmark provides high-quality, challenging instances. These findings highlight the need to consider long-tail distributions when assessing the true capabilities of VLMs.

* 10 pages

Via

Access Paper or Ask Questions

Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation

Dec 20, 2022

Xinyu Pi, Bing Wang, Yan Gao, Jiaqi Guo, Zhoujun Li, Jian-Guang Lou

Abstract:The robustness of Text-to-SQL parsers against adversarial perturbations plays a crucial role in delivering highly reliable applications. Previous studies along this line primarily focused on perturbations in the natural language question side, neglecting the variability of tables. Motivated by this, we propose the Adversarial Table Perturbation (ATP) as a new attacking paradigm to measure the robustness of Text-to-SQL models. Following this proposition, we curate ADVETA, the first robustness evaluation benchmark featuring natural and realistic ATPs. All tested state-of-the-art models experience dramatic performance drops on ADVETA, revealing models' vulnerability in real-world practices. To defend against ATP, we build a systematic adversarial training example generation framework tailored for better contextualization of tabular data. Experiments show that our approach not only brings the best robustness improvement against table-side perturbations but also substantially empowers models against NL-side perturbations. We release our benchmark and code at: https://github.com/microsoft/ContextualSP.

* Accepted by ACL 2022 (Oral)

Via

Access Paper or Ask Questions

LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

May 18, 2022

Xinyu Pi, Wanjun Zhong, Yan Gao, Nan Duan, Jian-Guang Lou

Figure 1 for LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Figure 2 for LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Figure 3 for LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Figure 4 for LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Abstract:We present LogiGAN, an unsupervised adversarial pre-training framework for improving logical reasoning abilities of language models. Upon automatic identifying logical reasoning phenomena in massive text corpus via detection heuristics, we train language models to predict the masked-out logical statements. Inspired by the facilitation effect of reflective thinking in human learning, we analogically simulate the learning-thinking process with an adversarial Generator-Verifier architecture to assist logic learning. LogiGAN implements a novel sequential GAN approach that (a) circumvents the non-differentiable challenge of the sequential GAN by leveraging the Generator as a sentence-level generative likelihood scorer with a learning objective of reaching scoring consensus with the Verifier; (b) is computationally feasible for large-scale pre-training with arbitrary target length. Both base and large size language models pre-trained with LogiGAN demonstrate obvious performance improvement on 12 datasets requiring general reasoning abilities, revealing the fundamental role of logic in broad reasoning, as well as the effectiveness of LogiGAN. Ablation studies on LogiGAN components reveal the relative orthogonality between linguistic and logic abilities and suggest that reflective thinking's facilitation effect might also generalize to machine learning.

* 19 pages

Via

Access Paper or Ask Questions

Reasoning Like Program Executors

Jan 27, 2022

Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, Weizhu Chen

Figure 1 for Reasoning Like Program Executors

Figure 2 for Reasoning Like Program Executors

Figure 3 for Reasoning Like Program Executors

Figure 4 for Reasoning Like Program Executors

Abstract:Reasoning over natural language is a long-standing goal for the research community. However, studies have shown that existing language models are inadequate in reasoning. To address the issue, we present POET, a new pre-training paradigm. Through pre-training language models with programs and their execution results, POET empowers language models to harvest the reasoning knowledge possessed in program executors via a data-driven approach. POET is conceptually simple and can be instantiated by different kinds of programs. In this paper, we show three empirically powerful instances, i.e., POET-Math, POET-Logic, and POET-SQL. Experimental results on six benchmarks demonstrate that POET can significantly boost model performance on natural language reasoning, such as numerical reasoning, logical reasoning, and multi-hop reasoning. Taking the DROP benchmark as a representative example, POET improves the F1 metric of BART from 69.2% to 80.6%. Furthermore, POET shines in giant language models, pushing the F1 metric of T5-11B to 87.6% and achieving a new state-of-the-art performance on DROP. POET opens a new gate on reasoning-enhancement pre-training and we hope our analysis would shed light on the future research of reasoning like program executors.

* Work in progress.The first two authors contributed equally

Via

Access Paper or Ask Questions