Qisheng Hu

Coordinating Search-Informed Reasoning and Reasoning-Guided Search in Claim Verification

Jun 09, 2025

Qisheng Hu, Quanyu Long, Wenya Wang

Abstract: Multi-hop claim verification is inherently challenging, requiring multi-step reasoning to construct verification chains while iteratively searching for information to uncover hidden bridging facts. This process is fundamentally interleaved, as effective reasoning relies on dynamically retrieved evidence, while effective search demands reasoning to refine queries based on partial information. To achieve this, we propose Hierarchical Agent Reasoning and Information Search (HARIS), explicitly modeling the coordinated process of reasoning-driven searching and search-informed reasoning. HARIS consists of a high-level reasoning agent that focuses on constructing the main verification chain, generating factual questions when more information is needed, and a low-level search agent that iteratively retrieves more information, refining its search based on intermediate findings. This design allows each agent to specialize in its respective task, enhancing verification accuracy and interpretability. HARIS is trained using reinforcement learning with outcome-based rewards. Experimental results on the EX-FEVER and HOVER benchmarks demonstrate that HARIS achieves strong performance, greatly advancing multi-hop claim verification.

* 19 pages, 9 figures
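
The two-level division of labor described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the agents here are rule-based stand-ins rather than RL-trained models, and the `FACTS` store, the function names, the query-normalization fallback, and the verdict labels are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy knowledge store: (subject, relation) -> object.
FACTS: Dict[Tuple[str, str], str] = {
    ("Film A", "director"): "Person B",
    ("Person B", "birthplace"): "City C",
}


def search_agent(subject: str, relation: str, max_steps: int = 3) -> Optional[str]:
    """Low-level agent: answer one factual question, retrying with refined queries.

    Refinement here is just a fixed list of normalized query variants,
    a stand-in for search that adapts to intermediate findings.
    """
    queries = [
        (subject, relation),
        (subject.title(), relation),
        (subject.strip().title(), relation.lower()),
    ]
    for query in queries[:max_steps]:
        answer = FACTS.get(query)
        if answer is not None:
            return answer
    return None


def reasoning_agent(head: str, relations: List[str], claimed: str) -> str:
    """High-level agent: build the verification chain one hop at a time."""
    entity = head
    for relation in relations:
        answer = search_agent(entity, relation)  # ask a factual question
        if answer is None:
            return "NOT ENOUGH INFO"
        entity = answer  # the bridging fact feeds the next hop
    return "SUPPORTED" if entity == claimed else "REFUTED"


# Claim: "The director of Film A was born in City C."
verdict = reasoning_agent("Film A", ["director", "birthplace"], "City C")
print(verdict)
```

The point of the sketch is the interface: the reasoning agent never touches the store directly, and the search agent never sees the full claim, only the question it was handed.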

BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking

Apr 03, 2025

Qisheng Hu, Quanyu Long, Wenya Wang

Abstract: Program-guided reasoning has shown promise in complex claim fact-checking by decomposing claims into function calls and executing reasoning programs. However, prior work primarily relies on few-shot in-context learning (ICL) with ad-hoc demonstrations, which limit program diversity and require manual design with substantial domain knowledge. Fundamentally, the underlying principles of effective reasoning program generation still remain underexplored, making it challenging to construct effective demonstrations. To address this, we propose BOOST, a bootstrapping-based framework for few-shot reasoning program generation. BOOST explicitly integrates claim decomposition and information-gathering strategies as structural guidance for program generation, iteratively refining bootstrapped demonstrations in a strategy-driven and data-centric manner without human intervention. This enables a seamless transition from zero-shot to few-shot strategic program-guided learning, enhancing interpretability and effectiveness. Experimental results show that BOOST outperforms prior few-shot baselines in both zero-shot and few-shot settings for complex claim verification.

* 18 pages, 5 figures

Just What You Desire: Constrained Timeline Summarization with Self-Reflection for Enhanced Relevance

Dec 23, 2024

Muhammad Reza Qorib, Qisheng Hu, Hwee Tou Ng

Abstract: Given news articles about an entity, such as a public figure or organization, timeline summarization (TLS) involves generating a timeline that summarizes the key events about the entity. However, the TLS task is too underspecified, since what is of interest to each reader may vary, and hence there is not a single ideal or optimal timeline. In this paper, we introduce a novel task, called Constrained Timeline Summarization (CTLS), where a timeline is generated in which all events in the timeline meet some constraint. An example of a constrained timeline concerns the legal battles of Tiger Woods, where only events related to his legal problems are selected to appear in the timeline. We collected a new human-verified dataset of constrained timelines involving 47 entities and 5 constraints per entity. We propose an approach that employs a large language model (LLM) to summarize news articles according to a specified constraint and cluster them to identify key events to include in a constrained timeline. In addition, we propose a novel self-reflection method during summary generation, demonstrating that this approach successfully leads to improved performance.

* AAAI 2025 (with appendix)
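
The summarize-then-cluster pipeline in the abstract can be pictured with a toy sketch. It makes strong simplifying assumptions that are not from the paper: a keyword filter stands in for the LLM summarizer, clustering is plain grouping by date, and the articles, keywords, and function names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Article = Tuple[str, str]  # (ISO date, text)


def summarize_under_constraint(text: str, keywords: List[str]) -> str:
    """Stand-in for the LLM: keep the text only if it mentions a constraint keyword."""
    lowered = text.lower()
    return text if any(k in lowered for k in keywords) else ""


def constrained_timeline(
    articles: List[Article], keywords: List[str], top_k: int = 2
) -> List[Tuple[str, str]]:
    """Build a timeline whose events all satisfy the constraint."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for date, text in articles:
        summary = summarize_under_constraint(text, keywords)
        if summary:
            clusters[date].append(summary)  # toy clustering: group by date
    # Keep the dates backed by the most articles, then order chronologically.
    key_dates = sorted(clusters, key=lambda d: len(clusters[d]), reverse=True)[:top_k]
    return [(d, clusters[d][0]) for d in sorted(key_dates)]


# Invented articles about a placeholder entity; the constraint is "legal issues".
articles = [
    ("2020-01-05", "Entity X appeared in court over a contract dispute."),
    ("2020-01-05", "A court hearing for Entity X began on Monday."),
    ("2020-02-10", "Entity X won a tournament."),
    ("2020-03-01", "Entity X settled the lawsuit out of court."),
]
timeline = constrained_timeline(articles, keywords=["court", "lawsuit"])
for date, event in timeline:
    print(date, event)
```

The tournament article is dropped because it does not meet the constraint, which is the behavior that separates CTLS from unconstrained timeline summarization.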

MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems

Apr 15, 2024

Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Jing Ma

Abstract: Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models have demonstrated remarkable abilities in visual reasoning and mathematical tasks, there is little work on investigating whether these models can effectively interpret visual elements for code generation. To this end, we present MMCode, the first multi-modal coding dataset for evaluating algorithmic problem-solving skills in visually rich contexts. MMCode contains 3,548 questions and 6,620 images collected from real-world programming challenges harvested from 10 code competition websites, presenting significant challenges due to the extreme demand for reasoning abilities. Our experimental results show that current state-of-the-art models struggle to solve these problems. The results highlight the lack of powerful vision-code models, and we hope MMCode can serve as an inspiration for future work in this domain. The data and code are publicly available at https://github.com/happylkx/MMCode.

* 46 pages, 21 figures and 6 tables

InstructCoder: Empowering Language Models for Code Editing

Oct 31, 2023

Qisheng Hu, Kaixin Li, Xu Zhao, Yuxi Xie, Tiedong Liu, Hui Chen, Qizhe Xie, Junxian He

Abstract: Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due to data scarcity. In this work, we explore the use of large language models (LLMs) to edit code based on user instructions, covering a broad range of implicit tasks such as comment insertion, code optimization, and code refactoring. To facilitate this, we introduce InstructCoder, the first dataset designed to adapt LLMs for general-purpose code editing, containing high-diversity code-editing tasks. It consists of over 114,000 instruction-input-output triplets and covers multiple distinct code editing scenarios. The dataset is systematically expanded through an iterative process that commences with code editing data sourced from GitHub commits as seed tasks. Seed and generated tasks are used subsequently to prompt ChatGPT for more task data. Our experiments demonstrate that open-source LLMs fine-tuned on InstructCoder can edit code correctly based on users' instructions most of the time, exhibiting unprecedented code-editing performance levels. Such results suggest that proficient instruction-finetuning can lead to significant amelioration in code editing abilities. The dataset and the source code are available at https://github.com/qishenghu/CodeInstruct.
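
To make the instruction-input-output triplet format concrete, here is a small sketch of what one record and its fine-tuning prompt might look like. The field names, the `EditExample` class, the prompt template, and the sample edit are assumptions for illustration and are not taken from the released dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditExample:
    """One hypothetical code-editing triplet."""

    instruction: str  # natural-language edit request
    input: str        # code before the edit
    output: str       # code after the edit


example = EditExample(
    instruction="Add a docstring to the function.",
    input="def add(a, b):\n    return a + b\n",
    output='def add(a, b):\n    """Return the sum of a and b."""\n    return a + b\n',
)


def to_prompt(ex: EditExample) -> str:
    """Render the instruction and input; the model is trained to emit ex.output."""
    return f"Instruction: {ex.instruction}\nInput:\n{ex.input}\nOutput:\n"


print(to_prompt(example) + example.output)
```

Keeping the pre-edit code and the edited code as separate fields lets the same record serve both as a training pair and as a test case, since the expected output can be compared or executed directly.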
