Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Runcong Zhao

Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration

May 30, 2025

Qinglin Zhu, Runcong Zhao, Hanqi Yan, Yulan He, Yudong Chen, Lin Gui

Abstract:Large Language Models (LLMs) struggle with complex reasoning due to limited diversity and inefficient search. We propose Soft Reasoning, an embedding-based search framework that optimises the embedding of the first token to guide generation. It combines (1) embedding perturbation for controlled exploration and (2) Bayesian optimisation to refine embeddings via a verifier-guided objective, balancing exploration and exploitation. This approach improves reasoning accuracy and coherence while avoiding reliance on heuristic search. Experiments demonstrate superior correctness with minimal computation, making it a scalable, model-agnostic solution.

* Accepted by ICML 2025

Via

Access Paper or Ask Questions

Sparse Activation Editing for Reliable Instruction Following in Narratives

May 22, 2025

Runcong Zhao, Chengyu Cao, Qinglin Zhu, Xiucheng Lv, Shun Shao, Lin Gui, Ruifeng Xu, Yulan He

Abstract:Complex narrative contexts often challenge language models' ability to follow instructions, and existing benchmarks fail to capture these difficulties. To address this, we propose Concise-SAE, a training-free framework that improves instruction following by identifying and editing instruction-relevant neurons using only natural language instructions, without requiring labelled data. To thoroughly evaluate our method, we introduce FreeInstruct, a diverse and realistic benchmark of 1,212 examples that highlights the challenges of instruction following in narrative-rich settings. While initially motivated by complex narratives, Concise-SAE demonstrates state-of-the-art instruction adherence across varied tasks without compromising generation quality.

Via

Access Paper or Ask Questions

**PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games**

Apr 26, 2024

Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He

Abstract:Recent advancements in Large Language Models (LLMs) have enhanced the efficacy of agent communication and social interactions. Despite these advancements, building LLM-based agents for reasoning in dynamic environments involving competition and collaboration remains challenging due to the limitations of informed graph-based search methods. We propose PLAYER*, a novel framework based on an anytime sampling-based planner, which utilises sensors and pruners to enable a purely question-driven searching framework for complex reasoning tasks. We also introduce a quantifiable evaluation method using multiple-choice questions and construct the WellPlay dataset with 1,482 QA pairs. Experiments demonstrate PLAYER*'s efficiency and performance enhancements compared to existing methods in complex, dynamic environments with quantifiable results.

Via

Access Paper or Ask Questions

Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives

Feb 16, 2024

Runcong Zhao, Qinglin Zhu, Hainiu Xu, Jiazheng Li, Yuxiang Zhou, Yulan He, Lin Gui

Figure 1 for Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives

Figure 2 for Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives

Figure 3 for Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives

Figure 4 for Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives

Abstract:Existing datasets for narrative understanding often fail to represent the complexity and uncertainty of relationships in real-life social scenarios. To address this gap, we introduce a new benchmark, Conan, designed for extracting and analysing intricate character relation graphs from detective narratives. Specifically, we designed hierarchical relationship categories and manually extracted and annotated role-oriented relationships from the perspectives of various characters, incorporating both public relationships known to most characters and secret ones known to only a few. Our experiments with advanced Large Language Models (LLMs) like GPT-3.5, GPT-4, and Llama2 reveal their limitations in inferencing complex relationships and handling longer narratives. The combination of the Conan dataset and our pipeline strategy is geared towards understanding the ability of LLMs to comprehend nuanced relational dynamics in narrative contexts.

Via

Access Paper or Ask Questions

OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Feb 14, 2024

Hainiu Xu, Runcong Zhao, Lixing Zhu, Jinhua Du, Yulan He

Figure 1 for OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Figure 2 for OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Figure 3 for OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Figure 4 for OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Abstract:Neural Theory-of-Mind (N-ToM), machine's ability to understand and keep track of the mental states of others, is pivotal in developing socially intelligent agents. However, prevalent N-ToM benchmarks have several shortcomings, including the presence of ambiguous and artificial narratives, absence of personality traits and preferences, a lack of questions addressing characters' psychological mental states, and limited diversity in the questions posed. In response to these issues, we construct OpenToM, a new benchmark for assessing N-ToM with (1) longer and clearer narrative stories, (2) characters with explicit personality traits, (3) actions that are triggered by character intentions, and (4) questions designed to challenge LLMs' capabilities of modeling characters' mental states of both the physical and psychological world. Using OpenToM, we reveal that state-of-the-art LLMs thrive at modeling certain aspects of mental states in the physical world but fall short when tracking characters' mental states in the psychological world.

Via

Access Paper or Ask Questions

Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding

Oct 28, 2023

Lixing Zhu, Runcong Zhao, Lin Gui, Yulan He

Abstract:Narrative understanding involves capturing the author's cognitive processes, providing insights into their knowledge, intentions, beliefs, and desires. Although large language models (LLMs) excel in generating grammatically coherent text, their ability to comprehend the author's thoughts remains uncertain. This limitation hinders the practical applications of narrative understanding. In this paper, we conduct a comprehensive survey of narrative understanding tasks, thoroughly examining their key features, definitions, taxonomy, associated datasets, training objectives, evaluation metrics, and limitations. Furthermore, we explore the potential of expanding the capabilities of modularized LLMs to address novel narrative understanding tasks. By framing narrative understanding as the retrieval of the author's imaginative cues that outline the narrative structure, our study introduces a fresh perspective on enhancing narrative comprehension.

Via

Access Paper or Ask Questions

NarrativePlay: Interactive Narrative Understanding

Oct 02, 2023

Runcong Zhao, Wenjia Zhang, Jiazheng Li, Lixing Zhu, Yanran Li, Yulan He, Lin Gui

Abstract:In this paper, we introduce NarrativePlay, a novel system that allows users to role-play a fictional character and interact with other characters in narratives such as novels in an immersive environment. We leverage Large Language Models (LLMs) to generate human-like responses, guided by personality traits extracted from narratives. The system incorporates auto-generated visual display of narrative settings, character portraits, and character speech, greatly enhancing user experience. Our approach eschews predefined sandboxes, focusing instead on main storyline events extracted from narratives from the perspective of a user-selected character. NarrativePlay has been evaluated on two types of narratives, detective and adventure stories, where users can either explore the world or improve their favorability with the narrative characters through conversations.

Via

Access Paper or Ask Questions

OverPrompt: Enhancing ChatGPT Capabilities through an Efficient In-Context Learning Approach

May 24, 2023

Jiazheng Li, Runcong Zhao, Yulan He, Lin Gui

Abstract:The exceptional performance of pre-trained large language models has revolutionised various applications, but their adoption in production environments is hindered by prohibitive costs and inefficiencies, particularly when utilising long prompts. This paper proposes OverPrompt, an in-context learning method aimed at improving LLM efficiency and performance by processing multiple inputs in parallel. Evaluated across diverse datasets, OverPrompt enhances task efficiency and integrates a diverse range of examples for improved performance. Particularly, it amplifies fact-checking and sentiment analysis tasks when supplemented with contextual information. Synthetic data grouping further enhances performance, suggesting a viable approach for data augmentation.

Via

Access Paper or Ask Questions

Cone: Unsupervised Contrastive Opinion Extraction

May 08, 2023

Runcong Zhao, Lin Gui, Yulan He

Abstract:Contrastive opinion extraction aims to extract a structured summary or key points organised as positive and negative viewpoints towards a common aspect or topic. Most recent works for unsupervised key point extraction is largely built on sentence clustering or opinion summarisation based on the popularity of opinions expressed in text. However, these methods tend to generate aspect clusters with incoherent sentences, conflicting viewpoints, redundant aspects. To address these problems, we propose a novel unsupervised Contrastive OpinioN Extraction model, called Cone, which learns disentangled latent aspect and sentiment representations based on pseudo aspect and sentiment labels by combining contrastive learning with iterative aspect/sentiment clustering refinement. Apart from being able to extract contrastive opinions, it is also able to quantify the relative popularity of aspects and their associated sentiment distributions. The model has been evaluated on both a hotel review dataset and a Twitter dataset about COVID vaccines. The results show that despite using no label supervision or aspect-denoted seed words, Cone outperforms a number of competitive baselines on contrastive opinion extraction. The results of Cone can be used to offer a better recommendation of products and services online.

Via

Access Paper or Ask Questions

PANACEA: An Automated Misinformation Detection System on COVID-19

Feb 28, 2023

Runcong Zhao, Miguel Arana-Catania, Lixing Zhu, Elena Kochkina, Lin Gui, Arkaitz Zubiaga, Rob Procter, Maria Liakata, Yulan He

Abstract:In this demo, we introduce a web-based misinformation detection system PANACEA on COVID-19 related claims, which has two modules, fact-checking and rumour detection. Our fact-checking module, which is supported by novel natural language inference methods with a self-attention network, outperforms state-of-the-art approaches. It is also able to give automated veracity assessment and ranked supporting evidence with the stance towards the claim to be checked. In addition, PANACEA adapts the bi-directional graph convolutional networks model, which is able to detect rumours based on comment networks of related tweets, instead of relying on the knowledge base. This rumour detection module assists by warning the users in the early stages when a knowledge base may not be available.

Via

Access Paper or Ask Questions