Jiangnan Li

Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings

Jun 10, 2025

Liyan Xu, Zhenlin Su, Mo Yu, Jiangnan Li, Fandong Meng, Jie Zhou

Abstract: This work focuses on an observed limitation of text encoders: embeddings may not be able to recognize fine-grained entities or events within the semantics, resulting in failed dense retrieval even on simple cases. To examine such behaviors, we first introduce a new evaluation dataset in Chinese, named CapRetrieval, whose passages are image captions and whose queries are phrases inquiring about entities or events in various forms. Zero-shot evaluation suggests that encoders may fail at such fine-grained matching, regardless of training sources or model sizes. Aiming for enhancement, we proceed to finetune encoders with our proposed data generation strategies, which achieve the best performance on CapRetrieval. Within this process, we further identify an issue of granularity dilemma: a challenge for embeddings to express fine-grained salience while aligning with overall semantics. Our dataset, code and models in this work are publicly released at https://github.com/lxucs/CapRetrieval.
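Dense retrieval ranks passages by the similarity between a query embedding and each passage embedding. The toy sketch below is a generic illustration, not the CapRetrieval code: the 3-dimensional vectors are hand-written stand-ins for real encoder outputs, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted by descending cosine similarity to the query."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical 3-d embeddings; dimensions loosely mean [overall scene, dog, bicycle].
query = [0.1, 1.0, 0.0]        # query phrase asking about a dog
passages = [
    [1.0, 0.3, 0.0],           # caption of a park scene that mentions a dog in passing
    [0.2, 0.0, 1.0],           # caption of a cyclist on a road
]
print(rank_passages(query, passages))         # [0, 1]
print(round(cosine(query, passages[0]), 2))   # 0.38
```

In this toy case the relevant caption still ranks first, but its similarity is only about 0.38 because the queried entity is a minor component of an embedding dominated by overall scene semantics. With more distractors, such a weak fine-grained signal is easy to lose, which is the kind of failure the abstract describes.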

CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment

Mar 31, 2025

Jiangnan Li, Thuy-Trang Vu, Christian Herold, Amirhossein Tebbifakhr, Shahram Khadivi, Gholamreza Haffari

Abstract: Naive joint training of large language models (LLMs) for multilingual preference alignment can suffer from negative interference. This is a known issue in multilingual training, where conflicting objectives degrade overall performance. However, the impact of this phenomenon in the context of multilingual preference alignment remains largely underexplored. To address this issue, we propose CONGRAD, a scalable and effective filtering method that selects high-quality preference samples with minimal gradient conflicts across languages. Our method leverages gradient surgery to retain samples aligned with an aggregated multilingual update direction. Additionally, we incorporate a sublinear gradient compression strategy that reduces memory overhead during gradient accumulation. We integrate CONGRAD into a self-rewarding framework and evaluate it on LLaMA3-8B and Gemma2-2B across 10 languages. Results show that CONGRAD consistently outperforms strong baselines in both seen and unseen languages, with minimal alignment tax.
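The filtering idea, keeping samples whose gradients agree with an aggregated multilingual direction, can be sketched roughly as follows. This is a simplified illustration under assumptions, not the CONGRAD implementation: per-language gradients are plain vectors, the aggregated direction is built with PCGrad-style projection (a known gradient-surgery technique, here with a fixed rather than random ordering), "agreement" means a non-negative dot product, and the sublinear compression step is omitted. All names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcgrad_direction(lang_grads):
    """Combine per-language gradients with PCGrad-style surgery: whenever g_i
    conflicts with g_j (negative dot product), subtract g_i's projection onto g_j."""
    projected = []
    for i, g in enumerate(lang_grads):
        g = list(g)
        for j, other in enumerate(lang_grads):
            if i == j:
                continue
            d = dot(g, other)
            if d < 0:
                scale = d / dot(other, other)
                g = [a - scale * b for a, b in zip(g, other)]
        projected.append(g)
    # Sum the de-conflicted gradients into one aggregated update direction.
    return [sum(col) for col in zip(*projected)]

def filter_samples(samples, direction):
    """Keep sample ids whose gradient does not conflict with the aggregated direction."""
    return [sid for sid, g in samples if dot(g, direction) >= 0]

# Hypothetical 2-d gradients for two languages that conflict with each other.
lang_grads = [[1.0, 0.0], [-1.0, 1.0]]
direction = pcgrad_direction(lang_grads)
print(direction)                                   # [0.5, 1.5]
samples = [("a", [1.0, 1.0]), ("b", [-1.0, -1.0]), ("c", [1.0, -0.2])]
print(filter_samples(samples, direction))          # ['a', 'c']
```

Sample "b" points against the combined direction and is dropped, while "c" only partly agrees but still has a positive projection and is kept.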

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Feb 13, 2025

Mo Yu, Lemao Liu, Junjie Wu, Tsz Ting Chung, Shunchi Zhang, Jiangnan Li, Dit-Yan Yeung, Jie Zhou

Abstract: In a systematic way, we investigate a widely asked question, "Do LLMs really understand what they say?", which relates to the more familiar term "Stochastic Parrot". To this end, we propose a summative assessment over a carefully designed physical concept understanding task, PhysiCo. Our task alleviates the memorization issue by using grid-format inputs that abstractly describe physical phenomena. The grids represent varying levels of understanding, from the core phenomenon and application examples to analogies to other abstract patterns in the grid world. A comprehensive study on our task demonstrates that: (1) state-of-the-art LLMs, including GPT-4o, o1 and Gemini 2.0 flash thinking, lag behind humans by ~40%; (2) the stochastic parrot phenomenon is present in LLMs, as they fail on our grid task but can describe and recognize the same concepts well in natural language; (3) our task challenges the LLMs due to intrinsic difficulties rather than the unfamiliar grid format, as in-context learning and fine-tuning on data in the same format added little to their performance.

* NAACL 2025 Main Conference. First 5 authors contributed equally. Project page: https://physico-benchmark.github.io/

The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters

Jan 03, 2025

Chulun Zhou, Qiujing Wang, Mo Yu, Xiaoqian Yue, Rui Lu, Jiangnan Li, Yifan Zhou, Shunchi Zhang, Jie Zhou, Wai Lam

Abstract: Theory-of-Mind (ToM) is a fundamental psychological capability that allows humans to understand and interpret the mental states of others. Humans infer others' thoughts by integrating causal cues and indirect clues from broad contextual information, often derived from past interactions. In other words, human ToM heavily relies on understanding the backgrounds and life stories of others. Unfortunately, this aspect is largely overlooked in existing benchmarks for evaluating machines' ToM capabilities, because they use short narratives without global backgrounds. In this paper, we verify the importance of understanding long personal backgrounds in ToM and assess the performance of LLMs in such realistic evaluation scenarios. To achieve this, we introduce a novel benchmark, CharToM-QA, comprising 1,035 ToM questions based on characters from classic novels. Our human study reveals a significant disparity in performance: the same group of educated participants performs dramatically better when they have read the novels than when they have not. In parallel, our experiments on state-of-the-art LLMs, including the very recent o1 model, show that LLMs still perform notably worse than humans, despite having seen these stories during pre-training. This highlights the limitations of current LLMs in capturing the nuanced contextual information required for ToM reasoning.

* 17 pages, under review

NeuralMAG: Fast and Generalizable Micromagnetic Simulation with Deep Neural Nets

Oct 19, 2024

Yunqi Cai, Jiangnan Li, Dong Wang

Abstract: Micromagnetics has made significant strides, particularly due to its wide-ranging applications in magnetic storage design. Numerical simulation is a cornerstone of micromagnetics research, relying on first-principle rules to compute the dynamic evolution of micromagnetic systems based on the renowned LLG equation, named after Landau, Lifshitz, and Gilbert. However, simulations are often hindered by their slow speed. Although Fast Fourier Transform (FFT) calculations reduce the computational complexity to O(N log N), they remain impractical for large-scale simulations. In this paper, we introduce NeuralMAG, a deep learning approach to micromagnetic simulation. Our approach follows the LLG iterative framework but accelerates demagnetizing field computation by employing a U-shaped neural network (Unet). The Unet architecture comprises an encoder that extracts aggregated spins at various scales and learns the local interaction at each scale, followed by a decoder that accumulates the local interactions at different scales to approximate the global convolution. This divide-and-accumulate scheme achieves a time complexity of O(N), significantly enhancing the speed and feasibility of large-scale simulations. Unlike existing neural methods, NeuralMAG concentrates on the core computation rather than an end-to-end approximation for a specific task, making it inherently generalizable. To validate the new approach, we trained a single model and evaluated it on two micromagnetics tasks with various sample sizes, shapes, and material settings.
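For orientation, the textbook Gilbert form of the LLG equation for the unit magnetization m is shown below, with gyromagnetic ratio γ, damping constant α, and saturation magnetization M_s (sign and unit conventions vary between references). The demagnetizing field is one contribution to H_eff and, on a grid of N cells, is a discrete convolution with a translation-invariant demagnetizing tensor. Evaluating that sum directly for every cell costs O(N²); because it is a convolution, FFT reduces it to O(N log N), and this is the step NeuralMAG replaces with its Unet. This is a standard formulation, not necessarily the exact discretization used in the paper.

```latex
% Gilbert form of the LLG equation for the unit magnetization m(r, t)
\frac{\partial \mathbf{m}}{\partial t}
  = -\gamma\, \mathbf{m} \times \mathbf{H}_{\mathrm{eff}}
    + \alpha\, \mathbf{m} \times \frac{\partial \mathbf{m}}{\partial t}

% Demagnetizing field at cell i on a grid of N cells: a discrete convolution
% of the magnetization with the demagnetizing tensor \mathbf{N}
\mathbf{H}_{\mathrm{d},i} = -M_s \sum_{j=1}^{N} \mathbf{N}_{i-j}\, \mathbf{m}_j
```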

Think out Loud: Emotion Deducing Explanation in Dialogues

Jun 07, 2024

Jiangnan Li, Zheng Lin, Lanrui Wang, Qingyi Si, Yanan Cao, Mo Yu, Peng Fu, Weiping Wang, Jie Zhou

Abstract: Humans convey emotions through daily dialogues, making emotion understanding a crucial step toward affective intelligence. To understand emotions in dialogues, machines are asked to recognize the emotion of an utterance (Emotion Recognition in Dialogues, ERD) and then, based on that emotion, find the utterances that cause it (Emotion Cause Extraction in Dialogues, ECED). This setting requires performing ERD first and ECED second, ignoring the mutual complementarity between emotion and cause. To fix this, new tasks have been proposed to extract them simultaneously. Although current research on these tasks has achieved excellent results, simply identifying emotion-related factors through classification modeling fails to capture, in an explainable way, the specific thinking process by which causes stimulate the emotion. This thinking process, which is especially reflected in the reasoning ability of Large Language Models (LLMs), is under-explored. To this end, we propose a new task, "Emotion Deducing Explanation in Dialogues" (EDEN). EDEN recognizes emotions and causes through explicit thinking: models need to generate an explanation text that first summarizes the causes, then analyzes the inner activities of the speakers triggered by the causes using common sense, and finally guesses the emotion accordingly. To support the study of EDEN, we construct two EDEN datasets through human effort, based on existing ECED resources. We further evaluate different models on EDEN and find that LLMs are more competent than conventional PLMs. Besides, EDEN can help LLMs achieve better recognition of emotions and causes, which opens a new research direction of explainable emotion understanding in dialogues.

Graph Representation of Narrative Context: Coherence Dependency via Retrospective Questions

Feb 21, 2024

Liyan Xu, Jiangnan Li, Mo Yu, Jie Zhou

Abstract: This work introduces a novel and practical paradigm for narrative comprehension, stemming from the observation that individual passages within narratives are often cohesively related rather than isolated. We therefore propose to formulate a graph over narratives, dubbed NARCO, that depicts a task-agnostic coherence dependency of the entire context. In particular, edges in NARCO encompass retrospective free-form questions between two context snippets reflecting high-level coherent relations, inspired by the cognitive perception of humans who constantly reinstate relevant events from prior context. Importantly, our graph is instantiated through our designed two-stage LLM prompting, and thus does not rely on human annotations. We present three unique studies on its practical utility, examining edge efficacy via recap identification, local context augmentation via plot retrieval, and broader applications exemplified by long document QA. Experiments suggest that our approaches leveraging NARCO yield performance boosts across all three tasks.
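One minimal way to hold such a graph in memory, assuming snippets are indexed by narrative position and each edge points from a later snippet back to an earlier one, labeled with the retrospective question that links them. The class and method names are hypothetical, and the two-stage LLM prompting that would produce the questions is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class NarrativeGraph:
    """Hypothetical container for a NARCO-style coherence graph."""
    snippets: list[str]
    # edges[target] -> list of (source, question), with source earlier than target
    edges: dict[int, list[tuple[int, str]]] = field(default_factory=dict)

    def add_edge(self, target: int, source: int, question: str) -> None:
        """Link a later snippet back to an earlier one via a retrospective question."""
        if not 0 <= source < target < len(self.snippets):
            raise ValueError("edges must point from a later snippet to an earlier one")
        self.edges.setdefault(target, []).append((source, question))

    def context_for(self, target: int) -> list[str]:
        """Earlier snippets linked to `target`, in narrative order."""
        return [self.snippets[s] for s, _ in sorted(self.edges.get(target, []))]

g = NarrativeGraph(["The heir is born.", "A storm sinks the ship.", "The heir returns home."])
g.add_edge(2, 0, "Who is the person returning home?")
print(g.context_for(2))   # ['The heir is born.']
```

A lookup like `context_for` is one plausible way to gather linked prior snippets for uses such as the local context augmentation mentioned above.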

Previously on the Stories: Recap Snippet Identification for Story Reading

Feb 11, 2024

Jiangnan Li, Qiujing Wang, Liyan Xu, Wenjie Pang, Mo Yu, Zheng Lin, Weiping Wang, Jie Zhou

Abstract: Similar to the "previously-on" scenes in TV shows, recaps can help book reading by refreshing readers' memory of important elements in previous texts, so that they better understand the ongoing plot. Despite its usefulness, this application has not been well studied in the NLP community. We propose the first benchmark on this useful task, called Recap Snippet Identification, with a hand-crafted evaluation dataset. Our experiments show that the proposed task is challenging for PLMs, LLMs, and the proposed methods, as it requires a deep understanding of the plot correlation between snippets.

SIG: Speaker Identification in Literature via Prompt-Based Generation

Dec 22, 2023

Zhenlin Su, Liyan Xu, Jin Xu, Jiangnan Li, Mingdu Huangfu

Abstract: Identifying speakers of quotations in narratives is an important task in literary analysis, with challenging scenarios including out-of-domain inference for unseen speakers, and non-explicit cases where there are no speaker mentions in the surrounding context. In this work, we propose SIG, a simple and effective generation-based approach that verbalizes the task and quotation input based on designed prompt templates, which also enables easy integration of other auxiliary tasks that further bolster speaker identification performance. The prediction can either come from direct generation by the model, or be determined by the highest generation probability of each speaker candidate. Based on this design, SIG supports out-of-domain evaluation and achieves an open-world classification paradigm that can accept any form of candidate input. We perform both cross-domain and in-domain evaluation on PDNC, the largest dataset for this task, where empirical results suggest that SIG outperforms previous baselines with complicated designs, as well as zero-shot ChatGPT, especially excelling in the hard non-explicit scenarios with up to 17% improvement. Additional experiments on another dataset, WP, further corroborate the efficacy of SIG.

* Accepted to AAAI 2024
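The second prediction mode, choosing the candidate with the highest generation probability, amounts to scoring each candidate's tokens under the model and taking the argmax. The sketch below assumes a hypothetical `token_logprob` callback in place of a real generation model and sums token log-probabilities; the abstract does not state SIG's exact scoring or length normalization, so treat this as one plausible variant.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer: log p(token | prompt, previously generated tokens).
TokenLogProb = Callable[[str, List[str], str], float]

def candidate_logprob(prompt: str, candidate: Sequence[str], token_logprob: TokenLogProb) -> float:
    """Sum the token log-probabilities of generating `candidate` after `prompt`."""
    prefix: List[str] = []
    total = 0.0
    for tok in candidate:
        total += token_logprob(prompt, prefix, tok)
        prefix.append(tok)
    return total

def pick_speaker(prompt: str, candidates: Sequence[str], token_logprob: TokenLogProb) -> str:
    """Return the candidate whose name the model is most likely to generate."""
    return max(candidates, key=lambda c: candidate_logprob(prompt, c.split(), token_logprob))

# Toy stand-in for a real model: tokens that appear in the prompt are likely.
def toy_logprob(prompt: str, prefix: List[str], tok: str) -> float:
    return math.log(0.9) if tok in prompt.split() else math.log(0.1)

prompt = "Quote: 'I must go.' Context: Darcy rose from his chair. Who said it?"
print(pick_speaker(prompt, ["Elizabeth Bennet", "Mr Darcy"], toy_logprob))   # Mr Darcy
```

Because any string can be scored this way, the candidate list is open-ended. A plain sum does favor shorter names, so dividing by token count is a common alternative when candidates differ in length.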

Enhancing Empathetic and Emotion Support Dialogue Generation with Prophetic Commonsense Inference

Nov 26, 2023

Lanrui Wang, Jiangnan Li, Chenxu Yang, Zheng Lin, Weiping Wang

Abstract: Public interest in Empathetic and Emotional Support conversations has significantly increased. To offer more sensitive and understanding responses, leveraging commonsense knowledge has become a common strategy to better understand psychological aspects and causality. However, such commonsense inferences can be out of context and unable to predict upcoming dialogue themes, resulting in responses that lack coherence and empathy. To remedy this issue, we present Prophetic Commonsense Inference, an innovative paradigm for inferring commonsense knowledge. By harnessing the capabilities of Large Language Models in understanding dialogue and making commonsense deductions, we train tunable models to bridge the gap between past and potential future dialogues. Extensive experiments conducted on EmpatheticDialogues and Emotion Support Conversation show that equipping dialogue agents with our proposed prophetic commonsense inference significantly enhances the quality of their responses.
