Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yajing Sun

Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach

Jan 19, 2025

Jingyuan Yang, Dapeng Chen, Yajing Sun, Rongjun Li, Zhiyong Feng, Wei Peng

Abstract:A Large Language Model (LLM) tends to generate inconsistent and sometimes contradictory outputs when presented with a prompt that has equivalent semantics but is expressed differently from the original prompt. To achieve semantic consistency of an LLM, one of the key approaches is to finetune the model with prompt-output pairs with semantically equivalent meanings. Despite its effectiveness, a data-driven finetuning method incurs substantial computation costs in data preparation and model optimization. In this regime, an LLM is treated as a ``black box'', restricting our ability to gain deeper insights into its internal mechanism. In this paper, we are motivated to enhance the semantic consistency of LLMs through a more interpretable method (i.e., model editing) to this end. We first identify the model components (i.e., attention heads) that have a key impact on the semantic consistency of an LLM. We subsequently inject biases into the output of these model components along the semantic-consistency activation direction. It is noteworthy that these modifications are cost-effective, without reliance on mass manipulations of the original model parameters. Through comprehensive experiments on the constructed NLU and open-source NLG datasets, our method demonstrates significant improvements in the semantic consistency and task performance of LLMs. Additionally, our method exhibits promising generalization capabilities by performing well on tasks beyond the primary tasks.

Via

Access Paper or Ask Questions

P/D-Serve: Serving Disaggregated Large Language Model at Scale

Aug 15, 2024

Yibo Jin, Tao Wang, Huimin Lin, Mingyang Song, Peiyang Li, Yipeng Ma, Yicheng Shan, Zhengfan Yuan, Cailong Li, Yajing Sun(+20 more)

Abstract:Serving disaggregated large language models (LLMs) over tens of thousands of xPU devices (GPUs or NPUs) with reliable performance faces multiple challenges. 1) Ignoring the diversity (various prefixes and tidal requests), treating all the prompts in a mixed pool is inadequate. To facilitate the similarity per scenario and minimize the inner mismatch on P/D (prefill and decoding) processing, fine-grained organization is required, dynamically adjusting P/D ratios for better performance. 2) Due to inaccurate estimation on workload (queue status or maintained connections), the global scheduler easily incurs unnecessary timeouts in prefill. 3) Block-fixed device-to-device (D2D) KVCache transfer over cluster-level RDMA (remote direct memory access) fails to achieve desired D2D utilization as expected. To overcome previous problems, this paper proposes an end-to-end system P/D-Serve, complying with the paradigm of MLOps (machine learning operations), which models end-to-end (E2E) P/D performance and enables: 1) fine-grained P/D organization, mapping the service with RoCE (RDMA over converged ethernet) as needed, to facilitate similar processing and dynamic adjustments on P/D ratios; 2) on-demand forwarding upon rejections for idle prefill, decoupling the scheduler from regular inaccurate reports and local queues, to avoid timeouts in prefill; and 3) efficient KVCache transfer via optimized D2D access. P/D-Serve is implemented upon Ascend and MindSpore, has been deployed over tens of thousands of NPUs for more than eight months in commercial use, and further achieves 60\%, 42\% and 46\% improvements on E2E throughput, time-to-first-token (TTFT) SLO (service level objective) and D2D transfer time. As the E2E system with optimizations, P/D-Serve achieves 6.7x increase on throughput, compared with aggregated LLMs.

Via

Access Paper or Ask Questions

Knowledge-augmented Frame Semantic Parsing with Hybrid Prompt-tuning

Mar 25, 2023

Rui Zhang, Yajing Sun, Jingyuan Yang, Wei Peng

Figure 1 for Knowledge-augmented Frame Semantic Parsing with Hybrid Prompt-tuning

Figure 2 for Knowledge-augmented Frame Semantic Parsing with Hybrid Prompt-tuning

Figure 3 for Knowledge-augmented Frame Semantic Parsing with Hybrid Prompt-tuning

Figure 4 for Knowledge-augmented Frame Semantic Parsing with Hybrid Prompt-tuning

Abstract:Frame semantics-based approaches have been widely used in semantic parsing tasks and have become mainstream. It remains challenging to disambiguate frame representations evoked by target lexical units under different contexts. Pre-trained Language Models (PLMs) have been used in semantic parsing and significantly improve the accuracy of neural parsers. However, the PLMs-based approaches tend to favor collocated patterns presented in the training data, leading to inaccurate outcomes. The intuition here is to design a mechanism to optimally use knowledge captured in semantic frames in conjunction with PLMs to disambiguate frames. We propose a novel Knowledge-Augmented Frame Semantic Parsing Architecture (KAF-SPA) to enhance semantic representation by incorporating accurate frame knowledge into PLMs during frame semantic parsing. Specifically, a Memory-based Knowledge Extraction Module (MKEM) is devised to select accurate frame knowledge and construct the continuous templates in the high dimensional vector space. Moreover, we design a Task-oriented Knowledge Probing Module (TKPM) using hybrid prompts (in terms of continuous and discrete prompts) to incorporate the selected knowledge into the PLMs and adapt PLMs to the tasks of frame and argument identification. Experimental results on two public FrameNet datasets demonstrate that our method significantly outperforms strong baselines (by more than +3$\%$ in F1), achieving state-of-art results on the current benchmark. Ablation studies verify the effectiveness of KAF-SPA.

* accepted by ICASSP 2023

Via

Access Paper or Ask Questions

Do You Know My Emotion? Emotion-Aware Strategy Recognition towards a Persuasive Dialogue System

Jun 24, 2022

Wei Peng, Yue Hu, Luxi Xing, Yuqiang Xie, Yajing Sun

Figure 1 for Do You Know My Emotion? Emotion-Aware Strategy Recognition towards a Persuasive Dialogue System

Figure 2 for Do You Know My Emotion? Emotion-Aware Strategy Recognition towards a Persuasive Dialogue System

Figure 3 for Do You Know My Emotion? Emotion-Aware Strategy Recognition towards a Persuasive Dialogue System

Figure 4 for Do You Know My Emotion? Emotion-Aware Strategy Recognition towards a Persuasive Dialogue System

Abstract:Persuasive strategy recognition task requires the system to recognize the adopted strategy of the persuader according to the conversation. However, previous methods mainly focus on the contextual information, little is known about incorporating the psychological feedback, i.e. emotion of the persuadee, to predict the strategy. In this paper, we propose a Cross-channel Feedback memOry Network (CFO-Net) to leverage the emotional feedback to iteratively measure the potential benefits of strategies and incorporate them into the contextual-aware dialogue information. Specifically, CFO-Net designs a feedback memory module, including strategy pool and feedback pool, to obtain emotion-aware strategy representation. The strategy pool aims to store historical strategies and the feedback pool is to obtain updated strategy weight based on feedback emotional information. Furthermore, a cross-channel fusion predictor is developed to make a mutual interaction between the emotion-aware strategy representation and the contextual-aware dialogue information for strategy recognition. Experimental results on \textsc{PersuasionForGood} confirm that the proposed model CFO-Net is effective to improve the performance on M-F1 from 61.74 to 65.41.

* Accepted by ECML-PKDD 2022

Via

Access Paper or Ask Questions

CogIntAc: Modeling the Relationships between Intention, Emotion and Action in Interactive Process from Cognitive Perspective

May 16, 2022

Wei Peng, Yue Hu, Yuqiang Xie, Luxi Xing, Yajing Sun

Figure 1 for CogIntAc: Modeling the Relationships between Intention, Emotion and Action in Interactive Process from Cognitive Perspective

Figure 2 for CogIntAc: Modeling the Relationships between Intention, Emotion and Action in Interactive Process from Cognitive Perspective

Figure 3 for CogIntAc: Modeling the Relationships between Intention, Emotion and Action in Interactive Process from Cognitive Perspective

Figure 4 for CogIntAc: Modeling the Relationships between Intention, Emotion and Action in Interactive Process from Cognitive Perspective

Abstract:Intention, emotion and action are important psychological factors in human activities, which play an important role in the interaction between individuals. How to model the interaction process between individuals by analyzing the relationship of their intentions, emotions, and actions at the cognitive level is challenging. In this paper, we propose a novel cognitive framework of individual interaction. The core of the framework is that individuals achieve interaction through external action driven by their inner intention. Based on this idea, the interactions between individuals can be constructed by establishing relationships between the intention, emotion and action. Furthermore, we conduct analysis on the interaction between individuals and give a reasonable explanation for the predicting results. To verify the effectiveness of the framework, we reconstruct a dataset and propose three tasks as well as the corresponding baseline models, including action abduction, emotion prediction and action generation. The novel framework shows an interesting perspective on mimicking the mental state of human beings in cognitive science.

* Accepted by IJCNN 2022

Via

Access Paper or Ask Questions

Control Globally, Understand Locally: A Global-to-Local Hierarchical Graph Network for Emotional Support Conversation

Apr 27, 2022

Wei Peng, Yue Hu, Luxi Xing, Yuqiang Xie, Yajing Sun, Yunpeng Li

Figure 1 for Control Globally, Understand Locally: A Global-to-Local Hierarchical Graph Network for Emotional Support Conversation

Figure 2 for Control Globally, Understand Locally: A Global-to-Local Hierarchical Graph Network for Emotional Support Conversation

Figure 3 for Control Globally, Understand Locally: A Global-to-Local Hierarchical Graph Network for Emotional Support Conversation

Figure 4 for Control Globally, Understand Locally: A Global-to-Local Hierarchical Graph Network for Emotional Support Conversation

Abstract:Emotional support conversation aims at reducing the emotional distress of the help-seeker, which is a new and challenging task. It requires the system to explore the cause of help-seeker's emotional distress and understand their psychological intention to provide supportive responses. However, existing methods mainly focus on the sequential contextual information, ignoring the hierarchical relationships with the global cause and local psychological intention behind conversations, thus leads to a weak ability of emotional support. In this paper, we propose a Global-to-Local Hierarchical Graph Network to capture the multi-source information (global cause, local intentions and dialog history) and model hierarchical relationships between them, which consists of a multi-source encoder, a hierarchical graph reasoner, and a global-guide decoder. Furthermore, a novel training objective is designed to monitor semantic information of the global cause. Experimental results on the emotional support conversation dataset, ESConv, confirm that the proposed GLHG has achieved the state-of-the-art performance on the automatic and human evaluations.

* Accepted by IJCAI,2022

Via

Access Paper or Ask Questions

Modeling Intention, Emotion and External World in Dialogue Systems

Feb 14, 2022

Wei Peng, Yue Hu, Luxi Xing, Yuqiang Xie, Xingsheng Zhang, Yajing Sun

Figure 1 for Modeling Intention, Emotion and External World in Dialogue Systems

Figure 2 for Modeling Intention, Emotion and External World in Dialogue Systems

Figure 3 for Modeling Intention, Emotion and External World in Dialogue Systems

Figure 4 for Modeling Intention, Emotion and External World in Dialogue Systems

Abstract:Intention, emotion and action are important elements in human activities. Modeling the interaction process between individuals by analyzing the relationships between these elements is a challenging task. However, previous work mainly focused on modeling intention and emotion independently, and neglected of exploring the mutual relationships between intention and emotion. In this paper, we propose a RelAtion Interaction Network (RAIN), consisting of Intention Relation Module and Emotion Relation Module, to jointly model mutual relationships and explicitly integrate historical intention information. The experiments on the dataset show that our model can take full advantage of the intention, emotion and action between individuals and achieve a remarkable improvement over BERT-style baselines. Qualitative analysis verifies the importance of the mutual interaction between the intention and emotion.

* Accepted by ICASSP 2022

Via

Access Paper or Ask Questions

Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation

Jul 16, 2021

Yajing Sun, Yue Hu, Luxi Xing, Yuqiang Xie, Xiangpeng Wei

Figure 1 for Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation

Figure 2 for Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation

Figure 3 for Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation

Figure 4 for Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation

Abstract:End-to-End intelligent neural dialogue systems suffer from the problems of generating inconsistent and repetitive responses. Existing dialogue models pay attention to unilaterally incorporating personal knowledge into the dialog while ignoring the fact that incorporating the personality-related conversation information into personal knowledge taken as the bilateral information flow boosts the quality of the subsequent conversation. Besides, it is indispensable to control personal knowledge utilization over the conversation level. In this paper, we propose a conversation-adaption multi-view persona aware response generation model that aims at enhancing conversation consistency and alleviating the repetition from two folds. First, we consider conversation consistency from multiple views. From the view of the persona profile, we design a novel interaction module that not only iteratively incorporates personalized knowledge into each turn conversation but also captures the personality-related information from conversation to enhance personalized knowledge semantic representation. From the view of speaking style, we introduce the speaking style vector and feed it into the decoder to keep the speaking style consistency. To avoid conversation repetition, we devise a coverage mechanism to keep track of the activation of personal knowledge utilization. Experiments on both automatic and human evaluation verify the superiority of our model over previous models.

Via

Access Paper or Ask Questions

MCR-Net: A Multi-Step Co-Interactive Relation Network for Unanswerable Questions on Machine Reading Comprehension

Mar 08, 2021

Wei Peng, Yue Hu, Jing Yu, Luxi Xing, Yuqiang Xie, Zihao Zhu, Yajing Sun

Figure 1 for MCR-Net: A Multi-Step Co-Interactive Relation Network for Unanswerable Questions on Machine Reading Comprehension

Figure 2 for MCR-Net: A Multi-Step Co-Interactive Relation Network for Unanswerable Questions on Machine Reading Comprehension

Figure 3 for MCR-Net: A Multi-Step Co-Interactive Relation Network for Unanswerable Questions on Machine Reading Comprehension

Figure 4 for MCR-Net: A Multi-Step Co-Interactive Relation Network for Unanswerable Questions on Machine Reading Comprehension

Abstract:Question answering systems usually use keyword searches to retrieve potential passages related to a question, and then extract the answer from passages with the machine reading comprehension methods. However, many questions tend to be unanswerable in the real world. In this case, it is significant and challenging how the model determines when no answer is supported by the passage and abstains from answering. Most of the existing systems design a simple classifier to determine answerability implicitly without explicitly modeling mutual interaction and relation between the question and passage, leading to the poor performance for determining the unanswerable questions. To tackle this problem, we propose a Multi-Step Co-Interactive Relation Network (MCR-Net) to explicitly model the mutual interaction and locate key clues from coarse to fine by introducing a co-interactive relation module. The co-interactive relation module contains a stack of interaction and fusion blocks to continuously integrate and fuse history-guided and current-query-guided clues in an explicit way. Experiments on the SQuAD 2.0 and DuReader datasets show that our model achieves a remarkable improvement, outperforming the BERT-style baselines in literature. Visualization analysis also verifies the importance of the mutual interaction between the question and passage.

* Accepted to ICASSP 2021

Via

Access Paper or Ask Questions

Bi-directional Cognitive Thinking Network for Machine Reading Comprehension

Oct 20, 2020

Wei Peng, Yue Hu, Luxi Xing, Yuqiang Xie, Jing Yu, Yajing Sun, Xiangpeng Wei

Figure 1 for Bi-directional Cognitive Thinking Network for Machine Reading Comprehension

Figure 2 for Bi-directional Cognitive Thinking Network for Machine Reading Comprehension

Figure 3 for Bi-directional Cognitive Thinking Network for Machine Reading Comprehension

Figure 4 for Bi-directional Cognitive Thinking Network for Machine Reading Comprehension

Abstract:We propose a novel Bi-directional Cognitive Knowledge Framework (BCKF) for reading comprehension from the perspective of complementary learning systems theory. It aims to simulate two ways of thinking in the brain to answer questions, including reverse thinking and inertial thinking. To validate the effectiveness of our framework, we design a corresponding Bi-directional Cognitive Thinking Network (BCTN) to encode the passage and generate a question (answer) given an answer (question) and decouple the bi-directional knowledge. The model has the ability to reverse reasoning questions which can assist inertial thinking to generate more accurate answers. Competitive improvement is observed in DuReader dataset, confirming our hypothesis that bi-directional knowledge helps the QA task. The novel framework shows an interesting perspective on machine reading comprehension and cognitive science.

* Accepted to COLING 2020

Via

Access Paper or Ask Questions