Donghong Han

SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture

Oct 10, 2024

Jiayi Han, Liang Du, Hongwei Du, Xiangguo Zhou, Yiwen Wu, Weibo Zheng, Donghong Han

Abstract: Although many efforts have been made, it remains challenging to balance the training budget, downstream performance, and the general capabilities of LLMs in many applications. Training the whole model for downstream tasks is expensive and can easily result in catastrophic forgetting. Parameter-efficient fine-tuning (PEFT) reduces the training cost, but it still suffers from forgetting and limits learning on the downstream tasks. To efficiently fine-tune LLMs with fewer limitations on their downstream performance while mitigating the forgetting of general capabilities, we propose a novel mixture-of-experts (MoE) framework based on Soft LoRA and Identity Mixture (SLIM), which allows dynamic routing between LoRA adapters and skip connections and thereby suppresses forgetting. We adopt weight yielding with sliding clustering for better out-of-domain discrimination to enhance the routing. We also propose to convert the mixture of low-rank adapters to the model merging formulation and introduce fast dynamic merging of LoRA adapters to keep the general capabilities of the base model. Extensive experiments demonstrate that the proposed SLIM is comparable to the state-of-the-art PEFT approaches on the downstream tasks while achieving the leading performance in mitigating catastrophic forgetting.

* 11 pages, 6 figures, 4 tables
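
The routing idea in the SLIM abstract (a soft mixture over LoRA adapters plus an identity path) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the assumption that the identity expert contributes no update to the frozen layer's output, and the plain softmax router are all hypothetical choices made for illustration, and the weight yielding and sliding clustering steps are not modeled.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def soft_lora_identity_mixture(x, W, adapters, router_logits):
    """Hypothetical sketch: frozen layer output plus softly weighted LoRA updates.

    x             : input vector of length d_in
    W             : frozen base weight, d_out x d_in
    adapters      : list of (A, B) pairs; A is r x d_in, B is d_out x r
    router_logits : len(adapters) + 1 scores; the LAST score belongs to the
                    identity expert, assumed here to add no update at all
    """
    if len(router_logits) != len(adapters) + 1:
        raise ValueError("need one logit per adapter plus one for identity")
    weights = softmax(router_logits)
    out = matvec(W, x)  # output of the frozen base layer
    # zip stops after the adapters, so the identity weight adds nothing
    for (A, B), w in zip(adapters, weights):
        delta = matvec(B, matvec(A, x))  # low-rank update B(Ax)
        out = [o + w * d for o, d in zip(out, delta)]
    return out, weights
```

In this sketch, the more routing mass the identity expert receives, the closer the output stays to the frozen base layer, which is one simple way to picture how routing could limit forgetting.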

CAB: Empathetic Dialogue Generation with Cognition, Affection and Behavior

Feb 03, 2023

Pan Gao, Donghong Han, Rui Zhou, Xuejiao Zhang, Zikun Wang

Abstract: Empathy is an important characteristic to consider when building a more intelligent and humanized dialogue agent. However, existing methods do not fully treat empathy as a complex process involving three aspects: cognition, affection and behavior. In this paper, we propose CAB, a novel framework that takes a comprehensive perspective of cognition, affection and behavior to generate empathetic responses. For cognition, we build paths between critical keywords in the dialogue by leveraging external knowledge, because keywords are the core of the sentences in a dialogue. Building the logical relationships between keywords, which the majority of existing works overlook, can improve the understanding of keywords and contextual logic, thus enhancing the cognitive ability. For affection, we capture the emotional dependencies with dual latent variables that contain both interlocutors' emotions, because considering both interlocutors' emotions simultaneously helps to learn the emotional dependencies. For behavior, we use appropriate dialogue acts to guide the dialogue generation and enhance the expression of empathy. Extensive experiments demonstrate that our multi-perspective model outperforms the state-of-the-art models in both automatic and manual evaluation.

* accepted as a short paper at DASFAA 2023

Fine-Grained Emotion Classification of Chinese Microblogs Based on Graph Convolution Networks

Dec 05, 2019

Yuni Lai, Linfeng Zhang, Donghong Han, Rui Zhou, Guoren Wang

Abstract: Microblogs are widely used to express people's opinions and feelings in daily life. Sentiment analysis (SA) can detect personal sentiment polarities in a timely manner by analyzing text. Deep learning approaches have been broadly used in SA but still have not fully exploited syntax information. In this paper, we propose a syntax-based graph convolution network (GCN) model to enhance the understanding of diverse grammatical structures of Chinese microblogs. In addition, a pooling method based on percentile is proposed to improve the accuracy of the model. In experiments on Chinese microblog emotion classification, with categories including happiness, sadness, like, anger, disgust, fear, and surprise, the F-measure of our model reaches 82.32% and exceeds the state-of-the-art algorithm by 5.90%. The experimental results show that our model can effectively utilize the information of dependency parsing to improve the performance of emotion detection. Moreover, we annotate a new dataset for Chinese emotion classification, which is open to other researchers.

* 20 pages, 6 figures, submitted to the World Wide Web Journal
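
The abstract mentions a percentile-based pooling method without defining it. As a rough illustration only, the sketch below pools a set of node feature vectors by taking the p-th percentile of each feature dimension, so that p = 100 reduces to max pooling and p = 50 to median pooling. The function names, the linear interpolation rule, and the per-dimension pooling are assumptions, not the paper's definition.

```python
def percentile(values, p):
    """p-th percentile (0 <= p <= 100) with linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0  # fractional rank
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

def percentile_pool(node_features, p):
    """Hypothetical pooling: one percentile per feature dimension.

    node_features : list of equal-length feature vectors, one per graph node
    p             : percentile to keep; 100 gives max pooling, 0 gives min
    """
    return [percentile(list(column), p) for column in zip(*node_features)]

features = [[1, 10], [2, 20], [3, 30], [4, 40]]
pooled_max = percentile_pool(features, 100)    # same as max pooling
pooled_median = percentile_pool(features, 50)  # same as median pooling
```

Choosing p below 100 makes the pooled vector less sensitive to a single extreme node activation than max pooling, which is one plausible motivation for a percentile-based variant.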
