Guangneng Hu

SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning

May 05, 2025

Jinpeng Chen, Runmin Cong, Yuzhi Zhao, Hongzheng Yang, Guangneng Hu, Horace Ho Shing Ip, Sam Kwong

Abstract: Multimodal Continual Instruction Tuning (MCIT) aims to enable Multimodal Large Language Models (MLLMs) to incrementally learn new tasks without catastrophic forgetting. In this paper, we explore forgetting in this context, categorizing it into superficial forgetting and essential forgetting. Superficial forgetting refers to cases where the model's knowledge may not be genuinely lost, but its responses to previous tasks deviate from expected formats due to the influence of subsequent tasks' answer styles, making the results unusable. By contrast, essential forgetting refers to situations where the model provides correctly formatted but factually inaccurate answers, indicating a true loss of knowledge. Assessing essential forgetting necessitates addressing superficial forgetting first, as severe superficial forgetting can obscure the model's knowledge state. Hence, we first introduce the Answer Style Diversification (ASD) paradigm, which defines a standardized process for transforming data styles across different tasks, unifying their training sets into similarly diversified styles to prevent superficial forgetting caused by style shifts. Building on this, we propose RegLoRA to mitigate essential forgetting. RegLoRA stabilizes key parameters where prior knowledge is primarily stored by applying regularization, enabling the model to retain existing competencies. Experimental results demonstrate that our overall method, SEFE, achieves state-of-the-art performance.
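The abstract describes RegLoRA only at a high level: regularization is applied to key parameters where prior knowledge is stored. As a rough illustration, and not the authors' implementation, the sketch below assumes that the "key parameters" are the largest-magnitude entries of a previously learned weight update and that the regularizer is a squared penalty on those positions of the new update. The function names `top_fraction_mask` and `masked_l2_penalty` and the `fraction` parameter are hypothetical.

```python
def top_fraction_mask(prev_delta, fraction):
    """Mark the largest-magnitude entries of a previous weight update.

    Assumption: "key parameters" means the top `fraction` of entries by
    absolute value. Ties at the threshold may select a few extra entries.
    """
    flat = [abs(v) for row in prev_delta for v in row]
    k = max(1, int(len(flat) * fraction))
    threshold = sorted(flat, reverse=True)[k - 1]
    return [[abs(v) >= threshold for v in row] for row in prev_delta]


def masked_l2_penalty(new_delta, mask):
    """Sum of squares of the new update at the masked (protected) positions."""
    return sum(
        v * v
        for row, mask_row in zip(new_delta, mask)
        for v, keep in zip(row, mask_row)
        if keep
    )


# Hypothetical 2x2 updates: one from an earlier task, one being learned now.
prev_update = [[0.1, -2.0], [0.5, 0.05]]
new_update = [[1.0, 3.0], [1.0, 1.0]]

mask = top_fraction_mask(prev_update, fraction=0.25)
penalty = masked_l2_penalty(new_update, mask)
```

In a training loop, a penalty of this kind would typically be scaled by a coefficient and added to the task loss, so that changes to the protected positions are discouraged while the remaining entries stay free to adapt.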

ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation

Dec 24, 2024

Mengyang Wu, Yuzhi Zhao, Jialun Cao, Mingjie Xu, Zhongming Jiang, Xuehui Wang, Qinbin Li, Guangneng Hu, Shengchao Qin, Chi-Wing Fu

Abstract: Controversial content inundates the Internet, infringing on various cultural norms and child protection standards. Traditional Image Content Moderation (ICM) models fall short in producing precise moderation decisions for diverse standards, while recent multimodal large language models (MLLMs), when applied to general rule-based ICM, often produce classification and explanation results that are inconsistent with human moderators. Aiming at flexible, explainable, and accurate ICM, we design a novel rule-based dataset generation pipeline, decomposing concise human-defined rules and leveraging well-designed multi-stage prompts to enrich short explicit image annotations. Our ICM-Instruct dataset includes detailed moderation explanations and moderation Q-A pairs. Built upon it, we create our ICM-Assistant model in the framework of rule-based ICM, making it readily applicable in real practice. Our ICM-Assistant model demonstrates exceptional performance and flexibility. Specifically, it significantly outperforms existing approaches on various sources, consistently improving both moderation classification (36.8% on average) and moderation explanation quality (26.6% on average) over existing MLLMs. Code/Data is available at https://github.com/zhaoyuzhi/ICM-Assistant.

* AAAI 2025

Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA

Dec 21, 2023

Chengen Lai, Shengli Song, Shiqi Meng, Jingyang Li, Sitong Yan, Guangneng Hu

Abstract: Natural language explanation in visual question answering (VQA-NLE) aims to explain the decision-making process of models by generating natural language sentences, increasing users' trust in black-box systems. Existing post-hoc methods have achieved significant progress in obtaining plausible explanations. However, such post-hoc explanations are not always aligned with human logical inference and suffer from three issues: 1) deductive unsatisfiability: the generated explanations do not logically lead to the answer; 2) factual inconsistency: the model falsifies its counterfactual explanation for answers without considering the facts in images; and 3) semantic perturbation insensitivity: the model cannot recognize the semantic changes caused by small perturbations. These problems reduce the faithfulness of explanations generated by models. To address the above issues, we propose a novel self-supervised Multi-level Contrastive Learning based natural language Explanation model (MCLE) for VQA with semantic-level, image-level, and instance-level factual and counterfactual samples. MCLE extracts discriminative features and aligns the feature spaces from explanations with visual question and answer to generate more consistent explanations. We conduct extensive experiments, ablation analysis, and a case study to demonstrate the effectiveness of our method on two VQA-NLE benchmarks.

* AAAI 2024

Dual Side Deep Context-aware Modulation for Social Recommendation

Mar 16, 2021

Bairan Fu, Wenming Zhang, Guangneng Hu, Xinyu Dai, Shujian Huang, Jiajun Chen

Abstract: Social recommendation is effective in improving recommendation performance by leveraging social relations from online social networking platforms. Social relations among users provide friends' information for modeling users' interest in candidate items and help expose items to potential consumers (i.e., item attraction). However, two issues have not been well studied. First, for user interests, existing methods typically aggregate friends' information contextualized on the candidate item only, and this shallow context-aware aggregation leaves them with limited friends' information. Second, for item attraction, if an item's past consumers are friends of, or have consumption habits similar to, the targeted user, the item may be more attractive to that user, but most existing methods neglect this relation-enhanced context-aware item attraction. To address these issues, we propose DICER (Dual Side Deep Context-aware Modulation for Social Recommendation). Specifically, we first propose a novel graph neural network to model the social relation and the collaborative relation, and on top of high-order relations, a dual side deep context-aware modulation is introduced to capture the friends' information and item attraction. Empirical results on two real-world datasets show the effectiveness of the proposed model, and further experiments are conducted to help understand how the dual context-aware modulation works.

* Accepted by the WWW 2021 Conference

TrNews: Heterogeneous User-Interest Transfer Learning for News Recommendation

Jan 27, 2021

Guangneng Hu, Qiang Yang

Abstract: We investigate how to solve cross-corpus news recommendation for future unseen users. This is a problem where traditional content-based recommendation techniques often fail. Luckily, in real-world recommendation services, some publishers (e.g., Daily news) may have accumulated a large corpus with many consumers, which can be used for a newly deployed publisher (e.g., Political news). To take advantage of the existing corpus, we propose a transfer learning model (dubbed TrNews) for news recommendation to transfer knowledge from a source corpus to a target corpus. To tackle the heterogeneity of different user interests and of different word distributions across corpora, we design a translator-based transfer-learning strategy to learn a representation mapping between source and target corpora. The learned translator can be used to generate representations for future unseen users. We show through experiments on real-world datasets that TrNews is better than various baselines in terms of four metrics. We also show that our translator is effective compared with existing transfer strategies.

* EACL 2021

PrivNet: Safeguarding Private Attributes in Transfer Learning for Recommendation

Oct 16, 2020

Guangneng Hu, Qiang Yang

Abstract: Transfer learning is an effective technique to improve a target recommender system with knowledge from a source domain. Existing research focuses on the recommendation performance of the target domain while ignoring the privacy leakage of the source domain. The transferred knowledge, however, may unintentionally leak private information of the source domain. For example, an attacker can accurately infer user demographics from the purchase histories provided by a source domain data owner. This paper addresses this privacy issue by learning a privacy-aware neural representation that improves target performance while protecting source privacy. The key idea is to simulate attacks during training, modeled as an adversarial game, to protect the privacy of future unseen users, so that the transfer learning model becomes robust to attacks. Experiments show that the proposed PrivNet model can successfully disentangle the knowledge that benefits the transfer from the knowledge that leaks privacy.

* Findings of EMNLP 2020

Personalized Neural Embeddings for Collaborative Filtering with Text

Mar 19, 2019

Guangneng Hu

Abstract: Collaborative filtering (CF) is a core technique for recommender systems. Traditional CF approaches exploit user-item relations (e.g., clicks, likes, and views) only and hence suffer from the data sparsity issue. Items are usually associated with unstructured text such as article abstracts and product reviews. We develop a Personalized Neural Embedding (PNE) framework to exploit both interactions and words seamlessly. We learn embeddings of users, items, and words jointly, and predict user preferences on items based on these learned representations. PNE estimates the probability that a user will like an item from two terms: behavior factors and semantic factors. On two real-world datasets, PNE shows better performance than four state-of-the-art baselines in terms of three metrics. We also show through visualization that PNE learns meaningful word embeddings.
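The abstract states that PNE scores a user-item pair from a behavior term and a semantic term but does not give the formula. The sketch below is one plausible reading rather than the paper's model: it assumes the behavior term is a dot product of user and item embeddings, the semantic term is a dot product of the user embedding with the average embedding of the item's words, and the two are summed and passed through a sigmoid. All function names and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def like_probability(user, item, word_vectors):
    """Combine a behavior score and a semantic score into a probability.

    Assumption: the terms are added and squashed with a sigmoid; the
    abstract does not specify how they are combined.
    """
    behavior = dot(user, item)
    semantic = dot(user, mean_vector(word_vectors))
    return 1.0 / (1.0 + math.exp(-(behavior + semantic)))


# Toy 2-dimensional embeddings for one user, one item, and the item's words.
user_vec = [1.0, 1.0]
item_vec = [1.0, 1.0]
item_words = [[1.0, 0.0], [1.0, 0.0]]

p_like = like_probability(user_vec, item_vec, item_words)
```

Here the behavior score is 2.0 and the semantic score is 1.0, so `p_like` is the sigmoid of 3.0.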

* NAACL 2019 short papers, oral presentation

Transfer Meets Hybrid: A Synthetic Approach for Cross-Domain Collaborative Filtering with Text

Jan 22, 2019

Guangneng Hu, Yu Zhang, Qiang Yang

Abstract: Collaborative filtering (CF) is the key technique for recommender systems (RSs). CF exploits user-item behavior interactions (e.g., clicks) only and hence suffers from the data sparsity issue. One research thread is to integrate auxiliary information such as product reviews and news titles, leading to hybrid filtering methods. Another thread is to transfer knowledge from other source domains, such as improving movie recommendation with knowledge from the book domain, leading to transfer learning methods. In real life, no single service can satisfy all of a user's information needs. This motivates us to exploit both auxiliary and source information for RSs in this paper. We propose a novel neural model to smoothly enable Transfer Meeting Hybrid (TMH) methods for cross-domain recommendation with unstructured text in an end-to-end manner. TMH attentively extracts useful content from unstructured text via a memory module and selectively transfers knowledge from a source domain via a transfer network. On two real-world datasets, TMH shows better performance than various baselines in terms of three ranking metrics. We conduct thorough analyses to understand how the text content and transferred knowledge help the proposed model.

* WWW 2019
* 11 pages, 7 figures, a full version for the WWW 2019 short paper

CoNet: Collaborative Cross Networks for Cross-Domain Recommendation

Apr 20, 2018

Guangneng Hu, Yu Zhang, Qiang Yang

Abstract: The cross-domain recommendation technique is an effective way of alleviating the data sparsity in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this paper, we propose a novel transfer learning approach for cross-domain recommendation by using neural networks as the base model. We assume that hidden layers in two base networks are connected by cross mappings, leading to the collaborative cross networks (CoNet). CoNet enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa. CoNet is achieved in multi-layer feedforward networks by adding dual connections and joint loss functions, which can be trained efficiently by back-propagation. The proposed model is evaluated on two real-world datasets and it outperforms baseline models by relative improvements of 3.56% in MRR and 8.94% in NDCG, respectively.
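To make the idea of cross connections between two feedforward networks concrete, here is a minimal sketch of a single layer, not the authors' code. It assumes each network's next hidden state is its own linear transform plus a linear map of the other network's hidden state, followed by ReLU, and that both directions share one cross matrix `H`. The sharing of `H`, the ReLU activation, and all names are assumptions for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in vector]


def cross_layer(h_a, h_b, W_a, W_b, H):
    """One layer of two base networks joined by dual cross connections.

    next_a = relu(W_a @ h_a + H @ h_b)
    next_b = relu(W_b @ h_b + H @ h_a)
    Assumption: the same cross matrix H is used in both directions.
    """
    own_a, from_b = matvec(W_a, h_a), matvec(H, h_b)
    own_b, from_a = matvec(W_b, h_b), matvec(H, h_a)
    next_a = relu([p + q for p, q in zip(own_a, from_b)])
    next_b = relu([p + q for p, q in zip(own_b, from_a)])
    return next_a, next_b


# Toy example: identity base weights and a scaled-identity cross matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]

next_a, next_b = cross_layer([1.0, 2.0], [3.0, -4.0], identity, identity, half)
```

Setting `H` to all zeros recovers two independent networks, which is a convenient check that the cross term is the only coupling between the domains in this sketch.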
