Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Cam-Van Thi Nguyen

A Dual-Module Denoising Approach with Curriculum Learning for Enhancing Multimodal Aspect-Based Sentiment Analysis

Dec 11, 2024

Nguyen Van Doan, Dat Tran Nguyen, Cam-Van Thi Nguyen

Figure 1 for A Dual-Module Denoising Approach with Curriculum Learning for Enhancing Multimodal Aspect-Based Sentiment Analysis

Figure 2 for A Dual-Module Denoising Approach with Curriculum Learning for Enhancing Multimodal Aspect-Based Sentiment Analysis

Figure 3 for A Dual-Module Denoising Approach with Curriculum Learning for Enhancing Multimodal Aspect-Based Sentiment Analysis

Figure 4 for A Dual-Module Denoising Approach with Curriculum Learning for Enhancing Multimodal Aspect-Based Sentiment Analysis

Abstract:Multimodal Aspect-Based Sentiment Analysis (MABSA) combines text and images to perform sentiment analysis but often struggles with irrelevant or misleading visual information. Existing methodologies typically address either sentence-image denoising or aspect-image denoising but fail to comprehensively tackle both types of noise. To address these limitations, we propose DualDe, a novel approach comprising two distinct components: the Hybrid Curriculum Denoising Module (HCD) and the Aspect-Enhance Denoising Module (AED). The HCD module enhances sentence-image denoising by incorporating a flexible curriculum learning strategy that prioritizes training on clean data. Concurrently, the AED module mitigates aspect-image noise through an aspect-guided attention mechanism that filters out noisy visual regions which unrelated to the specific aspects of interest. Our approach demonstrates effectiveness in addressing both sentence-image and aspect-image noise, as evidenced by experimental evaluations on benchmark datasets.

* Accepted at PACLIC 2024

Via

Access Paper or Ask Questions

Comparative Opinion Mining in Product Reviews: Multi-perspective Prompt-based Learning

Dec 11, 2024

Hai-Yen Thi Nguyen, Cam-Van Thi Nguyen

Figure 1 for Comparative Opinion Mining in Product Reviews: Multi-perspective Prompt-based Learning

Figure 2 for Comparative Opinion Mining in Product Reviews: Multi-perspective Prompt-based Learning

Figure 3 for Comparative Opinion Mining in Product Reviews: Multi-perspective Prompt-based Learning

Figure 4 for Comparative Opinion Mining in Product Reviews: Multi-perspective Prompt-based Learning

Abstract:Comparative reviews are pivotal in understanding consumer preferences and influencing purchasing decisions. Comparative Quintuple Extraction (COQE) aims to identify five key components in text: the target entity, compared entities, compared aspects, opinions on these aspects, and polarity. Extracting precise comparative information from product reviews is challenging due to nuanced language and sequential task errors in traditional methods. To mitigate these problems, we propose MTP-COQE, an end-to-end model designed for COQE. Leveraging multi-perspective prompt-based learning, MTP-COQE effectively guides the generative model in comparative opinion mining tasks. Evaluation on the Camera-COQE (English) and VCOM (Vietnamese) datasets demonstrates MTP-COQE's efficacy in automating COQE, achieving superior performance with a 1.41% higher F1 score than the previous baseline models on the English dataset. Additionally, we designed a strategy to limit the generative model's creativity to ensure the output meets expectations. We also performed data augmentation to address data imbalance and to prevent the model from becoming biased towards dominant samples.

Via

Access Paper or Ask Questions

TECO: Improving Multimodal Intent Recognition with Text Enhancement through Commonsense Knowledge Extraction

Dec 11, 2024

Quynh-Mai Thi Nguyen, Lan-Nhi Thi Nguyen, Cam-Van Thi Nguyen

Abstract:The objective of multimodal intent recognition (MIR) is to leverage various modalities-such as text, video, and audio-to detect user intentions, which is crucial for understanding human language and context in dialogue systems. Despite advances in this field, two main challenges persist: (1) effectively extracting and utilizing semantic information from robust textual features; (2) aligning and fusing non-verbal modalities with verbal ones effectively. This paper proposes a Text Enhancement with CommOnsense Knowledge Extractor (TECO) to address these challenges. We begin by extracting relations from both generated and retrieved knowledge to enrich the contextual information in the text modality. Subsequently, we align and integrate visual and acoustic representations with these enhanced text features to form a cohesive multimodal representation. Our experimental results show substantial improvements over existing baseline methods.

* Accepted at PACLIC 2024

Via

Access Paper or Ask Questions

Bundle Recommendation with Item-level Causation-enhanced Multi-view Learning

Aug 13, 2024

Huy-Son Nguyen, Tuan-Nghia Bui, Long-Hai Nguyen, Hoang Manh-Hung, Cam-Van Thi Nguyen, Hoang-Quynh Le, Duc-Trong Le

Abstract:Bundle recommendation aims to enhance business profitability and user convenience by suggesting a set of interconnected items. In real-world scenarios, leveraging the impact of asymmetric item affiliations is crucial for effective bundle modeling and understanding user preferences. To address this, we present BunCa, a novel bundle recommendation approach employing item-level causation-enhanced multi-view learning. BunCa provides comprehensive representations of users and bundles through two views: the Coherent View, leveraging the Multi-Prospect Causation Network for causation-sensitive relations among items, and the Cohesive View, employing LightGCN for information propagation among users and bundles. Modeling user preferences and bundle construction combined from both views ensures rigorous cohesion in direct user-bundle interactions through the Cohesive View and captures explicit intents through the Coherent View. Simultaneously, the integration of concrete and discrete contrastive learning optimizes the consistency and self-discrimination of multi-view representations. Extensive experiments with BunCa on three benchmark datasets demonstrate the effectiveness of this novel research and validate our hypothesis.

Via

Access Paper or Ask Questions

Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition

Mar 08, 2024

Cam-Van Thi Nguyen, Cao-Bach Nguyen, Quang-Thuy Ha, Duc-Trong Le

Figure 1 for Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition

Figure 2 for Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition

Figure 3 for Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition

Figure 4 for Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition

Abstract:Emotion recognition in conversation (ERC) is a crucial task in natural language processing and affective computing. This paper proposes MultiDAG+CL, a novel approach for Multimodal Emotion Recognition in Conversation (ERC) that employs Directed Acyclic Graph (DAG) to integrate textual, acoustic, and visual features within a unified framework. The model is enhanced by Curriculum Learning (CL) to address challenges related to emotional shifts and data imbalance. Curriculum learning facilitates the learning process by gradually presenting training samples in a meaningful order, thereby improving the model's performance in handling emotional variations and data imbalance. Experimental results on the IEMOCAP and MELD datasets demonstrate that the MultiDAG+CL models outperform baseline models. We release the code for MultiDAG+CL and experiments: https://github.com/vanntc711/MultiDAG-CL

* Accepted by LREC-COLING 2024

Via

Access Paper or Ask Questions

Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

Nov 08, 2023

Cam-Van Thi Nguyen, Anh-Tuan Mai, The-Son Le, Hai-Dang Kieu, Duc-Trong Le

Figure 1 for Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

Figure 2 for Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

Figure 3 for Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

Figure 4 for Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

Abstract:Emotion recognition is a crucial task for human conversation understanding. It becomes more challenging with the notion of multimodal data, e.g., language, voice, and facial expressions. As a typical solution, the global- and the local context information are exploited to predict the emotional label for every single sentence, i.e., utterance, in the dialogue. Specifically, the global representation could be captured via modeling of cross-modal interactions at the conversation level. The local one is often inferred using the temporal information of speakers or emotional shifts, which neglects vital factors at the utterance level. Additionally, most existing approaches take fused features of multiple modalities in an unified input without leveraging modality-specific representations. Motivating from these problems, we propose the Relational Temporal Graph Neural Network with Auxiliary Cross-Modality Interaction (CORECT), an novel neural network framework that effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies with the modality-specific manner for conversation understanding. Extensive experiments demonstrate the effectiveness of CORECT via its state-of-the-art results on the IEMOCAP and CMU-MOSEI datasets for the multimodal ERC task.

* Accepted at EMNLP 2023

Via

Access Paper or Ask Questions

Self-MI: Efficient Multimodal Fusion via Self-Supervised Multi-Task Learning with Auxiliary Mutual Information Maximization

Nov 07, 2023

Cam-Van Thi Nguyen, Ngoc-Hoa Thi Nguyen, Duc-Trong Le, Quang-Thuy Ha

Abstract:Multimodal representation learning poses significant challenges in capturing informative and distinct features from multiple modalities. Existing methods often struggle to exploit the unique characteristics of each modality due to unified multimodal annotations. In this study, we propose Self-MI in the self-supervised learning fashion, which also leverage Contrastive Predictive Coding (CPC) as an auxiliary technique to maximize the Mutual Information (MI) between unimodal input pairs and the multimodal fusion result with unimodal inputs. Moreover, we design a label generation module, $ULG_{MI}$ for short, that enables us to create meaningful and informative labels for each modality in a self-supervised manner. By maximizing the Mutual Information, we encourage better alignment between the multimodal fusion and the individual modalities, facilitating improved multimodal fusion. Extensive experiments on three benchmark datasets including CMU-MOSI, CMU-MOSEI, and SIMS, demonstrate the effectiveness of Self-MI in enhancing the multimodal fusion task.

* Accepted at The 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37)

Via

Access Paper or Ask Questions

Vietnamese multi-document summary using subgraph selection approach -- VLSP 2022 AbMuSu Shared Task

Jun 26, 2023

Huu-Thin Nguyen, Tam Doan Thanh, Cam-Van Thi Nguyen

Abstract:Document summarization is a task to generate afluent, condensed summary for a document, andkeep important information. A cluster of documents serves as the input for multi-document summarizing (MDS), while the cluster summary serves as the output. In this paper, we focus on transforming the extractive MDS problem into subgraph selection. Approaching the problem in the form of graphs helps to capture simultaneously the relationship between sentences in the same document and between sentences in the same cluster based on exploiting the overall graph structure and selected subgraphs. Experiments have been implemented on the Vietnamese dataset published in VLSP Evaluation Campaign 2022. This model currently results in the top 10 participating teams reported on the ROUGH-2 $F\_1$ measure on the public test set.

* arXiv admin note: text overlap with arXiv:2110.12645 by other authors

Via

Access Paper or Ask Questions