Dushyant Singh Chauhan

Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation

Feb 11, 2025

Palaash Goel, Dushyant Singh Chauhan, Md Shad Akhtar

Abstract: Sarcasm is a linguistic phenomenon that intends to ridicule a target (e.g., entity, event, or person) in an inherent way. Multimodal Sarcasm Explanation (MuSE) aims at revealing the intended irony in a sarcastic post using a natural language explanation. Though important, existing systems have overlooked the significance of the target of sarcasm in generating explanations. In this paper, we propose a Target-aUgmented shaRed fusion-Based sarcasm explanatiOn model, a.k.a. TURBO. We design a novel shared-fusion mechanism to leverage the inter-modality relationships between an image and its caption. TURBO assumes the target of the sarcasm and guides the multimodal shared fusion mechanism in learning the intricacies of the intended irony for explanations. We evaluate our proposed TURBO model on the MORE+ dataset. Comparison against multiple baselines and state-of-the-art models shows that TURBO improves performance by an average margin of $+3.3\%$. Moreover, we explore LLMs in zero-shot and one-shot settings for our task and observe that LLM-generated explanations, though remarkable, often fail to capture the critical nuances of the sarcasm. Furthermore, we supplement our study with extensive human evaluation of TURBO's generated explanations and find them to be comparatively better than those of other systems.

M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations

Aug 03, 2021

Dushyant Singh Chauhan, Gopendra Vikram Singh, Navonil Majumder, Amir Zadeh, Asif Ekbal, Pushpak Bhattacharyya, Louis-Philippe Morency, Soujanya Poria

Abstract: Humor recognition in conversations is a challenging task that has recently gained popularity due to its importance in dialogue understanding, including in multimodal settings (i.e., text, acoustic, and visual). The few existing datasets for humor are mostly in English. However, due to the tremendous growth in multilingual content, there is a great demand to build models and systems that support multilingual information access. To this end, we propose a dataset for Multimodal Multiparty Hindi Humor (M2H2) recognition in conversations containing 6,191 utterances from 13 episodes of a very popular TV series, "Shrimaan Shrimati Phir Se". Each utterance is annotated with humor/non-humor labels and encompasses acoustic, visual, and textual modalities. We propose several strong multimodal baselines and show the importance of contextual and multimodal information for humor recognition in conversations. The empirical results on the M2H2 dataset demonstrate that multimodal information complements unimodal information for humor recognition. The dataset and the baselines are available at http://www.iitp.ac.in/~ai-nlp-ml/resources.html and https://github.com/declare-lab/M2H2-dataset.

* ICMI 2021

Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis

May 14, 2019

Md Shad Akhtar, Dushyant Singh Chauhan, Deepanway Ghosal, Soujanya Poria, Asif Ekbal, Pushpak Bhattacharyya

Abstract: Related tasks often depend on each other and perform better when solved in a joint framework. In this paper, we present a deep multi-task learning framework that jointly performs both sentiment and emotion analysis. The multi-modal inputs (i.e., text, acoustic, and visual frames) of a video convey diverse and distinctive information, and usually do not contribute equally to the decision making. We propose a context-level inter-modal attention framework for simultaneously predicting the sentiment and expressed emotions of an utterance. We evaluate our proposed approach on the CMU-MOSEI dataset for multi-modal sentiment and emotion analysis. Evaluation results suggest that the multi-task learning framework offers an improvement over the single-task framework. The proposed approach reports new state-of-the-art performance for both sentiment analysis and emotion analysis.

* Accepted for publication in NAACL:HLT-2019
