Mahsa Shafaei

Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model

Jun 12, 2024

Elaheh Baharlouei, Mahsa Shafaei, Yigeng Zhang, Hugo Jair Escalante, Thamar Solorio

Abstract: We address the challenge of detecting questionable content in online media, specifically the subcategory of comic mischief. This type of content combines elements such as violence, adult content, or sarcasm with humor, making it difficult to detect. A multimodal approach is vital to capture the subtle details inherent in comic mischief content. To tackle this problem, we propose a novel end-to-end multimodal system for the task of comic mischief detection. As part of this contribution, we release a novel dataset for the targeted task consisting of three modalities: video, text (video captions and subtitles), and audio. We also design a HIerarchical Cross-attention model with CAPtions (HICCAP) to capture the intricate relationships among these modalities. The results show that the proposed approach significantly improves over robust baselines and state-of-the-art models for comic mischief detection and its type classification. This highlights the potential of our system to empower users to make informed decisions about the online content they choose to see. In addition, we conduct experiments on the UCF101, HMDB51, and XD-Violence datasets, comparing our model against other state-of-the-art approaches and showcasing the outstanding performance of our proposed model in various scenarios.

Positive and Risky Message Assessment for Music Products

Sep 18, 2023

Yigeng Zhang, Mahsa Shafaei, Fabio Gonzalez, Thamar Solorio

Abstract: In this work, we propose a novel research problem: assessing positive and risky messages from music products. We first establish a benchmark for multi-angle, multi-level music content assessment and then present an effective multi-task prediction model with ordinality enforcement to solve this problem. Our results show that the proposed method not only significantly outperforms strong task-specific counterparts but can also evaluate multiple aspects concurrently.

From None to Severe: Predicting Severity in Movie Scripts

Oct 03, 2021

Yigeng Zhang, Mahsa Shafaei, Fabio Gonzalez, Thamar Solorio

Abstract: In this paper, we introduce the task of predicting the severity of age-restricted aspects of movie content based solely on the dialogue script. We first investigate categorizing the ordinal severity of movies on five aspects: Sex, Violence, Profanity, Substance consumption, and Frightening scenes. The problem is handled using a Siamese network-based multitask framework, which concurrently improves the interpretability of the predictions. The experimental results show that our method outperforms the previous state-of-the-art model and provides useful information for interpreting model predictions. The proposed dataset and source code are publicly available at our GitHub repository.

* Accepted at Findings of EMNLP 2021

A Case Study of Deep Learning Based Multi-Modal Methods for Predicting the Age-Suitability Rating of Movie Trailers

Jan 26, 2021

Mahsa Shafaei, Christos Smailis, Ioannis A. Kakadiaris, Thamar Solorio

Abstract: In this work, we explore different approaches to combining modalities for the problem of automated age-suitability rating of movie trailers. First, we introduce a new dataset containing videos of movie trailers in English downloaded from IMDB and YouTube, along with their corresponding age-suitability rating labels. Second, we propose a multi-modal deep learning pipeline addressing the movie trailer age-suitability rating problem. This is the first attempt to combine video, audio, and speech information for this problem, and our experimental results show that multi-modal approaches significantly outperform the best mono- and bimodal models in this task.

White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content

Jan 25, 2021

Thamar Solorio, Mahsa Shafaei, Christos Smailis, Mona Diab, Theodore Giannakopoulos, Heng Ji, Yang Liu, Rada Mihalcea, Smaranda Muresan, Ioannis Kakadiaris

Abstract: This white paper summarizes the discussions regarding critical considerations for developing an extensive repository of online videos annotated with labels indicating questionable content. The main discussion points include: 1) the types of labels that will result in a valuable repository for the larger AI community; 2) how to design the collection and annotation process, as well as the distribution of the corpus, to maximize its potential impact; and 3) what actions we can take to reduce the risk of trauma to annotators.

ParsiNLU: A Suite of Language Understanding Challenges for Persian

Dec 11, 2020

Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian (+15 more)

Abstract: Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on the Persian language, one of the most widely spoken languages in the world, for which few NLU datasets are nevertheless available. The availability of high-quality evaluation datasets is necessary for reliably assessing progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in the Persian language that includes a range of high-level tasks such as Reading Comprehension and Textual Entailment. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. In addition, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.

* (work in progress)

Attending the Emotions to Detect Online Abusive Language

Sep 06, 2019

Niloofar Safi Samghabadi, Afsheen Hatami, Mahsa Shafaei, Sudipta Kar, Thamar Solorio

Abstract: In recent years, abusive behavior has become a serious issue in online social networks. In this paper, we present a new corpus from a semi-anonymous social media platform, which contains instances of the offensive and neutral classes. We introduce a single deep neural architecture that considers both local and sequential information from the text in order to detect abusive language. Along with this model, we introduce a new attention mechanism called emotion-aware attention. This mechanism uses the emotions behind the text to find the most important words within that text. We experiment with this model on our dataset and then present an analysis of the results. Additionally, we evaluate our proposed method on different corpora and show new state-of-the-art results for offensive language detection.

Rating for Parents: Predicting Children Suitability Rating for Movies Based on Language of the Movies

Aug 22, 2019

Mahsa Shafaei, Niloofar Safi Samghabadi, Sudipta Kar, Thamar Solorio

Abstract: Film culture has grown tremendously in recent years. The large number of streaming services has made films one of the most convenient forms of entertainment in today's world. Films can help us learn and can inspire societal change, but they can also negatively affect viewers. In this paper, our goal is to predict the suitability of movie content for children and young adults based on scripts. The criterion that we use to measure suitability is the MPAA rating, which is specifically designed for this purpose. We propose an RNN-based architecture with attention that jointly models the genre and the emotions in the script to predict the MPAA rating. Our classification model achieves a weighted F1-score of 78%, outperforming the traditional machine learning method by 6%.
