Paul Asente

Learning to Emphasize: Dataset and Shared Task Models for Selecting Emphasis in Presentation Slides

Jan 02, 2021

Amirreza Shirani, Giai Tran, Hieu Trinh, Franck Dernoncourt, Nedim Lipka, Paul Asente, Jose Echevarria, Thamar Solorio

Abstract: Presentation slides have become a common addition to teaching material. Emphasizing strong leading words in presentation slides lets the audience direct the eye to certain focal points instead of reading the entire slide, keeping attention on the speaker during the presentation. Despite a large volume of studies on automatic slide generation, few studies have addressed the automation of design assistance during the creation process. Motivated by this demand, we study the problem of Emphasis Selection (ES) in presentation slides, i.e., choosing candidates for emphasis, by introducing a new dataset containing presentation slides on a wide variety of topics, each annotated with emphasis words in a crowdsourced setting. We evaluate a range of state-of-the-art models on this novel dataset by organizing a shared task and inviting multiple researchers to model emphasis in this new domain. We present the main findings and compare the results of these models, and by examining the challenges of the dataset, we provide different analysis components.

* In Proceedings of Content Authoring and Design (CAD21) workshop at the Thirty-fifth AAAI Conference on Artificial Intelligence (AAAI-21)

SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media

Aug 07, 2020

Amirreza Shirani, Franck Dernoncourt, Nedim Lipka, Paul Asente, Jose Echevarria, Thamar Solorio

Abstract: In this paper, we present the main findings and compare the results of SemEval-2020 Task 10, Emphasis Selection for Written Text in Visual Media. The goal of this shared task is to design automatic methods for emphasis selection, i.e., choosing candidates for emphasis in textual content to enable automated design assistance in authoring. The main focus is on short text instances for social media, with a variety of examples, from social media posts to inspirational quotes. Participants were asked to model emphasis using plain text with no additional context from the user or other design considerations. The SemEval-2020 Emphasis Selection shared task attracted 197 participants in the early phase, and a total of 31 teams made submissions to this task. The highest-ranked submission achieved a Match_m score of 0.823. The analysis of systems submitted to the task indicates that BERT and RoBERTa were the most common choices of pre-trained models, and part-of-speech (POS) tags were the most useful feature. Full results can be found on the task's website.

* Accepted at Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020)
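
The abstract reports a Match_m score without defining it. The sketch below assumes the metric is a top-m overlap: for each text instance, take the m words with the highest gold emphasis scores and the m words with the highest predicted scores, then average the fraction of shared words over the dataset. The function name `match_m` and the input layout (one list of per-word scores per instance) are illustrative assumptions, not taken from the task's official scorer.

```python
def match_m(gold_scores, pred_scores, m):
    """Average top-m overlap between gold and predicted emphasis scores.

    gold_scores, pred_scores: lists of per-word score lists, one per instance
    (hypothetical layout). Ties are broken by word position.
    """
    total = 0.0
    for gold, pred in zip(gold_scores, pred_scores):
        k = min(m, len(gold))  # short instances may have fewer than m words
        top_gold = set(sorted(range(len(gold)), key=lambda i: -gold[i])[:k])
        top_pred = set(sorted(range(len(pred)), key=lambda i: -pred[i])[:k])
        total += len(top_gold & top_pred) / k
    return total / len(gold_scores)


gold = [[0.9, 0.1, 0.5]]
pred = [[0.8, 0.6, 0.2]]
print(match_m(gold, pred, 1))  # both rank word 0 first
print(match_m(gold, pred, 2))  # top-2 sets share only word 0
```

Under this assumption, a score of 1.0 means the predicted top-m words always coincide with the annotators' top-m words.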

Let Me Choose: From Verbal Context to Font Selection

May 03, 2020

Amirreza Shirani, Franck Dernoncourt, Jose Echevarria, Paul Asente, Nedim Lipka, Thamar Solorio

Abstract: In this paper, we aim to learn associations between visual attributes of fonts and the verbal context of the texts they are typically applied to. Compared to related work leveraging the surrounding visual context, we choose to focus only on the input text as this can enable new applications for which the text is the only visual element in the document. We introduce a new dataset, containing examples of different topics in social media posts and ads, labeled through crowd-sourcing. Due to the subjective nature of the task, multiple fonts might be perceived as acceptable for an input text, which makes this problem challenging. To this end, we investigate different end-to-end models to learn label distributions on crowd-sourced data and capture inter-subjectivity across all annotations.

* Accepted to ACL 2020
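
Learning a label distribution from crowd-sourced votes can be illustrated as follows. This is a minimal sketch, assuming each annotator votes for one font index and that a model is scored by the KL divergence between the normalized vote distribution and its softmax output. The abstract does not state the loss actually used, and all names here (`vote_distribution`, `softmax`, `kl_divergence`) are hypothetical.

```python
import math


def vote_distribution(votes, num_fonts):
    """Turn a list of annotator votes (font indices) into a probability distribution."""
    counts = [0] * num_fonts
    for v in votes:
        counts[v] += 1
    return [c / len(votes) for c in counts]


def softmax(logits):
    """Numerically stable softmax over raw model scores."""
    mx = max(logits)
    exps = [math.exp(x - mx) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); terms with zero target mass contribute nothing."""
    return sum(t * math.log(t / max(p, eps))
               for t, p in zip(target, predicted) if t > 0)


target = vote_distribution([0, 0, 1, 2], num_fonts=3)  # four annotators, three fonts
predicted = softmax([2.0, 0.5, 0.5])                    # hypothetical model logits
print(target, kl_divergence(target, predicted))
```

Training against the full vote distribution, rather than a single majority label, keeps the information that several fonts were judged acceptable for the same text.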

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network

Jun 07, 2017

Xiao Yang, Ersin Yumer, Paul Asente, Mike Kraley, Daniel Kifer, C. Lee Giles

Abstract: We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.

* CVPR 2017 Spotlight
