Dana Movshovitz-Attias

GoEmotions: A Dataset of Fine-Grained Emotions

Jun 03, 2020

Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, Sujith Ravi

Abstract: Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks. We introduce GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral. We demonstrate the high quality of the annotations via Principal Preserved Component Analysis. We conduct transfer learning experiments with existing emotion benchmarks to show that our dataset generalizes well to other domains and different emotion taxonomies. Our BERT-based model achieves an average F1-score of .46 across our proposed taxonomy, leaving much room for improvement.

* Accepted to ACL 2020


Grounded Discovery of Coordinate Term Relationships between Software Entities

May 01, 2015

Dana Movshovitz-Attias, William W. Cohen

Abstract: We present an approach for the detection of coordinate-term relationships between entities from the software domain, that refer to Java classes. Usually, relations are found by examining corpus statistics associated with text entities. In some technical domains, however, we have access to additional information about the real-world objects named by the entities, suggesting that coupling information about the "grounded" entities with corpus statistics might lead to improved methods for relation discovery. To this end, we develop a similarity measure for Java classes using distributional information about how they are used in software, which we combine with corpus statistics on the distribution of contexts in which the classes appear in text. Using our approach, cross-validation accuracy on this dataset can be improved dramatically, from around 60% to 88%. Human labeling results show that our classifier has an F1 score of 86% over the top 1000 predicted pairs.
