Sharmistha Jat

Cem Mil Podcasts: A Spoken Portuguese Document Corpus

Sep 23, 2022

Edgar Tanaka, Ann Clifton, Joana Correia, Sharmistha Jat, Rosie Jones, Jussi Karlgren, Winstead Zhu

Abstract: This document describes the Portuguese-language podcast dataset released by Spotify for academic research purposes. We give an overview of how the data was sampled, some basic statistics over the collection, and brief information on the distribution over Brazilian and Portuguese dialects.

* 6 pages, 1 figure

Relating Simple Sentence Representations in Deep Neural Networks and the Brain

Jun 27, 2019

Sharmistha Jat, Hao Tang, Partha Talukdar, Tom Mitchell

Abstract: What is the relationship between sentence representations learned by deep recurrent models and those encoded by the brain? Is there any correspondence between hidden layers of these recurrent models and brain regions when processing sentences? Can these deep models be used to synthesize brain data which can then be utilized in other extrinsic tasks? We investigate these questions using sentences with simple syntax and semantics (e.g., "The bone was eaten by the dog."). We consider multiple neural network architectures, including the recently proposed ELMo and BERT. We use magnetoencephalography (MEG) brain recording data collected from human subjects while they were reading these simple sentences. Overall, we find that BERT's activations correlate best with the MEG brain data. We also find that the deep network representation can be used to generate brain data from new sentences to augment existing brain data. To the best of our knowledge, this is the first work showing that the MEG brain recording taken when reading a word in a sentence can be used to distinguish earlier words in the sentence. Our exploration is also the first to use deep neural network representations to generate synthetic brain data and to show that it helps improve accuracy on a subsequent stimuli decoding task.

* Association for Computational Linguistics (ACL) 2019

Improving Distantly Supervised Relation Extraction using Word and Entity Based Attention

Apr 19, 2018

Sharmistha Jat, Siddhesh Khandelwal, Partha Talukdar

Abstract: Relation extraction is the problem of classifying the relationship between two entities in a given sentence. Distant Supervision (DS) is a popular technique for developing relation extractors starting with limited supervision. We note that most of the sentences in the distant supervision relation extraction setting are very long and may benefit from word attention for better sentence representation. Our contributions in this paper are threefold. Firstly, we propose two novel word attention models for distantly-supervised relation extraction, (1) a Bi-directional Gated Recurrent Unit (Bi-GRU) based word attention model (BGWA) and (2) an entity-centric attention model (EA), together with (3) a combination model which combines multiple complementary models using a weighted voting method for improved relation extraction. Secondly, we introduce GDS, a new distant supervision dataset for relation extraction. GDS removes the test data noise present in all previous distant-supervision benchmark datasets, making credible automatic evaluation possible. Thirdly, through extensive experiments on multiple real-world datasets, we demonstrate the effectiveness of the proposed methods.
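
The weighted voting idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the weights are chosen or which models are combined, so the function names (`weighted_vote`, `predict`), the relation labels, the probabilities, and the weights below are all hypothetical and picked only for illustration. The sketch assumes each model outputs a probability distribution over the same set of relation labels.

```python
from typing import List, Sequence


def weighted_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-model relation probabilities with a normalized weighted average.

    Hypothetical helper: one row of probabilities per model, one weight per model.
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(model_probs[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(model_probs, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same label set")
        for i, p in enumerate(probs):
            # Normalize the weight so the result is still a distribution.
            combined[i] += (weight / total) * p
    return combined


def predict(
    model_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    labels: Sequence[str],
) -> str:
    """Return the label with the highest combined score."""
    combined = weighted_vote(model_probs, weights)
    best = max(range(len(combined)), key=combined.__getitem__)
    return labels[best]


# Illustrative inputs only; none of these numbers come from the paper.
labels = ["NA", "born_in", "works_for"]
bgwa_probs = [0.2, 0.5, 0.3]         # stand-in for a BGWA-style model's output
ea_probs = [0.1, 0.3, 0.6]           # stand-in for an EA-style model's output
third_model_probs = [0.3, 0.4, 0.3]  # stand-in for any other complementary model
weights = [0.5, 0.3, 0.2]            # assumed; e.g. tuned on held-out data

combined = weighted_vote([bgwa_probs, ea_probs, third_model_probs], weights)
prediction = predict([bgwa_probs, ea_probs, third_model_probs], weights, labels)
```

With these made-up inputs, the second model alone would pick `works_for`, but the combined score favors `born_in` because the more heavily weighted first model prefers it. Averaging probabilities is only one way to realize weighted voting; voting over hard predictions is another.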
