Abstract: We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling for the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, a confirmation of this hypothesis would indicate a language-related, linguistic bias. The article therefore addresses the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and the corresponding Wikipedias. In this way, the article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
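A generic way to quantify intertextual similarity between corresponding subject areas of two language editions is sketched below. This is only a minimal illustration under simplifying assumptions (texts already mapped into a shared vocabulary, e.g. by translation), not the hybrid intra-/intertextual model of the article; all documents and names are toy placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "subject area" documents from two hypothetical language editions,
# assumed to be mapped into a shared vocabulary for comparability
edition_a = ["gravity mass force relativity", "cell dna protein gene"]
edition_b = ["gravity relativity spacetime curvature", "protein enzyme gene expression"]

vectorizer = TfidfVectorizer().fit(edition_a + edition_b)
A = vectorizer.transform(edition_a)
B = vectorizer.transform(edition_b)

# Similarity of corresponding subject areas across the two editions
print(cosine_similarity(A, B).diagonal())
```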
Abstract: In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations. We describe recent advances in the development of lemmatizers and test them against the Capitularies corpus (comprising Frankish royal edicts, mid-6th to mid-9th century), a corpus created as a reference for processing Medieval Latin. We also consider the post-correction of lemmatizations using a limited crowdsourcing process aimed at the continuous review and updating of the FLL. Starting from the texts resulting from this lemmatization process, we describe the extension of the FLL by means of word embeddings, whose interactive traversal by means of SemioGraphs completes the digitally enhanced hermeneutic circle. In this way, the article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-correction and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.
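The embedding-based extension of the lexicon can be illustrated by a minimal sketch (not the FLL pipeline itself): word embeddings are trained over the lemma layer of a corpus, and the nearest neighbors of a lemma form the kind of neighborhood that a SemioGraph-style view could make explorable. The sketch assumes the gensim 4.x API; the lemma lists are toy placeholders, not the Capitularies corpus.

```python
from gensim.models import Word2Vec

# Toy lemmatized Latin sentences (placeholders, not the Capitularies corpus)
lemmatized_sentences = [
    ["rex", "capitulare", "mandare", "comes"],
    ["episcopus", "capitulare", "legere", "missus"],
    ["comes", "missus", "rex", "iubere"],
]

# Train word embeddings over the lemma layer (gensim 4.x API)
model = Word2Vec(lemmatized_sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Nearest lemmas of a target lemma: the raw material for a graph-based,
# SemioGraph-like exploration of the embedding space
print(model.wv.most_similar("capitulare", topn=3))
```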
Abstract: Are nearby places (e.g. cities) described by related words? In this article we transfer this research question from the field of lexical encoding of geographic information to the level of intertextuality. To this end, we explore Volunteered Geographic Information (VGI) to model texts addressing places at the level of cities or regions with the help of so-called topic networks. This is done to examine how language encodes and networks geographic information on the aboutness level of texts. Our hypothesis is that the networked thematizations of places are similar, regardless of their geographical distance and the underlying communities of authors. To investigate this, we introduce Multiplex Topic Networks (MTN), which we automatically derive from Linguistic Multilayer Networks (LMN) as a novel model, especially of thematic networking in text corpora. Our study shows a Zipfian organization of the thematic universe in which geographical places (especially cities) are located in online communication. We interpret this finding in the context of cognitive maps, a notion which we extend by so-called thematic maps. According to our interpretation of this finding, the organization of thematic maps as part of cognitive maps results from a tendency of authors to generate shareable content that ensures the continued existence of the underlying media. We test our hypothesis using special wikis and extracts of Wikipedia as examples. In this way we come to the conclusion that places, whether close to each other or not, are located in neighboring regions of the topic universe that span similar subnetworks.
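As a rough, single-layer illustration of the general idea (not the MTN/LMN formalism introduced in the article), the following sketch builds a topic co-occurrence network from toy document-level topic assignments and estimates a Zipf-style exponent over the rank distribution of weighted topic degrees; all labels and data are placeholders.

```python
import numpy as np
import networkx as nx

# Toy document-level topic assignments (placeholders for, e.g., DDC-based labels)
doc_topics = [
    ["transport", "architecture", "history"],
    ["transport", "tourism"],
    ["history", "architecture", "tourism"],
    ["transport", "history"],
]

# Build an undirected topic network: two topics are linked if they co-occur in a document
G = nx.Graph()
for topics in doc_topics:
    for i, a in enumerate(topics):
        for b in topics[i + 1:]:
            weight = G.get_edge_data(a, b, default={}).get("weight", 0)
            G.add_edge(a, b, weight=weight + 1)

# Rank topics by weighted degree and fit a line in log-log space (Zipf-style check)
degrees = sorted((d for _, d in G.degree(weight="weight")), reverse=True)
ranks = np.arange(1, len(degrees) + 1)
slope, _ = np.polyfit(np.log(ranks), np.log(degrees), 1)
print("estimated Zipf exponent:", -slope)
```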
Abstract: The recognition of pharmacological substances, compounds and proteins is an essential prerequisite for the recognition of relations between chemicals and other biomedically relevant units. In this paper, we describe an approach to Task 1 of the PharmaCoNER Challenge, which involves the recognition of mentions of chemicals and drugs in Spanish medical texts. We train a state-of-the-art BiLSTM-CRF sequence tagger with stacked Pooled Contextualized Embeddings as well as word and sub-word embeddings, using the open-source framework FLAIR. We present a new corpus composed of articles and papers from Spanish health science journals, termed the Spanish Health Corpus, and use it to train domain-specific embeddings which we incorporate into our model training. We achieve an F1-score of 89.76% using pre-trained embeddings and improve this result to an F1-score of 90.52% using the specialized embeddings.
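The tagging setup can be sketched roughly as follows, assuming FLAIR's classic sequence-tagging API (around version 0.4/0.5). Corpus paths and file names are placeholders, and the embedding identifiers stand in for publicly available pre-trained Spanish embeddings, not for the specialized embeddings trained on the Spanish Health Corpus.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import (WordEmbeddings, BytePairEmbeddings,
                              PooledFlairEmbeddings, StackedEmbeddings)
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Column-formatted NER corpus (paths and file names are placeholders)
columns = {0: "text", 1: "ner"}
corpus = ColumnCorpus("data/pharmaconer", columns,
                      train_file="train.conll", dev_file="dev.conll", test_file="test.conll")
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")

# Stack word, sub-word and pooled contextualized embeddings for Spanish
embeddings = StackedEmbeddings([
    WordEmbeddings("es"),                 # pre-trained Spanish fastText vectors
    BytePairEmbeddings("es"),             # sub-word (byte-pair) embeddings
    PooledFlairEmbeddings("es-forward"),  # pooled contextualized embeddings
    PooledFlairEmbeddings("es-backward"),
])

# BiLSTM-CRF sequence tagger
tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dictionary, tag_type="ner", use_crf=True)

ModelTrainer(tagger, corpus).train("models/pharmaconer",
                                   learning_rate=0.1, mini_batch_size=32, max_epochs=150)
```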