Abstract: In this paper, we describe our post-evaluation results for SemEval-2018 Task 7 on the classification of semantic relations in scientific literature for clean data (subtask 1.1) and noisy data (subtask 1.2). This is an extended version of our workshop paper (Hettinger et al., 2018), including further technical details (Sections 3.2 and 4.3) and changes made to the preprocessing step in the post-evaluation phase (Section 2.1). Due to these changes, Classification of Relations using Embeddings (ClaiRE) achieved an improved F1 score of 75.11% for the first subtask and 81.44% for the second.
Abstract: Assessing the degree of semantic relatedness between words is an important task with a variety of semantic applications, such as ontology learning for the Semantic Web, semantic search, or query expansion. To accomplish this in an automated fashion, many relatedness measures have been proposed. However, most of these measures only encode information contained in the underlying corpus and thus do not directly model human intuition. To address this, we propose a metric learning approach that improves existing semantic relatedness measures by learning from additional information, such as explicit human feedback. For this, we argue for using word embeddings instead of traditional high-dimensional vector representations in order to leverage their semantic density and to reduce computational cost. We rigorously test our approach on several domains, including tagging data as well as publicly available embeddings based on Wikipedia texts and navigation. Human feedback about semantic relatedness for learning and evaluation is extracted from publicly available datasets such as MEN or WS-353. We find that our method can significantly improve semantic relatedness measures by learning from such feedback. For tagging data, we are the first to generate and study embeddings. Our results are of special interest for ontology and recommendation engineers, but also for any other researchers and practitioners of Semantic Web techniques.
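To make the idea of learning a relatedness measure from human feedback concrete, the following is a minimal sketch and not the algorithm used in the paper. It assumes a diagonal bilinear similarity over embeddings, fitted to relatedness scores by stochastic gradient descent on a squared loss. The function names, the toy 3-dimensional embeddings, and the synthetic scores are all hypothetical; real experiments would use pretrained embeddings and scores from datasets such as MEN or WS-353.

```python
import random


def weighted_similarity(w, x, y):
    """Diagonal bilinear similarity: sum_k w_k * x_k * y_k."""
    return sum(wk * xk * yk for wk, xk, yk in zip(w, x, y))


def mean_squared_error(w, pairs):
    """Average squared gap between learned similarity and target scores."""
    return sum((weighted_similarity(w, x, y) - t) ** 2 for x, y, t in pairs) / len(pairs)


def learn_diagonal_metric(pairs, dim, lr=0.05, epochs=200):
    """Fit non-negative per-dimension weights to (x, y, score) triples.

    Assumption: a diagonal metric and a squared loss, chosen only for brevity.
    """
    w = [1.0] * dim  # start from the unweighted dot product
    for _ in range(epochs):
        for x, y, target in pairs:
            err = weighted_similarity(w, x, y) - target
            for k in range(dim):
                w[k] -= lr * 2.0 * err * x[k] * y[k]  # gradient of the squared loss
                w[k] = max(w[k], 0.0)  # project to keep the metric positive semidefinite
    return w


# Hypothetical toy data: random 3-d "embeddings" whose target scores are
# produced by hidden weights that the learner should recover.
rng = random.Random(0)
hidden_weights = [2.0, 0.0, 0.5]
pairs = []
for _ in range(20):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    pairs.append((x, y, weighted_similarity(hidden_weights, x, y)))

before = mean_squared_error([1.0, 1.0, 1.0], pairs)
learned = learn_diagonal_metric(pairs, dim=3)
after = mean_squared_error(learned, pairs)
print("MSE before:", before, "MSE after:", after)
```

The sketch only illustrates the general principle that a relatedness measure over dense embeddings can be adjusted to agree better with external judgments; the paper's actual metric learning formulation and evaluation protocol may differ.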