LIPN, CNRS UMR 7030, University of Sorbonne Paris Nord
Abstract: Knowledge graph embedding techniques are widely used for knowledge graph refinement tasks such as graph completion and triple classification. These techniques aim to embed the entities and relations of a Knowledge Graph (KG) in a low-dimensional continuous feature space. This paper adopts a transformer-based triplet network to create an embedding space that clusters the information about an entity or relation in the KG. The approach creates textual sequences from facts and fine-tunes a triplet network of pre-trained transformer-based language models. It follows an evaluation paradigm that relies on an efficient spatial semantic search technique. We show that this evaluation protocol is better suited to a few-shot setting for the relation prediction task. Our proposed GilBERT method is evaluated on triple classification and relation prediction tasks on multiple well-known benchmark knowledge graphs such as FB13, WN11, and FB15K. We show that GilBERT achieves results better than or comparable to the state of the art on these two refinement tasks.
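As a rough illustration of the two ingredients named in the abstract above, a triplet objective and a spatial search over the resulting embedding space, the following minimal sketch may help. It is not the GilBERT implementation: the toy 2-D vectors stand in for transformer sentence embeddings of verbalised facts, the relation labels `born_in` and `works_for` are hypothetical, and the function names, the Euclidean distance, and the margin of 1.0 are all assumptions made for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet margin loss.

    The loss is zero once the negative lies at least `margin` farther
    from the anchor than the positive does.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def nearest_label(query, index):
    """Brute-force spatial search: label of the stored vector closest to `query`."""
    return min(index, key=lambda item: euclidean(query, item[1]))[0]


# Toy vectors standing in for embeddings of facts (assumption, not real data).
index = [("born_in", [0.0, 0.0]), ("works_for", [3.0, 4.0])]

# Positive is close and negative is far: the triplet is already satisfied.
loss_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Roles swapped: the triplet violates the margin and is penalised.
loss_violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])

# Relation prediction as nearest-neighbour lookup in the embedding space.
predicted = nearest_label([0.5, 0.5], index)
```

In this sketch, relation prediction reduces to a nearest-neighbour lookup. One plausible reading of the few-shot claim is that a relation with only a handful of stored examples can still be retrieved this way, though the abstract does not spell out the mechanism.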
Abstract: In a decentralised knowledge representation system such as the Web of Data, it is common and indeed desirable for different knowledge graphs to overlap. Whenever multiple names are used to denote the same thing, owl:sameAs statements are needed in order to link the data and foster reuse. Whilst the deductive value of such identity statements can be extremely useful in enhancing various knowledge-based systems, incorrect use of identity can have wide-ranging effects in a global knowledge space like the Web of Data. With several works having already shown that identity in the Web is broken, this survey investigates the current state of this "sameAs problem". An open discussion highlights the main weaknesses of the solutions proposed in the literature and outlines open challenges for the future.
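To make the "wide-ranging effects" mentioned in the abstract above concrete, here is a minimal sketch of how identity links behave under the symmetric and transitive reading of owl:sameAs. It is an illustration only, not a method from the survey: the identifiers (`a:Paris`, `b:Paris_Texas`, and so on) are made up, and `same_as_classes` is a hypothetical helper that computes equivalence classes with a simple union-find.

```python
def same_as_classes(links):
    """Group identifiers into equivalence classes implied by sameAs links.

    `links` is an iterable of (a, b) pairs; identity is treated as
    symmetric and transitive, so classes are connected components.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for node in list(parent):
        classes.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in classes.values())


# Two correct identity links, each joining two names for the same city.
correct = [("a:Paris", "b:Paris"), ("b:Paris_Texas", "c:Paris_Texas")]

# A single erroneous link between two different cities (hypothetical error).
erroneous = ("b:Paris", "b:Paris_Texas")

before = same_as_classes(correct)
after = same_as_classes(correct + [erroneous])
```

With only the correct links, `before` holds two separate classes. Adding the one erroneous statement collapses all four identifiers into a single class in `after`, so any reasoner that follows these links would treat the two cities as the same individual.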