Julien Derivière

LIPN

A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis

Jun 29, 2007

Thierry Hamon, Adeline Nazarenko, Thierry Poibeau, Sophie Aubin, Julien Derivière

Abstract: Web semantic access in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which pertain both to information retrieval (IR) and to information extraction (IE). A rich linguistic analysis is required either to identify the relevant semantic units to index and weight them according to linguistic specific statistical distribution, or as the basis of an information extraction process. Recent developments make Natural Language Processing (NLP) techniques reliable enough to process large collections of documents and to enrich them with semantic annotations. This paper focuses on the design and the development of a text processing platform, Ogmios, which has been developed in the ALVIS project. The Ogmios platform exploits existing NLP modules and resources, which may be tuned to specific domains and produces linguistically annotated documents. We show how the three constraints of genericity, domain semantic awareness and performance can be handled all together.

* Proceedings of RIAO 2007 (30/05/2007)

The ALVIS Format for Linguistically Annotated Documents

Sep 24, 2006

Adeline Nazarenko, Erick Alphonse, Julien Derivière, Thierry Hamon, Guillaume Vauvert, Davy Weissenbacher

Abstract: The paper describes the ALVIS annotation format designed for the indexing of large collections of documents in topic-specific search engines. This paper is exemplified on the biological domain and on MedLine abstracts, as developing a specialized search engine for biologists is one of the ALVIS case studies. The ALVIS principle for linguistic annotations is based on existing works and standard propositions. We made the choice of stand-off annotations rather than inserted mark-up. Annotations are encoded as XML elements which form the linguistic subsection of the document record.

* Proceedings of the fifth international conference on Language Resources and Evaluation, LREC 2006 (2006) 1782-1786
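
The stand-off principle mentioned in the abstract above can be illustrated with a minimal sketch. It is not the ALVIS schema: the element names (`documentRecord`, `text`, `linguisticAnalysis`, `annotation`), the attribute names, and the sample sentence are all hypothetical, chosen only to show annotations that point into an untouched text by character offsets instead of being inserted as inline mark-up.

```python
import xml.etree.ElementTree as ET

# Sample sentence (an assumption for illustration, not taken from the paper).
TEXT = "Bacillus subtilis forms spores."

# (annotation type, start offset, end offset); end is exclusive.
SPANS = [("token", 0, 8), ("token", 9, 17), ("named_entity", 0, 17)]


def build_record(text, spans):
    """Build an XML record whose annotations reference the text by offsets.

    All element and attribute names here are hypothetical, not the ALVIS DTD.
    """
    record = ET.Element("documentRecord")
    # The source text is stored once and never modified by annotations.
    ET.SubElement(record, "text").text = text
    layer = ET.SubElement(record, "linguisticAnalysis")
    for i, (kind, start, end) in enumerate(spans, 1):
        ET.SubElement(
            layer, "annotation",
            id=f"a{i}", type=kind, start=str(start), end=str(end),
        )
    return record


def resolve(record, annotation_id):
    """Return the substring of the text that an annotation points to."""
    text = record.find("text").text
    for ann in record.iter("annotation"):
        if ann.get("id") == annotation_id:
            return text[int(ann.get("start")):int(ann.get("end"))]
    raise KeyError(annotation_id)


record = build_record(TEXT, SPANS)
print(ET.tostring(record, encoding="unicode"))
print(resolve(record, "a3"))
```

Because the annotations live in a separate subsection, overlapping layers (here two tokens and a named entity covering both) can coexist without nesting conflicts, which is one common motivation for stand-off designs.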
