INRIA Lorraine - LORIA
Abstract: In this article we present the design and implementation of the Logoscope, the first tool developed specifically to detect new words of the French language, document them, and provide public access to them through a web interface. This semi-automatic tool collects new words daily by browsing the online versions of well-known French newspapers such as Le Monde, Le Figaro, L'Equipe, Libération, La Croix, and Les Échos. In contrast to existing tools, which are essentially dedicated to dictionary development, the Logoscope attempts to give a more complete account of the context in which the new words occur. In addition to the commonly given morpho-syntactic information, it provides information about the textual and discursive contexts of the word creation; in particular, it automatically determines the (journalistic) topics of the text containing the new word. We first give a general overview of the tool. We then describe the approach taken, discuss the linguistic background that guided our design decisions, and present the computational methods used to implement it.
Abstract: We present a manually constructed seed lexicon encoding the inferential profiles of French event-selecting predicates across different uses. The inferential profile (Karttunen, 1971a) of a verb is designed to capture the inferences triggered by the use of this verb in context. It reflects the influence of the clause-embedding verb on the factuality of the event described by the embedded clause. The resource provides evidence for the following three hypotheses: (i) French implicative verbs have an aspect-dependent profile (their inferential profile varies with outer aspect), while factive verbs have an aspect-independent profile (they keep the same inferential profile with both imperfective and perfective aspect); (ii) implicativity decreases with imperfective aspect: the inferences triggered by French implicative verbs combined with perfective aspect are often weakened when the same verbs are combined with imperfective aspect; (iii) implicativity decreases with an animate (deep) subject: the inferences triggered by a verb which is implicative with an inanimate subject are weakened when the same verb is used with an animate subject. The resource additionally shows that verbs with different inferential profiles display clearly distinct sub-categorisation patterns. In particular, verbs that have both factive and implicative readings are shown to prefer infinitival clauses in their implicative reading and tensed clauses in their factive reading.
Abstract: We present a method for grouping the synonyms of a lemma according to its dictionary senses. The senses are defined by a large machine-readable dictionary for French, the TLFi (Trésor de la langue française informatisé), and the synonyms are given by 5 synonym dictionaries (also for French). To evaluate the proposed method, we manually constructed a gold standard in which, for each (word, definition) pair and given the set of synonyms defined for that word by the 5 synonym dictionaries, 4 lexicographers specified the set of synonyms they judged adequate. While inter-annotator agreement on that task ranges from 67% to at best 88%, depending on the annotator pair and on the synonym dictionary being considered, the automatic procedure we propose achieves a precision of 67% and a recall of 71%. The proposed method is compared with related work, namely word sense disambiguation, synonym lexicon acquisition, and WordNet construction.
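As an illustration of the evaluation measures named in this abstract, the following minimal sketch computes precision and recall for one (word, definition) pair by comparing an automatically selected synonym set with a gold-standard set. The function name, the example words, and the treatment of empty sets are assumptions made for illustration; the abstract does not specify how the scores were computed or aggregated across pairs.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted synonym set against a gold set.

    Hypothetical helper: an empty predicted or gold set yields 0.0 for the
    corresponding score, which is an assumption, not the paper's convention.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # synonyms present in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only, not taken from the paper's gold standard.
predicted_synonyms = {"demeure", "logis", "famille"}
gold_synonyms = {"demeure", "logis", "habitation", "foyer"}

precision, recall = precision_recall(predicted_synonyms, gold_synonyms)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three predicted synonyms appear in the gold set, and two of the four gold synonyms are recovered.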