CJM, URFIST Paris
Abstract:Patents and scientific papers provide an essential source for measuring science and technology output, to be used as a basis for the most varied scientometric analyzes. Authors' and inventors' names are the key identifiers to carry out these analyses, which however, run up against the issue of disambiguation. By extension identifying inventors who are also academic authors is a non-trivial challenge. We propose a method using the International Patent Classification (IPC) and the IPCCAT API to assess the degree of similarity of patents and papers abstracts of a given inventor, in order to match both types of documents. The method is developed and manually qualified based on three corpora of patents extracted from the international EPO database Espacenet. Among a set of 4679 patents and 7720 inventors, we obtain 2501 authors. The proposed algorithm solves the general problem of disambiguation with an error rate lower than 5%.