Abstract:The academic evaluation of the publication record of researchers is relevant for identifying talented candidates for promotion and funding. A key tool for this is the use of the indexes provided by Web of Science and SCOPUS, costly databases that sometimes exceed the possibilities of academic institutions in many parts of the world. We show here how the data in one of the bases can be used to infer the main index of the other one. Methods of data analysis used in Machine Learning allow us to select just a few of the hundreds of variables in a database, which later are used in a panel regression, yielding a good approximation to the main index in the other database. Since the information of SCOPUS can be freely scraped from the Web, this approach allows to infer for free the Impact Factor of publications, the main index used in research assessments around the globe.
Abstract:The task of Event Detection (ED) is a subfield of Information Extraction (IE) that consists in recognizing event mentions in natural language texts. Several applications can take advantage of an ED system, including alert systems, text summarization, question-answering systems, and any system that needs to extract structured information about events from unstructured texts. ED is a complex task, which is hampered by two main challenges: the lack of a dataset large enough to train and test the developed models and the variety of event type definitions that exist in the literature. These problems make generalization hard to achieve, resulting in poor adaptation to different domains and targets. The main contribution of this paper is the design, implementation and evaluation of a recurrent neural network model for ED that combines several features. In particular, the paper makes the following contributions: (1) it uses BERT embeddings to define contextual word and contextual sentence embeddings as attributes, which to the best of our knowledge were never used before for the ED task; (2) the proposed model has the ability to use its first layer to learn good feature representations; (3) a new public dataset with a general definition of event; (4) an extensive empirical evaluation that includes (i) the exploration of different architectures and hyperparameters, (ii) an ablation test to study the impact of each attribute, and (iii) a comparison with a replication of a state-of-the-art model. The results offer several insights into the importance of contextual embeddings and indicate that the proposed approach is effective in the ED task, outperforming the baseline models.