Abstract: Our goal is to learn about the political interests and preferences of Members of Parliament by mining their parliamentary activity, in order to develop a recommendation/filtering system that, given a stream of documents to be distributed among them, can decide which documents each Member of Parliament should receive. We propose to use positive unlabeled learning to tackle this problem, because we only have information about relevant documents (each Member of Parliament's own interventions in the debates) but not about irrelevant ones, so we cannot use standard binary classifiers trained with positive and negative examples. We have also developed a new algorithm of this type, which compares favourably with: a) the baseline approach, which assumes that all the interventions of other Members of Parliament are irrelevant, b) another well-known positive unlabeled learning method and c) an approach based on information retrieval methods that matches documents against legislators' representations. The experiments have been carried out with data from the regional Andalusian Parliament in Spain.
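To make baseline (a) concrete, here is a minimal sketch under stated assumptions. Each MP's own interventions serve as positives, and every intervention by any other MP is treated as a negative for a standard binary classifier. The abstract does not name the classifier, so the multinomial naive Bayes model, the whitespace tokenizer, the toy data and the names `NaiveBayes`, `tokenize` and `train_baseline` are all hypothetical. This is not the authors' new positive unlabeled algorithm.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase + whitespace split (no stemming/stopwords).
    return text.lower().split()


class NaiveBayes:
    """Binary multinomial naive Bayes with Laplace smoothing (relevant vs. irrelevant)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, pos_docs, neg_docs):
        self.counts = {True: Counter(), False: Counter()}
        for doc in pos_docs:
            self.counts[True].update(tokenize(doc))
        for doc in neg_docs:
            self.counts[False].update(tokenize(doc))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {label: sum(cnt.values()) for label, cnt in self.counts.items()}
        n = len(pos_docs) + len(neg_docs)
        self.log_prior = {True: math.log(len(pos_docs) / n),
                          False: math.log(len(neg_docs) / n)}
        return self

    def log_score(self, doc, label):
        score = self.log_prior[label]
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for tok in tokenize(doc):
            if tok not in self.vocab:  # ignore words never seen in training
                continue
            score += math.log((self.counts[label][tok] + self.alpha) / denom)
        return score

    def is_relevant(self, doc):
        return self.log_score(doc, True) > self.log_score(doc, False)


def train_baseline(interventions, mp):
    """Baseline (a): the MP's own interventions are positives; every
    intervention by any other MP is (naively) treated as a negative."""
    positives = interventions[mp]
    negatives = [doc for other, docs in interventions.items() if other != mp
                 for doc in docs]
    return NaiveBayes().fit(positives, negatives)


# Toy data, invented for illustration: MP name -> list of speech texts.
interventions = {
    "mp_a": ["water irrigation drought farmers", "agriculture subsidies farmers"],
    "mp_b": ["school teachers education budget", "university students education"],
}
clf = train_baseline(interventions, "mp_a")
print(clf.is_relevant("drought aid for farmers"))  # True
print(clf.is_relevant("education for students"))   # False
```

The weak point of this baseline is the negative set. Another MP's speeches are not truly irrelevant to `mp_a`, since two MPs may share interests. That gap is what the positive unlabeled formulation addresses.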
Abstract: In today's information age, we are interested not only in accessing multimedia objects such as documents and videos, but also in searching for professional experts, people or celebrities, whether for professional needs or just for fun. Information access systems need to be able to extract and exploit various sources of information (usually in text format) about such individuals, and to represent them suitably, usually in the form of a profile. In this article, we tackle the problems of profile-based expert recommendation and document filtering from a machine learning perspective, clustering the textual sources associated with each expert to build profiles that capture the different hidden topics in which the expert is interested. Each expert is then represented by a multi-faceted profile. Our experiments show that this is a valid technique for improving the performance of expert finding and document filtering.
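The idea of a multi-faceted profile can be sketched as follows, assuming a simple setup. One expert's documents are clustered, each cluster centroid becomes one facet (subprofile), and a new document is scored against the expert by its best-matching facet. The abstract does not specify the clustering algorithm, the document representation or the matching rule, so the cosine-based k-means with naive first-k initialisation, the raw term-frequency vectors, the max-over-facets score, the toy documents and all function names here are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    # Hypothetical representation: raw term frequencies after lowercase/whitespace split.
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()}


def build_subprofiles(docs, k, iterations=10):
    """Cluster one expert's documents with a cosine-based k-means;
    return one centroid (facet) per cluster."""
    vectors = [tf_vector(d) for d in docs]
    centroids = [dict(v) for v in vectors[:k]]  # naive deterministic init: first k docs
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for vec in vectors:
            best = max(range(k), key=lambda i: cosine(vec, centroids[i]))
            clusters[best].append(vec)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def score(doc, subprofiles):
    """Relevance of a document to an expert = best match over the expert's facets."""
    vec = tf_vector(doc)
    return max(cosine(vec, p) for p in subprofiles)


# Toy documents for one hypothetical expert with two interests.
docs = ["fishing quotas fleet", "museum heritage culture",
        "fishing ports fleet", "culture festival heritage"]
subprofiles = build_subprofiles(docs, k=2)
flat_profile = build_subprofiles(docs, k=1)  # k=1 gives a single, non-faceted profile
print(round(score("fishing fleet", subprofiles), 2))   # 0.89
print(round(score("fishing fleet", flat_profile), 2))  # 0.63
```

On this toy data, the two-facet profile matches the fishing query more strongly than a single averaged profile does, because the flat centroid dilutes the fishing terms with the unrelated culture terms.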
Abstract: A common task in many political institutions (e.g. Parliament) is to find politicians who are experts in a particular field. To tackle this problem, the first step is to obtain politician profiles that include their interests, and these can be learned automatically from their speeches. As a politician may have several areas of expertise, one alternative is to use a set of subprofiles, each covering a different subject. In this study, we propose a novel approach for this task that uses latent Dirichlet allocation (LDA) to determine the main underlying topics of each political speech and to distribute the related terms among the different topic-based subprofiles. To this end, we propose the use of fifteen distance and similarity measures to automatically determine the optimal number of topics discussed in a document, and we show that all of these measures converge into five strategies: Euclidean, Dice, Sorensen, Cosine and Overlap. Our experimental results showed that the proposed strategies tended to score higher than the baselines on the different accuracy metrics for expert recommendation tasks, and that using an appropriate number of topics proved relevant.
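The following is a minimal sketch of the five named measures applied to LDA topic distributions. It also shows one way a similarity measure could pick how many topics a speech covers. The formulas are common textbook forms of Euclidean distance, Sørensen (Bray-Curtis) distance, Dice, Cosine and a weighted overlap coefficient, and the paper's exact definitions may differ. The selection rule (`choose_num_topics` with `truncate`), the 0.9 threshold and the toy distribution `theta` are hypothetical illustrations, not the authors' procedure. Distances are converted to similarities before use.

```python
import math

# Assumed textbook forms; p and q are non-negative vectors of equal length
# (e.g. LDA topic distributions of a speech).

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sorensen(p, q):
    # Sørensen (Bray-Curtis) distance: sum|p - q| / sum(p + q)
    return sum(abs(a - b) for a, b in zip(p, q)) / sum(a + b for a, b in zip(p, q))

def dice(p, q):
    # Dice similarity, inner-product form: 2<p,q> / (|p|^2 + |q|^2)
    dot = sum(a * b for a, b in zip(p, q))
    return 2 * dot / (sum(a * a for a in p) + sum(b * b for b in q))

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))

def overlap(p, q):
    # Weighted overlap coefficient: sum(min(p, q)) / min(sum p, sum q)
    return sum(min(a, b) for a, b in zip(p, q)) / min(sum(p), sum(q))

def truncate(theta, n):
    """Hypothetical helper: keep the n most probable topics (renormalised), zero the rest."""
    top = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:n]
    mass = sum(theta[i] for i in top)
    return [theta[i] / mass if i in top else 0.0 for i in range(len(theta))]

def choose_num_topics(theta, similarity, threshold=0.9):
    """Hypothetical rule: smallest n whose truncated distribution is similar enough to theta."""
    for n in range(1, len(theta) + 1):
        if similarity(theta, truncate(theta, n)) >= threshold:
            return n
    return len(theta)


theta = [0.55, 0.35, 0.05, 0.05]  # toy topic distribution of one speech
print(choose_num_topics(theta, cosine))  # 2
print(choose_num_topics(theta, dice))    # 2
# Euclidean is a distance, so map it to a similarity first.
print(choose_num_topics(theta, lambda p, q: 1 / (1 + euclidean(p, q))))  # 2
```

Under this toy rule, the speech would be split across its two dominant topics. The two minor topics are dropped before the related terms are distributed among subprofiles.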