Abstract:Named Entity Recognition (NER) is a Natural Language Processing technique for extracting information from textual documents. However, much of the existing research on NER has been centered around English-language documents, leaving a gap in the availability of datasets tailored to the financial domain in Portuguese. This study addresses the need for NER within the financial domain, focusing on Portuguese-language texts extracted from earnings call transcriptions of Brazilian banks. By curating a comprehensive dataset comprising 384 transcriptions and leveraging weak supervision techniques for annotation, we evaluate the performance of monolingual models trained on Portuguese (BERTimbau and PTT5) and multilingual models (mBERT and mT5). Notably, we introduce a novel approach that reframes the token classification task as a text generation problem, enabling fine-tuning and evaluation of T5 models. Following the fine-tuning of the models, we conduct an evaluation on the test dataset, employing performance and error metrics. Our findings reveal that BERT-based models consistently outperform T5-based models. Furthermore, while the multilingual models exhibit comparable macro F1-scores, BERTimbau demonstrates superior performance over PTT5. A manual analysis of sentences generated by PTT5 and mT5 unveils a degree of similarity ranging from 0.89 to 1.0, between the original and generated sentences. However, critical errors emerge as both models exhibit discrepancies, such as alterations to monetary and percentage values, underscoring the importance of accuracy and consistency in the financial domain. Despite these challenges, PTT5 and mT5 achieve impressive macro F1-scores of 98.52% and 98.85%, respectively, with our proposed approach. Furthermore, our study sheds light on notable disparities in memory and time consumption for inference across the models.
Abstract:Most community detection algorithms from the literature work as optimization tools that minimize a given \textit{fitness function}, while assuming that each node belongs to a single community. Since there is no hard concept of what a community is, most proposed fitness functions focus on a particular definition. As such, these functions do not always lead to partitions that correspond to those observed in practice. This paper proposes a new flexible fitness function that allows the identification of communities with distinct characteristics. Such flexibility was evaluated through the adoption of an immune-inspired optimization algorithm, named cob-aiNet[C], to identify both disjoint and overlapping communities in a set of benchmark networks. The results have shown that the obtained partitions are much closer to the ground-truth than those obtained by the optimization of the modularity function.