Armin Catovic

Schibsted Media Group

CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification

Jun 18, 2023

Lele Cao, Vilhelm von Ehrenheim, Mark Granroth-Wilding, Richard Anselmo Stahl, Andrew McCornack, Armin Catovic, Dhiana Deva Cavacanti Rocha

Abstract: In the investment industry, it is often essential to carry out fine-grained company similarity quantification for a range of purposes, including market mapping, competitor analysis, and mergers and acquisitions. We propose and publish a knowledge graph, named CompanyKG, to represent and learn diverse company features and relations. Specifically, 1.17 million companies are represented as nodes enriched with company description embeddings; and 15 different inter-company relations result in 51.06 million weighted edges. To enable a comprehensive assessment of methods for company similarity quantification, we have devised and compiled three evaluation tasks with annotated test sets: similarity prediction, competitor retrieval and similarity ranking. We present extensive benchmarking results for 11 reproducible predictive methods categorized into three groups: node-only, edge-only, and node+edge. To the best of our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset originating from a real-world investment platform, tailored for quantifying inter-company similarity.

* Paper (11 pages, 5 figures and 2 tables) + Appendix (17 pages, 4 figures and 5 tables)
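
The graph structure described in the CompanyKG abstract (nodes carrying description embeddings, typed and weighted edges between companies) can be illustrated with a small toy sketch. This is not the CompanyKG data format or API: the class name `CompanyGraph`, the edge-type label `"ET1"`, the undirected-edge storage, and the use of cosine similarity as a "node-only" score are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CompanyGraph:
    """Toy heterogeneous graph (hypothetical, not the CompanyKG format)."""
    embeddings: dict = field(default_factory=dict)  # node id -> list of floats
    edges: dict = field(default_factory=dict)       # (u, v) -> {edge_type: weight}

    def add_company(self, node_id, embedding):
        self.embeddings[node_id] = list(embedding)

    def add_relation(self, u, v, edge_type, weight):
        # Assumption: edges are undirected, so endpoints are stored in sorted order.
        key = (min(u, v), max(u, v))
        self.edges.setdefault(key, {})[edge_type] = weight

    def edge_weights(self, u, v):
        # All typed weights between two companies (empty dict if unconnected).
        return self.edges.get((min(u, v), max(u, v)), {})

    def cosine_similarity(self, u, v):
        # A "node-only" style score: uses embeddings and ignores edges entirely.
        a, b = self.embeddings[u], self.embeddings[v]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0


# Made-up toy data: three companies with 2-dimensional embeddings.
g = CompanyGraph()
g.add_company(0, [1.0, 0.0])
g.add_company(1, [1.0, 0.0])
g.add_company(2, [0.0, 1.0])
g.add_relation(1, 0, "ET1", 0.5)  # "ET1" is a hypothetical edge-type label

print(g.cosine_similarity(0, 1))  # identical embeddings
print(g.cosine_similarity(0, 2))  # orthogonal embeddings
print(g.edge_weights(0, 1))
```

An edge-only or node+edge method would additionally consult `edge_weights`; how the benchmarked methods combine the two signals is not stated in the abstract and is not modeled here.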

A Scalable and Adaptive System to Infer the Industry Sectors of Companies: Prompt + Model Tuning of Generative Language Models

Jun 05, 2023

Lele Cao, Vilhelm von Ehrenheim, Astrid Berghult, Cecilia Henje, Richard Anselmo Stahl, Joar Wandborg, Sebastian Stan, Armin Catovic, Erik Ferm, Hannes Ingelhag

Abstract: Private Equity (PE) firms operate investment funds by acquiring and managing companies to achieve a high return upon selling. Many PE funds are thematic, meaning investment professionals aim to identify trends by covering as many industry sectors as possible, and picking promising companies within these sectors. Inferring sectors for companies is therefore critical to the success of thematic PE funds. In this work, we standardize the sector framework and discuss the typical challenges; we then introduce our sector inference system addressing these challenges. Specifically, our system is built on a medium-sized generative language model, finetuned with a prompt + model tuning procedure. The deployed model outperforms the common baselines. The system has been serving many PE professionals for over a year, showing great scalability to data volume and adaptability to any change in sector framework and/or annotation.

* Accepted by FinNLP (Financial Technology and Natural Language Processing) @ IJCAI2023 as long paper (8 pages and 8 figures)

Linnaeus: A highly reusable and adaptable ML based log classification pipeline

Mar 11, 2021

Armin Catovic, Carolyn Cartwright, Yasmin Tesfaldet Gebreyesus, Simone Ferlin

Abstract: Logs are a common way to record detailed run-time information in software. As modern software systems evolve in scale and complexity, logs have become indispensable to understanding the internal states of the system. At the same time, however, manually inspecting logs has become impractical. In recent times, there has been more emphasis on statistical and machine learning (ML) based methods for analyzing logs. While the results have shown promise, most of the literature focuses on algorithms and state-of-the-art (SOTA) results, while largely ignoring the practical aspects. In this paper we demonstrate our end-to-end log classification pipeline, Linnaeus. Besides showing the more traditional ML flow, we also demonstrate our solutions for adaptability and re-use, integration towards large scale software development processes, and how we cope with lack of labelled data. We hope Linnaeus can serve as a blueprint for, and inspire the integration of, various ML based solutions in other large scale industrial settings.

* 8 pages, 7 figures; to be included in ICSE/WAIN'21

Traffic Flow Estimation using LTE Radio Frequency Counters and Machine Learning

Jan 22, 2021

Forough Yaghoubi, Armin Catovic, Arthur Gusmao, Jan Pieczkowski, Peter Boros

Abstract: As the demand for vehicles continues to outpace construction of new roads, it becomes imperative that we implement strategies that improve utilization of existing transport infrastructure. Traffic sensors form a crucial part of many such strategies, giving us valuable insights into road utilization. However, due to the cost and lead time associated with installation and maintenance of traffic sensors, municipalities and traffic authorities look toward cheaper and more scalable alternatives. Due to their ubiquitous nature and wide global deployment, cellular networks offer one such alternative. In this paper we present a novel method for traffic flow estimation using standardized LTE/4G radio frequency performance measurement counters. The problem is cast as a supervised regression task using both classical and deep learning methods. We further apply transfer learning to compensate for the fact that many locations lack traffic sensor data that could be used for training. We show that our approach benefits from applying transfer learning to generalize the solution not only in time but also in space (i.e., various parts of the city). The results are very promising and, unlike competing solutions, our approach utilizes aggregate LTE radio frequency counter data that is inherently privacy-preserving, readily available, and scales globally without any additional network impact.

* 9 pages, 5 figures; submitted to ACM SIGCOMM 2021
