Abstract:Bitcoin, launched in 2008 by Satoshi Nakamoto, established a new digital economy where value can be stored and transferred in a fully decentralized manner - alleviating the need for a central authority. This paper introduces a large scale dataset in the form of a transactions graph representing transactions between Bitcoin users along with a set of tasks and baselines. The graph includes 252 million nodes and 785 million edges, covering a time span of nearly 13 years of and 670 million transactions. Each node and edge is timestamped. As for supervised tasks we provide two labeled sets i. a 33,000 nodes based on entity type and ii. nearly 100,000 Bitcoin addresses labeled with an entity name and an entity type. This is the largest publicly available data set of bitcoin transactions designed to facilitate advanced research and exploration in this domain, overcoming the limitations of existing datasets. Various graph neural network models are trained to predict node labels, establishing a baseline for future research. In addition, several use cases are presented to demonstrate the dataset's applicability beyond Bitcoin analysis. Finally, all data and source code is made publicly available to enable reproducibility of the results.
Abstract:This research delves into the intricacies of Bitcoin, a decentralized peer-to-peer network, and its associated blockchain, which records all transactions since its inception. While this ensures integrity and transparency, the transparent nature of Bitcoin potentially compromises users' privacy rights. To address this concern, users have adopted CoinJoin, a method that amalgamates multiple transaction intents into a single, larger transaction to bolster transactional privacy. This process complicates individual transaction tracing and disrupts many established blockchain analysis heuristics. Despite its significance, limited research has been conducted on identifying CoinJoin transactions. Particularly noteworthy are varied CoinJoin implementations such as JoinMarket, Wasabi, and Whirlpool, each presenting distinct challenges due to their unique transaction structures. This study delves deeply into the open-source implementations of these protocols, aiming to develop refined heuristics for identifying their transactions on the blockchain. Our exhaustive analysis covers transactions up to block 760,000, offering a comprehensive insight into CoinJoin transactions and their implications for Bitcoin blockchain analysis.
Abstract:Understanding the semantic of a collection of texts is a challenging task. Topic models are probabilistic models that aims at extracting "topics" from a corpus of documents. This task is particularly difficult when the corpus is composed of short texts, such as posts on social networks. Following several previous research papers, we explore in this paper a set of collected tweets about bitcoin. In this work, we train three topic models and evaluate their output with several scores. We also propose a concrete application of the extracted topics.
Abstract:The study of time series has motivated many researchers, particularly on the area of multivariate-analysis. The study of co-movements and dependency between random variables leads us to develop metrics to describe existing connection between assets. The most commonly used are correlation and causality. Despite the growing literature, some connections remained still undetected. The objective of this paper is to propose a new representation learning algorithm capable to integrate synchronous and asynchronous relationships.