Abstract:Context: Blockchain technology has drawn growing attention in the literature and in practice. Blockchain technology generates considerable amounts of data and has thus been a topic of interest for Machine Learning (ML). Objective: The objective of this paper is to provide a comprehensive review of the state of the art on machine learning applied to blockchain data. This work aims to systematically identify, analyze, and classify the literature on ML applied to blockchain data. This will allow us to discover the fields where more effort should be placed in future research. Method: A systematic mapping study has been conducted to identify the relevant literature. Ultimately, 159 articles were selected and classified according to various dimensions, specifically, the domain use case, the blockchain, the data, and the machine learning models. Results: The majority of the papers (49.7%) fall within the Anomaly use case. Bitcoin (47.2%) was the blockchain that drew the most attention. A dataset consisting of more than 1.000.000 data points was used by 31.4% of the papers. And Classification (46.5%) was the ML task most applied to blockchain data. Conclusion: The results confirm that ML applied to blockchain data is a relevant and a growing topic of interest both in the literature and in practice. Nevertheless, some open challenges and gaps remain, which can lead to future research directions. Specifically, we identify novel machine learning algorithms, the lack of a standardization framework, blockchain scalability issues and cross-chain interactions as areas worth exploring in the future.