Jiangshan Yu

Large Language Models for Cryptocurrency Transaction Analysis: A Bitcoin Case Study

Jan 30, 2025

Yuchen Lei, Yuexin Xiang, Qin Wang, Rafael Dowsley, Tsz Hon Yuen, Jiangshan Yu

Abstract: Cryptocurrencies are widely used, yet current methods for analyzing transactions heavily rely on opaque, black-box models. These lack interpretability and adaptability, failing to effectively capture behavioral patterns. Many researchers, including us, believe that Large Language Models (LLMs) could bridge this gap due to their robust reasoning abilities for complex tasks. In this paper, we test this hypothesis by applying LLMs to real-world cryptocurrency transaction graphs, specifically within the Bitcoin network. We introduce a three-tiered framework to assess LLM capabilities: foundational metrics, characteristic overview, and contextual interpretation. This includes a new, human-readable graph representation format, LLM4TG, and a connectivity-enhanced sampling algorithm, CETraS, which simplifies larger transaction graphs. Experimental results show that LLMs excel at foundational metrics and offer detailed characteristic overviews. Their effectiveness in contextual interpretation suggests they can provide useful explanations of transaction behaviors, even with limited labeled data.

New Bounds on the Accuracy of Majority Voting for Multi-Class Classification

Sep 18, 2023

Sina Aeeneh, Nikola Zlatanov, Jiangshan Yu

Abstract: Majority voting is a simple mathematical function that returns the value that appears most often in a set. As a popular decision fusion technique, the majority voting function (MVF) finds applications in resolving conflicts, where a number of independent voters report their opinions on a classification problem. Despite its importance and its various applications in ensemble learning, data crowd-sourcing, remote sensing, and data oracles for blockchains, the accuracy of the MVF for the general multi-class classification problem has remained unknown. In this paper, we derive a new upper bound on the accuracy of the MVF for the multi-class classification problem. More specifically, we show that under certain conditions, the error rate of the MVF exponentially decays toward zero as the number of independent voters increases. Conversely, the error rate of the MVF exponentially grows with the number of independent voters if these conditions are not met. We first explore the problem for independent and identically distributed voters, where we assume that every voter follows the same conditional probability distribution of voting for different classes, given the true classification of the data point. Next, we extend our results to the case where the voters are independent but non-identically distributed. Using the derived results, we then discuss the accuracy of truth discovery algorithms. We show that in the best-case scenario, truth discovery algorithms operate as an amplified MVF: they achieve a small error rate only when the MVF achieves a small error rate and, conversely, a large error rate when the MVF achieves a large error rate. In the worst-case scenario, truth discovery algorithms may achieve a higher error rate than the MVF. Finally, we confirm our theoretical results using numerical simulations.
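
The behavior described in this abstract can be illustrated with a small Monte Carlo sketch for independent and identically distributed voters. This is not the authors' simulation code. The function names, the tie-breaking rule, and the toy vote distributions are assumptions made for illustration, and "the true class is the single most likely vote" is used here only as a simple stand-in for the paper's conditions, which the abstract does not spell out.

```python
import random
from collections import Counter

def majority_vote(votes):
    """Return the most frequent label; ties go to the smallest label (an assumption)."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)

def mvf_error_rate(probs, true_label, n_voters, trials=2000, seed=0):
    """Estimate how often the majority vote differs from the true label.

    probs[k] is the probability that a single voter reports class k,
    given that the true class is true_label (i.i.d. voters).
    """
    rng = random.Random(seed)
    labels = list(range(len(probs)))
    errors = 0
    for _ in range(trials):
        votes = rng.choices(labels, weights=probs, k=n_voters)
        if majority_vote(votes) != true_label:
            errors += 1
    return errors / trials

if __name__ == "__main__":
    # Toy case 1: the true class (0) is the most likely vote.
    # Toy case 2: a wrong class (1) is the most likely vote.
    for probs in ([0.5, 0.3, 0.2], [0.3, 0.5, 0.2]):
        for n in (1, 5, 25, 101):
            print(probs, n, mvf_error_rate(probs, true_label=0, n_voters=n))
```

In the first toy distribution the estimated error shrinks as voters are added, and in the second it grows, which matches the two regimes the abstract describes.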

TxAllo: Dynamic Transaction Allocation in Sharded Blockchain Systems

Dec 22, 2022

Yuanzhe Zhang, Shirui Pan, Jiangshan Yu

Abstract: The scalability problem has been one of the most significant barriers limiting the adoption of blockchains. Blockchain sharding is a promising approach to this problem. However, the sharding mechanism introduces a significant number of cross-shard transactions, which are expensive to process. This paper focuses on the transaction allocation problem to reduce the number of cross-shard transactions for better scalability. In particular, we systematically formulate the transaction allocation problem and convert it to the community detection problem on a graph. A deterministic and fast allocation scheme, TxAllo, is proposed to dynamically infer the allocation of accounts and their associated transactions. It directly optimizes the system throughput, considering both the number of cross-shard transactions and the workload balance among shards. We evaluate the performance of TxAllo on an Ethereum dataset containing over 91 million transactions. Our evaluation results show that for a blockchain with 60 shards, TxAllo reduces the cross-shard transaction ratio from 98% (with traditional hash-based allocation) to about 12%, while the workload balance is well maintained. Compared with other methods, the execution time of TxAllo is almost negligible. For example, when updating the allocation every hour, the execution of TxAllo takes only 0.5 seconds on average, whereas other concurrent works, such as BrokerChain (INFOCOM'22), which leverages the classic METIS method, require 422 seconds.

* Accepted by IEEE ICDE 2023
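
The 98% figure for hash-based allocation can be sanity-checked with a short sketch. It assumes that every transaction touches exactly two distinct accounts and that the hash spreads accounts uniformly over shards, so the chance that both accounts share a shard is about 1/60 and the expected cross-shard ratio is 1 - 1/60 ≈ 0.983. The account names, helper functions, and the use of SHA-256 are hypothetical choices for illustration. This is not TxAllo itself and does not use the paper's Ethereum dataset.

```python
import hashlib
import random

def shard_of(account, n_shards):
    """Hash-based allocation: map an account ID to a shard index."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

def cross_shard_ratio(transactions, allocation):
    """Fraction of (sender, receiver) pairs whose accounts sit in different shards."""
    cross = sum(1 for sender, receiver in transactions
                if allocation[sender] != allocation[receiver])
    return cross / len(transactions)

def expected_hash_ratio(n_shards):
    """Expected ratio when two distinct accounts land in uniformly random shards."""
    return 1.0 - 1.0 / n_shards

def simulate_hash_allocation(n_accounts, n_transactions, n_shards, seed=0):
    """Random two-account transactions over synthetic accounts (toy data only)."""
    rng = random.Random(seed)
    accounts = [f"account-{i}" for i in range(n_accounts)]
    allocation = {a: shard_of(a, n_shards) for a in accounts}
    transactions = [tuple(rng.sample(accounts, 2)) for _ in range(n_transactions)]
    return cross_shard_ratio(transactions, allocation)

if __name__ == "__main__":
    print("expected:", expected_hash_ratio(60))
    print("simulated:", simulate_hash_allocation(2000, 5000, 60))
```

Any allocation that places frequently interacting accounts in the same shard lowers the value returned by `cross_shard_ratio`, which is the quantity a community-detection approach targets.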

Semantic Code Search for Smart Contracts

Nov 28, 2021

Chaochen Shi, Yong Xiang, Jiangshan Yu, Longxiang Gao

Abstract: Semantic code search technology allows searching for existing code snippets through natural language, which can greatly improve programming efficiency. Smart contracts, programs that run on the blockchain, have a code reuse rate of more than 90%, which means developers have a great demand for semantic code search tools. However, existing code search models still have a semantic gap between code and query, and perform poorly on specialized queries for smart contracts. In this paper, we propose a Multi-Modal Smart contract Code Search (MM-SCS) model. Specifically, we construct a Contract Elements Dependency Graph (CEDG) for MM-SCS as an additional modality to capture the data-flow and control-flow information of the code. To make the model focus more on the key contextual information, we use a multi-head attention network to generate embeddings for code features. In addition, we use a fine-tuned pretrained model to ensure the model's effectiveness when the training data is small. We compared MM-SCS with four state-of-the-art models on a dataset with 470K (code, docstring) pairs collected from GitHub and Etherscan. Experimental results show that MM-SCS achieves an MRR (Mean Reciprocal Rank) of 0.572, outperforming four state-of-the-art models, UNIF, DeepCS, CARLCS-CNN, and TAB-CS, by 34.2%, 59.3%, 36.8%, and 14.1%, respectively. Additionally, the search speed of MM-SCS is second only to UNIF, reaching 0.34s/query.
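
For readers unfamiliar with the metric, Mean Reciprocal Rank averages 1/rank of the first correct result over all queries. The sketch below uses hypothetical helper names and made-up snippet identifiers, and it scores a query with no correct result as 0, which is a common convention and an assumption here rather than something the abstract specifies.

```python
def first_hit_rank(results, relevant):
    """1-based position of the relevant item in a ranked result list, or None."""
    for position, item in enumerate(results, start=1):
        if item == relevant:
            return position
    return None

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over queries; a missing hit (None) contributes 0."""
    return sum((1.0 / r) if r is not None else 0.0 for r in ranks) / len(ranks)

if __name__ == "__main__":
    # Three toy queries whose correct snippets appear at ranks 1, 2 and 4.
    ranked_lists = [
        (["s1", "s7", "s3"], "s1"),
        (["s4", "s2", "s9"], "s2"),
        (["s5", "s6", "s8", "s0"], "s0"),
    ]
    ranks = [first_hit_rank(results, relevant) for results, relevant in ranked_lists]
    print(ranks, mean_reciprocal_rank(ranks))
```

For the toy ranks 1, 2 and 4 the score is (1 + 0.5 + 0.25) / 3 ≈ 0.583.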

How to Democratise and Protect AI: Fair and Differentially Private Decentralised Deep Learning

Jul 18, 2020

Lingjuan Lyu, Yitong Li, Karthik Nandakumar, Jiangshan Yu, Xingjun Ma

Abstract: This paper is the first to consider the research problem of fairness in collaborative deep learning while ensuring privacy. A novel reputation system is proposed through digital tokens and local credibility to ensure fairness, in combination with differential privacy to guarantee privacy. In particular, we build a fair and differentially private decentralised deep learning framework called FDPDDL, which enables parties to derive more accurate local models in a fair and private manner by using our two-stage scheme. During the initialisation stage, artificial samples generated by a Differentially Private Generative Adversarial Network (DPGAN) are used to mutually benchmark the local credibility of each party and generate initial tokens. During the update stage, Differentially Private SGD (DPSGD) is used to facilitate collaborative privacy-preserving deep learning, and the local credibility and tokens of each party are updated according to the quality and quantity of individually released gradients. Experimental results on benchmark datasets under three realistic settings demonstrate that FDPDDL achieves high fairness, yields comparable accuracy to the centralised and distributed frameworks, and delivers better accuracy than the standalone framework.

* Accepted for publication in TDSC
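
The update stage relies on Differentially Private SGD. As background, a generic DP-SGD gradient step clips each per-example gradient to a maximum L2 norm, sums the clipped gradients, adds Gaussian noise scaled to that norm, and averages. The sketch below shows only that generic step in plain Python. It is not the FDPDDL implementation: the function names and parameters are hypothetical, and it omits privacy accounting as well as the credibility and token updates.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * max_norm per coordinate.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]

if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5], [-1.0, 0.2]]  # toy per-example gradients
    print(dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single example can move the update, which is what lets the added noise mask individual contributions.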

Towards Fair and Decentralized Privacy-Preserving Deep Learning with Blockchain

Jun 04, 2019

Lingjuan Lyu, Jiangshan Yu, Karthik Nandakumar, Yitong Li, Xingjun Ma, Jiong Jin

Abstract: In collaborative deep learning, current learning frameworks follow either a centralized architecture or a distributed architecture. Whereas the centralized architecture deploys a central server to train a global model over the massive amount of joint data from all parties, the distributed architecture aggregates parameter updates from participating parties' local model training via a parameter server. These two server-based architectures present security and robustness vulnerabilities such as single-point-of-failure, single-point-of-breach, privacy leakage, and lack of fairness. To address these problems, we design, implement, and evaluate a purely decentralized privacy-preserving deep learning framework, called DPPDL. DPPDL is the first to investigate the research problem of fairness in collaborative deep learning, and simultaneously provides fairness and privacy by proposing two novel algorithms: initial benchmarking and privacy-preserving collaborative deep learning. During initial benchmarking, each party trains a local Differentially Private Generative Adversarial Network (DPGAN) and publishes the generated privacy-preserving artificial samples for other parties to label; the quality of the returned labels is then used to initialize a local credibility list for the other parties. The local credibility list reflects how much one party contributes to another party, and it is used and updated during collaborative learning to ensure fairness. To protect gradient transactions during privacy-preserving collaborative deep learning, we further put forward a three-layer onion-style encryption scheme. We experimentally demonstrate, on benchmark image datasets, that accuracy, privacy, and fairness in collaborative deep learning can be effectively addressed at the same time by our proposed DPPDL framework. Moreover, DPPDL provides a viable solution to detect and isolate the cheating party in the system.

* 13 pages, 5 figures, 6 tables
