Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bo Gao

AiRacleX: Automated Detection of Price Oracle Manipulations via LLM-Driven Knowledge Mining and Prompt Generation

Feb 10, 2025

Bo Gao, Yuan Wang, Qingsong Wei, Yong Liu, Rick Siow Mong Goh

Figure 1 for AiRacleX: Automated Detection of Price Oracle Manipulations via LLM-Driven Knowledge Mining and Prompt Generation

Figure 2 for AiRacleX: Automated Detection of Price Oracle Manipulations via LLM-Driven Knowledge Mining and Prompt Generation

Figure 3 for AiRacleX: Automated Detection of Price Oracle Manipulations via LLM-Driven Knowledge Mining and Prompt Generation

Figure 4 for AiRacleX: Automated Detection of Price Oracle Manipulations via LLM-Driven Knowledge Mining and Prompt Generation

Abstract:Decentralized finance applications depend on accurate price oracles to ensure secure transactions, yet these oracles are highly vulnerable to manipulation, enabling attackers to exploit smart contract vulnerabilities for unfair asset valuation and financial gain. Detecting such manipulations traditionally relies on the manual effort of experienced experts, presenting significant challenges. In this paper, we propose a novel LLM-driven framework that automates the detection of price oracle manipulations by leveraging the complementary strengths of different LLM models. Our approach begins with domain-specific knowledge extraction, where an LLM model synthesizes precise insights about price oracle vulnerabilities from top-tier academic papers, eliminating the need for profound expertise from developers or auditors. This knowledge forms the foundation for a second LLM model to generate structured, context-aware chain of thought prompts, which guide a third LLM model in accurately identifying manipulation patterns in smart contracts. We validate the framework effectiveness through experiments on 60 known vulnerabilities from 46 real-world DeFi attacks or projects spanning 2021 to 2023. The best performing combination of LLMs (Haiku-Haiku-4o-mini) identified by AiRacleX demonstrate a 2.58-times improvement in recall (0.667 vs 0.259) compared to the state-of-the-art tool GPTScan, while maintaining comparable precision. Furthermore, our framework demonstrates the feasibility of replacing commercial models with open-source alternatives, enhancing privacy and security for developers.

Via

Access Paper or Ask Questions

Beyond Model Scale Limits: End-Edge-Cloud Federated Learning with Self-Rectified Knowledge Agglomeration

Jan 01, 2025

Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Ke Xu, Quyang Pan, Bo Gao, Tian Wen

Figure 1 for Beyond Model Scale Limits: End-Edge-Cloud Federated Learning with Self-Rectified Knowledge Agglomeration

Figure 2 for Beyond Model Scale Limits: End-Edge-Cloud Federated Learning with Self-Rectified Knowledge Agglomeration

Figure 3 for Beyond Model Scale Limits: End-Edge-Cloud Federated Learning with Self-Rectified Knowledge Agglomeration

Figure 4 for Beyond Model Scale Limits: End-Edge-Cloud Federated Learning with Self-Rectified Knowledge Agglomeration

Abstract:The rise of End-Edge-Cloud Collaboration (EECC) offers a promising paradigm for Artificial Intelligence (AI) model training across end devices, edge servers, and cloud data centers, providing enhanced reliability and reduced latency. Hierarchical Federated Learning (HFL) can benefit from this paradigm by enabling multi-tier model aggregation across distributed computing nodes. However, the potential of HFL is significantly constrained by the inherent heterogeneity and dynamic characteristics of EECC environments. Specifically, the uniform model structure bounded by the least powerful end device across all computing nodes imposes a performance bottleneck. Meanwhile, coupled heterogeneity in data distributions and resource capabilities across tiers disrupts hierarchical knowledge transfer, leading to biased updates and degraded performance. Furthermore, the mobility and fluctuating connectivity of computing nodes in EECC environments introduce complexities in dynamic node migration, further compromising the robustness of the training process. To address multiple challenges within a unified framework, we propose End-Edge-Cloud Federated Learning with Self-Rectified Knowledge Agglomeration (FedEEC), which is a novel EECC-empowered FL framework that allows the trained models from end, edge, to cloud to grow larger in size and stronger in generalization ability. FedEEC introduces two key innovations: (1) Bridge Sample Based Online Distillation Protocol (BSBODP), which enables knowledge transfer between neighboring nodes through generated bridge samples, and (2) Self-Knowledge Rectification (SKR), which refines the transferred knowledge to prevent suboptimal cloud model optimization. The proposed framework effectively handles both cross-tier resource heterogeneity and effective knowledge transfer between neighboring nodes, while satisfying the migration-resilient requirements of EECC.

* 16 pages, 7 tables, 6 figures. arXiv admin note: text overlap with arXiv:2312.11489

Via

Access Paper or Ask Questions

Blockchain Data Analysis in the Era of Large-Language Models

Dec 09, 2024

Kentaroh Toyoda, Xiao Wang, Mingzhe Li, Bo Gao, Yuan Wang, Qingsong Wei

Figure 1 for Blockchain Data Analysis in the Era of Large-Language Models

Figure 2 for Blockchain Data Analysis in the Era of Large-Language Models

Figure 3 for Blockchain Data Analysis in the Era of Large-Language Models

Figure 4 for Blockchain Data Analysis in the Era of Large-Language Models

Abstract:Blockchain data analysis is essential for deriving insights, tracking transactions, identifying patterns, and ensuring the integrity and security of decentralized networks. It plays a key role in various areas, such as fraud detection, regulatory compliance, smart contract auditing, and decentralized finance (DeFi) risk management. However, existing blockchain data analysis tools face challenges, including data scarcity, the lack of generalizability, and the lack of reasoning capability. We believe large language models (LLMs) can mitigate these challenges; however, we have not seen papers discussing LLM integration in blockchain data analysis in a comprehensive and systematic way. This paper systematically explores potential techniques and design patterns in LLM-integrated blockchain data analysis. We also outline prospective research opportunities and challenges, emphasizing the need for further exploration in this promising field. This paper aims to benefit a diverse audience spanning academia, industry, and policy-making, offering valuable insights into the integration of LLMs in blockchain data analysis.

Via

Access Paper or Ask Questions

DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability

Nov 04, 2024

Bo Gao, Fangxu Xing, Daniel Tang

Abstract:Semantic segmentation models, like mask2former, often demand a substantial amount of manually annotated data, which is time-consuming and inefficient to acquire. Leveraging state-of-the-art text-to-image models like Midjourney and Stable Diffusion has emerged as an effective strategy for automatically generating synthetic data instead of human annotations. However, prior approaches have been constrained to synthesizing single-instance images due to the instability inherent in generating multiple instances with Stable Diffusion. To expand the domains and diversity of synthetic datasets, this paper introduces a novel paradigm named DiffuMask-Editor, which combines the Diffusion Model for Segmentation with Image Editing. By integrating multiple objects into images using Text2Image models, our method facilitates the creation of more realistic datasets that closely resemble open-world settings while simultaneously generating accurate masks. Our approach significantly reduces the laborious effort associated with manual annotation while ensuring precise mask generation. Experimental results demonstrate that synthetic data generated by DiffuMask-Editor enable segmentation methods to achieve superior performance compared to real data. Particularly in zero-shot backgrounds, DiffuMask-Editor achieves new state-of-the-art results on Unseen classes of VOC 2012. The code and models will be publicly available soon.

* 13 pages,4 figures

Via

Access Paper or Ask Questions

Peak-Controlled Logits Poisoning Attack in Federated Distillation

Jul 25, 2024

Yuhan Tang, Aoxu Zhang, Zhiyuan Wu, Bo Gao, Tian Wen, Yuwei Wang, Sheng Sun

Abstract:Federated Distillation (FD) offers an innovative approach to distributed machine learning, leveraging knowledge distillation for efficient and flexible cross-device knowledge transfer without necessitating the upload of extensive model parameters to a central server. While FD has gained popularity, its vulnerability to poisoning attacks remains underexplored. To address this gap, we previously introduced FDLA (Federated Distillation Logits Attack), a method that manipulates logits communication to mislead and degrade the performance of client models. However, the impact of FDLA on participants with different identities and the effects of malicious modifications at various stages of knowledge transfer remain unexplored. To this end, we present PCFDLA (Peak-Controlled Federated Distillation Logits Attack), an advanced and more stealthy logits poisoning attack method for FD. PCFDLA enhances the effectiveness of FDLA by carefully controlling the peak values of logits to create highly misleading yet inconspicuous modifications. Furthermore, we introduce a novel metric for better evaluating attack efficacy, demonstrating that PCFDLA maintains stealth while being significantly more disruptive to victim models compared to its predecessors. Experimental results across various datasets confirm the superior impact of PCFDLA on model accuracy, solidifying its potential threat in federated distillation systems.

* arXiv admin note: text overlap with arXiv:2401.03685

Via

Access Paper or Ask Questions

FedCache 2.0: Exploiting the Potential of Distilled Data in Knowledge Cache-driven Federated Learning

May 22, 2024

Quyang Pan, Sheng Sun, Zhiyuan Wu, Yuwei Wang, Min Liu, Bo Gao

Abstract:Federated Edge Learning (FEL) has emerged as a promising approach for enabling edge devices to collaboratively train machine learning models while preserving data privacy. Despite its advantages, practical FEL deployment faces significant challenges related to device constraints and device-server interactions, necessitating heterogeneous, user-adaptive model training with limited and uncertain communication. In this paper, we introduce FedCache 2.0, a novel personalized FEL architecture that simultaneously addresses these challenges. FedCache 2.0 incorporates the benefits of both dataset distillation and knowledge cache-driven federated learning by storing and organizing distilled data as knowledge in the server-side knowledge cache. Moreover, a device-centric cache sampling strategy is introduced to tailor transferred knowledge for individual devices within controlled communication bandwidth. Extensive experiments on five datasets covering image recognition, audio understanding, and mobile sensor data mining tasks demonstrate that (1) FedCache 2.0 significantly outperforms state-of-the-art methods regardless of model structures, data distributions, and modalities. (2) FedCache 2.0 can train splendid personalized on-device models with at least $\times$28.6 improvement in communication efficiency.

* 20 pages, 8 figures, 10 tables

Via

Access Paper or Ask Questions

Privacy-Enhanced Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

Apr 27, 2024

Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Tianliu He, Wen Wang

Figure 1 for Privacy-Enhanced Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

Figure 2 for Privacy-Enhanced Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

Figure 3 for Privacy-Enhanced Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

Abstract:On-device intelligence (ODI) enables artificial intelligence (AI) applications to run on end devices, providing real-time and customized AI inference without relying on remote servers. However, training models for on-device deployment face significant challenges due to the decentralized and privacy-sensitive nature of users' data, along with end-side constraints related to network connectivity, computation efficiency, etc. Existing training paradigms, such as cloud-based training, federated learning, and transfer learning, fail to sufficiently address these practical constraints that are prevalent for devices. To overcome these challenges, we propose Privacy-Enhanced Training-as-a-Service (PTaaS), a novel service computing paradigm that provides privacy-friendly, customized AI model training for end devices. PTaaS outsources the core training process to remote and powerful cloud or edge servers, efficiently developing customized on-device models based on uploaded anonymous queries, enhancing data privacy while reducing the computation load on individual devices. We explore the definition, goals, and design principles of PTaaS, alongside emerging technologies that support the PTaaS paradigm. An architectural scheme for PTaaS is also presented, followed by a series of open problems that set the stage for future research directions in the field of PTaaS.

* 7 pages, 3 figures

Via

Access Paper or Ask Questions

Privacy-Preserving Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

Apr 16, 2024

Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Tianliu He, Wen Wang

Figure 1 for Privacy-Preserving Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

Figure 2 for Privacy-Preserving Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

Figure 3 for Privacy-Preserving Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

Abstract:On-device intelligence (ODI) enables artificial intelligence (AI) applications to run on end devices, providing real-time and customized AI services without relying on remote servers. However, training models for on-device deployment face significant challenges due to the decentralized and privacy-sensitive nature of users' data, along with end-side constraints related to network connectivity, computation efficiency, etc. Existing training paradigms, such as cloud-based training, federated learning, and transfer learning, fail to sufficiently address these practical constraints that are prevalent for devices. To overcome these challenges, we propose Privacy-Preserving Training-as-a-Service (PTaaS), a novel service computing paradigm that provides privacy-friendly, customized AI model training for end devices. PTaaS outsources the core training process to remote and powerful cloud or edge servers, efficiently developing customized on-device models based on uploaded anonymous queries, ensuring data privacy while reducing the computation load on individual devices. We explore the definition, goals, and design principles of PTaaS, alongside emerging technologies that support the PTaaS paradigm. An architectural scheme for PTaaS is also presented, followed by a series of open problems that set the stage for future research directions in the field of PTaaS.

* 7 pages, 3 figures

Via

Access Paper or Ask Questions

Federated Class-Incremental Learning with New-Class Augmented Self-Distillation

Jan 09, 2024

Zhiyuan Wu, Tianliu He, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Xuefeng Jiang

Abstract:Federated Learning (FL) enables collaborative model training among participants while guaranteeing the privacy of raw data. Mainstream FL methodologies overlook the dynamic nature of real-world data, particularly its tendency to grow in volume and diversify in classes over time. This oversight results in FL methods suffering from catastrophic forgetting, where the trained models inadvertently discard previously learned information upon assimilating new data. In response to this challenge, we propose a novel Federated Class-Incremental Learning (FCIL) method, named \underline{Fed}erated \underline{C}lass-Incremental \underline{L}earning with New-Class \underline{A}ugmented \underline{S}elf-Di\underline{S}tillation (FedCLASS). The core of FedCLASS is to enrich the class scores of historical models with new class scores predicted by current models and utilize the combined knowledge for self-distillation, enabling a more sufficient and precise knowledge transfer from historical models to current models. Theoretical analyses demonstrate that FedCLASS stands on reliable foundations, considering scores of old classes predicted by historical models as conditional probabilities in the absence of new classes, and the scores of new classes predicted by current models as the conditional probabilities of class scores derived from historical models. Empirical experiments demonstrate the superiority of FedCLASS over four baseline algorithms in reducing average forgetting rate and boosting global accuracy.

* 9 pages, 2 figures, 4 tables

Via

Access Paper or Ask Questions

Logits Poisoning Attack in Federated Distillation

Jan 08, 2024

Yuhan Tang, Zhiyuan Wu, Bo Gao, Tian Wen, Yuwei Wang, Sheng Sun

Abstract:Federated Distillation (FD) is a novel and promising distributed machine learning paradigm, where knowledge distillation is leveraged to facilitate a more efficient and flexible cross-device knowledge transfer in federated learning. By optimizing local models with knowledge distillation, FD circumvents the necessity of uploading large-scale model parameters to the central server, simultaneously preserving the raw data on local clients. Despite the growing popularity of FD, there is a noticeable gap in previous works concerning the exploration of poisoning attacks within this framework. This can lead to a scant understanding of the vulnerabilities to potential adversarial actions. To this end, we introduce FDLA, a poisoning attack method tailored for FD. FDLA manipulates logit communications in FD, aiming to significantly degrade model performance on clients through misleading the discrimination of private samples. Through extensive simulation experiments across a variety of datasets, attack scenarios, and FD configurations, we demonstrate that LPA effectively compromises client model accuracy, outperforming established baseline algorithms in this regard. Our findings underscore the critical need for robust defense mechanisms in FD settings to mitigate such adversarial threats.

* 13 pages, 3 figures, 5 tables

Via

Access Paper or Ask Questions