Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Aydin Abadi

Impact of Data Duplication on Deep Neural Network-Based Image Classifiers: Robust vs. Standard Models

Apr 01, 2025

Alireza Aghabagherloo, Aydin Abadi, Sumanta Sarkar, Vishnu Asutosh Dasu, Bart Preneel

Abstract:The accuracy and robustness of machine learning models against adversarial attacks are significantly influenced by factors such as training data quality, model architecture, the training process, and the deployment environment. In recent years, duplicated data in training sets, especially in language models, has attracted considerable attention. It has been shown that deduplication enhances both training performance and model accuracy in language models. While the importance of data quality in training image classifier Deep Neural Networks (DNNs) is widely recognized, the impact of duplicated images in the training set on model generalization and performance has received little attention. In this paper, we address this gap and provide a comprehensive study on the effect of duplicates in image classification. Our analysis indicates that the presence of duplicated images in the training set not only negatively affects the efficiency of model training but also may result in lower accuracy of the image classifier. This negative impact of duplication on accuracy is particularly evident when duplicated data is non-uniform across classes or when duplication, whether uniform or non-uniform, occurs in the training set of an adversarially trained model. Even when duplicated samples are selected in a uniform way, increasing the amount of duplication does not lead to a significant improvement in accuracy.

Via

Access Paper or Ask Questions

Verifiable Homomorphic Linear Combinations in Multi-Instance Time-Lock Puzzles

Aug 22, 2024

Aydin Abadi

Abstract:Time-Lock Puzzles (TLPs) have been developed to securely transmit sensitive information into the future without relying on a trusted third party. Multi-instance TLP is a scalable variant of TLP that enables a server to efficiently find solutions to different puzzles provided by a client at once. Nevertheless, existing multi-instance TLPs lack support for (verifiable) homomorphic computation. To address this limitation, we introduce the "Multi-Instance partially Homomorphic TLP" (MH-TLP), a multi-instance TLP supporting efficient verifiable homomorphic linear combinations of puzzles belonging to a client. It ensures anyone can verify the correctness of computations and solutions. Building on MH-TLP, we further propose the "Multi-instance Multi-client verifiable partially Homomorphic TLP" (MMH-TLP). It not only supports all the features of MH-TLP but also allows for verifiable homomorphic linear combinations of puzzles from different clients. Our schemes refrain from using asymmetric-key cryptography for verification and, unlike most homomorphic TLPs, do not require a trusted third party. A comprehensive cost analysis demonstrates that our schemes scale linearly with the number of clients and puzzles.

* arXiv admin note: text overlap with arXiv:2406.15070

Via

Access Paper or Ask Questions

Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models

Jul 11, 2024

Aydin Abadi, Vishnu Asutosh Dasu, Sumanta Sarkar

Abstract:Deduplication is a vital preprocessing step that enhances machine learning model performance and saves training time and energy. However, enhancing federated learning through deduplication poses challenges, especially regarding scalability and potential privacy violations if deduplication involves sharing all clients' data. In this paper, we address the problem of deduplication in a federated setup by introducing a pioneering protocol, Efficient Privacy-Preserving Multi-Party Deduplication (EP-MPD). It efficiently removes duplicates from multiple clients' datasets without compromising data privacy. EP-MPD is constructed in a modular fashion, utilizing two novel variants of the Private Set Intersection protocol. Our extensive experiments demonstrate the significant benefits of deduplication in federated learning of large language models. For instance, we observe up to 19.61% improvement in perplexity and up to 27.95% reduction in running time. EP-MPD effectively balances privacy and performance in federated learning, making it a valuable solution for large-scale applications.

Via

Access Paper or Ask Questions

Tempora-Fusion: Time-Lock Puzzle with Efficient Verifiable Homomorphic Linear Combination

Jun 24, 2024

Aydin Abadi

Figure 1 for Tempora-Fusion: Time-Lock Puzzle with Efficient Verifiable Homomorphic Linear Combination

Figure 2 for Tempora-Fusion: Time-Lock Puzzle with Efficient Verifiable Homomorphic Linear Combination

Figure 3 for Tempora-Fusion: Time-Lock Puzzle with Efficient Verifiable Homomorphic Linear Combination

Figure 4 for Tempora-Fusion: Time-Lock Puzzle with Efficient Verifiable Homomorphic Linear Combination

Abstract:To securely transmit sensitive information into the future, Time-Lock Puzzles (TLPs) have been developed. Their applications include scheduled payments, timed commitments, e-voting, and sealed-bid auctions. Homomorphic TLP is a key variant of TLP that enables computation on puzzles from different clients. This allows a solver/server to tackle only a single puzzle encoding the computation's result. However, existing homomorphic TLPs lack support for verifying the correctness of the computation results. We address this limitation by introducing Tempora-Fusion, a TLP that allows a server to perform homomorphic linear combinations of puzzles from different clients while ensuring verification of computation correctness. This scheme avoids asymmetric-key cryptography for verification, thus paving the way for efficient implementations. We discuss our scheme's application in various domains, such as federated learning, scheduled payments in online banking, and e-voting.

Via

Access Paper or Ask Questions

Supersonic OT: Fast Unconditionally Secure Oblivious Transfer

Jun 21, 2024

Aydin Abadi, Yvo Desmedt

Abstract:Oblivious Transfer (OT) is a fundamental cryptographic protocol with applications in secure Multi-Party Computation, Federated Learning, and Private Set Intersection. With the advent of quantum computing, it is crucial to develop unconditionally secure core primitives like OT to ensure their continued security in the post-quantum era. Despite over four decades since OT's introduction, the literature has predominantly relied on computational assumptions, except in cases using unconventional methods like noisy channels or a fully trusted party. Introducing "Supersonic OT", a highly efficient and unconditionally secure OT scheme that avoids public-key-based primitives, we offer an alternative to traditional approaches. Supersonic OT enables a receiver to obtain a response of size O(1). Its simple (yet non-trivial) design facilitates easy security analysis and implementation. The protocol employs a basic secret-sharing scheme, controlled swaps, the one-time pad, and a third-party helper who may be corrupted by a semi-honest adversary. Our implementation and runtime analysis indicate that a single instance of Supersonic OT completes in 0.35 milliseconds, making it up to 2000 times faster than the state-of-the-art base OT.

* arXiv admin note: text overlap with arXiv:2406.15063

Via

Access Paper or Ask Questions

Starlit: Privacy-Preserving Federated Learning to Enhance Financial Fraud Detection

Jan 22, 2024

Aydin Abadi, Bradley Doyle, Francesco Gini, Kieron Guinamard, Sasi Kumar Murakonda, Jack Liddell, Paul Mellor, Steven J. Murdoch, Mohammad Naseri, Hector Page(+2 more)

Abstract:Federated Learning (FL) is a data-minimization approach enabling collaborative model training across diverse clients with local data, avoiding direct data exchange. However, state-of-the-art FL solutions to identify fraudulent financial transactions exhibit a subset of the following limitations. They (1) lack a formal security definition and proof, (2) assume prior freezing of suspicious customers' accounts by financial institutions (limiting the solutions' adoption), (3) scale poorly, involving either $O(n^2)$ computationally expensive modular exponentiation (where $n$ is the total number of financial institutions) or highly inefficient fully homomorphic encryption, (4) assume the parties have already completed the identity alignment phase, hence excluding it from the implementation, performance evaluation, and security analysis, and (5) struggle to resist clients' dropouts. This work introduces Starlit, a novel scalable privacy-preserving FL mechanism that overcomes these limitations. It has various applications, such as enhancing financial fraud detection, mitigating terrorism, and enhancing digital health. We implemented Starlit and conducted a thorough performance analysis using synthetic data from a key player in global financial transactions. The evaluation indicates Starlit's scalability, efficiency, and accuracy.

Via

Access Paper or Ask Questions