Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Khoa Nguyen

A Landmark-Aided Navigation Approach Using Side-Scan Sonar

Mar 10, 2025

Ellen Davenport, Khoa Nguyen, Junsu Jang, Clair Ma, Sean Fish, Luc Lenain, Florian Meyer

Abstract:Cost-effective localization methods for Autonomous Underwater Vehicle (AUV) navigation are key for ocean monitoring and data collection at high resolution in time and space. Algorithmic solutions suitable for real-time processing that handle nonlinear measurement models and different forms of measurement uncertainty will accelerate the development of field-ready technology. This paper details a Bayesian estimation method for landmark-aided navigation using a Side-scan Sonar (SSS) sensor. The method bounds navigation filter error in the GPS-denied undersea environment and captures the highly nonlinear nature of slant range measurements while remaining computationally tractable. Combining a novel measurement model with the chosen statistical framework facilitates the efficient use of SSS data and, in the future, could be used in real time. The proposed filter has two primary steps: a prediction step using an unscented transform and an update step utilizing particles. The update step performs probabilistic association of sonar detections with known landmarks. We evaluate algorithm performance and tractability using synthetic data and real data collected field experiments. Field experiments were performed using two different marine robotic platforms with two different SSS and at two different sites. Finally, we discuss the computational requirements of the proposed method and how it extends to real-time applications.

* 23 pages, 10 figures

Via

Access Paper or Ask Questions

Clustering-based Meta Bayesian Optimization with Theoretical Guarantee

Mar 08, 2025

Khoa Nguyen, Viet Huynh, Binh Tran, Tri Pham, Tin Huynh, Thin Nguyen

Abstract:Bayesian Optimization (BO) is a well-established method for addressing black-box optimization problems. In many real-world scenarios, optimization often involves multiple functions, emphasizing the importance of leveraging data and learned functions from prior tasks to enhance efficiency in the current task. To expedite convergence to the global optimum, recent studies have introduced meta-learning strategies, collectively referred to as meta-BO, to incorporate knowledge from historical tasks. However, in practical settings, the underlying functions are often heterogeneous, which can adversely affect optimization performance for the current task. Additionally, when the number of historical tasks is large, meta-BO methods face significant scalability challenges. In this work, we propose a scalable and robust meta-BO method designed to address key challenges in heterogeneous and large-scale meta-tasks. Our approach (1) effectively partitions transferred meta-functions into highly homogeneous clusters, (2) learns the geometry-based surrogate prototype that capture the structural patterns within each cluster, and (3) adaptively synthesizes meta-priors during the online phase using statistical distance-based weighting policies. Experimental results on real-world hyperparameter optimization (HPO) tasks, combined with theoretical guarantees, demonstrate the robustness and effectiveness of our method in overcoming these challenges.

* Accepted at PAKDD 2025

Via

Access Paper or Ask Questions

Sampling Foundational Transformer: A Theoretical Perspective

Aug 11, 2024

Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong Son Hy

Figure 1 for Sampling Foundational Transformer: A Theoretical Perspective

Figure 2 for Sampling Foundational Transformer: A Theoretical Perspective

Figure 3 for Sampling Foundational Transformer: A Theoretical Perspective

Figure 4 for Sampling Foundational Transformer: A Theoretical Perspective

Abstract:The versatility of self-attention mechanism earned transformers great success in almost all data modalities, with limitations on the quadratic complexity and difficulty of training. To apply transformers across different data modalities, practitioners have to make specific clever data-modality-dependent constructions. In this paper, we propose Sampling Foundational Transformer (SFT) that can work on multiple data modalities (e.g., point cloud, graph, and sequence) and constraints (e.g., rotational-invariant). The existence of such model is important as contemporary foundational modeling requires operability on multiple data sources. For efficiency on large number of tokens, our model relies on our context aware sampling-without-replacement mechanism for both linear asymptotic computational complexity and real inference time gain. For efficiency, we rely on our newly discovered pseudoconvex formulation of transformer layer to increase model's convergence rate. As a model working on multiple data modalities, SFT has achieved competitive results on many benchmarks, while being faster in inference, compared to other very specialized models.

Via

Access Paper or Ask Questions

SAMSA: Efficient Transformer for Many Data Modalities

Aug 10, 2024

Minh Lenhat, Viet Anh Nguyen, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong Son Hy

Figure 1 for SAMSA: Efficient Transformer for Many Data Modalities

Figure 2 for SAMSA: Efficient Transformer for Many Data Modalities

Figure 3 for SAMSA: Efficient Transformer for Many Data Modalities

Figure 4 for SAMSA: Efficient Transformer for Many Data Modalities

Abstract:The versatility of self-attention mechanism earned transformers great success in almost all data modalities, with limitations on the quadratic complexity and difficulty of training. Efficient transformers, on the other hand, often rely on clever data-modality-dependent construction to get over the quadratic complexity of transformers. This greatly hinders their applications on different data modalities, which is one of the pillars of contemporary foundational modeling. In this paper, we lay the groundwork for efficient foundational modeling by proposing SAMSA - SAMpling-Self-Attention, a context-aware linear complexity self-attention mechanism that works well on multiple data modalities. Our mechanism is based on a differentiable sampling without replacement method we discovered. This enables the self-attention module to attend to the most important token set, where the importance is defined by data. Moreover, as differentiability is not needed in inference, the sparse formulation of our method costs little time overhead, further lowering computational costs. In short, SAMSA achieved competitive or even SOTA results on many benchmarks, while being faster in inference, compared to other very specialized models. Against full self-attention, real inference time significantly decreases while performance ranges from negligible degradation to outperformance. We release our source code in the repository: https://github.com/HySonLab/SAMSA

Via

Access Paper or Ask Questions

Wildest Dreams: Reproducible Research in Privacy-preserving Neural Network Training

Mar 06, 2024

Tanveer Khan, Mindaugas Budzys, Khoa Nguyen, Antonis Michalas

Abstract:Machine Learning (ML), addresses a multitude of complex issues in multiple disciplines, including social sciences, finance, and medical research. ML models require substantial computing power and are only as powerful as the data utilized. Due to high computational cost of ML methods, data scientists frequently use Machine Learning-as-a-Service (MLaaS) to outsource computation to external servers. However, when working with private information, like financial data or health records, outsourcing the computation might result in privacy issues. Recent advances in Privacy-Preserving Techniques (PPTs) have enabled ML training and inference over protected data through the use of Privacy-Preserving Machine Learning (PPML). However, these techniques are still at a preliminary stage and their application in real-world situations is demanding. In order to comprehend discrepancy between theoretical research suggestions and actual applications, this work examines the past and present of PPML, focusing on Homomorphic Encryption (HE) and Secure Multi-party Computation (SMPC) applied to ML. This work primarily focuses on the ML model's training phase, where maintaining user data privacy is of utmost importance. We provide a solid theoretical background that eases the understanding of current approaches and their limitations. In addition, we present a SoK of the most recent PPML frameworks for model training and provide a comprehensive comparison in terms of the unique properties and performances on standard benchmarks. Also, we reproduce the results for some of the papers and examine at what level existing works in the field provide support for open science. We believe our work serves as a valuable contribution by raising awareness about the current gap between theoretical advancements and real-world applications in PPML, specifically regarding open-source availability, reproducibility, and usability.

Via

Access Paper or Ask Questions

GuardML: Efficient Privacy-Preserving Machine Learning Services Through Hybrid Homomorphic Encryption

Jan 26, 2024

Eugene Frimpong, Khoa Nguyen, Mindaugas Budzys, Tanveer Khan, Antonis Michalas

Abstract:Machine Learning (ML) has emerged as one of data science's most transformative and influential domains. However, the widespread adoption of ML introduces privacy-related concerns owing to the increasing number of malicious attacks targeting ML models. To address these concerns, Privacy-Preserving Machine Learning (PPML) methods have been introduced to safeguard the privacy and security of ML models. One such approach is the use of Homomorphic Encryption (HE). However, the significant drawbacks and inefficiencies of traditional HE render it impractical for highly scalable scenarios. Fortunately, a modern cryptographic scheme, Hybrid Homomorphic Encryption (HHE), has recently emerged, combining the strengths of symmetric cryptography and HE to surmount these challenges. Our work seeks to introduce HHE to ML by designing a PPML scheme tailored for end devices. We leverage HHE as the fundamental building block to enable secure learning of classification outcomes over encrypted data, all while preserving the privacy of the input data and ML model. We demonstrate the real-world applicability of our construction by developing and evaluating an HHE-based PPML application for classifying heart disease based on sensitive ECG data. Notably, our evaluations revealed a slight reduction in accuracy compared to inference on plaintext data. Additionally, both the analyst and end devices experience minimal communication and computation costs, underscoring the practical viability of our approach. The successful integration of HHE into PPML provides a glimpse into a more secure and privacy-conscious future for machine learning on relatively constrained end devices.

* 10 pages, accepted at The 39th ACM/SIGAPP Symposium on Applied Computing (SAC) conference

Via

Access Paper or Ask Questions

Love or Hate? Share or Split? Privacy-Preserving Training Using Split Learning and Homomorphic Encryption

Sep 19, 2023

Tanveer Khan, Khoa Nguyen, Antonis Michalas, Alexandros Bakas

Abstract:Split learning (SL) is a new collaborative learning technique that allows participants, e.g. a client and a server, to train machine learning models without the client sharing raw data. In this setting, the client initially applies its part of the machine learning model on the raw data to generate activation maps and then sends them to the server to continue the training process. Previous works in the field demonstrated that reconstructing activation maps could result in privacy leakage of client data. In addition to that, existing mitigation techniques that overcome the privacy leakage of SL prove to be significantly worse in terms of accuracy. In this paper, we improve upon previous works by constructing a protocol based on U-shaped SL that can operate on homomorphically encrypted data. More precisely, in our approach, the client applies homomorphic encryption on the activation maps before sending them to the server, thus protecting user privacy. This is an important improvement that reduces privacy leakage in comparison to other SL-based works. Finally, our results show that, with the optimum set of parameters, training with HE data in the U-shaped SL setting only reduces accuracy by 2.65% compared to training on plaintext. In addition, raw training data privacy is preserved.

* arXiv admin note: substantial text overlap with arXiv:2301.08778, arXiv:2309.08697

Via

Access Paper or Ask Questions

Split Without a Leak: Reducing Privacy Leakage in Split Learning

Aug 30, 2023

Khoa Nguyen, Tanveer Khan, Antonis Michalas

Abstract:The popularity of Deep Learning (DL) makes the privacy of sensitive data more imperative than ever. As a result, various privacy-preserving techniques have been implemented to preserve user data privacy in DL. Among various privacy-preserving techniques, collaborative learning techniques, such as Split Learning (SL) have been utilized to accelerate the learning and prediction process. Initially, SL was considered a promising approach to data privacy. However, subsequent research has demonstrated that SL is susceptible to many types of attacks and, therefore, it cannot serve as a privacy-preserving technique. Meanwhile, countermeasures using a combination of SL and encryption have also been introduced to achieve privacy-preserving deep learning. In this work, we propose a hybrid approach using SL and Homomorphic Encryption (HE). The idea behind it is that the client encrypts the activation map (the output of the split layer between the client and the server) before sending it to the server. Hence, during both forward and backward propagation, the server cannot reconstruct the client's input data from the intermediate activation map. This improvement is important as it reduces privacy leakage compared to other SL-based works, where the server can gain valuable information about the client's input. In addition, on the MIT-BIH dataset, our proposed hybrid approach using SL and HE yields faster training time (about 6 times) and significantly reduced communication overhead (almost 160 times) compared to other HE-based approaches, thereby offering improved privacy protection for sensitive data in DL.

Via

Access Paper or Ask Questions

Split Ways: Privacy-Preserving Training of Encrypted Data Using Split Learning

Jan 20, 2023

Tanveer Khan, Khoa Nguyen, Antonis Michalas

Abstract:Split Learning (SL) is a new collaborative learning technique that allows participants, e.g. a client and a server, to train machine learning models without the client sharing raw data. In this setting, the client initially applies its part of the machine learning model on the raw data to generate activation maps and then sends them to the server to continue the training process. Previous works in the field demonstrated that reconstructing activation maps could result in privacy leakage of client data. In addition to that, existing mitigation techniques that overcome the privacy leakage of SL prove to be significantly worse in terms of accuracy. In this paper, we improve upon previous works by constructing a protocol based on U-shaped SL that can operate on homomorphically encrypted data. More precisely, in our approach, the client applies Homomorphic Encryption (HE) on the activation maps before sending them to the server, thus protecting user privacy. This is an important improvement that reduces privacy leakage in comparison to other SL-based works. Finally, our results show that, with the optimum set of parameters, training with HE data in the U-shaped SL setting only reduces accuracy by 2.65% compared to training on plaintext. In addition, raw training data privacy is preserved.

Via

Access Paper or Ask Questions

Iterative Joint Parameters Estimation and Decoding in a Distributed Receiver for Satellite Applications and Relevant Cramer-Rao Bounds

Jan 23, 2022

Ahsan Waqas, Khoa Nguyen, Gottfried Lechner, Terence Chan

Figure 1 for Iterative Joint Parameters Estimation and Decoding in a Distributed Receiver for Satellite Applications and Relevant Cramer-Rao Bounds

Figure 2 for Iterative Joint Parameters Estimation and Decoding in a Distributed Receiver for Satellite Applications and Relevant Cramer-Rao Bounds

Figure 3 for Iterative Joint Parameters Estimation and Decoding in a Distributed Receiver for Satellite Applications and Relevant Cramer-Rao Bounds

Figure 4 for Iterative Joint Parameters Estimation and Decoding in a Distributed Receiver for Satellite Applications and Relevant Cramer-Rao Bounds

Abstract:This paper presents an algorithm for iterative joint channel parameter (carrier phase, Doppler shift and Doppler rate) estimation and decoding of transmission over channels affected by Doppler shift and Doppler rate using a distributed receiver. This algorithm is derived by applying the sum-product algorithm (SPA) to a factor graph representing the joint a posteriori distribution of the information symbols and channel parameters given the channel output. In this paper, we present two methods for dealing with intractable messages of the sum-product algorithm. In the first approach, we use particle filtering with sequential importance sampling (SIS) for the estimation of the unknown parameters. We also propose a method for fine-tuning of particles for improved convergence. In the second approach, we approximate our model with a random walk phase model, followed by a phase tracking algorithm and polynomial regression algorithm to estimate the unknown parameters. We derive the Weighted Bayesian Cramer-Rao Bounds (WBCRBs) for joint carrier phase, Doppler shift and Doppler rate estimation, which take into account the prior distribution of the estimation parameters and are accurate lower bounds for all considered Signal to Noise Ratio (SNR) values. Numerical results (of bit error rate (BER) and the mean-square error (MSE) of parameter estimation) suggest that phase tracking with the random walk model slightly outperforms particle filtering. However, particle filtering has a lower computational cost than the random walk model based method.

* 19 pages, 11 figures

Via

Access Paper or Ask Questions