Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kamer Kaya

MCMC for Bayesian estimation of Differential Privacy from Membership Inference Attacks

Apr 23, 2025

Ceren Yildirim, Kamer Kaya, Sinan Yildirim, Erkay Savas

Abstract:We propose a new framework for Bayesian estimation of differential privacy, incorporating evidence from multiple membership inference attacks (MIA). Bayesian estimation is carried out via a Markov chain Monte Carlo (MCMC) algorithm, named MCMC-DP-Est, which provides an estimate of the full posterior distribution of the privacy parameter (e.g., instead of just credible intervals). Critically, the proposed method does not assume that privacy auditing is performed with the most powerful attack on the worst-case (dataset, challenge point) pair, which is typically unrealistic. Instead, MCMC-DP-Est jointly estimates the strengths of MIAs used and the privacy of the training algorithm, yielding a more cautious privacy analysis. We also present an economical way to generate measurements for the performance of an MIA that is to be used by the MCMC method to estimate privacy. We present the use of the methods with numerical examples with both artificial and real data.

* Code available: https://github.com/cerenyildirim/MCMC_for_Bayesian_estimation

Via

Access Paper or Ask Questions

A Retrieval-Augmented Generation Framework for Academic Literature Navigation in Data Science

Dec 19, 2024

Ahmet Yasin Aytar, Kemal Kilic, Kamer Kaya

Figure 1 for A Retrieval-Augmented Generation Framework for Academic Literature Navigation in Data Science

Figure 2 for A Retrieval-Augmented Generation Framework for Academic Literature Navigation in Data Science

Figure 3 for A Retrieval-Augmented Generation Framework for Academic Literature Navigation in Data Science

Figure 4 for A Retrieval-Augmented Generation Framework for Academic Literature Navigation in Data Science

Abstract:In the rapidly evolving field of data science, efficiently navigating the expansive body of academic literature is crucial for informed decision-making and innovation. This paper presents an enhanced Retrieval-Augmented Generation (RAG) application, an artificial intelligence (AI)-based system designed to assist data scientists in accessing precise and contextually relevant academic resources. The AI-powered application integrates advanced techniques, including the GeneRation Of BIbliographic Data (GROBID) technique for extracting bibliographic information, fine-tuned embedding models, semantic chunking, and an abstract-first retrieval method, to significantly improve the relevance and accuracy of the retrieved information. This implementation of AI specifically addresses the challenge of academic literature navigation. A comprehensive evaluation using the Retrieval-Augmented Generation Assessment System (RAGAS) framework demonstrates substantial improvements in key metrics, particularly Context Relevance, underscoring the system's effectiveness in reducing information overload and enhancing decision-making processes. Our findings highlight the potential of this enhanced Retrieval-Augmented Generation system to transform academic exploration within data science, ultimately advancing the workflow of research and innovation in the field.

Via

Access Paper or Ask Questions

Boosting Graph Embedding on a Single GPU

Oct 19, 2021

Amro Alabsi Aljundi, Taha Atahan Akyıldız, Kamer Kaya

Figure 1 for Boosting Graph Embedding on a Single GPU

Figure 2 for Boosting Graph Embedding on a Single GPU

Figure 3 for Boosting Graph Embedding on a Single GPU

Figure 4 for Boosting Graph Embedding on a Single GPU

Abstract:Graphs are ubiquitous, and they can model unique characteristics and complex relations of real-life systems. Although using machine learning (ML) on graphs is promising, their raw representation is not suitable for ML algorithms. Graph embedding represents each node of a graph as a d-dimensional vector which is more suitable for ML tasks. However, the embedding process is expensive, and CPU-based tools do not scale to real-world graphs. In this work, we present GOSH, a GPU-based tool for embedding large-scale graphs with minimum hardware constraints. GOSH employs a novel graph coarsening algorithm to enhance the impact of updates and minimize the work for embedding. It also incorporates a decomposition schema that enables any arbitrarily large graph to be embedded with a single GPU. As a result, GOSH sets a new state-of-the-art in link prediction both in accuracy and speed, and delivers high-quality embeddings for node classification at a fraction of the time compared to the state-of-the-art. For instance, it can embed a graph with over 65 million vertices and 1.8 billion edges in less than 30 minutes on a single GPU.

* 12 pages, 11 tables, 6 figures, submitted for publication at Special Section on Parallel and Distributed Computing Techniques for AI, ML, and DL

Via

Access Paper or Ask Questions

Understanding Coarsening for Embedding Large-Scale Graphs

Sep 10, 2020

Taha Atahan Akyildiz, Amro Alabsi Aljundi, Kamer Kaya

Figure 1 for Understanding Coarsening for Embedding Large-Scale Graphs

Figure 2 for Understanding Coarsening for Embedding Large-Scale Graphs

Figure 3 for Understanding Coarsening for Embedding Large-Scale Graphs

Figure 4 for Understanding Coarsening for Embedding Large-Scale Graphs

Abstract:A significant portion of the data today, e.g, social networks, web connections, etc., can be modeled by graphs. A proper analysis of graphs with Machine Learning (ML) algorithms has the potential to yield far-reaching insights into many areas of research and industry. However, the irregular structure of graph data constitutes an obstacle for running ML tasks on graphs such as link prediction, node classification, and anomaly detection. Graph embedding is a compute-intensive process of representing graphs as a set of vectors in a d-dimensional space, which in turn makes it amenable to ML tasks. Many approaches have been proposed in the literature to improve the performance of graph embedding, e.g., using distributed algorithms, accelerators, and pre-processing techniques. Graph coarsening, which can be considered a pre-processing step, is a structural approximation of a given, large graph with a smaller one. As the literature suggests, the cost of embedding significantly decreases when coarsening is employed. In this work, we thoroughly analyze the impact of the coarsening quality on the embedding performance both in terms of speed and accuracy. Our experiments with a state-of-the-art, fast graph embedding tool show that there is an interplay between the coarsening decisions taken and the embedding quality.

* 10 pages, 6 figures, submitted to 2020 IEEE International Conference on Big Data

Via

Access Paper or Ask Questions

HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

Aug 04, 2017

Kamer Kaya, Figen Öztoprak, Ş. İlker Birbil, A. Taylan Cemgil, Umut Şimşekli, Nurdan Kuru, Hazal Koptagel, M. Kaan Öztürk

Figure 1 for HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

Figure 2 for HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

Figure 3 for HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

Figure 4 for HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

Abstract:We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), which is a provably convergent, second order incremental algorithm for solving large-scale partially separable optimization problems. The algorithm is based on a local quadratic approximation, and hence, allows incorporating curvature information to speed-up the convergence. HAMSI is inherently parallel and it scales nicely with the number of processors. Combined with techniques for effectively utilizing modern parallel computer architectures, we illustrate that the proposed method converges more rapidly than a parallel stochastic gradient descent when both methods are used to solve large-scale matrix factorization problems. This performance gain comes only at the expense of using memory that scales linearly with the total size of the optimization variables. We conclude that HAMSI may be considered as a viable alternative in many large scale problems, where first order methods based on variants of stochastic gradient descent are applicable.

* The software is available at https://github.com/spartensor/hamsi-mf

Via

Access Paper or Ask Questions