Jonathan Gallagher

Provably effective detection of effective data poisoning attacks

Jan 21, 2025

Jonathan Gallagher, Yasaman Esfandiari, Callen MacPhee, Michael Warren

Abstract: This paper establishes a mathematically precise definition of a dataset poisoning attack and proves that the very act of effectively poisoning a dataset ensures that the attack can be effectively detected. On top of a mathematical guarantee that dataset poisoning is identifiable by a new statistical test that we call the Conformal Separability Test, we provide experimental evidence that we can adequately detect poisoning attempts in the real world.

Self-Satisfied: An end-to-end framework for SAT generation and prediction

Oct 18, 2024

Christopher R. Serrano, Jonathan Gallagher, Kenji Yamada, Alexei Kopylov, Michael A. Warren

Abstract: The boolean satisfiability (SAT) problem asks whether there exists an assignment of boolean values to the variables of an arbitrary boolean formula making the formula evaluate to True. It is well known that all problems in NP can be encoded as SAT problems, and therefore SAT is important both practically and theoretically. From both of these perspectives, better understanding the patterns and structure implicit in SAT data is of significant value. In this paper, we describe several advances that we believe will help open the door to such understanding: we introduce hardware-accelerated algorithms for fast SAT problem generation, a geometric SAT encoding that enables the use of transformer architectures typically applied to vision tasks, and a simple yet effective technique we term head slicing for reducing sequence length representation inside transformer architectures. These advances allow us to scale our approach to SAT problems with thousands of variables and tens of thousands of clauses. We validate our architecture, termed Satisfiability Transformer (SaT), on the SAT prediction task with data from the SAT Competition (SATComp) 2022 problem sets. Prior related work either leveraged a pure machine learning approach, but could not handle SATComp-sized problems, or was hybrid in the sense of integrating a machine learning component in a standard SAT solving tool. Our pure machine learning approach achieves prediction accuracies comparable to recent work, but on problems that are an order of magnitude larger than previously demonstrated. A fundamental aspect of our work concerns the very nature of SAT data and its suitability for training machine learning models. We both describe experimental results that probe the landscape of where SAT data can be successfully used for learning and position these results within the broader context of complexity and learning.
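To make the SAT problem statement in the abstract concrete, here is a minimal brute-force sketch. It is not the paper's method (the paper uses a transformer to predict satisfiability); it only illustrates what "an assignment making the formula evaluate to True" means. It assumes the formula is given in conjunctive normal form with DIMACS-style integer literals (`3` for variable 3, `-3` for its negation), and the names `is_satisfiable`, `sat_formula`, and `unsat_formula` are hypothetical.

```python
from itertools import product


def is_satisfiable(clauses, num_vars):
    """Return a satisfying assignment as {variable: bool}, or None if none exists.

    Assumption: `clauses` is a CNF formula, a list of clauses, each a list of
    nonzero integers where k means variable k and -k means its negation.
    """
    for values in product([False, True], repeat=num_vars):
        # values[i] holds the truth value tried for variable i + 1.
        if all(
            any(values[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return {i + 1: value for i, value in enumerate(values)}
    return None


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
sat_formula = [[1, -2], [2, 3], [-1, -3]]

# x1 and not x1: no assignment can satisfy both clauses.
unsat_formula = [[1], [-1]]

print(is_satisfiable(sat_formula, 3))
print(is_satisfiable(unsat_formula, 1))
```

The loop tries all 2^n assignments, which is why this approach is hopeless at the scale the abstract mentions (thousands of variables) and why learned predictors and dedicated solvers are of interest.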

* 22 pages

Development of a Dual-Input Neural Model for Detecting AI-Generated Imagery

Jun 19, 2024

Jonathan Gallagher, William Pugsley

Abstract: Over the past years, images generated by artificial intelligence have become more prevalent and more realistic. Their advent raises ethical questions relating to misinformation, artistic expression, and identity theft, among others. The crux of many of these moral questions is the difficulty in distinguishing between real and fake images. It is important to develop tools that are able to detect AI-generated images, especially when these images are too realistic-looking for the human eye to identify as fake. This paper proposes a dual-branch neural network architecture that takes both images and their Fourier frequency decomposition as inputs. We use standard CNN-based methods for both branches as described in Stuchi et al. [7], followed by fully-connected layers. Our proposed model achieves an accuracy of 94% on the CIFAKE dataset, which significantly outperforms classic ML methods and CNNs, achieving performance comparable to some state-of-the-art architectures, such as ResNet.
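As a rough illustration of the two inputs the abstract describes, the sketch below builds a spatial input and a frequency input from one image using NumPy. The abstract does not specify the preprocessing, so everything here is an assumption: converting to grayscale by channel averaging, using the centered log-magnitude spectrum, and the hypothetical names `fourier_magnitude` and `dual_inputs`. The network branches themselves are not shown.

```python
import numpy as np


def fourier_magnitude(gray):
    """Log-magnitude 2D spectrum of an (H, W) array, zero frequency centered."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    # log1p compresses the large dynamic range of spectral magnitudes.
    return np.log1p(np.abs(spectrum))


def dual_inputs(image_rgb):
    """Return (spatial, frequency) inputs for an (H, W, 3) image.

    Assumption: grayscale is the plain mean over channels; the paper may
    treat color channels differently.
    """
    gray = image_rgb.mean(axis=-1)
    return image_rgb, fourier_magnitude(gray)


# A random 32x32 RGB array stands in for an image (illustrative data only).
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
spatial, frequency = dual_inputs(image)
print(spatial.shape, frequency.shape)
```

In a dual-branch model, `spatial` and `frequency` would each feed a separate convolutional branch whose features are then combined by fully-connected layers, as the abstract outlines.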
