German Magai

Sheaf theory: from deep geometry to deep learning

Feb 21, 2025

Anton Ayzenberg, Thomas Gebhart, German Magai, Grigory Solomadin

Abstract:This paper provides an overview of the applications of sheaf theory in deep learning, data science, and computer science in general. The primary text of this work serves as a friendly introduction to applied and computational sheaf theory accessible to those with modest mathematical familiarity. We describe intuitions and motivations underlying sheaf theory shared by both theoretical researchers and practitioners, bridging classical mathematical theory and its more recent implementations within signal processing and deep learning. We observe that most notions commonly considered specific to cellular sheaves translate to sheaves on arbitrary posets, providing an interesting avenue for further generalization of these methods in applications, and we present a new algorithm to compute sheaf cohomology on arbitrary finite posets in response. By integrating classical theory with recent applications, this work reveals certain blind spots in current machine learning practices. We conclude with a list of problems related to sheaf-theoretic applications that we find mathematically insightful and practically instructive to solve. To ensure the exposition of sheaf theory is self-contained, a rigorous mathematical introduction is provided in appendices which moves from an introduction of diagrams and sheaves to the definition of derived functors, higher order cohomology, sheaf Laplacians, sheaf diffusion, and interconnections of these subjects therein.

* 117 pages, 8 figures

Robust AI-Generated Text Detection by Restricted Embeddings

Oct 10, 2024

Kristian Kuznetsov, Eduard Tulchinskii, Laida Kushnareva, German Magai, Serguei Barannikov, Sergey Nikolenko, Irina Piontkovskaya

Abstract:Growing amount and quality of AI-generated texts makes detecting such content more difficult. In most real-world scenarios, the domain (style and topic) of generated data and the generator model are not known in advance. In this work, we focus on the robustness of classifier-based detectors of AI-generated text, namely their ability to transfer to unseen generators or semantic domains. We investigate the geometry of the embedding space of Transformer-based text encoders and show that clearing out harmful linear subspaces helps to train a robust classifier, ignoring domain-specific spurious features. We investigate several subspace decomposition and feature selection strategies and achieve significant improvements over state of the art methods in cross-domain and cross-generator transfer. Our best approaches for head-wise and coordinate-based subspace removal increase the mean out-of-distribution (OOD) classification score by up to 9% and 14% in particular setups for RoBERTa and BERT embeddings respectively. We release our code and data: https://github.com/SilverSolver/RobustATD

* Accepted to Findings of EMNLP 2024
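
The abstract above mentions removing harmful linear subspaces from the embedding space. As a rough illustration of what such a removal can look like, the sketch below projects a vector onto the orthogonal complement of a few chosen directions. It is not the authors' implementation (their code is at the URL in the abstract). The helper names `dot`, `orthonormalize`, and `remove_subspace`, the toy vector, and the choice of "harmful" directions are all assumptions made for illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(directions):
    """Gram-Schmidt: turn directions into an orthonormal basis of their span."""
    basis = []
    for d in directions:
        w = list(d)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip directions already contained in the span
            basis.append([wi / norm for wi in w])
    return basis


def remove_subspace(x, directions):
    """Project x onto the orthogonal complement of span(directions)."""
    out = list(x)
    for b in orthonormalize(directions):
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out


# Hypothetical toy data: one 3-dimensional "embedding" and two directions
# that we pretend carry domain-specific spurious features.
embedding = [1.0, 2.0, 3.0]
harmful = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
cleaned = remove_subspace(embedding, harmful)
```

After the projection, `cleaned` has no component along either chosen direction, so a classifier trained on it cannot rely on them. How the directions are selected is the substantive part of the paper and is not reproduced here.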

ICML Topological Deep Learning Challenge 2024: Beyond the Graph Domain

Sep 08, 2024

Guillermo Bernárdez, Lev Telyatnikov, Marco Montagna, Federica Baccini, Mathilde Papillon, Miquel Ferriol-Galmés, Mustafa Hajij, Theodore Papamarkou, Maria Sofia Bucarelli, Olga Zaghen (+63 more)

Abstract:This paper describes the 2nd edition of the ICML Topological Deep Learning Challenge that was hosted within the ICML 2024 ELLIS Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM). The challenge focused on the problem of representing data in different discrete topological domains in order to bridge the gap between Topological Deep Learning (TDL) and other types of structured datasets (e.g. point clouds, graphs). Specifically, participants were asked to design and implement topological liftings, i.e. mappings between different data structures and topological domains --like hypergraphs, or simplicial/cell/combinatorial complexes. The challenge received 52 submissions satisfying all the requirements. This paper introduces the main scope of the challenge, and summarizes the main results and findings.

* Proceedings of the Geometry-grounded Representation Learning and Generative Modeling Workshop (GRaM) at ICML 2024
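
The abstract above describes topological liftings as mappings between data structures and topological domains. One standard example is the clique lifting, which sends a graph to the simplicial complex whose simplices are the graph's cliques. The sketch below is a minimal illustration of that construction, not a submission to the challenge; the function name `clique_lifting`, its `max_dim` parameter, and the toy graph are assumptions.

```python
from itertools import combinations


def clique_lifting(edges, max_dim=2):
    """Lift an undirected graph to its clique complex up to dimension max_dim.

    Returns a dict mapping dimension k to the sorted list of k-simplices,
    each given as a tuple of k + 1 node labels.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)

    simplices = {0: [(n,) for n in nodes]}
    for k in range(1, max_dim + 1):
        # A set of k + 1 nodes spans a k-simplex iff all pairs are adjacent.
        simplices[k] = [
            c for c in combinations(nodes, k + 1)
            if all(b in adj[a] for a, b in combinations(c, 2))
        ]
    return simplices


# Hypothetical toy graph: a triangle 0-1-2 with a pendant edge 2-3.
lifted = clique_lifting([(0, 1), (1, 2), (0, 2), (2, 3)])
```

Here the triangle becomes a single 2-simplex while the pendant edge stays a 1-simplex. The brute-force enumeration over node subsets is only suitable for small graphs.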

Improving Interpretability and Robustness for the Detection of AI-Generated Images

Jun 21, 2024

Tatiana Gaintseva, Laida Kushnareva, German Magai, Irina Piontkovskaya, Sergey Nikolenko, Martin Benning, Serguei Barannikov, Gregory Slabaugh

Abstract:With growing abilities of generative models, artificial content detection becomes an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings and show how to interpret them, shedding light on how images produced by various AI generators differ from real ones. Next we propose two ways to improve robustness: based on removing harmful components of the embedding vector and based on selecting the best performing attention heads in the image encoder model. Our methods increase the mean out-of-distribution (OOD) classification score by up to 6% for cross-model transfer. We also propose a new dataset for AIGI detection and use it in our evaluation; we believe this dataset will help boost further research. The dataset and code are provided as a supplement.

Artificial Text Boundary Detection with Topological Data Analysis and Sliding Window Techniques

Nov 14, 2023

Laida Kushnareva, Tatiana Gaintseva, German Magai, Serguei Barannikov, Dmitry Abulkhanov, Kristian Kuznetsov, Irina Piontkovskaya, Sergey Nikolenko

Abstract:Due to the rapid development of text generation models, people increasingly often encounter texts that may start out as written by a human but then continue as machine-generated results of large language models. Detecting the boundary between human-written and machine-generated parts of such texts is a very challenging problem that has not received much attention in literature. In this work, we consider and compare a number of different approaches for this artificial text boundary detection problem, comparing several predictors over features of different nature. We show that supervised fine-tuning of the RoBERTa model works well for this task in general but fails to generalize in important cross-domain and cross-generator settings, demonstrating a tendency to overfit to spurious properties of the data. Then, we propose novel approaches based on features extracted from a frozen language model's embeddings that are able to outperform both the human accuracy level and previously considered baselines on the Real or Fake Text benchmark. Moreover, we adapt perplexity-based approaches for the boundary detection task and analyze their behaviour. We analyze the robustness of all proposed classifiers in cross-domain and cross-model settings, discovering important properties of the data that can negatively influence the performance of artificial text boundary detection algorithms.

ICML 2023 Topological Deep Learning Challenge : Design and Results

Oct 02, 2023

Mathilde Papillon, Mustafa Hajij, Florian Frantzen, Josef Hoppe, Helen Jenne, Johan Mathe, Audun Myers, Theodore Papamarkou, Michael T. Schaub, Ghada Zamzmi (+55 more)

Abstract:This paper presents the computational challenge on topological deep learning that was hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning. The competition asked participants to provide open-source implementations of topological neural networks from the literature by contributing to the python packages TopoNetX (data processing) and TopoModelX (deep learning). The challenge attracted twenty-eight qualifying submissions in its two-month duration. This paper describes the design of the challenge and summarizes its main findings.

Global cognitive graph properties dynamics of hippocampal formation

Aug 07, 2023

Konstantin Sorokin, Andrey Zaitsew, Aleksandr Levin, German Magai, Maxim Beketov, Vladimir Sotskov

Abstract:In the present study we used a set of methods and metrics to build a graph of relative neural connections in the hippocampus of a rodent. A set of graphs was built on top of time-sequenced data and analyzed in terms of the dynamics of connection genesis. The analysis has shown that while a rodent explores a novel environment, the relations between neurons constantly change, which indicates that globally memory is constantly updated even for known areas of space. Even if some neurons gain cognitive specialization, the global network remains relatively stable. Additionally, we suggest a set of methods for building a graph of a cognitive neural network.

* 12 pages, 6 figures, paper for DAMDID 2023 Conference

Deep neural networks architectures from the perspective of manifold learning

Jun 06, 2023

German Magai

Abstract:Despite significant advances in the field of deep learning in applications to various areas, an explanation of the learning process of neural network models remains an important open question. The purpose of this paper is a comprehensive comparison and description of neural network architectures in terms of geometry and topology. We focus on the internal representation of neural networks and on the dynamics of changes in the topology and geometry of a data manifold on different layers. In this paper, we use the concepts of topological data analysis (TDA) and persistent homological fractal dimension. We present a wide range of experiments with various datasets and configurations of convolutional neural network (CNNs) architectures and Transformers in CV and NLP tasks. Our work is a contribution to the development of the important field of explainable and interpretable AI within the framework of geometrical deep learning.

* 11 pages, 12 figures, PRAI2023. arXiv admin note: substantial text overlap with arXiv:2204.08624

Topology and geometry of data manifold in deep learning

Apr 19, 2022

German Magai, Anton Ayzenberg

Abstract:Despite significant advances in the field of deep learning in applications to various fields, explaining the inner processes of deep learning models remains an important and open question. The purpose of this article is to describe and substantiate the geometric and topological view of the learning process of neural networks. Our attention is focused on the internal representation of neural networks and on the dynamics of changes in the topology and geometry of the data manifold on different layers. We also propose a method for assessing the generalizing ability of neural networks based on topological descriptors. In this paper, we use the concepts of topological data analysis and intrinsic dimension, and we present a wide range of experiments on different datasets and different configurations of convolutional neural network architectures. In addition, we consider the issue of the geometry of adversarial attacks in the classification task and spoofing attacks on face recognition systems. Our work is a contribution to the development of an important area of explainable and interpretable AI through the example of computer vision.

* 12 pages, 15 figures
