Marco Schreyer

Differentially Private Federated Learning of Diffusion Models for Synthetic Tabular Data Generation

Dec 20, 2024

Timur Sattarov, Marco Schreyer, Damian Borth

Abstract:The increasing demand for privacy-preserving data analytics in finance necessitates solutions for synthetic data generation that rigorously uphold privacy standards. We introduce the DP-Fed-FinDiff framework, a novel integration of Differential Privacy, Federated Learning, and Denoising Diffusion Probabilistic Models designed to generate high-fidelity synthetic tabular data. This framework ensures compliance with stringent privacy regulations while maintaining data utility. We demonstrate the effectiveness of DP-Fed-FinDiff on multiple real-world financial datasets, achieving significant improvements in privacy guarantees without compromising data quality. Our empirical evaluations reveal the optimal trade-offs between privacy budgets, client configurations, and federated optimization strategies. The results affirm the potential of DP-Fed-FinDiff to enable secure data sharing and robust analytics in highly regulated domains, paving the way for further advances in federated learning and privacy-preserving data synthesis.

* 9 pages, 9 figures, preprint version, currently under review
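
As a rough illustration of how differential privacy can be layered onto federated aggregation, the sketch below clips each client's update to a maximum L2 norm and adds Gaussian noise to the aggregated sum (a DP-FedAvg-style Gaussian mechanism). This is a generic technique and not necessarily the mechanism used in DP-Fed-FinDiff; the function names `clip_update` and `dp_average` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and model updates are represented as plain lists of floats.

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def clip_update(update: Vector, clip_norm: float) -> Vector:
    """Scale `update` down so that its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= clip_norm:
        return list(update)
    scale = clip_norm / norm
    return [v * scale for v in update]


def dp_average(updates: List[Vector], clip_norm: float, noise_multiplier: float,
               rng: Optional[random.Random] = None) -> Vector:
    """Average clipped client updates and add Gaussian noise to the sum.

    Clipping bounds each client's contribution to the sum by `clip_norm`,
    so the noise standard deviation is calibrated as noise_multiplier * clip_norm.
    """
    if not updates:
        raise ValueError("at least one client update is required")
    rng = rng or random.Random()
    clipped = [clip_update(u, clip_norm) for u in updates]
    sigma = noise_multiplier * clip_norm
    n = len(clipped)
    summed = [sum(u[i] for u in clipped) for i in range(len(clipped[0]))]
    # Noise is added once to the sum, then the noisy sum is averaged.
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]
```

Setting `noise_multiplier` to 0 recovers a plain clipped average, which is convenient for testing. Turning a noise multiplier and a number of training rounds into an (epsilon, delta) guarantee requires a privacy accountant, which this sketch does not include.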

GraphGuard: Contrastive Self-Supervised Learning for Credit-Card Fraud Detection in Multi-Relational Dynamic Graphs

Jul 17, 2024

Kristófer Reynisson, Marco Schreyer, Damian Borth

Abstract:Credit card fraud has significant implications at both an individual and societal level, making effective prevention essential. Current methods rely heavily on feature engineering and labeled information, both of which have significant limitations. In this work, we present GraphGuard, a novel contrastive self-supervised graph-based framework for detecting fraudulent credit card transactions. We conduct experiments on a real-world dataset and a synthetic dataset. Our results provide a promising initial direction for exploring the effectiveness of graph-based self-supervised approaches for credit card fraud detection.

* 8 pages, 1 figure, 2 tables, preprint version, presented at AAAI 2024 Workshop on AI in Finance for Social Impact

FedTabDiff: Federated Learning of Diffusion Probabilistic Models for Synthetic Mixed-Type Tabular Data Generation

Jan 11, 2024

Timur Sattarov, Marco Schreyer, Damian Borth

Abstract:Realistic synthetic tabular data generation encounters significant challenges in preserving privacy, especially when dealing with sensitive information in domains like finance and healthcare. In this paper, we introduce Federated Tabular Diffusion (FedTabDiff) for generating high-fidelity mixed-type tabular data without centralized access to the original tabular datasets. Leveraging the strengths of Denoising Diffusion Probabilistic Models (DDPMs), our approach addresses the inherent complexities in tabular data, such as mixed attribute types and implicit relationships. More critically, FedTabDiff realizes a decentralized learning scheme that permits multiple entities to collaboratively train a generative model while respecting data privacy and locality. We extend DDPMs into the federated setting for tabular data generation, which includes a synchronous update scheme and weighted averaging for effective model aggregation. Experimental evaluations on real-world financial and medical datasets attest to the framework's capability to produce synthetic data that maintains high fidelity, utility, privacy, and coverage.

* 9 pages, 2 figures, 2 tables, preprint version, currently under review
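
The synchronous update scheme with weighted averaging mentioned above is commonly realized as FedAvg-style aggregation, in which each client's parameters are weighted by its share of the training samples. The sketch below shows one such aggregation round under that assumption; the abstract does not state that FedTabDiff weights clients by sample count exactly this way, and the name `weighted_average` and the dictionary-of-lists parameter format are hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def weighted_average(client_states: List[Tuple[Params, int]]) -> Params:
    """Aggregate client parameters, weighting each client by its sample count.

    `client_states` holds (parameters, number_of_local_samples) pairs that all
    share the same parameter names and shapes.
    """
    total = sum(n for _, n in client_states)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    global_state: Params = {}
    for name in client_states[0][0]:
        length = len(client_states[0][0][name])
        global_state[name] = [
            sum(state[name][i] * n for state, n in client_states) / total
            for i in range(length)
        ]
    return global_state
```

For example, a client with 100 samples holding `w = [1.0, 2.0]` and a client with 300 samples holding `w = [3.0, 6.0]` aggregate to `w = [2.5, 5.0]`, because the larger client receives three quarters of the weight. In a synchronous scheme, the server would run this step only after all selected clients have reported their updates.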

FinDiff: Diffusion Models for Financial Tabular Data Generation

Sep 04, 2023

Timur Sattarov, Marco Schreyer, Damian Borth

Abstract:The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both academics and practitioners to conduct collaborative research effectively. The emergence of generative models, particularly diffusion models, that can synthesize data mimicking the underlying distributions of real-world data offers a compelling solution. This work introduces 'FinDiff', a diffusion model designed to generate real-world financial tabular data for a variety of regulatory downstream tasks, such as economic scenario modeling, stress tests, and fraud detection. The model uses embedding encodings to model mixed-modality financial data comprising both categorical and numeric attributes. The performance of FinDiff in generating synthetic tabular financial data is evaluated against state-of-the-art baseline models using three real-world financial datasets (two publicly available and one proprietary). Empirical results demonstrate that FinDiff excels in generating synthetic tabular financial data with high fidelity, privacy, and utility.

* 9 pages, 5 figures, 3 tables, preprint version, currently under review
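
To make the idea of embedding encodings for mixed-type data concrete, the sketch below maps each categorical value to an embedding vector (randomly initialized here, learned in practice), standardizes each numeric attribute, concatenates the results, and then applies the standard DDPM forward noising step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps under a linear beta schedule. The class `MixedTypeEncoder`, the attribute names, the embedding size, and the schedule values are illustrative assumptions rather than FinDiff's actual configuration, and no denoising network is trained here.

```python
import math
import random
from typing import Dict, List, Tuple


def linear_alpha_bars(steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / max(steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


class MixedTypeEncoder:
    """Hypothetical encoder: embeddings for categories, z-scores for numbers."""

    def __init__(self, categories: Dict[str, List[str]],
                 numeric_stats: Dict[str, Tuple[float, float]],
                 dim: int = 2, seed: int = 0):
        rng = random.Random(seed)
        # One embedding vector per category value (random stand-ins for learned ones).
        self.tables = {
            attr: {v: [rng.gauss(0.0, 1.0) for _ in range(dim)] for v in values}
            for attr, values in categories.items()
        }
        self.numeric_stats = numeric_stats  # attribute -> (mean, std)

    def encode(self, row: Dict[str, object]) -> List[float]:
        """Concatenate categorical embeddings followed by standardized numerics."""
        vec: List[float] = []
        for attr, table in self.tables.items():
            vec.extend(table[row[attr]])
        for attr, (mean, std) in self.numeric_stats.items():
            vec.append((float(row[attr]) - mean) / std)
        return vec


def q_sample(x0: List[float], alpha_bar: float, rng: random.Random) -> List[float]:
    """Draw x_t from q(x_t | x_0) for a given cumulative alpha_bar_t."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * rng.gauss(0.0, 1.0) for x in x0]
```

A possible usage: `enc = MixedTypeEncoder({"currency": ["EUR", "USD"]}, {"amount": (100.0, 50.0)})` encodes `{"currency": "USD", "amount": 200.0}` as two embedding dimensions followed by the z-score 2.0, and `q_sample(x0, linear_alpha_bars(1000)[500], random.Random(1))` yields a partially noised version of that vector. A diffusion model would be trained to reverse this noising in the continuous encoded space, after which generated vectors are decoded back into categories and amounts.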

Federated Continual Learning to Detect Accounting Anomalies in Financial Auditing

Oct 26, 2022

Marco Schreyer, Hamed Hemati, Damian Borth, Miklos A. Vasarhelyi

Abstract:The International Standards on Auditing require auditors to obtain reasonable assurance that financial statements are free of material misstatement. At the same time, a central objective of Continuous Assurance is the real-time assessment of digital accounting journal entries. Recently, driven by advances in artificial intelligence, Deep Learning techniques have emerged in financial auditing to examine vast quantities of accounting data. However, learning highly adaptive audit models in decentralized and dynamic settings remains challenging, as it requires studying data distribution shifts across multiple clients and time periods. In this work, we propose a Federated Continual Learning framework that enables auditors to continuously learn audit models from decentralized clients. We evaluate the framework's ability to detect accounting anomalies in common scenarios of organizational activity. Our empirical results, using real-world datasets and combined federated continual learning strategies, demonstrate the learned model's ability to detect anomalies in audit settings with data distribution shifts.

* 6 pages (excl. appendix), 5 figures, 1 table, preprint version, currently under review

RESHAPE: Explaining Accounting Anomalies in Financial Statement Audits by enhancing SHapley Additive exPlanations

Sep 19, 2022

Ricardo Müller, Marco Schreyer, Timur Sattarov, Damian Borth

Abstract:Detecting accounting anomalies is a recurrent challenge in financial statement audits. Recently, novel methods derived from Deep Learning (DL) have been proposed to audit the large volumes of a statement's underlying accounting records. However, due to their vast number of parameters, such models have the drawback of being inherently opaque. At the same time, the opacity of a model's inner workings often hinders its real-world application. This observation holds particularly true in financial audits, since auditors must reasonably explain and justify their audit decisions. Various Explainable AI (XAI) techniques have been proposed to address this challenge, e.g., SHapley Additive exPlanations (SHAP). However, in unsupervised DL, as often applied in financial audits, these methods explain the model output at the level of encoded variables. As a result, the explanations of Autoencoder Neural Networks (AENNs) are often hard for human auditors to comprehend. To mitigate this drawback, we propose RESHAPE, which explains the model output at an aggregated attribute level. In addition, we introduce an evaluation framework to compare the versatility of XAI methods in auditing. Our experimental results provide empirical evidence that RESHAPE yields more versatile explanations than state-of-the-art baselines. We envision such attribute-level explanations as a necessary next step in the adoption of unsupervised DL techniques in financial auditing.

* 9 pages, 4 figures, 5 tables, preprint version, currently under review
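
One simple way to move from explanations over encoded variables to attribute-level explanations is to sum the SHAP values of all encoded columns (for example, one-hot columns) that originate from the same attribute; because SHAP values are additive, the attribute totals still add up to the same overall attribution. The sketch below assumes that aggregation rule and a hypothetical `column_to_attribute` mapping; RESHAPE's exact aggregation may differ, and in practice the per-column SHAP values would come from an explainer such as the `shap` package rather than being hard-coded.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_shap(shap_values: List[float], column_names: List[str],
                   column_to_attribute: Dict[str, str]) -> Dict[str, float]:
    """Sum per-column SHAP values into one contribution per original attribute."""
    if len(shap_values) != len(column_names):
        raise ValueError("one SHAP value per encoded column is required")
    totals: Dict[str, float] = defaultdict(float)
    for value, column in zip(shap_values, column_names):
        # Every encoded column is credited to the attribute it was derived from.
        totals[column_to_attribute[column]] += value
    return dict(totals)
```

With hypothetical columns `account=1000`, `account=2000`, `posting_key=40`, and `amount` carrying SHAP values 0.25, 0.5, -0.125, and 0.0625, the auditor would see three attribute-level contributions (`account`: 0.75, `posting_key`: -0.125, `amount`: 0.0625) instead of four column-level ones.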

Federated and Privacy-Preserving Learning of Accounting Data in Financial Statement Audits

Aug 26, 2022

Marco Schreyer, Timur Sattarov, Damian Borth

Abstract:The ongoing 'digital transformation' fundamentally changes the nature, recording, and volume of audit evidence. Nowadays, the International Standards on Auditing (ISA) require auditors to examine vast volumes of a financial statement's underlying digital accounting records. As a result, audit firms also 'digitize' their analytical capabilities and invest in Deep Learning (DL), a successful sub-discipline of Machine Learning. The application of DL offers the ability to learn specialized audit models from the data of multiple clients, e.g., organizations operating in the same industry or jurisdiction. In general, regulations require auditors to adhere to strict data confidentiality measures. At the same time, recent studies have shown that large-scale DL models are vulnerable to leaking sensitive information about their training data. Today, it often remains unclear how audit firms can apply DL models while complying with data protection regulations. In this work, we propose a Federated Learning framework to train DL models on audit-relevant accounting data of multiple clients. The framework encompasses Differential Privacy and Split Learning capabilities to mitigate data confidentiality risks at model inference. We evaluate our approach on the detection of accounting anomalies in three real-world datasets of city payments. Our results provide empirical evidence that auditors can benefit from DL models that accumulate knowledge from multiple sources of proprietary client data.

* 8 pages, 5 figures, 3 tables, preprint version, currently under review
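
The split-learning idea referred to above can be pictured as cutting a network into a client part and a server part, so that only intermediate activations, not raw journal entries, leave the client. The sketch below assumes a two-layer network with a ReLU at the cut layer and hard-coded weights; the classes `ClientSegment` and `ServerSegment` and all weights are hypothetical, and both training (the gradient exchange at the cut layer) and the differential-privacy component are omitted.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class ClientSegment:
    """Layers held by the audit client; raw records never leave this object."""

    def __init__(self, weights: Matrix):
        self.weights = weights

    def forward(self, record: Vector) -> Vector:
        # ReLU activations at the cut layer are the only data sent to the server.
        return [max(0.0, z) for z in matvec(self.weights, record)]


class ServerSegment:
    """Layers held by the audit firm; it sees only cut-layer activations."""

    def __init__(self, weights: Matrix):
        self.weights = weights

    def forward(self, activations: Vector) -> Vector:
        return matvec(self.weights, activations)
```

For instance, a client segment with weights `[[1.0, -1.0], [0.5, 0.5]]` turns the encoded record `[3.0, 1.0]` into the activations `[2.0, 2.0]`, and a server segment with weights `[[2.0, 1.0]]` maps those to the output `[6.0]`. Because such activations can still leak information about the inputs, split learning is typically combined with additional protections such as differential privacy.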

Continual Learning for Unsupervised Anomaly Detection in Continuous Auditing of Financial Accounting Data

Dec 25, 2021

Hamed Hemati, Marco Schreyer, Damian Borth

Abstract:International audit standards require the direct assessment of a financial statement's underlying accounting journal entries. Driven by advances in artificial intelligence, deep-learning-inspired audit techniques have emerged to examine vast quantities of journal entry data. However, in regular audits, most of the proposed methods learn from a comparably stationary journal entry population, e.g., that of a financial quarter or year. They thereby ignore situations in which audit-relevant distribution changes are not evident in the training data or become available only incrementally over time. In contrast, in continuous auditing, deep-learning models are continually trained on a stream of recorded journal entries, e.g., those of the last hour. This results in situations where previous knowledge interferes with new information and is entirely overwritten. This work proposes a continual anomaly detection framework designed to overcome both challenges and to learn from a stream of journal entry data experiences. The framework is evaluated on deliberately designed audit scenarios and two real-world datasets. Our experimental results provide initial evidence that such a learning scheme can reduce false-positive alerts and false-negative decisions.
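
A common way to limit the overwriting of earlier knowledge described above is experience replay: a bounded buffer keeps a uniform sample of past journal entries, and each training batch mixes new entries with replayed ones. The sketch below maintains such a buffer with reservoir sampling; replay is one standard continual-learning strategy assumed here for illustration rather than taken from the paper, and the class `ReservoirBuffer` and its methods are hypothetical.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReservoirBuffer(Generic[T]):
    """Keeps a uniform random sample of at most `capacity` items from a stream."""

    def __init__(self, capacity: int, seed: Optional[int] = None):
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen,
            # overwriting a uniformly chosen stored item.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def mixed_batch(self, new_items: List[T], replay_size: int) -> List[T]:
        """Combine fresh stream items with a random sample of stored ones."""
        k = min(replay_size, len(self.items))
        return list(new_items) + self.rng.sample(self.items, k)
```

In a continuous-auditing loop, each incoming batch (for example, the last hour of journal entries) would first be combined with replayed entries via `mixed_batch` for a training step and then passed item by item to `add`, so the buffer keeps representing the whole history rather than only the most recent period.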

Multi-view Contrastive Self-Supervised Learning of Accounting Data Representations for Downstream Audit Tasks

Sep 23, 2021

Marco Schreyer, Timur Sattarov, Damian Borth

Abstract:International audit standards require the direct assessment of a financial statement's underlying accounting transactions, referred to as journal entries. Recently, driven by advances in artificial intelligence, deep-learning-inspired audit techniques have emerged in the field of auditing vast quantities of journal entry data. Nowadays, the majority of such methods rely on a set of specialized models, each trained for a particular audit task. At the same time, when conducting a financial statement audit, audit teams are confronted with (i) challenging time-budget constraints, (ii) extensive documentation obligations, and (iii) strict model interpretability requirements. As a result, auditors prefer to harness only a single, preferably 'multi-purpose' model throughout an audit engagement. To meet this requirement, we propose a contrastive self-supervised learning framework designed to learn audit-task-invariant accounting data representations. The framework encompasses deliberate interacting data augmentation policies that utilize the attribute characteristics of journal entry data. We evaluate the framework on two real-world datasets of city payments and transfer the learned representations to three downstream audit tasks: anomaly detection, audit sampling, and audit documentation. Our experimental results provide empirical evidence that the proposed framework can increase the efficiency of audits by learning rich and interpretable 'multi-task' representations.

* 8 pages (excl. appendix), 4 figures, 3 tables
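
Contrastive frameworks of this kind typically train an encoder so that two augmented views of the same journal entry map to nearby representations, while views of different entries are pushed apart. The sketch below computes an InfoNCE-style loss for a single anchor from cosine similarities between already-computed representation vectors; the function names and the temperature value are assumptions, and the paper's actual loss and augmentation policies may differ.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.5) -> float:
    """-log(exp(s_pos / tau) / sum over all candidates of exp(s / tau))."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

The loss is small when the positive view is the most similar candidate and grows when a negative (another journal entry) is closer to the anchor than the positive view. In a full training setup, the loss would be averaged over a batch and minimized with respect to the encoder parameters.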

Leaking Sensitive Financial Accounting Data in Plain Sight using Deep Autoencoder Neural Networks

Dec 13, 2020

Marco Schreyer, Christian Schulze, Damian Borth

Abstract:Nowadays, organizations collect vast quantities of sensitive information in 'Enterprise Resource Planning' (ERP) systems, such as accounting-relevant transactions, customer master data, or strategic sales price information. The leakage of such information poses a severe threat to companies, as the number of incidents and the reputational damage to those experiencing them continue to increase. At the same time, discoveries in deep learning research revealed that machine learning models could be maliciously misused to create new attack vectors. Understanding the nature of such attacks is becoming increasingly important for (internal) audit and fraud examination practice. Such awareness is particularly important for fraudulent data leakage using deep learning-based steganographic techniques, which might remain undetected by state-of-the-art 'Computer Assisted Audit Techniques' (CAATs). In this work, we introduce a real-world 'threat model' designed to leak sensitive accounting data. In addition, we show that a deep steganographic process, consisting of three neural networks, can be trained to hide such data in unobtrusive 'day-to-day' images. Finally, we provide qualitative and quantitative evaluations on two publicly available real-world payment datasets.

* 8 pages (excl. appendix), 4 figures, 2 tables, AAAI-21 Workshop on Knowledge Discovery from Unstructured Data in Financial Services, this paper is the initial accepted version
