Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Serena Elizabeth Marshall

Rethinking Synthetic Data definitions: A privacy driven approach

Mar 05, 2025

Vibeke Binz Vallevik, Serena Elizabeth Marshall, Aleksandar Babic, Jan Franz Nygaard

Figure 1 for Rethinking Synthetic Data definitions: A privacy driven approach

Figure 2 for Rethinking Synthetic Data definitions: A privacy driven approach

Abstract:Synthetic data is gaining traction as a cost-effective solution for the increasing data demands of AI development and can be generated either from existing knowledge or derived data captured from real-world events. The source of the synthetic data generation and the technique used significantly impacts its residual privacy risk and therefore its opportunity for sharing. Traditional classification of synthetic data types no longer fit the newer generation techniques and there is a need to better align the classification with practical needs. We suggest a new way of grouping synthetic data types that better supports privacy evaluations to aid regulatory policymaking. Our novel classification provides flexibility to new advancements like deep generative methods and offers a more practical framework for future applications.

Via

Access Paper or Ask Questions

Can I trust my fake data -- A comprehensive quality assessment framework for synthetic tabular data in healthcare

Jan 24, 2024

Vibeke Binz Vallevik, Aleksandar Babic, Serena Elizabeth Marshall, Severin Elvatun, Helga Brøgger, Sharmini Alagaratnam, Bjørn Edwin, Narasimha Raghavan Veeraragavan, Anne Kjersti Befring, Jan Franz Nygård

Figure 1 for Can I trust my fake data -- A comprehensive quality assessment framework for synthetic tabular data in healthcare

Figure 2 for Can I trust my fake data -- A comprehensive quality assessment framework for synthetic tabular data in healthcare

Figure 3 for Can I trust my fake data -- A comprehensive quality assessment framework for synthetic tabular data in healthcare

Figure 4 for Can I trust my fake data -- A comprehensive quality assessment framework for synthetic tabular data in healthcare

Abstract:Ensuring safe adoption of AI tools in healthcare hinges on access to sufficient data for training, testing and validation. In response to privacy concerns and regulatory requirements, using synthetic data has been suggested. Synthetic data is created by training a generator on real data to produce a dataset with similar statistical properties. Competing metrics with differing taxonomies for quality evaluation have been suggested, resulting in a complex landscape. Optimising quality entails balancing considerations that make the data fit for use, yet relevant dimensions are left out of existing frameworks. We performed a comprehensive literature review on the use of quality evaluation metrics on SD within the scope of tabular healthcare data and SD made using deep generative methods. Based on this and the collective team experiences, we developed a conceptual framework for quality assurance. The applicability was benchmarked against a practical case from the Dutch National Cancer Registry. We present a conceptual framework for quality assurance of SD for AI applications in healthcare that aligns diverging taxonomies, expands on common quality dimensions to include the dimensions of Fairness and Carbon footprint, and proposes stages necessary to support real-life applications. Building trust in synthetic data by increasing transparency and reducing the safety risk will accelerate the development and uptake of trustworthy AI tools for the benefit of patients. Despite the growing emphasis on algorithmic fairness and carbon footprint, these metrics were scarce in the literature review. The overwhelming focus was on statistical similarity using distance metrics while sequential logic detection was scarce. A consensus-backed framework that includes all relevant quality dimensions can provide assurance for safe and responsible real-life applications of SD.

Via

Access Paper or Ask Questions