Title: We Need Improved Data Curation and Attribution in AI for Scientific Discovery

Apr 03, 2025

Mara Graziani, Antonio Foncubierta, Dimitrios Christofidellis, Irina Espejo-Morales, Malina Molnar, Marvin Alberts, Matteo Manica, Jannis Born

Abstract: As the interplay between human-generated and synthetic data evolves, new challenges arise in scientific discovery concerning the integrity of the data and the stability of the models. In this work, we examine the role of synthetic data as opposed to that of real experimental data for scientific research. Our analyses indicate that nearly three-quarters of experimental datasets available on open-access platforms have relatively low adoption rates, opening new opportunities to enhance their discoverability and usability by automated methods. Additionally, we observe an increasing difficulty in distinguishing synthetic from real experimental data. We propose supplementing ongoing efforts in automating synthetic data detection by increasing the focus on watermarking real experimental data, thereby strengthening data traceability and integrity. Our estimates suggest that watermarking even less than half of the real-world data generated annually could help sustain model robustness, while promoting a balanced integration of synthetic and human-generated content.
